Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix windows Socket::connect_timeout overflow #112464

Merged
merged 5 commits into from
Jun 20, 2023

Conversation

eval-exec
Copy link
Contributor

@eval-exec eval-exec commented Jun 9, 2023

This PR want to close #112405

  • add unit test

@rustbot
Copy link
Collaborator

rustbot commented Jun 9, 2023

r? @thomcc

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jun 9, 2023
@rust-log-analyzer

This comment has been minimized.

@eval-exec eval-exec force-pushed the exec/fix-connect_timeout-overflow branch 2 times, most recently from bb125c5 to 80b5ea8 Compare June 9, 2023 14:18
@eval-exec
Copy link
Contributor Author

@rustbot review

@eval-exec eval-exec force-pushed the exec/fix-connect_timeout-overflow branch 2 times, most recently from 26720ce to 845c89c Compare June 10, 2023 14:04
@ChrisDenton
Copy link
Member

This looks good to me.

add unit test

This is a bit more tricky but I think the easiest way would be to copy/paste connect_error and then change it to use connect_timeout(&next_test_ip4(), Duration::MAX) instead of connect.

@rust-log-analyzer

This comment has been minimized.

@eval-exec eval-exec force-pushed the exec/fix-connect_timeout-overflow branch from 1e6a863 to e23346b Compare June 11, 2023 01:06
@rust-log-analyzer

This comment has been minimized.

@eval-exec eval-exec force-pushed the exec/fix-connect_timeout-overflow branch from e23346b to 2953b71 Compare June 11, 2023 01:16
@rust-log-analyzer

This comment has been minimized.

@eval-exec eval-exec force-pushed the exec/fix-connect_timeout-overflow branch from 2953b71 to fc4cff2 Compare June 11, 2023 01:46
@rust-log-analyzer

This comment has been minimized.

@eval-exec eval-exec force-pushed the exec/fix-connect_timeout-overflow branch from fc4cff2 to 7bac07d Compare June 11, 2023 02:26
@eval-exec eval-exec requested a review from ChrisDenton June 11, 2023 03:16
@eval-exec eval-exec force-pushed the exec/fix-connect_timeout-overflow branch from 7bac07d to 11df849 Compare June 11, 2023 03:40
@@ -46,6 +46,22 @@ fn connect_error() {
}
}

#[test]
fn connect_timeout_error() {
match TcpStream::connect_timeout(&"0.0.0.0:1".parse().unwrap(), Duration::MAX) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you confirmed this test fails without the changes in net.rs?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eval-exec for an alternative idea of what a test could do:

  1. bind some free port on localhost using TcpListener::bind, then
  2. connect to that new port with Duration::MAX timeout.

Here's a draft in Rust:

use std::net::TcpListener;

let listener = TcpListener::bind("127.0.0.1:0").unwrap();  // :0 picks some free port
let port = listener.local_addr().unwrap().port();  // obtain the port it picked

[..]

[..] format!("127.0.0.1:{port}").as_str() [..]  // connect to that port

I have a real example of something very close at /~https://github.com/hartwork/rust-for-it/blob/4e51982c4465c3190d92a59747c5d2738baf9e07/src/network.rs#L160-L177 .

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I'm testing it on a win10 virtual machine.

Copy link

@hartwork hartwork Jun 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eval-exec I checked the latest version: 0.0.0.0:1 will only work if you start a TCP listener on port 1. And port 1 only works if it's unoccupied and you have privileges because ports up to 1024 are reserved. In a test context you'll probably want a dynamic free port as in my example above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused, I try to connect to an unreachable address 8.8.8.8:9999, but it didn't get timeout.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eval-exec updated twice more (in case you only saw the version sent via notification e-mail)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be format!("0.0.0.0:{port}" or format!("127.0.0.1:{port}" for connect_timeout?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eval-exec I would argue for 127.0.0.1 for clarity. Both work in practice.

Copy link
Contributor Author

@eval-exec eval-exec Jun 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I should bind and accept in another thread:

#[test]
fn connect_timeout_ok_bind() {
    let socket_addr = next_test_ip4();

    thread::spawn({
        let listener = t!(TcpListener::bind(&socket_addr));

        move || {
            let mut stream = t!(listener.accept()).0;
            let mut buf = [0];
            t!(stream.read(&mut buf));
        }
    });
    assert!(TcpStream::connect_timeout(&socket_addr, Duration::MAX).is_ok());
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I should bind and accept in another thread:

@eval-exec that's not necessary, see /~https://github.com/hartwork/rust-for-it/blob/4e51982c4465c3190d92a59747c5d2738baf9e07/src/network.rs#L160-L177 .

@thomcc
Copy link
Member

thomcc commented Jun 17, 2023

Chris seems like he's reviewing this.

r? @ChrisDenton

@rustbot rustbot assigned ChrisDenton and unassigned thomcc Jun 17, 2023
@eval-exec eval-exec force-pushed the exec/fix-connect_timeout-overflow branch from 11df849 to 7871e56 Compare June 17, 2023 08:16
@rust-log-analyzer

This comment has been minimized.

@eval-exec eval-exec force-pushed the exec/fix-connect_timeout-overflow branch from a4cce33 to ca83467 Compare June 17, 2023 16:20
@rust-log-analyzer

This comment has been minimized.

@hartwork
Copy link

The failure mode this PR resolves is Duration::MAX being interpreted as a negative integer (I.e. less than zero time).

@ChrisDenton yes.

This causes an immediate return with a TimeOut error.

Cannot confirm, it caused infinite blocking for me as documented at #112405.

With the bug fixed Duration::MAX will be correctly interpreted and a TimeOut error will not be returned. Some other error will be but that is to be expected.

Cannot confirm, no error should be returned but success since this is about available targets.

@ChrisDenton
Copy link
Member

The failure condition in your issue was "Instead, this happened: the call never succeeds, not even for google.com:80 on Windows."

That is not infinite blocking.

@hartwork
Copy link

The failure condition in your issue was "Instead, this happened: the call never succeeds, not even for google.com:80 on Windows."

That is not infinite blocking.

@ChrisDenton maybe I should have put it more clearly: by "never succeeds" I meant that it should have returned success quickly for google.com:80 but instead it blocked forever and hence never finishes to return with due success. Given your spot-on early analysis at #112405 (comment) I do not understand how we have different understanding now. I'd be happy to learn what I a missing now. Are there two bugs here instead of a single one?

@hartwork
Copy link

@ChrisDenton PS: The CI at /~https://github.com/hartwork/rust-for-it/actions/runs/5204380736/jobs/9388510541 was cancelled by GitHub after near-exact 6 hours runtime due to that call while google.com:80 was available.

@eval-exec
Copy link
Contributor Author

@ChrisDenton PS: The CI at /~https://github.com/hartwork/rust-for-it/actions/runs/5204380736/jobs/9388510541 was cancelled by GitHub after near-exact 6 hours runtime due to that call while google.com:80 was available.

What's the related code of this canceled CI action?

@hartwork
Copy link

hartwork commented Jun 18, 2023

@ChrisDenton PS: The CI at /~https://github.com/hartwork/rust-for-it/actions/runs/5204380736/jobs/9388510541 was cancelled by GitHub after near-exact 6 hours runtime due to that call while google.com:80 was available.

What's the related code of this canceled CI action?

@eval-exec that would effectively be /~https://github.com/hartwork/rust-for-it/tree/4c816e51eb8134cef32571400f5583e3f75cec54 or:

git clone /~https://github.com/hartwork/rust-for-it
cd rust-for-it
git checkout 4c816e51eb8134cef32571400f5583e3f75cec54

@eval-exec
Copy link
Contributor Author

eval-exec commented Jun 18, 2023

connect to a reachable address:

use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

fn main() {
    let address_result = "1.1.1.1:80".to_socket_addrs();
    let address = address_result.unwrap().next().unwrap();
    TcpStream::connect_timeout(&address, Duration::MAX).expect("connect_timeout");
    println!("main fn returned");
}
  • On Linux platform, rustc 1.70.0, this program returned imediately, and no error.
  • On Windows platform, rustc 1.70.0, this program panic at an error imediately: thread 'main' panicked at 'connect_timeout: Error { kind: TimedOut, message: "connection timed out" }', src\main.rs:7:57
  • On Windows platform, with this PR's fix, and I run this program in a unit test, this program returned imediately, and no error

connect to an unreachable address:

use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

fn main() {
    let address_result = "1.1.1.1:9999".to_socket_addrs();
    let address = address_result.unwrap().next().unwrap();
    TcpStream::connect_timeout(&address, Duration::MAX).expect("connect_timeout");
    println!("main fn returned");
}
  • On linux platform, rustc 1.70.0, this program running 2minutes and 8second, and finaly got thread 'main' panicked at 'connect_timeout: Os { code: 110, kind: TimedOut, message: "Connection timed out" }', src/main.rs:7:57
  • On Windows platform, rustc 1.70.0, this program panic at an error imediately: thread 'main' panicked at 'connect_timeout: Error { kind: TimedOut, message: "connection timed out" }', src\main.rs:7:57
  • On Windows platform, with this PR's fix, and I run this program in a unit test, this program paniced after 21.04 seconds:
thread 'net::tcp::tests::connect_timeout_example' panicked at 'connect_timeout: Os { code: 10060, kind: TimedOut, message: "A connection attempt failed because the connected p
arty did not properly respond after a period of time, or established connection failed because connected host has failed to respond." }', library\std\src\net\tcp\tests.rs:15:88

@eval-exec
Copy link
Contributor Author

eval-exec commented Jun 18, 2023

So, I think I can add a unit test: connect to an unreachable address like 1.1.1.1:9999. assert connect_timeout(1.1.1.1:9999, Duration::MAX) will get a TimedOut error, and assert the action should cost at least 20 seconds.

#[test]
fn connect_timeout_to_unreachable_address() {
    let now = Instant::now();
    let result =
        TcpStream::connect_timeout(&format!("1.1.1.1:9999").parse().unwrap(), Duration::MAX);
    
    assert!(matches!(result, ErrorKind::TimedOut));
    
    assert!(now.elapsed() > Duration::from_secs(20));
}

What do you think? @hartwork @ChrisDenton

@ChrisDenton
Copy link
Member

@hartwork the reason your CI ran indefinitely is because you have an infinite loop. if running only the code in the issue, it errors immediately.

@eval-exec eval-exec force-pushed the exec/fix-connect_timeout-overflow branch from 13c995c to bf52e9b Compare June 18, 2023 07:42
@rust-log-analyzer

This comment has been minimized.

Signed-off-by: Eval EXEC <execvy@gmail.com>
@eval-exec eval-exec force-pushed the exec/fix-connect_timeout-overflow branch from bf52e9b to f65b5d0 Compare June 18, 2023 08:00
@@ -46,6 +46,37 @@ fn connect_error() {
}
}

#[test]
fn connect_timeout_to_unreachable_address() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested, this unit test will cost at 2 minutes and 8 seconds on Linux platform, and cost 20 seconds on Windows.

Copy link
Member

@ChrisDenton ChrisDenton Jun 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we only need one test here. We only need to focus on testing the problematic code path in one way. 2 minutes and 8 secs is a long time for a single unit test so I'd prefer if we just used connect_timeout_error test and removed the other two.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think 2 minutes and 8 seconds is a long time too.

@hartwork
Copy link

@hartwork the reason your CI ran indefinitely is because you have an infinite loop. if running only the code in the issue, it errors immediately.

@ChrisDenton thanks for clearing this up, you're 100% right, I was able to confirm your results in GitHub Actions now. Sorry 🙏

eval-exec and others added 2 commits June 20, 2023 18:43
Co-authored-by: Chris Denton <christophersdenton@gmail.com>
@eval-exec eval-exec requested a review from ChrisDenton June 20, 2023 10:48
Copy link
Member

@ChrisDenton ChrisDenton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me, thanks! Sorry testing end up being so involved.

@ChrisDenton
Copy link
Member

@bors r+ rollup

@bors
Copy link
Contributor

bors commented Jun 20, 2023

📌 Commit a0c757a has been approved by ChrisDenton

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 20, 2023
@eval-exec
Copy link
Contributor Author

eval-exec commented Jun 20, 2023 via email

bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 20, 2023
…llaumeGomez

Rollup of 6 pull requests

Successful merges:

 - rust-lang#112464 (Fix windows `Socket::connect_timeout` overflow)
 - rust-lang#112720 (Rustdoc: search: color item type and reduce size to avoid clashing)
 - rust-lang#112762 (Sort the errors from arguments checking so that suggestions are handled properly)
 - rust-lang#112786 (change binders from tuple structs to named fields)
 - rust-lang#112794 (Fix linker failures when #[global_allocator] is used in a dependency)
 - rust-lang#112819 (Disable feature(unboxed_closures, fn_traits) in weird-exprs)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 816b659 into rust-lang:master Jun 20, 2023
@rustbot rustbot added this to the 1.72.0 milestone Jun 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TcpStream::connect_timeout with Duration::MAX never succeeds on Windows
7 participants