Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix nonconcurrent tests #6900

Merged
merged 6 commits into from
Jun 7, 2019
Merged

Conversation

jethrogb
Copy link
Contributor

@jethrogb jethrogb commented May 2, 2019

The cargo testsuite relies on a clean test “root” for every test (i.e. #[test]-annotated function). It relied on the test crate's behavior to spawn a new thread for each test, which isn't done when tests aren't run concurrently, breaking the test suite. In this PR, I'm using backtraces to figure out which test is being run, which is much more robust. I also cleaned up the root initialization logic so that it no longer recursive calls the init function.

Fixes #6746

@rust-highfive
Copy link

r? @ehuss

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 2, 2019
@jethrogb jethrogb force-pushed the nonconcurrent-tests branch 2 times, most recently from 8ec2751 to c11c127 Compare May 2, 2019 22:05
@jethrogb jethrogb force-pushed the nonconcurrent-tests branch from c11c127 to 2520bcf Compare May 8, 2019 01:15
@jethrogb
Copy link
Contributor Author

jethrogb commented May 8, 2019

Ping @ehuss

@ehuss
Copy link
Contributor

ehuss commented May 9, 2019

Thanks for the PR! I don't think we can use backtraces to detect the current test, though. For example, on mac it runs about 5 times slower than normal.

Would you be willing to try a solution that involves creating a new proc-macro attribute? I think this should work: Create a new #[cargo_test] attribute which replaces the existing #[test] attribute. It should add a #[test] attribute, and also add a call to an initialization function at the top of the test. This initialization function can then use thread-local data to determine if it is a new test or not, and store the appropriate thread-local TASK_ID.

@jethrogb jethrogb force-pushed the nonconcurrent-tests branch 2 times, most recently from dbc2d36 to 2a7e916 Compare May 9, 2019 02:24
@jethrogb
Copy link
Contributor Author

jethrogb commented May 9, 2019

Done. I've put all the attribute changes in a separate commit for the reviewer's benefit.

@ehuss
Copy link
Contributor

ehuss commented May 10, 2019

Would it be possible to keep the attribute short, like #[cargo_test]? What I was thinking is that the attribute would inject a call to an initialization function at the top of the test, and then remove the call to init from paths::root. This should ensure the test is only initialized once, and only for that test. The init function would need to check a thread-local value to determine if it is a new test, or if it is a test reusing a thread (in which case an new ID would be issued). Also, keeping the project named cargo_test would make its intent clearer if/when adding other macros.

@jethrogb
Copy link
Contributor Author

Would it be possible to keep the attribute short, like #[cargo_test]?

I wanted to keep the proc_macro hygenic. If you don't think that's useful, that can be done.

What I was thinking is that the attribute would inject a call to an initialization function at the top of the test, and then remove the call to init from paths::root. This should ensure the test is only initialized once, and only for that test. The init function would need to check a thread-local value to determine if it is a new test, or if it is a test reusing a thread (in which case an new ID would be issued).

This is possible but it's a pretty significant rewrite of what I've done so far for IMO very little gain. Are you worried about the perf of the Mutex?

@bors
Copy link
Contributor

bors commented May 10, 2019

☔ The latest upstream changes (presumably #6883) made this pull request unmergeable. Please resolve the merge conflicts.

@ehuss
Copy link
Contributor

ehuss commented May 10, 2019

I wanted to keep the proc_macro hygenic. If you don't think that's useful, that can be done.

I don't know what this means here. I'm not sure how injecting a call to an init function should affect hygene.

Are you worried about the perf of the Mutex?

Well, this as-is doesn't work. I'm not sure, but it looks to have some race conditions. I would lean towards a simple solution, more analogous to traditional test suite "setup" functions, that gets run before the test starts.

@jethrogb
Copy link
Contributor Author

I don't know what this means here. I'm not sure how injecting a call to an init function should affect hygene.

The init function is not defined by the proc macro, so if you don't pass in the path to the init function, the macro is just making up the path out of thin air, which may or may not be imported at the place the function is defined.

Well, this as-is doesn't work. I'm not sure, but it looks to have some race conditions.

I think it works fine, and there's no opportunity for a race condition because mostly everything is thread-local. The tests are currently failing because I think I changed some #[test] attributes that were inside the test cases.

@jethrogb
Copy link
Contributor Author

jethrogb commented Jun 3, 2019

@ehuss I answered your questions.

@ehuss
Copy link
Contributor

ehuss commented Jun 5, 2019

I would still prefer a simpler attribute as explained above.

Also, I don't think we can't switch Cargo to a workspace because it is inside another workspace in rust-lang.

EDIT: We'll also need to figure out some way to publish this new package, we can't have a path-only dependency. My preference would be to relax this restriction, but it'll be some work.

@jethrogb jethrogb force-pushed the nonconcurrent-tests branch 2 times, most recently from 113ff07 to 6a2fecd Compare June 5, 2019 19:12
@jethrogb
Copy link
Contributor Author

jethrogb commented Jun 5, 2019

Pushed changes. If the current code is acceptable I'd prefer a swift review since this bitrots pretty quickly.

@ehuss ehuss force-pushed the nonconcurrent-tests branch from 6a2fecd to a8c22ca Compare June 7, 2019 20:22
@ehuss
Copy link
Contributor

ehuss commented Jun 7, 2019

Thanks!
I rebased and included some updates to make it publishable.
@bors r+

@bors
Copy link
Contributor

bors commented Jun 7, 2019

📌 Commit a8c22ca has been approved by ehuss

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 7, 2019
bors added a commit that referenced this pull request Jun 7, 2019
Fix nonconcurrent tests

The cargo testsuite relies on a clean test “root” for every test (i.e. `#[test]`-annotated function). It relied on the `test` crate's behavior to spawn a new thread for each test, which isn't done when tests aren't run concurrently, breaking the test suite. In this PR, I'm using backtraces to figure out which test is being run, which is much more robust. I also cleaned up the root initialization logic so that it no longer recursive calls the `init` function.

Fixes #6746
@bors
Copy link
Contributor

bors commented Jun 7, 2019

⌛ Testing commit a8c22ca with merge d5723eb...

@bors
Copy link
Contributor

bors commented Jun 7, 2019

☀️ Test successful - checks-travis, status-appveyor
Approved by: ehuss
Pushing d5723eb to master...

@bors bors merged commit a8c22ca into rust-lang:master Jun 7, 2019
@Eh2406
Copy link
Contributor

Eh2406 commented Jun 11, 2019

I am seeing thread 'publish_lockfile::note_resolve_changes' panicked at 'path.metadata() failed with The system cannot find the file specified. (os error 2)', tests\testsuite\support\paths.rs:197:25 and then all other tests are failing with thread 'registry::resolve_and_backtracking' panicked at 'Once instance has previously been poisoned', src\libstd\sync\once.rs:362:21

@jethrogb
Copy link
Contributor Author

Can you run it again with RUST_BACKTRACE=1?

@Eh2406
Copy link
Contributor

Eh2406 commented Jun 12, 2019

Looks like delling my cit folder fixed the problem. Sorry for the noize.

Centril added a commit to Centril/rust that referenced this pull request Jun 12, 2019
Update cargo

Update cargo

19 commits in 545f354259be4e9745ea00a524c0e4c51df01aa6..807429e1b6da4e2ec52488ef2f59e77068c31e1f
2019-05-23 17:45:30 +0000 to 2019-06-11 14:06:10 +0000
- Stabilize publish-lockfile. (rust-lang/cargo#7026)
- change package cache lock message (rust-lang/cargo#7029)
- Fix documenting an example. (rust-lang/cargo#7023)
- Fix nonconcurrent tests (rust-lang/cargo#6900)
- Update git2 crates for libgit2 0.28 (rust-lang/cargo#7018)
- fix bunch of clippy warnings (rust-lang/cargo#7019)
- Ignore remap-path-prefix in metadata hash. (rust-lang/cargo#6966)
- Don't synthesize feature diretives for non-optional deps (rust-lang/cargo#7010)
- Handle pipelined tests of libraries (rust-lang/cargo#7008)
- Import the cargo-vendor subcommand into Cargo (rust-lang/cargo#6869)
- Remove unnecessary outlives bounds (rust-lang/cargo#7000)
- Catch filename output collisions in rustdoc. (rust-lang/cargo#6998)
- the testing SAT solver was messed up by a refactor (rust-lang/cargo#6995)
- Add some hints to the docs for `cfg()` targets (rust-lang/cargo#6990)
- Test the Resolver against the varisat Library (rust-lang/cargo#6980)
- Update changelog. (rust-lang/cargo#6984)
- Update cache-messages tracking issue. (rust-lang/cargo#6987)
- zsh: Add --all-targets option to cargo-check and cargo-build (rust-lang/cargo#6985)
- Fix typo (rust-lang/cargo#6982)
Centril added a commit to Centril/rust that referenced this pull request Jun 14, 2019
Update cargo

Update cargo

19 commits in 545f354259be4e9745ea00a524c0e4c51df01aa6..807429e1b6da4e2ec52488ef2f59e77068c31e1f
2019-05-23 17:45:30 +0000 to 2019-06-11 14:06:10 +0000
- Stabilize publish-lockfile. (rust-lang/cargo#7026)
- change package cache lock message (rust-lang/cargo#7029)
- Fix documenting an example. (rust-lang/cargo#7023)
- Fix nonconcurrent tests (rust-lang/cargo#6900)
- Update git2 crates for libgit2 0.28 (rust-lang/cargo#7018)
- fix bunch of clippy warnings (rust-lang/cargo#7019)
- Ignore remap-path-prefix in metadata hash. (rust-lang/cargo#6966)
- Don't synthesize feature diretives for non-optional deps (rust-lang/cargo#7010)
- Handle pipelined tests of libraries (rust-lang/cargo#7008)
- Import the cargo-vendor subcommand into Cargo (rust-lang/cargo#6869)
- Remove unnecessary outlives bounds (rust-lang/cargo#7000)
- Catch filename output collisions in rustdoc. (rust-lang/cargo#6998)
- the testing SAT solver was messed up by a refactor (rust-lang/cargo#6995)
- Add some hints to the docs for `cfg()` targets (rust-lang/cargo#6990)
- Test the Resolver against the varisat Library (rust-lang/cargo#6980)
- Update changelog. (rust-lang/cargo#6984)
- Update cache-messages tracking issue. (rust-lang/cargo#6987)
- zsh: Add --all-targets option to cargo-check and cargo-build (rust-lang/cargo#6985)
- Fix typo (rust-lang/cargo#6982)
@ehuss
Copy link
Contributor

ehuss commented Jun 17, 2019

@Eh2406 I'm having a hard time reproducing the issue you are seeing. I had a few questions:

  • Does it get stuck permanently? That is, every time you run cargo test it fails until you manually delete cit?
  • Have you figured out a way to reliably cause it to happen? If so, can you list the exact steps?
  • Are you just running cargo test, or are you giving other flags?
  • If you can recreate it, can you modify the do_op function to print which file it is trying to delete?
  • Is there anything unusual about your system? Is it running Defender or any other AV? Are you using an editor or IDE that might be indexing the target directory? Do you have an SSD?

@ehuss
Copy link
Contributor

ehuss commented Jun 17, 2019

Oh, I reproduced it! I think lowering down to 2 cpu's was the trick.

I'll dig in a little more.

bors added a commit to rust-lang/rust that referenced this pull request Jun 18, 2019
Update cargo

Update cargo

19 commits in 545f354259be4e9745ea00a524c0e4c51df01aa6..807429e1b6da4e2ec52488ef2f59e77068c31e1f
2019-05-23 17:45:30 +0000 to 2019-06-11 14:06:10 +0000
- Stabilize publish-lockfile. (rust-lang/cargo#7026)
- change package cache lock message (rust-lang/cargo#7029)
- Fix documenting an example. (rust-lang/cargo#7023)
- Fix nonconcurrent tests (rust-lang/cargo#6900)
- Update git2 crates for libgit2 0.28 (rust-lang/cargo#7018)
- fix bunch of clippy warnings (rust-lang/cargo#7019)
- Ignore remap-path-prefix in metadata hash. (rust-lang/cargo#6966)
- Don't synthesize feature diretives for non-optional deps (rust-lang/cargo#7010)
- Handle pipelined tests of libraries (rust-lang/cargo#7008)
- Import the cargo-vendor subcommand into Cargo (rust-lang/cargo#6869)
- Remove unnecessary outlives bounds (rust-lang/cargo#7000)
- Catch filename output collisions in rustdoc. (rust-lang/cargo#6998)
- the testing SAT solver was messed up by a refactor (rust-lang/cargo#6995)
- Add some hints to the docs for `cfg()` targets (rust-lang/cargo#6990)
- Test the Resolver against the varisat Library (rust-lang/cargo#6980)
- Update changelog. (rust-lang/cargo#6984)
- Update cache-messages tracking issue. (rust-lang/cargo#6987)
- zsh: Add --all-targets option to cargo-check and cargo-build (rust-lang/cargo#6985)
- Fix typo (rust-lang/cargo#6982)
bors added a commit that referenced this pull request Jun 19, 2019
Revert test directory cleaning change.

#6900 changed it so that the entire `cit` directory was cleaned once when tests started. Previously, each `t#` directory was deleted just before each test ran. This restores the old behavior due to problems on Windows.

The problem is that the call to `rm_rf` would fail with various errors ("Not found", "directory not empty", etc.) if you run `cargo test` twice. The first panic would poison the lazy static initializer, causing all subsequent tests to fail.

There are a variety of reasons deleting a file on Windows is difficult. My hypothesis in this case is that services like the indexing service and Defender swoop in and temporarily hold handles to files. This seems to be worse on slower systems, where presumably these services take longer to process all the files created by the test suite. It may also be related to how files are "marked for deletion" but are not immediately deleted.

The solution here is to spread out the deletion over time, giving Windows more of an opportunity to release its handles. This is a poor solution, and should only help reduce the frequency, but not entirely fix it.

I believe that this cannot be solved using `DeleteFileW`. There are more details at rust-lang/rust#29497, which is a long-standing problem that there are no good Rust implementations for recursively deleting a directory.

An example of something that implements a "safe" delete is [Cygwin's unlink implementation](/~https://github.com/cygwin/cygwin/blob/ad101bcb0f55f0eb1a9f60187f949c3decd855e4/winsup/cygwin/syscalls.cc#L675-L1064). As you can see, it is quite complex. Of course our use case does not need to handle quite as many edge cases, but I think any implementation is going to be nontrivial, and require Windows-specific APIs not available in std.

Note: Even before #6900 I still get a lot of errors on a slow VM (particularly "directory not empty"), with Defender and Indexing off. I'm not sure why. This PR should make it more bearable, though.
@ehuss ehuss added this to the 1.37.0 milestone Feb 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Test suite fails when run with a single core
5 participants