-
Notifications
You must be signed in to change notification settings - Fork 184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(iroh-blobs): properly handle Drop in local pool during shutdown #2517
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
rklaehn
changed the title
fix(iroh-bytes): properly handle Drop in local pool during shutdown
fix(iroh-blobs)!: properly handle Drop in local pool during shutdown
Jul 16, 2024
tokio really loves their thread locals...
...so we can call shutdown on it
…lled to avoid panicking the task, so it becomes easier to see the panics that actually matter.
Also add "tokio style" panic when shutdown versions for backwards compat
rklaehn
commented
Jul 19, 2024
@@ -454,7 +455,10 @@ where | |||
|
|||
async fn build_inner(self) -> Result<ProtocolBuilder<D>> { | |||
trace!("building node"); | |||
let lp = LocalPoolHandle::new(num_cpus::get()); | |||
let lp = LocalPool::new(local_pool::Config { | |||
panic_mode: PanicMode::LogAndContinue, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Womp womp, this is because of tokio-rs/tracing#2870, mostly
rt is confusing because we also have a normal runtime
just use spawn or run
we now have run, which is just a wrapped oneshot sender
rklaehn
changed the title
fix(iroh-blobs)!: properly handle Drop in local pool during shutdown
fix(iroh-blobs): properly handle Drop in local pool during shutdown
Jul 22, 2024
## Description Make sure the local pool threads use the main tokio runtime instead of a current_thread runtime per local pool thread. This is mostly relevant for call to spawn_blocking and spawn from inside local tasks. Before they would go to the single threaded runtime of that thread, now they go to the blocking pool of the main runtime. This means that the local futures are more tightly integrated with the main runtime. Everything you can do from a spawned future, you should be able to also do from a local future. ## Breaking Changes Surprisingly, none ## Notes & open questions Still not sure if this is OK to do, but given that even tokio::runtime::Handle has a block_on fn, it seems intended usage. ## Change checklist - [x] Self-review. - [x] Documentation updates following the [style guide](https://rust-lang.github.io/rfcs/1574-more-api-documentation-conventions.html#appendix-a-full-conventions-text), if relevant. - [x] ~~Tests if relevant.~~ - [x] ~~All breaking changes documented.~~
dignifiedquire
approved these changes
Jul 22, 2024
2 tasks
github-merge-queue bot
pushed a commit
that referenced
this pull request
Aug 1, 2024
…#2559) ## Description There's this test that was introduced in [the `local_pool` PR](#2517). It was ignored via `#[ignore = "todo"]`. Notably, it's *not flaky*, it always fails. Our flaky tests are run with `cargo nextest run --run-ignored all [...]`. We can't be more specific with the `ignore`d tests. The only options are `default`, `ignored-only` and `all`. This kind of test is really hard to write. IIUC, `#[should_panic]` can only test for the panic happening in the thread that the test is initiated in, it doesn't detect panics that are thrown in threads spawned from the test. I assume this is the reason writing this test was abandoned. Keeping this test with the `#[ignore = "todo"]` on it means we're always running it in our flaky test suite, which is confusing. We thought this test was flaky, but it's not. IMO it's better to comment it out/remove it than to pollute our flaky test results. ## Breaking Changes None ## Notes & open questions In this PR I'm commenting this test. Should we remove it instead? Or do people have ideas on how to make this test work? Do we have an idea what we're *expecting* of our `LocalPool` implementation? Should a panic on one of the threads cause a panic in the `finish()` function? ## Change checklist - [X] Self-review. - ~~[ ] Documentation updates following the [style guide](https://rust-lang.github.io/rfcs/1574-more-api-documentation-conventions.html#appendix-a-full-conventions-text), if relevant.~~ - ~~[ ] Tests if relevant.~~ - [X] All breaking changes documented.
rklaehn
pushed a commit
to n0-computer/iroh-blobs
that referenced
this pull request
Oct 22, 2024
… (#2559) ## Description There's this test that was introduced in [the `local_pool` PR](n0-computer/iroh#2517). It was ignored via `#[ignore = "todo"]`. Notably, it's *not flaky*, it always fails. Our flaky tests are run with `cargo nextest run --run-ignored all [...]`. We can't be more specific with the `ignore`d tests. The only options are `default`, `ignored-only` and `all`. This kind of test is really hard to write. IIUC, `#[should_panic]` can only test for the panic happening in the thread that the test is initiated in, it doesn't detect panics that are thrown in threads spawned from the test. I assume this is the reason writing this test was abandoned. Keeping this test with the `#[ignore = "todo"]` on it means we're always running it in our flaky test suite, which is confusing. We thought this test was flaky, but it's not. IMO it's better to comment it out/remove it than to pollute our flaky test results. ## Breaking Changes None ## Notes & open questions In this PR I'm commenting this test. Should we remove it instead? Or do people have ideas on how to make this test work? Do we have an idea what we're *expecting* of our `LocalPool` implementation? Should a panic on one of the threads cause a panic in the `finish()` function? ## Change checklist - [X] Self-review. - ~~[ ] Documentation updates following the [style guide](https://rust-lang.github.io/rfcs/1574-more-api-documentation-conventions.html#appendix-a-full-conventions-text), if relevant.~~ - ~~[ ] Tests if relevant.~~ - [X] All breaking changes documented.
rklaehn
pushed a commit
to n0-computer/iroh-blobs
that referenced
this pull request
Oct 22, 2024
… (#2559) ## Description There's this test that was introduced in [the `local_pool` PR](n0-computer/iroh#2517). It was ignored via `#[ignore = "todo"]`. Notably, it's *not flaky*, it always fails. Our flaky tests are run with `cargo nextest run --run-ignored all [...]`. We can't be more specific with the `ignore`d tests. The only options are `default`, `ignored-only` and `all`. This kind of test is really hard to write. IIUC, `#[should_panic]` can only test for the panic happening in the thread that the test is initiated in, it doesn't detect panics that are thrown in threads spawned from the test. I assume this is the reason writing this test was abandoned. Keeping this test with the `#[ignore = "todo"]` on it means we're always running it in our flaky test suite, which is confusing. We thought this test was flaky, but it's not. IMO it's better to comment it out/remove it than to pollute our flaky test results. ## Breaking Changes None ## Notes & open questions In this PR I'm commenting this test. Should we remove it instead? Or do people have ideas on how to make this test work? Do we have an idea what we're *expecting* of our `LocalPool` implementation? Should a panic on one of the threads cause a panic in the `finish()` function? ## Change checklist - [X] Self-review. - ~~[ ] Documentation updates following the [style guide](https://rust-lang.github.io/rfcs/1574-more-api-documentation-conventions.html#appendix-a-full-conventions-text), if relevant.~~ - ~~[ ] Tests if relevant.~~ - [X] All breaking changes documented.
matheus23
pushed a commit
that referenced
this pull request
Nov 14, 2024
…2517) ## Description The tokio_util LocalPoolHandle does not properly handle Drop during shutdown. Its threads are just spawned as detached. So any drop impl that runs in a local pool thread will be stopped as soon as the process terminates. This can have some bad consequences if that drop operation performs IO, like closing files and committing database transactions. Here is where the threads get spawned. The `std::thread::JoinHandle`s are just dropped. https://docs.rs/tokio-util/latest/src/tokio_util/task/spawn_pinned.rs.html#381 Here is some discussion of the observed effects: https://discord.com/channels/949724860232392765/1260571544414064670 LocalPoolHandle also, of course, is using an unbounded channel: https://docs.rs/tokio-util/latest/src/tokio_util/task/spawn_pinned.rs.html#372 ## Breaking Changes Public interfaces using tokio_util::task::LocalPoolHandle will now use our own LocalPool/LocalPoolHandle. ## Notes & open questions Should we use an unbounded channel like tokio::spawn or LocalPoolHandle::spawn_pinned? Seems like a big footgun. But if not, we need to somehow handle when the queue is full. <!-- Any notes, remarks or open questions you have to make about the PR. --> ## Change checklist - [x] Self-review. - [x] Documentation updates following the [style guide](https://rust-lang.github.io/rfcs/1574-more-api-documentation-conventions.html#appendix-a-full-conventions-text), if relevant. - [x] Tests if relevant. - [x] All breaking changes documented.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
The tokio_util LocalPoolHandle does not properly handle Drop during shutdown. Its threads are just spawned as detached. So any drop impl that runs in a local pool thread will be stopped as soon as the process terminates. This can have some bad consequences if that drop operation performs IO, like closing files and committing database transactions.
Here is where the threads get spawned. The
std::thread::JoinHandle
s are just dropped.https://docs.rs/tokio-util/latest/src/tokio_util/task/spawn_pinned.rs.html#381
Here is some discussion of the observed effects:
https://discord.com/channels/949724860232392765/1260571544414064670
LocalPoolHandle also, of course, is using an unbounded channel:
https://docs.rs/tokio-util/latest/src/tokio_util/task/spawn_pinned.rs.html#372
Breaking Changes
Public interfaces using tokio_util::task::LocalPoolHandle will now use our own LocalPool/LocalPoolHandle.
Notes & open questions
Should we use an unbounded channel like tokio::spawn or LocalPoolHandle::spawn_pinned? Seems like a big footgun. But if not, we need to somehow handle when the queue is full.
Change checklist