[release/8.0] Fix cancellation unregistration in DataflowBlock.OutputAvailableAsync #102376
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
OutputAvailableAsync is not unregistering from the supplied CancellationToken. If a cancelable token is supplied and is long lived, each call with that token to OutputAvailableAsync will add another callback into that token, and that will continue to grow until either the token is dropped or has been cancellation requested. For a long-lived cancellation token, this is akin to a leak.
The implementation was trying to be too clever in avoiding an additional continuation that was previously there. However, this continuation makes it a lot easier to avoid possible deadlocks that can occur if a cancellation request comes in concurrently with a message being pushed. Instead of trying to avoid it, just use an async method, which still incurs the extra task but does so with less allocation and greatly simplifies the code while also fixing the issue, as all cleanup can now be done in the continuation as part of the async method.
Backport of #99632
Customer Impact
Unbounded memory growth if a long-lived CancellationToken is used with OutputAvailableAsync; each call would end up rooting a small object into the token.
Testing
Normal CI testing, plus local stress testing to show that the leak no longer exists.
Risk
Low