-
Notifications
You must be signed in to change notification settings - Fork 496
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
for #10993: cancellation sensitivity for buffered writer #11052
Draft
problame
wants to merge
22
commits into
problame/enspc-must-not-use-after-returned-error
Choose a base branch
from
problame/enspc-must-not-use-after-returned-error--cancellation-sensitivity
base: problame/enspc-must-not-use-after-returned-error
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
… mutable = None condition that we're fixing in this PR. We can go fix _that_ as welll, but, before we do that, explore alternatives
…gain the mutable = None condition that we're fixing in this PR. We can go fix _that_ as welll, but, before we do that, explore alternatives" This reverts commit ee9d68c.
…e result to go away
This reverts commit 62a5c84.
This reverts commit 8784be2.
…st in the result to go away" This reverts commit a53987c.
This reverts commit 1b7d4ab2a6fd732cb08baa67baed55e94cc52be4.
…ble() always present; nice in theory but causes lost writes if flush() is retried
…ake mutable() always present; nice in theory but causes lost writes if flush() is retried" This reverts commit 5086c561c426521074e758a2f9206ced98e73901.
This reverts commit e758d26.
…xposes again the mutable = None condition that we're fixing in this PR. We can go fix _that_ as welll, but, before we do that, explore alternatives
…; walreceiver shouldn't be retrying any writes, and will cause a panic if it does
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
In
I added infinite retries for buffered writer flush IOs, primarily to gracefully handle ENOSPC but more generally so that the buffered writer is not left in a state where reads from the surrounding InMemoryLayer cause panics, but stall instead until the read succeeds. See the #10993 description for details.
However, I didn't add cancellation sensitivity, which is concerning because then there is no way to detach a timeline/tenant tenant that is encountering the problem.
This PR
Timeline::cancel
to the flush loopThe support for read-availability after cancellation is is necessary so that reads from the InMemoryLayer that are already queued up behind the RwLock that wraps the BufferedWriter won't panic because of the
mutable=None
that we leave behind in case the flush loop gets cancelled.Alternatives
One might think that we can drop #10993 + the cancellation sensitivity added in this PR, using only the changes that add read-availability after flush error.
The problem here is that read-availability sounds good but is really quite useless, because we cannot ingest new WAL and hence very soon, reads from compute are going to wait indefinitely on
wait_lsn
, because ingest isn't progressing.Thus, having the infinite flush retry still retries still make more sense because they're just "slowness" to the user
This avoids any sort of panic when detaching a
Remaining read-available allows in-flight getpage
One might think
The problem with cancellation sensitivity is the same as with handling IO errors.
is that it's basically