Store prefetch results in LFC cache once as soon as they are received #10442
Conversation
7590 tests run: 7213 passed, 0 failed, 377 skipped (full report)
Flaky tests (2): Postgres 17, Postgres 15
Code coverage* (full report)
* collected from Rust tests only
The comment gets automatically updated with the latest test results.
a7ed5d3 at 2025-02-21T14:42:27.072Z
Thanks! I've only done a high-level review, but the approach makes sense to me. I expect the synchronization overhead would be negligible since this is on an IO path, but would be interested to see how benchmarks are affected.
Force-pushed from 4c3b6ca to 071142f
I'm missing the documentation as to why it's OK to side-load the data here without holding a buffer lock.
Could you add that info?
Force-pushed from 071142f to 582953e
Below are some results illustrating the importance of storing prefetch results in the LFC. Configuration:
Database
Query (range-query.sql):
Benchmark
Results:
Force-pushed from 15e88f7 to bccf077
Force-pushed from d40a973 to 81c5642
One last comment: Please coordinate with the release plans for 17.3 this week before merging, so that we have an appropriate amount of time for this to stew on staging before this is released to all customers.
It looks like the added test would be flaky from the beginning. I tried to run it in parallel and got:
All instances failed like this:
What do you think of increasing statement_timeout for that insert?
Moreover, I could also get the following error:
Or with the complete plan:
Wouldn't it make sense to debug the new test before merging this PR?
+1, thanks for looking at it. @knizhnik can you please debug the test failures?
@alexanderlaw - I wonder how you are running this test in parallel? Are there N independent Neon instances?
Running this test in a loop with 5 instances in parallel, I can see the following numbers for the first 4 queries (with grep "Unused prefetches:") when it passes:
But when it fails, the numbers are:
I prefer to use something like:
So I have N independent pytest and neon instances. But the same effect wrt this test can also be achieved with just:
The above recipe produces for me:
Force-pushed from 66fa88a to c0af808
Commit c0af808 makes the test stable:
Thank you!
I think I've discovered another anomaly related to this PR; investigating...
Please try the following modification for test_pg_regress. It makes test_runner/regress/test_pg_regress.py::test_pg_regress[release-pg17-None] or [debug-...] fail roughly every second run with miscellaneous errors like:
and so on...
It fails for me too, but with a more serious error:
I'm testing commit db32086.
One more fix... |
Still no luck:
I wonder if that test survives, say, 10 iterations in your environment? |
Will try. |
Sorry, found one more bug:( |
As I understand, changes since 2025-02-13, roughly:
were not reviewed explicitly; wouldn't it make sense to review them now?
The patch seems to regress several |
The failed assertion:
which means that the compute is prefetching more pages than before? i.e., the pageserver receives more requests than the compute's getpage count
## Problem

PR #10442 added the `prefetch_lookup` function, which changed the handling of getpage requests in compute.

Before:

1. Lookup in LFC (return if found)
2. Register prefetch buffer
3. Wait for prefetch result (increment getpage_hist)

Now:

1. Lookup prefetch ring (return if the prefetch request has already completed)
2. Lookup in LFC (return if found)
3. Register prefetch buffer
4. Wait for prefetch result (increment getpage_hist)

So if the prefetch result is already available, the getpage histogram is not incremented. This caused failures of some test_throughput benchmarks: https://neondb.slack.com/archives/C033RQ5SPDH/p1740425527249499

## Summary of changes

Increment the getpage histogram in `prefetch_lookup`.

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
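The lookup order and the fix described above can be modeled with a short Python sketch. This is not Neon's actual code (the real implementation is C in the compute); `prefetch_ring`, `lfc`, `getpage_hist`, and `wait_prefetch` are illustrative names:

```python
# Hypothetical model of the getpage lookup order after PR #10442,
# including the fix: the histogram is also incremented when a page
# is served from the completed-prefetch ring.

class Compute:
    def __init__(self):
        self.prefetch_ring = {}   # completed prefetch results, keyed by page tag
        self.lfc = {}             # local file cache contents
        self.getpage_hist = 0     # getpage requests answered by the pageserver

    def wait_prefetch(self, tag):
        # Stand-in for registering a prefetch buffer and waiting for
        # the pageserver response.
        return f"page-{tag}"

    def getpage(self, tag):
        # 1. Lookup the prefetch ring first (new in PR #10442).
        if tag in self.prefetch_ring:
            self.getpage_hist += 1        # the fix: count this path too
            return self.prefetch_ring.pop(tag)
        # 2. Lookup in LFC; a hit involves no pageserver request.
        if tag in self.lfc:
            return self.lfc[tag]
        # 3. Register a prefetch buffer and wait for the result.
        page = self.wait_prefetch(tag)
        self.getpage_hist += 1
        return page
```

Without the increment on the first path, a request satisfied by an already-completed prefetch would be invisible to the getpage histogram, which is what broke the benchmark assertions.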
Problem

Prefetch is performed locally, so different backends can request the same pages from the pageserver. Such duplicated requests increase pageserver load and network traffic.

Making prefetch global seems very difficult and undesirable, because different queries can access chunks at different speeds. Storing prefetched chunks in the LFC will not completely eliminate duplicates, but it can minimize such requests.

The problem with storing prefetch results in the LFC is that in this case the page is not protected by a shared buffer lock, so we have to perform extra synchronization on the LFC side.

See:
https://neondb.slack.com/archives/C0875PUD0LC/p1736772890602029?thread_ts=1736762541.116949&cid=C0875PUD0LC
@MMeent's implementation of prewarm: see #10312

Summary of changes

Use condition variables to synchronize access to LFC entries.
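The condition-variable scheme can be sketched as a minimal Python model. The actual implementation is C inside the Neon Postgres extension; the entry layout and method names below are illustrative assumptions, and the model only shows the exclusion pattern (writer marks the entry busy while side-loading, readers wait until it is idle):

```python
import threading

class LFCEntry:
    """Toy model of one LFC cache entry guarded by a condition variable."""

    def __init__(self):
        self.cond = threading.Condition()
        self.busy = False     # a side-load write is in progress
        self.page = None      # cached page contents

    def store_prefetch(self, page):
        # Writer: side-load the prefetched page. No shared buffer lock
        # protects this page, so the entry-level condition variable
        # provides the needed exclusion.
        with self.cond:
            while self.busy:
                self.cond.wait()
            self.busy = True
        # Real code would write to the cache file here, outside the lock,
        # so the (potentially slow) IO does not block other entries.
        self.page = page
        with self.cond:
            self.busy = False
            self.cond.notify_all()

    def read(self):
        # Reader: wait until no write to this entry is in progress.
        with self.cond:
            while self.busy:
                self.cond.wait()
            return self.page
```

Keeping the actual IO outside the mutex while signalling completion through the condition variable is the usual way to avoid serializing all cache traffic behind one slow write.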