pageserver: support 1 million relations #9516
Labels
c/storage/pageserver (Component: storage: pageserver)
t/feature (Issue type: feature, for new features or requests)
Comments
github-merge-queue bot pushed a commit that referenced this issue on Jan 13, 2025:
## Problem

In preparation for #9516. We need to store rel size and directory data in the sparse keyspace, but it does not support inheritance yet.

## Summary of changes

Add a new type of keyspace, "sparse but inherited", to the system. On the read path, we do not remove the key range when we descend into the ancestor. The search stops when (1) the full key range is covered by image layers (already implemented before this patch), or (2) we reach the end of the ancestor chain.

Signed-off-by: Alex Chi Z <chi@neon.tech>
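The read-path behaviour described in this commit message can be illustrated with a small, self-contained sketch. This is not the pageserver's actual code: `Timeline`, `KeyRange`, and the plain `u64` keys are hypothetical stand-ins for the real types.

```rust
struct KeyRange {
    start: u64,
    end: u64,
}

struct Timeline {
    ancestor: Option<Box<Timeline>>,
    /// Key ranges fully covered by image layers on this timeline.
    image_covered: Vec<KeyRange>,
    /// Sparse values materialized on this timeline.
    values: Vec<(u64, Vec<u8>)>,
}

impl Timeline {
    fn fully_covered(&self, range: &KeyRange) -> bool {
        self.image_covered
            .iter()
            .any(|c| c.start <= range.start && range.end <= c.end)
    }

    /// Collect all values in `range`, descending the ancestor chain.
    fn read_sparse_inherited(&self, range: &KeyRange, out: &mut Vec<(u64, Vec<u8>)>) {
        let mut current = Some(self);
        while let Some(tl) = current {
            // Collect whatever this timeline holds for the (unshrunk) key range.
            out.extend(
                tl.values
                    .iter()
                    .filter(|(k, _)| range.start <= *k && *k < range.end)
                    .cloned(),
            );
            // Stop condition (1): image layers fully cover the range here.
            if tl.fully_covered(range) {
                break;
            }
            // Unlike the dense read path, we do NOT remove the key range before
            // descending; stop condition (2) is simply the end of the chain.
            current = tl.ancestor.as_deref();
        }
    }
}
```

The key point is that the range handed down the chain is never shrunk; only full image-layer coverage or running out of ancestors ends the walk.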
awarus pushed a commit that referenced this issue on Jan 24, 2025.
github-merge-queue bot pushed a commit that referenced this issue on Feb 14, 2025:
## Problem

Part of #9516.

## Summary of changes

This patch adds support for storing reldir in the sparse keyspace. All logic is guarded by the `rel_size_v2_enabled` flag, so if it is set to false, the code path is exactly the same as what is currently in prod.

Note that this patch does not persist the `rel_size_v2_enabled` flag; the logic around persistence will be implemented in the next patch (i.e., if we enable it, restart the pageserver, and the flag then gets set back to false, we should still read from v2 using the `rel_size_v2_migration_status` in the index_part). The persistence logic in the next patch will disallow switching from v2 back to v1 via the config item.

I also refactored the metrics so that they work with the new reldir store. This metric was not computed correctly for reldirs before (see the comments); with the refactor, the value is computed only once we have an initial value for the reldir size. The refactor keeps the existing inaccuracy of the computation when there is more than one database.

For the tests, we currently run all of them with v2. I will set the flag back to false and add some v2-specific tests before merging, probably also v1->v2 migration tests.

Signed-off-by: Alex Chi Z <chi@neon.tech>
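A minimal sketch of the persisted-status idea described above (hypothetical names and a stand-in for the status field kept in index_part; this is not the actual patch): once any data has been written in the v2 format, the persisted status wins over the config flag, so flipping the flag back to false cannot silently return reads to v1.

```rust
enum RelSizeMigrationStatus {
    Legacy,    // nothing written in v2 format yet
    Migrating, // some reldir/rel-size data already written in v2 format
    Migrated,  // fully migrated to v2
}

fn read_reldir_from_v2(config_flag: bool, persisted: RelSizeMigrationStatus) -> bool {
    match persisted {
        // v2 data exists: we must keep reading it regardless of the flag,
        // i.e. switching v2 -> v1 via the config item is disallowed.
        RelSizeMigrationStatus::Migrating | RelSizeMigrationStatus::Migrated => true,
        // No v2 data yet: the config flag decides which path we take.
        RelSizeMigrationStatus::Legacy => config_flag,
    }
}
```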
All functionality patches are either merged or ready for review (the last one: #10980). We can flip the flag for staging soon and do more testing.
Next steps: investigate slowness on arm64 (#10997), consider moving the relation size keys (could be a separate task), and plan for staging tests and how to do full migrations.
Description
We do not currently define a maximum number of relations that we support, but it is known that beyond about 10k relations things get dicey. The exact set of issues is unknown, but the primary architectural issue is that we store RelDirectory as a monolithic blob that gets rewritten whenever we add or remove a relation.
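To make the cost concrete, here is a tiny illustrative sketch (not the pageserver's actual RelDirectory encoding; relnode OIDs stand in for full relation tags): because the whole directory lives under a single key, a one-relation change rewrites the entire value.

```rust
use std::collections::BTreeSet;

// Illustrative only: the directory is one value under a single key, serialized
// here as a comma-separated list of relnode OIDs. Creating or dropping a single
// relation forces a read-modify-rewrite of the whole value.
fn add_relation_monolithic(blob: &mut String, relnode: u32) {
    let mut dir: BTreeSet<u32> = blob
        .split(',')
        .filter_map(|s| s.parse().ok())
        .collect();
    dir.insert(relnode);
    // Re-serialize everything: the new value (and whatever record carries it to
    // storage) is O(total relations) in size for a one-relation change.
    *blob = dir
        .iter()
        .map(|r| r.to_string())
        .collect::<Vec<_>>()
        .join(",");
}

fn main() {
    let mut blob = String::new();
    for relnode in 1..=5u32 {
        add_relation_monolithic(&mut blob, relnode);
    }
    assert_eq!(blob, "1,2,3,4,5");
}
```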
Postgres itself does not define a practical limit on relations per database: the hard limit is approximately one billion, but it is well known that the practical limit is much lower, and dependent on hardware+config:
To pick an arbitrary but realistic goal, let's support+test 1 million tables. This is realistic because:
A tiny initial step in this direction is #9507, which adds a test that creates 8000 tables (not very many!) to reproduce a specific scaling bug in transaction aborts. That test currently has a relatively long runtime (tens of seconds) because our code for tracking timeline metadata is still very inefficient.
The goal is to make it work "fast enough", in the sense that a database is usable and things don't time out, but not necessarily to implement every possible optimisation. For example, logical size calculations will be expensive with 1 million relations (requiring many megabytes of reads from storage), and that is okay as long as the expense does not cause the system to fail from the user's point of view.
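As a rough, illustrative back-of-envelope (assumed numbers, not figures from this issue): if a logical size calculation has to read one rel-size entry per relation, and each entry plus per-key read overhead is on the order of tens of bytes, then 1,000,000 relations works out to tens of megabytes read per calculation, which is the order of magnitude behind "many megabytes" above.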
Implement code for rewriting metadata to the new format on startup. This should run very early during startup so that no other parts of the code need to understand the old format: we can then maintain this for a long time; otherwise we will have two read paths instead (test_historic_storage_formats).
get_rel_exists() during WAL ingestion with many relations (#9855)
Out of scope: