pageserver: support 1 million relations #9516
Labels
c/storage/pageserver (Component: storage: pageserver)
t/feature (Issue type: feature, for new features or requests)
Comments
github-merge-queue bot pushed a commit that referenced this issue on Jan 13, 2025:
## Problem

In preparation for #9516. We need to store rel size and directory data in the sparse keyspace, but it does not support inheritance yet.

## Summary of changes

Add a new type of keyspace, "sparse but inherited", to the system. On the read path, we do not remove the key range when we descend into the ancestor. The search stops when (1) the full key range is covered by image layers (already implemented before this patch), or (2) we reach the end of the ancestor chain.

Signed-off-by: Alex Chi Z <chi@neon.tech>
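The read-path behaviour described in this commit message can be illustrated with a small, self-contained sketch. This is not the pageserver's actual code: `Timeline`, `KeyRange`, and the plain `u64` keys are hypothetical stand-ins for the real types.

```rust
struct KeyRange {
    start: u64,
    end: u64,
}

struct Timeline {
    ancestor: Option<Box<Timeline>>,
    /// Key ranges fully covered by image layers on this timeline.
    image_covered: Vec<KeyRange>,
    /// Sparse values materialized on this timeline.
    values: Vec<(u64, Vec<u8>)>,
}

impl Timeline {
    fn fully_covered(&self, range: &KeyRange) -> bool {
        self.image_covered
            .iter()
            .any(|c| c.start <= range.start && range.end <= c.end)
    }

    /// Collect all values in `range`, descending the ancestor chain.
    fn read_sparse_inherited(&self, range: &KeyRange, out: &mut Vec<(u64, Vec<u8>)>) {
        let mut current = Some(self);
        while let Some(tl) = current {
            // Collect whatever this timeline holds for the (unshrunk) key range.
            out.extend(
                tl.values
                    .iter()
                    .filter(|(k, _)| range.start <= *k && *k < range.end)
                    .cloned(),
            );
            // Stop condition (1): image layers fully cover the range here.
            if tl.fully_covered(range) {
                break;
            }
            // Unlike the dense read path, we do NOT remove the key range before
            // descending; stop condition (2) is simply the end of the chain.
            current = tl.ancestor.as_deref();
        }
    }
}
```

The key point is that the range handed down the chain is never shrunk; only full image-layer coverage or running out of ancestors ends the walk.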
awarus pushed a commit that referenced this issue on Jan 24, 2025.
github-merge-queue bot pushed a commit that referenced this issue on Feb 14, 2025:
## Problem

Part of #9516.

## Summary of changes

This patch adds support for storing reldir in the sparse keyspace. All logic is guarded by the `rel_size_v2_enabled` flag, so if it is set to false, the code path is exactly the same as what is currently in prod.

Note that this patch does not persist the `rel_size_v2_enabled` flag; the logic around persistence will be implemented in the next patch (i.e., if we enable it, restart the pageserver, and the flag then gets set back to false, we should still read from v2 using the `rel_size_v2_migration_status` in the index_part). The persistence logic in the next patch will disallow switching from v2 back to v1 via the config item.

I also refactored the metrics so that they work with the new reldir store. This metric was not computed correctly for reldirs before (see the comments); with the refactor, the value is computed only once we have an initial value for the reldir size. The refactor keeps the existing inaccuracy of the computation when there is more than one database.

For the tests, we currently run all of them with v2. I will set the flag back to false and add some v2-specific tests before merging, probably also v1->v2 migration tests.

Signed-off-by: Alex Chi Z <chi@neon.tech>
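A minimal sketch of the persisted-status idea described above (hypothetical names and a stand-in for the status field kept in index_part; this is not the actual patch): once any data has been written in the v2 format, the persisted status wins over the config flag, so flipping the flag back to false cannot silently return reads to v1.

```rust
enum RelSizeMigrationStatus {
    Legacy,    // nothing written in v2 format yet
    Migrating, // some reldir/rel-size data already written in v2 format
    Migrated,  // fully migrated to v2
}

fn read_reldir_from_v2(config_flag: bool, persisted: RelSizeMigrationStatus) -> bool {
    match persisted {
        // v2 data exists: we must keep reading it regardless of the flag,
        // i.e. switching v2 -> v1 via the config item is disallowed.
        RelSizeMigrationStatus::Migrating | RelSizeMigrationStatus::Migrated => true,
        // No v2 data yet: the config flag decides which path we take.
        RelSizeMigrationStatus::Legacy => config_flag,
    }
}
```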
All functionality patches are either merged or ready for review (the last one: #10980). We can flip the flag for staging soon and do more testing.
Next steps: investigate slowness on arm64 (#10997), consider moving the relation size keys (could be a separate task), and plan for staging tests and how to do full migrations.
Description
We do not currently define a maximum number of relations that we support, but it is known that beyond about 10k relations things get dicey. The exact set of issues is unknown, but the primary architectural issue is that we store RelDirectory as a monolithic blob that gets rewritten whenever we add or remove a relation.
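To make the cost concrete, here is a tiny illustrative sketch (not the pageserver's actual RelDirectory encoding; relnode OIDs stand in for full relation tags): because the whole directory lives under a single key, a one-relation change rewrites the entire value.

```rust
use std::collections::BTreeSet;

// Illustrative only: the directory is one value under a single key, serialized
// here as a comma-separated list of relnode OIDs. Creating or dropping a single
// relation forces a read-modify-rewrite of the whole value.
fn add_relation_monolithic(blob: &mut String, relnode: u32) {
    let mut dir: BTreeSet<u32> = blob
        .split(',')
        .filter_map(|s| s.parse().ok())
        .collect();
    dir.insert(relnode);
    // Re-serialize everything: the new value (and whatever record carries it to
    // storage) is O(total relations) in size for a one-relation change.
    *blob = dir
        .iter()
        .map(|r| r.to_string())
        .collect::<Vec<_>>()
        .join(",");
}

fn main() {
    let mut blob = String::new();
    for relnode in 1..=5u32 {
        add_relation_monolithic(&mut blob, relnode);
    }
    assert_eq!(blob, "1,2,3,4,5");
}
```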
Postgres itself does not define a practical limit on relations per database: the hard limit is approximately one billion, but it is well known that the practical limit is much lower, and dependent on hardware+config:
To pick an arbitrary but realistic goal, let's support+test 1 million tables. This is realistic because:
A tiny initial step in this direction is #9507, which adds a test that creates 8000 tables (not very many!) to reproduce a specific scaling bug in transaction aborts. That test currently has a relatively long runtime (tens of seconds) because our code for tracking timeline metadata is still very inefficient.
The goal is to make it work "fast enough", in the sense that a database is usable and things don't time out, but not necessarily to implement every possible optimisation. For example, logical size calculations will be expensive with 1 million relations (requiring many megabytes of reads from storage), and that is okay as long as the expense does not cause the system to fail from the user's point of view.
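As a rough, illustrative back-of-envelope (assumed numbers, not figures from this issue): if a logical size calculation has to read one rel-size entry per relation, and each entry plus per-key read overhead is on the order of tens of bytes, then 1,000,000 relations works out to tens of megabytes read per calculation, which is the order of magnitude behind "many megabytes" above.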
Implement code for rewriting metadata to the new format on startup. This should run very early during startup so that no other parts of the code need to understand the old format: we can then maintain this for a long time; otherwise we will have two read paths instead (test_historic_storage_formats).
get_rel_exists() during WAL ingestion with many relations (#9855)
Out of scope: