pageserver: support 1 million relations #9516

Open
8 of 13 tasks
jcsp opened this issue Oct 25, 2024 · 2 comments

Comments

@jcsp
Contributor

jcsp commented Oct 25, 2024

We do not currently define a maximum number of relations that we support, but it is known that things get dicey beyond about 10k relations. The exact number of issues is unknown, but the primary architectural issue is that we store RelDirectory as a monolithic blob that gets rewritten whenever we add or remove a relation.
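To make the cost of the monolithic blob concrete, here is a minimal sketch comparing bytes written when relations are added one at a time. Everything in it is an assumption for illustration: `RelTag`, `ENTRY_BYTES`, `MonolithicDir`, and `PerKeyDir` are hypothetical names and do not reflect the pageserver's actual types or serialization format.

```rust
use std::collections::BTreeSet;

/// Hypothetical relation identifier: (relation number, fork number).
type RelTag = (u32, u8);

/// Assumed serialized size of one directory entry, for illustration only.
const ENTRY_BYTES: u64 = 8;

/// Directory stored as one blob: every change rewrites all entries.
#[derive(Default)]
struct MonolithicDir {
    rels: BTreeSet<RelTag>,
    bytes_written: u64,
}

impl MonolithicDir {
    fn add(&mut self, rel: RelTag) {
        self.rels.insert(rel);
        // The whole blob is serialized again, including the new entry.
        self.bytes_written += self.rels.len() as u64 * ENTRY_BYTES;
    }
}

/// Directory stored as one key per relation: a change writes one entry.
#[derive(Default)]
struct PerKeyDir {
    rels: BTreeSet<RelTag>,
    bytes_written: u64,
}

impl PerKeyDir {
    fn add(&mut self, rel: RelTag) {
        // Only a newly inserted relation produces a write.
        if self.rels.insert(rel) {
            self.bytes_written += ENTRY_BYTES;
        }
    }
}
```

Under these assumptions, creating N relations writes on the order of N² entries with the monolithic layout and N entries with the per-key layout, which is why the blob becomes the bottleneck as relation counts grow.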

Postgres itself does not define a practical limit on relations per database: the hard limit is approximately one billion, but it is well known that the practical limit is much lower and depends on hardware and configuration.

To pick an arbitrary but realistic goal, let's support+test 1 million tables. This is realistic because:

  • Something like an array of relation sizes is only single digit megabytes with a million tables (whereas with a billion tables, such structures would likely need to be disk-based rather than simple in-memory structures)
  • If we can create a few thousand tables per second, then a test that creates a million tables can run in minutes, not hours (i.e. within the envelope of what our CI supports)
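The two estimates above can be checked with some back-of-envelope arithmetic. This is only a sketch: the per-entry size and the creation rate are assumed inputs, not measured values, and the function names are made up for illustration.

```rust
/// Approximate memory for a flat in-memory array of per-relation sizes.
/// `bytes_per_entry` is an assumption (e.g. 4 for a u32 block count).
fn rel_size_array_bytes(relations: u64, bytes_per_entry: u64) -> u64 {
    relations * bytes_per_entry
}

/// Seconds needed to create `relations` tables at a steady rate,
/// rounded up to the next whole second.
fn creation_time_secs(relations: u64, tables_per_sec: u64) -> u64 {
    (relations + tables_per_sec - 1) / tables_per_sec
}
```

With one million relations and 4 to 8 bytes per entry, the array is 4 to 8 MB; with one billion relations it would be 4 to 8 GB. At an assumed 3,000 tables per second, one million tables take about 334 seconds, i.e. under six minutes.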

A tiny initial step in this direction is #9507, which adds a test that creates 8000 tables (not very many!) to reproduce a specific scaling bug in transaction aborts. That test currently has a relatively long runtime (tens of seconds) because our code for tracking timeline metadata is still very inefficient.

The goal is to make it work "fast enough", in the sense that a database is usable and things don't time out, but not necessarily to implement every possible optimisation. For example, logical size calculations will be expensive with 1 million relations (requiring many megabytes of reads from storage), and that is okay as long as the expense does not cause the system to fail from the user's point of view.

Out of scope:

  • High database counts (Neon cloud already limits databases per project to 500 by default)
  • Revising the pg_stat code (Persist pg_stat information in pageserver, #6560) to handle large relation counts (current code skips writing pg_stat if the snapshot exceeds a size threshold)
  • Any postgres CLI/tooling issues around high relation counts
@jcsp jcsp added c/storage/pageserver Component: storage: pageserver t/feature Issue type: feature, for new features or requests labels Oct 25, 2024
@skyzh skyzh self-assigned this Jan 6, 2025
github-merge-queue bot pushed a commit that referenced this issue Jan 13, 2025
## Problem

In preparation for #9516. We
need to store rel size and directory data in the sparse keyspace, but it
does not support inheritance yet.

## Summary of changes

Add a new type of keyspace "sparse but inherited" into the system.

On the read path: we don't remove the key range when we descend into the
ancestor. The search will stop when (1) the full key range is covered by
image layers (which has already been implemented before), or (2) we
reach the end of the ancestor chain.
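
The read path described above can be sketched roughly as follows. This is a simplified model, not the pageserver's implementation: `TimelineLayers`, `image_covers_range`, and `scan_sparse_inherited` are hypothetical names, image-layer coverage is reduced to a single boolean per timeline, and deletions are not modeled.

```rust
use std::collections::BTreeMap;
use std::ops::Range;

type Key = u32;

/// Simplified view of the data one timeline holds for a sparse keyspace.
struct TimelineLayers {
    /// Key/value pairs found in this timeline's own layers.
    entries: BTreeMap<Key, &'static str>,
    /// Assumption: true if image layers here cover the full queried range.
    image_covers_range: bool,
}

/// Scan `range` across a chain ordered from the child to its oldest ancestor.
fn scan_sparse_inherited(
    chain: &[TimelineLayers],
    range: Range<Key>,
) -> BTreeMap<Key, &'static str> {
    let mut found = BTreeMap::new();
    for timeline in chain {
        for (key, value) in timeline.entries.range(range.clone()) {
            // A value found closer to the child shadows the ancestor's value.
            found.entry(*key).or_insert(*value);
        }
        // Stop condition (1): image layers cover the whole range.
        if timeline.image_covers_range {
            break;
        }
        // Otherwise keep the same key range and descend into the ancestor.
    }
    // Stop condition (2): the loop ran off the end of the ancestor chain.
    found
}
```

In this model a key missing from the child is not an error; the same range is simply searched again in each ancestor until one of the two stop conditions is met.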

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
github-merge-queue bot pushed a commit that referenced this issue Jan 20, 2025
## Problem

Part of #9516 per RFC at #10412

## Summary of changes

Adding the necessary config items and index_part items for the large
relation count work.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
awarus pushed a commit that referenced this issue Jan 24, 2025
github-merge-queue bot pushed a commit that referenced this issue Feb 14, 2025
## Problem

Part of #9516

## Summary of changes

This patch adds the support for storing reldir in the sparse keyspace.
All logic is guarded with the `rel_size_v2_enabled` flag, so if it's
set to false, the code path is exactly the same as what's currently in
prod.

Note that this patch does not persist the `rel_size_v2_enabled` flag; the
logic around it will be implemented in the next patch. (For example, what if we
enable it, restart the pageserver, and then it gets set to false? We
should still read from v2, using the rel_size_v2_migration_status in the
index_part.) The persistence logic I'll implement in the next patch will
disallow switching from v2->v1 via the config item.
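
A minimal sketch of the intended rule, assuming a two-state persisted status. The `MigrationStatus` enum and both function names are hypothetical and stand in for whatever the next patch actually stores in `rel_size_v2_migration_status`.

```rust
/// Hypothetical persisted state, standing in for the value kept in index_part.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MigrationStatus {
    Legacy,
    Migrated,
}

/// Decide whether to use the v2 reldir store.
/// Once the persisted status says v2, the config flag can no longer turn it off.
fn use_rel_size_v2(config_enabled: bool, persisted: MigrationStatus) -> bool {
    persisted == MigrationStatus::Migrated || config_enabled
}

/// Status to persist after applying the config: v1 -> v2 is allowed,
/// v2 -> v1 is not.
fn next_status(config_enabled: bool, persisted: MigrationStatus) -> MigrationStatus {
    if use_rel_size_v2(config_enabled, persisted) {
        MigrationStatus::Migrated
    } else {
        MigrationStatus::Legacy
    }
}
```

Under this rule, enabling the flag, restarting, and then setting the flag back to false still reads from v2, because the persisted status takes precedence over the config item.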

I also refactored the metrics so that they work with the new reldir
store. However, this metric was not computed correctly for reldirs
before (see the comments). With the refactor, the value is computed only
when we have an initial value for the reldir size. The refactor keeps
the incorrectness of the computation when there is more than one
database.

For the tests, we currently run all the tests with v2, and I'll set it
to false and add some v2-specific tests before merging, probably also
v1->v2 migration tests.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
@skyzh
Member

skyzh commented Feb 25, 2025

All functionality patches are either merged or ready for review (the last one: #10980). We can flip the flag for staging soon and do more testing.

@skyzh
Member

skyzh commented Feb 26, 2025

Next step: investigate slowness on arm64 (#10997), consider moving relation sizes keys (could be a separate task), plan for staging tests and how to do full migrations.
