Compute release 2025-02-28 #11039

vipvap · 2025-02-28T11:18:48Z

Compute release 2025-02-28

Please merge this Pull Request using 'Create a merge commit' button

## Problem close neondatabase/cloud#24485 ## Summary of changes This patch adds a new chaos injection mode for the storcon. The chaos injector reads the crontab and exits immediately at the configured time. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>

For archived timelines, we would like to prevent all non-pageserver issued getpage requests, as users are not supposed to send these. Instead, they should unarchive a timeline before issuing any external read traffic. As this is non-trivial to do, at least prevent launches of new computes, by errorring on basebackup requests for archived timelines. In #10688, we started issuing a warning instead of an error, because an error would mean a stuck project. Now after we can confirm the the warning is not present in the logs for about a week, we can issue errors. Follow-up of #10688 Related: #9548

WaitEventSetWait() returns the number of "events" that happened, and only that many events in the WaitEvent array are updated. When the timeout is reached, it returns 0 and does not modify the WaitEvent array at all. We were reading 'event.events' without checking the return value, which would be uninitialized when the timeout was hit. No test included, as this is harmless at the moment. But this caused the test I'm including in PR #10882 to fail. That PR changes the logic to loop back to retry the PQgetCopyData() call if WL_SOCKET_READABLE was set. Currently, making an extra call to PQconsumeInput() is harmless, but with that change in logic, it turns into a busy-wait.

## Problem There's new rate-limits coming on docker hub. To reduce our reliance on docker hub and the problems the limits are going to cause for us, we want to prepare for this by also pushing our container images to ghcr.io ## Summary of changes Push our images to ghcr.io as well and not just docker hub.

…oncake (#10915) ## Problem Introducing pg_duckdb caused a conflict with pg_mooncake. Both use libduckdb.so in different versions. ## Summary of changes - Rename the libduckdb.so to libduckdb_pg_duckdb.so in the context of pg_duckdb so that it doesn't conflict with libduckdb.so referenced by pg_mooncake. - use a version map to rename the duckdb symbols to a version specific name - DUCKDB_1.1.3 for pg_mooncake - DUCKDB_1.2.0 for pg_duckdb For the concept of version maps see - https://www.man7.org/conf/lca2006/shared_libraries/slide19a.html - https://peeterjoot.com/2019/09/20/an-example-of-linux-glibc-symbol-versioning/ - https://akkadia.org/drepper/dsohowto.pdf

#10882) We've seen some cases in production where a compute doesn't get a response to a pageserver request for several minutes, or even more. We haven't found the root cause for that yet, but whatever the reason is, it seems overly optimistic to think that if the pageserver hasn't responded for 2 minutes, we'd get a response if we just wait patiently a little longer. More likely, the pageserver is dead or there's some kind of a network glitch so that the TCP connection is dead, or at least stuck for a long time. Either way, it's better to disconnect and reconnect. I set the default timeout to 2 minutes, which should be enough for any GetPage request under normal circumstances, even if the pageserver has to download several layer files from remote storage. Make the disconnect timeout configurable. Also make the "log interval", after which we print a message to the log configurable, so that if you change the disconnect timeout, you can set the log timeout correspondingly. The default log interval is still 10 s. The new GUCs are called "neon.pageserver_response_log_timeout" and "neon.pageserver_response_disconnect_timeout". Includes a basic test for the log and disconnect timeouts. Implements issue #10857

## Problem We recently added slow GetPage request logging. However, this unintentionally included the flush time when logging (which we already have separate logging for). It also logs at WARN level, which is a bit aggressive since we see this fire quite frequently. Follows #10906. ## Summary of changes * Only log the request execution time, not the flush time. * Extract a `pagestream_dispatch_batched_message()` helper. * Rename `warn_slow()` to `log_slow()` and downgrade to INFO.

## Problem While looking at logs I noticed that heartbeats are sent sequentially. The loop polling the UnorderedSet is at the wrong level of identation. Instead of doing it after we have the full set, we did after each entry. ## Summary of Changes Poll the UnorderedSet properly.

…10951) ## Problem The interpreted WAL reader tracks the start of the current logical batch. This needs to be invalidated when the reader is reset. This bug caused a couple of WAL gap alerts in staging. ## Summary of changes * Refactor to make it possible to write a reproducer * Add repro unit test * Fix by resetting the start with the reader Related neondatabase/cloud#23935

## Problem As part of #8614 we need to pass membership configurations between compute and safekeepers. ## Summary of changes Add version 3 of the protocol carrying membership configurations. Greeting message in both sides gets full conf, and other messages generation number only. Use protocol bump to include other accumulated changes: - stop packing whole structs on the wire as is; - make the tag u8 instead of u64; - send all ints in network order; - drop proposer_uuid, we can pass it in START_WAL_PUSH and it wasn't much useful anyway. Per message changes, apart from mconf: - ProposerGreeting: tenant / timeline id is sent now as hex cstring. Remove proto version, it is passed outside in START_WAL_PUSH. Remove postgres timeline, it is unused. Reorder fields a bit. - AcceptorGreeting: reorder fields - VoteResponse: timeline_start_lsn is removed. It can be taken from first member of term history, and later we won't need it at all when all timelines will be explicitly created. Vote itself is u8 instead of u64. - ProposerElected: timeline_start_lsn is removed for the same reasons. - AppendRequest: epoch_start_lsn removed, it is known from term history in ProposerElected. Both compute and sk are able to talk v2 and v3 to make rollbacks (in case we need them) easier; neon.safekeeper_proto_version GUC sets the client version. v2 code can be dropped later. So far empty conf is passed everywhere, future PRs will handle them. To test, add param to some tests choosing proto version; we want to test both 2 and 3 until we fully migrate. ref #10326 --------- Co-authored-by: Arthur Petukhovsky <petuhovskiy@yandex.ru>

## Problem PG14 uses separate backend for stats collector having no access to shaerd memory. As far as AUX mechanism requires access to shared memory, persisting pgstat.stat file is not supported at pg14. And so there is no definition of `neon_pgstat_file_size_limit` variable. It makes it impossible to provide same config for all Postgres version. ## Summary of changes Move neon_pgstat_file_size_limit to Neon extension. Postgres submodules PR: neondatabase/postgres#587 neondatabase/postgres#588 neondatabase/postgres#589 --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Tristan Partin <tristan@neon.tech>

cargo knows what edition each crate uses.

## Problem We use `docker cp` to copy the files required for the extension tests now. It causes problems if we run older images with the newer source tree. ## Summary of changes Copying the files was moved to the compute Dockerfile.

#10968) This pops up a few times during deployment. Not sure why it fires without `self.cancel` being cancelled, but could be e.g. ancestor timelines or sth.

## Problem We have `test_perf_many_relations` but it only runs on remote clusters, and we cannot directly modify tenant config. Therefore, I patched one of the current tests to benchmark relv2 performance. close #9986 ## Summary of changes * Add `v1/v2` selector to `test_tx_abort_with_many_relations`. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>

## Problem part of #9114 ## Summary of changes Add the auto trigger for gc-compaction. It computes two values: L1 size and L2 size. When L1 size >= initial trigger threshold, we will trigger an initial gc-compaction. When l1_size / l2_size >= gc_compaction_ratio_percent, we will trigger the "tiered" gc-compaction. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>

## Problem The patch for `semver` extensions relies on `PG_VERSION` environment variable. The files were named without the letter `v` so script cannot find them. ## Summary of changes The patch files were renamed.

60min is not enough for debug builds Signed-off-by: Alex Chi Z <chi@neon.tech>

…0931) ## Problem `neon endpoint list` shows a different LSN than what the state of the replica is. This is mainly down to what we define as LSN in this output. If we define it as the LSN that a compute was started with, it only makes sense to show it for static computes. ## Summary of changes Removed the output of `last_record_lsn` for primary/hot standby computes. Closes: #5825 --------- Co-authored-by: Tristan Partin <tristan@neon.tech>

## Problem PR #10442 added prefetch_lookup function. It changed handling of getpage requests in compute. Before: 1. Lookup in LFC (return if found) 2. Register prefetch buffer 3. Wait prefetch result (increment getpage_hist) Now: 1. Lookup prefetch ring (return if prefetch request is already completed) 2. Lookup in LFC (return if found) 3. Register prefetch buffer 4. Wait prefetch result (increment getpage_hist) So if prefetch result is already available, then get page histogram is not incremented. It case failure of some test_throughtput benchmarks: https://neondb.slack.com/archives/C033RQ5SPDH/p1740425527249499 ## Summary of changes Increment getpage histogram in `prefetch_lookup` Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>

Updates storage components to edition 2024. We like to stay on the latest edition if possible. There is no functional changes, however some code changes had to be done to accommodate the edition's breaking changes. The PR has two commits: * the first commit updates storage crates to edition 2024 and appeases `cargo clippy` by changing code. i have accidentially ran the formatter on some files that had other edits. * the second commit performs a `cargo fmt` I would recommend a closer review of the first commit and a less close review of the second one (as it just runs `cargo fmt`). part of #10918

Bump version to pick up changes introduced in neondatabase/autoscaling#1286 It's better to have a compute release for this change first, because: - vm-runner changes kernel loglevel from 7 to 6 - vm-builder has a change to bring it back to 7 after startup Previous update: #10015

…10979) In local dev environment, these steps take around 100 ms, and they are in the critical path of a compute startup on a compute pool hit. I don't know if it's like that in production, but as first step, add tracing spans to the functions so that they can be measured more easily.

…10985)

neondatabase/cloud#24464

We don't want to serialize to/from string all the time, so use `SchedulingPolicy` in `SafekeeperPersistence` via the use of a wrapper. Stacked atop #10891

... and start the initial reader with the correct lsn Closes #10978

Ref: neondatabase/cloud#24939 ## Problem I found that we are missing authorization for some container jobs, that will make them use anonymous pulls. It's not an issue for now, with high enough limits, but that could be an issue when new limits introduced in DockerHub (10 pulls / hour) ## Summary of changes - add credentials for the jobs that run in containers

Updates `compute_tools` and `compute_api` crates to edition 2024. We like to stay on the latest edition if possible. There is no functional changes, however some code changes had to be done to accommodate the edition's breaking changes. The PR has three commits: * the first commit updates the named crates to edition 2024 and appeases `cargo clippy` by changing code. * the second commit performs a `cargo fmt` that does some minor changes (not many) * the third commit performs a cargo fmt with nightly options to reorder imports as a one-time thing. it's completely optional, but I offer it here for the compute team to review it. I'd like to hear opinions about the third commit, if it's wanted and felt worth the diff or not. I think most attention should be put onto the first commit. Part of #10918

Migrates the remaining crates to edition 2024. We like to stay on the latest edition if possible. There is no functional changes, however some code changes had to be done to accommodate the edition's breaking changes. Like the previous migration PRs, this is comprised of three commits: * the first does the edition update and makes `cargo check`/`cargo clippy` pass. we had to update bindgen to make its output [satisfy the requirements of edition 2024](https://doc.rust-lang.org/edition-guide/rust-2024/unsafe-extern.html) * the second commit does a `cargo fmt` for the new style edition. * the third commit reorders imports as a one-off change. As before, it is entirely optional. Part of #10918

## Problem So far cumulative statistics have not been persisted when Neon scales to zero (suspends endpoint). With PR #6560 the cumulative statistics should now survive endpoint restarts and correctly trigger the auto- vacuum and auto analyze maintenance So far we did not have a testcase that validates that improvement in our dev cloud environment with a real project. ## Summary of changes Introduce testcase `test_cumulative_statistics_persistence`in the benchmarking workflow running daily to verify: - Verifies that the cumulative statistics are correctly persisted across restarts. - Cumulative statistics are important to persist across restarts because they are used - when auto-vacuum an auto-analyze trigger conditions are met. - The test performs the following steps: - Seed a new project using pgbench - insert tuples that by itself are not enough to trigger auto-vacuum - suspend the endpoint - resume the endpoint - insert additional tuples that by itself are not enough to trigger auto-vacuum but in combination with the previous tuples are - verify that autovacuum is triggered by the combination of tuples inserted before and after endpoint suspension ## Test run /~https://github.com/neondatabase/neon/actions/runs/13546879714/job/37860609089#step:6:282

…t to retries (#10991) Before this PR, re-attach and validate would log the same warning ``` calling control plane generation validation API failed ``` on retry errors. This can be confusing. This PR makes the message generically valid for any upcall and adds additional tracing spans to capture context. Along the way, clean up some copy-pasta variable naming. refs - #10381 (comment) --------- Co-authored-by: Alexander Lakhin <alexander.lakhin@neon.tech>

No longer true now that we eagerly notify the compaction loop.

## Problem We created extensions in a single database. The tests could interfere, i.e., discover some service tables left by other extensions and produce unexpected results. ## Summary of changes The tests are now run in a separate timeline, so only one extension owns the database, which prevents interference.

This reduces pressure on OS TCP buffers, reducing flush times in other systems like PageServer. ## Problem ## Summary of changes

## Problem We see `Inmem storage overflow` in page server logs: https://neondb.slack.com/archives/C033RQ5SPDH/p1740157873114339 walked process is using inseam SMGR with storage size limited by 64 pages with warning watermark 32 (based ion the assumption that XLR_MAX_BLOCK_ID is 32, so WAL record can not access more than 32 pages). Actually it is not true. We can update up to 3 forks for each block (including update of FSM and VM forks). ## Summary of changes This PR increases inseam SMGR size for walled process to 100 pages and print stack trace in case of overflow. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>

## Problem #10841 made building compute and neon images optional on releases that don't need them. The `push-<component>-image-prod` jobs had transitive dependencies that were skipped due to that, causing the images not to be pushed to production registries. ## Summary of changes Add `!failure() && !cancelled() &&` to the beginning of the conditions for these jobs to ensure they run even if some of their transitive dependencies are skipped.

## Problem layer_map access was unwrapped. It might return an error during shutdown. ## Summary of changes Propagate the layer_map access error back to the compaction loop. Signed-off-by: Alex Chi Z <chi@neon.tech>

…11006) Seems nice to have the function and all its subroutines in the same source file.

## Problem Storage controller will proxy GETs to pageserver-like tenant/timeline paths through to the pageserver. Usually GET passthroughs make sense to go to shard 0, e.g. if you want to list timelines. But sometimes you really want to know about a particular shard, e.g. reading its cache state or similar. ## Summary of changes - Accept shard IDs as well as tenant IDs in the passthrough route - Refactor node lookup to take a shard ID and make the tenant ID case a layer on top of that. This is one more lock take-drop during these requests, but it's not particularly expensive and these requests shouldn't be terribly frequent This is not immediately used by anything, but will be there any time we want to e.g. do a pass-through query to check the warmth of a tenant cache on a particular shard or somesuch.

) ## Problem We intend for cplane to use the heatmap layer download API to warm up timelines after unarchival. It's tricky for them to recurse in the ancestors, and the current implementation doesn't work well when unarchiving a chain of branches and warming them up. ## Summary of changes * Add a `recurse` flag to the API. When the flag is set, the operation recurses into the parent timeline after the current one is done. * Be resilient to warming up a chain of unarchived branches. Let's say we unarchived `B` and `C` from the `A -> B -> C` branch hierarchy. `B` got unarchived first. We generated the unarchival heatmaps and stash them in `A` and `B`. When `C` unarchived, it dropped it's unarchival heatmap since `A` and `B` already had one. If `C` needed layers from `A` and `B`, it was out of luck. Now, when choosing whether to keep an unarchival heatmap we look at its end LSN. If it's more inclusive than what we currently have, keep it.

## Problem CI does not pass for the compute release due to the absence of some images ## Summary of changes Now we use the images from the old non-compute releases for non-compute images

github-actions · 2025-02-28T12:27:18Z

7744 tests run: 7363 passed, 0 failed, 381 skipped (full report)

Code coverage* (full report)

functions: 32.8% (8639 of 26362 functions)
lines: 48.6% (73198 of 150552 lines)

* collected from Rust tests only

_{The comment gets automatically updated with the latest test results
0d3f7a2 at 2025-02-28T12:27:17.376Z :recycle:}

skyzh and others added 30 commits February 24, 2025 15:30

pre-commit: Switch to cargo fmt to handle per-crate editions (#10969)

f4fefd9

cargo knows what edition each crate uses.

Use the Dockerfile COPY instead of docker cp (#10943)

f78ac44

## Problem We use `docker cp` to copy the files required for the extension tests now. It causes problems if we run older images with the newer source tree. ## Summary of changes Copying the files was moved to the compute Dockerfile.

pageserver: ignore CollectKeySpaceError::Cancelled during compaction (

8deeddd

#10968) This pops up a few times during deployment. Not sure why it fires without `self.cancel` being cancelled, but could be e.g. ancestor timelines or sth.

Rename the patch files for the semver test (#10966)

1fb2faa

## Problem The patch for `semver` extensions relies on `PG_VERSION` environment variable. The files were named without the letter `v` so script cannot find them. ## Summary of changes The patch files were renamed.

fix(ci): extend timeout to 75min (#10963)

c69ebb4

60min is not enough for debug builds Signed-off-by: Alex Chi Z <chi@neon.tech>

Remove some redundant log lines at postgres startup (#10958)

e452f2a

Silence "sudo: unable to resolve host" messages at compute startup (#…

40ad42d

…10985)

proxy: Record and export user-agent header (#10955)

0d36f52

neondatabase/cloud#24464

storcon: use the SchedulingPolicy enum in SafekeeperPersistence (#10897)

26bda17

We don't want to serialize to/from string all the time, so use `SchedulingPolicy` in `SafekeeperPersistence` via the use of a wrapper. Stacked atop #10891

tests: use generated record lsn instead of hardcoded one (#10990)

622a9de

... and start the initial reader with the correct lsn Closes #10978

arpad-m and others added 14 commits February 27, 2025 09:40

pageserver: remove stale comment (#11016)

93b59e6

No longer true now that we eagerly notify the compaction loop.

PS/Prefetch: Use a timeout for reading data from TCP (#10834)

a283eda

This reduces pressure on OS TCP buffers, reducing flush times in other systems like PageServer. ## Problem ## Summary of changes

fix(pageserver): correctly access layer map in gc-compaction (#11021)

ab1f22b

## Problem layer_map access was unwrapped. It might return an error during shutdown. ## Summary of changes Propagate the layer_map access error back to the compaction loop. Signed-off-by: Alex Chi Z <chi@neon.tech>

compute_ctl: Refactor, moving spec_apply functions to spec_apply.rs (#…

a4b2009

…11006) Seems nice to have the function and all its subroutines in the same source file.

Make test extensions upgrade work with absent images (#11036)

7607686

## Problem CI does not pass for the compute release due to the absence of some images ## Summary of changes Now we use the images from the old non-compute releases for non-compute images

Compute release 2025-02-28

0d3f7a2

vipvap requested review from a team as code owners February 28, 2025 11:18

vipvap requested review from knizhnik, hlinnaka, jcsp, clipperhouse, nikitakalyanov and conradludgate and removed request for a team February 28, 2025 11:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Compute release 2025-02-28 #11039

Compute release 2025-02-28 #11039

vipvap commented Feb 28, 2025

github-actions bot commented Feb 28, 2025

Compute release 2025-02-28 #11039

Are you sure you want to change the base?

Compute release 2025-02-28 #11039

Conversation

vipvap commented Feb 28, 2025