Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

safekeeper: critical!() can hit on S3 errors #11027

Closed
jcsp opened this issue Feb 27, 2025 · 0 comments · Fixed by #11034
Closed

safekeeper: critical!() can hit on S3 errors #11027

jcsp opened this issue Feb 27, 2025 · 0 comments · Fixed by #11034
Assignees
Labels
c/storage/safekeeper Component: storage: safekeeper t/bug Issue Type: Bug

Comments

@jcsp
Copy link
Contributor

jcsp commented Feb 27, 2025

In #10657 one of the critical cases was safekeeper reads:

 git blame safekeeper/src/send_interpreted_wal.rs | grep critical
f4cfa725b8 (Erik Grinaker 2025-02-06 11:30:27 +0100   19) use utils::critical;
f4cfa725b8 (Erik Grinaker 2025-02-06 11:30:27 +0100  290)                     .inspect_err(|err| critical!("failed to read WAL record: {err:?}"))
f4cfa725b8 (Erik Grinaker 2025-02-06 11:30:27 +0100  351)             critical!("failed to read WAL record: {err:?}");

It turns out that can hit on S3 500s with a backtrace like this:

Caused by:
    Failed to download a remote file: download s3 object
    
    Caused by:
        0: service error
        1: unhandled error (InternalError)
        2: Error { code: "InternalError", message: "We encountered an internal error. Please try again.", s3_extended_request_id: "123/456=", aws_request_id: "123123" }
    
    Stack backtrace:
       0: anyhow::error::<impl anyhow::Error>::new
                 at /home/nonroot/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/anyhow-1.0.94/src/backtrace.rs:27:14
       1: remote_storage::s3_bucket::S3Bucket::download_object::{{closure}}
                 at /home/nonroot/libs/remote_storage/src/s3_bucket.rs:305:21
       2: <remote_storage::s3_bucket::S3Bucket as remote_storage::RemoteStorage>::download::{{closure}}
                 at /home/nonroot/libs/remote_storage/src/s3_bucket.rs:807:10
       3: remote_storage::GenericRemoteStorage<alloc::sync::Arc<Other>>::download::{{closure}}
                 at /home/nonroot/libs/remote_storage/src/lib.rs:518:62
       4: safekeeper::wal_backup::read_object::{{closure}}
                 at /home/nonroot/safekeeper/src/wal_backup.rs:536:10
       5: safekeeper::wal_storage::WalReader::open_segment::{{closure}}
                 at /home/nonroot/safekeeper/src/wal_storage.rs:817:71
       6: safekeeper::wal_storage::WalReader::read::{{closure}}
@jcsp jcsp added t/bug Issue Type: Bug c/storage/safekeeper Component: storage: safekeeper labels Feb 27, 2025
github-merge-queue bot pushed a commit that referenced this issue Feb 28, 2025
## Problem

This `critical!` could fire on IO errors, which is just noisy.

Resolves #11027.

## Summary of changes

Downgrade to error, except for decode errors. These could be either data
corruption or a bug, but seem worth investigating either way.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c/storage/safekeeper Component: storage: safekeeper t/bug Issue Type: Bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants