Build with frame pointers for improved profiling #10224

Closed
erikgrinaker opened this issue Dec 22, 2024 · 4 comments · Fixed by #10226
Labels
a/observability Area: related to observability c/storage Component: storage

Comments

erikgrinaker commented Dec 22, 2024

Release binaries are currently built without frame pointers. This frees up a register for the compiler and avoids a couple of instructions per function call, which can improve performance (typically <1%). However, stack unwinding and profiling then have to use DWARF information to generate backtraces, which is far more expensive and can cause difficulties, e.g. for perf and eBPF profilers.
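
To illustrate why frame-pointer unwinding is cheap, here is a minimal sketch that walks a simulated stack. It is not code from this repository or from any profiler: the `walk_frame_pointers` function and the array-based memory model are hypothetical, and only assume the usual layout where each frame stores the caller's frame pointer followed by the return address.

```rust
/// Walk a simulated call stack by following the chain of saved frame pointers.
///
/// `stack` models memory as 64-bit words (hypothetical model for illustration).
/// For a frame whose frame pointer is `fp`, `stack[fp]` holds the caller's
/// frame pointer and `stack[fp + 1]` holds the return address.
/// A saved frame pointer of 0 marks the outermost frame.
fn walk_frame_pointers(stack: &[u64], mut fp: usize, max_depth: usize) -> Vec<u64> {
    let mut return_addrs = Vec::new();
    while fp != 0 && fp + 1 < stack.len() && return_addrs.len() < max_depth {
        // One load for the return address, one for the caller's frame pointer.
        return_addrs.push(stack[fp + 1]);
        let next = stack[fp] as usize;
        // The stack grows downward, so a caller's frame must sit at a higher
        // index. Anything else indicates a corrupt chain, so stop.
        if next != 0 && next <= fp {
            break;
        }
        fp = next;
    }
    return_addrs
}

fn main() {
    // Index 0 is unused so that 0 can act as the end-of-chain marker.
    let mut stack = vec![0u64; 16];
    // Outermost frame at index 10: no caller.
    stack[10] = 0;
    stack[11] = 0x1000;
    // Middle frame at index 6, called from the frame at index 10.
    stack[6] = 10;
    stack[7] = 0x2000;
    // Innermost frame at index 2, called from the frame at index 6.
    stack[2] = 6;
    stack[3] = 0x3000;

    for addr in walk_frame_pointers(&stack, 2, 64) {
        println!("{addr:#x}");
    }
}
```

Each step costs two memory reads and no table lookups, whereas a DWARF-based unwinder has to locate and interpret unwind information for every frame.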

We're considering continuous profiling, and jemalloc heap profiling already probabilistically takes stack traces during allocations. These stack traces will be much cheaper with frame pointers enabled. This might save more CPU than we lose with the dedicated frame pointer register, and allow us to profile at higher frequency.

The Rust stdlib recently enabled frame pointers by default for this reason. It's also possible that frame pointers are already enabled by default on aarch64 CPUs (used for Pageservers), since this architecture uses a dedicated frame pointer register.

Related reading:

@erikgrinaker erikgrinaker added a/performance Area: relates to performance of the system c/storage Component: storage labels Dec 22, 2024
@erikgrinaker erikgrinaker self-assigned this Dec 22, 2024
@erikgrinaker erikgrinaker added a/observability Area: related to observability and removed a/performance Area: relates to performance of the system labels Dec 22, 2024
This was referenced Dec 22, 2024
erikgrinaker commented

> It's also possible that frame pointers are already enabled by default on aarch64 CPUs (used for Pageservers)

This isn't the case. I looked at the assembly function prologue, which doesn't maintain frame pointers. This is confirmed by the rustc target specs, which default to `FramePointer::MayOmit` on Linux and don't override this for the `aarch64_unknown_linux_gnu` target:

https://github.com/rust-lang/rust/blob/fd19773d2f8a070dc03f0072f9bc41a65fd04fed/compiler/rustc_target/src/spec/base/linux.rs

https://github.com/rust-lang/rust/blob/fd19773d2f8a070dc03f0072f9bc41a65fd04fed/compiler/rustc_target/src/spec/targets/aarch64_unknown_linux_gnu.rs

Apple aarch64 does enable frame pointers, since this is required by Apple debug tooling:

https://github.com/rust-lang/rust/blob/fd19773d2f8a070dc03f0072f9bc41a65fd04fed/compiler/rustc_target/src/spec/base/apple/mod.rs#L122

erikgrinaker commented

tikv-jemallocator builds jemalloc with frame pointers:

[tikv-jemalloc-sys 0.6.0+5.3.0-1-ge13ca993e8ccb9ba9847cc330696e02839f328f7] CFLAGS="-O0 -ffunction-sections -fdata-sections -fPIC -gdwarf-4 -fno-omit-frame-pointer -m64 -Wall"

erikgrinaker commented

The frame-pointer feature in pprof-rs is 10x slower than libunwind without frame pointers. See #10226 (comment).

erikgrinaker commented

Reopening in an attempt to fix pprof-rs segfaults.

@erikgrinaker erikgrinaker reopened this Jan 5, 2025
github-merge-queue bot pushed a commit that referenced this issue Jan 6, 2025
## Problem

Frame pointers are typically disabled by default (depending on CPU
architecture), to improve performance. This frees up a CPU register, and
avoids a couple of instructions per function call. However, it makes
stack unwinding much less efficient, since it has to use DWARF debug
information instead, and gives worse results with e.g. `perf` and eBPF
profilers. The `backtrace` implementation of `libunwind` is also
suspected to cause segfaults.

The performance benefit of frame pointer omission doesn't appear to
matter that much on modern 64-bit CPU architectures (which have plenty
of registers and optimized instruction execution), and benchmarks did
not show measurable overhead.

The Rust standard library and jemalloc already enable frame pointers by
default.

For more information, see
https://www.brendangregg.com/blog/2024-03-17/the-return-of-the-frame-pointers.html.

Resolves #10224.
Touches #10225.

## Summary of changes

Enable frame pointers in all builds, and use frame pointers for pprof-rs
stack sampling.
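
For readers who want to try this locally, one common way to force frame pointers in a Cargo build is the rustc codegen option `-C force-frame-pointers=yes`. The fragment below is a minimal sketch of a `.cargo/config.toml`. It is an assumption about how one might enable the option, not necessarily the mechanism used in #10226.

```toml
# .cargo/config.toml (hypothetical placement at the workspace root)
[build]
# Ask rustc to keep the frame pointer register in every compiled function.
rustflags = ["-C", "force-frame-pointers=yes"]
```

This flag only affects crates compiled by rustc. C dependencies take their own compiler flags, such as the `-fno-omit-frame-pointer` that tikv-jemalloc-sys passes in the build log quoted earlier in this thread.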