cargo: build with frame pointers #10226

erikgrinaker · 2024-12-22T22:01:30Z

Problem

Frame pointers are typically disabled by default (depending on CPU architecture) to improve performance. This frees up a CPU register and avoids a couple of instructions per function call, but this often doesn't matter on modern CPU architectures, and benchmarks did not show measurable overhead. However, omitting frame pointers makes stack unwinding much less efficient, since the unwinder has to use DWARF debug information instead, and it gives worse results with e.g. perf and eBPF profiles. With continuous profiling, cheaper stack unwinding will likely be a net win and allow us to use a higher sampling resolution.
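
To illustrate why unwinding with frame pointers is cheap, here is a minimal sketch that walks a simulated stack. It is not how perf, eBPF, or pprof-rs implement unwinding. The slot layout (saved caller frame pointer at `fp`, return address at `fp + 1`, 0 as the end-of-chain marker) and the name `walk_frame_pointers` are assumptions made for illustration only.

```rust
/// Walks a simulated stack by following the chain of saved frame pointers.
///
/// Assumed (hypothetical) layout: `stack` is word-indexed memory, `stack[fp]`
/// holds the caller's frame pointer and `stack[fp + 1]` holds the return
/// address. A saved frame pointer of 0 terminates the chain.
fn walk_frame_pointers(stack: &[usize], mut fp: usize, max_depth: usize) -> Vec<usize> {
    let mut return_addrs = Vec::new();
    while fp != 0 && return_addrs.len() < max_depth {
        // Stop if the frame pointer leaves the simulated stack.
        let Some(&prev) = stack.get(fp) else { break };
        let Some(&ret) = stack.get(fp + 1) else { break };
        return_addrs.push(ret);
        // Caller frames must live at higher slots; anything else is a
        // corrupt or cyclic chain, so stop instead of looping forever.
        if prev != 0 && prev <= fp {
            break;
        }
        fp = prev;
    }
    return_addrs
}

fn main() {
    // Slot 0 is the null marker. Three frames: innermost at slot 1,
    // its caller at slot 4, the outermost at slot 6.
    let stack: Vec<usize> = vec![
        0,      // 0: null marker
        4,      // 1: innermost frame, saved fp of caller
        0x3000, // 2: innermost frame, return address
        0xAA,   // 3: a local variable (skipped by the walk)
        6,      // 4: middle frame, saved fp of caller
        0x2000, // 5: middle frame, return address
        0,      // 6: outermost frame, end of chain
        0x1000, // 7: outermost frame, return address
    ];
    for addr in walk_frame_pointers(&stack, 1, 16) {
        println!("{addr:#x}");
    }
}
```

Each step is two memory reads and a comparison, with no lookup in debug information. That is the property the paragraph above relies on.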

The Rust standard library and jemalloc already enable frame pointers by default.

For more information, see https://www.brendangregg.com/blog/2024-03-17/the-return-of-the-frame-pointers.html.

Resolves #10224.

Summary of changes

Enable frame pointers in all builds, and use frame pointers for pprof-rs stack sampling.

github-actions · 2024-12-22T22:59:43Z

7095 tests run: 6797 passed, 0 failed, 298 skipped (full report)

Flaky tests (5)

Postgres 17

test_lr_with_slow_safekeeper: release-x86-64

Postgres 16

test_physical_replication_config_mismatch_too_many_known_xids: release-arm64
test_physical_replication_config_mismatch_max_locks_per_transaction: release-arm64

Postgres 15

test_lr_with_slow_safekeeper: release-x86-64

Postgres 14

test_metrics_normal_work: release-arm64

Code coverage* (full report)

functions: 31.2% (8399 of 26877 functions)
lines: 48.0% (66685 of 139059 lines)

* collected from Rust tests only

The comment gets automatically updated with the latest test results (de36026 at 2024-12-29T12:58:56.126Z).

erikgrinaker · 2024-12-29T15:02:30Z

I ran some benchmarks for WalStreamDecoder::complete_record() on a Linux amd64 box. This isn't necessarily representative (it's mostly heavily optimized CRC32 checksumming), but it is a CPU-bound hot path that doesn't involve allocations or IO. The results don't show significant overhead. There is a small amount of variation, though, so I ran each benchmark for 30 seconds multiple times, with similar results.

complete_record/size=64 time:   [41.561 ns 41.891 ns 42.152 ns]
                        thrpt:  [1.4141 GiB/s 1.4229 GiB/s 1.4341 GiB/s]
                 change:
                        time:   [+0.7893% +1.1356% +1.4278%] (p = 0.00 < 0.05)
                        thrpt:  [-1.4077% -1.1228% -0.7831%]

complete_record/size=1024
                        time:   [144.64 ns 144.73 ns 144.83 ns]
                        thrpt:  [6.5847 GiB/s 6.5893 GiB/s 6.5935 GiB/s]
                 change:
                        time:   [-3.4100% -3.3531% -3.2898%] (p = 0.00 < 0.05)
                        thrpt:  [+3.4017% +3.4694% +3.5303%]

complete_record/size=8192
                        time:   [965.16 ns 965.27 ns 965.39 ns]
                        thrpt:  [7.9029 GiB/s 7.9039 GiB/s 7.9048 GiB/s]
                 change:
                        time:   [+0.4465% +0.4593% +0.4730%] (p = 0.00 < 0.05)
                        thrpt:  [-0.4708% -0.4572% -0.4445%]

complete_record/size=131072
                        time:   [13.298 µs 13.352 µs 13.425 µs]
                        thrpt:  [9.0927 GiB/s 9.1425 GiB/s 9.1799 GiB/s]
                 change:
                        time:   [+5.9949% +6.8990% +7.8233%] (p = 0.00 < 0.05)
                        thrpt:  [-7.2557% -6.4537% -5.6558%]
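
As a reading aid for the output above, the following sketch shows how the throughput figures relate to the timings, and how a relative change in time maps to a relative change in throughput. It assumes that the `size` in each benchmark name is a byte count and that GiB means 2^30 bytes. The function names are hypothetical and not part of the benchmark code.

```rust
/// Throughput in GiB/s for `bytes` processed in `time_ns` nanoseconds.
/// Assumes 1 GiB = 2^30 bytes.
fn throughput_gib_per_s(bytes: u64, time_ns: f64) -> f64 {
    let bytes_per_sec = bytes as f64 / (time_ns * 1e-9);
    bytes_per_sec / (1u64 << 30) as f64
}

/// Relative throughput change (in percent) implied by a relative time
/// change (in percent), since throughput is proportional to 1 / time.
fn thrpt_change_pct(time_change_pct: f64) -> f64 {
    (1.0 / (1.0 + time_change_pct / 100.0) - 1.0) * 100.0
}

fn main() {
    // (size in bytes, median time in ns, median time change in %),
    // copied from the benchmark output above.
    let rows = [
        (64u64, 41.891, 1.1356),
        (1024, 144.73, -3.3531),
        (8192, 965.27, 0.4593),
        (131072, 13_352.0, 6.8990),
    ];
    for (size, time_ns, change) in rows {
        println!(
            "size={size}: {:.4} GiB/s, thrpt change {:+.4}%",
            throughput_gib_per_s(size, time_ns),
            thrpt_change_pct(change),
        );
    }
}
```

The computed values should land close to the reported medians. Small differences in the last digit are expected because the inputs here are already rounded.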

erikgrinaker · 2024-12-29T16:12:38Z

Interestingly, the frame-pointer feature of pprof-rs is an order of magnitude slower than just using libunwind without frame pointers (11 µs vs. 1.4 µs). libunwind performance did not change with or without frame pointers.

This was using a benchmark of pprof-rs TraceImpl::trace() with a stack depth of 40 on Linux.

This probably isn't worth it, then. There was some hope that it might resolve #10225, as seen in grafana/pyroscope-rs#124, but if that requires the frame-pointer feature, it would slow down traces by roughly 10x, which isn't acceptable.
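
To put the two per-trace costs in terms of sampling resolution, here is a back-of-the-envelope sketch. It considers only the stack walking cost per sample on a single thread, ignoring symbolization and any other profiler work. The sampling rate and the 1% CPU budget are illustrative assumptions, not measurements, and the function names are hypothetical.

```rust
/// Fraction of one CPU spent walking stacks, given the cost of one trace
/// in microseconds and the number of samples taken per second.
fn overhead_fraction(unwind_cost_us: f64, samples_per_sec: f64) -> f64 {
    unwind_cost_us * 1e-6 * samples_per_sec
}

/// Highest sampling rate (Hz) that keeps stack walking within
/// `budget_fraction` of one CPU.
fn max_sample_rate_hz(unwind_cost_us: f64, budget_fraction: f64) -> f64 {
    budget_fraction / (unwind_cost_us * 1e-6)
}

fn main() {
    // Per-trace costs from the benchmark above (stack depth 40).
    let variants = [("libunwind", 1.4), ("frame-pointer feature", 11.0)];
    let rate_hz = 100.0; // assumed sampling rate
    let budget = 0.01; // assumed budget: 1% of one CPU

    for (name, cost_us) in variants {
        println!(
            "{name}: {:.4}% of one CPU at {rate_hz} Hz, at most {:.0} Hz within budget",
            overhead_fraction(cost_us, rate_hz) * 100.0,
            max_sample_rate_hz(cost_us, budget),
        );
    }
}
```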

erikgrinaker changed the title ~~cargo: build with fram pointers~~ cargo: build with frame pointers Dec 22, 2024

cargo: build with frame pointers

de36026

erikgrinaker force-pushed the erik/frame-pointer branch from c89b06c to de36026 Compare December 29, 2024 11:55

erikgrinaker closed this Dec 29, 2024

erikgrinaker mentioned this pull request Dec 29, 2024

Build with frame pointers for improved profiling #10224

Closed
