TBS: Explore replacing badger with pebble #15246

Open
Tracked by #14931
carsonip opened this issue Jan 14, 2025 · 1 comment · May be fixed by #15235

@carsonip
Member

Look into whether it is feasible to replace badger with pebble in order to consolidate on a single KV store dependency. Ensure that there is no major performance regression and that existing apm-server features remain supported, and identify any blockers to adopting pebble for TBS.

@carsonip
Member Author

Benchmarks

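TL;DR: After several rounds of optimization and profile analysis, pebble is on par with badger in the benchmark workflow. The throughput differences (sec/op, events/sec, txs/sec) are not statistically significant, although pebble allocates about 34% more bytes per op and has a somewhat higher peak heap and RSS.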

Work done so far

  • Created a new branch, [tbs-pebble-rebase], rebased on the main branch; see draft PR: [WIP] TBS: Replace badger with pebble #15235
  • Used a forked pebble for max batch to reduce allocs
  • Tuned mem table size and flush threshold
  • Disabled level compression
  • Enabled table bloom filter

badger on apm-server main benchmark workflow

Running benchmarks...
Benchmark warmup time: 5m
Benchmark agents: 512
Benchmark event rate: 0/s
Benchmark count: 6
Benchmark duration: 2m
Benchmark run expression : BenchmarkAgentAll
BenchmarkAgentAll-512	     553	 260541248 ns/op	         0 error_responses/sec	      1382 errors/sec	     21758 events/sec	       907.0 gc_cycles	       707.0 max_goroutines	1261444136 max_heap_alloc	   4682987 max_heap_objects	1433145344 max_rss	        16.51 mean_available_indexers	     10562 metrics/sec	      8970 spans/sec	  35637795 tbs_lsm_size	 392156266 tbs_vlog_size	       844.4 txs/sec	448555310 B/op	 4411221 allocs/op
BenchmarkAgentAll-512	     801	 252201992 ns/op	         0 error_responses/sec	      1427 errors/sec	     22481 events/sec	      1379 gc_cycles	       704.0 max_goroutines	1260306400 max_heap_alloc	   4736696 max_heap_objects	1438785536 max_rss	        16.29 mean_available_indexers	     10915 metrics/sec	      9266 spans/sec	  45601504 tbs_lsm_size	 351887394 tbs_vlog_size	       872.3 txs/sec	453655783 B/op	 4417256 allocs/op
BenchmarkAgentAll-512	     561	 261476961 ns/op	         0 error_responses/sec	      1377 errors/sec	     21688 events/sec	       940.0 gc_cycles	       691.0 max_goroutines	1227936304 max_heap_alloc	   4860484 max_heap_objects	1467277312 max_rss	        16.65 mean_available_indexers	     10533 metrics/sec	      8938 spans/sec	  60765003 tbs_lsm_size	 306052159 tbs_vlog_size	       841.4 txs/sec	449569352 B/op	 4413293 allocs/op
BenchmarkAgentAll-512	     885	 205185009 ns/op	         0 error_responses/sec	      1755 errors/sec	     27635 events/sec	      1546 gc_cycles	       702.0 max_goroutines	1042784816 max_heap_alloc	   4403011 max_heap_objects	1194815488 max_rss	        14.37 mean_available_indexers	     13418 metrics/sec	     11390 spans/sec	  50891789 tbs_lsm_size	 332519163 tbs_vlog_size	      1072 txs/sec	456159205 B/op	 4414346 allocs/op
BenchmarkAgentAll-512	     559	 281483320 ns/op	         0 error_responses/sec	      1279 errors/sec	     20144 events/sec	       905.0 gc_cycles	       703.0 max_goroutines	1276355808 max_heap_alloc	   5081951 max_heap_objects	1498091520 max_rss	        16.95 mean_available_indexers	      9781 metrics/sec	      8302 spans/sec	  36830045 tbs_lsm_size	 266372657 tbs_vlog_size	       781.6 txs/sec	449191625 B/op	 4425107 allocs/op
BenchmarkAgentAll-512	     739	 223036149 ns/op	         0 error_responses/sec	      1614 errors/sec	     25421 events/sec	      1417 gc_cycles	       697.0 max_goroutines	 811746504 max_heap_alloc	   3604364 max_heap_objects	1018437632 max_rss	        15.44 mean_available_indexers	     12343 metrics/sec	     10478 spans/sec	  52188296 tbs_lsm_size	 268349381 tbs_vlog_size	       986.4 txs/sec	458476559 B/op	 4407548 allocs/op
make[1]: Leaving directory '/home/runner/work/apm-server/apm-server/testing/benchmark'

pebble benchmark workflow on commit 1c89735

BenchmarkAgentAll-512	     472	 298160472 ns/op	         0 error_responses/sec	      1207 errors/sec	     19997 events/sec	       601.0 gc_cycles	       757.0 max_goroutines	1388689704 max_heap_alloc	   4322788 max_heap_objects	1505554432 max_rss	        16.94 mean_available_indexers	      9231 metrics/sec	      8764 spans/sec	 155904548 tbs_lsm_size	         0 tbs_vlog_size	       794.9 txs/sec	604305729 B/op	 4400824 allocs/op
BenchmarkAgentAll-512	     499	 309514313 ns/op	         0 error_responses/sec	      1163 errors/sec	     19268 events/sec	       632.0 gc_cycles	       725.0 max_goroutines	1404890096 max_heap_alloc	   4534565 max_heap_objects	1553813504 max_rss	        17.15 mean_available_indexers	      8897 metrics/sec	      8442 spans/sec	 156954754 tbs_lsm_size	         0 tbs_vlog_size	       765.7 txs/sec	613979377 B/op	 4435160 allocs/op
BenchmarkAgentAll-512	     782	 244716316 ns/op	         0 error_responses/sec	      1471 errors/sec	     24376 events/sec	       975.0 gc_cycles	       750.0 max_goroutines	1486198504 max_heap_alloc	   4398963 max_heap_objects	1604104192 max_rss	        15.34 mean_available_indexers	     11259 metrics/sec	     10678 spans/sec	 156741043 tbs_lsm_size	         0 tbs_vlog_size	       968.5 txs/sec	609378909 B/op	 4418804 allocs/op
BenchmarkAgentAll-512	     502	 301330682 ns/op	         0 error_responses/sec	      1195 errors/sec	     19788 events/sec	       631.0 gc_cycles	       728.0 max_goroutines	1323958592 max_heap_alloc	   4258571 max_heap_objects	1499250688 max_rss	        17.13 mean_available_indexers	      9135 metrics/sec	      8672 spans/sec	 156855826 tbs_lsm_size	         0 tbs_vlog_size	       786.5 txs/sec	604224580 B/op	 4408866 allocs/op
BenchmarkAgentAll-512	     636	 238955232 ns/op	         0 error_responses/sec	      1507 errors/sec	     24958 events/sec	       814.0 gc_cycles	       742.0 max_goroutines	1526017728 max_heap_alloc	   4568769 max_heap_objects	1647812608 max_rss	        15.22 mean_available_indexers	     11524 metrics/sec	     10935 spans/sec	 157031321 tbs_lsm_size	         0 tbs_vlog_size	       991.8 txs/sec	604787049 B/op	 4404682 allocs/op
BenchmarkAgentAll-512	     504	 304077702 ns/op	         0 error_responses/sec	      1184 errors/sec	     19612 events/sec	       634.0 gc_cycles	       798.0 max_goroutines	1349592872 max_heap_alloc	   4744070 max_heap_objects	1482018816 max_rss	        17.18 mean_available_indexers	      9056 metrics/sec	      8593 spans/sec	 156933769 tbs_lsm_size	         0 tbs_vlog_size	       779.4 txs/sec	606208797 B/op	 4422900 allocs/op
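
For anyone who wants to post-process the result lines above without benchstat, here is a minimal sketch of a parser. It assumes the standard Go benchmark line layout (name, iteration count, then value/unit pairs separated by whitespace). The function name `parseBenchLine` is hypothetical and is not part of apm-server or its benchmark tooling.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseBenchLine splits one Go benchmark result line into the benchmark
// name, the iteration count, and a unit -> value map (e.g. "ns/op" -> 2.98e8).
// Hypothetical helper for illustration only.
func parseBenchLine(line string) (string, int, map[string]float64, error) {
	fields := strings.Fields(line)
	// Expect: name, iterations, then at least one (value, unit) pair.
	if len(fields) < 4 || len(fields)%2 != 0 {
		return "", 0, nil, fmt.Errorf("unexpected field count %d", len(fields))
	}
	iters, err := strconv.Atoi(fields[1])
	if err != nil {
		return "", 0, nil, fmt.Errorf("iterations: %w", err)
	}
	metrics := make(map[string]float64, (len(fields)-2)/2)
	for i := 2; i < len(fields); i += 2 {
		v, err := strconv.ParseFloat(fields[i], 64)
		if err != nil {
			return "", 0, nil, fmt.Errorf("value for %q: %w", fields[i+1], err)
		}
		metrics[fields[i+1]] = v
	}
	return fields[0], iters, metrics, nil
}

func main() {
	// Shortened copy of the first pebble result line above.
	line := "BenchmarkAgentAll-512\t472\t298160472 ns/op\t19997 events/sec\t155904548 tbs_lsm_size\t0 tbs_vlog_size\t604305729 B/op"
	name, iters, m, err := parseBenchLine(line)
	if err != nil {
		panic(err)
	}
	fmt.Println(name, iters, m["events/sec"], m["tbs_lsm_size"], m["tbs_vlog_size"])
}
```

Keying the map by unit string keeps custom metrics such as tbs_lsm_size and tbs_vlog_size alongside the built-in ns/op, B/op, and allocs/op without any special casing.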

Benchstat

             │ badger-main.stat │     run-1122-1c89735.stat     │
             │      sec/op      │    sec/op     vs base         │
AgentAll-512       256.4m ± 20%   299.7m ± 20%  ~ (p=0.132 n=6)

             │  badger-main.stat   │         run-1122-1c89735.stat          │
             │ error_responses/sec │ error_responses/sec  vs base           │
AgentAll-512            0.000 ± 0%            0.000 ± 0%  ~ (p=1.000 n=6) ¹
¹ all samples are equal

             │ badger-main.stat │     run-1122-1c89735.stat     │
             │    errors/sec    │  errors/sec   vs base         │
AgentAll-512       1.405k ± 25%   1.201k ± 25%  ~ (p=0.132 n=6)

             │ badger-main.stat │     run-1122-1c89735.stat     │
             │    events/sec    │  events/sec   vs base         │
AgentAll-512       22.12k ± 25%   19.89k ± 25%  ~ (p=0.132 n=6)

             │ badger-main.stat │       run-1122-1c89735.stat        │
             │    gc_cycles     │  gc_cycles   vs base               │
AgentAll-512       1159.5 ± 33%   633.0 ± 54%  -45.41% (p=0.015 n=6)

             │ badger-main.stat │        run-1122-1c89735.stat         │
             │  max_goroutines  │ max_goroutines  vs base              │
AgentAll-512         702.5 ± 2%       746.0 ± 7%  +6.19% (p=0.002 n=6)

             │ badger-main.stat │         run-1122-1c89735.stat         │
             │  max_heap_alloc  │ max_heap_alloc  vs base               │
AgentAll-512       1.244G ± 35%      1.397G ± 9%  +12.27% (p=0.002 n=6)

             │ badger-main.stat │       run-1122-1c89735.stat       │
             │ max_heap_objects │ max_heap_objects  vs base         │
AgentAll-512       4.710M ± 23%        4.467M ± 6%  ~ (p=0.310 n=6)

             │ badger-main.stat │       run-1122-1c89735.stat       │
             │     max_rss      │   max_rss    vs base              │
AgentAll-512       1.436G ± 29%   1.530G ± 8%  +6.53% (p=0.004 n=6)

             │    badger-main.stat     │          run-1122-1c89735.stat           │
             │ mean_available_indexers │ mean_available_indexers  vs base         │
AgentAll-512               16.40 ± 12%               17.04 ± 11%  ~ (p=0.310 n=6)

             │ badger-main.stat │     run-1122-1c89735.stat     │
             │   metrics/sec    │ metrics/sec   vs base         │
AgentAll-512      10.739k ± 25%   9.183k ± 25%  ~ (p=0.132 n=6)

             │ badger-main.stat │     run-1122-1c89735.stat     │
             │    spans/sec     │  spans/sec    vs base         │
AgentAll-512       9.118k ± 25%   8.718k ± 25%  ~ (p=0.589 n=6)

             │ badger-main.stat │        run-1122-1c89735.stat         │
             │   tbs_lsm_size   │ tbs_lsm_size  vs base                │
AgentAll-512       48.25M ± 26%   156.89M ± 1%  +225.19% (p=0.002 n=6)

             │ badger-main.stat │         run-1122-1c89735.stat         │
             │  tbs_vlog_size   │ tbs_vlog_size  vs base                │
AgentAll-512       319.3M ± 23%       0.0M ± 0%  -100.00% (p=0.002 n=6)

             │ badger-main.stat │    run-1122-1c89735.stat     │
             │     txs/sec      │   txs/sec    vs base         │
AgentAll-512        858.3 ± 25%   790.7 ± 25%  ~ (p=0.310 n=6)

             │ badger-main.stat │        run-1122-1c89735.stat        │
             │       B/op       │     B/op      vs base               │
AgentAll-512       430.7Mi ± 2%   577.4Mi ± 1%  +34.07% (p=0.002 n=6)

             │ badger-main.stat │    run-1122-1c89735.stat     │
             │    allocs/op     │  allocs/op   vs base         │
AgentAll-512        4.414M ± 0%   4.414M ± 0%  ~ (p=0.937 n=6)
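
To make the benchstat columns above easier to read, here is a rough sketch of how the summary values can be recomputed from the raw ns/op and B/op samples: the median per side, the relative delta, and a two-sided exact Mann-Whitney U p-value. It assumes benchstat's default behaviour (median as the central value, exact U-test for small samples without ties); the helper names `median`, `uStat`, and `exactP` are hypothetical, and the brute-force enumeration is only practical for small sample counts like n=6.

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
	"sort"
)

// Raw samples copied from the benchmark output in this comment.
var (
	badgerNsOp = []float64{260541248, 252201992, 261476961, 205185009, 281483320, 223036149}
	pebbleNsOp = []float64{298160472, 309514313, 244716316, 301330682, 238955232, 304077702}
	badgerBOp  = []float64{448555310, 453655783, 449569352, 456159205, 449191625, 458476559}
	pebbleBOp  = []float64{604305729, 613979377, 609378909, 604224580, 604787049, 606208797}
)

// median returns the sample median, averaging the two middle values
// when the sample size is even.
func median(xs []float64) float64 {
	s := append([]float64(nil), xs...)
	sort.Float64s(s)
	n := len(s)
	if n%2 == 1 {
		return s[n/2]
	}
	return (s[n/2-1] + s[n/2]) / 2
}

// uStat counts the (x, y) pairs with x < y; a tie counts as one half.
func uStat(a, b []float64) float64 {
	u := 0.0
	for _, x := range a {
		for _, y := range b {
			if x < y {
				u++
			} else if x == y {
				u += 0.5
			}
		}
	}
	return u
}

// exactP returns a two-sided exact Mann-Whitney p-value by enumerating
// every split of the pooled samples into groups of len(a) and len(b).
func exactP(a, b []float64) float64 {
	pooled := append(append([]float64(nil), a...), b...)
	n, k := len(pooled), len(a)
	mean := float64(k*(n-k)) / 2 // expected U under the null hypothesis
	observed := math.Abs(uStat(a, b) - mean)
	extreme, total := 0, 0
	for mask := uint(0); mask < 1<<uint(n); mask++ {
		if bits.OnesCount(mask) != k {
			continue
		}
		var ga, gb []float64
		for i, v := range pooled {
			if mask&(1<<uint(i)) != 0 {
				ga = append(ga, v)
			} else {
				gb = append(gb, v)
			}
		}
		total++
		if math.Abs(uStat(ga, gb)-mean) >= observed {
			extreme++
		}
	}
	return float64(extreme) / float64(total)
}

func main() {
	mb, mp := median(badgerNsOp), median(pebbleNsOp)
	fmt.Printf("sec/op: badger=%.1fms pebble=%.1fms p=%.3f\n", mb/1e6, mp/1e6, exactP(badgerNsOp, pebbleNsOp))
	bb, bp := median(badgerBOp), median(pebbleBOp)
	fmt.Printf("B/op: delta=%+.2f%% p=%.3f\n", (bp/bb-1)*100, exactP(badgerBOp, pebbleBOp))
}
```

Under these assumptions the sketch should land on the same medians (256.4m and 299.7m sec/op), the +34.07% B/op delta, and the p-values shown in the tables. A `~` in the `vs base` column means the p-value is above benchstat's significance threshold, so no delta is reported for that metric.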

@carsonip carsonip changed the title TBS: Explore using pebble to replace badger TBS: Explore replacing badger with pebble Jan 15, 2025
@carsonip carsonip linked a pull request Jan 22, 2025 that will close this issue