Improve deku performance #345

Closed · wants to merge 3 commits

Conversation

wcampbell0x2a (Collaborator)

Still want to do benchmarks and more testing, but it's done-ish

See #344

@github-actions

Benchmark for d1b38cf

| Test | Base | PR | % |
|------|------|----|---|
| deku_read_bits | 608.6±0.50ns | 608.7±0.69ns | +0.02% |
| deku_read_byte | 17.6±0.09ns | 8.4±0.04ns | -52.27% |
| deku_read_enum | 30.1±0.54ns | 16.7±0.12ns | -44.52% |
| deku_read_vec | 1402.0±11.22ns | 934.5±8.60ns | -33.35% |
| deku_read_vec_perf | 1404.7±10.10ns | 947.9±5.13ns | -32.52% |
| deku_write_bits | 108.0±0.20ns | 111.4±0.61ns | +3.15% |
| deku_write_byte | 66.7±0.25ns | 66.7±0.25ns | 0.00% |
| deku_write_enum | 111.8±0.37ns | 115.0±0.63ns | +2.86% |
| deku_write_vec | 5.2±0.02µs | 5.2±0.01µs | 0.00% |
| deku_write_vec_perf | 5.3±0.01µs | 5.2±0.01µs | -1.89% |

- Instead of always returning a new, performance-costly BitSlice, return
  the number of bits read and advance the starting offset into the
  BitSlice in the DekuRead and DekuContainerRead functions (a rough
  sketch of the before/after shape follows below).
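
As a rough illustration of that change: the helper names `read_u8_old`, `read_u8_new`, and `ParseError` below are made up for this sketch and are not deku's exact API (the real change is in the DekuRead/DekuContainerRead traits); it assumes the `bitvec` crate, which deku uses for its `BitSlice` type.

```rust
use bitvec::prelude::*;

#[derive(Debug)]
pub struct ParseError; // placeholder error type, only for this sketch

// Before: every field read splits the input and returns the remaining "rest"
// slice, so each field constructs a fresh BitSlice reference.
pub fn read_u8_old(
    input: &BitSlice<u8, Msb0>,
) -> Result<(&BitSlice<u8, Msb0>, u8), ParseError> {
    let (bits, rest) = input.split_at(8);
    Ok((rest, bits.load_be::<u8>()))
}

// After: the read only reports how many bits it consumed; the caller advances
// its own offset into the original BitSlice.
pub fn read_u8_new(input: &BitSlice<u8, Msb0>) -> Result<(usize, u8), ParseError> {
    Ok((8, input[..8].load_be::<u8>()))
}
```

The write path is untouched, which is consistent with the benchmarks above: the `deku_read_*` cases improve while `deku_write_*` stays roughly flat.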
@wcampbell0x2a force-pushed the improve-deku-performance branch from f3bc876 to bdd52bd on June 15, 2023 00:28
@github-actions

Benchmark for a443c4b

| Test | Base | PR | % |
|------|------|----|---|
| deku_read_bits | 608.9±0.42ns | 608.7±0.84ns | -0.03% |
| deku_read_byte | 17.6±0.09ns | 8.4±0.00ns | -52.27% |
| deku_read_enum | 29.7±0.15ns | 16.8±0.13ns | -43.43% |
| deku_read_vec | 1399.1±4.71ns | 934.4±8.82ns | -33.21% |
| deku_read_vec_perf | 1398.2±5.90ns | 949.8±5.92ns | -32.07% |
| deku_write_bits | 108.2±0.24ns | 110.9±0.41ns | +2.50% |
| deku_write_byte | 66.7±0.34ns | 66.4±0.26ns | -0.45% |
| deku_write_enum | 111.7±0.38ns | 114.9±0.44ns | +2.86% |
| deku_write_vec | 5.2±0.01µs | 5.2±0.01µs | 0.00% |
| deku_write_vec_perf | 5.3±0.01µs | 5.2±0.01µs | -1.89% |

Gain another 5% increase in performance by keeping __deku_total_read and
using it as the offset into the BitSlice, instead of assigning a new
BitSlice at every offset (sketched below).
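
A minimal sketch of the caller side of that pattern, reusing the hypothetical `read_u8_new` helper from the earlier sketch (in deku the equivalent lives in derive-generated code, where `__deku_total_read` is the running offset; field names here are made up):

```rust
use bitvec::prelude::*;

// Hypothetical field reader from the earlier sketch: returns (bits consumed, value).
fn read_u8_new(input: &BitSlice<u8, Msb0>) -> (usize, u8) {
    (8, input[..8].load_be::<u8>())
}

fn main() {
    let data = [0xAAu8, 0xBB];
    let input = data.view_bits::<Msb0>();

    // One running offset instead of re-binding a new `rest` BitSlice per field.
    let mut __deku_total_read = 0usize;

    let (amt, field_a) = read_u8_new(&input[__deku_total_read..]);
    __deku_total_read += amt;

    let (amt, field_b) = read_u8_new(&input[__deku_total_read..]);
    __deku_total_read += amt;

    assert_eq!((field_a, field_b), (0xAA, 0xBB));
    assert_eq!(__deku_total_read, 16);
}
```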
@github-actions

Benchmark for 1b73cc4

| Test | Base | PR | % |
|------|------|----|---|
| deku_read_bits | 608.8±0.45ns | 604.6±0.44ns | -0.69% |
| deku_read_byte | 17.7±0.13ns | 9.7±0.16ns | -45.20% |
| deku_read_enum | 29.8±0.09ns | 17.9±0.16ns | -39.93% |
| deku_read_vec | 1404.0±6.45ns | 940.2±4.42ns | -33.03% |
| deku_read_vec_perf | 1403.3±8.17ns | 909.6±4.37ns | -35.18% |
| deku_write_bits | 112.2±0.55ns | 110.3±0.56ns | -1.69% |
| deku_write_byte | 68.5±6.98ns | 64.7±0.16ns | -5.55% |
| deku_write_enum | 111.7±0.58ns | 110.2±0.17ns | -1.34% |
| deku_write_vec | 5.2±0.01µs | 5.1±0.01µs | -1.92% |
| deku_write_vec_perf | 5.3±0.02µs | 5.1±0.01µs | -3.77% |

@v1gnesh

v1gnesh commented Jun 15, 2023

Hey @wcampbell0x2a I have a dumb question -
Since in many cases, parsing is for static content (either sizes known fully at compile time or reasonable upper bounds may be known for dynamic content), would it help if memory for these were pre-allocated before getting into read?

@wcampbell0x2a (Collaborator, Author)

> Hey @wcampbell0x2a I have a dumb question - Since in many cases, parsing is for static content (either sizes known fully at compile time or reasonable upper bounds may be known for dynamic content), would it help if memory for these were pre-allocated before getting into read?

You can see some tests for allocation counts here: /~https://github.com/sharksforarms/deku/blob/master/tests/test_alloc.rs. Many of these allocations are quite dynamic based upon bit alignment when reading.
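
For reference, a standalone sketch of how an allocation-count check can work, using only a counting `GlobalAlloc` from std; this is not deku's actual test harness in test_alloc.rs, and `MyStruct` plus the counter names are made up.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Count every heap allocation made through the global allocator.
static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::SeqCst);
        System.alloc(layout)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

fn main() {
    let before = ALLOCATIONS.load(Ordering::SeqCst);

    // Stand-in for a parse, e.g. `MyStruct::from_bytes((&data, 0))` with deku.
    let parsed = vec![1u8, 2, 3, 4];

    let after = ALLOCATIONS.load(Ordering::SeqCst);
    println!(
        "allocations during parse: {} ({} bytes parsed)",
        after - before,
        parsed.len()
    );
}
```

A pre-allocation win would show up directly as a lower count for the byte-aligned cases; the bit-unaligned paths are where the counts vary.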

@wcampbell0x2a (Collaborator, Author)

> Benchmark for 1b73cc4

This commit showed a 5% performance increase for deku_read_byte on my machine.

@sharksforarms (Owner)

Yeah, I'm not sure how much I trust the CI benchmarking; I'm not sure it's fully accurate. I'll take a look at this when I have more time. Thanks for working on this!

@wcampbell0x2a (Collaborator, Author)

This breaks the logging feature. I'll need to fix that.

@wcampbell0x2a (Collaborator, Author)

wcampbell0x2a commented Jul 23, 2023

I also need to test the old + new with both having LTO enabled. The performance might be worse with this patch.

not true

@wcampbell0x2a mentioned this pull request on Aug 3, 2023
@wcampbell0x2a deleted the improve-deku-performance branch on August 12, 2023 03:25