Improve deku performance #345

Closed · wants to merge 3 commits

Conversation

wcampbell0x2a (Collaborator)

Still want to do benchmarks and more testing, but it's done-ish

See #344

@github-actions

Benchmark for d1b38cf

| Test | Base | PR | % |
|------|------|----|---|
| deku_read_bits | 608.6±0.50ns | 608.7±0.69ns | +0.02% |
| deku_read_byte | 17.6±0.09ns | 8.4±0.04ns | -52.27% |
| deku_read_enum | 30.1±0.54ns | 16.7±0.12ns | -44.52% |
| deku_read_vec | 1402.0±11.22ns | 934.5±8.60ns | -33.35% |
| deku_read_vec_perf | 1404.7±10.10ns | 947.9±5.13ns | -32.52% |
| deku_write_bits | 108.0±0.20ns | 111.4±0.61ns | +3.15% |
| deku_write_byte | 66.7±0.25ns | 66.7±0.25ns | 0.00% |
| deku_write_enum | 111.8±0.37ns | 115.0±0.63ns | +2.86% |
| deku_write_vec | 5.2±0.02µs | 5.2±0.01µs | 0.00% |
| deku_write_vec_perf | 5.3±0.01µs | 5.2±0.01µs | -1.89% |

- Instead of always returning a new, performance-costly BitSlice, return
  the number of bits read and advance the starting offset into the
  BitSlice in the DekuRead and DekuContainerRead functions (a rough
  sketch of the before/after shape follows below).
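
As a rough illustration of that change: the helper names `read_u8_old`, `read_u8_new`, and `ParseError` below are made up for this sketch and are not deku's exact API (the real change is in the DekuRead/DekuContainerRead traits); it assumes the `bitvec` crate, which deku uses for its `BitSlice` type.

```rust
use bitvec::prelude::*;

#[derive(Debug)]
pub struct ParseError; // placeholder error type, only for this sketch

// Before: every field read splits the input and returns the remaining "rest"
// slice, so each field constructs a fresh BitSlice reference.
pub fn read_u8_old(
    input: &BitSlice<u8, Msb0>,
) -> Result<(&BitSlice<u8, Msb0>, u8), ParseError> {
    let (bits, rest) = input.split_at(8);
    Ok((rest, bits.load_be::<u8>()))
}

// After: the read only reports how many bits it consumed; the caller advances
// its own offset into the original BitSlice.
pub fn read_u8_new(input: &BitSlice<u8, Msb0>) -> Result<(usize, u8), ParseError> {
    Ok((8, input[..8].load_be::<u8>()))
}
```

The write path is untouched, which is consistent with the benchmarks above: the `deku_read_*` cases improve while `deku_write_*` stays roughly flat.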
@wcampbell0x2a force-pushed the improve-deku-performance branch from f3bc876 to bdd52bd on June 15, 2023 00:28
@github-actions

Benchmark for a443c4b

| Test | Base | PR | % |
|------|------|----|---|
| deku_read_bits | 608.9±0.42ns | 608.7±0.84ns | -0.03% |
| deku_read_byte | 17.6±0.09ns | 8.4±0.00ns | -52.27% |
| deku_read_enum | 29.7±0.15ns | 16.8±0.13ns | -43.43% |
| deku_read_vec | 1399.1±4.71ns | 934.4±8.82ns | -33.21% |
| deku_read_vec_perf | 1398.2±5.90ns | 949.8±5.92ns | -32.07% |
| deku_write_bits | 108.2±0.24ns | 110.9±0.41ns | +2.50% |
| deku_write_byte | 66.7±0.34ns | 66.4±0.26ns | -0.45% |
| deku_write_enum | 111.7±0.38ns | 114.9±0.44ns | +2.86% |
| deku_write_vec | 5.2±0.01µs | 5.2±0.01µs | 0.00% |
| deku_write_vec_perf | 5.3±0.01µs | 5.2±0.01µs | -1.89% |

Gain another 5% increase in performance by keeping __deku_total_read and
using it as the offset into the BitSlice, instead of assigning a new
BitSlice at every offset (sketched below).
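
A minimal sketch of the caller side of that pattern, reusing the hypothetical `read_u8_new` helper from the earlier sketch (in deku the equivalent lives in derive-generated code, where `__deku_total_read` is the running offset; field names here are made up):

```rust
use bitvec::prelude::*;

// Hypothetical field reader from the earlier sketch: returns (bits consumed, value).
fn read_u8_new(input: &BitSlice<u8, Msb0>) -> (usize, u8) {
    (8, input[..8].load_be::<u8>())
}

fn main() {
    let data = [0xAAu8, 0xBB];
    let input = data.view_bits::<Msb0>();

    // One running offset instead of re-binding a new `rest` BitSlice per field.
    let mut __deku_total_read = 0usize;

    let (amt, field_a) = read_u8_new(&input[__deku_total_read..]);
    __deku_total_read += amt;

    let (amt, field_b) = read_u8_new(&input[__deku_total_read..]);
    __deku_total_read += amt;

    assert_eq!((field_a, field_b), (0xAA, 0xBB));
    assert_eq!(__deku_total_read, 16);
}
```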
@github-actions

Benchmark for 1b73cc4

| Test | Base | PR | % |
|------|------|----|---|
| deku_read_bits | 608.8±0.45ns | 604.6±0.44ns | -0.69% |
| deku_read_byte | 17.7±0.13ns | 9.7±0.16ns | -45.20% |
| deku_read_enum | 29.8±0.09ns | 17.9±0.16ns | -39.93% |
| deku_read_vec | 1404.0±6.45ns | 940.2±4.42ns | -33.03% |
| deku_read_vec_perf | 1403.3±8.17ns | 909.6±4.37ns | -35.18% |
| deku_write_bits | 112.2±0.55ns | 110.3±0.56ns | -1.69% |
| deku_write_byte | 68.5±6.98ns | 64.7±0.16ns | -5.55% |
| deku_write_enum | 111.7±0.58ns | 110.2±0.17ns | -1.34% |
| deku_write_vec | 5.2±0.01µs | 5.1±0.01µs | -1.92% |
| deku_write_vec_perf | 5.3±0.02µs | 5.1±0.01µs | -3.77% |

@v1gnesh

v1gnesh commented Jun 15, 2023

Hey @wcampbell0x2a I have a dumb question -
Since in many cases, parsing is for static content (either sizes known fully at compile time or reasonable upper bounds may be known for dynamic content), would it help if memory for these were pre-allocated before getting into read?

@wcampbell0x2a (Collaborator, Author)

> Hey @wcampbell0x2a I have a dumb question - Since in many cases, parsing is for static content (either sizes known fully at compile time or reasonable upper bounds may be known for dynamic content), would it help if memory for these were pre-allocated before getting into read?

You can see some tests for allocation counts here: /~https://github.com/sharksforarms/deku/blob/master/tests/test_alloc.rs. Many of these allocations are quite dynamic based upon bit alignment when reading.
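
For reference, a standalone sketch of how an allocation-count check can work, using only a counting `GlobalAlloc` from std; this is not deku's actual test harness in test_alloc.rs, and `MyStruct` plus the counter names are made up.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Count every heap allocation made through the global allocator.
static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::SeqCst);
        System.alloc(layout)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

fn main() {
    let before = ALLOCATIONS.load(Ordering::SeqCst);

    // Stand-in for a parse, e.g. `MyStruct::from_bytes((&data, 0))` with deku.
    let parsed = vec![1u8, 2, 3, 4];

    let after = ALLOCATIONS.load(Ordering::SeqCst);
    println!(
        "allocations during parse: {} ({} bytes parsed)",
        after - before,
        parsed.len()
    );
}
```

A pre-allocation win would show up directly as a lower count for the byte-aligned cases; the bit-unaligned paths are where the counts vary.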

@wcampbell0x2a (Collaborator, Author)

> Benchmark for 1b73cc4

This commit showed a 5% performance increase for deku_read_byte on my machine.

@sharksforarms (Owner)

Yeah, I'm not sure how much I trust the CI benchmarking; I'm not sure it's fully accurate. I'll take a look at this when I have more time. Thanks for working on this!

@wcampbell0x2a (Collaborator, Author)

This breaks the logging feature. I'll need to fix that.

@wcampbell0x2a (Collaborator, Author)

wcampbell0x2a commented Jul 23, 2023

I also need to test the old + new with both having LTO enabled. The performance might be worse with this patch.

not true

@wcampbell0x2a mentioned this pull request on Aug 3, 2023
@wcampbell0x2a deleted the improve-deku-performance branch on August 12, 2023 03:25