
Differing impls for bits and bytes #44

Closed
sharksforarms opened this issue Jun 4, 2020 · 3 comments
Labels
optimization Optimization

Comments

@sharksforarms
Owner

If the bytes attribute is used and the read index is on a byte boundary, it may be quicker to read from a &[u8] slice instead of reading 8*n bits.

I'd like more benchmarks to be written first so this optimization can be measured.

One option could be to feature flag the bits/bytes attributes.
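
For context, a minimal sketch of the two attributes involved (the struct and field names are illustrative, and the exact attribute syntax can differ between deku versions): a bytes field that starts on a byte boundary could be copied straight from the input &[u8], while bits fields have to go through the bit-level reader.

    use deku::prelude::*;

    #[derive(DekuRead, DekuWrite)]
    struct Packet {
        #[deku(bits = 4)]
        version: u8, // not byte-aligned: needs bit-level reads
        #[deku(bits = 4)]
        flags: u8,   // after this field the cursor is back on a byte boundary
        #[deku(bytes = 2)]
        length: u16, // byte-aligned: a candidate for reading from &[u8] directly
    }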

@sharksforarms changed the title from "Optimization: Differing impls for bits and bytes" to "Differing impls for bits and bytes" on Jun 4, 2020
@sharksforarms added the optimization label on Jun 4, 2020
@constfold
Contributor

Since #61 has been merged, this could be done by:

  1. Adding a new type, ByteSize
  2. Adding a specialization like impl DekuRead<ByteSize> for xxx
  3. Having bytes use ByteSize and bits use BitSize, so the compiler decides which impl is called (see the sketch below)
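
A minimal sketch of the dispatch idea, using a simplified stand-in trait rather than deku's actual DekuRead signature: because BitSize and ByteSize are distinct context types, the compiler resolves the impl from the attribute, and the byte path needs no runtime alignment check.

    pub struct BitSize(pub usize);
    pub struct ByteSize(pub usize);

    // Hypothetical, simplified read trait; deku's real trait also threads
    // endianness through the context and returns a Result plus the remaining input.
    pub trait ReadCtx<Ctx>: Sized {
        fn read(input: &[u8], ctx: Ctx) -> Self;
    }

    impl ReadCtx<BitSize> for u16 {
        fn read(_input: &[u8], _ctx: BitSize) -> Self {
            // bit-level path: assemble the value bit by bit (elided in this sketch)
            todo!()
        }
    }

    impl ReadCtx<ByteSize> for u16 {
        fn read(input: &[u8], _ctx: ByteSize) -> Self {
            // byte-level path: copy whole bytes straight from the slice
            u16::from_le_bytes([input[0], input[1]])
        }
    }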

@sharksforarms
Owner Author

Great idea. I'd like to see a better suite of benchmarks before this is tackled so we can measure the difference.

#25
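
For reference, a Criterion benchmark along the lines of the deku_read_vec numbers shown further down could look roughly like this (the struct, input size, and bench name are assumptions, not deku's actual benches):

    use criterion::{black_box, criterion_group, criterion_main, Criterion};
    use deku::prelude::*;

    #[derive(DekuRead, DekuWrite)]
    struct DekuVec {
        count: u8,
        #[deku(count = "count")]
        data: Vec<u8>,
    }

    fn deku_read_vec(c: &mut Criterion) {
        // 1 count byte followed by 100 payload bytes
        let mut input = vec![100u8];
        input.extend(std::iter::repeat(0xAAu8).take(100));
        c.bench_function("deku_read_vec", |b| {
            b.iter(|| DekuVec::try_from(black_box(input.as_slice())).unwrap())
        });
    }

    criterion_group!(benches, deku_read_vec);
    criterion_main!(benches);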

wcampbell0x2a added a commit to wcampbell0x2a/deku that referenced this issue Sep 28, 2022
Remove Size in favor of BitSize and ByteSize. This allows the
Deku{Read/Write}(Endian, BitSize) and Deku{Read/Write}(Endian, ByteSize)
impls to be created and lets the ByteSize impls be faster. Most of the
assumption I made (and the performance gain) comes from the removal of
this branch for bytes:

    let value = if pad == 0
        && bit_slice.len() == max_type_bits
        && bit_slice.as_raw_slice().len() * 8 == max_type_bits
    {

See sharksforarms#44

I added some benchmarks in order to get a good idea of the performance
this allows. The benchmarks look pretty good, but I'm on my old ThinkPad.
These results are against the benchmarks run from master.

    deku_read_vec           time:   [744.39 ns 751.13 ns 759.58 ns]
                            change: [-23.681% -21.358% -19.142%] (p = 0.00 < 0.05)
                            Performance has improved.
    Found 8 outliers among 100 measurements (8.00%)
      4 (4.00%) high mild
      4 (4.00%) high severe

See sharksforarms#25
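
The removed branch was the byte-aligned fast path inside the generic bit reader; with a dedicated ByteSize impl, that check disappears at run time. A rough sketch of the contrast, written over a plain &[u8] for simplicity rather than deku's bitvec-based internals (function names are hypothetical):

    // Old shape: one generic path that must check alignment on every read,
    // analogous to the `pad == 0 && ...` branch quoted above.
    fn read_u16_generic(input: &[u8], pad_bits: usize) -> u16 {
        if pad_bits == 0 && input.len() == 2 {
            u16::from_be_bytes([input[0], input[1]]) // aligned fast path
        } else {
            unimplemented!("slow path: assemble the value bit by bit")
        }
    }

    // New shape: the ByteSize impl is only ever called for byte-aligned data,
    // so the branch is gone entirely.
    fn read_u16_bytesize(input: &[u8]) -> u16 {
        u16::from_be_bytes([input[0], input[1]])
    }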
sharksforarms pushed a commit that referenced this issue Oct 6, 2022
@wcampbell0x2a mentioned this issue Aug 3, 2023
@wcampbell0x2a
Collaborator

Closing; the reader now does this.
