Improved hybrid rle decoding performance ~-40% #203

Merged
merged 1 commit into from
Nov 29, 2022

Conversation

ritchie46
Collaborator

Due to the law of small numbers, most runs have a length of 1, so extending with std::iter::repeat(value).take(1) adds a large overhead. This PR adds an extra State branch for the single-value case.

This reduces the time to read a single column of UUID strings by roughly 40% on my real-world dataset.

I have also removed a redundant Error from the types. There are more of these redundant errors, which I want to follow up on in a later PR.
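The single-value fast path described above can be sketched roughly as follows. This is a simplified, hypothetical model, not the code from this PR: the `State` enum here drops the `Bitpacked` variant and the lifetime, and `from_run`, `decode_runs`, and the choice of `Option<u32>` as the payload of `Single` are assumptions made for illustration.

```rust
// Simplified, hypothetical model of the decoder state.
// The real enum also has a `Bitpacked` variant, omitted here.
enum State {
    None,
    // General case: a value repeated `n` times.
    Rle(std::iter::Take<std::iter::Repeat<u32>>),
    // Fast path for a run of length one (payload type is an assumption).
    Single(Option<u32>),
}

impl State {
    // Hypothetical constructor: picks the cheapest state for a run.
    fn from_run(value: u32, run_length: usize) -> Self {
        match run_length {
            0 => State::None,
            1 => State::Single(Some(value)),
            n => State::Rle(std::iter::repeat(value).take(n)),
        }
    }
}

impl Iterator for State {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        match self {
            State::None => None,
            State::Rle(iter) => iter.next(),
            // `Option::take` yields the value once, then leaves `None` behind,
            // so no counter has to be maintained for the single-value case.
            State::Single(slot) => slot.take(),
        }
    }
}

// Hypothetical helper: expands (value, run_length) pairs into a flat vector.
fn decode_runs(runs: &[(u32, usize)]) -> Vec<u32> {
    let mut out = Vec::new();
    for &(value, run_length) in runs {
        out.extend(State::from_run(value, run_length));
    }
    out
}
```

For example, `decode_runs(&[(7, 1), (3, 2)])` would go through `Single` for the first run and through `Rle` for the second. Whether this yields a speedup comparable to the one reported here depends on the data and on how the compiler optimizes each branch.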

@@ -25,6 +25,9 @@ enum State<'a> {
None,
Bitpacked(bitpacked::Decoder<'a, u32>),
Rle(std::iter::Take<std::iter::Repeat<u32>>),
// Add a special branch for a single value to
Collaborator Author

@ritchie46 ritchie46 Nov 26, 2022

The new Single state represents a single value. This branch is hot, hot, hot!

@codecov-commenter

Codecov Report

Base: 85.75% // Head: 85.68% // Decreases project coverage by -0.06% ⚠️

Coverage data is based on head (b431d44) compared to base (0301708).
Patch coverage: 87.50% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #203      +/-   ##
==========================================
- Coverage   85.75%   85.68%   -0.07%     
==========================================
  Files          84       84              
  Lines        8233     8256      +23     
==========================================
+ Hits         7060     7074      +14     
- Misses       1173     1182       +9     
Impacted Files                             Coverage Δ
src/encoding/hybrid_rle/mod.rs             96.57% <50.00%> (-1.11%) ⬇️
src/encoding/delta_bitpacked/decoder.rs    96.05% <100.00%> (ø)
src/encoding/hybrid_rle/decoder.rs         94.73% <100.00%> (+0.98%) ⬆️
src/encoding/uleb128.rs                    98.70% <100.00%> (ø)
src/encoding/zigzag_leb128.rs              100.00% <100.00%> (ø)
src/read/compression.rs                    92.40% <0.00%> (-3.57%) ⬇️
src/read/page/reader.rs                    85.71% <0.00%> (-1.86%) ⬇️
src/error.rs                               20.51% <0.00%> (-0.54%) ⬇️
src/compression.rs                         92.08% <0.00%> (-0.36%) ⬇️
src/write/indexes/serialize.rs             85.00% <0.00%> (-0.25%) ⬇️
... and 4 more


@jorgecarleitao
Owner

This is an amazing idea!

@jorgecarleitao jorgecarleitao changed the title from "improve hybrid rle decoding performance ~-40%" to "Improved hybrid rle decoding performance ~-40%" Nov 29, 2022
@jorgecarleitao jorgecarleitao merged commit 92c0af6 into jorgecarleitao:main Nov 29, 2022
@ritchie46 ritchie46 deleted the small_numbers branch November 29, 2022 08:16