This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Use ahash by default #1148

Merged: 1 commit merged on Jul 8, 2022

Conversation

ritchie46 (Collaborator)

ahash is a DoS-resistant hashing algorithm that is much faster than std's default hasher. Given that there are users with millions of columns, not paying for unneeded cryptographic hashing seems worth it to me.

Aside from the crate being pulled in, this should not have much effect on build times, as the hashmaps/sets are generic and only get compiled (monomorphized) where they are instantiated.
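A minimal std-only sketch of why the switch is mostly a type-level change: the hasher is the third type parameter `S: BuildHasher` of `HashMap`, so swapping it means changing one alias, and generic code is compiled separately for each hasher it is instantiated with. Because ahash is not part of std, the sketch assumes a hypothetical FNV-1a hasher (`Fnv1aHasher`) as a stand-in for `ahash::RandomState`; unlike aHash, FNV-1a is not DoS resistant, so it only illustrates the plumbing. `FastHashMap` and `column_index` are hypothetical names.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasher, BuildHasherDefault, Hasher};

/// Hypothetical stand-in hasher: 64-bit FNV-1a.
/// NOT DoS resistant (unlike aHash); used only to show where a hasher plugs in.
pub struct Fnv1aHasher(u64);

impl Default for Fnv1aHasher {
    fn default() -> Self {
        Fnv1aHasher(0xcbf2_9ce4_8422_2325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for Fnv1aHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Hypothetical alias: changing the hasher crate-wide is a one-line edit here.
/// With the ahash crate this would be `HashMap<K, V, ahash::RandomState>`.
pub type FastHashMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1aHasher>>;

/// Generic over the hasher `S`: the compiler emits one copy of this function
/// (and of the HashMap code it calls) per concrete `S` it is used with.
pub fn column_index<S: BuildHasher + Default>(names: &[&str]) -> HashMap<String, usize, S> {
    let mut map = HashMap::with_capacity_and_hasher(names.len(), S::default());
    for (i, name) in names.iter().enumerate() {
        map.insert((*name).to_string(), i);
    }
    map
}

fn main() {
    let names = ["a", "b", "c"];
    // Same source, two instantiations: std's default RandomState and the stand-in.
    let std_map: HashMap<String, usize> = column_index(&names);
    let fast_map: FastHashMap<String, usize> = column_index(&names);
    assert_eq!(std_map.get("b"), Some(&1));
    assert_eq!(fast_map.get("c"), Some(&2));
}
```

Keeping the hasher behind a single alias keeps the choice in one place; the ahash crate also ships ready-made `AHashMap`/`AHashSet` wrappers for the same purpose.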

ritchie46 force-pushed the default_ahash branch 2 times, most recently from 934259b to 4b4927b on July 8, 2022 13:03
codecov bot commented Jul 8, 2022

Codecov Report

Merging #1148 (211e62a) into main (e4947cc) will decrease coverage by 0.00%.
The diff coverage is 90.47%.

@@            Coverage Diff             @@
##             main    #1148      +/-   ##
==========================================
- Coverage   83.62%   83.62%   -0.01%     
==========================================
  Files         366      366              
  Lines       35906    35911       +5     
==========================================
+ Hits        30028    30031       +3     
- Misses       5878     5880       +2     
Impacted Files                    Coverage Δ
src/array/union/mod.rs            77.41% <ø> (ø)
src/io/avro/mod.rs                86.66% <ø> (ø)
src/io/ipc/read/reader.rs         96.65% <0.00%> (+0.66%) ⬆️
src/lib.rs                        100.00% <ø> (ø)
src/compute/like.rs               41.29% <50.00%> (ø)
src/io/ipc/read/common.rs         94.92% <90.00%> (-0.30%) ⬇️
src/compute/merge_sort/mod.rs     95.60% <100.00%> (ø)
src/compute/regex_match.rs        82.35% <100.00%> (ø)
src/io/avro/read/header.rs        100.00% <100.00%> (ø)
src/io/avro/read/util.rs          96.29% <100.00%> (ø)
... and 11 more


Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1f2116e...211e62a.

ritchie46 changed the title from "Proposal: default to ahash" to "[Proposal] default to ahash" on Jul 8, 2022
ritchie46 added the enhancement label (An improvement to an existing feature) on Jul 8, 2022
jorgecarleitao (Owner)

AFAIK std is already using ahash, so there isn't much to gain performance-wise. I think the benefit of ahash is a public API related to entry?

ritchie46 (Collaborator, Author) commented Jul 8, 2022

Replying to: "AFAIK std is already using ahash, so there isn't much to gain performance-wise. I think the benefit of ahash is a public API related to entry?"

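The default hasher is SipHash 1-3; see the std HashMap documentation. I know that hashbrown uses ahash, but std does not: std's HashMap is built on hashbrown's table but keeps SipHash as its default hasher.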

Quoting the std HashMap documentation:
"The default hashing algorithm is currently SipHash 1-3, though this is subject to change at any point in the future."
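A small std-only sketch of what the quoted default means in practice: `HashMap::new()` uses a fresh `RandomState`, whose hasher type is `DefaultHasher` (currently SipHash 1-3 per the docs above). Each `RandomState` carries its own random keys, so hashes are stable within one map but differ across builders; that keying is where the DoS resistance comes from. The helper `hash_with` is a hypothetical name used only for illustration.

```rust
use std::collections::hash_map::{DefaultHasher, RandomState};
use std::collections::HashMap;
use std::hash::{BuildHasher, Hash, Hasher};

/// Hypothetical helper: hash `value` with a fresh hasher from `builder`.
fn hash_with<S: BuildHasher, T: Hash + ?Sized>(builder: &S, value: &T) -> u64 {
    let mut hasher = builder.build_hasher();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    // `HashMap::new()` fixes `S = RandomState`; its hasher type is `DefaultHasher`.
    let map: HashMap<&str, u32, RandomState> = HashMap::new();
    let _hasher: DefaultHasher = map.hasher().build_hasher();

    let a = RandomState::new();
    let b = RandomState::new();

    // Within one builder, hashing is deterministic...
    assert_eq!(hash_with(&a, "column_0"), hash_with(&a, "column_0"));
    // ...but each builder has its own random keys, so colliding inputs
    // cannot be precomputed without knowing those keys.
    assert_ne!(hash_with(&a, "column_0"), hash_with(&b, "column_0"));
}
```

The PR's argument is that aHash keeps this kind of resistance while avoiding SipHash's per-hash cost; the sketch only shows the std side.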

Here are some benchmarks of SipHash vs aHash, showing that aHash is ~10x faster across all dtypes: https://github.com/tkaitchuck/aHash/blob/master/FAQ.md

ritchie46 (Collaborator, Author)

Rebased. 👍

jorgecarleitao changed the title from "[Proposal] default to ahash" to "Use ahash by default" on Jul 8, 2022
jorgecarleitao merged commit 48dd4ef into jorgecarleitao:main on Jul 8, 2022
Labels
enhancement An improvement to an existing feature

2 participants