Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a fast HashMap / hash function #631

Closed
steveklabnik opened this issue Jan 21, 2015 · 17 comments
Closed

Implement a fast HashMap / hash function #631

steveklabnik opened this issue Jan 21, 2015 · 17 comments
Labels
T-libs-api Relevant to the library API team, which will review and decide on the RFC.

Comments

@steveklabnik
Copy link
Member

Issue by brson
Saturday Jan 25, 2014 at 00:54 GMT

For earlier discussion, see rust-lang/rust#11783

This issue was labelled with: A-an-interesting-project, A-libs, I-compiletime, I-slow, I-wishlist in the Rust repository


Our current HashMap uses cryptographically secure hashing but it is slow. Sometimes this is a big problem. Implement another hash map that is very fast.

@ilyaraz
Copy link

ilyaraz commented Aug 23, 2015

The current version of HashMap is still around 4 times slower than C++ unordered map on int keys. It is quite unfortunate.

@divyekapoor
Copy link

Hash maps with integer keys are quite common and they usually operate on trusted data. A DoS resistant hash map should be the exception, not the rule (fast hashmaps should be the default). Is there any intent to change this behavior?

@Luthaf
Copy link

Luthaf commented Mar 21, 2016

Is that still true using BTreeMap and the FNV hasher? (see the first code example). I did not do the checks by myself, but I believe the situation has improved since the creation of this issue.

@steveklabnik
Copy link
Member Author

Yes, @Luthaf is correct.

A DoS resistant hash map should be the exception, not the rule (fast hashmaps should be the default).

This is not the current position of the libs team, in my understanding, but regardless, we cannot change this now; the ship has sailed. /cc @rust-lang/libs

@ticki
Copy link
Contributor

ticki commented Mar 22, 2016

but regardless, we cannot change this now; the ship has sailed.

It actually can. Changing it would be a non-breaking change.

@seanmonstar
Copy link
Contributor

@ticki

It actually can. Changing it would be a non-breaking change.

It'd be worse: it would change the behavior and its guarantee without a user knowing. Suddenly anyone depending on the strength of HashMap is susceptible to DOS attacks.

@ticki
Copy link
Contributor

ticki commented Mar 22, 2016

It'd be worse: it would change the behavior and its guarantee without a user knowing. Suddenly anyone depending on the strength of HashMap is susceptible to DOS attacks.

The HashMap guarantee no such thing.

@ilyaraz
Copy link

ilyaraz commented Mar 22, 2016

In 99.9% of the use cases hash tables prone to DOS attacks are perfectly adequate. In the remaining 0.1% percent, a person in charge of the corresponding part of code should be expert enough to be able to choose a right implementation of a hash table, or implement it themselves.

That's why the default option that uses collision-resistant hash functions is a bit stupid.

@apasel422
Copy link
Contributor

In 99.9% of the use cases hash tables prone to DOS attacks are perfectly adequate. In the remaining 0.1% percent, a person in charge of the corresponding part of code should be expert enough to be able to choose a right implementation of a hash table, or implement it themselves.

That's why the default option that uses collision-resistant hash functions is a bit stupid.

The same argument could be made in favor of having HashMap be secure by default. If someone really needs to squeeze some more performance out of a map, after profiling to determine that it is indeed a bottleneck, they can opt for a map that is prone to DOS.

@ticki
Copy link
Contributor

ticki commented Mar 22, 2016

The same argument could be made in favor of having HashMap be secure by default. If someone really needs to squeeze some more performance out of a map, after profiling to determine that it is indeed a bottleneck, they can opt for a map that is prone to DOS.

But the point is that the majority of cases DOS is not a problem.

@apasel422
Copy link
Contributor

But the point is that the majority of cases DOS is not a problem.

That may be true, but is map performance a problem in the majority of cases?

@ticki
Copy link
Contributor

ticki commented Mar 22, 2016

That may be true, but is map performance a problem in the majority of cases?

A lot actually. Let me give you an example:

Recently, I did a changed our ZFS implementation of Redox from using SipHasher to DJB2, this gave a massive speed-up (x2-4!)

@apasel422
Copy link
Contributor

A lot actually. Let me give you an example:

Recently, I did a changed our ZFS implementation of Redox from using SipHasher to DJB2, this gave a massive speed-up (x2-4!)

Right. You profiled your application and determined that you could gain increased performance by switching. There are certainly cases in which DOS is not a concern, but it should be a conscious choice to trade security for speed.

@ilyaraz
Copy link

ilyaraz commented Mar 22, 2016

That may be true, but is map performance a problem in the majority of cases?

If it's not, one should use binary search trees or lists or whatnot. Every data structure has certain benefits and drawbacks, and hash tables have a benefit of being extremely fast. One should not sacrifice it for 'security'.

@divyekapoor
Copy link

To trust or not to trust, that is the question. Hash tables hit
pathological behavior only under crafted inputs (which means, in essence,
most inputs are perfectly fine). Choosing a "secure" hash table trades off
the default case for the esoteric. Imagine, if the binary is "unzip", we're
trading off the common case performance for resilience in the uncommon
case. The costs add up in large applications.

Divye
On Tue, Mar 22, 2016 at 12:40 PM Andrew Paseltiner notifications@github.com
wrote:

A lot actually. Let me give you an example:

Recently, I did a changed our ZFS implementation of Redox from using
SipHasher to DJB2, this gave a massive speed-up (x2-4!)

Right. You profiled your application and determined that you could gain
increased performance by switching. There are certainly cases in which DOS
is not a concern, but it should be a conscious choice to trade security for
speed.


You are receiving this because you commented.
Reply to this email directly or view it on GitHub
#631 (comment)

@apasel422
Copy link
Contributor

If it's not, one should use binary search trees or lists or whatnot. Every data structure has certain benefits and drawbacks, and hash tables has a benefit of being extremely fast. One should not sacrifice it for 'security'.

It doesn't matter if you pick an asymptotically slower data structure like a linked list over a hash map if it has no effect on your application's overall performance. If a hash map with DOS-resistant hashing is the application's bottleneck, and you determine that the tradeoff in security is acceptable, you can switch to a faster hash function. I'm not arguing against the existence of a non-DOS-resistant hash function, just that the default behavior is a default and can be changed. There are tradeoffs in both cases, whether it's secure-by-default or fast-by-default.

The costs add up in large applications.

I agree, and in this scenario the application can opt to use something non-default.

In any case, I agree with @steveklabnik that it can't be changed now. Despite it possibly not being a breaking change at the API level, I think the docs are pretty clear:

The hashes are all keyed by the thread-local random number generator on creation by default. This means that the ordering of the keys is randomized, but makes the tables more resistant to denial-of-service attacks (Hash DoS). This behavior can be overridden with one of the constructors.

One can get the improved performance by importing the type as

type HashMap<K, V> = std::collections::HashMap<K, V, FnvHasher>;

@Centril Centril added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Feb 23, 2018
@Centril
Copy link
Contributor

Centril commented Oct 7, 2018

I think this is not actionable and there has been no movement during the last 2 years so I'm closing this.

@Centril Centril closed this as completed Oct 7, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

No branches or pull requests

9 participants