-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Default hashmap implementation can cause processes to block #32953
Comments
Should the crates use a non-default hasher or should the default implementation not block when the random pool isn't initialized? |
/cc @BurntSushi |
cc @rust-lang/libs |
Interesting! One possible thing we could do is make the default hashing algorithm configurable, but that also seems pretty unfortunate in terms of how it may be implemented. Another possibility would be to build on a strategy like #31356 except also provide a function to manually seed a thread. That way in cases like this a thread could be manually seeded and all future hash maps on that thread would use the same seed. Exposing seeding though would be exposing the internals of using siphash, so that may not be that great either... |
So I believe the the reason that Rust uses SipHash is to avoid the denial of service attack against many hash table implementations since SipHash was designed to be resistant against it given a good seed. However that's really for unbounded data. My understanding is that if the data set is bounded a "weaker" hasher is not dangerous to a DoS. Obviously a second opinion is very welcome here. In that case it would then make sense to allow different crates to chose their hashing algorithm if they know their data set is bounded. Obviously that adds some complexity by exposing options that people could get wrong in a bad way but the default should be SipHash, not sure if that's enough to be safe. |
like @derekstraka points out, getrandom can tell you when it would block, so libstd could detect that and fall back to a different seed but it will inevitably be a weaker random key for the hash function then. |
Right. That would have to get exposed via the |
What should we do in the mean time? Does it make sense to start moving away from the default sip hasher in crates like regex? (i.e., preferring either fnv or BTreeMap?) |
@cardoe the relevant code is actually in this source tree, see libstd/sys/unix/rand.rs |
So, I would suggest migrating to something like FNV for crates that really don't need the DoS resiliency that SipHash provides (such as the The biggest issue is how to communicate when you should use one hasher versus another inside of a crate. On the one hand, it is great that HashMap is proactive and providing a hasher that protects against hash-flooding denial of service attacks by default, but it is unfortunate that, by default, all users of HashMap are paying a performance penalty when many (most?) of them do not need that protection. |
So I'll say that I checked systemd which is obviously used at early boot and their hash implementation uses SipHash and they also use |
@Cardo you left out what they do if the getrandom call errors, that's the solution part 😉 systemd's code is here So this was news to me, that on a system where getrandom may block since there is not yet enough "entropy", a read to urandom will still succeed and not block. I don't think we can produce a better fallback / weak randomness than simply using the same methodology. (Lwn on getrandom / urandom) I think that HashMap::new() using a randomly keyed hash function, HashMaps being ubiquitous, and Rust wanting to be as portable as possible, all this supports randomly keyed hash functions to be a best-effort approach. We can't require all crates that use HashMaps to change to use FNV. Besides, more uses of fnv is just a net reduction in denial of service protection. (We wouldn't do it in a best-effort approach if there was an explicit API named “create a HashMap with a strongly random hash function”. Our API is called |
Right my point is they seed their hasher in a best effort case by getting the best possible data they can get in a non-blocking fashion by using a fallback. I didn't expand on how they did it. I don't think @jcreekmore advocated changing all crates to FNV. He's advocating giving crates the option to change to FNV if they know their dataset is bounded and therefore they know they aren't vulnerable and can use a faster hasher function since FNV is faster than SipHash even in the random data is available immediately case. Maybe leave |
Using fnv with HashMap is a stable feature since Rust 1.7 (link) |
Yeah, I feel like I could be convinced to go tweak this to get past the bad behavior for now, but I feel like "don't use the default |
Oh wow, thanks for the tip @cardoe, and the code @bluss. Sounds like we should just implement that strategy in the standard library for now? I agree that we shouldn't stop recommending @jcreekmore it's definitely true that some crates don't need this DoS protection, but one of the arguments for doing it by default is that a surprising number of crates might. For example you may write a crate that's not considering consuming user input, but them it may be pulled into someone else's application where they do so. @BurntSushi I agree for now if you want to immediately work around this I'd just recommend using the FNV hashing for hash maps (or even btrees). Hopefully we can solve this in the standard library, however, so it's only a temporary solution. |
If we attempt a read with getrandom() on Linux the syscall can block before the random pool is initialized unless the GRND_NONBLOCK flag is passed. This flag causes getrandom() to instead return EAGAIN while the pool is uninitialized. To avoid downstream users of crate or std functionality that have no ability to avoid this blocking behavior this change causes Rust to read bytes from /dev/urandom while getrandom() would block and once getrandom() is available to use that. Fixes rust-lang#32953. Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
rand: don't block before random pool is initialized If we attempt a read with getrandom() on Linux the syscall can block before the random pool is initialized unless the GRND_NONBLOCK flag is passed. This flag causes getrandom() to instead return EAGAIN while the pool is uninitialized. To avoid downstream users of crate or std functionality that have no ability to avoid this blocking behavior this change causes Rust to read bytes from /dev/urandom while getrandom() would block and once getrandom() is available to use that. Fixes #32953. Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
When the /dev/urandom pool has not yet been initialized, the use of the default HashMap can cause a process to block. The behavior is problematic when a program is started by an initscript and is not intended to run in the background. The process will block until the urandom pool is initialized which on some embedded systems may be several minutes.
A concrete example of the issue in docopt/docopt.rs#178, but the issue can also occur in other places like regex.
The text was updated successfully, but these errors were encountered: