-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Robin hood hashmap #12081
Robin hood hashmap #12081
Conversation
use num; | ||
use option::{None, Option, Some}; | ||
use rand::Rng; | ||
use option::*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're trying to avoid glob imports.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I import so, so many things from iter that it makes more sense to glob.
All other globs have been removed.
I'm not even sure how much controversy there is at this point, just thought I'd throw out my 2 cents ;) |
mod table { | ||
use std::clone::Clone; | ||
use std::cmp::Eq; | ||
use std::hash::{Hash, Hasher, sip}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the (legitimately) unused import, rather than the outer one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dear god that's embarassing.
Thanks, though..
Re assertions: #12616 gives a precedent for writing // FIXME #12049
if cfg!(test) { assert!(...) } |
Added the "debug asserts" referencing #12049. |
@cgaebel it's probably a staging thing. You might need a |
Naw huon got it. It was just me not looking at the line numbers and being confused. This should be good to merge. |
/me quietly sits on his hands |
BTW, are the numbers in the PR description still current with the various changes that have been made? |
No, but should be same ballpark. Nothing major that should affect performance has changed. |
On a recent flight, I finally found the time to read the PR in I will look over my notes later today and add them into the PR. I Regarding the mutable iterator in specific -- I think you could make |
(So I investigated my "safety" ideas a bit more. I think it can be done, but it's a non-trivial patch, and it's unclear if the resulting code is clearer. It would be somewhat nicer if we could parameterize over mutability, though using traits it might be possible to write the relevant code generically. Regardless, as the patch'd be non-trivial, it might affect perf, and it may not be more readable really, so I think it ought to be considered as a follow up.) |
//! # Example | ||
//! | ||
//! ```rust | ||
//! use collections::HashMap; | ||
//! use std::hashmap::HashMap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a merge ended up with the old version here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Good catch!
The unsafety is unfortunate but we can't ignore this perf improvement. Let's land this but please do keep trying to abstract out the unsafe code here. @cgaebel Thanks. |
The public interface is safe. What pieces of unsafety should I focus on? The "double-take bug"? |
Feel free to comment on the PR whenever you force push, sadly none of us get notifications! |
Oops. /pushed |
Note: I have an in-progress branch addressing the double take bug. I'll push it somewhere public. |
Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
Force pushed. |
Partially addresses #11783. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: ``` comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(Option)+sizeof(K)+sizeof(V))/0.75 + 48ish bytes Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 112ish bytes Timing of building librustc: compile_and_link: x86_64-unknown-linux-gnu/stage0/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc time: 0.457 s parsing time: 0.028 s gated feature checking time: 0.000 s crate injection time: 0.108 s configuration 1 time: 1.049 s expansion time: 0.219 s configuration 2 time: 0.222 s maybe building test harness time: 0.223 s prelude injection time: 0.268 s assinging node ids and indexing ast time: 0.075 s external crate/lib resolution time: 0.026 s language item collection time: 1.016 s resolution time: 0.038 s lifetime resolution time: 0.000 s looking for entry point time: 0.030 s looking for macro registrar time: 0.061 s freevar finding time: 0.138 s region resolution time: 0.110 s type collecting time: 0.072 s variance inference time: 0.126 s coherence checking time: 9.110 s type checking time: 0.186 s const marking time: 0.049 s const checking time: 0.418 s privacy checking time: 0.057 s effect checking time: 0.033 s loop checking time: 1.293 s compute moves time: 0.182 s match checking time: 0.242 s liveness checking time: 0.866 s borrow checking time: 0.150 s kind checking time: 0.013 s reachability checking time: 0.175 s death checking time: 0.461 s lint checking time: 13.112 s translation time: 4.352 s llvm function passes time: 96.702 s llvm module passes time: 50.574 s codegen passes time: 154.611 s LLVM passes time: 2.821 s running linker time: 15.750 s linking compile_and_link: x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc time: 0.422 s parsing time: 0.031 s gated feature checking time: 0.000 s crate injection time: 0.126 s configuration 1 time: 1.014 s expansion time: 0.251 s configuration 2 time: 0.249 s maybe building test harness time: 0.273 s prelude injection time: 0.279 s assinging node ids and indexing ast time: 0.076 s external crate/lib resolution time: 0.033 s language item collection time: 1.028 s resolution time: 0.036 s lifetime resolution time: 0.000 s looking for entry point time: 0.029 s looking for macro registrar time: 0.063 s freevar finding time: 0.133 s region resolution time: 0.111 s type collecting time: 0.077 s variance inference time: 0.565 s coherence checking time: 8.953 s type checking time: 0.176 s const marking time: 0.050 s const checking time: 0.401 s privacy checking time: 0.063 s effect checking time: 0.032 s loop checking time: 1.291 s compute moves time: 0.172 s match checking time: 0.249 s liveness checking time: 0.831 s borrow checking time: 0.121 s kind checking time: 0.013 s reachability checking time: 0.179 s death checking time: 0.503 s lint checking time: 14.385 s translation time: 4.495 s llvm function passes time: 92.234 s llvm module passes time: 51.172 s codegen passes time: 150.809 s LLVM passes time: 2.542 s running linker time: 15.109 s linking ``` BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance. Please review this very closely. This is my first major contribution to Rust. Sorry for the ugly diff!
BTW, this reduced rustc's memory use (for building |
NICE! Btw is that a CPU usage regression, or just noise? |
noise. I was probably building another copy of rustc at the time, or On Mon, Mar 17, 2014 at 11:03 AM, Clark Gaebel notifications@github.comwrote:
|
fix: incorrect suggestions generated by `manual_retain` lint fixes rust-lang#10393, fixes rust-lang#11457, fixes rust-lang#12081 rust-lang#10393: In the current implementation of `manual_retain`, if the argument to the closure is matched using tuple, they are all treated as the result of a call to `map.into_iter().filter(<f>)`. However, such tuple pattern matching can also occur in many different containers that stores tuples internally. The correct approach is to apply different lint policies depending on whether the receiver of `into_iter` is a map or not. rust-lang#11457 and rust-lang#12081: In the current implementation of `manual_retain`, if the argument to the closure is `Binding`, the closure will be used directly in the `retain` method, which will result in incorrect suggestion because the first argument to the `retain` closure may be of a different type. In addition, if the argument to the closure is `Ref + Binding`, the lint will simply remove the `Ref` part and use the `Binding` part as the argument to the new closure, which will lead to bad suggestion for the same reason. The correct approach is to detect each of these cases and apply lint suggestions conservatively. changelog: [`manual_retain`] refactor and add check for various patterns
Partially addresses #11783.
Previously, rust's hashtable was totally unoptimized. It used an Option
per key-value pair, and used very naive open allocation.
The old hashtable had very high variance in lookup time. For an example,
see the 'find_nonexisting' benchmark below. This is fixed by keys in
'lucky' spots with a low probe sequence length getting their good spots
stolen by keys with long probe sequence lengths. This reduces hashtable
probe length variance, while maintaining the same mean.
Also, other optimization liberties were taken. Everything is as cache
aware as possible, and this hashtable should perform extremely well for
both large and small keys and values.
Benchmarks:
BUT accesses are much more cache friendly. In fact, if the probe
sequence length is below 8, only two cache lines worth of hashes will be
pulled into cache. This is unlike the old version which would have to
stride over the stoerd keys and values, and would be more cache
unfriendly the bigger the stored values got.
And did you notice the higher load factor? We can now reasonably get a
load factor of 0.9 with very good performance.
Please review this very closely. This is my first major contribution to Rust. Sorry for the ugly diff!