-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add an unstable 'extra randomization' compile-time flag for hashers #65042
Comments
Isn't the correct way to deal with this to ... not use hashmaps? Like, FxHashMap is known to have an unstable iteration order, so we should either a) replace it with something that has a stable iteration order (maybe IndexMap, or I saw on Discord that this might have something to do with 32-bit vs. 64-bit systems hashing differently...). I think this feature as-is isn't tenable, it's just too problematic in practice. You'll probably never finish a compiler build if it does have high overhead, and I suspect it will. |
Instead of randomized iteration order you could start at a randomized offset and wrap around, that should be more cache-friendly than fully randomized memory accesses. |
Isn't the actual issue that Per-iteration randomization may lead to unexpected results such as The std::collections documentation is a bit vague on iteration order
This could be interpreted as iteration order being dependent on internal structure, and the structure of a shared reference is not expected to change. |
triage: leaving nominated for discussion, to see if there is any buy-in from any compiler team members. W.r.t. priority... I'll leave this unprioritized for now. (My instinct is that it is P-low, but if that's the case, that's ammunition for just closing it.) |
Some thoughts:
Then we can basically make a quest issue to resolve all the lint warnings. |
(It seems like a less intrusive and more reliable way of solving the problem.) |
During the meeting I said that a design meeting felt like overkill, but I'm mildly changing my mind here. =) It seems like it could be fruitful to put some thought into the best way to solve this category of bug in general. |
@Mark-Simulacrum The idea here is to detect incorrect dependencies on HashMap iteration order. Unless you're suggesting removing all HashMaps from the compiler, I'm not really sure what you mean,
@the8472: Yes, I believe that the issue would have been expoesd earlier if
@the8472: That's fine - the point of this is to catch issues, not to be actually be used by a normal |
Well, to be more concrete, we have ways of dealing with "iteration order is unstable". For example, Niko mentions that we could fairly easily lint against iterating over hashmaps, and provide a |
Previously, we were using an `FxHashMap` to collect module re-exports. However, re-exports end up getting serialized into crate metadata, which means that metadata generation was non-deterministic. This resulted in spurious error messages changes (e.g. PR rust-lang#64906) due to pretty-printing implicitly depending on the order of re-exports when computing the proper path to show to the user. See rust-lang#65042 for a long-term strategy to detect this kind of issue
…nkov Make re-export collection deterministic Fixes #65036 Previously, we were using an `FxHashMap` to collect module re-exports. However, re-exports end up getting serialized into crate metadata, which means that metadata generation was non-deterministic. This resulted in spurious error messages changes (e.g. PR #64906) due to pretty-printing implicitly depending on the order of re-exports when computing the proper path to show to the user. See #65042 for a long-term strategy to detect this kind of issue
It doesn't just appear to be the same, iteration order in The problem rather seems to be that iteration order differs between platforms? That could indeed cause trouble, but is no issue for determinism and also fine for reproducible builds (after all, compilation output will obviously differ between platforms). But indeed if the test suite expects things to be ordered the same way across all platforms, that would be one of the rare cases where such platform-dependent deterministic order could cause problems. |
triage: P-low; removing nomination label. I would want this task to wait until we have a broader discussion about determinism in the compiler (hopefully at a steering meeting).
|
Triage: not aware of any work done here. |
I just tried the following and I got a lot of errors and an ICE: diff --git a/compiler/rustc_data_structures/src/fx.rs b/compiler/rustc_data_structures/src/fx.rs
index 80e72250470..5661159814b 100644
--- a/compiler/rustc_data_structures/src/fx.rs
+++ b/compiler/rustc_data_structures/src/fx.rs
@@ -1,11 +1,12 @@
-use std::hash::BuildHasherDefault;
-
-pub use rustc_hash::{FxHashMap, FxHashSet, FxHasher};
+use std::collections::{HashMap, HashSet};
+use std::hash::RandomState;
pub type StdEntry<'a, K, V> = std::collections::hash_map::Entry<'a, K, V>;
-pub type FxIndexMap<K, V> = indexmap::IndexMap<K, V, BuildHasherDefault<FxHasher>>;
-pub type FxIndexSet<V> = indexmap::IndexSet<V, BuildHasherDefault<FxHasher>>;
+pub type FxHashMap<K, V> = HashMap<K, V>;
+pub type FxHashSet<K> = HashSet<K>;
+pub type FxIndexMap<K, V> = indexmap::IndexMap<K, V, RandomState>;
+pub type FxIndexSet<V> = indexmap::IndexSet<V, RandomState>;
pub type IndexEntry<'a, K, V> = indexmap::map::Entry<'a, K, V>;
pub type IndexOccupiedEntry<'a, K, V> = indexmap::map::OccupiedEntry<'a, K, V>;
diff --git a/compiler/rustc_data_structures/src/sharded.rs b/compiler/rustc_data_structures/src/sharded.rs
index 4b02b183460..c0f0cf36c13 100644
--- a/compiler/rustc_data_structures/src/sharded.rs
+++ b/compiler/rustc_data_structures/src/sharded.rs
@@ -1,4 +1,4 @@
-use crate::fx::{FxHashMap, FxHasher};
+use crate::fx::FxHashMap;
#[cfg(parallel_compiler)]
use crate::sync::{is_dyn_thread_safe, CacheAligned};
use crate::sync::{Lock, LockGuard, Mode};
@@ -224,7 +224,8 @@ pub fn contains_pointer_to<T: Hash + IntoPointer>(&self, value: &T) -> bool {
#[inline]
pub fn make_hash<K: Hash + ?Sized>(val: &K) -> u64 {
- let mut state = FxHasher::default();
+ // FIXME somehow randomize this
+ let mut state = rustc_hash::FxHasher::default();
val.hash(&mut state);
state.finish()
}
diff --git a/compiler/rustc_data_structures/src/unord.rs b/compiler/rustc_data_structures/src/unord.rs
index a99e2062039..ab16fcd4743 100644
--- a/compiler/rustc_data_structures/src/unord.rs
+++ b/compiler/rustc_data_structures/src/unord.rs
@@ -2,7 +2,6 @@
//! ordering. This is a useful property for deterministic computations, such
//! as required by the query system.
-use rustc_hash::{FxHashMap, FxHashSet};
use std::{
borrow::{Borrow, BorrowMut},
collections::hash_map::Entry,
@@ -13,6 +12,7 @@
use crate::{
fingerprint::Fingerprint,
+ fx::{FxHashMap, FxHashSet},
stable_hasher::{HashStable, StableCompare, StableHasher, ToStableHashKey},
};
diff --git a/compiler/rustc_parse/src/parser/mod.rs b/compiler/rustc_parse/src/parser/mod.rs
index dea2b9e6ca7..922517ab9c3 100644
--- a/compiler/rustc_parse/src/parser/mod.rs
+++ b/compiler/rustc_parse/src/parser/mod.rs
@@ -179,7 +179,7 @@ pub struct Parser<'a> {
// This type is used a lot, e.g. it's cloned when matching many declarative macro rules with nonterminals. Make sure
// it doesn't unintentionally get bigger.
#[cfg(all(target_arch = "x86_64", target_pointer_width = "64"))]
-rustc_data_structures::static_assert_size!(Parser<'_>, 264);
+rustc_data_structures::static_assert_size!(Parser<'_>, 280);
/// Stores span information about a closure.
#[derive(Clone)]
diff --git a/compiler/rustc_pattern_analysis/src/usefulness.rs b/compiler/rustc_pattern_analysis/src/usefulness.rs
index a24da448766..d612b4c32a4 100644
--- a/compiler/rustc_pattern_analysis/src/usefulness.rs
+++ b/compiler/rustc_pattern_analysis/src/usefulness.rs
@@ -709,7 +709,7 @@
//! I (Nadrieril) prefer to put new tests in `ui/pattern/usefulness` unless there's a specific
//! reason not to, for example if they crucially depend on a particular feature like `or_patterns`.
-use rustc_hash::FxHashSet;
+use rustc_data_structures::fx::FxHashSet;
use rustc_index::bit_set::BitSet;
use smallvec::{smallvec, SmallVec};
use std::fmt; The errors
|
When working on #64906, I ran into a latent dependency on the iteration order of a
HashMap
. This resulted in the PR becoming unmergeable - since aFxHashMap
was being used, the iteration order ended up differing between platforms, which made it impossible to make the tests pass on all platforms.It would be nice to have a way of exposing these kinds of hidden ordering dependencies before they result in blocked PRs.
I propose the following:
randomize_iteration
tostd
andfxhash
. When enabled, this feature will cause the iteration order ofiter
anditer_mut
to be randomized per call.config.toml
key to enable both of these features.This will allow producing a special build of the compiler that will (hopefully) expose many hidden ordering dependencies. The overhead could be very high, but that shouldn't matter too much - the only purpose of this build would be to run the testsuite.
Once std-aware Cargo is fully implement, we could consider making this usable by user-written crates (e.g. allowing people to opt-in to recompiling std in this mode).
The text was updated successfully, but these errors were encountered: