-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BTree: remove Ord bound from new #88040
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No elements exist so there are nothing to compare anyway
I don't think "future proof" would be a blocker here
I couldn't find out any reason not to do it
This is not as clear cut as the PR description implies.
I am hesitant about this because this API change makes it impossible to selectively tackle some of the code bloat caused by monomorphizations inside the internals of these collections. According to https://www.stroustrup.com/SCARY.pdf such optimizations deliver 1.2x to 2.1x faster performance and 1x to 25x smaller code size in real applications.
I pasted below a sketch of the kind of restructure that this change hinders.
Clearly we still have bounds available via the insert
and contains
methods etc, so a comparator could be passed down from there as a function argument to every places that needs it, but the extra argument passing could negate a chunk of the performance benefit relative to stashing a comparator at suitable places inside of the data structure up front.
@rust-lang/libs-api @rust-lang/libs
use self::private::{Element, TypeErasedSet};
use std::cmp::Ordering;
use std::marker::PhantomData;
use std::mem;
pub struct Set<T> {
imp: TypeErasedSet,
marker: PhantomData<T>,
}
impl<T> Set<T> {
pub fn new() -> Self
where
T: Ord,
{
unsafe fn erased_cmp<T: Ord>(a: &dyn Element, b: &dyn Element) -> Ordering {
T::cmp(&*(a as *const _ as *const _), &*(b as *const _ as *const _))
}
Set {
imp: unsafe { TypeErasedSet::new(erased_cmp::<T>) },
marker: PhantomData,
}
}
pub fn insert(&mut self, value: T)
where
T: Ord,
{
let boxed = Box::new(value);
unsafe {
let erased = mem::transmute::<Box<dyn Element>, Box<dyn Element + 'static>>(boxed);
self.imp.insert(erased)
}
}
pub fn contains(&self, value: &T) -> bool
where
T: Ord,
{
unsafe { self.imp.contains(value) }
}
}
mod private {
use std::cmp::Ordering;
pub(super) trait Element {}
impl<Sized> Element for Sized {}
type Comparator = unsafe fn(&dyn Element, &dyn Element) -> Ordering;
pub(super) struct TypeErasedSet {
// Obviously you could build a more efficient representation than this.
// Just illustrating that the concept works without a monomorphized T.
repr: Vec<Box<dyn Element>>,
cmp: Comparator,
}
impl TypeErasedSet {
pub(super) unsafe fn new(cmp: Comparator) -> Self {
TypeErasedSet {
repr: Vec::new(),
cmp,
}
}
pub(super) unsafe fn insert(&mut self, value: Box<dyn Element>) {
match self.find(&*value) {
Ok(present) => self.repr[present] = value,
Err(absent) => self.repr.insert(absent, value),
}
}
pub(super) unsafe fn contains(&self, value: &dyn Element) -> bool {
self.find(value).is_ok()
}
unsafe fn find(&self, value: &dyn Element) -> Result<usize, usize> {
self.repr
.binary_search_by(|elem| unsafe { (self.cmp)(&**elem, value) })
}
}
}
fn main() {
let mut set = Set::new();
set.insert(5);
set.insert(2);
set.insert(9);
set.insert(4);
set.insert(9);
println!("{}", set.contains(&4));
println!("{}", set.contains(&3));
}
I suppose this wouldn't be a big deal for
|
Storing a comparator in the data structure might kill covariance and ignoring that issue might be unsound. I will investigate more on that later. Passing a comparator with every call (to insert, etc) seems more reasonable to me. Especially since you'd need to "pass down" the function anyway if it's stored at top level. Whether it originates from a read operation or from a compile-time constant passed down from one layer above in the call hierarchy doesn't seem like the most significant difference. But I don't even know where the proclaimed performance benefit in that "SCARY" paper comes from (it's a bit lengthy, havent read through much of it yet) so I can't be sure to have the full picture. (The code size reduction is obvious on the other hand.) Passing the comparator every call also allows for more flexible API, where you could offer a version of |
Here’s a demonstration using your code example. use std::any::Any;
use once_cell::sync::OnceCell;
fn main() {
let s = Set::<Wrapper<for<'a> fn(&'a str)>>::new();
let mut s: Set<Wrapper<fn(&'static str)>> = s;
static S: OnceCell<&'static str> = OnceCell::new();
let f = |s| {
let _ = S.try_insert(s);
};
s.insert(Wrapper(f));
s.contains(&Wrapper(f));
println!("{}", S.get().unwrap());
}
// for the record, BTreeSet is indeed covariant, too
use std::collections::BTreeSet;
fn _b_tree_set_is_covariant<'a: 'b, 'b, T>(x: BTreeSet<&'a T>) -> BTreeSet<&'b T> {
x
}
struct Wrapper<T>(T);
impl<T: 'static> Ord for Wrapper<T> {
fn cmp(&self, _other: &Self) -> Ordering {
if let Some(f) = (self as &dyn Any).downcast_ref::<Wrapper<for<'a> fn(&'a str)>>() {
f.0(&String::from("Hello world!"));
}
Ordering::Equal
}
}
impl<T: 'static> PartialOrd<Self> for Wrapper<T> {
fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
Some(self.cmp(other))
}
}
impl<T: 'static> Eq for Wrapper<T> {}
impl<T: 'static> PartialEq<Self> for Wrapper<T> {
fn eq(&self, other: &Self) -> bool {
self.cmp(other).is_eq()
}
} (in the playground)
This unsoundness is not a small bug in your code, but an inherent problem of storing a comparison function in a covariant collection. Since |
Incidentally hashbrown already does something similar internally (passing the comparator as an argument) but for different reasons:
|
It’s a very slight similarity only when considering the motivation discussed here (i.e. less monomorphization), since the |
This comment has been minimized.
This comment has been minimized.
@rfcbot fcp merge #88040 (comment) is persuasive as far as the Ord impl which exists during |
Team member @dtolnay has proposed to merge this. The next step is review by the rest of the tagged team members: No concerns currently listed. Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for info about what commands tagged team members can give me. |
@ anyone reviewing this, FYI: this suggestion
happened in a (short) topic on Zulip which can be found here (or in the archive). |
EDIT: It is asserted just using a different name from the other containers. |
☔ The latest upstream changes (presumably #88094) made this pull request unmergeable. Please resolve the merge conflicts. |
☔ The latest upstream changes (presumably #86808) made this pull request unmergeable. Please resolve the merge conflicts. |
🔔 This is now entering its final comment period, as per the review above. 🔔 |
Naive question... Would this a solution for issue #88244 ? |
Yes |
I don’t think that this PR has anything to do with resolving or not resolving #88244. While it makes the particular code example compile again, there’s no reason why other trait bounds on other const functions shouldn’t have the same problem (w.r.t. usability in associated |
The final comment period, with a disposition to merge, as per the review above, is now complete. As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed. The RFC will be merged soon. |
@bors r+ |
📌 Commit f33f266 has been approved by |
Rollup of 9 pull requests Successful merges: - rust-lang#86376 (Emit specific warning to clarify that `#[no_mangle]` should not be applied on foreign statics or functions) - rust-lang#88040 (BTree: remove Ord bound from new) - rust-lang#88053 (Fix the flock fallback implementation) - rust-lang#88350 (add support for clobbering xer, cr, and cr[0-7] for asm! on OpenPower/PowerPC) - rust-lang#88410 (Remove bolding on associated constants) - rust-lang#88525 (fix(rustc_typeck): produce better errors for dyn auto trait) - rust-lang#88542 (Use the return value of readdir_r() instead of errno) - rust-lang#88548 (Stabilize `Iterator::intersperse()`) - rust-lang#88551 (Stabilize `UnsafeCell::raw_get()`) Failed merges: r? `@ghost` `@rustbot` modify labels: rollup
K: Ord
bound is unnecessary onBTree{Map,Set}::new
and theirDefault
impl. No elements exist so there are nothing to compare anyway, so I don't think "future proof" would be a blocker here. This is analogous toHashMap::new
not having aK: Eq + Hash
bound.#79245 originally does this and for some reason drops the change to
new
andDefault
. I can see why changes to other methods likeentry
orsymmetric_difference
need to be careful but I couldn't find out any reason not to do it onnew
.Removing the bound also makes the stabilisation of
const fn new
not depending on const trait bounds.cc @steffahn who suggests me to make this PR.
r? @dtolnay