-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BTreeMap: fix pointer provenance rules #78480
Conversation
b7f4051
to
8c9214c
Compare
@@ -342,28 +342,36 @@ impl<BorrowType, K, V, Type> NodeRef<BorrowType, K, V, Type> { | |||
/// # Safety | |||
/// The node has more than `idx` initialized elements. | |||
pub unsafe fn key_at(&self, idx: usize) -> &K { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is not even used any more in this file... is it even worth keeping (as opposed to expecting the caller to handle MaybeUninit
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well it's there for search_linear
? in search.rs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first I found it weird to have a function returning a reference to something that you can only use assuming it's initialized. But it's there because I didn't know how else to copy/write a slice into uninitialized elements. And it's somewhat nice to have functions that only do one thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well it's there for search_linear? in search.rs
If that's the only user, it can just to assume_init_ref
itself though?
At first I found it weird to have a function returning a reference to something that you can only use assuming it's initialized. But it's there because I didn't know how else to copy/write a slice into uninitialized elements. And it's somewhat nice to have functions that only do one thing.
Well, I do not understand this code enough to suggest other APIs here, or to tell how many of these callers call this function on uninit elements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it doesn't seem useful to change key_at
after all. To copy a slice from one node to another, miri-track-raw-pointers wants the source pointer to point something it can read entirely, not just the first element. Which is sensible and easy to do, but it leaves key_at
only useful to read a key.
The next problem, to which I don't see a way out at the moment, is as_internal_mut()
:
302 | unsafe { &mut *(self.node.as_ptr() as *mut InternalNode<K, V>) }
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ trying to reborrow for Unique at alloc22839+0x140, but parent tag <58889> does not have an appropriate item in the borrow stack
|
We disguise a *mut InternalNode
as a *mut LeafNode
to squeeze it into BoxedNode
, and there's no way around that. Now how do we unpack such a party pooper and convince Miri that this is an InternalNode?
-Zmiri-track-pointer-tag tracks the tag to the creator, BoxedNode::from_internal
:
BoxedNode { ptr: Unique::from(&mut Box::leak(node).data) }
It does not help to make the reference raw:
BoxedNode { ptr: unsafe { Unique::new_unchecked(&raw mut Box::leak(node).data) } }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. Given
fn from_internal(node: Box<InternalNode<K, V>>) -> Self {
let whole = ptr::raw_mut!(*Box::leak(node));
let part = unsafe { ptr::raw_mut!((*whole).data) };
BoxedNode { ptr: unsafe { Unique::new_unchecked(part) } }
}
the tag of the internal pointer still refers to:
|
127 | let part = unsafe { ptr::raw_mut!((*whole).data) };
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ created tag 58938
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is strange, I will have to investigate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does remind me of a much simpler example I've been trying to unravel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same behaviour with this example:
#![feature(ptr_internals, raw_ref_macros)]
use std::ptr::{self, NonNull, Unique};
struct Part {
_lame: i32,
}
#[repr(C)]
struct Whole {
part: Part,
extra: i32,
}
fn main() {
let it = Box::new(Whole { part: Part { _lame: 0 }, extra: 42 });
let whole = ptr::raw_mut!(*Box::leak(it));
let part = unsafe { ptr::raw_mut!((*whole).part) };
let unique = unsafe { Unique::new_unchecked(part) };
let raw = NonNull::from(unique);
let typed = unsafe { &mut *(raw.as_ptr() as *mut Whole) };
assert!(typed.extra == 42);
}
The UB also disappears by removing the _lame field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is a Miri / Stacked Borrows problem, good catch! See rust-lang/miri#1608.
535a99d
to
43e0c6a
Compare
This particular test case now passes, and I cleaned up a bit. Now over to the standard test cases… PS well it passes on the old master, but CI tries to compile on the latest master. |
That's great! Maybe we should review and land this then? It clearly improves the situation.
FWIW, I didn't even try to run any parts of the stdlib test suite with |
/// | ||
/// # Safety | ||
/// - The node has more than `idx` initialized elements. | ||
fn keys_mut(&mut self, start: usize) -> &mut [MaybeUninit<K>] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is safe but has a safety comment? What is idx
?
Also, the function is actually not safe, as I can do awful things like
foo.keys_mut()[0] = MaybeUninit::uninit();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed that already, wait a second...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any caller that does not use 0 for start
? I don't entirely understand why you added that parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the function is actually not safe, as I can do awful things like
But that's basically what slice_remove
does, and what MaybeUninit is for, so is that really unsafe? It's only unsafe when you later assume_init
it, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The concern about the return type remains -- functions returning &mut MaybeUninit
are rarely safe because they allow the caller to de-initialize the given memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that for internal functions it can be less clear-cut who is responsible for upholding which invariant.
but because it might stump on other references.
it doesn't do anything odd with lifetimes though? So that seems more like something the caller needs to be aware of regardless to me.^^
But in the end it is somewhat arbitrary, so as long as it is properly documented I can live with any of these choices here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tagged that unsafe now
No I didn't. I forgot to tag as_leaf_mut
, and if I try that, the avalanche means you're tempted to make all bodies unsafe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I made the as_X
methods safe again, and made the ones returning MaybeUninit
references and slices unsafe. It doesn't make much sense, since you can access the MaybeUninit
array directly and safely. But you need an unsafe block to do use them anyway.
Maybe the fields shouldn't be open to the entire node code, and the methods returning MaybeUninit
stuff should be members of LeafNode
/InternalNode
instead of NodeRef
. I fenced off the BoxedNode
field just to see what the reviewer(s) will say.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One class of methods that has been very clear to me from the start, is those on a plain self
like first_edge
, left_kv
: they simply return the same ownership, borrowship, or whatever you call it.
Next to the easy ones are those on a plain self
called into_X
(where I think the unwritten rule is that X is a part of the original, not a neighbour). Some of them return an 'a
lifetime, which stems from the NodeRef
's marker::Immut<'a>
/ValMut<'a>
/Mut<'a>
. It didn't understand what that implied for a long time, but they didn't look daunting either, thanks to the plain self
.
I now understand why there is both into_key_at
and key_at
: because the former profits from the lifetime incorporated into marker::Immut<'a>
/ValMut<'a>
/Mut<'a>
, and the latter is defined and used for all 3 of those, and the only way to do that is to define it over some BorrowType
without lifetime. Therefore you can write the latter using the former, but not the other way around. into_key_at
is easy enough and more sophisticated.
Therefore I turned all the new/changed function into the into_X
class, and I like it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there's a good place to document that structure with a comment in the code, that seems like it could be helpful. :)
I really don't know how to Miri the other test cases, other than writing each one as a standalone program (for those not peeking into internals). So I cleaned up some more instead. |
I can write instructions for this, but it is probably easier to wait until the things that we currently have in the pipeline have landed. |
/// Exposes the data of an internal node for reading. | ||
/// | ||
/// # Safety | ||
/// We have exclusive access to the node (and not just to the `NodeRef`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't there supposed to be an invariant that a NodeRef<marker::Mut
always has exclusive access to the node?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somewhat, but with at least these exceptions:
- temporary ones with
reborrow_mut
- temporary one with
ptr::read
inremove_kv_tracking
(I have a PR queued to usereborrow_mut
instead) - just realized today, the pointer returned by
insert_fit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
temporary ones with reborrow_mut
Oh, reborrow_mut doesn't use some lifetimes to make sure only one of the two ptrs can be used?
But then I'd say it is reborrow_mut that should be unsafe, with the safety invariant being "do not use the original ref until you stopped using the reborrowed one". (Somewhat like dorman-mut-ref actually...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well reborrow_mut
is unsafe, with ample warning, and I think the borrow checker makes sure you can only use the top one, but you can still use it to deallocate the node that the original one refers to, and that's not cool…
/// | ||
/// # Safety | ||
/// - The node has more than `idx` initialized elements. | ||
fn keys_mut(&mut self, start: usize) -> &mut [MaybeUninit<K>] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The concern about the return type remains -- functions returning &mut MaybeUninit
are rarely safe because they allow the caller to de-initialize the given memory.
That wasn't entirely true. I can Miri at any point, by rebasing the btree code on an older master, which I already did. But I can't -Zmiri-track-raw-pointers it. |
You are using /~https://github.com/RalfJung/miri-test-libstd, right? So that would be something like
But be aware that I have never tried this; even the libtest harness might already be breaking pointer provenance rules for all we know... in fact, until #78484 lands, many uses of EDIT: Oh and you'll have to |
No, never heard of that. I normally did |
Wow, I had no idea that that would even work.^^ |
c4cbc0c
to
d0d876d
Compare
It's not that bad, but of the further two test cases I converted manually, one failed miserably on the Anyway, it seems this is about the best I can get out of it. Doesn't fail with the normal Miri based on ffa2e7a. |
Okay so I guess this is ready for review by @Mark-Simulacrum then. :)
GH unfortunately doesn't send notifications for this so a reviewer would not notice such a status change. |
It's S-waiting-on-review, so I'll see it on my next round of reviews :) Probably this weekend or tomorrow, not sure yet. |
Okay. Seems fine. @bors r+ |
📌 Commit 3ce64acedb5c0219309dd2ae6439fb82c4d05778 has been approved by |
☔ The latest upstream changes (presumably #78889) made this pull request unmergeable. Please resolve the merge conflicts. Note that reviewers usually do not review pull requests until merge conflicts are resolved! Once you resolve the conflicts, you should change the labels applied by bors to indicate that your PR is ready for review. Post this as a comment to change the labels:
|
It seems we get a conflict over the same commit, rebased here on a newer master than the original. @rustbot modify labels: +S-waiting-on-review -S-waiting-on-author |
@bors r+ |
📌 Commit 7ca6e8f has been approved by |
Rollup of 12 pull requests Successful merges: - rust-lang#74754 (Add `#[cfg(panic = '...')]`) - rust-lang#76468 (Improve lifetime name annotations for closures & async functions) - rust-lang#77016 (Test clippy on PR CI on changes) - rust-lang#78480 (BTreeMap: fix pointer provenance rules) - rust-lang#78502 (Update Chalk to 0.36.0) - rust-lang#78513 (Infer the default host target from the host toolchain if possible) - rust-lang#78566 (Enable LLVM Polly via llvm-args.) - rust-lang#78580 (inliner: Break inlining cycles) - rust-lang#78710 (rustc_ast: Do not panic by default when visiting macro calls) - rust-lang#78746 (Demote i686-unknown-freebsd to tier 2 compiler target) - rust-lang#78830 (fix `super_visit_with` for `Terminator`) - rust-lang#78844 (Monomorphize a type argument of size-of operation during codegen) Failed merges: r? `@ghost` `@rustbot` modify labels: rollup
…Mark-Simulacrum BTreeMap: fix pointer provenance rules in underfullness Continuing on rust-lang#78480, and for readability, and possibly for performance: avoid aliasing when handling underfull nodes, and consolidate the code doing that. In particular: - Avoid the rather explicit aliasing for internal nodes in `remove_kv_tracking`. - Climb down to the root to handle underfull nodes using a reborrowed handle, rather than one copied with `ptr::read`, before resuming on the leaf level. - Integrate the code tracking leaf edge position into the functions performing changes, rather than bolting it on. r? `@Mark-Simulacrum`
Fixes #78477 and includes #78476
r? @Mark-Simulacrum