Ensure `ptr::read` gets all the same LLVM `load` metadata that dereferencing does #109035
(rustbot has picked a reviewer for you, use r? to override)
Some changes occurred to MIR optimizations
cc @rust-lang/wg-mir-opt
Hey! It looks like you've submitted a new PR for the library teams! If this PR contains changes to any Examples of
bfb3857 → b0f3e14
cc @rust-lang/opsem for awareness
Does it make sense to give
@saethlin It might, since
As a simple demo, https://rust.godbolt.org/z/8GbEsEj43 shows that
EDIT: there's also no
@@ -148,11 +148,11 @@ LL | const DATA_FN_PTR: fn() = unsafe { mem::transmute(&13) };
HEX_DUMP
}

-error: accessing memory with alignment 1, but alignment 4 is required
+error[E0080]: evaluation of constant value failed
Yeah, these future incompat warnings were because we wanted to make moving away from dubious const-eval patterns smoother as part of the Const UB Armistice of #99923, so that const UB doesn't immediately turn into const-break-the-build due to compiler changes. If people feel it's been enough time we can switch this off.
This particular PR just affects ptr::read(), which should definitely be fine imo. I'll leave the bikeshedding about what to do with the other cases to everyone else :)
IIRC, the general temperature was that for these UB-in-const cases, a single stable cycle of warnings is probably sufficient, two are definitely sufficient, and we're within our rights to skip warning releases entirely if we wanted to (i.e. warning at all is a good-faith, best-effort attempt to give people some time to migrate).
Amusingly, the "see issue" note is pointing at #68585, which is "Tracking issue for conflicting repr(...) hints future compatibility".
Purely if it was up to me: go for it.
It seems like this will be a hugely beneficial change, crates impacted were still technically "doing it wrong", we're landing this no sooner than 1.70 (so they've had 1.68 and will have 1.69 to fix it), and "const-stable since 1.63" actually means that we should cut it off sooner rather than later due to the "Lindy effect" that bad code patterns have (i.e. the longer a pattern exists, the longer it is expected to continue existing). "More time to migrate" is something we should be considering for const fn stabilizations with version numbers like 1.49
Yeah this is fine, we can by now probably make that entire lint into a hard error.
What I don't understand immediately is why this PR changes behavior here though...
It changes behavior because unaligned copy_nonoverlapping is (temporarily) allowed, but unaligned derefs are not: https://godbolt.org/z/M6f5MrjKo
error: accessing memory with alignment 1, but alignment 4 is required
--> /rustc/8a73f50d875840b8077b8ec080fa41881d7ce40d/library/core/src/intrinsics.rs:2393:9
|
= warning: this was previously accepted by the compiler but is being phased out; it will become a hard error in a future release!
= note: for more information, see issue #68585 <https://github.com/rust-lang/rust/issues/104616>
note: inside `copy_nonoverlapping::<u32>`
--> /rustc/8a73f50d875840b8077b8ec080fa41881d7ce40d/library/core/src/intrinsics.rs:2393:9
note: inside `COPY_NONOVERLAPPING`
--> <source>:8:5
|
8 | ptr::copy_nonoverlapping(unaligned, ptr::addr_of_mut!(dest), 1);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
= note: `#[deny(invalid_alignment)]` on by default
error[E0080]: evaluation of constant value failed
--> <source>:14:5
|
14 | *unaligned
| ^^^^^^^^^^ accessing memory with alignment 1, but alignment 4 is required
(and this PR changes ptr::read from using copy_nonoverlapping to a deref)
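The const code behind that godbolt link isn't reproduced on this page, so here is a separate, runtime-only sketch for contrast. When a read genuinely has to go through a misaligned pointer, neither `*p` nor `ptr::read` is allowed; the standard library's `ptr::read_unaligned` is the tool for that case. The helper name `read_u32_at` is hypothetical and not part of the PR:

```rust
use std::ptr;

/// Hypothetical helper: reads a native-endian `u32` starting at `offset`
/// inside `bytes`, whatever the alignment of that address happens to be.
fn read_u32_at(bytes: &[u8], offset: usize) -> u32 {
    // Bounds check written so that `offset + 4` cannot overflow.
    assert!(bytes.len() >= 4 && offset <= bytes.len() - 4, "out of bounds");
    // Casting the pointer is fine; only *dereferencing* it as an aligned
    // `u32` (via `*p` or `ptr::read`) would be UB if it is misaligned.
    let p = bytes[offset..].as_ptr().cast::<u32>();
    // SAFETY: the 4 bytes at `p` are in bounds and initialized, and
    // `read_unaligned` has no alignment requirement.
    unsafe { ptr::read_unaligned(p) }
}
```

A fully safe equivalent is `u32::from_ne_bytes(bytes[offset..offset + 4].try_into().unwrap())`; the raw-pointer version is shown only to contrast it with `ptr::read`.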
It changes behavior because unaligned copy_nonoverlapping is (temporarily) allowed, but unaligned derefs are not
That's the thing, unaligned derefs should also be temporarily allowed...
I was looking into `array::IntoIter` optimization, and noticed that it wasn't annotating the loads with `noundef` for simple things like `array::IntoIter<i32, N>`. Turned out to be a more general problem as `MaybeUninit::assume_init_read` isn't marking the load as initialized (<https://rust.godbolt.org/z/Mxd8TPTnv>), which is unfortunate since that's basically its reason to exist. This PR lowers `ptr::read(p)` to `copy *p` in MIR, which fortuitously also improves the IR we give to LLVM for things like `mem::replace`.
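To show where `assume_init_read` sits in code shaped like `array::IntoIter`, here is a heavily simplified by-value array iterator. `ToyIntoIter` is a hypothetical name and this is not the real `core::array::IntoIter`; it only illustrates the kind of read whose `load`, per the comment above, was missing `noundef` for `T = i32`:

```rust
use std::mem::MaybeUninit;

/// Hypothetical, simplified by-value array iterator (not the real
/// `core::array::IntoIter`). Invariant: slots `pos..N` are initialized.
struct ToyIntoIter<T, const N: usize> {
    data: [MaybeUninit<T>; N],
    pos: usize,
}

impl<T, const N: usize> ToyIntoIter<T, N> {
    fn new(array: [T; N]) -> Self {
        // Wrap every element so the iterator, not the array, decides when to drop.
        ToyIntoIter { data: array.map(MaybeUninit::new), pos: 0 }
    }
}

impl<T, const N: usize> Iterator for ToyIntoIter<T, N> {
    type Item = T;

    fn next(&mut self) -> Option<T> {
        if self.pos == N {
            return None;
        }
        let i = self.pos;
        // Advance first so slot `i` counts as moved-out from now on.
        self.pos += 1;
        // SAFETY: slot `i` was in the initialized range and is never read or
        // dropped again. This `assume_init_read` is the kind of read the
        // thread is about: for `T = i32` its load should carry `noundef`.
        Some(unsafe { self.data[i].assume_init_read() })
    }
}

impl<T, const N: usize> Drop for ToyIntoIter<T, N> {
    fn drop(&mut self) {
        // Drop only the elements that were never yielded.
        for slot in &mut self.data[self.pos..] {
            // SAFETY: slots in `pos..N` are still initialized.
            unsafe { slot.assume_init_drop() };
        }
    }
}
```

Dropping a partially consumed iterator only drops the elements that were never yielded, which is why `MaybeUninit` is used instead of a plain `[T; N]`.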
b0f3e14 → b2c717f
I fully expect this to be fine, but since
LLVMさま please bless this patch.
Finished benchmarking commit (b7c032a129d0565b7e3f96e008ac8baf713fddb0): comparison URL.
Overall result: ❌✅ regressions and improvements - ACTION NEEDED
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never
Instruction count
This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
There's nothing special about
It performs a read using
This results in us giving the following IR to LLVM: https://rust.godbolt.org/z/z31dn1zjM

define noundef i32 @demo(ptr noalias noundef readonly align 4 dereferenceable(4) %x) unnamed_addr #0 {
  %tmp = alloca i32, align 4
  call void @llvm.lifetime.start.p0(i64 4, ptr %tmp)
  call void @llvm.memcpy.p0.p0.i64(ptr align 4 %tmp, ptr align 4 %x, i64 4, i1 false)
  %self = load i32, ptr %tmp, align 4
  call void @llvm.lifetime.end.p0(i64 4, ptr %tmp)
  ret i32 %self
}

Note that there is no
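The Rust source behind that godbolt link isn't shown on this page. Judging only from the IR signature (a `&i32` argument and an `i32` return), a function of roughly the following shape would produce it; `demo` and `demo_deref` are hypothetical reconstructions, not the original demo:

```rust
use std::ptr;

/// Hypothetical reconstruction: reads through the reference with `ptr::read`.
pub fn demo(x: &i32) -> i32 {
    // SAFETY: a shared reference is always valid and aligned for reads, and
    // `i32` is `Copy`, so duplicating the value is fine.
    unsafe { ptr::read(x) }
}

/// For comparison: the same read written as a plain dereference.
pub fn demo_deref(x: &i32) -> i32 {
    *x
}
```

To look at the IR rustc hands to LLVM (before LLVM's own optimizations), something like `rustc --crate-type=lib -O -C no-prepopulate-passes --emit=llvm-ir demo.rs` should work; the exact output depends on the compiler version.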
All of the regressions seem to be due to LLVM doing more work, which makes sense: this new IR is much more optimizable.
library/core/src/ptr/mod.rs (outdated)
// This uses a dedicated intrinsic, not `copy_nonoverlapping`,
// so that it gets a *typed* copy, not an *untyped* one.
crate::intrinsics::read_via_copy(src)
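The typed vs. untyped distinction can be sketched in ordinary Rust. `read_untyped` below approximates the previous strategy described in the PR (a `copy_nonoverlapping` into a temporary, then `assume_init`), and `read_typed` is a plain `*src` read. Both names are hypothetical. Surface Rust only allows reading `*src` by value for `Copy` types, hence the `T: Copy` bound; the PR instead uses the `read_via_copy` intrinsic, which per the description lowers to `copy *p` in MIR for any `T`:

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Approximation of the old approach: an *untyped* byte copy into a
/// temporary, after which the bytes are reinterpreted as a `T`.
unsafe fn read_untyped<T>(src: *const T) -> T {
    let mut tmp = MaybeUninit::<T>::uninit();
    // SAFETY: the caller guarantees `src` is valid for reads and aligned;
    // `tmp` is a fresh local, so the two regions cannot overlap.
    unsafe { ptr::copy_nonoverlapping(src, tmp.as_mut_ptr(), 1) };
    // SAFETY: every byte of `tmp` was just copied from a valid `T`.
    unsafe { tmp.assume_init() }
}

/// A *typed* read: the compiler sees a load of a `T` directly.
unsafe fn read_typed<T: Copy>(src: *const T) -> T {
    // SAFETY: the caller guarantees `src` is valid for reads and aligned.
    unsafe { *src }
}
```

Both return the same value; the difference the PR cares about is only in what type information reaches the backend for the `load`.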
@JakobDegen it would be super nice if this entire PR were just

mir!({
    RET = *src;
    Return()
})

Are there any plans to allow defining intrinsics with custom MIR? Maybe it's difficult because both are in core?
There's https://stdrs.dev/nightly/x86_64-unknown-linux-gnu/std/intrinsics/mir/macro.mir.html (err, which of course you know because you used it in the example 🤦), but I don't know if that's something we'd ever want to use for productized things, rather than just in tests.
library/core/src/intrinsics.rs (outdated)
/// The stabilized form of this intrinsic is [`crate::ptr::read`], so
/// that can be implemented without needing to do an *untyped* copy
/// via [`copy_nonoverlapping`], and thus can get proper metadata.
Suggested change
- /// The stabilized form of this intrinsic is [`crate::ptr::read`], so
- /// that can be implemented without needing to do an *untyped* copy
- /// via [`copy_nonoverlapping`], and thus can get proper metadata.
+ /// The stabilized form of this intrinsic is [`crate::ptr::read`], so that
+ /// it is easier for the compiler to generate a load with proper metadata.
@bors r=WaffleLapkin,JakobDegen
⌛ Testing commit e7c6ad8 with merge 51c1bef70e3a6986eb7f91880e9123f53fe24a08...
💔 Test failed - checks-actions
Apparently in CI the output gets generated in the opposite order; putting one function per file will make the test pass either way.
@bors r=WaffleLapkin,JakobDegen
☀️ Test successful - checks-actions
Finished benchmarking commit (e4b9f86): comparison URL.
Overall result: ❌✅ regressions and improvements - ACTION NEEDED
Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression
Instruction count
This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Improvements significantly outweigh regressions, plus there's a non-trivial improvement of ~5 seconds on bootstrap time.
@rustbot label: +perf-regression-triaged
I was looking into `array::IntoIter` optimization, and noticed that it wasn't annotating the loads with `noundef` for simple things like `array::IntoIter<i32, N>`. Trying to narrow it down, it seems that was because `MaybeUninit::assume_init_read` isn't marking the load as initialized (https://rust.godbolt.org/z/Mxd8TPTnv), which is unfortunate since that's basically its reason to exist.

The root cause is that `ptr::read` is currently implemented via the untyped `copy_nonoverlapping`, and thus the `load` doesn't get any type-aware metadata: no `noundef`, no `!range`. This PR solves that by lowering `ptr::read(p)` to `copy *p` in MIR, for which the backends already do the right thing.

Fortuitously, this also improves the IR we give to LLVM for things like `mem::replace`, and fixes a couple of long-standing bugs where `ptr::read` on `Copy` types was worse than `*`ing them.

Zulip conversation: https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/Move.20array.3A.3AIntoIter.20to.20ManuallyDrop/near/341189936
cc @erikdesjardins @JakobDegen @workingjubilee @the8472
Fixes #106369
Fixes #73258
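The description mentions that `mem::replace` also benefits. As a rough illustration of why, here is a replace built from `ptr::read` and `ptr::write`; `replace_sketch` is a hypothetical name, and std's own `mem::replace` may be implemented differently depending on the version:

```rust
use std::ptr;

/// Sketch of `mem::replace` in terms of `ptr::read` and `ptr::write`.
fn replace_sketch<T>(dest: &mut T, src: T) -> T {
    // SAFETY: `dest` is a valid, aligned, initialized `&mut T`. The old value
    // is moved out with `ptr::read`, and the slot is immediately overwritten
    // with `ptr::write` (which does not drop the old contents), so the
    // duplicated value is never observed or dropped twice. Nothing between
    // the two calls can panic.
    unsafe {
        let old = ptr::read(dest);
        ptr::write(dest, src);
        old
    }
}
```

Since the `ptr::read` here is now a typed copy, the load of the old value can carry the same metadata a plain `*dest` would.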