Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Defuse the bomb that is mem::uninitialized #87032

Closed
wants to merge 1 commit into from

Conversation

bstrie
Copy link
Contributor

@bstrie bstrie commented Jul 10, 2021

EDIT: I think for now it makes sense to withdraw this PR while we wait for #66151 to explore tightening intrinsics::assert_zero_valid. I could also imagine an even better version of this PR that uses some value other than 0 (amusingly, 1 might be the most permissive/safe value?). Furthermore, while I still think mem::uninitialized deserves to be as discouraged as we can possibly make it, I am slightly less dim on its existence now that I understand the model of uninitialized memory better and that it actually is possible to use this function safely in some contrived circumstances (let x: () = mem::uninitialized();), which means that it isn't completely ridiculous to expect a hypothetical Rust specification to specify its behavior.

Original below:


The deficiencies with mem::uninitialized have long been well-known. It has been long since replaced by mem::MaybeUninit, and was officially deprecated in November 2019.

The problem with mem::unitialized is that, as originally conceived, it is fundamentally incompatible with Rust's stance on memory safety. The API as of Rust 1.0 presented a promise that this function could be safe; however, that promise could not be kept. There is no safe way for something with this API to safely provide uninitialized memory.

We can see this contradiction reflected in the current implementation: the return value is simply MaybeUninit::uninit().assume_init(). And yet if we look at the MaybeUninit docs, we see the following listed as a misuse of the API:

let x: i32 = unsafe { MaybeUninit::uninit().assume_init() }; // undefined behavior!

It isn't safe to use uninitialized memory like this, and yet the API of mem::uninitialized makes it unavoidable.

When a function is marked unsafe in Rust, that does not mean that that function is fundamentally memory-unsafe; it means that the caller must manually uphold certain invariants in order to ensure memory safety. This is why all unsafe functions in libstd are supposed to have a "Safety" section in their documentation to inform the user how the function may be safely used. And yet for mem::uninitialized there is no "Safety" section, because, as implemented, the caller-upheld invariant is essentially this: the return value of this function must never be used.

So at best this function is "just" useless. And when it is not just useless, it is immediate undefined behavior.

Rust's stability guarantee permits breakage in the event of unsoundness. In fact, a year prior to the deprecation of mem::uninitialized, a check was added that guarantees that the function will panic at runtime when used with many types that are known to have uninhabited values. This caused some breakage in the wild, and yet it was not sufficient to totally solve the issue (self-evidently, as otherwise MaybeUninit would not have later been stabilized and this function would not have been deprecated).

However, although it is arguably within Rust's right to outright remove this function from the stdlib for the sake of soundness, it is possible to be more conservative. Since we must choose between being unsafe and being safe-but-useless, this PR chooses the latter.

This PR replaces the internals of mem::uninitialized with a call to mem::zeroed. Semantically this is legal: if your code was written to expect any imaginable value, then it must work with any specific value (relevant XKCD). Philosophically this is also defensible: although this new implementation not strictly "uninitialized", the alternative is undefined behavior; since undefined behavior allows any behavior at all, up to and including nasal demons, an interpretation of undefined behavior wherein your uninitialized memory is replaced with initialized memory is legal.

This PR has three chief advantages:

  1. Any program using mem::uninitialized that has not been updated since the introduction of MaybeUninit is no longer all-but guaranteed to exhibit undefined behavior.
  2. Because we can safely remove the guaranteed-to-panic-on-types-with-uninhabited-values check, this means that legacy libraries that stopped working because of that check may now be usable once more. (mem::zeroed has its own, more narrowly-scoped check that rejects a subset of the types of the current check.)
  3. In a hypothetical future where Rust has a language specification, this interpretation of the contract of this function (transitioning from "this returns an uninitialized value" to "this returns an unspecified value") is possible to reasonably specify (even if it's not guaranteed to ever be useful).

This PR does not remove the unsafe marker on this function, nor does it reverse its deprecation, nor does it suggest that either of those ever be done. We're just building the sarcophagus on Chernobyl here.

This was discussed in #libs on Zulip, and there were no outright objections (although some desired to just see this removed from std entirely).

@rust-highfive
Copy link
Collaborator

r? @kennytm

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 10, 2021
@bstrie bstrie added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Jul 10, 2021
@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@Aaron1011
Copy link
Member

Because we can safely remove the guaranteed-to-panic-on-types-with-uninhabited-values check, this means that legacy libraries that stopped working because of that check will now be usable once more.

I think it would be a good idea to keep this panic - that is, we would panic if mem::uninitialized is called with an uninhabited type, and otherwise return zeroed. This will make it easier for users to debug and fix undefined behavior in their code - instead of needing to wait for 'consequences' of the undefined behavior to occur (e.g. trying to use the 'instance' of the uninhabited type), they'll immediately get a panic message.

@bstrie
Copy link
Contributor Author

bstrie commented Jul 10, 2021

@Aaron1011 I'm not particularly opposed to leaving that panic check in, but is it necessary? mem::zeroed has its own panic check, for can't-be-zero types rather than has-any-uninhabited-values-at-all types.

This will make it easier for users to debug and fix undefined behavior in their code - instead of needing to wait for 'consequences' of the undefined behavior to occur (e.g. trying to use the 'instance' of the uninhabited type), they'll immediately get a panic message.

With this PR there won't be any undefined behavior in their code to find, at least not due to uninitialized memory (because there won't actually be any uninitialized memory), so there won't be any consequences that can occur. And as for helping users fix their code, I think the only thing to be done at this point is to point them towards MaybeUninit, which is what the deprecation notice has been doing for going on two years.

@Aaron1011
Copy link
Member

mem::zeroed has its own panic check, for can't-be-zero types rather than has-any-uninhabited-values-at-all types.

It looks like mem::zeroed() already panics for uninhabited types, so this PR already does what I want.

And as for helping users fix their code, I think the only thing to be done at this point is to point them towards MaybeUninit, which is what the deprecation notice has been doing for going on two years.

In a large dependency graph, this could easily go unnoticed (especially if it's an upstream crate triggering the deprecation warning). Of course, this is a moot point for this PR, as we already get the proper panic.

@bstrie
Copy link
Contributor Author

bstrie commented Jul 10, 2021

In a large dependency graph, this could easily go unnoticed (especially if it's an upstream crate triggering the deprecation warning).

I suppose that might be one advantage in favor of the "just remove this API entirely" camp: if this were a future-compatibility warning, it would penetrate the usual silencing of upstream warnings.

@inquisitivecrystal
Copy link
Contributor

This looks to me like it's strictly an improvement over the current situation. Removing the function would probably be best in the long run, but that raises serious questions about rust's stability policy. At least this is a change we can make right now.

#[rustc_diagnostic_item = "mem_uninitialized"]
pub unsafe fn uninitialized<T>() -> T {
// SAFETY: the caller must guarantee that an unitialized value is valid for `T`.
unsafe {
intrinsics::assert_uninit_valid::<T>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you leave this call in place? That will allow us to keep the "attempted to leave type T uninitialized, which is invalid" panics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was under the impression from your previous comments that you were content with only panicking on can't-be-zero types. If your concern is just the diagnostic regression, I would much rather solve this some other way than reintroducing this check, since one of the points of this PR is to deliberately make it possible to use this function with types like bool, so as to "un-break" code that was broken when these checks were added in the first place.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I may have misunderstood what was happening.

These panics were intentionally added in #66059, as a way of catching misuses of uninitialized memory. Even though std::mem::uninitialized now 'happens' to zero out these types, users should not be relying on this behavior (as your doc comment states). That is, writing std::mem::uninitialized::<bool>() is still a bug - any code doing so is broken, and should not be 'un-broken' by removing the legitimate panic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Aaron1011 I thought the whole point was that mem::uninitialized is broken for any type. Why are types with niches being singled out?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See also @thomcc's comments about how the panics have broken several of his old crates; I would love for this to fix them. https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/disarming.20mem.3A.3Auninitialized/near/244037069

Copy link
Contributor Author

@bstrie bstrie Jul 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is in fact not at all broken for types that are allowed to be uninitialized, such as MaybeUninit.

Are there any types that are allowed to be uninitialized other than MaybeUninit (and arrays and tuples of MaybeUninit)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was started in #79296 but then stalled when the rather large crater results had to be analyzed.

I queued a newer Crater run in #87041, but it will be a while before it gets to the head of the queue and finished.

Copy link
Member

@RalfJung RalfJung Jul 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any types that are allowed to be uninitialized other than MaybeUninit (and arrays and tuples of MaybeUninit)?

Yes:

  • The type () (or really any inhabited zero-sized type... if you only consider the validity invariant, not the safety invariant)
  • Any user-defined union type... well to be fair this is not ratified yet, that's just what I would propose. But I think there is consensus that any union with a field of size zero (essentially, a user defining their own MaybeUninit) is okay to be left uninitialized.

Now, I am not saying that this is all that useful, I just want to get our facts straight so we have a solid basis to make decisions on. :) The only legitimate reason I can think of to call mem::uninitialized is to create an array of MaybeUninit, and we should probably just offer a better API for that (possibly using const generics, like other stdlib APIs are already doing AFAIK).

Copy link
Contributor Author

@bstrie bstrie Jul 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth noting here too that even with this PR removing the panic and changing the implementation to zeroed, rustc still independently produces a scary warning when attempting to use uninitialized with any type with invalid values:

warning: the type `bool` does not permit being left uninitialized
 --> un.rs:3:22
  |
3 |     let x = unsafe { std::mem::uninitialized::<bool>() };
  |                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |                      |
  |                      this code causes undefined behavior when executed
  |                      help: use `MaybeUninit<T>` instead, and only call `assume_init` after initialization is done
  |
  = note: `#[warn(invalid_value)]` on by default
  = note: booleans must be either `true` or `false`

So even though the panic no longer exists (to the benefit of people using libraries that do this), people writing this code themselves (who will therefore be seeing warnings) will still be made aware that this isn't a good idea (even if, for the moment, it may not actually be invoking UB from the compiler's perspective).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That lint however has a lot of false negatives. For example, it will not fire when the code that calls mem::uninitialized is generic.

@bstrie
Copy link
Contributor Author

bstrie commented Jul 10, 2021

I've also had to update a UI test, as the panic message from attempting to use mem::uninitialized on a can't-be-zero type has now changed from "attempted to leave type foo uninitialized, which is invalid" to "attempted to zero-initialize type foo, which is invalid", which could be considered a minor diagnostic regression, but I don't think we particularly care about that in this circumstance (as evidenced by the fact that we don't even have #[track_caller] on these functions). This also removes some asserts that were testing that some types could be used with mem::zeroed but not with mem::uninitialized, but as mentioned above one of the points of this PR is that such types (e.g. bool) should no longer panic when used with mem::unitialized (as it is now safe), so it is intentional that those asserts should be removed.

@Aaron1011
Copy link
Member

cc @RalfJung

@RalfJung
Copy link
Member

Any legacy program using mem::uninitialized no longer suffers from undefined behavior (well, not from calls to this function, anyway).

This is not correct. zeroed is less dangerous that uninitialized, but still very dangerous. For example, zeroed on &T is still immediate UB.

a check was added that guarantees that the function will panic at runtime when used with a type that is known to have uninhabited values

What is a "type with uninhabited values"? Values cannot be (un)inhabited; it is types that are (un)inabited: a type is inhabited if and only if there is a value of that type. For example, ! is uninhabited, but bool is inhabited. Note that both of these types are immediate UB when used with mem::uninitialized (though this PR "defuses" the UB for bool).

@RalfJung
Copy link
Member

Cc #66151

@bstrie
Copy link
Contributor Author

bstrie commented Jul 10, 2021

For example, zeroed on &T is still immediate UB.

I don't see how it could be UB, as zeroed::<&i32>() is guaranteed to unconditionally panic.

What is a "type with uninhabited values"?

Whatever the jargon is for "a type where not all possible bit representations are valid", where an example of an invalid value is e.g. 3 for bool.

@RalfJung
Copy link
Member

RalfJung commented Jul 10, 2021

I don't see how it could be UB, as zeroed::<&i32>() is guaranteed to unconditionally panic.

It's library UB. The implementation happens to implement the UB as a panic, but that is certainly not guaranteed.

Moreover, if you use [&i32; 1] instead, there's no panic. Now the library UB indeed becomes language UB. We could tighten the panic (we didn't because each time we do, the crater results are enormous), but I don't think we'll be able to guarantee that we catch all types for which 0x00...00 is not a valid representation. And even if we do, one can still cause library UB and in turn language UB by instantiating types that have extra library invariants.

Whatever the jargon is for "a type where not all possible bit representations are valid", where an example of an invalid value is e.g. 3 for bool.

Ah, that has nothing to do with "inhabited" then. We don't really have any short jargon term for such types. (Some people say "type with a niche", but "niche" usually refers to niches the rustc layout algorithm actually uses.)

Also, "all possible bit representations" is doing a lot of work here: are "all possible bit representations" valid for i32? It might seem so, but actually uninitialized bits are not valid for i32.

@bstrie
Copy link
Contributor Author

bstrie commented Jul 10, 2021

Also, "all possible bit representations" is doing a lot of work here: are "all possible bit representations" valid for i32? It might seem so, but actually uninitialized bits are not valid for i32.

Yes, I am seeking to draw a distinction here between "the result of this operation is unspecified" and "the result of the operation is uninitialized". It is safe for an API to provide an unspecified value, and AFAIK not safe to provide an uninitialized value.

I'll change the original PR description to weaken its claims. However, I still think this is worth considering.

@RalfJung
Copy link
Member

Yes, I am seeking to draw a distinction here between "the result of this operation is unspecified" and "the result of the operation is uninitialized". It is safe for an API to provide an unspecified value, and AFAIK not safe to provide an uninitialized value.

I don't know what you mean by "safe" here, but the usual definition of "soundness of a safe API" is certainly not satisfied by any of the suggestions you are making here. A library might define a type like struct NotBothZero(i32, i32); which carefully maintains the invariant that never both elements will be zero, and then it could provide an operation like

pub fn check(x: &NotBothZero) {
  if x.0 == 0 && x.1 == 0 { unsafe { unreachable_unchecked(); } }
}

It is impossible to make any of the functions we are considering here "safe" or "sound". The best we can hope for is to make them not cause immediate (language) UB, but that is a much weaker property. I agree this might be a good idea, but I also feel like there are a lot of misunderstandings in this thread.

(This is leaving aside the issue that "unspecified result" is IMO a meaningless term until you define more precisely from which possible set of options the result is picked. If "uninitialized" is in that set, it is surely no less harmful than the alternative. But we need not go there, it is unnecessary to even talk about "unspecified result" in this context.)

@RalfJung
Copy link
Member

So taking a step back: first of all I fully agree that this PR is "correct" in the sense that as per the documentation of mem::uninitialized, we are totally allowed to do this.

Secondly, I agree that it might be a good idea to reduce the risk of old code breaking due to incorrect use of mem::uninitialized on types like bool. It will, however, still break on types like &i32. I guess that's better than nothing, but it's not entirely satisfying.

However, as already mentioned in the subthread, I think it might be a bit early to reduce the panic: we're being rather conservative with ramping that up, so e.g. code using mem::uninitialized::<[bool; N]>() has not received a panic yet and, under this proposal, never will.

Also I think if we are doing this, it might be worth to document. People might wonder why mem::uninitialized is so much slower than they would expect. The documentation should of course be explicit that the fact that we fill the data with zeros is not a guarantee, but at the same time I think it's better and less frustrating for our users to be upfront about this.

@bstrie
Copy link
Contributor Author

bstrie commented Jul 11, 2021

I don't know what you mean by "safe" here,

My apologies, I'm being rather informal and am using "safe" not to mean "this becomes a safe function" (as I am trying to repeatedly emphasize, I'm not seeking to remove the unsafe keyword from anything here) but merely to mean that it is reasonable to expect that users will be have the ability to use this unsafe function in a safe manner; in contrast to the current implementation of mem::uninitialized, which is both unsafe and IMO unlikely to be used safely in most contexts where it currently appears. To wit, even if it is safe to use mem::uninitialized to initialize an array of MaybeUninits, I think it's likely that the vast majority of the uses of mem::uninitialized come from before MaybeUninit was even stabilized, and are probably being used to create large arrays of primitives.

I also think, with this PR, the semantics of mem::uninitialized with respect to safety will be in line with what users expected when they originally reached for mem::uninitialized. They would have expected to be required to uphold the invariant that the result should be fully initialized before it is used, and that to fail to do so would be a memory safety violation. This was the correct assumption for mem::zeroed (types that can't be zero notwithstanding--if your type can't be zero, then it can't be uninitialized anyway), but surprisingly not for mem::uninitialized. With this PR, any code that was written in such a way will be able to uphold the safety invariants that the original author expected to uphold. This is also what I mean by "safe": in such contexts, unsafe blocks that were expected to be safe (but failed due to the surprising requirements of uninitialized memory) now can be made safe retroactively.

Finally, I think it's likely (but I could be convinced otherwise) that any code still using mem::uninitialized is neither using MaybeUninit and is probably never going to upgrade to use MaybeUninit, so the effect of this PR is to reduce the amount of permanently-broken code in the wild; more code will compile and "work", although with performance implications.

I think it might be a bit early to reduce the panic:

Sure, I'm not opposed to other methods of ameliorating the brokenness of this function, but I'll reiterate my opinion that any code using mem::uninitialized by now is probably unmaintained, so the choice is between making users of that code panic, or making it be """safe""" but slower.

Also I think if we are doing this, it might be worth to document. People might wonder why mem::uninitialized is so much slower than they would expect.

Sure, I can add more language to this effect.

@RalfJung
Copy link
Member

but merely to mean that it is reasonable to expect that users will be have the ability to use this unsafe function in a safe manner; in contrast to the current implementation of mem::uninitialized, which is both unsafe and IMO unlikely to be used safely in most contexts where it currently appears.

I agree with the second part. For the first part, note that we are not changing the contract provided by this function, so all these previously incorrect uses are still incorrect. We just avoid letting this blow up in people's faces.

This is likely what you mean, but I want to emphasize that "using this unsafe function in a safe manner" still means following the documentation, not following whatever the current implementation happens to do. We are not changing the contract between mem::uninitialized and its caller, we are just deciding to be a bit less aggressive about how we exploit violations of that contract (without any commitment on our side). Basically, they are still breaking the law, but police decided to look the other way -- for now. ;)

(I'm sorry if I come across as pedantic here. The terminology around safely using unsafe code is subtle but important.)

I also think, with this PR, the semantics of mem::uninitialized with respect to safety will be in line with what users expected when they originally reached for mem::uninitialized. They would have expected to be required to uphold the invariant that the result should be fully initialized before it is used, and that to fail to do so would be a memory safety violation. This was the correct assumption for mem::zeroed (types that can't be zero notwithstanding--if your type can't be zero, then it can't be uninitialized anyway), but surprisingly not for mem::uninitialized.

I disagree. This is not the correct assumption for mem::zeroed, precisely because some types cannot be zero. It's just "less incorrect" for mem::zeroed than for mem::uninitialized, and at the same time it's "more obvious" that mem::zeroed on &T is Wrong (TM).

The way the invariants work out is exactly the same for both zeroed and uninitialized. The only difference is that zeroed produces a bit pattern that is less likely to violate a type's validity invariant than uninitialized. There is no difference in what those invariants are (they are documented here) or where they are required to be upheld. I feel like you still have the wrong mental model here, or maybe this is just a communication problem.

Finally, I think it's likely (but I could be convinced otherwise) that any code still using mem::uninitialized is neither using MaybeUninit and is probably never going to upgrade to use MaybeUninit, so the effect of this PR is to reduce the amount of permanently-broken code in the wild; more code will compile and "work", although with performance implications.

That seems plausible, and it is why I like the thrust of this PR. I just think it comes too early since we haven't yet ramped up the panics around mem::uninitialized as far as I planned to. But maybe my hope that further ramping up these panics will encourage more code to switch to MaybeUninit is naive.

@bjorn3
Copy link
Member

bjorn3 commented Jul 12, 2021

Instead of all zeros should mem::uninitialized be changed to some clearly recognizable pattern like 0xdb repeating? This is what for example some hardened memory allocators use.

/~https://github.com/jedisct1/libsodium/blob/07c2f6c053dd2aed1716444d667dcc781f14739c/src/libsodium/sodium/utils.c#L65

@bstrie
Copy link
Contributor Author

bstrie commented Jul 12, 2021

fatal: remote error: upload-pack: not our ref 9ad6e5b8f68ee4bcd85886e50b8b0a70cbb91a52
Errors during submodule fetch:
	src/tools/cargo
Error: Process completed with exit code 1.

This must surely be spurious, can a reviewer retry?

@Mark-Simulacrum
Copy link
Member

@bors retry not our ref cargo

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 12, 2021
@thomcc
Copy link
Member

thomcc commented Jul 12, 2021

I think there's a real risk that this could do more damage than it prevents in the cases that the values are not valid for an all-zeroes bitpattern (which isn't unrealistic at all). In that case the compiler backend will see us definitively initializing the value to an impossible bitpattern, rather than leaving it uninitialized (which, given that many C/C++ variables start out uninitialized, seems unlikely to be exploited in any way).

I would love in the future for us to "defuse" mem::uninitialized, but I don't really feel like s/uninitialized/zeroed/g is the right defusal (or even a way to defuse it at all). There are a plethora of types in Rust that are not allowed to be zeroed. In fact, I'd argue that zeroed is one of the most dangerous initializers in Rust, rather than the least dangerous, given that references, Box, NonZero integers... all cannot be zeroed.

In practice, I think the "defused" version of mem::uninitialized would need to consider which bitpatterns are valid in the output type, and if initialize to one of those (this is likely to be a lot of work, and requires knowledge of the exact type being output by the call). Short of something like that I don't really see this actually making anything less dangerous...


Note: I'm someone who has had their old code broken by this, and I'd love to see the situation with mem::uninitialized improved. I've called the story of happened to mem::uninitialized a violation of Rust's stability guarantees, in a way that hasn't happened to me in many other languages (both conceptually — there was the tacit promise that this API existing meant it could be used, and in practice — I literally can't run some old code of mine because of a bad mem::uninitialized in a transitive dependency that triggers early enough in the program's execution to be effectively during startup).

@bstrie
Copy link
Contributor Author

bstrie commented Jul 13, 2021

@thomcc Out of curiosity, can you tell us the name of the library that caused your code to break when these panics were added?

@thomcc
Copy link
Member

thomcc commented Jul 13, 2021

It was inside an old version of glium, I believe.

@RalfJung
Copy link
Member

RalfJung commented Jul 13, 2021

Ok, so my original assertion can be equivalently reformulated as: legacy users of mem::uninitialized were almost certainly aware that their types had validity requirements (e.g. "this type cannot be zero") and IMO can be reasonably trusted to have upheld those requirements, with the sole exception of the undef requirement, which virtually nobody understood prior to the advent of MaybeUninit.

This is a claim that I do not see enough evidence for. I think the opposite is true: legacy users assumed that there is no such thing as a validity invariant that needs to be upheld at all times, and instead the invariant would only be relevant when one matches on the data or otherwise directly inspects it.

There is evidence for my claim: there is/was a lot of legacy code (that I saw while going through crater regressions for these panics), some of it written by veteran Rust experts in the Rust 1.0 days, that uses mem::uninitialized on arbitrary user-supplied types. The moment the user called that code with a type like bool, it does not uphold validity requirements.

To be fair, we don't panic for code that uses mem::uninitialized on integer-only types, so I would not have seen such code. Nor would Miri, since it is also deliberately relaxed still about this particular validity invariant. I plan to add a flag to Miri to be more strict here so we can start exploring how much of the ecosystem would be affected by that.

@RalfJung
Copy link
Member

RalfJung commented Jul 13, 2021

I've called the story of [what] happened to mem::uninitialized a violation of Rust's stability guarantees, in a way that hasn't happened to me in many other languages (both conceptually — there was the tacit promise that this API existing meant it could be used, and in practice — I literally can't run some old code of mine because of a bad mem::uninitialized in a transitive dependency that triggers early enough in the program's execution to be effectively during startup).

Yes, I know, this is not one of Rust's moments of glory. We screwed up in providing the original API that is not compatible with how the language works elsewhere, and then it took way too long to get MaybeUninit stabilized. I still think it was the right call as the old situation was unsustainable, and the longer we would have waited the worse it would have become, but doing this quicker would have been a lot better.

If you think the logic for the panic should be changed such that old code can at least try to run again (with UB, but there's a high chance that current rustc will be fine -- though future rustc will exploit this more and more), then please speak up at #66151. I honestly expected more backlash than we actually got so far (which is one reason why I cranked up the panic so slowly).

@RalfJung
Copy link
Member

This has not been r+ed yet, retry confused the state here.
@bors r-

I'll close and reopen the PR to hopefully retrigger PR CI.

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jul 13, 2021
@RalfJung RalfJung closed this Jul 13, 2021
@RalfJung RalfJung reopened this Jul 13, 2021
@bstrie
Copy link
Contributor Author

bstrie commented Jul 14, 2021

This is a claim that I do not see enough evidence for. I think the opposite is true: legacy users assumed that there is no such thing as a validity invariant that needs to be upheld at all times, and instead the invariant would only be relevant when one matches on the data or otherwise directly inspects it.

I base my assertion in that even the earliest rough drafts of the Rustonomicon mention that merely "producing" (not inspecting) invalid values is undefined behavior. Here's a version from June 2015: /~https://github.com/rust-lang/nomicon/blob/96efb3776ddef687bb56e8ccb0af1978df677b40/intro.md#what-does-unsafe-mean .

You surely know better than I do whether or not people managed to uphold this in the wild, especially for types like bool (I seem to recall for a long time there was debate as to whether Rust should actually require bool to be either 0 or 1, but even here in this old document it mentions this as a requirement), but IME even back in those days producing a null reference was taken just as seriously as aliasing a &mut, so people were not unaware of the existence of validity invariants in general.

@RalfJung
Copy link
Member

Interesting, I did not know the Nomicon said this from the start.

This does not match what I saw in code out there. But what I saw is heavily biased by what the panic is checking for, so it is not suited to make any representative statement.

@bstrie
Copy link
Contributor Author

bstrie commented Jul 14, 2021

I can certainly imagine that people used to C might not have treated the bool validity invariant as a strict requirement, since C is so historically loose with booleans (and conversely, people might have treated the "references can't be zero" requirement with more care due to prior experience with C++). That said, this specific PR wouldn't cause a problem with bool specifically, since zero is valid for them.

This PR is, of course, still putting a lot of faith in the intrinsics::assert_zero_valid panic inside of zeroed (and I think the risk of false negatives in this check is the gist of @thomcc's concern?). I'd be interested to see #66151 explored further (#87041 ?), specifically WRT zeroed.

///
/// The reason for deprecation is that the function basically cannot be used
/// **This function is deprecated with extreme prejudice.**
/// It is replaced by [`MaybeUninit<T>`], which must be used instead of this function.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you looked at how this renders in rustdoc? The standard is one summary sentence, trying to keep it succinct so it renders on, in this case, the std::mem page. The previous one was already a teeny bit long.

@crlf0710
Copy link
Member

@bstrie Ping from triage, what's next steps here?

@crlf0710 crlf0710 added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jul 31, 2021
@bstrie
Copy link
Contributor Author

bstrie commented Jul 31, 2021

@crlf0710 That's a good question. I think for now it makes sense to withdraw this PR while we wait for #66151 to explore tightening intrinsics::assert_zero_valid. I could also imagine an even better version of this PR that uses some value other than 0 (amusingly, 1 might be the most permissive/safe value?). Furthermore, while I still think mem::uninitialized deserves to be as discouraged as we can possibly make it, I am slightly less dim on its existence now that I understand the model of uninitialized memory better and that it actually is possible to use this function safely in some contrived circumstances (let x: () = mem::uninitialized();), which means that it isn't completely ridiculous to expect a hypothetical Rust specification to specify its behavior.

@bstrie
Copy link
Contributor Author

bstrie commented Jul 31, 2021

I've filed an issue at #87675 for the general idea of "initialized" mem::uninitialized, specifically with a new idea wherein Rust leverages its understanding of niches to produce initialized values that are valid for any given type.

bors added a commit to rust-lang-ci/rust that referenced this pull request Jul 28, 2022
mem::uninitialized: mitigate many incorrect uses of this function

Alternative to rust-lang#98966: fill memory with `0x01` rather than leaving it uninit. This is definitely bitewise valid for all `bool` and nonnull types, and also those `Option<&T>` that we started putting `noundef` on. However it is still invalid for `char` and some enums, and on references the `dereferenceable` attribute is still violated, so the generated LLVM IR still has UB -- but in fewer cases, and `dereferenceable` is hopefully less likely to cause problems than clearly incorrect range annotations.

This can make using `mem::uninitialized` a lot slower, but that function has been deprecated for years and we keep telling everyone to move to `MaybeUninit` because it is basically impossible to use `mem::uninitialized` correctly. For the cases where that hasn't helped (and all the old code out there that nobody will ever update), we can at least mitigate the effect of using this API. Note that this is *not* in any way a stable guarantee -- it is still UB to call `mem::uninitialized::<bool>()`, and Miri will call it out as such.

This is somewhat similar to rust-lang#87032, which proposed to make `uninitialized` return a buffer filled with 0x00. However
- That PR also proposed to reduce the situations in which we panic, which I don't think we should do at this time.
- The 0x01 bit pattern means that nonnull requirements are satisfied, which (due to references) is the most common validity invariant.

`@5225225` I hope I am using `cfg(sanitize)` the right way; I was not sure for which ones to test here.
Cc rust-lang#66151
Fixes rust-lang#87675
workingjubilee pushed a commit to tcdi/postgrestd that referenced this pull request Sep 15, 2022
mem::uninitialized: mitigate many incorrect uses of this function

Alternative to rust-lang/rust#98966: fill memory with `0x01` rather than leaving it uninit. This is definitely bitewise valid for all `bool` and nonnull types, and also those `Option<&T>` that we started putting `noundef` on. However it is still invalid for `char` and some enums, and on references the `dereferenceable` attribute is still violated, so the generated LLVM IR still has UB -- but in fewer cases, and `dereferenceable` is hopefully less likely to cause problems than clearly incorrect range annotations.

This can make using `mem::uninitialized` a lot slower, but that function has been deprecated for years and we keep telling everyone to move to `MaybeUninit` because it is basically impossible to use `mem::uninitialized` correctly. For the cases where that hasn't helped (and all the old code out there that nobody will ever update), we can at least mitigate the effect of using this API. Note that this is *not* in any way a stable guarantee -- it is still UB to call `mem::uninitialized::<bool>()`, and Miri will call it out as such.

This is somewhat similar to rust-lang/rust#87032, which proposed to make `uninitialized` return a buffer filled with 0x00. However
- That PR also proposed to reduce the situations in which we panic, which I don't think we should do at this time.
- The 0x01 bit pattern means that nonnull requirements are satisfied, which (due to references) is the most common validity invariant.

`@5225225` I hope I am using `cfg(sanitize)` the right way; I was not sure for which ones to test here.
Cc rust-lang/rust#66151
Fixes rust-lang/rust#87675
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.