Formally define repr(u32, i8, etc...) and repr(C) on enums with payloads #2195
I really like this, and I think it'll significantly improve FFI. A few bits of feedback:
|
There's another representation that's slightly different from the one proposed: a union of structs where each struct's first field is the discriminant. (For FFI there could be an extra union case with only the discriminant, but I believe it's valid C to assume a struct's first field is at offset 0 and access it by pointer casting.) The difference from the struct-of-tag-and-payload representation shows up in cases like this:
    #[repr(u8)]
    enum MyEnum {
        Word(u64),
        Bytes([u8; 15]),
    }
If the whole payload is a union, it gets 8-aligned and the enum takes 24 bytes; if each variant is laid out separately, it takes 16 bytes. Yet this isn't an “enum optimization” in any of the ways that cause the problems identified in this RFC: the C/C++ interface is well-defined and fairly straightforward, and the discriminant's meaning is unaffected by the bits stored in the data fields. This is, incidentally, how rustc currently handles enums when no optimizations or special cases apply; note this comment, currently in
    /// General-case enums: for each case there is a struct, and they
    /// all start with a field for the discriminant.
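The size difference described above can be checked by writing both candidate layouts out by hand. This is only a sketch: the type names (`WordVariant`, `BytesVariant`, `UnionOfStructs`, `Payload`, `TagThenUnion`) and the helper `layout_sizes` are made up for illustration, and the numbers assume a target where `u64` is 8-byte aligned (for example x86_64).

```rust
use std::mem::size_of;

// Layout A (hypothetical names): one struct per variant, each starting
// with the tag, combined in a union.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct WordVariant {
    pub tag: u8,
    pub value: u64,
}

#[repr(C)]
#[derive(Clone, Copy)]
pub struct BytesVariant {
    pub tag: u8,
    pub value: [u8; 15],
}

#[repr(C)]
pub union UnionOfStructs {
    pub word: WordVariant,
    pub bytes: BytesVariant,
}

// Layout B (hypothetical names): a tag followed by a union of the payloads.
// The union inherits the 8-byte alignment of `u64`, so 7 bytes of padding
// follow the tag.
#[repr(C)]
pub union Payload {
    pub word: u64,
    pub bytes: [u8; 15],
}

#[repr(C)]
pub struct TagThenUnion {
    pub tag: u8,
    pub payload: Payload,
}

/// Returns (size of layout A, size of layout B) in bytes.
pub fn layout_sizes() -> (usize, usize) {
    (size_of::<UnionOfStructs>(), size_of::<TagThenUnion>())
}
```

On such a target `layout_sizes()` should give 16 for the union of structs and 24 for the tag-then-union form, matching the figures in the comment.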
|
@jld That's an interesting distinction. Such a representation would always have an advantage on size, but might potentially prove more annoying to define and access from non-C languages. On balance, though, since Rust already uses that layout, using the same layout for the Can I suggest switching the RFC to that representation, observing that Rust uses that representation by default, and mentioning the "two-element struct" representation as an alternative? |
I agree with @jld here, though it may turn out that C++ compatibility concerns are the deciding factor. |
text/0000-really-tagged-unions.md
The correct C definition is essentially the same, but with the `enum class` replaced with a plain integer of the appropriate type.

In addition, it is defined for Rust programs to cast/reinterpret/transmute such an enum into the equivalent tag+union Rust definition above. Separately manipulating the tag and payload is also defined. The tag and payload need only be in a consistent/initialized state when the value is matched on (which includes Dropping it). This means that the contents of a `#[repr(X)]` enum cannot be inspected by outer enums for storage optimizations -- `Option<MyEnum>` must be larger than `MyEnum`.
This means that the contents of a #[repr(X)] enum cannot be inspected by outer enums for storage optimizations -- Option<MyEnum> must be larger than MyEnum.
When is that needed? This breaks the rule that by-value enums must always be "valid", and I don't see you using it (e.g. if dest is initialized when the function ends, then an Option<MyEnum> would still work).
Yes, having reflected on this more, I believe you're correct. I'd just like @bluss to verify that their original use-case only requires the payload to be opaque. And it would be fine for Option<Flag> to use the Flag's tag to represent None.
Having reflected on this even more, here is an interesting case where this matters:
I am recursively doing this in-place deserialization process, and at some point I need to deserialize Option<MyEnum> in place. To do so, I need to do:
    *result = Some(mem::uninitialized());
    if let Some(ref mut my_enum) = result {
        deserialize_my_enum(my_enum);
    }
That is, if we make MyEnum "super opaque" it means Option<MyEnum> effectively becomes a pseudo-repr(X) type, and we can defer initialization of the payload... except that this runs afoul of my "must be valid when you match" rule. It's possible we could create more fine-grained rules to allow it, though.
Alternatively we can declare this to be unsupported, and force people who want this to make a #[repr(u8)] OpaqueOption<T> { Some(T), None }.
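For the `OpaqueOption` alternative, one possible shape of the in-place initialization is sketched below. It uses `MaybeUninit` rather than `mem::uninitialized` (which has since been deprecated) and relies on the union-of-structs layout that `#[repr(u8)]` gives an enum with fields. The mirror struct `SomeRepr` and the function `build_some_in_place` are hypothetical names introduced here, not part of the RFC or of any crate.

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

#[repr(u8)]
#[derive(Debug, PartialEq)]
pub enum OpaqueOption<T> {
    Some(T), // implicit tag value 0
    None,    // implicit tag value 1
}

// Hypothetical mirror of the `Some` variant: the tag first, then the
// payload, laid out as a repr(C) struct.
#[repr(C)]
struct SomeRepr<T> {
    tag: u8,
    value: T,
}

/// Writes the tag first, then lets `fill` initialize the payload in place.
///
/// # Safety
/// `fill` must fully initialize the `T` behind the pointer it receives.
pub unsafe fn build_some_in_place<T>(fill: impl FnOnce(*mut T)) -> OpaqueOption<T> {
    let mut slot = MaybeUninit::<OpaqueOption<T>>::uninit();
    let mirror = slot.as_mut_ptr() as *mut SomeRepr<T>;
    unsafe {
        // The tag and the payload are written independently of each other.
        addr_of_mut!((*mirror).tag).write(0);
        fill(addr_of_mut!((*mirror).value));
        // Only now is the whole value claimed to be a valid enum.
        slot.assume_init()
    }
}
```

A caller would pass a closure that writes through the pointer, for example `|p: *mut u32| unsafe { p.write(7) }`, and get back `OpaqueOption::Some(7)`. The value is never matched on or dropped while the payload is still uninitialized, which is the condition the thread is concerned about.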
Yes it seems entirely right, it's just the payload.
It's about crate nodrop and arrayvec's use of it to store an uninitialized and/or partly uninitialized payload.
Yes, as I understand it, it's ok if the enclosing Option uses the repr(u8) enum's tag. @eddyb has in fact already implemented that in the big enum layout PR(!), and it works fine with nodrop, even if it broke a test I wrote. Because my test was buggy.
I was envisioning that a ManuallyDrop with more guarantees, a MaybeUninitialized, or stable fully generic unions (without the Copy bound) would be the way to bring the arrayvec and nodrop use case onto sound ground. (I've been discussing ideas with a few others over the past weeks (breadcrumby link).)
If the uninitialized-in-repr-enum use case gets blessed by way of this rfc I think it's great if it can do so on other merits. But crate nodrop provides a working proof of concept.
Also, the representation should be more of the form:
    type Tag = uX;
    #[repr(C)]
    union MyEnum {
        tag: Tag,
        variant_1: Variant1,
        variant_2: Variant2
    }
    #[repr(C)]
    struct Variant1 {
        tag: Tag,
        field1: Field1,
        ...
    }
    #[repr(C)]
    struct Variant2 {
        tag: Tag,
        field1: Field1,
        ...
    }
Because e.g.
    #[repr(u8)]
    pub enum X {
        Y { a: u8, b: u32 },
        Z,
    }
    fn main() {
        assert_eq!(std::mem::size_of::<X>(), 8);
    }
where the enum is represented as
Where the first field of the enum is allowed to not have the same alignment as the enum. |
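To make the union-of-structs form concrete for the `X` example, here is a hand-written mirror and a reader that goes through it. The names `YRepr`, `ZRepr`, `XRepr`, `sizes` and `y_fields` are hypothetical, and the sketch assumes the layout that `#[repr(u8)]` enums with fields ended up with (a union of `repr(C)` structs, each starting with the tag).

```rust
use std::mem::size_of;

#[repr(u8)]
pub enum X {
    Y { a: u8, b: u32 }, // implicit tag value 0
    Z,                   // implicit tag value 1
}

// Hypothetical hand-written mirror of X: every variant struct begins
// with the tag, and the union also exposes the tag on its own.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct YRepr {
    pub tag: u8,
    pub a: u8,
    pub b: u32,
}

#[repr(C)]
#[derive(Clone, Copy)]
pub struct ZRepr {
    pub tag: u8,
}

#[repr(C)]
pub union XRepr {
    pub tag: u8,
    pub y: YRepr,
    pub z: ZRepr,
}

/// Returns (size of the enum, size of the mirror union) in bytes.
pub fn sizes() -> (usize, usize) {
    (size_of::<X>(), size_of::<XRepr>())
}

/// Reads the fields of `X::Y` through the mirror, or `None` for `X::Z`.
pub fn y_fields(x: &X) -> Option<(u8, u32)> {
    // Reinterpret the enum as its mirror; both have the same size and alignment.
    let mirror = unsafe { &*(x as *const X as *const XRepr) };
    unsafe {
        // The tag sits at offset 0 in every union member, so it can be read first.
        if mirror.tag == 0 {
            Some((mirror.y.a, mirror.y.b))
        } else {
            None
        }
    }
}
```

Here `a` lands directly after the tag at offset 1 and `b` at offset 4, which is how the whole enum fits in 8 bytes; a tag followed by a union of payloads would need 12.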
@jld @arielb1 @pythonesque That (union of variants each containing the tag and their payload) is indeed what Rust enums use, as an optimization, but it's not, AFAIK, typically used in C or C++, and harder to work with in those languages than "tag followed by union of payloads". I think that minimizing the padding through this method, just like field reordering, shouldn't apply to C-compatible types as I do not see the optimization benefits justifying the complexity. |
Would it be reasonable to allow syntax like |
@glaebhoerl I'm pretty sure that's how it already works (except for the union-of-(tag, payload) vs (tag, union-of-payloads)). |
text/0000-really-tagged-unions.md
* Probably should be a field to avoid conflicts with user-defined methods
* Might need `#[repr(pub(X))]` for API design reasons
* Compiler-generated definitions for the Repr types
* With inherent type aliases on the enum? (`MyEnum::Tag`, `MyEnum::Payload`, `MyEnum::PayloadA`, etc.)
I think that these should probably be implemented separately from this RFC. I'd be inclined to instead allow the ecosystem to generate these declarations using a custom derive like #[derive(EnumRepr)], which would define a my_enum_repr (or similar) module with Tag, Payload, and structs for each payload.
Right now I don't think we have the tools to expose these in a nice way, especially if we consider a theoretical enum with a variant called Tag or Payload (this change would be breaking for those enums).
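As a rough idea of what such a derive might emit, the module below is written out by hand for a small `#[repr(C, u8)]` enum. No `EnumRepr` derive is assumed to exist; `MyEnum`'s variants, the module contents and `into_repr` are all illustrative names, and the layout assumed is the tag-followed-by-union form that `repr(C)` enums with fields use.

```rust
#[repr(C, u8)]
pub enum MyEnum {
    A(u32),
    B { x: u8, y: i16 },
    C,
}

// Hand-written stand-in for what a hypothetical #[derive(EnumRepr)]
// could generate for MyEnum.
pub mod my_enum_repr {
    #[repr(u8)]
    #[derive(Clone, Copy, Debug, PartialEq)]
    pub enum Tag {
        A,
        B,
        C,
    }

    #[repr(C)]
    #[derive(Clone, Copy)]
    pub struct PayloadA(pub u32);

    #[repr(C)]
    #[derive(Clone, Copy)]
    pub struct PayloadB {
        pub x: u8,
        pub y: i16,
    }

    // Variant C carries no data, so it contributes nothing to the union.
    #[repr(C)]
    #[derive(Clone, Copy)]
    pub union Payload {
        pub a: PayloadA,
        pub b: PayloadB,
    }

    #[repr(C)]
    #[derive(Clone, Copy)]
    pub struct Repr {
        pub tag: Tag,
        pub payload: Payload,
    }
}

/// Reinterprets the enum as its explicit tag + payload form.
pub fn into_repr(value: MyEnum) -> my_enum_repr::Repr {
    // Both types are 8 bytes here (tag, padding, 4-byte union), so the
    // transmute is accepted by the compiler's size check.
    unsafe { std::mem::transmute::<MyEnum, my_enum_repr::Repr>(value) }
}
```

Keeping the generated items in a separate module, as the comment suggests, sidesteps the name clash with a variant that happens to be called `Tag` or `Payload`.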
Yes, I was willing to punt on these since serde and cbindgen could both generate these automatically if need be.
This sounds partly like #1450.
I'm worried that this has limited usefulness without the ability to define untagged unions with non- |
text/0000-really-tagged-unions.md
# Drawbacks
[drawbacks]: #drawbacks

There aren't really any drawbacks: we already decided to do this. This is just making it official.
Saying "it has been decided already" is neither a good attitude towards the RFC process, nor is it a substitute for a drawback section!
Would people like me to decouple the proposal so |
@gankro That sounds reasonable, though it needs very careful documentation. And I don't think either one should inhibit optimization of |
Note that the Rust And I believe C doesn't actually care about accessing unions in weird ways, so it's also valid there. So the only reason to care about the |
Yeah, ergonomically having separate tag and actual value structs is more convenient. Otherwise you will have an extra wrapper struct for each union variant which already has a nested named struct type, and that |
I have significantly updated the contents of the RFC based on feedback:
In addition, the Drawbacks section has been properly filled out, and two Real "unresolved questions" have been added:
|
Just to reconfirm: this doesn't impose similar On one hand, such restrictions would help with "it's very bad" note in the parsing example, but on another, it might be not a sufficient reason to limit the usefulness of tagged enums (and it's a breaking change). |
This RFC only defines the layout. So you won't be able to write out a MyEnumRepr type for non-Copy payloads in Rust today, but when/if unions remove that restriction you will be able to. (and you can do whatever in C(++)) |
@gankro Right, what I wanted to reconfirm is that this RFC doesn't impose additional restriction on variant contents. And since it doesn't, that "it would be very bad if we panicked past this point" comment seems a bit excessive, since for |
This specific enum isn't the best example of it, but e.g. |
Ah yes, indeed, but then the problem is much more generic than this proposal. |
…nd 🐉🐲 (from servo:derive-all-the-things); r=emilio We rely on rust-lang/rfcs#2195 to make some enums share the same discriminants by definition. This fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1428285. Source-Repo: https://github.com/servo/servo Source-Revision: 9faf0cdce50379dfb8125d7a9c913babd56382e2 UltraBlame original commit: 8bd0a2507ccbb5fad4588e43bbca20eead87a2bb
Has this been implemented? There is no tracking issue listed. |
Long ago, yes. There was no issue, I guess (?), because it didn't need change in implementation. |
I see, thanks. |
The reference for
However, according to this RFC (which is linked in the nomicon section on Am I missing something? Like, did I make a mistake in my example or misread the RFC or was the behavior updated in a later RFC? Or does this behavior actually disagree with the RFC? |
Unfortunately, it is risky to treat RFCs as authoritative, because they are not always kept up to date as things evolve. I do not know what the intention here is. It sounds like @zstewar1 has found a discrepancy that, as far as I can tell, has not been explicitly discussed. The section of the reference linked by @zstewar1 was changed in rust-lang/reference#879 as part of updating the reference to reflect #2195, so the text written there has been reviewed by @Gankra ... In any case, I think the best thing would be to turn the example you have come up with into an explicit issue on https://github.com/rust-lang/rust, and tag it as T-lang (and maybe also with unsafe-code-guidelines? I'm not sure who besides T-lang should own the definition of repr(C) for things like this, i.e. enums with payloads). |
Well, that's not quite true: the distinction under description certainly sounds a lot like the two models that were discussed by @jld @arielb1 @eddyb and @Gankra way up here:
I guess I should look at the RFC text itself to try to puzzle where the discussion actually landed. |
@zstewar1 isn't this text from the RFC consistent with what the reference says?
|
@zstewar1 and now that I've read and re-read the relevant sections a few times, I think that the problem here may be in how you read the RFC? In particular, the text here that you wrote:
From what I can tell, the "union containing structs whose first field is the enum tag" interpretation is only applied to |
… r=jieyouxu Update E0517 message to reflect RFC 2195. E0517 occurs when a `#[repr(..)]` attribute is placed on an unsupported item. Currently, the explanation of the error implies that `#[repr(u*/i*)]` cannot be placed on fieldful enums, which is no longer the case since [RFC 2195](rust-lang/rfcs#2195) was [stabilized](rust-lang#60553), which allows placing `#[repr(u*/i*)]` and/or `#[repr(C)]` on fieldful enums to produce a defined layout. This PR doesn't (currently) add a description of the semantics of placing `#[repr(u*/i*)]` on a fieldful enum to the error explanation, it just removes the claims/implications that it is not allowed.
Rollup merge of rust-lang#128795 - zachs18:e0517-update-for-rfc-2195, r=jieyouxu Update E0517 message to reflect RFC 2195. E0517 occurs when a `#[repr(..)]` attribute is placed on an unsupported item. Currently, the explanation of the error implies that `#[repr(u*/i*)]` cannot be placed on fieldful enums, which is no longer the case since [RFC 2195](rust-lang/rfcs#2195) was [stabilized](rust-lang#60553), which allows placing `#[repr(u*/i*)]` and/or `#[repr(C)]` on fieldful enums to produce a defined layout. This PR doesn't (currently) add a description of the semantics of placing `#[repr(u*/i*)]` on a fieldful enum to the error explanation, it just removes the claims/implications that it is not allowed.
Formally define the enum #[repr(u32, i8, etc..)] and #[repr(C)] attributes to force a non-C-like enum to have defined layouts. This serves two purposes: allowing low-level Rust code to independently initialize the tag and payload, and allowing C(++) to safely manipulate these types.
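As a small illustration of the attributes in use, the sketch below applies both forms to the same variants and reads the tag the way foreign code would, from the first byte of the value. The enum names `Compact` and `CStyle` and the helper functions are hypothetical, and the sizes mentioned afterwards assume a target where `u64` is 8-byte aligned.

```rust
use std::mem::size_of;

// Primitive repr only: union of per-variant structs, each starting with the tag.
#[repr(u8)]
pub enum Compact {
    Word(u64),
    Bytes([u8; 15]),
}

// repr(C) combined with a primitive repr: a u8 tag followed by a union
// of the payloads.
#[repr(C, u8)]
pub enum CStyle {
    Word(u64),
    Bytes([u8; 15]),
}

/// Reads the tag of a `Compact` from its first byte.
pub fn compact_tag(value: &Compact) -> u8 {
    unsafe { *(value as *const Compact as *const u8) }
}

/// Reads the tag of a `CStyle` from its first byte.
pub fn c_style_tag(value: &CStyle) -> u8 {
    unsafe { *(value as *const CStyle as *const u8) }
}

/// Returns (size of `Compact`, size of `CStyle`) in bytes.
pub fn sizes() -> (usize, usize) {
    (size_of::<Compact>(), size_of::<CStyle>())
}
```

In both forms the tag is at offset 0 and variants are numbered from 0 in declaration order, so `compact_tag(&Compact::Bytes([0; 15]))` reads 1 and `c_style_tag(&CStyle::Word(5))` reads 0. On an x86_64-like target the two enums should come out at 16 and 24 bytes respectively.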