unions #1444
A union may have trait implementations, using the same syntax as a struct.
The compiler should warn if a union field has a type that implements the `Drop`
Should this be a warning or an error? I assume that the destructor of the field would not run when the union is dropped, right?
I would prefer to make it an error, yes. However, Rust does not consider leaks or failing to run a destructor unsafe behavior, per the discussion that occurred around scoped threads. See the documentation of `std::mem::forget`.
So, I assumed that people would object to making this an error. If not, then I can quite happily change this.
My preference would be to forbid Drop types for now. We can always change it to allow them later if there turn out to be compelling use cases.
Done.
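To make the point about destructors concrete, here is a minimal sketch in present-day Rust, which postdates this thread. It assumes the `ManuallyDrop` wrapper that current compilers require for a union field whose type has a destructor; `Noisy`, `Slot`, and `drop_counts` are hypothetical names used only for illustration.

```rust
use std::mem::ManuallyDrop;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many times `Noisy::drop` has run.
static DROPS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical type with an observable destructor.
struct Noisy;

impl Drop for Noisy {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

// Hypothetical union. The `ManuallyDrop` wrapper is an assumption based on
// current Rust, not something this thread specifies.
#[allow(dead_code)]
union Slot {
    noisy: ManuallyDrop<Noisy>,
    raw: u8,
}

/// Returns the cumulative number of destructor runs observed after
/// (1) forgetting a value, (2) dropping a union that holds one, and
/// (3) explicitly dropping the union's field.
fn drop_counts() -> (usize, usize, usize) {
    let before = DROPS.load(Ordering::SeqCst);

    // Leaking a value is safe: no destructor runs.
    std::mem::forget(Noisy);
    let after_forget = DROPS.load(Ordering::SeqCst) - before;

    // Dropping the union does not run the destructor of any field.
    {
        let _slot = Slot { noisy: ManuallyDrop::new(Noisy) };
    }
    let after_union = DROPS.load(Ordering::SeqCst) - before;

    // Only an explicit (unsafe) drop runs the field's destructor.
    let mut slot = Slot { noisy: ManuallyDrop::new(Noisy) };
    unsafe { ManuallyDrop::drop(&mut slot.noisy) };
    let after_manual = DROPS.load(Ordering::SeqCst) - before;

    (after_forget, after_union, after_manual)
}
```

The function separates the two safe ways of skipping a destructor (forgetting a value, dropping the enclosing union) from the one unsafe operation that actually runs it.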
It might be worth summarising some of the discussion from the internals thread - there is a lot of it and it's not easy to follow the threads of conversation. I strongly agree that we should support untagged unions in Rust. However, I think that Unions as enums works particularly nicely if we are able to use variants as types (which may well be a long way off or may never happen, sadly). In that case only the downcast from the enum type to the variant type has to be unsafe (which it would be for any enum, I imagine) and then other use of the variant can be in safe code. In this case, the only difference for repr(union)/unsafe enums is that you can't match them.
See the mention of
I can certainly see the argument for that, given that Rust enums represent tagged unions. However, modeling untagged unions on enums produces some syntactic challenges. How do you access a field of a union? Enums normally only support pattern-matching syntax; since the pattern-matching requires unsafe code, pulling out a field F would require something like this:
I suspect such syntax would also drive people to include more code in the unsafe block than necessary. By contrast, field access syntax would simplify that to
As discussed in the rust-internals thread and mentioned in the alternatives section of this RFC, you could potentially support struct field access syntax with An
Writing to fields seems similarly more complicated with enums. As a minor additional nit, Rust warns by default for enum constructors that start with a lowercase character; many FFI interfaces would end up needing to disable those warnings.
I think the case of defining an inline structure would work better with an RFC for anonymous struct and union types; I'd be quite happy to write such an RFC as well. Many FFI interfaces will want those anyway, for the common case of a struct containing an anonymous union. However, I don't think that should form part of this RFC; I would suggest a followup after resolving this one. In the meantime, it seems simple enough to define a struct (or tuple struct) and make that a field of the union.
All that said, I could live with
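The contrast between pattern matching and field access can be sketched in present-day Rust, where both forms exist for unions. This is an illustration under that assumption, not the code elided from the comment above; `IntOrFloat` and the three functions are hypothetical names.

```rust
// Hypothetical union used only to compare the two access styles.
#[repr(C)]
#[derive(Clone, Copy)]
union IntOrFloat {
    i: u32,
    f: f32,
}

/// Reads the `i` field with field access syntax; only the read is unsafe.
fn bits_by_field(x: f32) -> u32 {
    let u = IntOrFloat { f: x };
    // Every bit pattern is a valid u32, so this reinterpretation is sound.
    unsafe { u.i }
}

/// Reads the same field with pattern-matching syntax; the whole match
/// has to sit inside the unsafe block.
fn bits_by_pattern(x: f32) -> u32 {
    let u = IntOrFloat { f: x };
    unsafe {
        match u {
            IntOrFloat { i } => i,
        }
    }
}

/// Writes one field and reads the other. In current Rust, writing a
/// `Copy` field needs no unsafe block; only the read does.
fn float_from_bits(bits: u32) -> f32 {
    let mut u = IntOrFloat { f: 0.0 };
    u.i = bits;
    unsafe { u.f }
}
```

With field access, the unsafe block can be as small as a single expression, which is the ergonomic argument the comment makes.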
As far as I know, I've captured all the major threads of discussion (including alternatives raised and the reasons for them) in the alternatives section. If I've missed one, I can certainly add it. The largest discussion was between
If a union contains multiple fields of different sizes, assigning to a field smaller than the entire union must not change the memory of the union outside that field.
Why doesn't the memory of the rest simply become undef?
In particular, what happens here:
untagged_union X {
    a: u8,
    b: u16,
}
let mut x = X { b: 1 };
x.a = 1;
let y = x;
Does the compiler have to copy the unused part?
Copying a union into some other variable must always copy the entire memory of the union, unless the compiler can prove that nothing reads from other fields of the destination, in which case it could potentially elide moving some data around.
For instance, if you pass `y` to an FFI function, Rust can't know what parts of the union you intend to read, so it needs to copy the whole thing. On the other hand, if you pass `y` to a Rust function, and rustc can see that the called function only reads `y.a`, never `y.b`, then rustc could potentially elide the copy.
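The behavior being asked about can be sketched in present-day Rust, using the `union` keyword that was eventually adopted rather than `untagged_union`. The `#[repr(C)]` attribute and the byte values are assumptions chosen so that the result does not depend on endianness; `copy_after_partial_write` is a hypothetical name.

```rust
// Same shape as the union in the question, with a C-compatible layout so
// that both fields are guaranteed to start at offset 0.
#[repr(C)]
#[derive(Clone, Copy)]
union X {
    a: u8,
    b: u16,
}

/// Writes the large field, overwrites only the small one, copies the
/// union, and returns the copy's bytes in memory order.
fn copy_after_partial_write() -> [u8; 2] {
    // Memory is [0x11, 0x22] regardless of endianness.
    let mut x = X { b: u16::from_ne_bytes([0x11, 0x22]) };

    // Touches only the first byte; the second byte keeps 0x22.
    x.a = 0xFF;

    // The copy carries the whole union, including the byte `a` never wrote.
    let y = x;
    unsafe { y.b.to_ne_bytes() }
}
```

Here the copy cannot be narrowed to `a` alone, because the function goes on to read `y.b`.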
Copying a union into some other variable must always copy the entire memory of the union
Why? Simply make accessing any variant but the one that was written to last undefined.
That would break many valid usages. For instance, consider a union of a `common_header` struct and several structs that start with that header; writing to `common_header` should not invalidate the rest of the data. Ditto for many other common patterns used with unions.
Note that factoring the common header out of the union does not solve the problem. For instance, you might have different types of common headers used for subsets of other fields. And in general, moving fields into or out of a union could require platform-specific understanding of size and alignment.
It would be good to have some examples of such code to see how unions must behave.
Trivial example:
struct S {
    header: COMMON_HEADER,
    otherfields: SOME_TYPE,
}
untagged_union U {
    header: COMMON_HEADER,
    s: S,
    // ...
}
Writing to `u.header` (or fields of `u.header`) should not invalidate `u.s` and in particular `u.s.otherfields`.
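A runnable version of this trivial example might look as follows in present-day Rust. The concrete field types stand in for `COMMON_HEADER` and `SOME_TYPE`, which the comment leaves unspecified, so `CommonHeader`, its fields, and `rewrite_header` are assumptions made for illustration.

```rust
// Stand-in for COMMON_HEADER; the two fields are hypothetical.
#[repr(C)]
#[derive(Clone, Copy)]
struct CommonHeader {
    kind: u32,
    len: u32,
}

// A struct that starts with the common header.
#[repr(C)]
#[derive(Clone, Copy)]
struct S {
    header: CommonHeader,
    otherfields: u64, // stand-in for SOME_TYPE
}

#[repr(C)]
#[derive(Clone, Copy)]
union U {
    header: CommonHeader,
    s: S,
}

/// Initializes the union through `s`, rewrites only `header`, and then
/// reads everything back through `s`.
fn rewrite_header() -> (u32, u32, u64) {
    let mut u = U {
        s: S {
            header: CommonHeader { kind: 1, len: 16 },
            otherfields: 0xDEAD_BEEF,
        },
    };

    // Overwrites the bytes of the header and nothing else.
    u.header = CommonHeader { kind: 2, len: 16 };

    // `otherfields` lies outside the header and is still intact.
    unsafe { (u.s.header.kind, u.s.header.len, u.s.otherfields) }
}
```

The new header is visible through `u.s.header`, while `u.s.otherfields` keeps the value it was initialized with.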
I mean open source C code from well known projects.
As I mentioned in my other reply, MSVC does support writing to one variant and reading from another, which means that writing to one variant does not invalidate the non-overlapping bytes of other variants. So regardless of what the C standard dictates, we'd have to support this case on Windows at the very least, and I'm sure other major C compilers behave similarly.
@mahkoh ACPICA includes a union with that exact pattern; see "union acpi_object" and ACPI_OBJECT_TYPE in https://github.com/acpica/acpica/blob/master/source/include/actypes.h .
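The general shape of that pattern, a union whose variants all begin with the same type tag, can be sketched in present-day Rust. This is not a transcription of the ACPICA header: every name, field, and constant here is hypothetical and chosen only to illustrate the idiom.

```rust
// Hypothetical tag values.
const KIND_INTEGER: u32 = 1;
const KIND_BUFFER: u32 = 2;

// Each variant struct starts with the same tag field.
#[repr(C)]
#[derive(Clone, Copy)]
struct IntegerObject {
    kind: u32,
    value: u64,
}

#[repr(C)]
#[derive(Clone, Copy)]
struct BufferObject {
    kind: u32,
    length: u32,
}

// The union also exposes the tag on its own, overlapping the tag at the
// start of every variant.
#[repr(C)]
#[derive(Clone, Copy)]
union Object {
    kind: u32,
    integer: IntegerObject,
    buffer: BufferObject,
}

/// Reads the shared tag first, then reads only the matching variant.
fn describe(obj: &Object) -> String {
    unsafe {
        match obj.kind {
            KIND_INTEGER => format!("integer {}", obj.integer.value),
            KIND_BUFFER => format!("buffer of {} bytes", obj.buffer.length),
            other => format!("unknown kind {}", other),
        }
    }
}
```

Reading `obj.kind` works no matter which variant was written, because with `#[repr(C)]` every variant places its tag at offset 0.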
I would rather have tuple structs with a
I am desperate to have untagged unions in some form to make my life easier, and I support this RFC being accepted and implemented as soon as possible.
Since
@nagisa For FFI purposes, we need named fields. A tuple version of
@joshtriplett that was more of an “instead of” comment. IMHO you should not expose these untagged unions to safe Rust code anyway, so either way is fine, I guess, but I’m not sold on the named fields argument. Also, as an additional point (presumably a negative one), untagged unions are something @graydon said he was happy to see Rust 1.0 ship without. All in all, my general opinion is that we should have some easier way to produce opaque untagged unions (essentially an opaque struct occupying as much space as necessary), perhaps implemented as a macro; but not the full-blown way to do (and abuse) untagged unions.
@nagisa Macro solutions are what I already use for unions, and they're a pain to write and use and ensure that they are correct.
The @graydon post definitely suggests
(The library I'm to interface with has essentially a tagged
I skimmed the conversation but don't feel I can really summarize it. Still, here are some posts at least that I found interesting:
I've not yet read the final RFC to see where it ended up though!
Microsoft uses unions in two ways.
This is covered by footnote 95.
@mahkoh Thanks for looking up the exact reference in the C standard! That's exactly the behavior I'd expect in Rust as well.
As I was mentioned here, some opinions (caveat: still not a core-team member, just opinions):
@graydon I have no objection whatsoever to switching back to
Didn't assume you did 😄 Just. Hrm. Sigh. Re-reading my words. I'm sorry, I need to figure out some way of participating in this community that doesn't come down like a ton of bricks / overstate my case all the time. I tried rewriting that 4 times and even still it's too pushy. Bleh. I realize my words carry more weight than they should. I haven't contributed a line of code in over 2 years now!
@graydon: I didn't assume you assumed I did. :) I'm just trying to reiterate that I really don't care what color the declaration-syntax bikeshed is painted, just the syntax and semantics for usage. Thus far, that declaration syntax seems like the bit that has produced the largest arguments. I appreciated your comment, and I found it a compelling argument. The only thing making me hesitate to change the RFC on the basis of that argument is that I've also received one vehement complaint (via IRC) against any syntax that uses
I don't think empty unions should act that way; I would suggest that
Agreed.
Except
And
Specifically:
Instantiating
An n-element union can be instantiated in n different ways by specifying one of its n fields.
A zero-element union can be instantiated in zero different ways. It's statically impossible to create one.
Reading/Writing
An n-element union can be read/written in n different ways by accessing one of its n fields.
A zero-element union can never be read/written.
Pattern matching
The RFC doesn't specify whether matching zero elements is allowed. All the examples show matching on a single element, which means matching on a zero-element union is impossible. The unanswered questions section asks whether we should allow matching on a number of elements other than one. If so then
Representation
The RFC leaves representation open, but let's assume a union can be thought of as a chunk of memory with a size and alignment equal to the max size and max alignment of its elements.
The max of the empty set is the identity of the max operation on two elements, i.e. negative infinity. This is, conceptually, the size and alignment of uninhabited types.
Possible states
The number of possible states of a union is, at most, the sum of the number of possible states of its elements. Of course some of these states may overlap, but this at least gives us an upper bound.
A zero-element union can only be in one of at most zero possible states. Therefore it is uninhabited.
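The representation assumption for non-empty unions can be checked in present-day Rust with `size_of` and `align_of`. This is a sketch assuming a `#[repr(C)]` union; `Mixed` and `layout_of_mixed` are hypothetical names. One refinement to the "max size" model: the size is the largest field size rounded up to a multiple of the alignment.

```rust
use std::mem::{align_of, size_of};

// Hypothetical union whose largest field (6 bytes) is not the most
// strictly aligned one.
#[repr(C)]
#[allow(dead_code)]
union Mixed {
    byte: u8,
    word: u32,
    pair: [u16; 3],
}

/// Returns (size, alignment) of `Mixed` on the current target.
fn layout_of_mixed() -> (usize, usize) {
    (size_of::<Mixed>(), align_of::<Mixed>())
}
```

On a target where `u32` is 4-byte aligned, this would typically report a size of 8 and an alignment of 4: the 6-byte array sets the minimum size, and the alignment of `word` rounds it up.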
I wish there were a reaction icon to indicate "nice comment" without necessarily implying "I agree".
#1444 (comment) is still strongly based on the "enum for which we don't know the discriminant" interpretation of union; it can be easily rewritten on the basis of the "a struct with overlapping zero-offset fields" interpretation to "prove" that the opposite behavior is correct. For example, the first statement, "An n-element union can be instantiated in n different ways by specifying one of its n fields.", is incorrect in the "struct" interpretation, in which the union can be instantiated by providing a "sufficient number" of fields, which is 0 for empty unions.
So the two possible rules look like:
The former rule seems far more elegant. I can see the argument that the struct interpretation would work, but it seems clearly uglier. (Note that for the latter rule you cannot say something like "at most one field value", because for non-empty unions you must provide a field. Well, you can say "sufficient number" like @petrochenkov, but then you have to define what that means, which is what my parenthetical in the rule is doing.)
@joshtriplett @canndrew Perhaps one of you could open an issue to discuss empty unions? I fear that discussion on an already merged PR will be ignored or lost (I only noticed my ping because I was clearing out an email folder).
@nrc I don't feel strongly about it one way or another, and I don't have a use case for empty unions. However, I'd be happy to review an RFC or issue about this, or to discuss it further with anyone who does have a use case.
I was looking at the The real problem here though is that unions are implemented using
@canndrew That will never be true for unions though, because
Until the definition of
struct EmptyStruct {
    x: !,
    y: u32,
}
Then this union would get mis-detected as empty:
union NonEmptyUnion {
    x: !,
    y: u32,
}
And yes, we could check the
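The distinction between a struct and a union with an uninhabited field can be sketched on stable Rust. Because `!` is not available as a field type on stable, this example substitutes `std::convert::Infallible`, which is also uninhabited; that substitution, the `#[repr(C)]` attribute, and the function name are assumptions made for illustration.

```rust
use std::convert::Infallible;

// Mirrors `NonEmptyUnion` from the comment, with `Infallible` standing
// in for `!`.
#[repr(C)]
#[derive(Clone, Copy)]
#[allow(dead_code)]
union NonEmptyUnion {
    x: Infallible,
    y: u32,
}

/// Builds the union through its inhabited field and reads it back.
/// No value of the uninhabited field is ever needed to do so.
fn make_and_read(value: u32) -> u32 {
    let u = NonEmptyUnion { y: value };
    unsafe { u.y }
}
```

A struct with the same two fields could never be constructed, since it would need a value for `x`; the union only ever needs one of its fields.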
@canndrew DRY prevails, so checking the
Btw the RFC says that the feature is
True. It'd be nice to either fix rustc to use the feature name
@joshtriplett
Oh, and I changed the lint name to match conventions too, this is more important.
That seems fine. I do think we should update the RFC with those two changes. Do you want to write that patch or should I?
@joshtriplett
Correct pull request URL in RFC #1444
RFC: native C-compatible unions via contextually recognized keyword `union`
EDIT: After extensive discussion, and grammar experiments by @nikomatsakis to verify feasibility, this RFC and pull request now proposes recognizing `union` as a "contextual keyword", allowing `union` to introduce a union declaration while not breaking any existing code that uses `union` as an identifier.
As discussed in the alternatives section, proposals for unions in Rust have extensively explored possible variations on declaration syntax, including longer keywords (`untagged_union`), built-in syntax macros (`union!`), compound keywords (`unsafe union`), pragmas (`#[repr(union)] struct`), and combinations of existing keywords (`unsafe enum`).
Rendered
Discussion on rust-internals
(edited by @nrc to add old title)
(edited by @mbrubeck to link to final rendered version)