Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC2603] Extend <const> to include str and structural constants. #3161

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

eddyb
Copy link
Member

@eddyb eddyb commented Aug 10, 2021

I'm not sure what the process for amending RFCs is, but here goes nothing:

This is one of the last pieces of the v0 mangling, namely arbitrary constant values (for the full const generics feature). This has been tracked by rust-lang/rust#61486, and there's some discussion there.

The main takeaway is that the mangling is structural (ADT-like tree with integer-like leaves), matching the structural equality that type-level constants are required to follow, for soundness reasons.

Accompanying implementation PRs:

The summary of the added forms is:

  • e: str, followed by bytes encoded as two hex nibbles per byte
  • R/Q: &/&mut, followed by the pointee value
  • A...E: [...], containing any number of element values
  • T...E: (...), containing any number of field values
  • V: named variant/struct, followed by the constructor path and one of:
    • U: unit variant/struct (e.g. None)
    • T...E: tuple variant/struct (e.g. Some(...)), containing any number of field values
    • S...E: struct-like variant/struct (e.g. Foo { ... }), containing any number of (disambiguated-)identifier-prefixed (i.e. named) field values

Even if there may be constants in the future not covered by these forms, we can rely on the nominal V form to encode all sorts of pseudo-paths (while waiting for demanglers to support dedicated manglings), such as these hypothetical examples:

  • const::<SomeType>::h54723863eb99e89f (hashed constant, masquerading as unit struct)
  • core::mem::transmute::<usize, *mut T>(1) (function call, masquerading as tuple struct)

cc @michaelwoerister

@programmerjake

This comment has been minimized.

@eddyb

This comment has been minimized.

@programmerjake

This comment has been minimized.

@eddyb eddyb added the T-compiler Relevant to the compiler team, which will review and decide on the RFC. label Aug 11, 2021
@eddyb
Copy link
Member Author

eddyb commented Aug 11, 2021

Based on the RFC having been a T-compiler one, let's do an FCP like that:

@rfcbot merge

@rfcbot
Copy link
Collaborator

rfcbot commented Aug 11, 2021

Team member @eddyb has proposed to merge this. The next step is review by the rest of the tagged team members:

Concerns:

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. disposition-merge This RFC is in PFCP or FCP with a disposition to merge it. labels Aug 11, 2021
<const-data> = ["n"] {<hex-digit>} "_"
<const-fields> = "U" // X
| "T" {<const>} "E" // X(a, b, c, ...)
| "S" {<identifier> <const>} "E" // X { field: value, ... }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why U is necessary, UnitStruct and UnitStruct {} are the same constant, so U can be S {} E, the only benefit is slight compression.

I'm also not sure why identifiers for field names are necessary, fields are uniquely identifiable by their indices, so "S" {<const>} "E" should work equally well?

In other words, "S" {<const>} "E" appears to be usable for any structs and variants.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also "just" flatten the constant and only put the leaf values in, we wouldn't need any names or tuple/array distinctions and nesting.

The only reason we tell apart any of these things is being able to recover useful syntax for the users, and I somewhat assumed that to be a given, I suppose.

I'm obviously biased towards the approach I implemented, but maybe you could register a concern with rfcbot, that could get discussed by compiler team? (though I'm not sure in what kind of meeting)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say we keep the U just in order to make it consistent: we already encode () as u where we could also encode it as an empty tuple TE.

@petrochenkov
Copy link
Contributor

@rfcbot concern field names are unnecessary

(I guess it's a trade off between symbol length and readability of demangled symbols.)

@rfcbot concern unit structs are unnecessary

@eddyb
Copy link
Member Author

eddyb commented Aug 11, 2021

Nominating since we're close to being able to turn v0 on by default, and if there's spare time in the meeting, it might be good to get more discussion going and more eyes on this.

@michaelwoerister
Copy link
Member

After last weeks discussion I think the main open question is what to do with field names in struct values.

As @petrochenkov points out, from the perspective of keeping symbol names unambiguous the field names are not necessary. They would only be needed for making demangler output a bit more readable. Without them you would get

mangled: _RINvCs123_5krate3fooKVNvCs123_5krate3BarSi1Re12345678_b1_E
demangled: krate::foo<{krate::Bar { 1, "quux", true }}>

instead of

mangled: _RINvCs123_5krate3fooKVNvCs123_5krate3BarS3leni14nameRe12345678_4flagb1_E
krate::foo<{krate::Bar { len: 1, name: "quux", flag: true }}>

As the grammar is now, we unfortunately can't just make the field name optional: both <identifier> and <const> can start with an s (if the identifier has a disambiguator and the const is an i16).

That being said, I don't think we can make a decision without having numbers on how much field names cost us in terms of symbol name length/compile times/binary sizes. So my suggestion is:

  1. We merge the compiler and rustc-demangle PRs in their current form, that is, with field names.
  2. We make the new mangling scheme the default in the compiler.
  3. We implement an experimental option for the compiler to emit zero-length field names, so that we can compare the impact field names have on symbol name length.
  4. We collect numbers that allow us to make a better-informed decision. At that point we should also have a clearer picture on the performance impact of the new mangling scheme in general.
  5. Depending on the decision we make, we update this PR to either require field names or not, and merge it.

All of the above would need to happen before const generics are fully stabilized. I don't think the new mangling scheme should be blocked on this one aspect that only comes to bear when using an unstable feature.

@eddyb
Copy link
Member Author

eddyb commented Aug 19, 2021

We implement an experimental option for the compiler to emit zero-length field names, so that we can compare the impact field names have on symbol name length.

FWIW there are two ways one could do that, while staying within the already-implemented syntax:

  1. use the T...E form of <const-fields>
    • demangles as Foo(...), which may be undesirable because it doesn't match the definition of Foo
  2. use the S...E form of <const-fields>, but with zero-length identifiers (mangled just a 0)
    • demangles as Foo {...} (and we can even make the demangler print /*...*/: in that case, if we want)
    • unlike the T...E form, however, it would take up one extra byte per field

If neither of these is satisfactory, we can always come up with a different letter (than U/T/S) to indicate a <const-fields> that has braces-but-unnamed fields, or even a letter other than V, for <const>, to avoid the one extra byte that is currently spent on the choice of <const-fields>.

@apiraino
Copy link

Discussed in T-compiler meetings, here and here

@rustbot label -I-nominated

@rustbot
Copy link
Collaborator

rustbot commented Aug 19, 2021

Error: This repository is not enabled to use triagebot.
Add a triagebot.toml in the root of the master branch to enable it.

Please let @rust-lang/release know if you're having trouble with this bot.

@Aaron1011
Copy link
Member

Aaron1011 commented Aug 23, 2021

I assume that there are no plans to support unions (since it's not clear what it would mean to encode one)?

@eddyb
Copy link
Member Author

eddyb commented Aug 24, 2021

I assume that there are no plans to support unions (since it's not clear what it would mean to encode one)?

Correct, though they are disallowed by const generics themselves.

In general, you would need a way to know whether X<{a: T}> and X<{b: T}> are the same type, to allow const _: T generics, i.e. a version of a == b that has all the necessary properties for typesystem soundness (and so floats aren't supported for this reason either).

For edge cases that may eventually be supported (such as single-case unions), they can simply be mangled identically to a struct.

bors added a commit to rust-lang-ci/rust that referenced this pull request Aug 26, 2021
…woerister,oli-obk

rustc_symbol_mangling: support structural constants and &str in v0.

This PR should unblock rust-lang#85530 (except for float `const` generics, which AFAIK should've never worked).
(cc `@tmiasko` could the rust-lang#85530 (comment) failures be retried with a quick crater "subset" run of this PR + changing the default to `v0`? Just to make sure I didn't miss anything other than the floats)

The encoding is the one suggested before in e.g. rust-lang#61486 (comment), tho this PR won't by itself finish rust-lang#61486, before closing that we'd likely want to move to `@oli-obk's` "valtrees" (i.e. rust-lang#83234 and other associated work).

<hr>

**EDITs**:
1. switched unit/tuple/braced-with-named-fields `<const-fields>` prefixes from `"u"`/`"T"`/`""` to `"U"`/`"T"`/`"S"` to avoid the ambiguity reported by `@tmiasko` in rust-lang#87194 (comment).

2. `rustc-demangle` PR: rust-lang/rustc-demangle#55

3. RFC amendment PR: rust-lang/rfcs#3161
    * also removed the grammar changes included in that PR, from this description

4. added tests (temporarily using my fork of `rustc-demangle`)

<hr>

r? `@michaelwoerister`
@eddyb
Copy link
Member Author

eddyb commented Mar 2, 2022

Feedback from @Havvy elsewhere (#3224 (comment)): we should probably not be amending RFCs.

Even if we make an exception for "fixing the RFC to reflect reality as it shipped" (#3224 (comment)), we would still want to split this PR into two:

about half of it could be applied to the RFC itself in order to clarify the original grammar in more detail (we were too handwavey initially), with the rest being a proper extension

The first half would be roughly this (plus comments and <int-type>/<const-int> definitions):

-<const> = <type> <const-data>
+<const> = <int-type> <const-int>
+        | "b" <const-int>           // false, true
+        | "c" <const-int>           // '...'

Whereas the second half would be adding the new structural cases to <const> (and also "e" <const-str>).

@eddyb eddyb mentioned this pull request Jun 24, 2022
6 tasks
@apiraino
Copy link

apiraino commented May 3, 2023

Hello checking this RFC for current status: are the raised concerns (comment) still to be untangled, has there been any progress? Is there a path to get this RFC approved or do I read correctly that the general feeling is close it?

thanks @eddyb for an update (and sorry for my hand-waving comment, can't fully grasp the context here 😅 )

@eddyb
Copy link
Member Author

eddyb commented May 10, 2023

@apiraino This amendment is describing functionality that has been implemented for almost two years (but I believe is still nightly-only, simply because const generics with such types are still not stable).

I don't know if this should be the responsibility of people involved with mangling (@wesleywiser @michaelwoerister @ehuss, off the top of my head), or with const generics (@oli-obk @BoxyUwU @lcnr, maybe?).


Looking again at the comments above, #3161 (comment) seems to be describing an overall plan, and of its 5 steps, step 1. happened already, and step 2. is/will be happening soon (AIUI, anyway, I haven't kept up that well).


do I read correctly that the general feeling is close it?

At most the specifics of the current form, maybe - something or other will have to be stabilized eventually, to support const generics with such types.

@wesleywiser
Copy link
Member

I think @michaelwoerister's plan in #3161 (comment) makes a lot of sense. Reducing the length of symbols by not encoding the field names as @petrochenkov suggests is very valuable especially for space constrained environments. I think the best way to determine the impact is to implement the feature with field names and then perform measurements to see what the cost actually ends up being. We may even want to allow users to decide for themselves; some users might prefer the smaller symbols and others prefer the more detailed symbols in backtraces.

@petrochenkov would it resolve your concerns if we left whether to encode field names in the symbols as an open question to be resolved before we stabilize more complex types in generics?

@petrochenkov
Copy link
Contributor

@rfcbot resolve field names are unnecessary
@rfcbot resolve unit structs are unnecessary

@rfcbot rfcbot added final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. and removed proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. labels Sep 10, 2023
@rfcbot
Copy link
Collaborator

rfcbot commented Sep 10, 2023

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this RFC. to-announce and removed final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. labels Sep 20, 2023
@rfcbot
Copy link
Collaborator

rfcbot commented Sep 20, 2023

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

@kennytm
Copy link
Member

kennytm commented Feb 6, 2024

Is this not merged because of the merge conflict?

@ehuss
Copy link
Contributor

ehuss commented Feb 6, 2024

Sorry, this kind of fell of my plate.

Process-wise, instead of amending RFCs, this should probably be done as a PR to update the documentation at /~https://github.com/rust-lang/rust/blob/master/src/doc/rustc/src/symbol-mangling/v0.md.

However, I think part of my hesitation was that I wanted to avoid documenting unstable features. IIUC, some of these consts aren't stabilized, right? I'm a little uncertain how to handle that. Is the intent that these encodings are stable before the features themselves are stabilized? If so, we can just go ahead and document them. If not, then we might want to consider a different approach of how to document them. I would suggest that the encodings just get added to the docs whenever the corresponding const features are stabilized.

I guess, part of my uncertainty was what was the intent of this RFC?

@michaelwoerister
Copy link
Member

michaelwoerister commented Feb 7, 2024

From a pure grammar point of view, I don't see a problem making it expressive enough to enable encoding things that aren't stable yet, and then also document that. The only thing we need to make sure is that what we add to the grammar will actually be what we need for the stable version of complex const generics. But the set being proposed here looks pretty straightforward.

@apiraino
Copy link

@ehuss @michaelwoerister is the status of this RFC still uncertain? What could help to push this RFC to the finish line? I'm trying to understand if there are blockers or other things it depend on. Thanks

@apiraino
Copy link

ping @ehuss @michaelwoerister about status for this RFC. Thanks :-)

@michaelwoerister
Copy link
Member

This depends on two things, as far as I can tell:

  • Do we think that the non-trivial const generic values (as listed in the original message) are stable enough to stabilize their encoding in the mangling scheme? This is something that @rust-lang/project-const-generics could answer best.
  • Decide whether we want to emit field names or not. In a comment above I suggested comparing symbol name lengths and binary sizes with and without field names, but the problem with that idea is that we don't have a good test case because there are no real-world codebases that use #![feature(adt_const_params)] to a significant extent.
    I propose to just add another alternative to <const-fields> that is a struct without field names (as suggested by @eddyb as one option here):
    <const-fields> = "U"                            // X
                   | "T" {<const>} "E"              // X(a, b, c, ...)
                   | "S" {<const>} "E"              // X { value, ... }
                   | "N" {<identifier> <const>} "E" // X { field: value, ... }
    
    That comes at no extra cost to symbol name length and at basically no cost to mangling/demangling complexity. The compiler can then decide to use one form or the other, and we can simply change that decision after adt_const_params has been stabilized and we have real-world data about the impact.

@BoxyUwU
Copy link
Member

BoxyUwU commented Jun 26, 2024

The main point of instability with adt_const_params is the handling of padding and addresses, all of which is caused by allowing & in const parameters' types. For everything other than references I think we have pretty much hit the point where nothing is likely to change and we'd have no problem with this encoding.

When references are involved there's data that we currently don't encode that we might want to start doing at some point, these are solely issues with allowing references in const generics and do not occur otherwise. In general see rust-lang/rust#120961 for an overview of what we're not currently representing in values for const generics.

Some of the stuff we might want to start representing in const generic values would work perfectly fine in this scheme as is (e.g. padding bytes and what static a borrow is a reference to). Other stuff I have no idea how we would even handle mangling (e.g. bytes with provenance and more generally representing borrows as having their pointee be arbitrary bytes).

The stuff that wouldn't work under this scheme I don't know how we could make it work under any scheme so I think that largely does not pose an issue for this RFC. My only real point would be that if we wind up forbidding references in const parameter types, the R/Q syntax would wind up effectively unused. I don't know if that's okay or not, if it isn't we may want to remove it from the RFC ^^

edit: sorry for any ghost pings, I originally had this message contain some other information but removed it

@michaelwoerister
Copy link
Member

Thanks for the info, @BoxyUwU!

  • I think it's fine to erase information from the symbol-mangling representation of a const value as long as that can't lead to ambiguities. E.g. we don't need to be able to reconstruct the padding of a value, as long as there can't be two instances foo<X> and foo<Y> that only differ in padding.
  • provenance information might be a similar case: we probably don't need to represent that in the symbol name.
  • in general, there is always the option to add an opaque disambiguator to distinguish between things that have no difference at the surface level. That could take care of padding.
  • We could have an "opaque value" alternative for <const-fields> that is just a sufficiently wide hash of the value.
  • statics in const_refs_to_static could be represented by their path. We'd have to see how that fits into the grammar. If this is likely to be stabilized then it would be good to handle the case now.

Overall though, the fewer opaque hash values the better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disposition-merge This RFC is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this RFC. T-compiler Relevant to the compiler team, which will review and decide on the RFC. to-announce
Projects
None yet
Development

Successfully merging this pull request may close these issues.