From 4a705f99e5cfd2c16260b060cafef8a3649b9313 Mon Sep 17 00:00:00 2001 From: Boxy Date: Mon, 27 May 2024 22:03:01 +0100 Subject: [PATCH] Rewrite the "representing types" section to be more comprehensive --- src/SUMMARY.md | 13 ++- src/bound-vars-and-params.md | 59 ---------- src/generic_arguments.md | 50 -------- src/generics.md | 144 ------------------------ src/ty-fold.md | 7 +- src/ty.md | 82 +------------- src/ty_module/binders.md | 52 +++++++++ src/ty_module/early_binder.md | 76 +++++++++++++ src/ty_module/generic_arguments.md | 126 +++++++++++++++++++++ src/ty_module/instantiating_binders.md | 142 +++++++++++++++++++++++ src/ty_module/param_ty_const_regions.md | 99 ++++++++++++++++ 11 files changed, 508 insertions(+), 342 deletions(-) delete mode 100644 src/bound-vars-and-params.md delete mode 100644 src/generic_arguments.md delete mode 100644 src/generics.md create mode 100644 src/ty_module/binders.md create mode 100644 src/ty_module/early_binder.md create mode 100644 src/ty_module/generic_arguments.md create mode 100644 src/ty_module/instantiating_binders.md create mode 100644 src/ty_module/param_ty_const_regions.md diff --git a/src/SUMMARY.md b/src/SUMMARY.md index a2d287e6bcb40..8d9f94653eda4 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -119,19 +119,20 @@ - [Early vs Late bound parameters](./early-late-bound-params/early-late-bound-summary.md) - [Implementation nuances of early/late bound parameters](./early-late-bound-params/early-late-bound-implementation-nuances.md) - [Interactions with turbofishing](./early-late-bound-params/turbofishing-and-early-late-bound.md) - - [`TypeFolder` and `TypeFoldable`](./ty-fold.md) - - [Generic arguments](./generic_arguments.md) +- [The `ty` module: representing types](./ty.md) + - [ADTs and Generic Arguments](./ty_module/generic_arguments.md) + - [Parameter types/consts/regions](./ty_module/param_ty_const_regions.md) + - [`EarlyBinder` and instantiating parameters](./ty_module/early_binder.md) + - [`Binder` and 
Higher ranked regions](./ty_module/binders.md) + - [Instantiating binders](./ty_module/instantiating_binders.md) - [Constants in the type system](./constants.md) - - [Bound vars and Parameters](./bound-vars-and-params.md) +- [`TypeFolder` and `TypeFoldable`](./ty-fold.md) - [Parameter Environments](./param_env/param_env_summary.md) - [What is it?](./param_env/param_env_what_is_it.md) - [How are `ParamEnv`'s constructed internally](./param_env/param_env_construction_internals.md) - [Which `ParamEnv` do I use?](./param_env/param_env_acquisition.md) - [Type inference](./type-inference.md) - [Trait solving](./traits/resolution.md) - - [Early and Late Bound Parameter Definitions](./early-late-bound-params/early-late-bound-summary.md) - - [Implementation nuances of early/late bound parameters](./early-late-bound-params/early-late-bound-implementation-nuances.md) - - [Interactions with turbofishing](./early-late-bound-params/turbofishing-and-early-late-bound.md) - [Higher-ranked trait bounds](./traits/hrtb.md) - [Caching subtleties](./traits/caching.md) - [Implied bounds](./traits/implied-bounds.md) diff --git a/src/bound-vars-and-params.md b/src/bound-vars-and-params.md deleted file mode 100644 index bb6b27c5ff1df..0000000000000 --- a/src/bound-vars-and-params.md +++ /dev/null @@ -1,59 +0,0 @@ -# Bound vars and parameters - -## Early-bound parameters - -Early-bound parameters in rustc are identified by an index, stored in the -[`ParamTy`] struct for types or the [`EarlyParamRegion`] struct for lifetimes. -The index counts from the outermost declaration in scope. This means that as you -add more binders inside, the index doesn't change. - -For example, - -```rust,ignore -trait Foo { - type Bar = (Self, T, U); -} -``` - -Here, the type `(Self, T, U)` would be `($0, $1, $2)`, where `$N` means a -[`ParamTy`] with the index of `N`. - -In rustc, the [`Generics`] structure carries this information. 
So the -[`Generics`] for `Bar` above would be just like for `U` and would indicate the -'parent' generics of `Foo`, which declares `Self` and `T`. You can read more -in [this chapter](./generics.md). - -[`ParamTy`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.ParamTy.html -[`EarlyParamRegion`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/region/struct.EarlyParamRegion.html -[`Generics`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Generics.html - -## Late-bound parameters - -Late-bound parameters in `rustc` are handled differently. We indicate their -presence by a [`Binder`] type. The [`Binder`] doesn't know how many variables -there are at that binding level. This can only be determined by walking the -type itself and collecting them. So a type like `for<'a, 'b> ('a, 'b)` would be -`for (^0.a, ^0.b)`. Here, we just write `for` because we don't know the names -of the things bound within. - -Moreover, a reference to a late-bound lifetime is written `^0.a`: - -- The `0` is the index; it identifies that this lifetime is bound in the - innermost binder (the `for`). -- The `a` is the "name"; late-bound lifetimes in rustc are identified by a - "name" -- the [`BoundRegionKind`] enum. This enum can contain a - [`DefId`][defid] or it might have various "anonymous" numbered names. The - latter arise from types like `fn(&u32, &u32)`, which are equivalent to - something like `for<'a, 'b> fn(&'a u32, &'b u32)`, but the names of those - lifetimes must be generated. - -This setup of not knowing the full set of variables at a binding level has some -advantages and some disadvantages. The disadvantage is that you must walk the -type to find out what is bound at the given level and so forth. The advantage -is primarily that, when constructing types from Rust syntax, if we encounter -anonymous regions like in `fn(&u32)`, we just create a fresh index and don't have -to update the binder. 
- -[`Binder`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Binder.html -[`BoundRegionKind`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.BoundRegionKind.html -[defid]: ./hir.html#identifiers-in-the-hir diff --git a/src/generic_arguments.md b/src/generic_arguments.md deleted file mode 100644 index 6e09e86200551..0000000000000 --- a/src/generic_arguments.md +++ /dev/null @@ -1,50 +0,0 @@ -# Generic arguments -A `ty::GenericArg<'tcx>` represents some entity in the type system: a type -(`Ty<'tcx>`), lifetime (`ty::Region<'tcx>`) or constant (`ty::Const<'tcx>`). -`GenericArg` is used to perform instantiation of generic parameters to concrete -arguments, such as when calling a function with generic parameters explicitly -with type arguments. Instantiations are represented using the -[`GenericArgs` type](#genericargs) as described below. - -## `GenericArgs` -`ty::GenericArgs<'tcx>` is intuitively simply a slice of `GenericArg<'tcx>`s, -acting as an ordered list of generic parameters instantiated to -concrete arguments (such as types, lifetimes and consts). - -For example, given a `HashMap` with two type parameters, `K` and `V`, an -instantiation of the parameters, for example `HashMap`, would be -represented by `&'tcx [tcx.types.i32, tcx.types.u32]`. - -`GenericArgs` provides various convenience methods to instantiate generic arguments -given item definitions, which should generally be used rather than explicitly -instantiating such slices. - -## `GenericArg` -The actual `GenericArg` struct is optimised for space, storing the type, lifetime or -const as an interned pointer containing a tag identifying its kind (in the -lowest 2 bits). Unless you are working with the `GenericArgs` implementation -specifically, you should generally not have to deal with `GenericArg` and instead -make use of the safe [`GenericArgKind`](#genericargkind) abstraction. 
- -## `GenericArgKind` -As `GenericArg` itself is not type-safe, the `GenericArgKind` enum provides a more -convenient and safe interface for dealing with generic arguments. An -`GenericArgKind` can be converted to a raw `GenericArg` using `GenericArg::from()` -(or simply `.into()` when the context is clear). As mentioned earlier, instantiation -lists store raw `GenericArg`s, so before dealing with them, it is preferable to -convert them to `GenericArgKind`s first. This is done by calling the `.unpack()` -method. - -```rust,ignore -// An example of unpacking and packing a generic argument. -fn deal_with_generic_arg<'tcx>(generic_arg: GenericArg<'tcx>) -> GenericArg<'tcx> { - // Unpack a raw `GenericArg` to deal with it safely. - let new_generic_arg: GenericArgKind<'tcx> = match generic_arg.unpack() { - GenericArgKind::Type(ty) => { /* ... */ } - GenericArgKind::Lifetime(lt) => { /* ... */ } - GenericArgKind::Const(ct) => { /* ... */ } - }; - // Pack the `GenericArgKind` to store it in a generic args list. - new_generic_arg.into() -} -``` diff --git a/src/generics.md b/src/generics.md deleted file mode 100644 index fd5402d484c2f..0000000000000 --- a/src/generics.md +++ /dev/null @@ -1,144 +0,0 @@ -# Generics and GenericArgs - -Given a generic type `MyType`, we may want to swap out the generics `A, B, …` for some -other types (possibly other generics or concrete types). We do this a lot while doing type -inference, type checking, and trait solving. Conceptually, during these routines, we may find out -that one type is equal to another type and want to swap one out for the other and then swap that out -for another type and so on until we eventually get some concrete types (or an error). - -In rustc this is done using [GenericArgs]. -Conceptually, you can think of `GenericArgs` as a list of types that are to be substituted for - the generic type parameters of the ADT. - -`GenericArgs` is a type alias of `&'tcx List>` (see [`List` rustdocs][list]). 
-[`GenericArg`] is essentially a space-efficient wrapper around [`GenericArgKind`], which is an enum -indicating what kind of generic the type parameter is (type, lifetime, or const). -Thus, `GenericArgs` is conceptually like a `&'tcx [GenericArgKind<'tcx>]` slice (but it is -actually a `List`). - -[list]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.List.html -[`GenericArg`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.GenericArg.html -[`GenericArgKind`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.GenericArgKind.html -[GenericArgs]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/type.GenericArgs.html - -So why do we use this `List` type instead of making it really a slice? It has the length "inline", -so `&List` is only 32 bits. As a consequence, it cannot be "subsliced" (that only works if the -length is out of line). - -This also implies that you can check two `List`s for equality via `==` (which would be not be -possible for ordinary slices). This is precisely because they never represent a "sub-list", only the -complete `List`, which has been hashed and interned. - -So pulling it all together, let’s go back to our example above: - -```rust,ignore -struct MyStruct -``` - -- There would be an `AdtDef` (and corresponding `DefId`) for `MyStruct`. -- There would be a `TyKind::Param` (and corresponding `DefId`) for `T` (more later). -- There would be a `GenericArgs` containing the list `[GenericArgKind::Type(Ty(T))]` - - The `Ty(T)` here is my shorthand for entire other `ty::Ty` that has `TyKind::Param`, which we - mentioned in the previous point. -- This is one `TyKind::Adt` containing the `AdtDef` of `MyStruct` with the `GenericArgs` above. - -Finally, we will quickly mention the -[`Generics`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Generics.html) type. It -is used to give information about the type parameters of a type. 
- -### Unsubstituted Generics - -So above, recall that in our example the `MyStruct` struct had a generic type `T`. When we are (for -example) type checking functions that use `MyStruct`, we will need to be able to refer to this type -`T` without actually knowing what it is. In general, this is true inside all generic definitions: we -need to be able to work with unknown types. This is done via `TyKind::Param` (which we mentioned in -the example above). - -Each `TyKind::Param` contains two things: the name and the index. In general, the index fully -defines the parameter and is used by most of the code. The name is included for debug print-outs. -There are two reasons for this. First, the index is convenient, it allows you to include into the -list of generic arguments when substituting. Second, the index is more robust. For example, you -could in principle have two distinct type parameters that use the same name, e.g. `impl Foo { -fn bar() { .. } }`, although the rules against shadowing make this difficult (but those language -rules could change in the future). - -The index of the type parameter is an integer indicating its order in the list of the type -parameters. Moreover, we consider the list to include all of the type parameters from outer scopes. -Consider the following example: - -```rust,ignore -struct Foo { - // A would have index 0 - // B would have index 1 - - .. // some fields -} -impl Foo { - fn method() { - // inside here, X, Y and Z are all in scope - // X has index 0 - // Y has index 1 - // Z has index 2 - } -} -``` - -When we are working inside the generic definition, we will use `TyKind::Param` just like any other -`TyKind`; it is just a type after all. However, if we want to use the generic type somewhere, then -we will need to do substitutions. - -For example suppose that the `Foo` type from the previous example has a field that is a -`Vec`. Observe that `Vec` is also a generic type. 
We want to tell the compiler that the type -parameter of `Vec` should be replaced with the `A` type parameter of `Foo`. We do that with -substitutions: - -```rust,ignore -struct Foo { // Adt(Foo, &[Param(0), Param(1)]) - x: Vec, // Adt(Vec, &[Param(0)]) - .. -} - -fn bar(foo: Foo) { // Adt(Foo, &[u32, f32]) - let y = foo.x; // Vec => Vec -} -``` - -This example has a few different substitutions: - -- In the definition of `Foo`, in the type of the field `x`, we replace `Vec`'s type parameter with - `Param(0)`, the first parameter of `Foo`, so that the type of `x` is `Vec`. -- In the function `bar`, we specify that we want a `Foo`. This means that we will - substitute `Param(0)` and `Param(1)` with `u32` and `f32`. -- In the body of `bar`, we access `foo.x`, which has type `Vec`, but `Param(0)` has been - substituted for `u32`, so `foo.x` has type `Vec`. - -Let’s look a bit more closely at that last substitution to see why we use indexes. If we want to -find the type of `foo.x`, we can get generic type of `x`, which is `Vec`. Now we can take -the index `0` and use it to find the right type substitution: looking at `Foo`'s `GenericArgs`, -we have the list `[u32, f32]` , since we want to replace index `0`, we take the 0-th index of this -list, which is `u32`. Voila! - -You may have a couple of followup questions… - - **`type_of`** How do we get the "generic type of `x`"? You can get the type of pretty much anything - with the `tcx.type_of(def_id)` query. In this case, we would pass the `DefId` of the field `x`. - The `type_of` query always returns the definition with the generics that are in scope of the - definition. For example, `tcx.type_of(def_id_of_my_struct)` would return the “self-view” of - `MyStruct`: `Adt(Foo, &[Param(0), Param(1)])`. - -How do we actually do the substitutions? There is a function for that too! You -use [`instantiate`] to replace a `GenericArgs` with another list of types. 
- -[Here is an example of actually using `instantiate` in the compiler][instantiatex]. -The exact details are not too important, but in this piece of code, we happen to be -converting from the `rustc_hir::Ty` to a real `ty::Ty`. You can see that we first get some args -(`args`). Then we call `type_of` to get a type and call `ty.instantiate(tcx, args)` to get a new -version of `ty` with the args made. - -[`instantiate`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/generic_args/struct.EarlyBinder.html#method.instantiate -[instantiatex]: /~https://github.com/rust-lang/rust/blob/8a562f9671e36cf29c9c794c2646bcf252d55535/compiler/rustc_hir_analysis/src/astconv/mod.rs#L905-L927 - -**Note on indices:** It is possible for the indices in `Param` to not match with what we expect. For -example, the index could be out of bounds or it could be the index of a lifetime when we were -expecting a type. These sorts of errors would be caught earlier in the compiler when translating -from a `rustc_hir::Ty` to a `ty::Ty`. If they occur later, that is a compiler bug. diff --git a/src/ty-fold.md b/src/ty-fold.md index a5f82cf78dbd4..e8b61abc42e00 100644 --- a/src/ty-fold.md +++ b/src/ty-fold.md @@ -1,9 +1,8 @@ # `TypeFoldable` and `TypeFolder` -How is this `subst` query actually implemented? As you can imagine, we might want to do -substitutions on a lot of different things. For example, we might want to do a substitution directly -on a type like we did with `Vec` above. But we might also have a more complex type with other types -nested inside that also need substitutions. +In the previous chapter we discussed instantiating binders. This involves looking at everything inside of an `Early/Binder` +to find any usages of the bound vars in order to replace them. Binders can wrap an arbitrary Rust type `T`, not just a `Ty`, so +how do we implement the `instantiate` methods on the `Early/Binder` types?
The answer is a couple of traits: [`TypeFoldable`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/fold/trait.TypeFoldable.html) diff --git a/src/ty.md b/src/ty.md index 547b2d25667c8..a90abc359e949 100644 --- a/src/ty.md +++ b/src/ty.md @@ -10,7 +10,8 @@ The `ty` module defines how the Rust compiler represents types internally. It al When we talk about how rustc represents types, we usually refer to a type called `Ty` . There are quite a few modules and types for `Ty` in the compiler ([Ty documentation][ty]). -[ty]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/index.html +[ty]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/index.html + The specific `Ty` we are referring to is [`rustc_middle::ty::Ty`][ty_ty] (and not [`rustc_hir::Ty`][hir_ty]). The distinction is important, so we will discuss it first before going @@ -125,7 +126,7 @@ You can ignore `Interned` in general; you will basically never access it explici We always hide them within `Ty` and skip over it via `Deref` impls or methods. `TyKind` is a big enum with variants to represent many different Rust types -(e.g. primitives, references, abstract data types, generics, lifetimes, etc). +(e.g. primitives, references, algebraic data types, generics, lifetimes, etc). `WithCachedTypeInfo` has a few cached values like `flags` and `outer_exclusive_binder`. They are convenient hacks for efficiency and summarize information about the type that we may want to know, but they don’t come into the picture as much here. Finally, [`Interned`](./memory.md) allows @@ -276,54 +277,6 @@ In particular, since they are so common, the `Ty` and `TyCtxt` types are importe types are often referenced with an explicit `ty::` prefix (e.g. `ty::TraitRef<'tcx>`). But some modules choose to import a larger or smaller set of names explicitly.
-## ADTs Representation - -Let's consider the example of a type like `MyStruct`, where `MyStruct` is defined like so: - -```rust,ignore -struct MyStruct { x: u8, y: T } -``` - -The type `MyStruct` would be an instance of `TyKind::Adt`: - -```rust,ignore -Adt(&'tcx AdtDef, GenericArgs<'tcx>) -// ------------ --------------- -// (1) (2) -// -// (1) represents the `MyStruct` part -// (2) represents the ``, or "substitutions" / generic arguments -``` - -There are two parts: - -- The [`AdtDef`][adtdef] references the struct/enum/union but without the values for its type - parameters. In our example, this is the `MyStruct` part *without* the argument `u32`. - (Note that in the HIR, structs, enums and unions are represented differently, but in `ty::Ty`, - they are all represented using `TyKind::Adt`.) -- The [`GenericArgs`][GenericArgs] is an interned list of values that are to be substituted -for the generic parameters. In our example of `MyStruct`, we would end up with a list like -`[u32]`. We’ll dig more into generics and substitutions in a little bit. - -[adtdef]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.AdtDef.html -[GenericArgs]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/type.GenericArgs.html - -**`AdtDef` and `DefId`** - -For every type defined in the source code, there is a unique `DefId` (see [this -chapter](hir.md#identifiers-in-the-hir)). This includes ADTs and generics. In the `MyStruct` -definition we gave above, there are two `DefId`s: one for `MyStruct` and one for `T`. Notice that -the code above does not generate a new `DefId` for `u32` because it is not defined in that code (it -is only referenced). - -`AdtDef` is more or less a wrapper around `DefId` with lots of useful helper methods. There is -essentially a one-to-one relationship between `AdtDef` and `DefId`. You can get the `AdtDef` for a -`DefId` with the [`tcx.adt_def(def_id)` query][adtdefq]. 
`AdtDef`s are all interned, as shown -by the `'tcx` lifetime. - -[adtdefq]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.TyCtxt.html#method.adt_def - - ## Type errors There is a `TyKind::Error` that is produced when the user makes a type error. The idea is that @@ -362,32 +315,3 @@ a redundant delayed bug. [terr]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Ty.html#method.new_error [terrmsg]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Ty.html#method.new_error_with_message - -## Question: Why not substitute “inside” the `AdtDef`? - -Recall that we represent a generic struct with `(AdtDef, args)`. So why bother with this scheme? - -Well, the alternate way we could have chosen to represent types would be to always create a new, -fully-substituted form of the `AdtDef` where all the types are already substituted. This seems like -less of a hassle. However, the `(AdtDef, args)` scheme has some advantages over this. - -First, `(AdtDef, args)` scheme has an efficiency win: - -```rust,ignore -struct MyStruct { - ... 100s of fields ... -} - -// Want to do: MyStruct ==> MyStruct -``` - -in an example like this, we can subst from `MyStruct` to `MyStruct` (and so on) very cheaply, -by just replacing the one reference to `A` with `B`. But if we eagerly substituted all the fields, -that could be a lot more work because we might have to go through all of the fields in the `AdtDef` -and update all of their types. - -A bit more deeply, this corresponds to structs in Rust being [*nominal* types][nominal] — which -means that they are defined by their *name* (and that their contents are then indexed from the -definition of that name, and not carried along “within” the type itself). 
- -[nominal]: https://en.wikipedia.org/wiki/Nominal_type_system diff --git a/src/ty_module/binders.md b/src/ty_module/binders.md new file mode 100644 index 0000000000000..2404d57ea2890 --- /dev/null +++ b/src/ty_module/binders.md @@ -0,0 +1,52 @@ + +# `Binder` and Higher ranked regions + +Sometimes we define generic parameters not on an item but as part of a type or a where clause. As an example, the type `for<'a> fn(&'a u32)` or the where clause `for<'a> T: Trait<'a>` both introduce a generic lifetime parameter named `'a`. Currently there is no stable syntax for `for<T>` or `for<const N: usize>`, but on nightly the `non_lifetime_binders` feature can be used to write where clauses (but not types) using `for<T>`/`for<const N: usize>`. + +The `for` is referred to as a "binder" because it brings new names into scope. In rustc we use the `Binder` type to track where these parameters are introduced and what the parameters are (i.e. how many there are and whether each parameter is a type, const or region). A type such as `for<'a> fn(&'a u32)` would be +represented in rustc as: +``` +Binder( + fn(&RegionKind::Bound(DebruijnIndex(0), BoundVar(0)) u32) -> (), + &[BoundVariableKind::Region(...)], +) +``` + +Usages of these parameters are represented by the `RegionKind::Bound` variant (or `TyKind::Bound`/`ConstKind::Bound`). These bound regions/types/consts are composed of two main pieces of data: +- A [DebruijnIndex](../appendix/background.md#what-is-a-de-bruijn-index) to specify which binder we are referring to. +- A [`BoundVar`] which specifies which of the parameters introduced by the `Binder` we are referring to. +- We also sometimes store some extra information for diagnostics reasons via the [`BoundTyKind`]/[`BoundRegionKind`], but this is not important for type equality or, more generally, the semantics of `Ty` (it is omitted from the above example). + +In debug output (and also informally when talking to each other) we tend to write these bound variables in the format of `^DebruijnIndex_BoundVar`.
The above example would instead be written as `Binder(fn(&'^0_0 u32), &[BoundVariableKind::Region])`. Sometimes, when the `DebruijnIndex` is `0`, we just omit it and write `^0`. + +Another concrete example, this time a mixture of `for<'a>` in a where clause and a type: +``` +where + for<'a> Foo<for<'b> fn(&'a &'b T)>: Trait, +``` +This would be represented as +``` +Binder( + Foo<Binder( + fn(&'^1_0 &'^0 T/#0), + [BoundVariableKind::Region(...)] + )>: Trait, + [BoundVariableKind::Region(...)] +) +``` + +Note how the `'^1_0` refers to the `'a` parameter. We use a `DebruijnIndex` of `1` to refer to the binder one level up from the innermost one, and a var of `0` to refer to the first parameter bound, which is `'a`. We also use `'^0` to refer to the `'b` parameter: the `DebruijnIndex` is `0` (referring to the innermost binder), so we omit it, leaving only the bound var of `0`, which refers to the first parameter bound, `'b`. + +We did not always explicitly track the set of bound vars introduced by each `Binder`; this caused a number of bugs (read: ICEs [#81193](/~https://github.com/rust-lang/rust/issues/81193), [#79949](/~https://github.com/rust-lang/rust/issues/79949), [#83017](/~https://github.com/rust-lang/rust/issues/83017)). By tracking these explicitly, we can assert when constructing higher-ranked where clauses/types that there are no escaping bound variables or variables from a different binder. See the following example of an invalid type inside of a binder: +``` +Binder( + fn(&'^1_0 &'^1 T/#0), + &[BoundVariableKind::Region(...)], +) +``` +This would cause all kinds of issues, as the region `'^1_0` refers to a binder at a higher level than the outermost binder, i.e. it is an escaping bound var. The `'^1` region (also writable as `'^0_1`) is also ill-formed, as the binder it refers to does not introduce a second parameter. Modern-day rustc will ICE when constructing this binder due to both of those regions; in the past we would have simply allowed this to work and then run into issues in other parts of the codebase.
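The indexing rules above can be sketched as a small self-contained check. This is a toy model under stated assumptions, not rustc's implementation: `Region`, `Ty`, `is_well_formed` and the helper functions are hypothetical names, and each toy binder records only how many region variables it introduces (rustc's `Binder` carries a list of `BoundVariableKind`s instead). It rejects the two problems described above: an escaping De Bruijn index, and a `BoundVar` that the referenced binder does not introduce.

```rust
/// Toy region: a bound variable written `'^debruijn_var`, or `'static`
/// standing in for any free region. Hypothetical, not rustc's `RegionKind`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Region {
    Bound { debruijn: usize, var: usize },
    Static,
}

/// Toy type language. `Binder(inner, n)` loosely mirrors a `Binder` whose
/// bound-var list has `n` region entries.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Ty {
    U32,
    Ref(Region, Box<Ty>),
    Binder(Box<Ty>, usize),
}

/// Returns `true` if every bound region points at an enclosing binder
/// (no escaping bound vars) and that binder introduces the referenced var.
pub fn is_well_formed(ty: &Ty) -> bool {
    check(ty, &mut Vec::new())
}

/// `enclosing` holds the var count of each enclosing binder, innermost last.
fn check(ty: &Ty, enclosing: &mut Vec<usize>) -> bool {
    match ty {
        Ty::U32 => true,
        Ty::Ref(region, pointee) => {
            region_ok(*region, enclosing.as_slice()) && check(pointee, enclosing)
        }
        Ty::Binder(inner, num_vars) => {
            // Inside this binder, every outer binder is one level further away.
            enclosing.push(*num_vars);
            let ok = check(inner, enclosing);
            enclosing.pop();
            ok
        }
    }
}

fn region_ok(region: Region, enclosing: &[usize]) -> bool {
    match region {
        Region::Static => true,
        Region::Bound { debruijn, var } => {
            // A De Bruijn index of 0 names the innermost binder (the last entry).
            match enclosing.len().checked_sub(debruijn + 1) {
                // Escaping bound var: it refers past the outermost binder.
                None => false,
                // The binder exists; it must introduce more than `var` vars.
                Some(pos) => var < enclosing[pos],
            }
        }
    }
}
```

With `fn` types simplified to plain references, the nested where-clause example is accepted, while the invalid `fn(&'^1_0 &'^1 T/#0)` example is rejected.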
+ +[`Binder`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Binder.html +[`BoundVar`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.BoundVar.html +[`BoundRegionKind`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.BoundRegionKind.html +[`BoundTyKind`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.BoundTyKind.html diff --git a/src/ty_module/early_binder.md b/src/ty_module/early_binder.md new file mode 100644 index 0000000000000..1d7a5503b557b --- /dev/null +++ b/src/ty_module/early_binder.md @@ -0,0 +1,76 @@ +# `EarlyBinder` and instantiating parameters + +Given an item `foo` that introduces a generic parameter `T`, whenever we refer to types inside of `foo` (i.e. the return type or argument types) from outside of `foo`, we must take care to handle the generic parameters defined on `foo`. As an example: + +```rust,ignore +fn foo<T, U>(a: T, _b: U) -> T { a } + +fn main() { + let c = foo::<i32, u128>(1, 2); +} +``` + +When type checking `main`, we cannot just naively look at the return type of `foo` and assign the type `T` to the variable `c`; after all, the function `main` does not define any generic parameters, so `T` is completely meaningless in this context. More generally, whenever an item introduces (binds) generic parameters, those parameters must be instantiated with values from the outer item when accessing types inside the item from outside. + +In rustc we track this via the [`EarlyBinder`] type: the return type of `foo` is represented as an `EarlyBinder<Ty>`, and the only way to access the `Ty` is to provide arguments for any generic parameters it might be using. This is implemented via the [`EarlyBinder::instantiate`] method, which discharges the binder, returning the inner value with all the generic parameters replaced by the provided arguments. + +To go back to our example, when type checking `main`, the return type of `foo` would be represented as `EarlyBinder(T/#0)`.
Then, because we called the function with `i32, u128` for the generic arguments, we would call `EarlyBinder::instantiate` on the return type with `[i32, u128]` for the args. This would result in an instantiated return type of `i32` that we can use as the type of the local `c`. + +Here are some more examples: + +```rust,ignore +fn foo<T>() -> Vec<(u32, T)> { Vec::new() } +fn bar() { + // the return type of `foo` before instantiating it would be: + // `EarlyBinder(Adt(Vec, &[Tup(&[u32, T/#0])]))` + // we then instantiate the binder with `[u64]` resulting in the type: + // `Adt(Vec, &[Tup(&[u32, u64])])` + let a = foo::<u64>(); +} +``` + +```rust,ignore +struct Foo<A, B> { + x: Vec<A>, + .. +} + +fn bar(foo: Foo<u32, f32>) { + // the type of `foo`'s `x` field before instantiating it would be: + // `EarlyBinder(Vec<A/#0>)` + // we then instantiate the binder with `[u32, f32]` as those are the + // generic arguments to the `Foo` struct. This results in a type of: + // `Vec<u32>` + let y = foo.x; +} +``` + +In the compiler, the `instantiate` call for this is done in [`FieldDef::ty`] ([src][field_def_ty_src]); at some point during type checking `bar`, we will wind up calling `FieldDef::ty(x, &[u32, f32])` in order to obtain the type of `foo.x`. + +**Note on indices:** It is possible for the indices in `Param` not to match what the `EarlyBinder` binds. For +example, the index could be out of bounds or it could be the index of a lifetime when we were expecting a type. +These sorts of errors would be caught earlier in the compiler when translating from a `rustc_hir::Ty` to a `ty::Ty`. +If they occur later, that is a compiler bug.
+ +[`FieldDef::ty`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.FieldDef.html#method.ty +[field_def_ty_src]: /~https://github.com/rust-lang/rust/blob/44d679b9021f03a79133021b94e6d23e9b55b3ab/compiler/rustc_middle/src/ty/mod.rs#L1421-L1426 +[`EarlyBinder`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.EarlyBinder.html +[`EarlyBinder::instantiate`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.EarlyBinder.html#method.instantiate + +--- + +As mentioned previously, when _outside_ of an item it is important to instantiate the `EarlyBinder` with generic arguments before accessing the value inside, but the setup for when we are conceptually already inside of the binder is a bit different. + +For example: +```rust,ignore +impl<T> Trait for Vec<T> { + fn foo(&self, b: Self) {} +} +``` + +When constructing a `Ty` to represent the `b` parameter's type, we need to get the type of `Self` on the impl that we are inside. This can be acquired by calling the [`type_of`] query with the `impl`'s `DefId`; however, this will return an `EarlyBinder<Ty>`, as the impl block binds generic parameters that may have to be discharged if we are outside of the impl. + +The `EarlyBinder` type provides an [`instantiate_identity`] function for discharging the binder when you are "already inside of it". Conceptually, this discharges the binder by instantiating it with placeholders in the root universe (we will talk about what this means in the next few chapters). In practice, though, it simply returns the inner value with no modification taking place.
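A minimal sketch of the two ways of discharging an `EarlyBinder`, using hypothetical toy types rather than rustc's real `Ty` and `GenericArgs` (the `Ty` enum, `bind` and `replace_params` are illustrative assumptions, not compiler API). `Param(i)` stands for the `T/#i` notation used above; `instantiate` replaces each parameter by index, while `instantiate_identity` hands the value back untouched.

```rust
/// Toy types. `Param(i)` plays the role of the `T/#i` notation.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Ty {
    U32,
    U64,
    I32,
    U128,
    Param(usize),
    Tuple(Vec<Ty>),
    Adt(&'static str, Vec<Ty>),
}

/// Toy analogue of `EarlyBinder<T>`. The field is private outside this
/// module, so callers must discharge the binder through one of the methods.
pub struct EarlyBinder<T>(T);

impl<T> EarlyBinder<T> {
    pub fn bind(value: T) -> Self {
        EarlyBinder(value)
    }

    /// Used from *inside* the item: its params are already in scope,
    /// so the value is returned unchanged.
    pub fn instantiate_identity(self) -> T {
        self.0
    }
}

impl EarlyBinder<Ty> {
    /// Used from *outside* the item: replace every `Param(i)` with `args[i]`.
    pub fn instantiate(self, args: &[Ty]) -> Ty {
        replace_params(self.0, args)
    }
}

fn replace_params(ty: Ty, args: &[Ty]) -> Ty {
    match ty {
        // An out-of-bounds index panics here, loosely mirroring the
        // "that is a compiler bug" note on indices.
        Ty::Param(i) => args[i].clone(),
        Ty::Tuple(tys) => Ty::Tuple(tys.into_iter().map(|t| replace_params(t, args)).collect()),
        Ty::Adt(name, tys) => {
            Ty::Adt(name, tys.into_iter().map(|t| replace_params(t, args)).collect())
        }
        leaf => leaf,
    }
}
```

In this model, the return type of `foo::<i32, u128>` is `EarlyBinder::bind(Ty::Param(0)).instantiate(&[Ty::I32, Ty::U128])`, which evaluates to `Ty::I32`. Keeping the wrapped value private is what forces every caller to choose between the two methods in this sketch.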
+
+[`type_of`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/context/struct.TyCtxt.html#method.type_of
+[`instantiate_identity`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.EarlyBinder.html#method.instantiate_identity
diff --git a/src/ty_module/generic_arguments.md b/src/ty_module/generic_arguments.md
new file mode 100644
index 0000000000000..c3f8cbf48cfe4
--- /dev/null
+++ b/src/ty_module/generic_arguments.md
@@ -0,0 +1,126 @@
+# ADTs and Generic Arguments
+
+## ADTs Representation
+
+Let's consider the example of a type like `MyStruct<u32>`, where `MyStruct` is defined like so:
+
+```rust,ignore
+struct MyStruct<T> { x: u8, y: T }
+```
+
+The type `MyStruct<u32>` would be an instance of `TyKind::Adt`:
+
+```rust,ignore
+Adt(&'tcx AdtDef, GenericArgs<'tcx>)
+//  ------------  -----------------
+//  (1)           (2)
+//
+// (1) represents the `MyStruct` part
+// (2) represents the `<u32>`, or "substitutions" / generic arguments
+```
+
+There are two parts:
+
+- The [`AdtDef`][adtdef] references the struct/enum/union but without the values for its type
+  parameters. In our example, this is the `MyStruct` part *without* the argument `u32`.
+  (Note that in the HIR, structs, enums and unions are represented differently, but in `ty::Ty`,
+  they are all represented using `TyKind::Adt`.)
+- The [`GenericArgs`][GenericArgs] is a list of values that are to be substituted
+  for the generic parameters. In our example of `MyStruct<u32>`, we would end up with a list like
+  `[u32]`. We’ll dig more into generics and substitutions in a little bit.
+
+[adtdef]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.AdtDef.html
+[GenericArgs]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/type.GenericArgs.html
+
+**`AdtDef` and `DefId`**
+
+For every type defined in the source code, there is a unique `DefId` (see [this
+chapter](../hir.md#identifiers-in-the-hir)). This includes ADTs and generics. In the `MyStruct<T>`
+definition we gave above, there are two `DefId`s: one for `MyStruct` and one for `T`. Notice that
+the code above does not generate a new `DefId` for `u32` because it is not defined in that code (it
+is only referenced).
+
+`AdtDef` is more or less a wrapper around `DefId` with lots of useful helper methods. There is
+essentially a one-to-one relationship between `AdtDef` and `DefId`. You can get the `AdtDef` for a
+`DefId` with the [`tcx.adt_def(def_id)` query][adtdefq]. `AdtDef`s are all interned, as shown
+by the `'tcx` lifetime.
+
+[adtdefq]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.TyCtxt.html#method.adt_def
+
+## Question: Why not substitute “inside” the `AdtDef`?
+
+Recall that we represent a generic struct with `(AdtDef, args)`. So why bother with this scheme?
+
+Well, the alternative way we could have chosen to represent types would be to always create a new,
+fully-substituted form of the `AdtDef` where all the types are already substituted. This seems like
+less of a hassle. However, the `(AdtDef, args)` scheme has some advantages over this.
+
+First, the `(AdtDef, args)` scheme has an efficiency win:
+
+```rust,ignore
+struct MyStruct<T> {
+    ... 100s of fields ...
+}
+
+// Want to do: MyStruct<A> ==> MyStruct<B>
+```
+
+In an example like this, we can substitute from `MyStruct<A>` to `MyStruct<B>` (and so on) very cheaply,
+by just replacing the one reference to `A` with `B`. But if we eagerly substituted all the fields,
+that could be a lot more work because we might have to go through all of the fields in the `AdtDef`
+and update all of their types.
+
+A bit more deeply, this corresponds to structs in Rust being [*nominal* types][nominal], which
+means that they are defined by their *name* (and that their contents are then indexed from the
+definition of that name, and not carried along “within” the type itself).
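To make the efficiency argument concrete, here is a hypothetical sketch (not rustc's data structures) of the `(AdtDef, args)` scheme. The definition stores its field types once, written in terms of its own parameters, and a type is just a shared reference to that definition plus an argument list. Going from `MyStruct<A>` to `MyStruct<B>` only builds a new one-element argument list, while a field's type is computed on demand, in the spirit of `FieldDef::ty`.

```rust
// Hypothetical sketch of the `(AdtDef, args)` representation; these are
// illustrative types only, not rustc's `AdtDef`/`GenericArgs`.
use std::rc::Rc;

#[derive(Clone, Debug, PartialEq)]
enum Ty {
    U32,
    Bool,
    /// A use of the ADT's own generic parameter, by index.
    Param(u32),
    /// A shared reference to the definition plus this type's generic arguments.
    Adt(Rc<AdtDef>, Vec<Ty>),
}

#[derive(Debug, PartialEq)]
struct AdtDef {
    name: &'static str,
    /// Field types, written in terms of the ADT's own parameters.
    fields: Vec<Ty>,
}

/// Replace `Param(i)` with `args[i]` inside a single field type.
fn instantiate(ty: &Ty, args: &[Ty]) -> Ty {
    match ty {
        Ty::Param(i) => args[*i as usize].clone(),
        Ty::Adt(def, inner) => {
            Ty::Adt(Rc::clone(def), inner.iter().map(|t| instantiate(t, args)).collect())
        }
        Ty::U32 | Ty::Bool => ty.clone(),
    }
}

/// Compute one field's type on demand, only when it is actually needed.
fn field_ty(adt: &Ty, field: usize) -> Ty {
    match adt {
        Ty::Adt(def, args) => instantiate(&def.fields[field], args),
        _ => panic!("field_ty called on a non-ADT type"),
    }
}

fn main() {
    // `struct MyStruct<T> { ... 100 fields of type T ... }`
    let def = Rc::new(AdtDef { name: "MyStruct", fields: vec![Ty::Param(0); 100] });

    // `MyStruct<u32>` and `MyStruct<bool>` share the definition; switching
    // between them only builds a new one-element argument list.
    let a = Ty::Adt(Rc::clone(&def), vec![Ty::U32]);
    let b = Ty::Adt(Rc::clone(&def), vec![Ty::Bool]);

    println!("{} field 42: {:?} vs {:?}", def.name, field_ty(&a, 42), field_ty(&b, 42));
}
```

The two `Adt` values differ only in their argument vectors, mirroring the point above that substitution touches `args` rather than every field of the definition.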
+
+[nominal]: https://en.wikipedia.org/wiki/Nominal_type_system
+
+
+## The `GenericArgs` type
+
+Given a generic type `MyType`, we have to store the list of generic arguments for `MyType`.
+
+In rustc this is done using [GenericArgs]. `GenericArgs` is a thin pointer to a slice of [`GenericArg`] representing a list of generic arguments for a generic item. For example, given a `struct HashMap<K, V>` with two type parameters, `K` and `V`, the `GenericArgs` for the type `HashMap<i32, u32>` would be `&'tcx [tcx.types.i32, tcx.types.u32]`.
+
+`GenericArg` is conceptually an `enum` with three variants: one for type arguments, one for const arguments and one for lifetime arguments.
+In practice, that enum is [`GenericArgKind`], and [`GenericArg`] is a more space-efficient version of it that has a method to
+turn it into a `GenericArgKind`.
+
+The actual `GenericArg` struct stores the type, lifetime or const as an interned pointer with the discriminant stored in the lower 2 bits.
+Unless you are working with the `GenericArgs` implementation specifically, you should generally not have to deal with `GenericArg` and instead
+make use of the safe [`GenericArgKind`] abstraction obtainable via the `GenericArg::unpack()` method.
+
+In some cases you may have to construct a `GenericArg`; this can be done via `Ty/Const/Region::into()` or `GenericArgKind::pack`.
+
+```rust,ignore
+// An example of unpacking and packing a generic argument.
+fn deal_with_generic_arg<'tcx>(generic_arg: GenericArg<'tcx>) -> GenericArg<'tcx> {
+    // Unpack a raw `GenericArg` to deal with it safely.
+    let new_generic_arg: GenericArgKind<'tcx> = match generic_arg.unpack() {
+        GenericArgKind::Type(ty) => { /* ... */ GenericArgKind::Type(ty) }
+        GenericArgKind::Lifetime(lt) => { /* ... */ GenericArgKind::Lifetime(lt) }
+        GenericArgKind::Const(ct) => { /* ... */ GenericArgKind::Const(ct) }
+    };
+    // Pack the `GenericArgKind` to store it in a generic args list.
+    new_generic_arg.pack()
+}
+```
+
+[list]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.List.html
+[`GenericArg`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.GenericArg.html
+[`GenericArgKind`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.GenericArgKind.html
+
+So pulling it all together:
+
+```rust,ignore
+struct MyStruct<T>(T);
+type Foo = MyStruct<u32>;
+```
+
+For the `MyStruct<u32>` written in the `Foo` type alias, we would represent it in the following way:
+
+- There would be an `AdtDef` (and corresponding `DefId`) for `MyStruct`.
+- There would be a `GenericArgs` containing the list `[GenericArgKind::Type(Ty(u32))]`.
+- The type itself is one `TyKind::Adt` containing the `AdtDef` of `MyStruct` with the `GenericArgs` above.
diff --git a/src/ty_module/instantiating_binders.md b/src/ty_module/instantiating_binders.md
new file mode 100644
index 0000000000000..ca0921f1233ab
--- /dev/null
+++ b/src/ty_module/instantiating_binders.md
@@ -0,0 +1,142 @@
+# Instantiating `Binder`s
+
+Much like [`EarlyBinder`], when accessing the inside of a [`Binder`] we must first discharge it by replacing the bound vars with some other value. This is for much the same reason as with `EarlyBinder`: types referencing parameters introduced by the `Binder` do not make any sense outside of that binder. For example:
+```rust,ignore
+fn foo<'a>(a: &'a u32) -> &'a u32 {
+    a
+}
+fn bar<T>(a: fn(&u32) -> T) -> T {
+    a(&10)
+}
+
+fn main() {
+    let higher_ranked_fn_ptr = foo as for<'a> fn(&'a u32) -> &'a u32;
+    let references_bound_vars = bar(higher_ranked_fn_ptr);
+}
+```
+In this example we are providing an argument of type `for<'a> fn(&'^0 u32) -> &'^0 u32` to `bar`. We do not want to allow `T` to be inferred to the type `&'^0 u32`, as that would be rather nonsensical (and likely unsound if we did not happen to ICE): `main` has no idea what `'a` is, so how would the borrow checker handle a borrow with lifetime `'a`?
+
+Unlike `EarlyBinder`, we do not instantiate `Binder` with some concrete set of arguments from the user, e.g. `['b, 'static]` as arguments to a `for<'a1, 'a2> fn(&'a1 u32, &'a2 u32)`. Instead we always instantiate the binder with either inference variables or placeholders.
+
+## Instantiating with inference variables
+
+We instantiate binders with inference variables when we are trying to infer a possible instantiation of the binder, e.g. when calling higher ranked function pointers or when attempting to use a higher ranked where clause to prove some bound (this list is not exhaustive). For example, given the `higher_ranked_fn_ptr` from the example above, if we were to call it with `&10_u32` we would:
+- Instantiate the binder with infer vars, yielding a signature of `fn(&'?0 u32) -> &'?0 u32`
+- Equate the type of the provided argument `&10_u32` (`&'static u32`) with the type in the signature, `&'?0 u32`, inferring `'?0 = 'static`
+- Conclude that the provided arguments were correct, as we were able to unify the types of the provided arguments with the types of the arguments in the fn ptr signature
+
+As another example of instantiating with infer vars, given some `where for<'a> T: Trait<'a>`, if we were attempting to prove that `T: Trait<'static>` holds we would:
+- Instantiate the binder with infer vars, yielding a where clause of `T: Trait<'?0>`
+- Equate the goal of `T: Trait<'static>` with the instantiated where clause, inferring `'?0 = 'static`
+- Conclude that the goal holds, because we were able to unify `T: Trait<'static>` with `T: Trait<'?0>`
+
+Instantiating binders with inference variables can be accomplished by using the [`instantiate_binder_with_fresh_vars`] method on [`InferCtxt`]. Binders should be instantiated with infer vars when we only care about one specific instantiation of the binder. If instead we wish to reason about all possible instantiations of the binder, then placeholders should be used.
+
+## Instantiating with placeholders
+
+Placeholders are very similar to `Ty/ConstKind::Param`/`ReEarlyParam`: they represent some unknown type that is only equal to itself. `Ty`/`Const` and `Region` all have a `Placeholder` variant that is comprised of a `Universe` and a `BoundVar`.
+
+The `Universe` tracks which binder the placeholder originated from, and the `BoundVar` tracks which parameter on said binder this placeholder corresponds to. Two placeholders are equal if and only if both their universes and their `BoundVar`s are equal. See the [chapter on Placeholders and Universes][ch_placeholders_universes] for more information.
+
+When talking with other rustc devs or seeing `Debug` formatted `Ty`/`Const`/`Region`s, `Placeholder` will often be written as `'!UNIVERSE_BOUNDVAR`. For example, given some type `for<'a> fn(&'a u32, for<'b> fn(&'b &'a u32))`, after instantiating both binders (assuming the `Universe` in the current `InferCtxt` was `U0` beforehand), the type of `&'b &'a u32` would be represented as `&'!2_0 &'!1_0 u32`.
+
+When the universe of the placeholder is `0`, it will be entirely omitted from the debug output, so `!0_2` would be printed as `!2`. This rarely happens in practice, though: we increase the universe in the `InferCtxt` when instantiating a binder with placeholders, so usually the lowest-universe placeholders we encounter are ones in `U1`.
+
+`Binder`s can be instantiated with placeholders via the [`enter_forall`] method on `InferCtxt`. It should be used whenever the compiler should care about any possible instantiation of the binder instead of one concrete instantiation.
+
+Note: in the original example of this chapter it was mentioned that we should not infer a local variable to have type `&'^0 u32`. This code is prevented from compiling via universes (as explained in the linked chapter).
+
+### Why have both `RePlaceholder` and `ReBound`?
+
+You may be wondering why we have both of these variants; after all, the data stored in `Placeholder` is effectively equivalent to that of `ReBound`: something to track which binder, and an index to track which parameter the `Binder` introduced.
+
+The main reason for this is that `Bound` is a more syntactic representation of bound variables, whereas `Placeholder` is a more semantic representation. As a concrete example:
+```rust,ignore
+impl<'a> Other<'a> for &'a u32 { }
+
+impl<T> Trait for T
+where
+    for<'a> T: Other<'a>,
+{ ... }
+
+impl<T> Bar for T
+where
+    for<'a> &'a T: Trait
+{ ... }
+```
+
+Given these trait implementations, `u32: Bar` should _not_ hold. `&'a u32` only implements `Other<'a>` when the lifetime of the borrow and the lifetime on the trait are equal. However, if we only used `ReBound` and did not have placeholders, it would be easy to accidentally believe that the trait bound does hold. To explain this, let's walk through an example of trying to prove `u32: Bar` in a world where rustc did not have placeholders:
+- We start by trying to prove `u32: Bar`
+- We find the `impl<T> Bar for T` impl and wind up instantiating its `EarlyBinder` with `u32` (note: this is not _quite_ accurate, as we first instantiate the binder with an inference variable that we then infer to be `u32`, but that distinction is not super important here)
+- There is a where clause `for<'a> &'^0 T: Trait` on the impl. As we instantiated the early binder with `u32`, we actually have to prove `for<'a> &'^0 u32: Trait`
+- We find the `impl<T> Trait for T` impl and wind up instantiating its `EarlyBinder` with `&'^0 u32`
+- There is a where clause `for<'a> T: Other<'^0>`. As we instantiated the early binder with `&'^0 u32`, we actually have to prove `for<'a> &'^0 u32: Other<'^0>`
+- We find the `impl<'a> Other<'a> for &'a u32`, and this impl is enough to prove the bound, as the lifetime on the borrow and on the trait are both `'^0`
+
+This end result is incorrect, as we had two separate binders introducing their own generic parameters. The trait bound should have ended up as something like `for<'a1, 'a2> &'^1 u32: Other<'^0>`, which is _not_ satisfied by the `impl<'a> Other<'a> for &'a u32`.
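The difference between the two representations in the walk-through above can be reproduced in a small hypothetical model. The `Region`, `Obligation` and `InferCtxt` types below are illustrative stand-ins, not rustc's. The `enter_forall` method here only mimics the idea behind the real method of that name: each binder that is entered gets a fresh universe, its bound variables become placeholders in that universe, and placeholders created earlier are left untouched. The impl `impl<'a> Other<'a> for &'a u32` is modeled as applying exactly when the two regions are equal.

```rust
// Hypothetical model of bound regions vs placeholders; not rustc's types.

#[derive(Clone, Copy, Debug, PartialEq)]
enum Region {
    /// `'^D_V`: bound var `V` of the binder `D` binders out from the use site.
    Bound { debruijn: u32, var: u32 },
    /// `'!U_V`: bound var `V` of a binder that was entered in universe `U`.
    Placeholder { universe: u32, var: u32 },
}

/// The goal `&'borrow u32: Other<'trait_lt>`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Obligation {
    borrow: Region,
    trait_lt: Region,
}

/// `impl<'a> Other<'a> for &'a u32` only applies when both lifetimes are the same.
fn impl_applies(goal: Obligation) -> bool {
    goal.borrow == goal.trait_lt
}

struct InferCtxt {
    universe: u32,
}

impl InferCtxt {
    /// Rewrite one region after entering a binder in the current universe.
    fn replace(&self, r: Region) -> Region {
        match r {
            // Bound by the binder being entered: becomes a placeholder.
            Region::Bound { debruijn: 0, var } => Region::Placeholder { universe: self.universe, var },
            // Bound by an outer binder: now one binder closer.
            Region::Bound { debruijn, var } => Region::Bound { debruijn: debruijn - 1, var },
            // Placeholders look the same no matter where they appear.
            p @ Region::Placeholder { .. } => p,
        }
    }

    fn enter_forall_region(&mut self, r: Region) -> Region {
        self.universe += 1;
        self.replace(r)
    }

    fn enter_forall(&mut self, goal: Obligation) -> Obligation {
        self.universe += 1;
        Obligation { borrow: self.replace(goal.borrow), trait_lt: self.replace(goal.trait_lt) }
    }
}

const INNERMOST: Region = Region::Bound { debruijn: 0, var: 0 };

fn main() {
    let mut infcx = InferCtxt { universe: 0 };
    // Entering `for<'a> &'a u32: Trait` gives the self type `&'!1_0 u32`.
    let borrow = infcx.enter_forall_region(INNERMOST);
    // Plugging it into `for<'a> T: Other<'a>` and entering that binder.
    let goal = infcx.enter_forall(Obligation { borrow, trait_lt: INNERMOST });
    println!("with placeholders: {:?}, impl applies: {}", goal, impl_applies(goal));

    // Without placeholders, both binders' variables are still written `'^0`
    // once pasted together, so the impl wrongly appears to apply.
    let naive = Obligation { borrow: INNERMOST, trait_lt: INNERMOST };
    println!("naive bound vars: {:?}, impl applies: {}", naive, impl_applies(naive));
}
```

The `'!1_0` placeholder keeps its representation after being moved under the second binder. With purely syntactic bound variables, telling the two apart would instead require shifting de Bruijn indices during substitution.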
+
+While in theory we could make this work, it would be quite involved and more complex than the current setup. We would have to:
+- "rewrite" bound variables to have a higher `DebruijnIndex` whenever instantiating a `Binder`/`EarlyBinder` with a `Bound` ty/const/region
+- When inferring an inference variable to a bound var, if that bound var is from a binder entered after creating the infer var, lower the `DebruijnIndex` of the var
+- Separately track which binder an inference variable was created inside of, as well as the innermost binder whose parameters it can name (currently we only have to track the latter)
+- When resolving inference variables, rewrite any bound variables according to the current binder depth of the infcx
+- Maybe more (while writing this list, items kept getting added, so it seems naive to think this is exhaustive)
+
+Fundamentally, all of this complexity arises because `Bound` ty/const/regions have a different representation for a given parameter on a `Binder` depending on how many other `Binder`s there are between the binder introducing the parameter and its usage. For example, given the following code:
+```rust,ignore
+fn foo<T>()
+where
+    for<'a> T: Trait<'a, for<'b> fn(&'b T, &'a u32)>
+{ ... }
+```
+That where clause would be written as:
+`for<'a> T: Trait<'^0, for<'b> fn(&'^0 T, &'^1_0 u32)>`
+Despite there being two references to the `'a` parameter, they are represented differently: `^0` and `^1_0`, due to the fact that the latter usage is nested under a second `Binder` for the inner function pointer type.
+
+This is in contrast to `Placeholder` ty/const/regions, which do not have this limitation because `Universe`s are specific to the current `InferCtxt`, not to the usage site of the parameter.
+
+It is trivially possible to instantiate `EarlyBinder`s and unify inference variables with existing `Placeholder`s, as no matter what context the `Placeholder` is in, it will have the same representation. As an example, if we were to instantiate the binder on the higher ranked where clause from above, it would be represented like so:
+`T: Trait<'!1_0, for<'b> fn(&'^0 T, &'!1_0 u32)>`
+The `RePlaceholder` representation is the same for both usages of `'a`, despite one being underneath another `Binder`.
+
+If we were to then instantiate the binder on the function pointer we would get a type such as:
+`fn(&'!2_0 T, &'!1_0 u32)`
+The `RePlaceholder` for the `'b` parameter is in a higher universe to track the fact that its binder was instantiated after the binder for `'a`.
+
+## Instantiating with `ReLateParam`
+
+As discussed in a previous chapter, `RegionKind` has two variants for representing generic parameters, `ReLateParam` and `ReEarlyParam`. `ReLateParam` is conceptually a `Placeholder` that is always in the root universe (`U0`). It is used when instantiating late bound parameters on functions/closures. Its actual representation is relatively different from both `ReEarlyParam` and `RePlaceholder`:
+- A `DefId` for the item that introduced the late bound generic parameter
+- A [`BoundRegionKind`] which either specifies the `DefId` of the generic parameter and its name (via a `Symbol`), or that this placeholder is representing the anonymous lifetime of a `Fn`/`FnMut` closure's self borrow. There is also a variant for `BrAnon` but this is not used for `ReLateParam`.
+
+For example, given the following code:
+```rust,ignore
+impl Trait for Whatever {
+    fn foo<'a>(a: &'a u32) -> &'a u32 {
+        let b: &'a u32 = a;
+        b
+    }
+}
+```
+the lifetime `'a` in the type `&'a u32` in the function body would be represented as:
+```
+ReLateParam(
+    {impl#0}::foo,
+    BoundRegionKind::BrNamed({impl#0}::foo::'a, "'a")
+)
+```
+
+In this specific case of referencing late bound generic parameters of a function from inside the body, this is done implicitly during `hir_ty_lowering` rather than explicitly when instantiating a `Binder` somewhere. In some cases, however, we do explicitly instantiate a `Binder` with `ReLateParam`s.
+
+Generally, whenever we have a `Binder` for late bound parameters on a function/closure and we are conceptually inside of the binder already, we use [`liberate_late_bound_regions`] to instantiate it with `ReLateParam`s. That makes this operation the `Binder` equivalent to `EarlyBinder`'s `instantiate_identity`.
+
+As a concrete example, accessing the signature of a function we are type checking will give us an `EarlyBinder<Binder<FnSig>>`. As we are already "inside" of these binders, we would call `instantiate_identity` followed by `liberate_late_bound_regions`.
+
+[`liberate_late_bound_regions`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/context/struct.TyCtxt.html#method.liberate_late_bound_regions
+[`BoundRegionKind`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.BoundRegionKind.html
+[`enter_forall`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_trait_selection/infer/struct.InferCtxt.html#method.enter_forall
+[ch_placeholders_universes]: ../borrow_check/region_inference/placeholders_and_universes.md
+[`instantiate_binder_with_fresh_vars`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_trait_selection/infer/struct.InferCtxt.html#method.instantiate_binder_with_fresh_vars
+[`InferCtxt`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_trait_selection/infer/struct.InferCtxt.html
+[`EarlyBinder`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.EarlyBinder.html
+[`Binder`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/type.Binder.html
\ No newline at end of file
diff --git a/src/ty_module/param_ty_const_regions.md b/src/ty_module/param_ty_const_regions.md
new file mode 100644
index 0000000000000..ad56373c0490a
--- /dev/null
+++ b/src/ty_module/param_ty_const_regions.md
@@ -0,0 +1,99 @@
+
+# Parameter `Ty`/`Const`/`Region`s
+
+When inside of generic items, types can be written that use in-scope generic parameters, for example `fn foo<'a, T>(_: &'a Vec<T>)`. In this specific case
+the `&'a Vec<T>` type would be represented internally as:
+```
+TyKind::Ref(
+    RegionKind::LateParam(DefId(foo), DefId(foo::'a), "'a"),
+    TyKind::Adt(Vec, &[TyKind::Param("T", 0)])
+)
+```
+
+There are three separate ways we represent usages of generic parameters:
+- [`TyKind::Param`]/[`ConstKind::Param`]/[`RegionKind::EarlyParam`] for early bound generic parameters (note: all type and const parameters are considered early bound, see the [chapter on early vs late bound parameters][ch_early_late_bound] for more information)
+- [`TyKind::Bound`]/[`ConstKind::Bound`]/[`RegionKind::Bound`] for references to parameters introduced via higher ranked bounds or higher ranked types, e.g. `for<'a> fn(&'a u32)` or `for<'a> T: Trait<'a>`. This will be discussed in the [chapter on `Binder`s][ch_binders].
+- [`RegionKind::LateParam`] for late bound lifetime parameters. `LateParam` will be discussed in the [chapter on instantiating `Binder`s][ch_instantiating_binders].
+
+This chapter will only cover `TyKind::Param`, `ConstKind::Param` and `RegionKind::EarlyParam`.
+
+## Ty/Const Parameters
+
+As `TyKind::Param` and `ConstKind::Param` are implemented identically, this section will only refer to `TyKind::Param` for simplicity. However,
+you should keep in mind that everything here also is true of `ConstKind::Param`.
+
+Each `TyKind::Param` contains two things: the name of the parameter and an index.
+
+See the following concrete example of a usage of `TyKind::Param`:
+```rust,ignore
+struct Foo<T>(Vec<T>);
+```
+The `Vec<T>` type is represented as `TyKind::Adt(Vec, &[GenericArgKind::Type(Param("T", 0))])`.
+
+The name is somewhat self-explanatory: it's the name of the type parameter. The index of the type parameter is an integer indicating
+its order in the list of generic parameters in scope (note: this includes parameters defined on items in outer scopes, not just on the item the parameter is defined on). Consider the following examples:
+
+```rust,ignore
+struct Foo<A, B> {
+    // A would have index 0
+    // B would have index 1
+
+    .. // some fields
+}
+impl<X, Y> Foo<X, Y> {
+    fn method<Z>() {
+        // inside here, X, Y and Z are all in scope
+        // X has index 0
+        // Y has index 1
+        // Z has index 2
+    }
+}
+```
+
+Concretely, given the `ty::Generics` for the item the parameter is defined on, an index of `10` means that, counting from the root `parent`, it is the eleventh parameter to be introduced.
+
+The index fully defines the `Ty` and is the only part of `TyKind::Param` that matters for reasoning about the code we are compiling.
+
+Generally we do not care what the name is; it is only included for diagnostics and debug logs, as otherwise it would be
+incredibly difficult to understand the output, e.g. `Vec<Param(0)>: Sized` vs `Vec<T>: Sized`. In debug output, parameter types are
+often printed out as `{name}/#{index}`. For example, in the function `foo` above, debug printing `Vec<T>` would produce `Vec<T/#0>`.
+
+An alternative representation would be to only have the name. However, using an index is more efficient, as it means we can index into `GenericArgs` when instantiating generic parameters with some arguments. We would otherwise have to store `GenericArgs` as a `HashMap` and do a hashmap lookup every time we used a generic item.
+
+In theory an index would also allow for having multiple distinct parameters that use the same name, e.g.
+`impl<A> Foo<A> { fn bar<A>() { .. } }`.
+The rules against shadowing make this difficult, but those language rules could change in the future.
+
+### Lifetime parameters
+
+In contrast to the single `Param` variant of `Ty`/`Const`, lifetimes have two variants for representing region parameters: [`RegionKind::EarlyParam`] and [`RegionKind::LateParam`]. This is because functions distinguish between [early and late bound parameters](../early-late-bound-params/early-late-bound-summary.md), which are discussed in an earlier chapter.
+
+`RegionKind::EarlyParam` is structured identically to `Ty/Const`'s `Param` variant: it is simply a `u32` index and a `Symbol`. For lifetime parameters defined on non-function items we always use `ReEarlyParam`. For functions we use `ReEarlyParam` for any early bound parameters and `ReLateParam` for any late bound parameters. Note that, just like `Ty` and `Const` params, we often debug format them as `'SYMBOL/#INDEX`, for example:
+
+```rust,ignore
+// This function would have its signature represented as:
+//
+// ```
+// fn(
+//     T/#1,
+//     Ref('a/#0, Ref(ReLateParam(...), u32))
+// ) -> Ref(ReLateParam(...), u32)
+// ```
+fn foo<'a, 'b, T: 'a>(one: T, two: &'a &'b u32) -> &'b u32 {
+    ...
+}
+```
+
+`RegionKind::LateParam` will be discussed more in the chapter on [instantiating binders][ch_instantiating_binders].
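The indexing rule for parameters can be sketched with a hypothetical, heavily simplified stand-in for `ty::Generics`. The struct below only keeps a reference to the parent's generics and the names of the item's own parameters. The real type instead records a `parent` `DefId` and a `parent_count`. In this sketch, an own parameter's index is the number of parameters introduced by all parents plus its position in the item's own list. The `debug_param` helper is likewise hypothetical and only mimics the `{name}/#{index}` debug format.

```rust
// Hypothetical, simplified stand-in for `ty::Generics`; not rustc's API.
struct Generics<'a> {
    /// Generics of the enclosing item (e.g. the impl, for a method).
    parent: Option<&'a Generics<'a>>,
    /// Names of the parameters introduced by this item itself, in order.
    own_params: Vec<&'static str>,
}

impl<'a> Generics<'a> {
    /// Number of parameters introduced by all enclosing items.
    fn parent_count(&self) -> usize {
        self.parent.map_or(0, |p| p.count())
    }

    /// Total number of parameters in scope for this item.
    fn count(&self) -> usize {
        self.parent_count() + self.own_params.len()
    }

    /// Index of an in-scope parameter, counted from the root parent.
    fn param_index(&self, name: &str) -> Option<u32> {
        match self.own_params.iter().position(|p| *p == name) {
            Some(pos) => Some((self.parent_count() + pos) as u32),
            None => self.parent.and_then(|p| p.param_index(name)),
        }
    }

    /// Mimics the `{name}/#{index}` debug format for type parameters.
    fn debug_param(&self, name: &str) -> Option<String> {
        self.param_index(name).map(|idx| format!("{}/#{}", name, idx))
    }
}

fn main() {
    // impl<X, Y> Foo<X, Y> { fn method<Z>() {} }
    let impl_generics = Generics { parent: None, own_params: vec!["X", "Y"] };
    let method_generics = Generics { parent: Some(&impl_generics), own_params: vec!["Z"] };

    for name in ["X", "Y", "Z"] {
        println!("{}", method_generics.debug_param(name).unwrap());
    }
}
```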
+
+[ch_early_late_bound]: ../early-late-bound-params/early-late-bound-summary.md
+[ch_binders]: ./binders.md
+[ch_instantiating_binders]: ./instantiating_binders.md
+[`BoundRegionKind`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.BoundRegionKind.html
+[`RegionKind::EarlyParam`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/type.RegionKind.html#variant.ReEarlyParam
+[`RegionKind::LateParam`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/type.RegionKind.html#variant.ReLateParam
+[`ConstKind::Param`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/type.ConstKind.html#variant.Param
+[`TyKind::Param`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/type.TyKind.html#variant.Param
+[`TyKind::Bound`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/type.TyKind.html#variant.Bound
+[`ConstKind::Bound`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/type.ConstKind.html#variant.Bound
+[`RegionKind::Bound`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/type.RegionKind.html#variant.ReBound