From 6c4f737404c28def124e0a28d758db6bcc6ba3be Mon Sep 17 00:00:00 2001
From: Nick Cameron
Date: Sun, 8 Jun 2014 11:54:49 +1200
Subject: [PATCH 1/3] Efficient single inheritance

---
 0000-virtual.md | 577 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 577 insertions(+)
 create mode 100644 0000-virtual.md

diff --git a/0000-virtual.md b/0000-virtual.md
new file mode 100644
index 00000000000..5a920ae940a
--- /dev/null
+++ b/0000-virtual.md
@@ -0,0 +1,577 @@
+- Start Date:
+- RFC PR #:
+- Rust Issue #:
+
+# Summary
+
+Efficient single inheritance via virtual structs and unification and nesting of
+structs and enums.
+
+Allow inheritance between structs and virtual methods on struct objects. This
+would support data structures such as the DOM which need to be efficient both in
+speed (particularly non-virtual field access) and space (thin pointers).
+
+This approach unifies many of our data types so although we add features
+(virtual methods, nested enums), we reduce complexity of the language and
+compiler in other directions.
+
+# Motivation
+
+Supporting efficient, heterogeneous data structures such as the DOM. Precisely
+we need a form of code sharing which satisfies the following constraints:
+
+* Cheap field access from internal methods;
+* Cheap dynamic dispatch of methods;
+* Cheap downcasting;
+* Thin pointers;
+* Sharing of fields and methods between definitions;
+* Safe, i.e., doesn't require a bunch of transmutes or other unsafe code to be usable.
+
+Example (in pseudo-code):
+
+```
+class Element {
+    Element parent, left-sibling, right-sibling;
+    Element[] children;
+
+    foo();
+
+    template() {
+        x = foo();
+        ...
+    }
+}
+
+class Element1 : Element {
+    Data some-data;
+
+    template() {
+        return some-data;
+    }
+}
+
+final class Element2 : Element {
+    ...
+}
+```
+
+# Detailed design
+
+Syntactically, we unify structs and enums (but not the keywords) and allow
+nesting. That means enums may have fields and structs may have variants. The
+keyword (`struct` or `enum`) is only required at the top level. Unnamed fields
+(tuple variants/tuple structs) are only allowed in leaf data. All existing uses
+are preserved. Some examples:
+
+plain enum:
+
+```
+enum E1 {
+    Variant1,
+    Variant2(int)
+}
+
+let x: E1 = Variant1;
+let y: E1 = Variant2(4);
+```
+
+plain struct:
+
+```
+struct S1 {
+    f1: int,
+    f2: E1
+}
+
+let s: S1 = S1 {f1: 5, f2: y};
+```
+
+enum with fields:
+
+```
+enum E2 {
+    f: int,
+    Variant1,
+    Variant2(int)
+}
+
+let x: E2 = Variant2(f: 34, 23);
+```
+
+Open question: should we use `()` or `{}` when instantiating items with a mix of
+named and unnamed fields? Or allow either? Or forbid items having both kinds of
+fields.
+
+nested enum:
+
+```
+enum E3 {
+    Variant1,
+    Variant2(int),
+    VariantNest {
+        Variant4,
+        Variant5
+    }
+}
+
+let x: E3 = Variant1;
+let y: E3 = Variant4;
+```
+
+nested struct:
+
+```
+struct S1 {
+    f1: int,
+    f2: E1,
+    S2 {
+        f3: int
+    }
+}
+
+let s1 = S1 {f1: 5, f2: y};
+let s2 = S2 {f1: 5, f2: y, f3: 5};
+```
+
+All names of variants may be used as types (that is, from the above examples,
+`E3`, `Variant1`, `VariantNest`, `Variant5`, `S1`, `S2` may all be used as
+types, amongst others). Fields in outer items are inherited by inner items
+(e.g., `S2` objects have fields `f1` and `f2`). Field shadowing is not allowed.
+
+All leaf variants may be instantiated. No non-leaf enums may be instantiated
+(e.g., you can't create an `E3` or `VariantNest` object). By default, all
+structs can be instantiated. Structs, including nested structs, but not leaf
+structs, may be marked `virtual`, which means they cannot be instantiated. Put
+another way, enums are virtual by default. E.g., `virtual struct S1 { ... S2 {
+... } }` means `S1` cannot be instantiated, but `S2` can. `virtual struct S1 {
+... virtual S2 { ... } }` would mean `S2` could not be instantiated, but would
+be illegal because it is a leaf item. The `virtual` keyword can only be used at
+the top level or inside another `virtual` struct.
+
+Open question: is the above use of the `virtual` keyword a good idea? We could
+use `abstract` instead (some people do not like `virtual` in general, and this
+use is different from the use described below for methods). Alternatively, we
+could allow instantiation of all structs (unless they have pure virtual methods,
+see below) or only allow instantiation of leaf structs.
+
+We allow logical nesting without lexical nesting by using `:`. In this case a
+keyword (`struct` or `enum`) is required and must match the outer item. For
+example, `struct S3 : S1 { ... }` adds another case to the `S1` definition above
+and objects of `S3` would inherit the fields `f1` and `f2`. Likewise, one could
+write `enum Variant3 : E3;` to add a case to the definition of `E3`. Such items
+are only allowed in the same module, or a sub-module of, the outer item. Why?
+
+ 1. Prevents people from abusing virtual structs to create an open-ended
+    abstraction: traits are more suitable in such cases.
+ 2. Downcasting is more optimizable, becomes O(1) instead of O(n). This is a
+    common complaint against C++ RTTI (as pointed out on the mailing list).
+ 3. Addresses the private fields initialization gotcha. (Without this
+    restriction, it is not clear how to initialise a struct with private fields
+    in a different module).
+
+When matching data types, you can use any names from any level of nesting to
+cover all inner levels. E.g.,
+
+```
+fn foo(x: E3) {
+    // All three versions give correct coverage
+    match x {
+        E3 => {}
+    }
+    match x {
+        Variant1 => {}
+        Variant2(_) => {}
+        VariantNest => {}
+    }
+    match x {
+        Variant1 => {}
+        Variant2(_) => {}
+        Variant4 => {}
+        Variant5 => {}
+    }
+}
+```
+
+The only difference between structs and enums is in their representation (which
+affects how they can be used). enum objects are represented as they are today.
+They have a tag and are the size of the largest variant plus the tag. A pointer
+or reference to an enum object is just a regular pointer to a regular enum
+object. Nested variants should use a single tag and the largest variant must
+take into account nesting. Even if we know the static type restricts us to a
+small object, we must assume it could be a larger variant. That allows for
+trivial coercions from nested variants to outer variants. We could optimise this
+later, perhaps.
+
+Non-leaf struct values are unsized, that is they follow the rules for DSTs. You
+cannot use non-leaf structs as value types, only pointers to such types. E.g.,
+(given the definition of `S1` above) one can write `x: S2` (since it is a leaf
+struct), `x: &S2`, and `x: &S1`, but not `x: S1`. Struct values have their
+minimal size (i.e., their size does not take into account other variants). This
+is also current behaviour. Pointers to structs are DST pointers, but are not
+fat. They point to a pointer to a vtable, followed by the data in the struct.
+The vtable pointer allows identification of the struct variant.
+
+To summarise the important differences between enums and structs: enum objects
+may be passed by value where an outer enum type is expected. Struct objects may
+only be passed by reference (borrowed reference, or any kind of smart or built-
+in pointer). enum values have the size of the largest variant plus a
+discriminator (modulo optimisation). Struct values have their minimal size. For
+example,
+
+```
+enum E {
+    EVar1,
+    EVar2,
+    EVar3(i64, i64, i64, i64),
+}
+
+struct S {
+    SVar1,
+    SVar2,
+    SVar3(i64, i64, i64, i64),
+}
+
+fn foo_e(e: E) {...}
+//fn foo_s(s: S) {...}
+
+fn foo_er(e: &E) {...}
+fn foo_sr(s: &S) {...}
+
+fn foo_e1(e: EVar1) {...}
+fn foo_s1(s: SVar1) {...}
+```
+
+Here an instance of `EVar1` has size 40 bytes (assuming a 64 bit discriminator)
+and an instance of SVar1 has zero size. An instance of `&E` has size 8 bytes
+(assuming 64 bit pointers) and points to an object of size 40 bytes. An instance
+of `&S` has size 8 bytes and points to an object of unknown size. If the
+dynamic type is `&SVar1` then the pointed-to data has size 8 bytes (zero size
+value plus a pointer to a vtable).
+
+The function `foo_s` is a type error because `S` is an unsized type (DST). The
+other functions are all valid.
+
+A programmer would typically use enums for small, similar size objects where the
+data is secondary and discrimination is primary, for example the current
+`Result` type. Structs would be used for large or diversely sized objects where
+discrimination is secondary, that is they are often used in a polymorphic
+setting, a good candidate would be the AST enum in libsyntax (or of course, the
+DOM in Servo).
+
+Matching struct objects (that is pointer-to-structs) takes into account the
+dynamic type given by the vtable pointer and thus allows for safe and efficient
+downcasting.
+
+Methods may be marked as `virtual` which allows them to be overridden in the
+sub-struct's impl. Overriding methods must be marked `override`. It is an error
+for a method to override without the annotation, or for an annotated method not
+to override a super-struct method. (Methods marked `override` but not `virtual`
+may not be overridden). Virtual methods may be given without a body, these are
+pure virtual in the C++ terminology. This is only allowed if the struct is also
+marked `virtual`. Non virtual methods will be statically dispatched as they are
+currently. Virtual methods are dispatched dynamically using an object's vtable.
+Methods may be marked as both `override` and `virtual` to indicate that they override
+and may in turn be overridden. A method without the `virtual` annotation is
+final (in the Java sense) and may not be overridden.
+
+Open question: alternative to `virtual` keyword - `dynamic`.
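The dispatch rules above (a `virtual` method occupies a slot in a per-type vtable, an `override` replaces that slot, and the base version can still be called statically) can be approximated in today's Rust with a hand-written table of function pointers. This is a minimal sketch, not the proposed implementation: `ElementVTable`, `base_before_set_attr`, `image_before_set_attr` and the two `static` tables are hypothetical names, the method names are borrowed from JDM's example, and only dispatch is modelled (there is no field inheritance; both "types" share one `Element` struct and differ only in their vtable).

```rust
/// Hypothetical per-type table holding one `virtual` method slot.
struct ElementVTable {
    before_set_attr: fn(&mut Element, &str, &str),
}

/// One struct stands in for every element "type"; only the vtable differs.
struct Element {
    vtable: &'static ElementVTable,
    log: Vec<String>,
}

impl Element {
    /// Non-virtual (final) method: statically dispatched.
    fn set_attribute(&mut self, key: &str, value: &str) {
        // Virtual call: the slot is read from this object's vtable at runtime.
        let hook = self.vtable.before_set_attr;
        hook(self, key, value);
        self.log.push(format!("set {}={}", key, value));
    }

    /// Base implementation of the virtual method.
    fn base_before_set_attr(&mut self, key: &str, _value: &str) {
        self.log.push(format!("Element::before {}", key));
    }
}

/// Hypothetical `override` for an image element; it then calls the base
/// version statically, in the style of `BaseType::method(self, ...)`.
fn image_before_set_attr(e: &mut Element, key: &str, value: &str) {
    if key == "src" {
        e.log.push("image: drop cached image".to_string());
    }
    Element::base_before_set_attr(e, key, value);
}

/// Vtable for plain elements: the slot holds the base implementation.
static ELEMENT_VTABLE: ElementVTable = ElementVTable {
    before_set_attr: Element::base_before_set_attr,
};

/// Vtable for image elements: the slot holds the override.
static IMAGE_VTABLE: ElementVTable = ElementVTable {
    before_set_attr: image_before_set_attr,
};
```

An object built with `vtable: &IMAGE_VTABLE` runs the override and then the base method from `set_attribute`, while one built with `&ELEMENT_VTABLE` runs only the base method; `set_attribute` itself is never looked up dynamically.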
+
+## Subtyping and coercion
+
+Nothing in this RFC introduces subtyping.
+
+Inner enum values can implicitly coerce to outer enum values.
+
+Inner struct pointer values can impliciitly coerce to outer struct pointer
+values. Note that there is no coercion between struct values. Since all but leaf
+structs are unsized, they may not dereferenced. Thus we are immune to the object
+slicing problem from C++.
+
+Via the DST rules, it should fall out that these coercions work for smart
+pointers as well as `&` and `Box` pointers.
+
+Note that this means if `R` is an inner struct of `S` and `S` implements a trait
+`T`, but `R` does not, then given a pointer to an `R` object, it may be coerced
+to an `S` in order to call methods defined in `T`, if the self type of those
+methods is a pointer to self (e.g., `&self`).
+
+## Generics
+
+(I feel the syntax could be nicer here, any ideas?)
+
+Nested items must specify formal and actual type parameters. The outer items
+type parameters must be given in `<>` after a `:` (similar to the inheritance
+notation, but no need to name the outer item). E.g.,
+
+```
+struct Sg {
+    Sgn : {
+        field: Foo
+    }
+}
+
+let x = Sgn { field: ... };
+```
+
+In the nested notation only, if an item has exactly the same type parameters as
+its parent, they may be omitted. That is for
+
+```
+struct Sg {
+    Sgn2 : {
+        field: Foo
+    }
+}
+
+let x = Sgn2 { field: ... };
+```
+
+the programmer may write
+
+```
+struct Sg {
+    Sgn2 {
+        field: Foo
+    }
+}
+
+let x = Sgn2 { field: ... };
+```
+
+When non-nested syntax is used, all type parameters must be specified, including
+actual type parameters for the parent. E.g.,
+
+```
+struct Sg {}
+
+struct Sgn : Sg {
+    field: Foo
+}
+
+let x = Sgn { field: ... };
+```
+
+## Privacy
+
+The privacy rules for fields remain unchanged. Nested items inherit their
+privacy from their parent, so module private by default unless the parent is
+marked `pub`.
+
+Open question: is there a use case for allowing nested items to be marked `pub`?
+That is having a private parent but public child. What about the opposite?
+
+# JDM's example
+From https://gist.github.com/jdm/9900569
+
+```
+virtual struct Node {
+    parent: Rc,
+    first_child: Rc,
+}
+
+struct TextNode : Node {
+}
+
+virtual struct Element : Node {
+    attrs: HashMap
+}
+
+impl Element {
+    fn set_attribute(&mut self, key: &str, value: &str)
+    {
+        self.before_set_attr(key, value);
+        //...update attrs...
+        self.after_set_attr(key, value);
+    }
+
+    virtual fn before_set_attr(&mut self, key: &str, value: &str);
+    virtual fn after_set_attr(&mut self, key: &str, value: &str);
+}
+
+struct HTMLImageElement : Element {
+}
+
+impl HTMLImageElement {
+    override fn before_set_attr(&mut self, key: &str, value: &str)
+    {
+        if (key == "src") {
+            //..remove cached image with url |value|...
+        }
+        Element::before_set_attr(self, key, value);
+    }
+}
+
+struct HTMLVideoElement : Element {
+    cross_origin: bool
+}
+
+impl HTMLVideoElement {
+    override fn after_set_attr(&mut self, key: &str, value: &str)
+    {
+        if (key == "crossOrigin") {
+            self.cross_origin = value == "true";
+        }
+        Element::after_set_attr(self, key, value);
+    }
+}
+
+fn process_any_element(element: &Element) {
+    // ...
+}
+
+fn foo() {
+    let videoElement: Rc = ...;
+    process_any_element(videoElement);
+
+    let node = videoElement.first_child;
+
+    // Downcasting
+    match node {
+        element @ Rc(Element{..}) => { ... }
+        _ => { ... }
+    }
+}
+```
+
+# Drawbacks
+
+We are adding a fair bit of complexity here, in particular in allowing nesting
+of structs/enums. The reduction in complexity by unifying structs and enums has
+clearer advantages to the language implementation than to users of the language.
+The difference between a struct and enum is subtle, and probably hard to get
+across in a tutorial. On the other hand they are satisfying different use cases
+with different priorities. I believe the extra complexity does not need to be
+paid for by every user in the sense that, unless you specifically want to use
+these features, you don't need to know about them.
+
+# Alternatives
+
+There have been many proposals for alternative designs and variations on this
+design. One minor variation would be to use anonymous fields rather than `:`
+extension for struct inheritance. An alternative proposal is to allow traits to
+extend a single struct and add subtyping appropriately. We would then need to
+add support for some kind of RTTI (possibly using a trait and macros) to allow
+safe and efficient downcasting.
+
+## Some previous RFCs
+
+* [Virtual Structs (5)](/~https://github.com/rust-lang/rfcs/pull/5) Stays as
+  closely as possible to inheritance schemes in Java or C++. Touches only
+  structs so does not unify structs and enums. That means we end up with two
+  design choices where there should probably be only one. The scheme for defining
+  virtual methods is used in this RFC.
+
+* [Fat objects (9)](/~https://github.com/rust-lang/rfcs/pull/9) Proposes using a
+  pointer to a vtable+data and treating it as DST for representing objects. A
+  very similar scheme is used in this RFC. RFC 9 does not actually propose a
+  mechanism for supporting inheritance and efficient virtual methods, just a
+  representation for objects (it suggests using Niko's earlier
+  [proposal](http://smallcultfollowing.com/babysteps/blog/2013/10/24/single-inheritance/)
+  for single inheritance by allowing struct inheritance and
+  traits to extend structs). This RFC can be considered to take the object
+  representation scheme from RFC 9 with a different mechanism for inheritance.
+
+* [Extending enums (11)](/~https://github.com/rust-lang/rfcs/pull/11) Proposes
+  combining enums and structs in a way similar, but not identical, to this RFC.
+  Introduces `impl ... as match` and `impl ... use ...` to handle method
+  dispatch.
+
+* [Unify and nest enums and structs (24)](/~https://github.com/rust-lang/rfcs/pull/24)
+  A variation of RFC 11, superseded by this RFC.
+
+
+# Unresolved questions
+
+## Multiple inheritance
+
+Do we need multiple inheritance? We _could_ add it, but there are lots of design
+and implementation issues. The use case for multiple inheritance (from bz) is
+that some DOM nodes require mixin-style use of classes which currently use
+multiple inheritance, e.g., nsIConstraintValidation.
+
+Example with traits:
+
+```
+impl Element {
+    virtual fn bar(&self) -> uint;
+}
+
+trait NSICompositor {
+    fn x(&self) -> uint;
+    fn y(&self) -> uint;
+    fn bar(&self) -> uint { self.x() + self.y() }
+}
+
+impl NSICompositor for Element1 {
+    fn x(&self) -> uint { self.x }
+    fn y(&self) -> uint { self.y }
+}
+
+impl Element1 {
+    override fn bar(&self) -> uint { NSICompositor::bar(self) }
+}
+
+impl NSICompositor for Element2 {
+    fn x(&self) -> uint { self.x }
+    fn y(&self) -> uint { self.y }
+}
+
+impl Element2 {
+    override fn bar(&self) -> uint { NSICompositor::bar(self) }
+}
+```
+
+## Drop
+
+What to do about (virtual) destructors? I feel the C++ approach is too much of a
+foot gun. By limiting struct inheritance to a module, we should always be able to
+infer whether or not a destructor is virtual. Need to work out how exactly
+implementing the drop trait interacts with inheritance.
+
+We need to cope with the situation where a struct object with static type T1 and
+dynamic type T2 goes out of scope and T2 implements `Drop` and T1 doesn't - we
+still need to call T2::drop. One solution could be that if an inner struct
+implements `Drop` then so must the outer struct. Calling `drop` is then just a
+regular virtual call and is only necessary if the static type implements `Drop`.
+
+A generalisation of this: should we have a mechanism for requiring inner items
+to implement a trait (with or without implementing it in the outer item, the
+former case is like saying "must override")? This is kind of dual to the idea
+above that if an outer item implements a trait, then the inner item appears to
+implement it too, via coercion. (ht Niko).
+
+Should we automatically call drop on super-structs? Or rely on the programmer to
+do that manually?
+
+### Straw man proposal
+
+Allow `virtual impl Tr for T;` syntax where `T` must be a struct or enum and
+which has the semantics that any inner item of `T` must provide an implementation
+of `Tr`. Similarly to pure virtual methods, this implies that `T` cannot be
+instantiated.
+
+Traits may be marked `inherit` (this is a terrible keyword, anyone got any
+better ideas? I guess we could use `virtual` here too): `inherit trait Tr
+{...}`. This implies that for an item `T` to implement `Tr` any outer item of
+`T` must also implement `Tr` (possibly providing a pure virtual impl). This is
+checked where the impl is declared, so it is possible that an impl could be
+declared for an outer item in a different module but due to the visibility
+rules, it is invisible, this would be a compile error. Since `impl`s are not
+imported, only traits, I believe this means that if a trait is marked `inherit`,
+then anywhere an implementation for an inner item is visible, an implementation
+for the outer item is also visible.
+
+Drop is marked `inherit`.
+
+It is the programmer's responsibility to call `drop()` for outer-items from the
+impl for the inner item, if necessary.
+
+I believe that this gives the desired behaviour and is backwards compatible,
+other than the addition of the `virtual` and `inherit` keywords.
+
+## Calling overridden methods
+
+If a method is overridden, we should still be able to call it. C++ uses `::`
+syntax to allow this; UFCS should let us do the same. Since all such uses would
+use static dispatch, we would use self-as-arg syntax, e.g.,
+`BaseType::method(self, ...)`.
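Reason 2 in the design section (downcasting becomes O(1) because all variants of a struct live in one module) relies on the hierarchy being closed. One possible scheme, offered only as a sketch since this RFC does not specify one: assume the compiler numbers every variant of a hierarchy in preorder and stores, in each vtable, the variant's own number and the largest number in its subtree. An object is then an instance of `Target` exactly when its own number falls in `Target`'s range. `TypeInfo`, `is_a` and the constants are hypothetical names; the hierarchy is the one from JDM's example.

```rust
/// Hypothetical per-variant type info stored in each vtable.
#[derive(Clone, Copy, Debug)]
struct TypeInfo {
    /// Preorder index of this variant in its (closed) hierarchy.
    id: u32,
    /// Largest preorder index among this variant and all of its inner items.
    last_in_subtree: u32,
}

impl TypeInfo {
    /// Constant-time test: is a value whose dynamic type is `self`
    /// also an instance of `target`?
    fn is_a(self, target: TypeInfo) -> bool {
        target.id <= self.id && self.id <= target.last_in_subtree
    }
}

// Hierarchy from JDM's example, numbered in preorder:
// Node 0 { TextNode 1, Element 2 { HTMLImageElement 3, HTMLVideoElement 4 } }
const NODE: TypeInfo = TypeInfo { id: 0, last_in_subtree: 4 };
const TEXT_NODE: TypeInfo = TypeInfo { id: 1, last_in_subtree: 1 };
const ELEMENT: TypeInfo = TypeInfo { id: 2, last_in_subtree: 4 };
const IMAGE: TypeInfo = TypeInfo { id: 3, last_in_subtree: 3 };
const VIDEO: TypeInfo = TypeInfo { id: 4, last_in_subtree: 4 };
```

With an open hierarchy (variants added in other crates) the numbering could not be fixed at compile time, which is one way to see why the module restriction helps.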
From afe444f058010ede6162bcfa69d73487231511eb Mon Sep 17 00:00:00 2001
From: Nick Cameron
Date: Wed, 16 Jul 2014 17:36:27 +1200
Subject: [PATCH 2/3] changes

---
 0000-virtual.md | 331 +++++++++++++++++++++++++-----------------------
 1 file changed, 170 insertions(+), 161 deletions(-)

diff --git a/0000-virtual.md b/0000-virtual.md
index 5a920ae940a..2d0245e465c 100644
--- a/0000-virtual.md
+++ b/0000-virtual.md
@@ -4,64 +4,39 @@
 
 # Summary
 
-Efficient single inheritance via virtual structs and unification and nesting of
-structs and enums.
+Efficient single inheritance by unification and nesting of structs and enums,
+and virtual dispatch of methods called on reference-to-concrete-type objects.
 
-Allow inheritance between structs and virtual methods on struct objects. This
-would support data structures such as the DOM which need to be efficient both in
-speed (particularly non-virtual field access) and space (thin pointers).
+This will support data structures such as the DOM which need to be efficient
+both in speed (particularly non-virtual field access) and space (thin pointers
+to abstract types).
 
-This approach unifies many of our data types so although we add features
-(virtual methods, nested enums), we reduce complexity of the language and
-compiler in other directions.
+The approach unifies many of our data types so although we add features (virtual
+methods, nested enums), we reduce complexity of the language and compiler in
+other directions (it removes distinctions between structs and enums and makes
+support for struct variants and tuple structs less ad hoc).
 
 # Motivation
 
-Supporting efficient, heterogeneous data structures such as the DOM. Precisely
-we need a form of code sharing which satisfies the following constraints:
+Supporting efficient, heterogeneous data structures such as the DOM or an AST
+(e.g., in the Rust compiler). Precisely we need a form of code sharing which
+satisfies the following constraints:
 
-* Cheap field access from internal methods;
-* Cheap dynamic dispatch of methods;
-* Cheap downcasting;
-* Thin pointers;
-* Sharing of fields and methods between definitions;
-* Safe, i.e., doesn't require a bunch of transmutes or other unsafe code to be usable.
-
-Example (in pseudo-code):
-
-```
-class Element {
-    Element parent, left-sibling, right-sibling;
-    Element[] children;
-
-    foo();
-
-    template() {
-        x = foo();
-        ...
-    }
-}
-
-class Element1 : Element {
-    Data some-data;
-
-    template() {
-        return some-data;
-    }
-}
-
-final class Element2 : Element {
-    ...
-}
-```
+* cheap field access from internal methods;
+* cheap dynamic dispatch of methods;
+* cheap downcasting;
+* thin pointers;
+* sharing of fields and methods between definitions;
+* safe, i.e., doesn't require a bunch of transmutes or other unsafe code to be usable.
 
 # Detailed design
 
-Syntactically, we unify structs and enums (but not the keywords) and allow
-nesting. That means enums may have fields and structs may have variants. The
-keyword (`struct` or `enum`) is only required at the top level. Unnamed fields
-(tuple variants/tuple structs) are only allowed in leaf data. All existing uses
-are preserved. Some examples:
+Syntactically, we unify structs and enums (though not their keywords; but see
+'alternatives', below) and allow nesting. That means enums may have fields and
+structs may have variants. Both may have nested data; the keyword (`struct` or
+`enum`) is only required at the top level. Unnamed fields (tuple variants/tuple
+structs) are only allowed in leaf data. All existing uses are preserved. Some
+examples:
 
 plain enum:
 
@@ -98,9 +73,9 @@ enum E2 {
 let x: E2 = Variant2(f: 34, 23);
 ```
 
-Open question: should we use `()` or `{}` when instantiating items with a mix of
-named and unnamed fields? Or allow either? Or forbid items having both kinds of
-fields.
+**Open question:** should we use `()` or `{}` when instantiating items with a
+mix of named and unnamed fields? Or allow either? Or forbid items having both
+kinds of fields?
 
 nested enum:
 
@@ -138,7 +113,7 @@ All names of variants may be used as types (that is, from the above examples,
 types, amongst others). Fields in outer items are inherited by inner items
 (e.g., `S2` objects have fields `f1` and `f2`). Field shadowing is not allowed.
 
-All leaf variants may be instantiated. No non-leaf enums may be instantiated
+All leaf variants may be instantiated. Non-leaf enums may not be instantiated
 (e.g., you can't create an `E3` or `VariantNest` object). By default, all
 structs can be instantiated. Structs, including nested structs, but not leaf
 structs, may be marked `virtual`, which means they cannot be instantiated. Put
@@ -148,18 +123,18 @@ another way, enums are virtual by default. E.g., `virtual struct S1 { ... S2 {
 be illegal because it is a leaf item. The `virtual` keyword can only be used at
 the top level or inside another `virtual` struct.
 
-Open question: is the above use of the `virtual` keyword a good idea? We could
-use `abstract` instead (some people do not like `virtual` in general, and this
-use is different from the use described below for methods). Alternatively, we
-could allow instantiation of all structs (unless they have pure virtual methods,
-see below) or only allow instantiation of leaf structs.
+**Open question:** is the above use of the `virtual` keyword a good idea? We
+could use `abstract` instead (some people do not like `virtual` in general, and
+this use is different from the use described below for methods). Alternatively,
+we could allow instantiation of all structs (unless they have pure virtual
+methods, see below) or only allow instantiation of leaf structs.
 
 We allow logical nesting without lexical nesting by using `:`. In this case a
 keyword (`struct` or `enum`) is required and must match the outer item. For
 example, `struct S3 : S1 { ... }` adds another case to the `S1` definition above
 and objects of `S3` would inherit the fields `f1` and `f2`. Likewise, one could
 write `enum Variant3 : E3;` to add a case to the definition of `E3`. Such items
-are only allowed in the same module, or a sub-module of, the outer item. Why?
+are only allowed in the same module or a sub-module of the outer item. Why?
 
  1. Prevents people from abusing virtual structs to create an open-ended
     abstraction: traits are more suitable in such cases.
@@ -195,28 +170,36 @@ fn foo(x: E3) {
 The only difference between structs and enums is in their representation (which
 affects how they can be used). enum objects are represented as they are today.
 They have a tag and are the size of the largest variant plus the tag. A pointer
-or reference to an enum object is just a regular pointer to a regular enum
-object. Nested variants should use a single tag and the largest variant must
+or reference to an enum object is a thin pointer to a regular enum
+object. Nested variants should use a single tag and the 'largest variant' must
 take into account nesting. Even if we know the static type restricts us to a
 small object, we must assume it could be a larger variant. That allows for
 trivial coercions from nested variants to outer variants. We could optimise this
 later, perhaps.
 
-Non-leaf struct values are unsized, that is they follow the rules for DSTs. You
-cannot use non-leaf structs as value types, only pointers to such types. E.g.,
-(given the definition of `S1` above) one can write `x: S2` (since it is a leaf
-struct), `x: &S2`, and `x: &S1`, but not `x: S1`. Struct values have their
-minimal size (i.e., their size does not take into account other variants). This
-is also current behaviour. Pointers to structs are DST pointers, but are not
-fat. They point to a pointer to a vtable, followed by the data in the struct.
-The vtable pointer allows identification of the struct variant.
+Non-leaf struct values are unsized, that is they follow the rules for DSTs. A
+programmer cannot use non-leaf structs as value types, only pointers to such
+types may exist. E.g., (given the definition of `S1` above) one can write `x:
+S2` (since it is a leaf struct), `x: &S2`, and `x: &S1`, but not `x: S1`. Struct
+values have their minimal size (i.e., their size does not take into account
+other variants). This is also current behaviour. Pointers to structs are DST
+pointers, but are not fat. They point to a pointer to a vtable, followed by the
+data in the struct. The vtable pointer allows identification of the concrete
+struct variant. Pointer-to-struct objects may only be dereferenced if the static
+type is a leaf; dereferencing yields only the concrete object, with no
+indication of the vtable. The runtime representation of pointer-to-leaf struct
+objects is changed from the current representation in that the pointer is a
+pointer to `[vtable_ptr, data]` rather than a pointer to `data`. However, since
+dereferencing must give the data (and not the `[vtable_ptr, data]`
+representation), this difference is only observable if the programmer uses
+unsafe transmutes.
 
 To summarise the important differences between enums and structs: enum objects
 may be passed by value where an outer enum type is expected. Struct objects may
 only be passed by reference (borrowed reference, or any kind of smart or built-
 in pointer). enum values have the size of the largest variant plus a
-discriminator (modulo optimisation). Struct values have their minimal size. For
-example,
+discriminator (modulo optimisation). Struct values have their minimal (concrete)
+size plus one word (for the vtable pointer). For example,
 
 ```
 enum E {
@@ -252,29 +235,32 @@ The function `foo_s` is a type error because `S` is an unsized type (DST). The
 other functions are all valid.
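The size claims in the `E`/`S` example can be checked directly for the enum half, and approximated for the proposed `[vtable_ptr, data]` struct representation. A sketch assuming a 64-bit target (one word = 8 bytes); `#[repr(u64)]` forces the 64 bit discriminator the text assumes, and `VTable` and `Object<T>` are hypothetical stand-ins for the proposed layout, not an existing API.

```rust
use std::mem::size_of;

/// The enum from the example, with a forced 64 bit discriminator.
#[allow(dead_code)]
#[repr(u64)]
enum E {
    EVar1,
    EVar2,
    EVar3(i64, i64, i64, i64),
}

/// Hypothetical vtable; a real one would hold method slots and type info.
#[allow(dead_code)]
struct VTable {
    variant_id: u32,
}

/// Hypothetical `[vtable_ptr, data]` block that a pointer-to-struct
/// object would point at.
#[allow(dead_code)]
#[repr(C)]
struct Object<T> {
    vtable: &'static VTable,
    data: T,
}

/// A zero-sized leaf struct, standing in for `SVar1`.
#[allow(dead_code)]
struct SVar1;
```

Here `size_of::<E>()` is 40 (8-byte tag plus four `i64`s), `&E` and `&Object<SVar1>` are one word each (thin pointers), and the pointed-to `Object<SVar1>` is also one word: a zero-sized value plus the vtable pointer, matching the "8 bytes" figure for a dynamic `&SVar1`.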
 
 A programmer would typically use enums for small, similar size objects where the
-data is secondary and discrimination is primary, for example the current
+data is secondary and discrimination is primary. For example, the current
 `Result` type. Structs would be used for large or diversely sized objects where
 discrimination is secondary, that is they are often used in a polymorphic
-setting, a good candidate would be the AST enum in libsyntax (or of course, the
+setting. A good candidate would be the AST enum in libsyntax (or of course, the
 DOM in Servo).
 
-Matching struct objects (that is pointer-to-structs) takes into account the
-dynamic type given by the vtable pointer and thus allows for safe and efficient
+Matching struct objects (that is pointer-to-structs) should take into account the
+dynamic type given by the vtable pointer and thus allow for safe and efficient
 downcasting.
 
-Methods may be marked as `virtual` which allows them to be overridden in the
-sub-struct's impl. Overriding methods must be marked `override`. It is an error
-for a method to override without the annotation, or for an annotated method not
-to override a super-struct method. (Methods marked `override` but not `virtual`
-may not be overridden). Virtual methods may be given without a body, these are
-pure virtual in the C++ terminology. This is only allowed if the struct is also
-marked `virtual`. Non virtual methods will be statically dispatched as they are
-currently. Virtual methods are dispatched dynamically using an object's vtable.
-Methods may be marked as both `override` and `virtual` to indicate that they override
-and may in turn be overridden. A method without the `virtual` annotation is
-final (in the Java sense) and may not be overridden.
+## Methods
+
+Methods may be marked as `virtual` which allows them to be overridden by a
+sub-struct. Overriding methods must be marked `override`. It is an error for a
+method to override without the annotation, or for an annotated method not to
+override a super-struct method. Methods may be marked as both `override` and
+`virtual` to indicate that they override and may in turn be overridden. (Methods
+marked `override` but not `virtual` must override but may not be overridden). A
+method without the `virtual` annotation is final (in the Java sense) and may not
+be overridden. Virtual methods may be given without a body; these are pure
+virtual in C++ terms. This is only allowed if the struct is also marked
+`virtual` (so that the struct cannot be instantiated). Non virtual methods will
+be statically dispatched, as they are currently. Virtual methods are dispatched
+dynamically using an object's vtable.
 
-Open question: alternative to `virtual` keyword - `dynamic`.
+**Open question:** alternative to `virtual` keyword - `dynamic`.
 
 ## Subtyping and coercion
 
@@ -282,26 +268,32 @@ Nothing in this RFC introduces subtyping.
 
 Inner enum values can implicitly coerce to outer enum values.
 
-Inner struct pointer values can impliciitly coerce to outer struct pointer
+Inner struct pointer values can implicitly coerce to outer struct pointer
 values. Note that there is no coercion between struct values. Since all but leaf
-structs are unsized, they may not dereferenced. Thus we are immune to the object
-slicing problem from C++.
+structs are unsized, they may not be dereferenced. Thus we are immune to the
+object slicing problem from C++.
+
+**Open question:** We could choose to force explicit coercions. It would make
+sense for the behaviour to match sub-traits, whatever we decide for that.
 
 Via the DST rules, it should fall out that these coercions work for smart
 pointers as well as `&` and `Box` pointers.
 
 Note that this means if `R` is an inner struct of `S` and `S` implements a trait
 `T`, but `R` does not, then given a pointer to an `R` object, it may be coerced
-to an `S` in order to call methods defined in `T`, if the self type of those
-methods is a pointer to self (e.g., `&self`).
+(explicitly) to an `S` in order to call methods defined in `T`, if the self type
+of those methods is a pointer to self (e.g., `&self`).
+
+If in the future we decide that subtyping is useful, we could add it backwards
+compatibly.
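The rule that a pointer to an inner struct `R` may be (explicitly) coerced to a pointer to `S` in order to call methods of a trait `T` implemented only by `S` can be mimicked in today's Rust. This is a minimal sketch under stated assumptions: inheritance is modelled by embedding `S` as a field of `R` rather than by the proposed layout, `AsRef` stands in for the explicit coercion, and `describe`, `outer` and `describe_any` are hypothetical names.

```rust
/// A trait implemented by the outer struct only.
trait T {
    fn describe(&self) -> String;
}

/// Outer struct.
struct S {
    f1: i64,
}

/// Inner struct; the embedded `outer` field stands in for the inherited
/// fields of `S`.
struct R {
    outer: S,
    f3: i64,
}

impl T for S {
    fn describe(&self) -> String {
        format!("S with f1 = {}", self.f1)
    }
}

/// Stand-in for the explicit pointer coercion `&R` to `&S`.
impl AsRef<S> for R {
    fn as_ref(&self) -> &S {
        &self.outer
    }
}

/// `R` does not implement `T`, so the caller coerces first, then calls.
fn describe_any(r: &R) -> String {
    let s: &S = r.as_ref();
    s.describe()
}
```

Because only a pointer is converted, the `R` value itself is never copied or truncated, which is the property the "no object slicing" argument relies on.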
Note that this means if `R` is an inner struct of `S` and `S` implements a trait `T`, but `R` does not, then given a pointer to an `R` object, it may be coerced -to an `S` in order to call methods defined in `T`, if the self type of those -methods is a pointer to self (e.g., `&self`). +(explicitly) to an `S` in order to call methods defined in `T`, if the self type +of those methods is a pointer to self (e.g., `&self`). + +If in the future we decide that subtyping is useful, we could add it backwards +compatibly. ## Generics (I feel the syntax could be nicer here, any ideas?) -Nested items must specify formal and actual type parameters. The outer items -type parameters must be given in `<>` after a `:` (similar to the inheritance -notation, but no need to name the outer item). E.g., +Nested items must specify formal and actual type parameters. The outer items' +type parameters must be given between `<>` after a `:` (similar to the +inheritance notation, but no need to name the outer item). E.g., ``` struct Sg { @@ -314,7 +306,7 @@ let x = Sgn { field: ... }; ``` In the nested notation only, if an item has exactly the same type parameters as -its parent, they may be ommitted. That is for +its parent, they may be omitted. For example, for ``` struct Sg { @@ -339,7 +331,8 @@ let x = Sgn2 { field: ... }; ``` When non-nested syntax is used, all type parameters must be specified, including -actual type parameters for the parent. E.g., +actual type parameters for the parent. (Note also that the super-type is named +whether or not type parameters are present.) E.g., ``` struct Sg {} @@ -357,10 +350,60 @@ The privacy rules for fields remain unchanged. Nested items inherit their privacy from their parent, so module private by default unless the parent is marked `pub`. -Open question: is there a use case for allowing nested items to be marked `pub`? -That is having a private parent but public child. What about the opposite?
+**Open question:** is there a use case for allowing nested items to be marked +`pub`? That is having a private parent but public child. What about the +opposite? + + +## Drop + +Traits may be marked `inherit` (alternatively, we could use `virtual` here too, +although this is probably overloading one keyword too far): `inherit trait Tr +{...}`. This implies that for an item `T` to implement `Tr` any outer item of +`T` must also implement `Tr` (possibly providing a pure virtual declaration if +the outer item is itself virtual). This is checked where the impl is declared, +so it would be possible for an impl to be declared for an outer item in a +different module but be invisible due to the visibility rules; this should +be a compile error. Since `impl`s are not imported, only traits, I believe this +means that if a trait is marked `inherit`, then anywhere an implementation for +an inner item is visible, an implementation for the outer item is also +visible. + +`Drop` is marked `inherit`. + +When an object goes out of scope, the compiler will check for the `Drop` trait +like it does today. However, if it finds one on the static type, then it will +generate code which calls all implementations of drop up the inheritance +hierarchy (rather than calling a single destructor). Note that by marking the +`Drop` trait as `inherit`, it is not possible that the dynamic type has a +destructor, but the static type does not. + +I believe this is possible by walking the vtable and calling all methods rather +than just the first. So this should not require any additional reflection +capability. + +I believe that this gives the desired behaviour and is backwards compatible, +other than the addition of the `inherit` keyword. It is the desired behaviour +for destructors, but it is a little bizarre when thought of in terms of regular +virtual method calls. I think this is the least worst option, however.
+ +A possible generalisation of this is a mechanism for requiring inner items +to implement a trait (with or without implementing it in the outer item, the +former case is like saying "must override"). This is kind of dual to the idea +above that if an outer item implements a trait, then the inner item appears to +implement it too, via coercion. (ht Niko). + + +## Calling overridden methods + +If a method is overridden, we should still be able to call it. C++ uses `::` +syntax to allow this, UFCS should let us do this. Since all such uses would use +static dispatch, we would use self-as-arg syntax, e.g., +`BaseType::method(self, ...)`. + # JDM's example + From https://gist.github.com/jdm/9900569 ``` @@ -438,11 +481,12 @@ fn foo() { We are adding a fair bit of complexity here, in particular in allowing nesting of structs/enums. The reduction in complexity by unifying structs and enums has clearer advantages to the language implementation than to users of the language. + The difference between a struct and enum is subtle, and probably hard to get -across in a tutorial. On the other hand they are satisfying different use cases -with different priorities. I believe the extra complexity does not need to be -paid for by every user in the sense that, unless you specifically want to use -these features, you don't need to know about them. +across in a tutorial. On the other hand, the two are satisfying clearly +different use cases with different priorities. I believe the extra complexity +does not need to be paid for by every user in the sense that, unless you +specifically want to use these features, you don't need to know about them. # Alternatives @@ -456,10 +500,10 @@ safe and efficient downcasting. ## Some previous RFCs * [Virtual Structs (5)](/~https://github.com/rust-lang/rfcs/pull/5) Stays as - closely as possible to inheritance schemes in Java or C++. Touches only + closely as possible to single inheritance in Java or C++.
Touches only structs so does not unify structs and enums. That means we end up with two - design choices, where there probably shouldn't be. The scheme for defining - virtual methods is used in this RFC> + design choices (enums or virtual structs), where there probably shouldn't be. + The scheme for defining virtual methods is used in this RFC. * [Fat objects (9)](/~https://github.com/rust-lang/rfcs/pull/9) Proposes using a pointer to a vtable+data and treating it as DST for representing objects. A @@ -472,7 +516,7 @@ safe and efficient downcasting. representation scheme from RFC 9 with a different mechanism for inheritance. * [Extending enums (11)](/~https://github.com/rust-lang/rfcs/pull/11) Proposes - combining enums and structs in a similar, but not identical to this RFC. + combining enums and structs in a similar, but not identical, way to this RFC. Introduces `impl ... as match` and `impl ... use ...` to handle method dispatch. @@ -480,12 +524,26 @@ safe and efficient downcasting. A variation of RFC 11, superseeded by this RFC. +## Variation - `data` for `struct` and `enum` + +An alternative to using `enum` and `struct` is to use a single keyword for both +constructs. `data` is my personal favourite and matches Haskell (I'm not aware +of other uses, Scala?). We would then need some way to indicate the sized-ness +of the datatype. The obvious way is to use another keyword. For DST we use +`Sized?`, but this is not a keyword; it indicates the possible absence of the +default trait bound `Sized`, so that is probably not suitable. Furthermore, I'm +not sure whether the sized or unsized version should be the default. + +We could then either forbid the use of `enum` and `struct` or we could allow +them as syntactic sugar for `sized data` and `unsized virtual data`, +respectively. + # Unresolved questions ## Multiple inheritance Do we need multiple inheritance? We _could_ add it, but there are lots of design -and implementation issues.
The use case for multiple inheritance (from bz) is +and implementation issues. An example use case for multiple inheritance (from bz) is that some DOM nodes require mixin-style use of classes which currently use multiple inheritance, e.g., nsIConstraintValidation. @@ -521,57 +579,8 @@ impl Element2 { } ``` -## Drop - -What to do about (virtual) destructors? I feel the C++ approach is too much of a -foot gun. By limiting struct inheritance to a module, we should always be able to -infer whether or not a destructor is virtual. Need to work out how exactly -implementing the drop trait interacts with inheritance. - -We need to cope with the situation where a struct object with static type T1 and -dynamic type T2 goes out of scope and T2 implements `Drop` and T1 doesn't - we -still need to call T2::drop. One solution could be that if an inner struct -implements `Drop` then so must the outer struct. Calling `drop` is then just a -regular virtual call and is only necessary if the static type implements `Drop`. +I believe that all such uses can be implemented using the traits mechanisms in +Rust and that these will interact cleanly with the proposed features for +efficient single inheritance. Therefore, we should not add any additional +mechanism for multiple inheritance. -A generalisation of this is should we have a mechanism for requiring inner items -to implement a trait (with or without implementing it in the outer item, the -former case is like saying "must override")? This is kind of dual to the idea -above that if an outer item implements a trait, then the inner trait appears to -implement it too, via coercion. (ht Niko). - -Should we automatically call drop on super-structs? Or rely on the programmer to -do that manually? - -### Straw man proposal - -Allow `virtual impl Tr for T;` syntax where `T` must be a struct or enum and -which has the semantics that any inner item of `T` must provide an implmentation -of `Tr`. 
Similarly to pure virtual methods, this implies that `T` cannot be -instantiated. - -Traits may be marked `inherit` (this is a terrible keyword, anyone got any -better ideas? I guess we could use `virtual` here too): `inherit Trait Tr -{...}`. This implies that for an item `T` to implement `Tr` any outer item of -`T` must also implement `Tr` (possibly providing a pure virtual impl). This is -checked where the impl is declared, so it is possible that an impl could be -declared for an outer item in a different module but due to the visibility -rules, it is invisible, this would be a compile error. Since `impl`s are not -imported, only traits, I believe this means that if a trait is inherit, then -anywhere an implementation for an inner item is visible, then an implementation -for the outer item is also visible. - -Drop is marked `inherit`. - -It is the programmer's responsibility to call `drop()` for outer-items from the -impl for the inner item, if necessary. - -I believe that this gives the desired behaviour and is backwards compatible, -other than the addition of the `virtual` and `inherit` keywords. - -## Calling overridden methods - -If a method is overridden, we should still be able to call it. C++ uses `::` -syntax to allow this, UFCS should let us do this. Since all such uses would use -static dispatch, we would use self-as-arg syntax, e.g., -`BaseType::method(self, ...)`. 
From a6b141874dbb6707f94c454e7f6352cfa9de8c37 Mon Sep 17 00:00:00 2001 From: Nick Cameron Date: Thu, 28 Aug 2014 17:45:12 +1200 Subject: [PATCH 3/3] more changes --- 0000-virtual.md | 210 +++++++++++++++++++++++++++++++++++------------- 1 file changed, 156 insertions(+), 54 deletions(-) diff --git a/0000-virtual.md b/0000-virtual.md index 2d0245e465c..eedab2d5ed3 100644 --- a/0000-virtual.md +++ b/0000-virtual.md @@ -4,22 +4,23 @@ # Summary -Efficient single inheritance by unification and nesting of structs and enums, -and virtual dispatch of methods called on reference-to-concrete-type objects. - -This will support data structures such as the DOM which need to be efficient -both in speed (particularly non-virtual field access) and space (thin pointers -to abstract types). - -The approach unifies many of our data types so although we add features (virtual -methods, nested enums), we reduce complexity of the language and compiler in -other directions (removing distinctions between structs and enums, makes support -for struct variants and tuple structs less ad hoc). +Support performance (time and space) sensitive data structures (such as the DOM +or an AST) by enhancing enums. In particular, offer a dynamically sized version +of an enum, called a struct. This struct is a generalisation of current structs +and supports efficient single inheritance. Structs and enum variants are unified +(up to the keyword and static vs dynamic size distinction) which means struct +variants, tuple structs, and enum variants as types emerge naturally. We add the +option of virtual dispatch of methods called on reference-to-concrete-type- +objects (in addition to virtual dispatch for reference-to-trait-objects which +exists today). + +These changes are mostly backwards compatible, see the staging section for more +details. # Motivation Supporting efficient, heterogeneous data structures such as the DOM or an AST -(e.g., in the Rust compiler). 
Precisely we need a form of code sharing which +(e.g., in the Rust compiler). Precisely, we need a form of code reuse which satisfies the following constraints: * cheap field access from internal methods; @@ -114,14 +115,15 @@ types, amongst others). Fields in outer items are inherited by inner items (e.g., `S2` objects have fields `f1` and `f2`). Field shadowing is not allowed. All leaf variants may be instantiated. Non-leaf enums may not be instantiated -(e.g., you can't create an `E3` or `VariantNest` object). By default, all -structs can be instantiated. Structs, including nested structs, but not leaf -structs, may be marked `virtual`, which means they cannot be instantiated. Put -another way, enums are virtual by default. E.g., `virtual struct S1 { ... S2 { -... } }` means `S1` cannot be instantiated, but `S2` can. `virtual struct S1 { -... virtual S2 { ... } }` would mean `S2` could not be instantiated, but would -be illegal because it is a leaf item. The `virtual` keyword can only be used at -the top level or inside another `virtual` struct. +(e.g., you can't create an `E3` or `VariantNest` object), but they may be used in +pattern matching. By default, all structs can be instantiated. Structs, +including nested structs, but not leaf structs, may be marked `virtual`, which +means they cannot be instantiated. Put another way, enums are virtual by +default. E.g., `virtual struct S1 { ... S2 { ... } }` means `S1` cannot be +instantiated, but `S2` can. `virtual struct S1 { ... virtual S2 { ... } }` would +mean `S2` could not be instantiated, but would be illegal because it is a leaf +item. The `virtual` keyword can only be used at the top level or inside another +`virtual` struct. **Open question:** is the above use of the `virtual` keyword a good idea? We could use `abstract` instead (some people do not like `virtual` in general, and @@ -134,18 +136,18 @@ keyword (`struct` or `enum`) is required and must match the outer item. For example, `struct S3 : S1 { ...
}` adds another case to the `S1` defintion above and objects of `S3` would inherit the fields `f1` and `f2`. Likewise, one could write `enum Variant3 : E3;` to add a case to the defintion of `E3`. Such items -are only allowed in the same module or a sub-module of the outer item. Why? +are only allowed in the same crate as the outer item. Why? 1. Prevents people from abusing virtual structs to create an open-ended abstraction: traits are more suitable in such cases. 2. Downcasting is more optimizable, becomes O(1) instead of O(n). This is a common complaint against C++ RTTI (as pointed out on the mailing list). - 3. Addresses the private fields initialization gotcha. (Without this - restriction, it is not clear how to initialise a struct with private fields - in a different module). + 3. Prevents the 'fragile base class' problem - since there are no derived + structs outside the base struct's crate, changing the base class has only + limited and known effects. -When matching data types, you can use any names from any level of nesting to -cover all inner levels. E.g., +When pattern matching data types, you can use any names from any level of +nesting to cover all inner levels. E.g., ``` fn foo(x: E3) { @@ -172,7 +174,7 @@ affects how they can be used). enum objects are represented as they are today. They have a tag and are the size of the largest variant plus the tag. A pointer or reference to an enum object is a thin pointer to a regular enum object. Nested variants should use a single tag and the 'largest variant' must -take into account nesting. Event if we know the static type restricts us to a +take into account nesting. Even if we know the static type restricts us to a small object, we must assume it could be a larger variant. That allows for trivial coercions from nested variants to outer variants. We could optimise this later, perhaps. @@ -185,14 +187,17 @@ values have their minimal size (i.e., their size does not take into account other variants). 
This is also current behaviour. Pointers to structs are DST pointers, but are not fat. They point to a pointer to a vtable, followed by the data in the struct. The vtable pointer allows identification of the concrete -struct variant. Pointer-to-struct objects may only be dereferenced if the static -type is a leaf and gives only the concrete object with no indication of the -vtable. The runtime representation of pointer-to-leaf struct objects is changed -from the current representation in that the pointer is a pointer to -`[vtable_ptr, data]` rather than a pointer to `data`. However, since -dereferencing must give the data (and not the `[vtable_ptr, data]` -representation), this difference is only observable if the programmer uses -unsafe transmutes. +struct variant. + +Pointer-to-struct objects may only be dereferenced if the static type is a leaf +struct. Dereferencing gives only the concrete object with no indication of the +vtable. + +The runtime representation of pointer-to-leaf struct objects is changed from the +current representation in that the pointer is a pointer to `[vtable_ptr, data]` +rather than a pointer to `data`. However, since dereferencing must give the data +(and not the `[vtable_ptr, data]` representation), this difference is only +observable if the programmer uses unsafe transmutes. To summarise the important differences between enums and structs: enum objects may be passed by value where an outer enum type is expected. Struct objects may @@ -238,8 +243,8 @@ A programmer would typically use enums for small, similar size objects where the data is secondary and discrimination is primary. For example, the current `Result` type. Structs would be used for large or diversely sized objects where discrimination is secondary, that is they are often used in a polymorphic -setting. A good candidate would be the AST enum in libsyntax (or of course, the -DOM in Servo). +setting. A good candidate would be the AST enum in libsyntax or the DOM in +Servo. 
Matching struct objects (that is pointer-to-structs) should take into account the dynamic type given by the vtable pointer and thus allow for safe and efficient @@ -251,7 +256,7 @@ Methods may be marked as `virtual` which allows them to be overridden by a sub- struct. Overriding methods must be marked `override`. It is an error for a method to override without the annotation, or for an annotated method not to override a super-struct method. Methods may be marked as both `override` and -`virtual` to indicate that override and may in turn be overridden. (Methods +`virtual` to indicate that they override and may in turn be overridden. (Methods marked `override` but not `virtual` must override but may not be overriden). A method without the `virtual` annotation is final (in the Java sense) and may not be overridden. Virtual methods may be given without a body, these are pure @@ -266,24 +271,23 @@ dynamically using an object's vtable. Nothing in this RFC introduces subtyping. -Inner enum values can implicitly coerce to outer enum values. +Inner enum values can implicitly coerce to outer enum values as can enum pointer +values. Inner struct pointer values can implicitly coerce to outer struct pointer values. Note that there is no coercion between struct values. Since all but leaf structs are unsized, they may not be dereferenced. Thus we are immune to the -object slicing problem from C++. +object slicing problem from C++. Mutable references cannot be upcast (coerced) +in this way since it would be unsafe (once the scope of the 'super-type' borrow +expires, the 'sub-type' reference can be accessed again and the pointee may have +changed type). Coercion of mutable `Box` pointers is allowed. **Open question: We could choose to force explicit coercions. It would make -**sense for the behaviour to match sub-traits, whatever we decide for that. +sense for the behaviour to match sub-traits, whatever we decide for that. 
Via the DST rules, it should fall out that these coercions work for smart pointers as well as `&` and `Box` pointers. -Note that this means if `R` is an inner struct of `S` and `S` implements a trait -`T`, but `R` does not, then given a pointer to an `R` object, it may be coerced -(explicitly) to an `S` in order to call methods defined in `T`, if the self type -of those methods is a pointer to self (e.g., `&self`). - If in the future we decide that subtyping is useful, we could add it backwards compatibly. @@ -291,7 +295,7 @@ compatibly. (I feel the syntax could be nicer here, any ideas?) -Nested items must specify formal and actual type parameters. The outer items' +Nested structs must specify formal and actual type parameters. The outer items' type parameters must be given between `<>` after a `:` (similar to the inheritance notation, but no need to name the outer item). E.g., @@ -344,10 +348,14 @@ struct Sgn : Sg { let x = Sgn { field: ... }; ``` +For enums, only the outermost enum may have type parameters. All nested enums +implicitly inherit all those type parameters. This is necessary both for +backwards compatibility and to know the size of any enum instance. + ## Privacy The privacy rules for fields remain unchanged. Nested items inherit their -privacy from their parent, so module private by default unless the parent is +privacy from their parent, i.e., module private by default unless the parent is marked `pub`. **Open question:** is there a use case for allowing nested items to be marked @@ -398,10 +406,20 @@ implement it too, via coercion. (ht Niko). If a method is overridden, we should still be able to call it. C++ uses `::` syntax to allow this, UFCS should let us do this. Since all such uses would use -static dispatch, we would use self-as-arg syntax, e.g., +static dispatch, we would use self-as-argument syntax, e.g., `BaseType::method(self, ...)`.
+## Trait matching + +When searching for traits to match an expression, super-structs/enums should +also be checked for impls. Searching for impl candidates is essentially a matter of +dereferencing a bunch of times and then trying to apply a subset of coercions +(auto-slicing, etc.), and then auto-borrowing. With this RFC, we would add +checking of outer items to the set of coercions checked. We would only consider +these candidates for structs if the type of `self` is a reference type. + + # JDM's example From https://gist.github.com/jdm/9900569 @@ -497,7 +515,7 @@ extend a single struct and add subtyping appropriately. We would then need to add support for some kind of RTTI (possibly using a trait and macros) to allow safe and efficient downcasting. -## Some previous RFCs +## Some other RFCs * [Virtual Structs (5)](/~https://github.com/rust-lang/rfcs/pull/5) Stays as closely as possible to single inheritance in Java or C++. Touches only @@ -523,6 +541,14 @@ safe and efficient downcasting. * [Unify and nest enums and structs (24)](/~https://github.com/rust-lang/rfcs/pull/24) A variation of RFC 11, superseeded by this RFC. +* [Trait based inheritance (223)](/~https://github.com/rust-lang/rfcs/pull/223) + "This is largely based upon #9 and #91, but fleshed out to make an actual + inheritance proposal." I'm afraid I haven't spent enough time on this to give + an accurate summary. + +* Kimundi and eddyb have promised an RFC for a possible solution using trait + fields. + ## Variation - `data` for `struct` and `enum` @@ -540,6 +566,59 @@ respectively. # Unresolved questions +## Initialisation + +To initialise a struct you must give values for all its fields. There is a +technical and an ergonomic problem here: if the base struct is in a different +module, then its private fields cannot be named in the constructor for the +derived struct; and if the base struct has a lot of fields, it is painful and +error-prone to write out their values in multiple places.
+ +We can address the first problem by adjusting the privacy rules to always allow +the naming of private fields in constructors if the most derived struct's fields +are visible. + +To address the second problem, we start by adjusting the rules for struct +initialisers. An initialiser currently has the form +`Foo { f_0: value_0, ..., f_n: value_n, .. e }` where `e` is an expression with +type `Foo` and which supplies any fields not in the field list. We can make this +more general by accepting an expression with type `Foo` or any of its base +structs, where the programmer must explicitly give at least any missing fields. +This addresses both the first and second problems described above. However, it +has a problem of its own if the base struct is virtual - then we cannot +create an instance with the required type, so the derived struct is impossible +to instantiate. + +I don't see any good way to solve this problem. Here are some ideas (I think the +second or third are my favourites): + +* Where a struct is virtual, allow the struct to be instantiated, but do not + allow any method calls on such objects, nor taking their address. The only + operations allowed on objects with such type are field access and use in + initialiser expressions. This is yet more added complexity. + +* Some attribute similar to `deriving` for a virtual base struct which + automatically generates a non-virtual derived struct with no extra fields and + with an implementation which `fail!`s for every method call. A constructor for + the base struct can create one of these objects and return a `Box<Foo>` (if + `Foo` is the base struct). We would adjust struct initialisers to allow + `Box<T>` as well as `T` where `T` is the struct being initialised or any of + its base structs. This can be thought of as a dynamic version of the above + (static) proposal.
It has the advantage of much less compiler complexity and + no new language rules (ish); however, this comes at the expense of a risk of + runtime failure. + +* Allow fields to have default values, e.g., `struct Foo { x: int, y: int = 42 }`, + here `y` has a default. When instantiating `Foo`, `x` must be provided and `y` + may be. If `y` is provided then it overrides the default value. If a struct is + to have derived structs in different modules, then all private fields must + have defaults (this does not need to be enforced by the compiler - it is a + natural consequence). This has the disadvantage that fields can only ever have + a single default (as opposed to using multiple constructor functions) and + there can be no input to field defaults. It is also more language complexity, + but I believe this is useful in regular structs too. + + ## Multiple inheritance Do we need multiple inheritance? We _could_ add it, but there are lots of design @@ -580,7 +659,30 @@ impl Element2 { ``` I believe that all such uses can be implemented using the traits mechanisms in -Rust and that these will interact cleanly with the proposed features for -efficient single inheritance. Therefore, we should not add any additional -mechanism for multiple inheritance. - +Rust and that these will interact cleanly with the rest of this RFC. Therefore, +we should not add any additional mechanism for multiple inheritance. + + +# Staging + +There is a lot to implement here.
Here is an approximate implementation plan: + +* Add enum variants to the type name space (this is backwards incompatible so +must happen before 1.0); +* Ensure the syntax for struct inheritance matches the syntax in this RFC for +non-lexical nesting; +* Allow private fields to be initialised in the initialisers of sub-structs +(possibly remove this later); +* Implement the rules around instantiation of virtual structs; +* Add vtables for virtual structs and ensure virtual structs behave as DST +unsized types; +* Add coercions for virtual structs; +* Implement trait matching rules; +* Implement rules for Drop and destructors; +* Implement virtual impls; +* Implement pattern matching downcasts for virtual structs (at this stage we +have everything Servo needs for the DOM); +* Implement initialisation mechanism; +* Allow lexically nested structs and deeply nested enums. Refactor the compiler +middle-end to have a single representation; +* Allow use of enum variants as types.