
Efficient single inheritance with traits. #245

Closed (wants to merge 5 commits)


@nrc (Member) commented Sep 17, 2014

Efficient single inheritance via virtual structs and unification and nesting of structs and enums. In contrast to #142, this RFC uses traits for virtual dispatch rather than a custom system around impls. The parts of the RFC about nested structs and enums are identical to #142.

@nrc self-assigned this Sep 17, 2014
@nrc (Member, Author) commented Sep 17, 2014

Note that one thing about structs has actually changed compared with #142: the virtual annotation is not needed.


Any concrete data type which implements a closed trait has a vtable pointer as a
first field in its memory layout (note that nested structs/enums would have such
a field in any case, other data structures would get an extra field). Since


What happens if a struct/enum implements multiple unrelated closed traits?

Does it add a vtable pointer for each of them as C++ would for multiple inheritance, bloating the structure?

Or do you plan to assign non-overlapping vtable index sets to all the closed traits in a crate, thus making the total space consumed by vtables quadratic in the size of the source code? (If you have N closed traits with 1 function each, and N structs, each vtable has size N, hence N^2 total space consumed by vtables, mostly consisting of holes.)
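To make the quadratic estimate concrete, here is a small sketch of the cost model in the question under its worst-case assumptions: N closed traits with one method each, N types that each implement exactly one trait, and disjoint slot ranges so that every vtable spans the whole crate-wide slot space. `VtableCost` and `disjoint_index_cost` are hypothetical names used only for illustration.

```rust
/// Hypothetical cost model: every closed trait in a crate gets its own
/// disjoint range of vtable slots, so each vtable spans all of them.
#[derive(Debug, PartialEq)]
struct VtableCost {
    slots_per_vtable: usize,
    total_slots: usize,
    used_slots: usize,
    holes: usize,
}

fn disjoint_index_cost(num_traits: usize, methods_per_trait: usize, num_types: usize) -> VtableCost {
    // The slot space covers every method of every closed trait.
    let slots_per_vtable = num_traits * methods_per_trait;
    // One vtable per concrete type.
    let total_slots = num_types * slots_per_vtable;
    // Worst case from the question: each type implements exactly one trait,
    // so it fills only that trait's range and the remaining slots are holes.
    let used_slots = num_types * methods_per_trait;
    VtableCost { slots_per_vtable, total_slots, used_slots, holes: total_slots - used_slots }
}

fn main() {
    for n in [1, 10, 100] {
        let c = disjoint_index_cost(n, 1, n);
        println!(
            "N = {n}: {} slots per vtable, {} total, {} used, {} holes",
            c.slots_per_vtable, c.total_slots, c.used_slots, c.holes
        );
    }
}
```

With N = 100 this model gives 10,000 slots, of which 9,900 are holes.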

@nrc (Member, Author):

Good questions! I will ponder these. If the traits are unrelated, I expect we have to fall back to fat pointers (with a warning) or give an error.


I guess overall quadratic vtable space is probably not going to be a problem in practice, especially if closed traits are used rarely, which they should be.

Thinking about it, even normal Java-style single inheritance causes quadratic vtable space, since every time you create a subclass you have to copy all previous vtable entries.
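The same arithmetic for the Java-style case can be sketched as a linear chain of classes in which each subclass copies all inherited vtable entries and adds its own. `chain_vtable_slots` is a hypothetical helper for this illustration.

```rust
/// Total vtable entries for a linear class chain of length `depth`, where the
/// k-th class (1-based) has k * `new_methods` entries because it copies every
/// inherited entry and adds `new_methods` of its own.
fn chain_vtable_slots(depth: usize, new_methods: usize) -> usize {
    (1..=depth).map(|k| k * new_methods).sum()
}

fn main() {
    // 1 + 2 + ... + 100 = 100 * 101 / 2
    println!("{}", chain_vtable_slots(100, 1));
}
```

The closed form is depth * (depth + 1) / 2 * new_methods, so the total is still quadratic in the depth of the chain.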

Also, one can reuse vtable slots for traits that are never implemented together on a single type. This reduces to a graph coloring problem where nodes are trait methods, and there is an edge between any two methods of the same trait, and between methods of different traits if any struct implements both traits.
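The slot-reuse idea can be sketched as a greedy graph coloring where each color is a vtable slot. This is an illustrative sketch under assumptions: traits and types are numbered from 0, conflicts are recorded at the trait level (two traits conflict if they are the same trait or some type implements both), and the coloring is greedy, which is simple but not guaranteed to use the minimum number of slots. `assign_slots` is a hypothetical helper.

```rust
use std::collections::HashSet;

/// Greedy vtable slot assignment (a sketch of the graph-coloring idea).
/// `methods_per_trait[t]` is the number of methods in closed trait `t`;
/// `impls[ty]` lists the traits implemented by concrete type `ty`.
/// Returns `slots[t][m]`, the slot chosen for method `m` of trait `t`.
fn assign_slots(methods_per_trait: &[usize], impls: &[Vec<usize>]) -> Vec<Vec<usize>> {
    let n = methods_per_trait.len();
    // Methods of two traits are joined by edges if the traits are the same,
    // or if some type implements both of them.
    let mut conflicts = vec![vec![false; n]; n];
    for t in 0..n {
        conflicts[t][t] = true;
    }
    for traits in impls {
        for &a in traits {
            for &b in traits {
                conflicts[a][b] = true;
            }
        }
    }

    // `None` marks a method that has not been assigned a slot yet.
    let mut slots: Vec<Vec<Option<usize>>> =
        methods_per_trait.iter().map(|&m| vec![None; m]).collect();
    for t in 0..n {
        for m in 0..methods_per_trait[t] {
            // Slots already used by neighbouring (conflicting) methods.
            let taken: HashSet<usize> = (0..n)
                .filter(|&u| conflicts[t][u])
                .flat_map(|u| slots[u].iter().flatten().copied())
                .collect();
            // Pick the smallest free slot.
            slots[t][m] = (0..).find(|s| !taken.contains(s));
        }
    }
    slots
        .into_iter()
        .map(|ms| ms.into_iter().map(|s| s.expect("every method is assigned")).collect())
        .collect()
}

fn main() {
    // Three closed traits with one method each. Type 0 implements traits 0
    // and 1; type 1 implements only trait 2, so trait 2 can reuse slot 0.
    let slots = assign_slots(&[1, 1, 1], &[vec![0, 1], vec![2]]);
    println!("{slots:?}");
}
```

In the example, trait 2 is never implemented together with traits 0 or 1, so it reuses slot 0 and every vtable needs two slots instead of three.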

@nrc (Member, Author) commented Sep 18, 2014

Ignore my first comment; it's not quite right.

@bill-myers commented:

Overall, the proposal seems in a reasonable direction.

I wonder whether it's a good idea to decide at the moment of declaration whether structs/enums are sized or not: I think it might be better to do so at the point of usage.

This is because it seems like there might often be no "natural" choice of sizedness when defining data, so making a choice would be arbitrary and would restrict expressiveness.

In general, given a struct/enum hierarchy, one could have a system where types are referred to as 3-tuples of sets of variants [<D, W, R>].

D would be the set of variants that can be represented by the discriminant/vtable pointer representation of a [<D, W, R>]; W would be the set of variants that can be assigned to an &mut [<D, W, R>] (in addition to the current runtime type), and R the set of variants that can be read from a &[<D, W, R>]

Obviously W and R must be contained in D.

For instance (here Option is just syntax sugar for {Some, None}):

  • [<Option, Option, Option>] would be the current Option
  • [<Some, Some, Some>] would be isomorphic to T
  • [<None, None, None>] would be a 0-sized type
  • [<Option, Some, Some>] would be a tag byte of 1 followed by a T
  • [<Option, None, None>] would be a tag byte of 0
  • [<Option, Option, None>] would be a tag byte of 0, followed by space to store a T
  • [<Option, {}, Option>] would be a DST type equivalent to an unsized version of the current Option
  • [<Option, Option, {}>] would be uninitialized memory of Option size
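The triples above can be modelled with a few bit masks, which makes the containment rule easy to check mechanically. This is a hypothetical encoding for Option only; the `needs_tag` rule (a discriminant is stored only when D has more than one variant) is an assumption inferred from the examples, not something the proposal states.

```rust
// Hypothetical encoding of [<D, W, R>] triples for Option, with each variant
// set stored as a bit mask.
const SOME: u8 = 0b01;
const NONE: u8 = 0b10;
const OPTION: u8 = SOME | NONE; // "Option" is sugar for {Some, None}
const EMPTY: u8 = 0; // the empty set {}

#[derive(Clone, Copy, Debug, PartialEq)]
struct Flavor {
    d: u8, // variants representable by the discriminant/vtable representation
    w: u8, // variants that may be assigned through &mut
    r: u8, // variants that may be read through &
}

fn subset(a: u8, b: u8) -> bool {
    a & !b == 0
}

impl Flavor {
    /// The rule stated above: W and R must be contained in D.
    fn is_well_formed(self) -> bool {
        subset(self.w, self.d) && subset(self.r, self.d)
    }

    /// Assumption: a tag is stored only when D contains more than one variant.
    fn needs_tag(self) -> bool {
        self.d.count_ones() > 1
    }
}

fn main() {
    let examples = [
        ("<Option, Option, Option>", Flavor { d: OPTION, w: OPTION, r: OPTION }),
        ("<Some, Some, Some>", Flavor { d: SOME, w: SOME, r: SOME }),
        ("<None, None, None>", Flavor { d: NONE, w: NONE, r: NONE }),
        ("<Option, Some, Some>", Flavor { d: OPTION, w: SOME, r: SOME }),
        ("<Option, {}, Option>", Flavor { d: OPTION, w: EMPTY, r: OPTION }),
        ("<Option, Option, {}>", Flavor { d: OPTION, w: OPTION, r: EMPTY }),
        // Ill-formed: W is not contained in D.
        ("<Some, Option, Some>", Flavor { d: SOME, w: OPTION, r: SOME }),
    ];
    for (name, f) in examples {
        println!("[{name}] well-formed: {}, tag: {}", f.is_well_formed(), f.needs_tag());
    }
}
```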

These conversions are allowed:

  1. On a value, change the D set by converting the data (as long as W and R are still contained in it)
  2. On a & or &mut, change the D set to a singleton set by incrementing the pointer to skip the vtable
  3. On a value or &, expand the W set (subject to it being contained in D)
  4. On a value, & or &mut, restrict the W set (no need for checks)
  5. On a value, & or &mut, restrict the R set subject to a runtime check that the actual value is not being removed
  6. On a value, & or &mut, expand the R set (subject to it being contained in D)
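Rules 3 to 6 are pure set checks, so they can be sketched as functions that either return the converted triple or reject the conversion. This is a hypothetical model using bit masks for Option's variants; rules 1 and 2 change the representation (converting data or skipping the vtable pointer), so they are left out.

```rust
// Hypothetical checks for rules 3-6, with Option's variant sets as bit masks.
const SOME: u8 = 0b01;
const NONE: u8 = 0b10;
const OPTION: u8 = SOME | NONE;
const EMPTY: u8 = 0;

#[derive(Clone, Copy, Debug, PartialEq)]
enum Access {
    Value,
    Shared, // &
    Mut,    // &mut
}

#[derive(Clone, Copy, Debug, PartialEq)]
struct Flavor {
    d: u8,
    w: u8,
    r: u8,
}

fn subset(a: u8, b: u8) -> bool {
    a & !b == 0
}

/// Rule 3: on a value or &, expand W, provided it stays within D.
fn expand_w(f: Flavor, access: Access, new_w: u8) -> Option<Flavor> {
    let ok = access != Access::Mut && subset(f.w, new_w) && subset(new_w, f.d);
    ok.then(|| Flavor { w: new_w, ..f })
}

/// Rule 4: on a value, & or &mut, restrict W; no check is needed.
fn restrict_w(f: Flavor, new_w: u8) -> Option<Flavor> {
    subset(new_w, f.w).then(|| Flavor { w: new_w, ..f })
}

/// Rule 5: restrict R, with a runtime check that the value's current
/// variant is not removed.
fn restrict_r(f: Flavor, current: u8, new_r: u8) -> Option<Flavor> {
    let ok = subset(new_r, f.r) && subset(current, new_r);
    ok.then(|| Flavor { r: new_r, ..f })
}

/// Rule 6: expand R, provided it stays within D.
fn expand_r(f: Flavor, new_r: u8) -> Option<Flavor> {
    let ok = subset(f.r, new_r) && subset(new_r, f.d);
    ok.then(|| Flavor { r: new_r, ..f })
}

fn main() {
    let body = Flavor { d: OPTION, w: EMPTY, r: OPTION };
    println!("{:?}", expand_w(body, Access::Value, OPTION)); // allowed on a value
    println!("{:?}", expand_w(body, Access::Shared, SOME)); // allowed on &
    println!("{:?}", expand_w(body, Access::Mut, SOME)); // rejected on &mut
    println!("{:?}", restrict_w(Flavor { d: OPTION, w: OPTION, r: OPTION }, NONE));
    println!("{:?}", restrict_r(body, SOME, NONE)); // rejected: the value is Some
    println!("{:?}", expand_r(Flavor { d: OPTION, w: EMPTY, r: SOME }, OPTION));
}
```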

Basically, this allows expressing all "flavors" of hierarchical data, including the "body" of it without a discriminant, and even uninitialized memory.

Obviously, there would be much better syntax and syntax sugar than the illustrative syntax above.

[Note that the W set would be constrained to have an upper bound on size, so specifying the root enum would not be allowed if any generic parameters are introduced in a way that makes the size unbounded.]

Maybe this could warrant a separate RFC, but I feel it might be interesting to think about this while evaluating this one.

@thestinger commented:

@nick29581: Can the old RFC be closed, or is there a good reason to leave both as alternatives? There's not going to be adequate input on any of these inheritance RFCs because there are too many.

@nrc (Member, Author) commented Sep 18, 2014

I prefer to leave the old one open as an alternative: it has some advantages (mainly conciseness in the Servo use case). There's no downside to leaving it open for a while; hopefully we'll make a call on this stuff soon.

@thestinger commented:

The downside is that there are a dozen of these RFCs and few people are actually going to invest the time reviewing each one. It's not possible to get consensus on any of these without narrowing down the alternatives first.

@glaebhoerl (Contributor) commented:

FWIW I am evidence for @thestinger's claim. I haven't even tried reading, understanding, and evaluating most of these proposals yet. (There's a lot of them, and most of them are huge.) I keep planning to do so, but...

hopefully we'll make a call on this stuff soon

I would rather hope we won't. These seem like exactly the kind of features on which it would be better to postpone any final decisions until later. They're big and complex, they interact with other still-baking aspects of the language, and there's no critical need to finalize them before 1.0. (Servo is important, but it can muddle through and/or use feature gates. I don't think satisfying Servo's every desire needs to be a criterion for releasing 1.0.)

(Note that I'm not suggesting we stop thinking about and discussing it. Thinking forward is important and valuable, and may inform decisions on other parts of the language. We just shouldn't make finishing all of it a condition for 1.0.)

@nikomatsakis (Contributor) commented:

@thestinger in the last week, I at least have been reading through all of them, one by one, so I am glad they are not closed. We will eventually start culling once we feel we have a good understanding of the roles of each RFC. We can then produce something more final incorporating bits of all of them which a wider audience can read and critique.

@nikomatsakis (Contributor) commented:

Some initial comments:

  1. Overall, I approve of separating concerns. I think that "improving enums" is a valid goal and that tying virtual dispatch to enum/struct typing is not necessarily a good idea. So the direction and thrust of this RFC probably makes sense; perhaps we can wind up dividing these ideas into distinct RFCs. That said, it is not clear to me that "unifying enums and structs" is meant to be, given some of the concerns I raise in later points. As a meta point, there are enough subtle details that I think we ought to consider building a Redex model for the type rules in question (the existing models focus more on lower-level semantics and ignore e.g. generic types and traits). Building a model for DST was very helpful and didn't wind up taking all that long.

  2. The Variant2(f: 34, 23) syntax doesn't obviously fit in with function calls. Currently, tuple structs and variants are "just functions", at least with respect to calls (they have additional semantics for pattern matching). I'd be inclined to just forbid mixing and matching named and numbered fields for the time being (or, to put it another way, to forbid a "tuple variant" from extending anything with fields of any kind). Alternatively, we could allow you to write {0: 23, f: 34}.

  3. The rules suggest that all structs would always have an implicit vtable or other type tag. Clearly we only want these tags in the case where they are "needed". This needs to be better defined.

  4. The lack of subtyping is going to cause inference problems that require annotation. This will not be backwards compatible. For example:

    let mut x = vec![None, None]; // x winds up with type Vec<None>
    x[0] = Some(3); // Error!

  5. You state that if an outer data structure implements the trait then all children implement the trait too. This may indeed be safe but it is not obvious to me. There are all kinds of funky scenarios, like by-value self, or fn eq(&self, other: Self). Maybe they all work out if we think about things right -- i.e., this is a sort of "default implementation" -- but it merits elaboration.

  6. Your rule on closed traits states that they must be implemented within the current crate and asserts that this is enough to forgo vtables. This is not the case. If you implement a closed trait on an integer, for example, a vtable is needed. The rule needs to be that a closed trait can only be implemented for types declared within the current crate, which implies your rule but is yet more restrictive. It seems a bit odd, though, that implementing a closed trait causes a vtable to magically "sprout" onto some other type. This is action at a distance, basically; it seems more natural for a type to declare the traits that are "inherent" to it, a la @eddyb and @Kimundi.

  7. Closed traits as you originally defined them are useful for other cases, though, like restricting your input to one of N known types.

  8. In the case of struct inheritance, where the non-leaf types are unsized, it should be safe for &mut Sub to be coerced to &mut Sup. This is because Sup is an unsized type and hence *super is unassignable, so there is still no slicing issue. This is important because otherwise inherent methods defined on the super type with &mut self could not be invoked on a subtype instance!

  9. Note that the previous point does not hold for enums. This is because enum base types are sized, and hence the usual variance fears around mutability apply.

  10. We still need an abstraction for the initialization of base structs.

  11. I haven't fully digested the inherit part yet... I had previously assumed that we would special-case Drop. Since the hierarchy is closed, it's plausible for us to know whether any subtypes implement Drop and then ensure that it gets invoked appropriately, just as we do with enums.
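On point 2: in Rust as it stands, a tuple variant can indeed be used as a function, while a struct-like variant has no constructor function. Later Rust versions also accept numeric field names in braced struct syntax for tuple variants, which is close to the {0: 23, f: 34} idea, although named and numbered fields still cannot be mixed in one variant. A small sketch; `Value` is a hypothetical example type.

```rust
#[derive(Debug, PartialEq)]
enum Value {
    Pair(u32, u32),   // tuple variant: also a function `fn(u32, u32) -> Value`
    Named { f: u32 }, // struct-like variant: no constructor function
}

fn main() {
    // A tuple variant can be used wherever a function is expected.
    let make: fn(u32, u32) -> Value = Value::Pair;
    let pairs: Vec<Value> = vec![(1, 2), (3, 4)].into_iter().map(|(a, b)| make(a, b)).collect();

    // Braced syntax with numeric field names builds the same tuple variant.
    let p = Value::Pair { 0: 23, 1: 34 };

    // Struct-like variants can only be built with named fields.
    let n = Value::Named { f: 34 };
    println!("{:?} {:?} {:?}", pairs, p, n);
}
```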
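On point 7: restricting an input to one of N known types can already be approximated in Rust with the "sealed trait" idiom, which is a different mechanism from the closed traits proposed here. The supertrait lives in a private module, so code outside the `closed` module (and therefore outside the crate) cannot implement the public trait. A sketch; the module, trait, and function names are hypothetical.

```rust
mod closed {
    mod private {
        // Public item in a private module: nameable only inside `closed`.
        pub trait Sealed {}
    }

    /// Only types implementing `private::Sealed` can implement `Closed`,
    /// and only this module can write those impls.
    pub trait Closed: private::Sealed {
        fn describe(&self) -> String;
    }

    impl private::Sealed for u8 {}
    impl private::Sealed for String {}

    impl Closed for u8 {
        fn describe(&self) -> String {
            format!("byte {}", self)
        }
    }

    impl Closed for String {
        fn describe(&self) -> String {
            format!("string {:?}", self)
        }
    }
}

use closed::Closed;

/// Accepts only the known set of types: u8 or String.
fn show<T: Closed>(x: T) -> String {
    x.describe()
}

fn main() {
    println!("{}", show(7u8));
    println!("{}", show(String::from("hi")));
}
```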
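On point 8: trait objects give a rough analogy in today's Rust (not the proposal's mechanism) for why coercing &mut Sub to &mut Sup is harmless when Sup is unsized. A `&mut Circle` coerces to `&mut dyn Shape`, methods taking `&mut self` still work through it, and assigning a different value through the reference is impossible because `dyn Shape` is unsized, so no slicing can occur. `Shape`, `Circle`, and `grow_twice` are hypothetical names.

```rust
trait Shape {
    fn grow(&mut self);
}

struct Circle {
    r: f64,
}

impl Shape for Circle {
    fn grow(&mut self) {
        self.r += 1.0;
    }
}

// Takes the unsized "base" type by &mut. It can call `&mut self` methods,
// but `*s = other_shape;` would not compile because `dyn Shape` is unsized.
fn grow_twice(s: &mut dyn Shape) {
    s.grow();
    s.grow();
}

fn main() {
    let mut c = Circle { r: 1.0 };
    grow_twice(&mut c); // &mut Circle coerces to &mut dyn Shape
    // Once the borrow ends, we get back an intact Circle.
    println!("{}", c.r);
}
```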

@nrc (Member, Author) commented Sep 22, 2014

Replying to @nikomatsakis:

*1. Good, the more I think about this, the more I prefer this to my previous suggestion. A formal model is probably useful.

*2. I would prefer to forbid using a tuple variant as a function if it has named fields (I would actually prefer to allow named params for all functions, but that is out of scope). I think using numbers for field names for unnamed fields in initialisers/patterns is a great idea, it complements using them as field names for access well. I should think about how exactly it would work.

*4. Ouch. True. I don't see this as too much of a problem (especially with type ascription or allowing trivial casts), but the backwards incompatibility is a pain.

*6. Yeah, I think I was assuming implementers must be in the same crate too but didn't write it. I think the action at a distance problem can be solved by requiring the concrete types to be annotated as closed too. Since they must be in the same crate, that doesn't seem like it would cause problems.

*8. I'm not clear on this, but I thought that was the problem we discussed in SF: once the borrow expires, you have the &mut sub back and then you have issues. But the fact that everything is unsized seems to avoid the problem.

*10. I'm not sure what this means, sorry.

*11. I guess we could; the general solution seems nicer, although the inherit machinery adds complexity.

@nrc (Member, Author) commented Sep 23, 2014

I have changed quite a bit of the proposal. Unfortunately I did not have time to polish this in the least, so there are some open questions around the details and it is a bit rough.

The main change compared to the earlier draft is that struct and enum are now synonyms, properly unifying these data types. I believe the earlier approach of them being almost the same was too confusing. This means we require a few more annotations/keywords, but I think it is a worthwhile change.

I would like to tell the story of this RFC in parts (I've done this in the summary in more depth):

  • Make Rust data structures more powerful, more orthogonal, and more unified. This includes data inheritance, which, in keeping with the data/behaviour separation in Rust, is totally separate from trait inheritance/implementation.
  • Add behaviour inheritance which follows data inheritance when /implementing a specific trait/. I hope this feels natural as a way to share behaviour between related data types.
  • Offer optimisations for speed and space using the closed and unsized annotations.

I hope that this addresses the two big (concrete) criticisms of earlier proposals: the near-unification of enums and structs, and the inheritance in trait-less impls.

@CloudiDust (Contributor) commented:

I think "enums with top level fields" and "structs with no top level fields" don't make much sense.

So instead of making them synonyms, I think we can use the following rules:

  1. an enum cannot have top level fields;
  2. a struct must have at least one top level field;
  3. otherwise they are the same.

@nrc mentioned this pull request Oct 3, 2014
@nrc added the postponed label (RFCs that have been postponed and may be revisited at a later time) Oct 3, 2014
@nrc (Member, Author) commented Oct 3, 2014

Closed as postponed. Issue to be tracked in #349.

Labels
postponed RFCs that have been postponed and may be revisited at a later time.
6 participants