Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove the usage of tuples in argument positions from our API #747

Closed
wants to merge 11 commits into from

Conversation

sgrif
Copy link
Member

@sgrif sgrif commented Feb 22, 2017

We are seeking feedback

The core team hasn't decided if we actually want to go forward with this proposal. We are seeking feedback from the community on their thoughts. The purpose of this PR is to provide a place to centralize discussion, and provide the ability for people to try out the proposal against their code bases. We would like to hear from you on how much this change impacts your codebase, how painful migrating to this would be, and whether you'd find this confusing as a newcomer.

Overview

This pull request changes any functions which could previously take a tuple as an argument to take an hlist instead. The TL;DR is that the rule of thumb for "things that are separated by a comma in SQL are represented as _ in diesel" changes from "a tuple" to "an hlist". This means that (x, y) changes to hlist!(x, y) (we may land on a different name, see the Unanswered Questions section below).

For those unfamiliar, "hlist" is short for "heterogeneous list". It is essentially a singly linked list where each element is able to be of a different type. The list encodes this in its own type. It is effectively the same as a bunch of nested pairs. e.g. Cons(1, Cons(2, Cons(3, Nil))) is more-or-less the same as (1, (2, 3)) or (1, (2, (3, ()))).

The specific functions that change are query.select, query.order, and update(table).set(...) when updating multiple columns but not using a struct that has #[derive(AsChangeset)].

A less visible API that is changed is the type argument to the sql function. However, this function may be deprecated and removed before 1.0 in favor of an escape hatch API that is untyped (see #650).

This also affects implementations of Queryable, AsChangeset, and Insertable. However, these are usually derived rather than implemented manually (with the possible exception of Queryable).

This does not affect the ability to use tuples in a return type position. This means that doing .load::<(A, B)> will still work, and associations still work in terms of tuples. This is purely due to how likely this change is to affect the average user. An application can be doing extremely complex queries, and still never call .select or .order explicitly with more than one argument. However, any time a join is done, or associations are loaded, tuples get involved in a much more visible fashion. Associations in particular are designed to work around Iterator::zip and Iterator::unzip, which is only feasible if they work on tuples.

Additionally, while code that is passing a tuple can be changed to pass an hlist simply by putting hlist! in front of it, the story is more involved for code working with tuples. Hlists cannot simply be indexed like tuples can, they have to instead be destructured. This is acceptable for us to do internally, but not something I'd like to impose on every user of the library.

What this gets us

First and foremost, the code is simpler, and our compile time goes down. To implement a trait for an arbitrarily sized hlist, you need to provide two impls: Cons<T, U> where T: SomeTrait, U: SomeTrait and Nil. (In some places we do Cons<T, Cons<U, Tail>> and Cons<T, Nil> instead when we have things like comma separation to worry about).

Implementing code for an arbitrary sized tuple is much harder. You basically have to write the impls as a macro, and invoke it for whatever size you want to support. This is what the standard library does for traits like PartialEq (which aren't implemented on tuples larger than 12 elements). This change removes half of the code in the main crate (as measured by cargo rustc -- -Zunstable-options --pretty=expanded | wc -l), and decreases compile times by 70% (measured by touch src/lib.rs && cargo build). Moving this code out of macros and into regular rust will also make it much easier for new contributors to understand.

This may also open the door for several future improvements that would not be possible with tuples. Several blanket impls that we want to write are impossible because they overlap with the impls for tuples. At the moment most of them overlap with hlists as well, but specialization and/or future improvements to specialization are easier to apply to hlists than tuples.

In particular, I think that this change will eventually allow us to remove the requirements that struct order match the column order (in cases where your struct is one-to-one with a table), and also remove the requirement that you must have a field for your struct for every member of the select statement. Right now we would be unable to write the impls required to make that happen, but several of the possibilities being discussed for specialization would allow us to make that happen. If you're interested in the details of that, I've written them up here.

You may notice that some of these gains are all assuming speculative future enhancements to the language. Why not wait until they actually happen to make this change? Ultimately we want to release 1.0 before those changes are likely to happen. If we want to avoid breaking backwards compatibility, we need to leave the door open to take advantage of those changes when they happen. Even if we limit the "wins" to the more accessible code and improved compile times, I do think it's still worth it.

Why didn't we use an existing library for this?

Ultimately if we make this change, hlists become a very fundamental part of people interacting with Diesel, which makes me uncomfortable leaving that in the hands of another crate unless it's already well established within the community. There also isn't a crate that is maintained and just does hlists. Having a type which is local to Diesel is useful for us from a coherence point of view.

Drawbacks

The most obvious drawback is that we take tuples instead of hlists. Tuples are in the standard library. hlists aren't. That said, I don't know that people are actually discovering that they can pass tuples on their own. Two common questions in gitter are "how do I order by multiple things", and "how do I update multiple columns without using a struct". We can fix that by giving some more up front examples of those cases, but if people aren't finding tuples without being shown, hlists aren't really any worse. They're just equally bad. Ultimately what we really want is a variadic function, but that's not going to happen.

Additionally, this will make some of our scary type signatures more scary. diesel::hlist::Cons<Foo, diesel::hlist::Cons<Bar, diesel::hlist::Nil>> is harder to read than (Foo, Bar). That said, the type would really only show up in errors that were already incomprehensible. Nobody is actually making sense of SelectStatement<five, terminal, windows, of, types> other than the core team when people paste them in gitter asking for help. Our goal has been to structure things so that the types don't get dumped in error messages, and will continue to be.

Some of our error messages got... well I'm not sure if it's worse, or better. For example, the error for the table having one more column than the struct has changed from (i32, String, String): FromSqlRow<(Integer, Text, Text, Double)> failed to Nil: FromSqlRow<Cons<Double, Nil>> failed. One could argue that without reading the documentation, a person might actually figure out what's going on from the first error message, but not from the second. In practice I'm not sure that's true though, and it's a lot easier to document that Nil: FromSqlRow means you have more columns on the table than the struct than it is for us to say "hey make sure you count the tuple elements if you see an error about a tuple and another tuple".

Alternatives

We could also keep this change entirely internal, provide an IntoHlist trait, and implement that for tuples of arbitrary sizes. This is a bad alternative for 2 reasons. First, it will bring back the "we need to implement things for really large tuples" problem, and hurt compile times. Second, since these functions can also take just one argument, we will need to brute force that trait for a lot of types, including some which are potentially inconvenient to do so. Specialization may improve that situation when it stabilizes.

Unanswered Questions

Naming. hlist! could conflict with other libraries, so we'd want to have the canonical name be diesel_hlist! (though I want to keep the short variant. People can choose not to import the short version if they don't want to). However, do we actually want to make the fact that we're using hlists apparent? Would a name like args! or columns! make more sense?

How we teach this

We should almost certainly add an example in the getting started guide which demonstrates how hlists work.

I'm exploring how it would feel to stop using tuples for anything, and
take an hlist instead. I don't think this would be too painful in most
places. If this does lead to reasonable ergonomics, I think we could
have some big wins. Our compile times would reduce drastically as we
could remove 80% of our code (`diesel/types/impls/tuples.rs` is *huge*
when you expand the macro). We may also be able to provide several
blanket impls that were previously coherence issues. I also have a proof
of concept that demonstrates how we could use hlists to remove the need
for struct fields to be ordered the same as the columns of a table.

The place that might be painful is to remove the `Queryable` and
`FromSqlRow` impls for tuples. Even if it's not painful, it's where it
would be the most visible.

Other places that it would be visible, but I don't think it will be
painful are:

- Passing `hlist!(...)` instead of `(...)` to `.select`
- Passing `hlist!(...)` instead of `(...)` to `.order`
- Passing `hlist!(...)` instead of `(...)` to `.set` when directly
  updating fields rather than using a struct w/
  `#[derive(AsChangeset)]`.

This would also affect the derived implementations of `AsChangeset`,
`Insertable`, and `Queryable`. However, I don't think those will be
noticed by most people, as these are usually derived not implemented
manually (`Queryable` is the only one that doesn't completely suck to
implement yourself).

I have no clue if the ergonomics will be reasonable, or if the wins I
suspect will come out of this will actually do so, but I'm going to
explore it.
Our types can be pretty scary when they show up in error messages.
Replacing tuples with hlists will probably make them even more scary.
`diesel::hlist::Cons(A, diesel::hlist::Cons(B, diesel::hlist::Nil))` is
noisier than `(A, B)` if nothing else.

While we can't change how the type will appear (if it appears), we can
change how it looks if it's printed out (because of an `assert_eq!`
failing or otherwise). Hlists are ultimately closer to tuples than
lists, so I've gone with tuple formatting for the output.
Only one test was affected by this change, which I think probably
reflects how uncommon it is to directly pass a tuple to `.set`.
`#[derive(AsChangeset)]` is automatically updated.
This one is almost certainly not visible to users. Implementing
`Insertable` manually is a hellish nightmare at the moment. I'm 99%
certain all impls in the wild are using `#[derive]`.
Now we start to get into where this spike would affect public API. This
touches a lot of files (more than I expected, in fact) but most of the
things it touches are internal. The three methods that this actually
affects are `.select`, `.order`, and `.returning`. In all cases, the
solution is to replace `((...))` with `(hlist!(...))`.

I'm unsure if removing these impls, or the queryable impl for tuples
will be more visible. The queryable impl will almost certainly affect
our test suite more, but I'm unsure how much people are deserializing
into straight tuples in real apps. My gut says this commit will affect
more apps, but removing the queryable impl will be more painful for apps
that are affected.

The fact that this broke `create_table` in our tests is pretty funny to
me. I rely way too much on tuples.

Continuing with the theme of "this makes our code simpler", this has the
side effect of closing a few issues/PRs.

Fixes #521.
Close #617. (Sorry @phungleson, your PR looked fine FWIW)
This change basically only impacts implementations of `Queryable`. I've
opted not to remove the `Queryable` impl in this commit, as it will be
the most visible change and not one that I'm 100% sure I want to make.

The `!#[recursion_limit]` addition is due to the *extremely helpful*
error message I received:

```
error: reached recursion limit while checking
                                             inhabitedness of `hlist::Cons<<P as query_source::Queryable<SP, DB>>::Row, hlist::Nil>`
```

Yes, that is really how the error message was indented.
I was originally going to remove this impl entirely, but I felt like
that change was far too invasive and confusing, especially after seeing
it affect the return values of joins. It may still be worthwhile to
explore in the future, but I need to see a more concrete win to make
that big of a change.

Since we're leaving the `Queryable` impls in place, we'll want to leave
the `BelongsTo` impls in place as well for associations, since they'll
work with the direct result of queries.

Since I can write a much simpler `Queryable` impl, I was able to
simplify the `tuple_impls!` macro as well. The remaining impls don't
make sense to implement for arbitrarily large tuples, since it's only
used for loading/grouping associations, and the return values of queries
when you don't want to use a struct. Neither one makes sense for very
large numbers. I've picked 12 as the new arbitrary limit because that is
the largest tuple for which stdlib provides impls of things like
`PartialEq`.
Since we no longer directly reference an hlist that is greater than 16
elements in length, we are no longer hitting the recursion limit. It's
worth noting that users who are putting `#[derive(Queryable)]` on a
struct with 16+ fields *will* need to up the recursion limit on their
crate. This will happen regardless of whether we have this attribute in
Diesel or not. We should probably compile a list of "FAQs" and include
this one in it, since the error message given by the compiler does not
present upping the recursion limit as a suggestion.
@sgrif sgrif requested a review from killercup February 22, 2017 18:30
@hiimtaylorjones
Copy link

I think restrictions and drawbacks are worth it for what we gain. It'll definitely require a lot of work to redo a lot of guide documentation, but it'll hopefully make Diesel better in the long run!

@sgrif
Copy link
Member Author

sgrif commented Feb 22, 2017

It'll definitely require a lot of work to redo a lot of guide documentation

Ironically the guide is 100% unaffected by this change. We never call .select or .order in the guide (and even if we did it would probably be with 1 argument since that's the most common case)

@Drakulix
Copy link

Drakulix commented Feb 22, 2017

A quick grep on my two largest code bases using diesel (both deployed at work and the only ones I really care about, because the rest is hobby and I would always happily accept using better code, even if it breaks compatibility) showed that I am not using tuples anywhere.

I barely use select, I mostly use filter, update().set(...) always with the full object (or use .save() instead) and order never with more then one argument.

This is probably a result from my programming background, I was mostly used to OOP before using Rust, which I would not necessarily call object orientated. I am not a big fan of databases and try to avoid them where not favorable in terms of performance and that probably results in trying to use common code patterns, when dealing with DBs.

That's why I keep most operations simple (get & set) and only use queries where I need to.
And diesel makes this very comfortable.

Regarding documentation I would definitely say the examples can be improved not only in regards to hlist!, but simply to provide more examples. I would maybe use more complex queries, if it wouldn't be so difficult and annoying to browse all those traits to find out, what I can use in combination with what else. If it does not provide an obvious performance penalty I mostly don't use, what else diesel has to offer.

Copy link
Member

@killercup killercup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All in all, I'm in favor of this change.

From a user's perspective, ideally, this shouldn't matter. We should probably add a bit of documentation (or a link to a new guide) to every public method that takes hlists as input parameters. I originally suggested to name it columns! as what our users are using the macro for is selecting columns. Sean's args! is more abstract and conveys that we want to use this to let methods support variadic arguments. Both of these names hide the heterogenous list as an implementation detail.

This will be one more concept to learn before contributing to diesel. This will however be something valuable and interesting to learn and replace some pretty hairy macros.

@sgrif wrote in "Why didn't we use an existing library for this?":

There also isn't a crate that is maintained and just does hlists.

On this note, I want to mention that I first saw a HList implementation in Rust (I've previously only known this from the Haskell module of the same name) in frunk, which uses it as a building block for a lot of fancy functional constructs.

In the funk repo, there is also the frunk-core crate that contains hlist and some generics shenanigans. It sounds like an internal thing, but maybe it doesn't have to be? → cc @lloydmeta

Oh, and also this just came in: rust-lang/rfcs#1921

(And by the way, frunk should definitely be renamed to frost, as the German word for Rust is Rost. And the reason for that is in no way that I wanted to make a diesel/glow plug joke! 😉 )

use types::Bool;

/// This method is used by `FindDsl` to work with tuples. Because we cannot
/// This method is used by `FindDsl` to work with hlists. Because we cannot
/// express this without specialization or overlapping impls, it is brute force
/// implemented on columns in the `column!` macro.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drive-by comment: Actually, the macro is called __diesel_column!

Expr: Expression<SqlType=<Col::SqlType as IntoNullable>::Nullable> + QueryFragment<DB>,
Tail: InsertValuesRecursive<DB>,
{
fn column_names(&self, comma_needed: bool, out: &mut DB::QueryBuilder) -> BuildQueryResult {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd refactor this whole comma needed business with a simple MaybeComma enum instead of a bool: https://gist.github.com/killercup/e57f3f1cd83531a88510512fa2f634d4

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

@@ -0,0 +1,88 @@
use std::fmt::{Debug, Formatter, Error as FmtError};

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably add some (module-level) docs here. At least for contributors looking to understand diesel's inner workings. Like this maybe.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the whole thing will need a lot of documentation if merged.

@Fiedzia
Copy link
Contributor

Fiedzia commented Feb 22, 2017

I am planning to use diesel, but I haven't yet, so +1 for changing it now. Compile time matters to me.
Is the columns! as intuitive as I think (not being familiar with diesel)? In that case I am definitely for it.

@cramertj
Copy link

cramertj commented Feb 22, 2017

I'd prefer Diesel continue to use tuples, but that there was an easier way to impl traits for tuples. As I commented on the RFC, I want to see Split and Cons traits added to the language and implemented for all tuples through some sort of intrinsic. @eddyb has a working prototype here (this implementation uses macros, but that would be unnecessary if an intrinsic of some kind was provided).

I believe this solution mitigates many of the drawbacks you listed above, namely:

  • "Tuples are in the standard library. hlists aren't."
    This approach would allow users to keep using tuples, a familiar type, and to avoid introducing hlist! macros all over the place.
  • "diesel::hlist::Cons<Foo, diesel::hlist::Cons<Bar, diesel::hlist::Nil>> is harder to read than (Foo, Bar)"
    Since this approach allows users to use tuples directly, they don't have to try and parse new cons-cell types.

Overall, I think the addition of Split and Cons traits offers the best of both worlds: familiar std types, Rust syntax, smaller generated code, faster compile times, and a standard way to handle heterogeneous lists in Rust.

edit: Adding this here since it was mentioned in the RFC-- @eddyb thinks we could maybe add impl<H, ...T> Foo for (H, ...T) which would prevent the coherence issues with impl<T> for T: Split and co.

@PlasmaPower
Copy link

Ultimately if we make this change, hlists become a very fundamental part of people interacting with Diesel, which makes me uncomfortable leaving that in the hands of another crate unless it's already well established within the community. There also isn't a crate that is maintained and just does hlists. Having a type which is local to Diesel is useful for us from a coherence point of view.

Maybe we should create a crate hlists under the diesel-rs GitHub user, for both diesel and whoever else wants hlists.

@tupshin
Copy link

tupshin commented Feb 23, 2017

I am hugely in favor of moving to HLists in general. Regarding an internal HList vs using an existing crate, I partially agree with your reasoning. However, it is worth noting that frunk-core actually consists of not much more than HList.

So while it might not be the right time to standardize on an external one, worth keeping on eye on that one.

@PlasmaPower
Copy link

PlasmaPower commented Feb 23, 2017

I agree that frunk-core would be a reasonable solution, but I do have my concerns. First, frunk isn't currently all that popular. It only has about 4% of the downloads of diesel. Therefore, it's a lot less likely to be properly maintained. Second, at the moment frunk-core exists solely for frunk. It's subject to change, and later on it may expand its scope, adding unnecessary baggage to diesel projects.

While these might not seen like huge concerns, I am worried because changing the hlists backend would be a breaking change for pretty much all diesel projects (which we would want to avoid if possible). I guess we could re-export hlists to avoid a lot of the breakage, but it's still far from optimal.

@sgrif
Copy link
Member Author

sgrif commented Feb 23, 2017

Yeah, @PlasmaPower that sums up my concerns with that crate pretty nicely.

@tupshin
Copy link

tupshin commented Feb 23, 2017

Totally fair points.

@withoutboats
Copy link

withoutboats commented Feb 23, 2017

One thing that's interesting about hlists (but possibly not relevant to diesel) is the difficulty in pulling a particular type out of an hlist. Today, you need a trait that's indexed by peano numbers.

With intersection impls, if you don't care about being able to pull multiple values of the same type out of an HList, it becomes possible without that (but, again, it has the restriction that you essentially 'leak' any existing T if you put another T in the list). Here's a code sample of how it works:

trait Hlist { }

struct Cons<T, U: Hlist>(T, U);

impl Hlist for () { }
impl<T, U: Hlist> Hlist for Cons<T, U> { }

// Base case
impl<T, U> AsRef<T> for Cons<T, U> where U: Hlist {
    default fn as_ref(&self) -> &T {
        &self.0
    }
}

// Recursive case
impl<X, T, U> AsRef<X> for Cons<T, U> where U: Hlist + AsRef<X> {
    default fn as_ref(&self) -> &X {
        self.1.as_ref()
    }
}

// Intersection impl
impl<T, U> AsRef<T> for Cons<T, U> where U: Hlist + AsRef<T> {
    fn as_ref(&self) -> &T {
        &self.0
    }
}

@sgrif
Copy link
Member Author

sgrif commented Feb 23, 2017

I did link to an explanation on that too Your explanation is much shorter though. 😉

@J-F-Liu
Copy link

J-F-Liu commented Feb 23, 2017

In favor of this change, the main concern is how much impact of runtime performance.
For naming use a small word is better, like cols!, fields!.

@lloydmeta
Copy link

lloydmeta commented Feb 23, 2017

@ALL Just chiming in because I was summoned.

Regarding frunk (and frunk-core): I'm open to any suggestions of re-organising the repository and overall improving it, so that it can be made more useful. That also includes keeping the scope of it small. I should also mention that the "core" suffix in frunk-core doesn't mean it's meant just for frunk, all it means is that it contains functionality that frunk will depend on (e.g. Validation uses HList) :)

I also intend to maintain it for the foreseeable future, but more maintainers are always welcome !

EDIT:

From what I can see, there is a lot of overlap between the HList here and what is already in Frunk, so I think there are benefits to reusing the one I have in Frunk.

A bit of history

For what it's worth, I come from Scala where a similar situation happened in an ORM (Slick) where originally the HList implementation from Shapeless (a generic programming focused lib) was not used. This caused some headaches for a while because while Slick users had built code around Slick's HList, Shapeless's HList implementation, being a core focus of the lib, eventually surpassed it in many aspects. You could definitely convert between the two though, but that sometimes caused confusion and was a source of extra boilerplate.

Having said that, eventually a bridge library was made so that one could use Slick with Shapeless HList, but it took a couple years for one to show up, and migration is not exactly automatic.

@sgrif sgrif mentioned this pull request Feb 24, 2017
sgrif added 3 commits March 3, 2017 07:51
We'll want *some* generic name that we re-export the macro as (perhaps
something like `args!`). However, whatever name we choose will be fairly
likely to conflict with some other macro somewhere, so we will want to
give users the ability to opt out. The name that we give in that case
should be something that absolutely won't conflict. Adding a `diesel_`
prefix is the easiest way to go about doing that.
@blakepettersson
Copy link
Contributor

Is this change a prerequisite for being able to arbitrarily construct selects and order bys based upon run-time parameters (without having to directly fall back to SQL)? It's not currently clear to me how this works in Diesel today.

@sgrif
Copy link
Member Author

sgrif commented Mar 4, 2017

@blakepettersson If you want to order something dynamically, you need to either box the order clause, or box the query. The latter is more common. The same is true for varying select clauses (though keep in mind that the select clause for each branch has to have the same SQL type).

This PR has nothing to do with either of the uses that you mentioned.

@theduke
Copy link
Contributor

theduke commented Jun 9, 2017

Any movement on this?

Has it been decided to go forward with hlists?

Also, @sgrif , as I understand it, this would "fix" the restrictions on number of columns in a table, right?

@sgrif
Copy link
Member Author

sgrif commented Jun 9, 2017

This is unlikely to go forward until we at least know which direction the variadic generics RFC goes.

@sgrif sgrif closed this Jun 9, 2017
sstangl added a commit to sstangl/openpowerlifting that referenced this pull request Oct 5, 2017
Unfortunately, we have 28 columns, which requires Diesel's "huge-tables"
support. This dramatically increases compile time of the Diesel
dependency to ~3-4 minutes, up from negligible.

Diesel's team says it's tracked by:
  - diesel-rs/diesel#747
  - rust-lang/rfcs#1921
  - rust-lang/rfcs#1935
@pksunkara pksunkara deleted the sg-hlist-spike branch February 28, 2023 19:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.