-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove the usage of tuples in argument positions from our API #747
Conversation
I'm exploring how it would feel to stop using tuples for anything, and take an hlist instead. I don't think this would be too painful in most places. If this does lead to reasonable ergonomics, I think we could have some big wins. Our compile times would reduce drastically as we could remove 80% of our code (`diesel/types/impls/tuples.rs` is *huge* when you expand the macro). We may also be able to provide several blanket impls that were previously coherence issues. I also have a proof of concept that demonstrates how we could use hlists to remove the need for struct fields to be ordered the same as the columns of a table. The place that might be painful is to remove the `Queryable` and `FromSqlRow` impls for tuples. Even if it's not painful, it's where it would be the most visible. Other places that it would be visible, but I don't think it will be painful are: - Passing `hlist!(...)` instead of `(...)` to `.select` - Passing `hlist!(...)` instead of `(...)` to `.order` - Passing `hlist!(...)` instead of `(...)` to `.set` when directly updating fields rather than using a struct w/ `#[derive(AsChangeset)]`. This would also affect the derived implementations of `AsChangeset`, `Insertable`, and `Queryable`. However, I don't think those will be noticed by most people, as these are usually derived not implemented manually (`Queryable` is the only one that doesn't completely suck to implement yourself). I have no clue if the ergonomics will be reasonable, or if the wins I suspect will come out of this will actually do so, but I'm going to explore it.
Our types can be pretty scary when they show up in error messages. Replacing tuples with hlists will probably make them even more scary. `diesel::hlist::Cons(A, diesel::hlist::Cons(B, diesel::hlist::Nil))` is noisier than `(A, B)` if nothing else. While we can't change how the type will appear (if it appears), we can change how it looks if it's printed out (because of an `assert_eq!` failing or otherwise). Hlists are ultimately closer to tuples than lists, so I've gone with tuple formatting for the output.
Only one test was affected by this change, which I think probably reflects how uncommon it is to directly pass a tuple to `.set`. `#[derive(AsChangeset)]` is automatically updated.
This one is almost certainly not visible to users. Implementing `Insertable` manually is a hellish nightmare at the moment. I'm 99% certain all impls in the wild are using `#[derive]`.
Now we start to get into where this spike would affect public API. This touches a lot of files (more than I expected, in fact) but most of the things it touches are internal. The three methods that this actually affects are `.select`, `.order`, and `.returning`. In all cases, the solution is to replace `((...))` with `(hlist!(...))`. I'm unsure if removing these impls, or the queryable impl for tuples will be more visible. The queryable impl will almost certainly affect our test suite more, but I'm unsure how much people are deserializing into straight tuples in real apps. My gut says this commit will affect more apps, but removing the queryable impl will be more painful for apps that are affected. The fact that this broke `create_table` in our tests is pretty funny to me. I rely way too much on tuples. Continuing with the theme of "this makes our code simpler", this has the side effect of closing a few issues/PRs. Fixes #521. Close #617. (Sorry @phungleson, your PR looked fine FWIW)
This change basically only impacts implementations of `Queryable`. I've opted not to remove the `Queryable` impl in this commit, as it will be the most visible change and not one that I'm 100% sure I want to make. The `!#[recursion_limit]` addition is due to the *extremely helpful* error message I received: ``` error: reached recursion limit while checking inhabitedness of `hlist::Cons<<P as query_source::Queryable<SP, DB>>::Row, hlist::Nil>` ``` Yes, that is really how the error message was indented.
I was originally going to remove this impl entirely, but I felt like that change was far too invasive and confusing, especially after seeing it affect the return values of joins. It may still be worthwhile to explore in the future, but I need to see a more concrete win to make that big of a change. Since we're leaving the `Queryable` impls in place, we'll want to leave the `BelongsTo` impls in place as well for associations, since they'll work with the direct result of queries. Since I can write a much simpler `Queryable` impl, I was able to simplify the `tuple_impls!` macro as well. The remaining impls don't make sense to implement for arbitrarily large tuples, since it's only used for loading/grouping associations, and the return values of queries when you don't want to use a struct. Neither one makes sense for very large numbers. I've picked 12 as the new arbitrary limit because that is the largest tuple for which stdlib provides impls of things like `PartialEq`.
Since we no longer directly reference an hlist that is greater than 16 elements in length, we are no longer hitting the recursion limit. It's worth noting that users who are putting `#[derive(Queryable)]` on a struct with 16+ fields *will* need to up the recursion limit on their crate. This will happen regardless of whether we have this attribute in Diesel or not. We should probably compile a list of "FAQs" and include this one in it, since the error message given by the compiler does not present upping the recursion limit as a suggestion.
I think restrictions and drawbacks are worth it for what we gain. It'll definitely require a lot of work to redo a lot of guide documentation, but it'll hopefully make Diesel better in the long run! |
Ironically the guide is 100% unaffected by this change. We never call |
A quick grep on my two largest code bases using diesel (both deployed at work and the only ones I really care about, because the rest is hobby and I would always happily accept using better code, even if it breaks compatibility) showed that I am not using tuples anywhere. I barely use This is probably a result from my programming background, I was mostly used to OOP before using Rust, which I would not necessarily call object orientated. I am not a big fan of databases and try to avoid them where not favorable in terms of performance and that probably results in trying to use common code patterns, when dealing with DBs. That's why I keep most operations simple (get & set) and only use queries where I need to. Regarding documentation I would definitely say the examples can be improved not only in regards to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All in all, I'm in favor of this change.
From a user's perspective, ideally, this shouldn't matter. We should probably add a bit of documentation (or a link to a new guide) to every public method that takes hlists as input parameters. I originally suggested to name it columns!
as what our users are using the macro for is selecting columns. Sean's args!
is more abstract and conveys that we want to use this to let methods support variadic arguments. Both of these names hide the heterogenous list as an implementation detail.
This will be one more concept to learn before contributing to diesel. This will however be something valuable and interesting to learn and replace some pretty hairy macros.
@sgrif wrote in "Why didn't we use an existing library for this?":
There also isn't a crate that is maintained and just does hlists.
On this note, I want to mention that I first saw a HList implementation in Rust (I've previously only known this from the Haskell module of the same name) in frunk, which uses it as a building block for a lot of fancy functional constructs.
In the funk repo, there is also the frunk-core crate that contains hlist and some generics shenanigans. It sounds like an internal thing, but maybe it doesn't have to be? → cc @lloydmeta
Oh, and also this just came in: rust-lang/rfcs#1921
(And by the way, frunk should definitely be renamed to frost, as the German word for Rust is Rost. And the reason for that is in no way that I wanted to make a diesel/glow plug joke! 😉 )
use types::Bool; | ||
|
||
/// This method is used by `FindDsl` to work with tuples. Because we cannot | ||
/// This method is used by `FindDsl` to work with hlists. Because we cannot | ||
/// express this without specialization or overlapping impls, it is brute force | ||
/// implemented on columns in the `column!` macro. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drive-by comment: Actually, the macro is called __diesel_column!
Expr: Expression<SqlType=<Col::SqlType as IntoNullable>::Nullable> + QueryFragment<DB>, | ||
Tail: InsertValuesRecursive<DB>, | ||
{ | ||
fn column_names(&self, comma_needed: bool, out: &mut DB::QueryBuilder) -> BuildQueryResult { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd refactor this whole comma needed business with a simple MaybeComma
enum instead of a bool: https://gist.github.com/killercup/e57f3f1cd83531a88510512fa2f634d4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
@@ -0,0 +1,88 @@ | |||
use std::fmt::{Debug, Formatter, Error as FmtError}; | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably add some (module-level) docs here. At least for contributors looking to understand diesel's inner workings. Like this maybe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the whole thing will need a lot of documentation if merged.
I am planning to use diesel, but I haven't yet, so +1 for changing it now. Compile time matters to me. |
I'd prefer Diesel continue to use tuples, but that there was an easier way to impl traits for tuples. As I commented on the RFC, I want to see I believe this solution mitigates many of the drawbacks you listed above, namely:
Overall, I think the addition of edit: Adding this here since it was mentioned in the RFC-- @eddyb thinks we could maybe add |
Maybe we should create a crate |
I am hugely in favor of moving to HLists in general. Regarding an internal HList vs using an existing crate, I partially agree with your reasoning. However, it is worth noting that frunk-core actually consists of not much more than HList. So while it might not be the right time to standardize on an external one, worth keeping on eye on that one. |
I agree that frunk-core would be a reasonable solution, but I do have my concerns. First, frunk isn't currently all that popular. It only has about 4% of the downloads of diesel. Therefore, it's a lot less likely to be properly maintained. Second, at the moment frunk-core exists solely for frunk. It's subject to change, and later on it may expand its scope, adding unnecessary baggage to diesel projects. While these might not seen like huge concerns, I am worried because changing the hlists backend would be a breaking change for pretty much all diesel projects (which we would want to avoid if possible). I guess we could re-export hlists to avoid a lot of the breakage, but it's still far from optimal. |
Yeah, @PlasmaPower that sums up my concerns with that crate pretty nicely. |
Totally fair points. |
One thing that's interesting about hlists (but possibly not relevant to diesel) is the difficulty in pulling a particular type out of an hlist. Today, you need a trait that's indexed by peano numbers. With intersection impls, if you don't care about being able to pull multiple values of the same type out of an HList, it becomes possible without that (but, again, it has the restriction that you essentially 'leak' any existing trait Hlist { }
struct Cons<T, U: Hlist>(T, U);
impl Hlist for () { }
impl<T, U: Hlist> Hlist for Cons<T, U> { }
// Base case
impl<T, U> AsRef<T> for Cons<T, U> where U: Hlist {
default fn as_ref(&self) -> &T {
&self.0
}
}
// Recursive case
impl<X, T, U> AsRef<X> for Cons<T, U> where U: Hlist + AsRef<X> {
default fn as_ref(&self) -> &X {
self.1.as_ref()
}
}
// Intersection impl
impl<T, U> AsRef<T> for Cons<T, U> where U: Hlist + AsRef<T> {
fn as_ref(&self) -> &T {
&self.0
}
} |
I did link to an explanation on that too Your explanation is much shorter though. 😉 |
In favor of this change, the main concern is how much impact of runtime performance. |
@ALL Just chiming in because I was summoned. Regarding frunk (and frunk-core): I'm open to any suggestions of re-organising the repository and overall improving it, so that it can be made more useful. That also includes keeping the scope of it small. I should also mention that the "core" suffix in I also intend to maintain it for the foreseeable future, but more maintainers are always welcome ! EDIT: From what I can see, there is a lot of overlap between the HList here and what is already in Frunk, so I think there are benefits to reusing the one I have in Frunk. A bit of historyFor what it's worth, I come from Scala where a similar situation happened in an ORM (Slick) where originally the HList implementation from Shapeless (a generic programming focused lib) was not used. This caused some headaches for a while because while Slick users had built code around Slick's HList, Shapeless's HList implementation, being a core focus of the lib, eventually surpassed it in many aspects. You could definitely convert between the two though, but that sometimes caused confusion and was a source of extra boilerplate. Having said that, eventually a bridge library was made so that one could use Slick with Shapeless HList, but it took a couple years for one to show up, and migration is not exactly automatic. |
We'll want *some* generic name that we re-export the macro as (perhaps something like `args!`). However, whatever name we choose will be fairly likely to conflict with some other macro somewhere, so we will want to give users the ability to opt out. The name that we give in that case should be something that absolutely won't conflict. Adding a `diesel_` prefix is the easiest way to go about doing that.
Is this change a prerequisite for being able to arbitrarily construct selects and order bys based upon run-time parameters (without having to directly fall back to SQL)? It's not currently clear to me how this works in Diesel today. |
@blakepettersson If you want to order something dynamically, you need to either box the order clause, or box the query. The latter is more common. The same is true for varying select clauses (though keep in mind that the select clause for each branch has to have the same SQL type). This PR has nothing to do with either of the uses that you mentioned. |
Any movement on this? Has it been decided to go forward with hlists? Also, @sgrif , as I understand it, this would "fix" the restrictions on number of columns in a table, right? |
This is unlikely to go forward until we at least know which direction the variadic generics RFC goes. |
Unfortunately, we have 28 columns, which requires Diesel's "huge-tables" support. This dramatically increases compile time of the Diesel dependency to ~3-4 minutes, up from negligible. Diesel's team says it's tracked by: - diesel-rs/diesel#747 - rust-lang/rfcs#1921 - rust-lang/rfcs#1935
We are seeking feedback
The core team hasn't decided if we actually want to go forward with this proposal. We are seeking feedback from the community on their thoughts. The purpose of this PR is to provide a place to centralize discussion, and provide the ability for people to try out the proposal against their code bases. We would like to hear from you on how much this change impacts your codebase, how painful migrating to this would be, and whether you'd find this confusing as a newcomer.
Overview
This pull request changes any functions which could previously take a tuple as an argument to take an hlist instead. The TL;DR is that the rule of thumb for "things that are separated by a comma in SQL are represented as _ in diesel" changes from "a tuple" to "an hlist". This means that
(x, y)
changes tohlist!(x, y)
(we may land on a different name, see the Unanswered Questions section below).For those unfamiliar, "hlist" is short for "heterogeneous list". It is essentially a singly linked list where each element is able to be of a different type. The list encodes this in its own type. It is effectively the same as a bunch of nested pairs. e.g.
Cons(1, Cons(2, Cons(3, Nil)))
is more-or-less the same as(1, (2, 3))
or(1, (2, (3, ())))
.The specific functions that change are
query.select
,query.order
, andupdate(table).set(...)
when updating multiple columns but not using a struct that has#[derive(AsChangeset)]
.A less visible API that is changed is the type argument to the
sql
function. However, this function may be deprecated and removed before 1.0 in favor of an escape hatch API that is untyped (see #650).This also affects implementations of
Queryable
,AsChangeset
, andInsertable
. However, these are usually derived rather than implemented manually (with the possible exception ofQueryable
).This does not affect the ability to use tuples in a return type position. This means that doing
.load::<(A, B)>
will still work, and associations still work in terms of tuples. This is purely due to how likely this change is to affect the average user. An application can be doing extremely complex queries, and still never call.select
or.order
explicitly with more than one argument. However, any time a join is done, or associations are loaded, tuples get involved in a much more visible fashion. Associations in particular are designed to work aroundIterator::zip
andIterator::unzip
, which is only feasible if they work on tuples.Additionally, while code that is passing a tuple can be changed to pass an hlist simply by putting
hlist!
in front of it, the story is more involved for code working with tuples. Hlists cannot simply be indexed like tuples can, they have to instead be destructured. This is acceptable for us to do internally, but not something I'd like to impose on every user of the library.What this gets us
First and foremost, the code is simpler, and our compile time goes down. To implement a trait for an arbitrarily sized hlist, you need to provide two impls:
Cons<T, U> where T: SomeTrait, U: SomeTrait
andNil
. (In some places we doCons<T, Cons<U, Tail>>
andCons<T, Nil>
instead when we have things like comma separation to worry about).Implementing code for an arbitrary sized tuple is much harder. You basically have to write the impls as a macro, and invoke it for whatever size you want to support. This is what the standard library does for traits like
PartialEq
(which aren't implemented on tuples larger than 12 elements). This change removes half of the code in the main crate (as measured bycargo rustc -- -Zunstable-options --pretty=expanded | wc -l
), and decreases compile times by 70% (measured bytouch src/lib.rs && cargo build
). Moving this code out of macros and into regular rust will also make it much easier for new contributors to understand.This may also open the door for several future improvements that would not be possible with tuples. Several blanket impls that we want to write are impossible because they overlap with the impls for tuples. At the moment most of them overlap with hlists as well, but specialization and/or future improvements to specialization are easier to apply to hlists than tuples.
In particular, I think that this change will eventually allow us to remove the requirements that struct order match the column order (in cases where your struct is one-to-one with a table), and also remove the requirement that you must have a field for your struct for every member of the select statement. Right now we would be unable to write the impls required to make that happen, but several of the possibilities being discussed for specialization would allow us to make that happen. If you're interested in the details of that, I've written them up here.
You may notice that some of these gains are all assuming speculative future enhancements to the language. Why not wait until they actually happen to make this change? Ultimately we want to release 1.0 before those changes are likely to happen. If we want to avoid breaking backwards compatibility, we need to leave the door open to take advantage of those changes when they happen. Even if we limit the "wins" to the more accessible code and improved compile times, I do think it's still worth it.
Why didn't we use an existing library for this?
Ultimately if we make this change, hlists become a very fundamental part of people interacting with Diesel, which makes me uncomfortable leaving that in the hands of another crate unless it's already well established within the community. There also isn't a crate that is maintained and just does hlists. Having a type which is local to Diesel is useful for us from a coherence point of view.
Drawbacks
The most obvious drawback is that we take tuples instead of hlists. Tuples are in the standard library. hlists aren't. That said, I don't know that people are actually discovering that they can pass tuples on their own. Two common questions in gitter are "how do I order by multiple things", and "how do I update multiple columns without using a struct". We can fix that by giving some more up front examples of those cases, but if people aren't finding tuples without being shown, hlists aren't really any worse. They're just equally bad. Ultimately what we really want is a variadic function, but that's not going to happen.
Additionally, this will make some of our scary type signatures more scary.
diesel::hlist::Cons<Foo, diesel::hlist::Cons<Bar, diesel::hlist::Nil>>
is harder to read than(Foo, Bar)
. That said, the type would really only show up in errors that were already incomprehensible. Nobody is actually making sense ofSelectStatement<five, terminal, windows, of, types>
other than the core team when people paste them in gitter asking for help. Our goal has been to structure things so that the types don't get dumped in error messages, and will continue to be.Some of our error messages got... well I'm not sure if it's worse, or better. For example, the error for the table having one more column than the struct has changed from
(i32, String, String): FromSqlRow<(Integer, Text, Text, Double)> failed
toNil: FromSqlRow<Cons<Double, Nil>> failed
. One could argue that without reading the documentation, a person might actually figure out what's going on from the first error message, but not from the second. In practice I'm not sure that's true though, and it's a lot easier to document thatNil: FromSqlRow
means you have more columns on the table than the struct than it is for us to say "hey make sure you count the tuple elements if you see an error about a tuple and another tuple".Alternatives
We could also keep this change entirely internal, provide an
IntoHlist
trait, and implement that for tuples of arbitrary sizes. This is a bad alternative for 2 reasons. First, it will bring back the "we need to implement things for really large tuples" problem, and hurt compile times. Second, since these functions can also take just one argument, we will need to brute force that trait for a lot of types, including some which are potentially inconvenient to do so. Specialization may improve that situation when it stabilizes.Unanswered Questions
Naming.
hlist!
could conflict with other libraries, so we'd want to have the canonical name bediesel_hlist!
(though I want to keep the short variant. People can choose not to import the short version if they don't want to). However, do we actually want to make the fact that we're using hlists apparent? Would a name likeargs!
orcolumns!
make more sense?How we teach this
We should almost certainly add an example in the getting started guide which demonstrates how hlists work.