

Fabian Schmied edited this page Sep 5, 2010

Published on September 5th, 2010 at 20:35

How re-linq represents the DefaultIfEmpty query operator

In my last post, I explained what the DefaultIfEmpty query operator actually does, and why anybody should use it. In this post, I’ll analyze how re-linq’s front-end represents that query operator in its query model.

First, let’s look at one of the queries from the last post:

var query1 = from c in Cooks
             join a in Cooks on c equals a.Assisted into assistants
             from a in assistants.DefaultIfEmpty()
             select new { c, a };
Console.WriteLine (new QueryParser ().GetParsedQuery (query1.Expression));

This code causes the LINQ Expression tree built for the query to be analyzed by the re-linq front-end, and the ToString representation to be written to the console. After adding newlines and indentation, it looks like this:

from Cook c in TestQueryable<Cook>()
join Cook a in TestQueryable<Cook>()  
    on [c] equals [a].Assisted  
    into IEnumerable`1 assistants
from Cook a in {[assistants] => DefaultIfEmpty()}  
select new <>f__AnonymousType30`2(c = [c], a = [a])

Looks astoundingly similar to the C# query, doesn’t it? But what’s that “{[assistants] => DefaultIfEmpty()}” part?

Before explaining this notation, I should probably say a few more words about the philosophy behind re-linq’s query model. It’s not a coincidence that the query model’s ToString output looks similar to the C# query above. The main goal behind the re-linq front-end was to simplify the LINQ query and represent it in a form that would be easy to understand for programmers building LINQ providers.

As mentioned above, the front-end constructs the query model from the Expression tree produced for the query. That tree looks like this (this is the Expression tree’s ToString representation):

TestQueryable<Cook>()  
  .GroupJoin(  
    TestQueryable<Cook>(),  
    c => c,  
    a => a.Assisted,  
    (c, assistants) => new <>f__AnonymousType2f`2(  
      c = c,  
      assistants = assistants))  
  .SelectMany(  
    <>h__TransparentIdentifierf =>  
      <>h__TransparentIdentifierf.assistants.DefaultIfEmpty(),  
    (<>h__TransparentIdentifierf, a) =>  
      new <>f__AnonymousType30`2(  
        c = <>h__TransparentIdentifierf.c,  
        a = a))

From that tree, the front-end creates a number of clauses (the main from clause, additional from clauses, where clauses, a select clause, etc.) and puts them together into a query model object. This is very similar to how a C# programmer would think of the query, and very different from what the Expression tree looks like.

The problem we had, however, is that a simple model can only represent a subset of what’s possible with LINQ. For example, re-linq’s query model, which is modeled after C# query syntax, allows only one select clause per query. In the Expression tree, however, there can be more than one call to the Select method. We didn’t want to introduce the complexity of multiple select clauses into the query model, but we couldn’t ignore that such queries were possible (and even easy) to write with pure LINQ.

When we devised the query model, we therefore used a well-known trick: we made the data structure recursive. The query model was kept very simple, but we added support for sub-query models. In other words, when the model could not represent a certain query, for example because it had two Select method calls, we moved part of the query into a sub-query, which was itself simple and well-defined, and nested that sub-query inside an outer query that was just as simple and well-defined.
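A minimal sketch of that recursion, assuming a deliberately tiny model: the classes `MiniQueryModel`, `MiniFromClause`, `MiniSource`, `MiniNamedSource` and `MiniSubQuerySource` are hypothetical and are not re-linq’s real API. A from clause reads from a source, and a source is either a named sequence or another complete query model. Selectors are kept as plain strings to keep the sketch short.

```csharp
using System.Collections.Generic;

// Hypothetical mini query model, NOT re-linq's real classes.
// A from clause reads from a source; a source is either a named sequence
// or another complete query model, which makes the structure recursive.
public abstract class MiniSource { }

public sealed class MiniNamedSource : MiniSource
{
    public string Name { get; }
    public MiniNamedSource(string name) { Name = name; }
    public override string ToString() => Name;
}

public sealed class MiniSubQuerySource : MiniSource
{
    public MiniQueryModel SubQuery { get; }
    public MiniSubQuerySource(MiniQueryModel subQuery) { SubQuery = subQuery; }
    // Curly braces mark a sub-query, mirroring re-linq's ToString notation.
    public override string ToString() => "{" + SubQuery + "}";
}

public sealed class MiniFromClause
{
    public string ItemName { get; }
    public MiniSource Source { get; }
    public MiniFromClause(string itemName, MiniSource source)
    {
        ItemName = itemName;
        Source = source;
    }
    public override string ToString() => "from " + ItemName + " in " + Source;
}

public sealed class MiniQueryModel
{
    public IReadOnlyList<MiniFromClause> FromClauses { get; }
    // Exactly one select clause per model; a second projection has to
    // live in an enclosing model that uses this one as its source.
    public string Selector { get; }

    public MiniQueryModel(string selector, params MiniFromClause[] fromClauses)
    {
        Selector = selector;
        FromClauses = fromClauses;
    }

    public override string ToString() =>
        string.Join(" ", FromClauses) + " select " + Selector;
}
```

For example, a query with two projections such as `source.Select(x => x.A).Select(y => y.B)` could be held as an inner model with selector `[x].A` over `source`, wrapped as the source of an outer model with selector `[y].B`; its ToString is `from y in {from x in source select [x].A} select [y].B`. This only illustrates the nesting idea; it makes no claim about the exact shape re-linq produces for that query.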

In re-linq, sub-queries can stand for source sequences (among other things), e.g., in a from clause. The curly braces in the “{[assistants] => DefaultIfEmpty()}” part of the query model above represent such a sub-query. The notation is shorthand for “(from x in [assistants] select x) => DefaultIfEmpty()”. The square brackets around assistants indicate a reference to items coming from another query source.
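Read as ordinary LINQ to Objects, the expanded form is an identity query over `assistants` with DefaultIfEmpty applied to its result. The sketch below evaluates it in memory; `SubQueryDemo` and its method names are made up for illustration, and the code only shows the semantics, not how re-linq or a provider evaluates the sub-query.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only.
public static class SubQueryDemo
{
    // The sub-query printed as "{[assistants] => DefaultIfEmpty()}", spelled
    // out in full: an identity query over the source, followed by the
    // DefaultIfEmpty operator applied to its result.
    public static List<T> FromSubQuery<T>(IEnumerable<T> assistants)
    {
        var subQuery = from x in assistants select x;
        return subQuery.DefaultIfEmpty().ToList();
    }

    // The form written in the C# query: assistants.DefaultIfEmpty().
    public static List<T> Direct<T>(IEnumerable<T> assistants) =>
        assistants.DefaultIfEmpty().ToList();
}
```

Both methods return the input items unchanged when there are any, and a single `default(T)` item (`null` for reference types, `0` for `int`) when the input is empty.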

The arrow represents a second trick we used: result operators. Result operators are used by re-linq to embed information about query operators for which there are no dedicated clauses. For example, Count(), Distinct(), and DefaultIfEmpty() are all result operators. Result operators always stand at the end of a query model because they operate on the results of the query. If the semantics of the query don’t allow an operator to stand at the end, a sub-query is built around it, so that within that sub-query it stands at the end again.
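The “result operators come last” rule can be sketched with an in-memory executor. `MiniResultOperator`, `MiniDistinct`, `MiniDefaultIfEmpty`, `MiniCount` and `MiniExecutor` are hypothetical types invented for this sketch, not re-linq’s result operator classes, and a real provider would translate the operators rather than run them in memory. Each operator consumes the sequence produced by the clauses (or by the previous operator); to apply a clause after an operator, the whole query is run as an inner query whose result becomes the source of an outer one, mirroring the sub-query trick.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical result operators, NOT re-linq's real classes. Each one takes
// the result of the query (a sequence) and produces a sequence or a scalar.
public abstract class MiniResultOperator
{
    public abstract object Apply(IEnumerable<object> results);
}

public sealed class MiniDistinct : MiniResultOperator
{
    public override object Apply(IEnumerable<object> results) => results.Distinct().ToList();
}

public sealed class MiniDefaultIfEmpty : MiniResultOperator
{
    public override object Apply(IEnumerable<object> results) => results.DefaultIfEmpty().ToList();
}

public sealed class MiniCount : MiniResultOperator
{
    public override object Apply(IEnumerable<object> results) => results.Count();
}

public static class MiniExecutor
{
    // Runs "from x in source where predicate(x) select x", then applies the
    // result operators in order, each one to the previous operator's output.
    public static object Execute(
        IEnumerable<object> source,
        Func<object, bool> predicate,
        params MiniResultOperator[] resultOperators)
    {
        object current = source.Where(predicate).ToList();
        foreach (var op in resultOperators)
        {
            // A scalar result (e.g. from Count) leaves no sequence for a
            // later operator to work on.
            if (!(current is IEnumerable<object> sequence))
                throw new InvalidOperationException("No sequence left for " + op.GetType().Name);
            current = op.Apply(sequence);
        }
        return current;
    }
}
```

With `data = new object[] { 1, 2, 2, 3 }`, `Execute(data, x => true, new MiniDistinct())` returns `[1, 2, 3]`. Passing that list as the source of `Execute(inner, x => (int)x != 3, new MiniCount())` returns `2`, which is `data.Distinct().Where(x => (int)x != 3).Count()` with the where clause moved into an outer query so that Distinct can stay at the end of the inner one.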

That explained, let’s take a look at the full query model again:

from Cook c in TestQueryable<Cook>()
join Cook a in TestQueryable<Cook>()  
    on [c] equals [a].Assisted  
    into IEnumerable`1 assistants
from Cook a in {[assistants] => DefaultIfEmpty()}  
select new <>f__AnonymousType30`2(c = [c], a = [a])

Knowing what we do now, we can see that the second from clause selects its Cook items (named “a”) from a sub-query that gets its items directly from the elements of the assistants query source and has a DefaultIfEmpty result operator attached to its end.
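To check what that second from clause produces, here is query1 run with LINQ to Objects against a small in-memory list, with the sub-query written out in the query model’s expanded form. The `Cook` class with a `Name` property, the `DefaultIfEmptyDemo` helper and the sample data are assumptions for illustration; the previous post’s `Cook` type may look different.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical Cook class modeled on the example: Assisted is the cook
// that this cook assists (null if none).
public sealed class Cook
{
    public string Name { get; }
    public Cook Assisted { get; }
    public Cook(string name, Cook assisted = null) { Name = name; Assisted = assisted; }
}

public static class DefaultIfEmptyDemo
{
    // Same shape as query1: the second from clause reads from the sub-query
    // {[assistants] => DefaultIfEmpty()}, which yields each assistant, or a
    // single null when a cook has no assistants.
    public static List<(string CookName, string AssistantName)> CooksWithAssistants(IEnumerable<Cook> cooks)
    {
        var query = from c in cooks
                    join a in cooks on c equals a.Assisted into assistants
                    from a in (from x in assistants select x).DefaultIfEmpty()
                    select (CookName: c.Name, AssistantName: a == null ? null : a.Name);
        return query.ToList();
    }
}
```

With a head cook assisted by “Sous” and “Commis” (passed in that order), the result is (Head, Sous), (Head, Commis), (Sous, null), (Commis, null): the two cooks without assistants still appear, each paired with the single null that DefaultIfEmpty substitutes for their empty assistants group.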

Now, the million-dollar question: how do I handle such a sub-query with a DefaultIfEmpty result operator in my re-linq-based LINQ provider?

That will be answered in my next post.

- Fabian
