- Expression variables in query expressions
- Irrefutable patterns and reachability
- Do-while loop scope
It seems desirable to allow expression variables in query clauses to be available in subsequent clauses:
from s in strings
where int.TryParse(s, out int i)
select i;
The idea is that the i
introduced in the where
clause becomes a sort of extra range variable for the query, and can be used in the select
clause. It would even be definitely assigned there, because the compiler is smart enough to figure out that variables that are "definitely assigned when true" in a where clause expression would always be definitely assigned in subsequent clauses.
This is intriguing, but when you dig in it does raise a number of questions.
How would a query like that be translated into calls of existing query methods? In the example above we would need to split the where
clause into a call to Select
to compute both the boolean result and the expression variable i
, then a call to Where
to filter out those where the boolean result was false. For instance:
strings
.Select(s => new { s, __w = int.TryParse(s, out int i) ? new { __c = true, i } : new { __c = false, i = default } })
.Where(__p => __p.__w.__c);
.Select(__p => __p.__c.i);
That first Select
call is pretty unappetizing. We can do better, though, by using a trick: since we know that the failure case is about to be weeded out by the Where
clause, why bother constructing an object for it? We can just null out the whole anonymous object to signify failure:
strings
.Select(s => int.TryParse(s, out int i) ? new { s, i } : null)
.Where(__p => __p != null)
.Select(__p => __p.i);
Much better!
We haven't really talked through how this would work for other kinds of query clauses. We'd have to go through them one by one and establish what the meaning is of expression variables in each expression in each kind of query clause. Can they all be propagated, and is it meaningful and reasonable to achieve?
One thing to note is that range variables are immutable, while expression variables are mutable. We don't have the option of making expression variables mutable across a whole query, so we would need to make them immutable either:
- everywhere, or
- outside of the query clause that introduces them.
Having them be mutable inside their own query clause would allow for certain coding patterns such as:
from o in objects
where o is int i || (o is string s && int.TryParse(s, out i))
select i;
Here i
is introduced and then mutated in the same query clause.
The above translation approaches would accommodate this "mutable then immutable" semantics if we choose to adopt it
With a naive query translation scheme, this could lead to a lot of hidden allocations even when an expression variable is not used in a subsequent clause. Today's query translation already has the problem of indiscriminately carrying forward all range variables, regardless of whether they are ever needed again. This feature would exacerbate that issue.
We could think in terms of language-mandated query optimizations, where the compiler is allowed to shed range variables once they are never referenced again, or at least if they are never referenced outside of their introducing clause.
We won't have time to do this feature in C# 7.0. If we want to leave ourselves room to do it in the future, we need to make sure that we don't allow expression variables in query clauses to mean something else today, that would contradict such a future.
The current semantics is that expression variables in query clauses are scoped to only the query clause. That means two subsequent query clauses can use the same name in expression variables, for instance. That is inconsistent with a future that allows those variables to share a scope across query clause boundaries.
Thus, if we want to allow this in the future we have to put in some restrictions in C# 7.0 to protect the design space. We have a couple of options:
- Disallow expression variables altogether in query clauses
- Require that all expression variables in a given query expression have different names
The former is a big hammer, but the latter requires a lot of work to get right - and seems at risk for not blocking off everything well enough.
A related feature request is to allow deconstruction in the query clauses that introduce new range variables:
from (x, y) in points
let (dx, dy) = (x - x0, y - y0)
select Sqrt(dx * dx + dy * dy)
This, again, would simply introduce extra range variables into the query, and would sort of be equivalent to the tedious manual unpacking:
from __p1 in points
let x = __p1.Item1
let y = __p1.Item2
let __p2 = (x - x0, y - y0)
let dx = __p2.Item1
let dy = __p2.Item2
select Sqrt(dx * dx, dy * dy)
Except that we could do a much better job of translating the query into fewer calls:
points
.Select(__p1 => new { x = __p1.Item1, y = __p1.Item2 })
.Select(__p2 => new { dx = __p2.x - x0, dy = __p2.y - y0, * = __p2 }
.Select(__p3 => Sqrt(__p3.dx * __p3.dx, __p3.dy * __p3.dy)
We will neither do expression variables nor deconstruction in C# 7.0, but would like to do them in the future. In order to protect our ability to do this, we will completely disallow expression variables inside query clauses, even though this is quite a big hammer.
We could be smarter about reachability around irrefutable patterns:
int i = 3
if (i is int j) {}
else { /* reachable? */ }
We could consider being smart, and realizing that the condition is always true, so the else clause is not reachable.
By comparison, though, in current C# we don't try to reason about non-constant conditions:
if (false && ...) {}
else { /* reachable today */ }
This is not worth making special affordances for. Let's stick with current semantics, and not introduce a new concept for "not constant, but we know it's true".
In the previous meeting we decided that while loops should have narrow scope for expression variables introduced in their condition. We did not explicitly say that the same is the case for do-while, but it is.