Alias references in subquery expressions #6407
Labels
analyzer
correctness
We don't return the same result as MySQL
enhancement
New feature or request
sql
Issue with SQL
Alias resolution is highly coupled to execution in a way that currently depends on a combination of (1) alias substitution and (2) not-scalable transforms that inject ladder projections that simulate LATERAL joins (we describe both more formally below). The conditions we use these rules is not super generalizable, and I think in its current form is only flexible enough to get specific tests to pass.
The new name resolution code resolves plans like
select 1 as a, (select a as a)
pretty easily, tracking references and definitions with number ids. During execution, the subquery expression has no way of receiving1 as a
with our current execution format. The two potential solutions are (1) alias substitution, and (2) lateral joins. If we replace the alias, we create this plan:select 1 as a, (select 1 as a)
. This falls apart for plans like this query:select x+1 as newX, (select newX from xy) from xy
, because we index execution fields using names, the expression ids (we should also fix this).Lateral joins are probably the best way to divide name resolution from execution without having to rewrite how we do execution indexing. We would rewrite the query as
select 1 as a LATERAL select a as a
. For most (all?) of the more complicated constructions of queries, this still works:select x+1 as newX from xy LATERAL select newX from xy
,select dt from (select (select 1 as a) dt LATERAL select 1 from dual group by dt)
. The aggregation cases are the trickiest because we have to rearrange aggregation functions into the grouping scope, and then restore the right output ordering with a top-level projection. The rearranging process can collide with the alias in subquery case --select sum(x) as a, (select a as a) from xy group by x
. We would error there, aggregations cannot be in aliases referenced by subquery expressions.The lateral joins aren't really a silver bullet that avoids needing (1) memoizing expressions so we can reliably use ref ids rather than recomputing expressions, and then (2) indexing based on ids rather than column/table name. Lateral join execution is also limited in at least two ways right now. One, we need to inject scope into lateral RHS by with prepend/strip nodes (source nodes only will return the LHS + source row; join nodes strip inputs to deduplicate appended rows in inputs). Two, dual scopes return an "x" row that sometimes may or may not interfere with indexing. So in
select 1 as a LATERAL select a as a
, the lateral RHS would have a schema of ("a", "x").The text was updated successfully, but these errors were encountered: