Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alias references in subquery expressions #6407

Closed
max-hoffman opened this issue Jul 27, 2023 · 1 comment
Closed

Alias references in subquery expressions #6407

max-hoffman opened this issue Jul 27, 2023 · 1 comment
Labels
analyzer correctness We don't return the same result as MySQL enhancement New feature or request sql Issue with SQL

Comments

@max-hoffman
Copy link
Contributor

Alias resolution is highly coupled to execution in a way that currently depends on a combination of (1) alias substitution and (2) not-scalable transforms that inject ladder projections that simulate LATERAL joins (we describe both more formally below). The conditions we use these rules is not super generalizable, and I think in its current form is only flexible enough to get specific tests to pass.

The new name resolution code resolves plans like select 1 as a, (select a as a) pretty easily, tracking references and definitions with number ids. During execution, the subquery expression has no way of receiving 1 as a with our current execution format. The two potential solutions are (1) alias substitution, and (2) lateral joins. If we replace the alias, we create this plan: select 1 as a, (select 1 as a). This falls apart for plans like this query: select x+1 as newX, (select newX from xy) from xy, because we index execution fields using names, the expression ids (we should also fix this).

Lateral joins are probably the best way to divide name resolution from execution without having to rewrite how we do execution indexing. We would rewrite the query as select 1 as a LATERAL select a as a. For most (all?) of the more complicated constructions of queries, this still works: select x+1 as newX from xy LATERAL select newX from xy, select dt from (select (select 1 as a) dt LATERAL select 1 from dual group by dt). The aggregation cases are the trickiest because we have to rearrange aggregation functions into the grouping scope, and then restore the right output ordering with a top-level projection. The rearranging process can collide with the alias in subquery case -- select sum(x) as a, (select a as a) from xy group by x. We would error there, aggregations cannot be in aliases referenced by subquery expressions.

The lateral joins aren't really a silver bullet that avoids needing (1) memoizing expressions so we can reliably use ref ids rather than recomputing expressions, and then (2) indexing based on ids rather than column/table name. Lateral join execution is also limited in at least two ways right now. One, we need to inject scope into lateral RHS by with prepend/strip nodes (source nodes only will return the LHS + source row; join nodes strip inputs to deduplicate appended rows in inputs). Two, dual scopes return an "x" row that sometimes may or may not interfere with indexing. So in select 1 as a LATERAL select a as a, the lateral RHS would have a schema of ("a", "x").

@max-hoffman max-hoffman added sql Issue with SQL analyzer labels Jul 27, 2023
@timsehn timsehn added enhancement New feature or request correctness We don't return the same result as MySQL labels Aug 30, 2023
@max-hoffman
Copy link
Contributor Author

Going to close this to avoid large projects in issues. We made progress with columns and tables all having expression ids that simplify analysis and execution planning. We still lack lateral join transformations that might make alias organization more correct/faster.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
analyzer correctness We don't return the same result as MySQL enhancement New feature or request sql Issue with SQL
Projects
None yet
Development

No branches or pull requests

2 participants