-
-
Notifications
You must be signed in to change notification settings - Fork 526
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adding query-diff
cli command
#6514
Conversation
This seems like a lot of heavy lifting on the CLI side. Is there a reason we aren't doing the work in a table function? Then it would be available to SQL only users. Adding a CLI wrapper around it would be easy. |
I agree, I'm currently exploring ways to implement this as a table function or a stored procedure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
James and I talked about this yesterday, I think this is fine I just am worried about how limited it is in the broad search space of query diff. As is this will only return results for two queries that are identical and have the same underlying table, but different data versions. So like a slightly expanded form of diff that lets us do aggregation and join diffs.
Some of the other things in the space of query diff:
- queries are the same, schemas are different
- queries are the same, but aggregation or join precludes using dolt diff
- queries are different (schemas are different is arguably a subset of this)
It is almost like we are introducing a third diff dimension: schema->query plan->data. We have to diff the two query plans after diffing the schema in the general case to determine whether there are smarter ways to diff the data. Even if the queries are the same, we still have to walk the AST or plan to see whether materializing nodes prevent using dolt diff on underlying tables.
And re: bisection/temporary table materialization, both of those require reading the entire tables, which makes them more expensive than the naive algo.
Other thoughts:
- None of the current logic is really Dolt-specific. We could put all of this into GMS, have the schema comparison only test column shape+types (ignore table name), and have this be a generic query differ.
- The Dolt side could re-use our schema merge logic to go through the left/right schemas, match columns on name/type or tag, and then interpolate new query (strings or nodes) that will be schema-comparable. This would let us finesse queries to handle schema diff, with the generic query_diff living and testing independently in GMS.
I am not against merging what we have. It is just unclear to me what customers expect from this feature.
return nil, nil, fmt.Errorf("query must be a string, not %T", q) | ||
} | ||
qStr = strings.TrimSpace(qStr) | ||
if !strings.HasPrefix(strings.ToLower(qStr), "select") { // TODO: allow "with?" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could parse.Parse, see whether the AST is a SELECT
if !expr.Resolved() { | ||
return nil, ErrInvalidNonLiteralArgument.New(tf.Name(), expr.String()) | ||
} | ||
// prepared statements resolve functions beforehand, so above check fails |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this new or copied? I think expressions are going to be resolved by default now, even if not literal
if !tf.Resolved() { | ||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't think this should be possible?
if !tf.Resolved() { | ||
return nil | ||
} | ||
if tf.sqlSch == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also seems like we should have errored upstream to reach this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM as a resurrection of the old query diff in a SQL context. We'll try to review options with people when you get back, or when a customer asks us for more.
Companion PR: dolthub/go-mysql-server#1955