-
Notifications
You must be signed in to change notification settings - Fork 14.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: improve perf in SIP-68 migration #19416
perf: improve perf in SIP-68 migration #19416
Conversation
Codecov Report
@@ Coverage Diff @@
## master #19416 +/- ##
==========================================
- Coverage 66.48% 66.39% -0.10%
==========================================
Files 1670 1670
Lines 63968 63824 -144
Branches 6512 6510 -2
==========================================
- Hits 42531 42374 -157
- Misses 19748 19761 +13
Partials 1689 1689
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the quick turnaround. The table name extraction function seems to be a useful utility, can it be added to some more shared place?
superset/migrations/versions/b8d3a24d9131_new_dataset_models.py
Outdated
Show resolved
Hide resolved
superset/migrations/versions/b8d3a24d9131_new_dataset_models.py
Outdated
Show resolved
Hide resolved
tests/unit_tests/migrations/versions/b8d3a24d9131_new_dataset_models_test.py
Outdated
Show resolved
Hide resolved
try: | ||
model = getattr(Base.classes, table) | ||
except AttributeError: | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is needed to run the bechmark migration script on the SIP-68 migration.
@@ -2278,8 +2285,7 @@ def write_shadow_dataset( # pylint: disable=too-many-locals | |||
) | |||
|
|||
# physical dataset | |||
tables = [] | |||
if dataset.sql is None: | |||
if not dataset.sql: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of our example datasets have .sql == ''
, which made them to be marked as virtual during the migration. it's not a big deal, since they will still work when we switch to the new models (in the new Dataset
model the difference between virtual and physical is greatly reduced).
) | ||
tables = session.query(NewTable).filter(predicate).all() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We were only assigning tables that already exist. The load_or_create_tables
function will create any tables that are referenced in the SQL but don't exist yet.
07e8169
to
413b157
Compare
bc59a0c
to
e863bb9
Compare
e863bb9
to
7d0de3c
Compare
7d0de3c
to
8d2318c
Compare
* chore: improve perf in SIP-68 migration * Small fixes * Create tables referenced in SQL * Update logic in SqlaTable as well * Fix unit tests (cherry picked from commit 63b5e2e)
* chore: improve perf in SIP-68 migration * Small fixes * Create tables referenced in SQL * Update logic in SqlaTable as well * Fix unit tests (cherry picked from commit 63b5e2e)
* chore: improve perf in SIP-68 migration * Small fixes * Create tables referenced in SQL * Update logic in SqlaTable as well * Fix unit tests
SUMMARY
This PR improves the performance on the SIP-68 migration script by using
sqloxide
(https://pypi.org/project/sqloxide/) to parse the SQL when extracting dependencies. In case the parsing fails it falls back to usingsqlparse
.It also addresses a few bugs:
sql
was an empty string.BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
N/A
TESTING INSTRUCTIONS
Migration still works and relationships are populated correctly. We can see all the datasets and the associated tables with this query:
And the results:
ADDITIONAL INFORMATION