-
Notifications
You must be signed in to change notification settings - Fork 14.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(presto preview): re-enable schema previsualization for Trino/Presto table/schemas #26782
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @brouberol for the PR. The |
Thanks for the feedback @john-bodley! I have looked into the tests and realized that this I however found out that the parametrized For example, here's what I observe in a pdb shell when printing the (Pdb) pp columns
[
{
"column_name": "revision_id",
"name": "revision_id",
"type": "VARCHAR",
"is_dttm": None,
"type_generic": None,
"nullable": True,
"default": None,
},
{
"column_name": "damaging",
"name": "damaging",
"type": "BOOLEAN",
"is_dttm": None,
"type_generic": None,
"nullable": True,
"default": None,
},
{
"column_name": "goodfaith",
"name": "goodfaith",
"type": "BOOLEAN",
"is_dttm": None,
"type_generic": None,
"nullable": True,
"default": None,
},
{
"column_name": "reverted_for_damage",
"name": "reverted_for_damage",
"type": "BOOLEAN",
"is_dttm": None,
"type_generic": None,
"nullable": True,
"default": None,
},
{
"column_name": "wiki_db",
"name": "wiki_db",
"type": "VARCHAR",
"is_dttm": None,
"type_generic": None,
"nullable": True,
"default": None,
},
] I thus changed the parametrized @mock.patch("superset.db_engine_specs.presto.PrestoEngineSpec.latest_partition")
@pytest.mark.parametrize(
["column_type", "column_value", "expected_value"],
[
(types.DATE(), "2023-05-01", "DATE '2023-05-01'"),
(types.TIMESTAMP(), "2023-05-01", "TIMESTAMP '2023-05-01'"),
(types.VARCHAR(), "2023-05-01", "'2023-05-01'"),
(types.INT(), 1234, "1234"),
],
) to @mock.patch("superset.db_engine_specs.presto.PrestoEngineSpec.latest_partition")
@pytest.mark.parametrize(
["column_type", "column_value", "expected_value"],
[
("DATE", "2023-05-01", "DATE '2023-05-01'"),
("TIMESTAMP", "2023-05-01", "TIMESTAMP '2023-05-01'"),
("VARCHAR", "2023-05-01", "'2023-05-01'"),
("INT", 1234, "1234"),
],
) Without my patch, this triggered the exact same error message we were seeing in production:
With my patch applied, all |
@brouberol Thank you for the fix. Could you check if this PR will also close #26740 and #26768? If so, could you link them in the Development panel so that they are automatically closed when this PR is merged? Thanks. |
@michael-s-molina You're welcome! I believe #26740 should indeed be fixed by this patch. However, the |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #26782 +/- ##
==========================================
+ Coverage 67.19% 72.40% +5.20%
==========================================
Files 1899 1911 +12
Lines 74368 86320 +11952
Branches 8274 8274
==========================================
+ Hits 49969 62496 +12527
+ Misses 22344 21769 -575
Partials 2055 2055
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
I changed the patch slightly, as I didn't realize you could create an SQLAlchemy >>> from sqlalchemy import Column
>>> Column("name", None)
Column('name', NullType(), table=None) |
Thanks @brouberol : The issue that I faced for presto got fixed. |
@rabindragogoi just to clarify, do you mean that this patch fixes your issue, or that your issue was already fixed? Edit: nevermind, I just saw your comment. My pleasure :) On another note, I'm unsure if the MySQL integration test failure and the pre-commit changelog check failure are related to my change. Could anyone with more experience than myself could advise if there's anything I should do there? Thanks! |
@brouberol : I mean the fix that you have provided in presto.py file, I implemented that and now I am able to see the schema in sql lab. Thanks a lot for the fix . :) |
Looks like the precommit hook is failing on CI... you probably just need to run that locally to fix/commit whatever linting errors are happening there. |
…able/schemas This patch fixes failures occuring when performing a schema preview of a Presto table. The `PrestoBaseEngineSpec.where_latest_partition` attempts to construct SQLAlchemy `Column` objects based on a name and a type. However, this leads to the following error in our case: ```console sqlalchemy.exc.ArgumentError: 'SchemaItem' object, such as a 'Column' or a 'Constraint' expected, got 'VARCHAR' ``` This comes from the fact that we run `Column('column_name', 'VARCHAR')` instead of `Column('column_name', sqlalchemy.types.VARCHAR)`. We fix this particular error by passing the _actual_ type class, and not just a string. > [!NOTE] > This also fixes the same issue for Trino tables, as `TrinoEngineSpec` inherits > from `PrestoBaseEngineSpec`, the Presto db client class. Fixes apache#25962 Fixes apache#25962
Indeed, thanks @rusackas 👍 I've rebased on upstream ~/wmf/apache/superset fix-25636 ⇣1⇡152 ❯ pre-commit run -a
auto-walrus..............................................................Passed
pyupgrade................................................................Passed
pycln....................................................................Passed
isort....................................................................Passed
mypy.....................................................................Passed
pip-compile-multi verify.................................................Passed
check docstring is first.................................................Passed
check for added large files..............................................Passed
check yaml...............................................................Passed
debug statements (python)................................................Passed
fix end of files.........................................................Passed
trim trailing whitespace.................................................Passed
black....................................................................Passed
prettier.................................................................Passed
Blacklist................................................................Passed
Helm Docs................................................................Passed So did running the associated unit tests: ~/wmf/apache/superset fix-25636 ❯ pytest tests/unit_tests/db_engine_specs/test_{presto,trino}.py --disable-warnings
===================================================================================================== test session starts =======================================================================
platform darwin -- Python 3.9.18, pytest-7.3.1, pluggy-1.2.0
rootdir: /Users/br/wmf/apache/superset
configfile: pytest.ini
plugins: pyfakefs-5.2.2, mock-3.10.0, cov-4.0.0
collected 62 items
tests/unit_tests/db_engine_specs/test_presto.py .................
tests/unit_tests/db_engine_specs/test_trino.py .............................................
===================================================================================================== 62 passed in 0.54s ========================================================================
~/wmf/apache/superset fix-25636 ❮ Here's to hoping for a green CI 🍸 |
I see that the CI is all green. Would the PR be safe to merge? |
Seems CI is happy, so technically mergeable! I'll ping a few Trino/Presto SMEs (@john-bodley @villebro @nytai ) to hopefully give it a final review. |
Seems this is still just awaiting a stamp... I'm not an expert here, so I'm nagging people for review again :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me! Thanks for fixing this!
My pleasure! |
Somehow not all checks are running. I'm going to close/reopen this to kick-start CI. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
SUMMARY
This patch fixes failures occuring when performing a schema preview of a Presto table.
The
PrestoBaseEngineSpec.where_latest_partition
method attempts to construct SQLAlchemyColumn
objects based on a name and a type. However, this leads to the following error, in our case:sqlalchemy.exc.ArgumentError: 'SchemaItem' object, such as a 'Column' or a 'Constraint' expected, got 'VARCHAR'
This comes from the fact that we run
Column('column_name', 'VARCHAR')
instead ofColumn('column_name', sqlalchemy.types.VARCHAR)
. We fix this particular error by passing the actual type class (whetherVARCHAR
,TEXT
,TIMESTAMP
, etc), and not just a string representation of it.Note
This also fixes the same issue for Trino tables (as described in this comment, as
TrinoEngineSpec
inheritsfrom
PrestoBaseEngineSpec
, the Presto db client class.BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
superset-25636-working.mov
TESTING INSTRUCTIONS
ADDITIONAL INFORMATION
Fixes #25636
Fixes #25962
Fixes #26740