Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(presto preview): re-enable schema previsualization for Trino/Presto table/schemas #26782

Merged
merged 1 commit into from
May 13, 2024

Conversation

brouberol
Copy link
Contributor

@brouberol brouberol commented Jan 24, 2024

SUMMARY

This patch fixes failures occuring when performing a schema preview of a Presto table.

The PrestoBaseEngineSpec.where_latest_partition method attempts to construct SQLAlchemy Column objects based on a name and a type. However, this leads to the following error, in our case:

sqlalchemy.exc.ArgumentError: 'SchemaItem' object, such as a 'Column' or a 'Constraint' expected, got 'VARCHAR'

This comes from the fact that we run Column('column_name', 'VARCHAR') instead of Column('column_name', sqlalchemy.types.VARCHAR). We fix this particular error by passing the actual type class (whether VARCHAR, TEXT, TIMESTAMP, etc), and not just a string representation of it.

Note

This also fixes the same issue for Trino tables (as described in this comment, as TrinoEngineSpec inherits
from PrestoBaseEngineSpec, the Presto db client class.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Before After
before
superset-25636-working.mov

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

Fixes #25636
Fixes #25962
Fixes #26740

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Congrats on making your first PR and thank you for contributing to Superset! 🎉 ❤️

We hope to see you in our Slack community too! Not signed up? Use our Slack App to self-register.

@john-bodley
Copy link
Member

Thanks @brouberol for the PR. The PrestoBaseEngineSpec class impacts both Presto and Trino and thus it would be prudent to add unit/integration tests.

@brouberol
Copy link
Contributor Author

Thanks for the feedback @john-bodley! I have looked into the tests and realized that this PrestoBaseEngineSpec.where_latest_partition classmethod was already covered by the tests/unit_tests/db_engine_specs/test_presto.py:test_where_latest_partition unit test.

I however found out that the parametrized column_type argument was passed as an actual sqlalchemy.types instance, whereas I observed that the columns argument of where_latest_partition should a list of ResultSetColumnType objects which type value was of str type, in accordance with the ResultSetColumnType type.

For example, here's what I observe in a pdb shell when printing the columns value when reproducing the initial issue:

(Pdb) pp columns
[
    {
        "column_name": "revision_id",
        "name": "revision_id",
        "type": "VARCHAR",
        "is_dttm": None,
        "type_generic": None,
        "nullable": True,
        "default": None,
    },
    {
        "column_name": "damaging",
        "name": "damaging",
        "type": "BOOLEAN",
        "is_dttm": None,
        "type_generic": None,
        "nullable": True,
        "default": None,
    },
    {
        "column_name": "goodfaith",
        "name": "goodfaith",
        "type": "BOOLEAN",
        "is_dttm": None,
        "type_generic": None,
        "nullable": True,
        "default": None,
    },
    {
        "column_name": "reverted_for_damage",
        "name": "reverted_for_damage",
        "type": "BOOLEAN",
        "is_dttm": None,
        "type_generic": None,
        "nullable": True,
        "default": None,
    },
    {
        "column_name": "wiki_db",
        "name": "wiki_db",
        "type": "VARCHAR",
        "is_dttm": None,
        "type_generic": None,
        "nullable": True,
        "default": None,
    },
]

I thus changed the parametrized column_type value from

@mock.patch("superset.db_engine_specs.presto.PrestoEngineSpec.latest_partition")
@pytest.mark.parametrize(
    ["column_type", "column_value", "expected_value"],
    [
        (types.DATE(), "2023-05-01", "DATE '2023-05-01'"),
        (types.TIMESTAMP(), "2023-05-01", "TIMESTAMP '2023-05-01'"),
        (types.VARCHAR(), "2023-05-01", "'2023-05-01'"),
        (types.INT(), 1234, "1234"),
    ],
)

to

@mock.patch("superset.db_engine_specs.presto.PrestoEngineSpec.latest_partition")
@pytest.mark.parametrize(
    ["column_type", "column_value", "expected_value"],
    [
        ("DATE", "2023-05-01", "DATE '2023-05-01'"),
        ("TIMESTAMP", "2023-05-01", "TIMESTAMP '2023-05-01'"),
        ("VARCHAR", "2023-05-01", "'2023-05-01'"),
        ("INT", 1234, "1234"),
    ],
)

Without my patch, this triggered the exact same error message we were seeing in production:

sqlalchemy.exc.ArgumentError: 'SchemaItem' object, such as a 'Column' or a 'Constraint' expected, got 'DATE'

With my patch applied, all tests/unit_tests/db_engine_specs/test_presto.py::test_where_latest_partition tests passed.

@michael-s-molina
Copy link
Member

michael-s-molina commented Jan 26, 2024

@brouberol Thank you for the fix. Could you check if this PR will also close #26740 and #26768? If so, could you link them in the Development panel so that they are automatically closed when this PR is merged? Thanks.

Screenshot 2024-01-26 at 10 53 45

@brouberol
Copy link
Contributor Author

@michael-s-molina You're welcome! I believe #26740 should indeed be fixed by this patch. However, the Cursor.__init__() got an unexpected keyword argument 'http_scheme' message in #26768 makes me believe that it is unrelated.

Copy link

codecov bot commented Jan 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 72.40%. Comparing base (c77fc7d) to head (31f5340).
Report is 1590 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #26782      +/-   ##
==========================================
+ Coverage   67.19%   72.40%   +5.20%     
==========================================
  Files        1899     1911      +12     
  Lines       74368    86320   +11952     
  Branches     8274     8274              
==========================================
+ Hits        49969    62496   +12527     
+ Misses      22344    21769     -575     
  Partials     2055     2055              
Flag Coverage Δ
hive 53.24% <0.00%> (?)
mysql 78.95% <50.00%> (+0.95%) ⬆️
postgres 79.04% <50.00%> (+0.94%) ⬆️
presto 54.98% <0.00%> (?)
python 84.88% <100.00%> (+6.64%) ⬆️
sqlite 78.50% <50.00%> (+0.88%) ⬆️
unit 59.50% <100.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@brouberol
Copy link
Contributor Author

brouberol commented Jan 26, 2024

I changed the patch slightly, as I didn't realize you could create an SQLAlchemy Column of NullType, and skipping cases in which the "type" field of ResultSetColumnType objects was None caused some integrations tests to fail.

>>> from sqlalchemy import Column
>>> Column("name", None)
Column('name', NullType(), table=None)

@rabindragogoi
Copy link

Thanks @brouberol : The issue that I faced for presto got fixed.

@brouberol
Copy link
Contributor Author

brouberol commented Jan 30, 2024

@rabindragogoi just to clarify, do you mean that this patch fixes your issue, or that your issue was already fixed?

Edit: nevermind, I just saw your comment. My pleasure :)

On another note, I'm unsure if the MySQL integration test failure and the pre-commit changelog check failure are related to my change. Could anyone with more experience than myself could advise if there's anything I should do there? Thanks!

@rabindragogoi
Copy link

@brouberol : I mean the fix that you have provided in presto.py file, I implemented that and now I am able to see the schema in sql lab. Thanks a lot for the fix . :)

@rusackas
Copy link
Member

rusackas commented Feb 7, 2024

Looks like the precommit hook is failing on CI... you probably just need to run that locally to fix/commit whatever linting errors are happening there.

…able/schemas

This patch fixes failures occuring when performing a schema preview of a
Presto table.

The `PrestoBaseEngineSpec.where_latest_partition` attempts to construct
SQLAlchemy `Column` objects based on a name and a type. However, this leads
to the following error in our case:

```console
sqlalchemy.exc.ArgumentError: 'SchemaItem' object, such as a 'Column' or a 'Constraint' expected, got 'VARCHAR'
```

This comes from the fact that we run `Column('column_name', 'VARCHAR')` instead of
`Column('column_name', sqlalchemy.types.VARCHAR)`. We fix this particular error by
passing the _actual_ type class, and not just a string.

> [!NOTE]
> This also fixes the same issue for Trino tables, as `TrinoEngineSpec` inherits
> from `PrestoBaseEngineSpec`, the Presto db client class.

Fixes apache#25962
Fixes apache#25962
@brouberol
Copy link
Contributor Author

brouberol commented Feb 7, 2024

Indeed, thanks @rusackas 👍

I've rebased on upstream master and ran pre-commit run -a locally, which passed:

~/wmf/apache/superset fix-25636 ⇣1⇡152 ❯ pre-commit run -a
auto-walrus..............................................................Passed
pyupgrade................................................................Passed
pycln....................................................................Passed
isort....................................................................Passed
mypy.....................................................................Passed
pip-compile-multi verify.................................................Passed
check docstring is first.................................................Passed
check for added large files..............................................Passed
check yaml...............................................................Passed
debug statements (python)................................................Passed
fix end of files.........................................................Passed
trim trailing whitespace.................................................Passed
black....................................................................Passed
prettier.................................................................Passed
Blacklist................................................................Passed
Helm Docs................................................................Passed

So did running the associated unit tests:

~/wmf/apache/superset fix-25636 ❯ pytest tests/unit_tests/db_engine_specs/test_{presto,trino}.py --disable-warnings
===================================================================================================== test session starts =======================================================================
platform darwin -- Python 3.9.18, pytest-7.3.1, pluggy-1.2.0
rootdir: /Users/br/wmf/apache/superset
configfile: pytest.ini
plugins: pyfakefs-5.2.2, mock-3.10.0, cov-4.0.0
collected 62 items

tests/unit_tests/db_engine_specs/test_presto.py .................
tests/unit_tests/db_engine_specs/test_trino.py .............................................

===================================================================================================== 62 passed in 0.54s ========================================================================
~/wmf/apache/superset fix-25636 ❮

Here's to hoping for a green CI 🍸

@brouberol
Copy link
Contributor Author

I see that the CI is all green. Would the PR be safe to merge?

@rusackas
Copy link
Member

rusackas commented Feb 29, 2024

Seems CI is happy, so technically mergeable! I'll ping a few Trino/Presto SMEs (@john-bodley @villebro @nytai ) to hopefully give it a final review.

@rusackas
Copy link
Member

Seems this is still just awaiting a stamp... I'm not an expert here, so I'm nagging people for review again :)

@rusackas rusackas requested a review from michael-s-molina May 13, 2024 18:12
Copy link
Member

@betodealmeida betodealmeida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me! Thanks for fixing this!

@brouberol
Copy link
Contributor Author

My pleasure!

@rusackas
Copy link
Member

Somehow not all checks are running. I'm going to close/reopen this to kick-start CI.

@rusackas rusackas closed this May 13, 2024
@rusackas rusackas reopened this May 13, 2024
Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@villebro villebro merged commit afdf038 into apache:master May 13, 2024
71 checks passed
aehanno pushed a commit to kosmos-education/superset that referenced this pull request May 14, 2024
@michael-s-molina michael-s-molina added the v4.0 Label added by the release manager to track PRs to be included in the 4.0 branch label May 15, 2024
michael-s-molina pushed a commit that referenced this pull request May 15, 2024
jzhao62 pushed a commit to jzhao62/superset that referenced this pull request May 16, 2024
EnxDev pushed a commit to EnxDev/superset that referenced this pull request May 31, 2024
@mistercrunch mistercrunch added 🍒 4.0.2 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels labels Jul 24, 2024
vinothkumar66 pushed a commit to vinothkumar66/superset that referenced this pull request Nov 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/S v4.0 Label added by the release manager to track PRs to be included in the 4.0 branch 🍒 4.0.2 🚢 4.1.0
Projects
None yet
8 participants