Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(python): Fix row_by_key typing #19888

Merged
merged 4 commits into from
Nov 26, 2024

Conversation

gab23r
Copy link
Contributor

@gab23r gab23r commented Nov 20, 2024

fixes: #19887

@gab23r gab23r changed the title python(fix): Fix row_by_key typing fix(python): Fix row_by_key typing Nov 20, 2024
@github-actions github-actions bot added fix Bug fix python Related to Python Polars and removed title needs formatting labels Nov 20, 2024
@gab23r
Copy link
Contributor Author

gab23r commented Nov 20, 2024

I cant find why mypy complains...
@alexander-beedie could you help me out ?

Copy link

codecov bot commented Nov 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.50%. Comparing base (5f61d70) to head (39f86f1).
Report is 38 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main   #19888       +/-   ##
===========================================
+ Coverage   59.38%   79.50%   +20.11%     
===========================================
  Files        1554     1556        +2     
  Lines      215612   216407      +795     
  Branches     2452     2456        +4     
===========================================
+ Hits       128035   172046    +44011     
+ Misses      87019    43803    -43216     
  Partials      558      558               

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@hamdanal
Copy link
Contributor

Thanks for fixing this, I am having the same problem in my code.

This function should have 4 overloads as it branches on both named and unique arguments:

  • If unique=False (default), we get a list of objects (type unknown statically), if it is True, we get one object (or a tuple, depending on the number of columns)
  • If named=False (default), we get an unknown object, if it is True, unknown objects get wrapped in a dict with string keys

Your current implementation only considers one combination out of four. The full signature should be this:

    @overload
    def rows_by_key(
        self,
        key: ColumnNameOrSelector | Sequence[ColumnNameOrSelector],
        *,
        named: Literal[False] = False,
        include_key: bool = False,
        unique: Literal[False] = False,
    ) -> dict[Any, list[Any]]: ...
    @overload
    def rows_by_key(
        self,
        key: ColumnNameOrSelector | Sequence[ColumnNameOrSelector],
        *,
        named: Literal[False] = False,
        include_key: bool = False,
        unique: Literal[True],
    ) -> dict[Any, Any]: ...
    @overload
    def rows_by_key(
        self,
        key: ColumnNameOrSelector | Sequence[ColumnNameOrSelector],
        *,
        named: Literal[True],
        include_key: bool = False,
        unique: Literal[False] = False,
    ) -> dict[Any, list[dict[str, Any]]]: ...
    @overload
    def rows_by_key(
        self,
        key: ColumnNameOrSelector | Sequence[ColumnNameOrSelector],
        *,
        named: Literal[True],
        include_key: bool = False,
        unique: Literal[True],
    ) -> dict[Any, dict[str, Any]]: ...
    def rows_by_key(
        self,
        key: ColumnNameOrSelector | Sequence[ColumnNameOrSelector],
        *,
        named: bool = False,
        include_key: bool = False,
        unique: bool = False,
    ) -> dict[Any, Any]:

@gab23r
Copy link
Contributor Author

gab23r commented Nov 25, 2024

Thanks hamdanal, I modified the signature based on your implementation and fixed the mypy error.

named: Literal[False] = ...,
include_key: bool = ...,
unique: Literal[False] = ...,
) -> dict[Any, Iterable[tuple[Any, ...]]]: ...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not correct, the values of the dictionary are not iterables of tuple, they should be a "list of Any":

import polars as pl
df = pl.DataFrame({
       "w": ["a", "b"],
       "x": ["q", "q"]
})
print(repr(df.rows_by_key(key=["w"])))
# defaultdict(<class 'list'>, {'a': ['q'], 'b': ['q']})

Actually all return types here are not correct. See my previous comment on this PR for the correct types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arff, the type depends on the number of columns, see:

import polars as pl
df = pl.DataFrame({
       "w": ["a", "b"],
       "x": ["q", "q"],
       "z": ["v", "v"]
})
print(repr(df.rows_by_key(key=["w"], named=False, unique=True)))
# {'a': ('q', 'v'), 'b': ('q', 'v')}

But in any case, you are right, this is not correct.

@alexander-beedie
Copy link
Collaborator

I cant find why mypy complains... @alexander-beedie could you help me out ?

Looks like I have been beaten to the punch - thanks @hamdanal ;)

@alexander-beedie alexander-beedie merged commit 0b1d520 into pola-rs:main Nov 26, 2024
13 checks passed
@gab23r gab23r deleted the rows_by_key_typing branch November 29, 2024 12:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix Bug fix python Related to Python Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Wrong row_by_key typing
3 participants