fix(python): Fix row_by_key typing #19888

Merged: 4 commits, Nov 26, 2024
Changes from 3 commits
py-polars/polars/dataframe/frame.py: 43 changes (42 additions, 1 deletion)
@@ -10349,14 +10349,55 @@ def rows(
         else:
             return self._df.row_tuples()
 
+    @overload
+    def rows_by_key(
+        self,
+        key: ColumnNameOrSelector | Sequence[ColumnNameOrSelector],
+        *,
+        named: Literal[False] = ...,
+        include_key: bool = ...,
+        unique: Literal[False] = ...,
+    ) -> dict[Any, Iterable[tuple[Any, ...]]]: ...
Contributor:

This is not correct: the values of the dictionary are not iterables of tuples; they should be a "list of Any":

import polars as pl
df = pl.DataFrame({
    "w": ["a", "b"],
    "x": ["q", "q"]
})
print(repr(df.rows_by_key(key=["w"])))
# defaultdict(<class 'list'>, {'a': ['q'], 'b': ['q']})

In fact, none of the return types here are correct. See my previous comment on this PR for the correct types.
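The reviewer's output can be reproduced with a minimal pure-Python sketch. It does not use polars: `data` is a plain dict standing in for the DataFrame, and `group_by_key` is a hypothetical helper. The rule that a single value column yields bare values while several yield tuples is inferred from the examples in this thread, not taken from the polars source.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any

# Plain dict standing in for the DataFrame above: column name -> values.
data: dict[str, list[Any]] = {"w": ["a", "b"], "x": ["q", "q"]}


def group_by_key(columns: dict[str, list[Any]], key: str) -> defaultdict[Any, list[Any]]:
    """Hypothetical helper: collect the non-key values of each row under its key."""
    value_cols = [name for name in columns if name != key]
    grouped: defaultdict[Any, list[Any]] = defaultdict(list)
    for i, k in enumerate(columns[key]):
        row = tuple(columns[name][i] for name in value_cols)
        # One value column -> bare scalar; several -> one tuple per row.
        grouped[k].append(row[0] if len(row) == 1 else row)
    return grouped


print(repr(group_by_key(data, "w")))
# defaultdict(<class 'list'>, {'a': ['q'], 'b': ['q']})
```

The values are lists because a non-unique key can map to several rows, which is why `defaultdict(list)` shows up in the repr.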

Contributor Author:

Actually, the type depends on the number of columns; see:

import polars as pl
df = pl.DataFrame({
    "w": ["a", "b"],
    "x": ["q", "q"],
    "z": ["v", "v"]
})
print(repr(df.rows_by_key(key=["w"], named=False, unique=True)))
# {'a': ('q', 'v'), 'b': ('q', 'v')}

But either way, you are right: this is not correct.
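A hypothetical pure-Python stand-in for `rows_by_key(key=["w"], unique=True)` makes the column-count dependence concrete. `unique_rows_by_key` and its dict-of-lists input are assumptions for illustration; the single-column result is what this sketch produces, not verified polars output.

```python
from __future__ import annotations

from typing import Any


def unique_rows_by_key(columns: dict[str, list[Any]], key: str) -> dict[Any, Any]:
    """Hypothetical helper: map each key to the values of its row."""
    value_cols = [name for name in columns if name != key]
    result: dict[Any, Any] = {}
    for i, k in enumerate(columns[key]):
        row = tuple(columns[name][i] for name in value_cols)
        # One entry per key; in this sketch a repeated key keeps its last row.
        result[k] = row[0] if len(row) == 1 else row
    return result


two_cols = {"w": ["a", "b"], "x": ["q", "q"], "z": ["v", "v"]}
one_col = {"w": ["a", "b"], "x": ["q", "q"]}
print(unique_rows_by_key(two_cols, "w"))  # {'a': ('q', 'v'), 'b': ('q', 'v')}
print(unique_rows_by_key(one_col, "w"))  # {'a': 'q', 'b': 'q'}
```

Because the same call returns tuples for one input and scalars for another, the value type cannot be pinned down from the `named`/`unique` flags alone.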

+    @overload
+    def rows_by_key(
+        self,
+        key: ColumnNameOrSelector | Sequence[ColumnNameOrSelector],
+        *,
+        named: Literal[False] = ...,
+        include_key: bool = ...,
+        unique: Literal[True],
+    ) -> dict[Any, tuple[Any, ...]]: ...
+    @overload
+    def rows_by_key(
+        self,
+        key: ColumnNameOrSelector | Sequence[ColumnNameOrSelector],
+        *,
+        named: Literal[True],
+        include_key: bool = ...,
+        unique: Literal[False] = ...,
+    ) -> dict[Any, Iterable[dict[str, Any]]]: ...
+    @overload
+    def rows_by_key(
+        self,
+        key: ColumnNameOrSelector | Sequence[ColumnNameOrSelector],
+        *,
+        named: Literal[True],
+        include_key: bool = ...,
+        unique: Literal[True],
+    ) -> dict[Any, dict[str, Any]]: ...
     def rows_by_key(
         self,
         key: ColumnNameOrSelector | Sequence[ColumnNameOrSelector],
         *,
         named: bool = False,
         include_key: bool = False,
         unique: bool = False,
-    ) -> dict[Any, Iterable[Any]]:
+    ) -> (
+        dict[Any, Iterable[tuple[Any, ...]]]
+        | dict[Any, tuple[Any, ...]]
+        | dict[Any, Iterable[dict[str, Any]]]
+        | dict[Any, dict[str, Any]]
+    ):
         """
         Returns all data as a dictionary of python-native values keyed by some column.
 
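The overloads in this diff rely on `typing.overload` with `Literal` flags, so a type checker picks a return type from the literal values of `named` and `unique` at each call site. Below is a minimal sketch of the same pattern on a hypothetical `group_rows` function that works on plain dicts rather than polars. It assumes the scalar-or-tuple value shape discussed in the review, and it shows why the per-row element type ends up as `Any`: the flags are visible to the type checker, but the number of value columns is only known at runtime.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any, Literal, overload


@overload
def group_rows(
    columns: dict[str, list[Any]], key: str, *, unique: Literal[False] = ...
) -> dict[Any, list[Any]]: ...
@overload
def group_rows(
    columns: dict[str, list[Any]], key: str, *, unique: Literal[True]
) -> dict[Any, Any]: ...
def group_rows(
    columns: dict[str, list[Any]], key: str, *, unique: bool = False
) -> dict[Any, Any]:
    """Hypothetical helper: group the non-key values of `columns` by `key`."""
    value_cols = [name for name in columns if name != key]
    grouped: dict[Any, Any] = {} if unique else defaultdict(list)
    for i, k in enumerate(columns[key]):
        row = tuple(columns[name][i] for name in value_cols)
        # The element is a scalar or a tuple depending on how many value
        # columns exist, so the static type can only say Any.
        value = row[0] if len(row) == 1 else row
        if unique:
            grouped[k] = value  # in this sketch, the last row wins for a repeated key
        else:
            grouped[k].append(value)
    return grouped


# A type checker resolves the first call to dict[Any, list[Any]] and the
# second to dict[Any, Any], based on the literal `unique` argument.
many = group_rows({"w": ["a", "a"], "x": [1, 2]}, "w")
one = group_rows({"w": ["a", "b"], "x": ["q", "q"], "z": ["v", "v"]}, "w", unique=True)
print(dict(many))  # {'a': [1, 2]}
print(one)  # {'a': ('q', 'v'), 'b': ('q', 'v')}
```

At runtime only the final, undecorated definition exists; the `@overload` stubs affect static checking alone.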