Lazy select(len()) not collecting on groupby aggregated LazyFrame with added row_index #20337

Closed
2 tasks done
jpfeuffer opened this issue Dec 17, 2024 · 4 comments · Fixed by #20223
Labels
bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars)

Comments

@jpfeuffer

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

strs = [str(i) for i in range(3)]
q = pl.LazyFrame({"a": strs, "b": strs, "c": strs, "d": range(3)})


q = q.group_by(pl.col("c")).agg(
    (pl.col("d") * j).alias(f"mult {j}")
    for j in [1, 2]
)

q = q.with_row_index("foo")
print(q.collect())
print(q.select(pl.len()).collect())

Log output

shape: (3, 4)
┌─────┬─────┬───────────┬───────────┐
│ foo ┆ c   ┆ mult 1    ┆ mult 2    │
│ --- ┆ --- ┆ ---       ┆ ---       │
│ u32 ┆ str ┆ list[i64] ┆ list[i64] │
╞═════╪═════╪═══════════╪═══════════╡
│ 0   ┆ 0   ┆ [0]       ┆ [0]       │
│ 1   ┆ 2   ┆ [2]       ┆ [4]       │
│ 2   ┆ 1   ┆ [1]       ┆ [2]       │
└─────┴─────┴───────────┴───────────┘
thread '<unnamed>' panicked at crates/polars-plan/src/plans/optimizer/projection_pushdown/functions/mod.rs:136:30:
called `Result::unwrap()` on an `Err` value: SchemaFieldNotFound(ErrString("d"))

---------------------------------------------------------------------------
PanicException                            Traceback (most recent call last)
Cell In[28], line 14
     12 q = q.with_row_index("foo")
     13 print(q.collect())
---> 14 print(q.select(pl.len()).collect())

File ~/Library/Caches/pypoetry/virtualenvs/pdp-Na9upAWA-py3.12/lib/python3.12/site-packages/polars/lazyframe/frame.py:2031, in LazyFrame.collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, slice_pushdown, comm_subplan_elim, comm_subexpr_elim, cluster_with_columns, collapse_joins, no_optimization, streaming, engine, background, _eager, **_kwargs)
   2029 # Only for testing purposes
   2030 callback = _kwargs.get("post_opt_callback", callback)
-> 2031 return wrap_df(ldf.collect(callback))

PanicException: called `Result::unwrap()` on an `Err` value: SchemaFieldNotFound(ErrString("d"))

Issue description

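Note that select(pl.len()) collects without error when the with_row_index("foo") step is left out. The panic only occurs when the row index is added after the group_by aggregation.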

Expected behavior

The second print should output a 1x1 frame whose len value is 3.

Installed versions

--------Version info---------
Polars:              1.17.1
Index type:          UInt32
Platform:            macOS-14.7.1-arm64-arm-64bit
Python:              3.12.4 (main, Jun  6 2024, 18:26:44) [Clang 15.0.0 (clang-1500.3.9.4)]
LTS CPU:             False

----Optional dependencies----
adbc_driver_manager  <not installed>
altair               <not installed>
boto3                1.34.69
cloudpickle          <not installed>
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               2024.10.0
gevent               <not installed>
google.auth          <not installed>
great_tables         <not installed>
matplotlib           <not installed>
nest_asyncio         1.6.0
numpy                2.2.0
openpyxl             <not installed>
pandas               2.2.3
pyarrow              18.0.0
pydantic             2.10.3
pyiceberg            <not installed>
sqlalchemy           2.0.36
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           <not installed>
jpfeuffer added the bug, needs triage, and python labels on Dec 17, 2024
@cmdlineluser
Contributor

I'm not sure if there was a previous issue, but this does appear to be fixed on main:

shape: (1, 1)
┌─────┐
│ len │
│ --- │
│ u32 │
╞═════╡
│ 3   │
└─────┘

@jpfeuffer
Author

Ah, that sounds promising. Can you replicate it on 1.17.1, though? Or am I hitting an edge case?

@cmdlineluser
Contributor

Sorry, yes. I should have included that information as well. It does reproduce on 1.17.1:

uv run --isolated --with 'polars==1.17.1' 20337.py
# shape: (3, 4)
# ┌─────┬─────┬───────────┬───────────┐
# │ foo ┆ c   ┆ mult 1    ┆ mult 2    │
# │ --- ┆ --- ┆ ---       ┆ ---       │
# │ u32 ┆ str ┆ list[i64] ┆ list[i64] │
# ╞═════╪═════╪═══════════╪═══════════╡
# │ 0   ┆ 2   ┆ [2]       ┆ [4]       │
# │ 1   ┆ 1   ┆ [1]       ┆ [2]       │
# │ 2   ┆ 0   ┆ [0]       ┆ [0]       │
# └─────┴─────┴───────────┴───────────┘
# thread '<unnamed>' panicked at crates/polars-plan/src/plans/optimizer/projection_pushdown/functions/mod.rs:136:30:
# called `Result::unwrap()` on an `Err` value: SchemaFieldNotFound(ErrString("d"))
# note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
# Traceback (most recent call last):
#   File "/Users/goldfish/20337.py", line 14, in <module>
#     print(q.select(pl.len()).collect())
#   File "/Users/goldfish/.cache/uv/archive-v0/3Ol4nDcRYvJjrhzb9L7rt/lib/python3.10/site-packages/polars/lazyframe/frame.py", line 2031, in collect
#     return wrap_df(ldf.collect(callback))
# pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: SchemaFieldNotFound(ErrString("d"))

Browsing the recent commits, I found one that specifically mentions select(len()).

@nameexhaustion
Collaborator

Fixed by #20223
