Panic Exception with map_elements #17499

Closed
2 tasks done
GPapagiannopoulos opened this issue Jul 8, 2024 · 0 comments · Fixed by #20417
Labels: accepted (Ready for implementation), bug (Something isn't working), needs repro (Bug does not yet have a reproducible example), python (Related to Python Polars)

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

Log output

thread 'polars-0' panicked at crates/polars-plan/src/plans/conversion/scans.rs:204:56:
could not mmap file: IO { error: Os { code: 19, kind: Uncategorized, message: "No such device" }, msg: None }
--- PyO3 is resuming a panic after fetching a PanicException from Python. ---
Python stack trace below:
--- PyO3 is resuming a panic after fetching a PanicException from Python. ---
Python stack trace below:
PanicException: could not mmap file: IO { error: Os { code: 19, kind: Uncategorized, message: "No such device" }, msg: None }
---------------------------------------------------------------------------
PanicException                            Traceback (most recent call last)
File <command-1692981537052478>, line 1
----> 1 allDerailmentsSegments.collect()

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-119d08cf-b9fa-4f34-b578-0aa914bedafa/lib/python3.11/site-packages/polars/lazyframe/frame.py:1942, in LazyFrame.collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, slice_pushdown, comm_subplan_elim, comm_subexpr_elim, cluster_with_columns, no_optimization, streaming, background, _eager, **_kwargs)
   1939 # Only for testing purposes atm.
   1940 callback = _kwargs.get("post_opt_callback")
-> 1942 return wrap_df(ldf.collect(callback))

PanicException: could not mmap file: IO { error: Os { code: 19, kind: Uncategorized, message: "No such device" }, msg: None }
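A note for readers decoding the log above: the panic message reports OS error code 19. On Linux this is `ENODEV`, which `mmap` returns when the underlying filesystem of the file does not support memory mapping. The sketch below uses only the Python standard library to look up the code; it says nothing about which file Polars was trying to map, since the log does not name it.

```python
import errno
import os

# OS error code copied from the panic message in the log above.
code = 19

# Map the numeric code to its symbolic name on this platform.
name = errno.errorcode.get(code, "unknown")

# os.strerror gives the platform's human-readable description.
print(code, name, os.strerror(code))
```

If the name printed is `ENODEV`, the failure is reported by the operating system for the storage location being read, rather than by the UDF passed to `map_elements`.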

Issue description

I am currently working with a large dataset as part of a broader project modelling derailments. Because the dataset is too big to fit in memory, I am working almost exclusively with the Lazy API. This is one of the few exceptions where I have had to use map_elements to apply a UDF.
For clarity, this is the statement that raises the error when collected:
allDerailmentsDistance.with_columns(
    pl.struct(["Section_ID", "direction", "Cone_length(m)"]).map_elements(
        lambda x: special_segments_covered(section_ID=x["Section_ID"],
                                           direction=x["direction"],
                                           cone_length=x["Cone_length(m)"]),
        return_dtype=pl.Int8,
    ).alias("Cone_length(sections)")
)

The confusing thing is that I only get the error when working with the full dataset, not with a subset. Even more confusing, this was working as intended on Friday (5 July 2024), and although I have not changed my code or the environment I am working in, it now returns an error. It doesn't appear to be a RAM issue, and enabling streaming during collection doesn't solve it. Working with a DataFrame doesn't solve it either.
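Since the panic message points at a failed mmap rather than at the UDF, one check that may help narrow this down is whether the storage location holding the data supports memory mapping at all. The following is a minimal sketch using only the Python standard library; `can_mmap` is a hypothetical helper, not part of Polars, and the temporary file here is only a stand-in for the real dataset path, which the report does not give.

```python
import mmap
import os
import tempfile


def can_mmap(path: str) -> bool:
    """Return True if a read-only memory map of `path` can be created."""
    try:
        with open(path, "rb") as f:
            # Length 0 maps the whole file; an empty file raises ValueError.
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ):
                return True
    except (OSError, ValueError):
        # OSError covers errors such as ENODEV ("No such device").
        return False


# Stand-in probe file; replace probe_path with a file on the real storage.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"probe")
    probe_path = tmp.name

result = can_mmap(probe_path)
os.remove(probe_path)
print(result)
```

If the helper returns False for a file on the dataset's mount but True for a local temporary file, that would suggest the storage layer, and not the query itself, is what changed.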

Expected behavior

[image: debug]

This is the expected output, which I only get when working with a subset of the data.

Installed versions

--------Version info---------
Polars:               1.1.0
Index type:           UInt32
Platform:             Linux-5.15.0-1061-azure-x86_64-with-glibc2.35
Python:               3.11.0rc1 (main, Aug 12 2022, 10:02:14) [GCC 11.2.0]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          2.2.1
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               2023.5.0
gevent:               <not installed>
great_tables:         <not installed>
hvplot:               <not installed>
matplotlib:           3.7.2
nest_asyncio:         1.5.6
numpy:                1.23.5
openpyxl:             <not installed>
pandas:               1.5.3
pyarrow:              14.0.1
pydantic:             1.10.6
pyiceberg:            <not installed>
sqlalchemy:           1.4.39
torch:                2.2.2+cpu
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>

@GPapagiannopoulos added the bug, needs triage, and python labels on Jul 8, 2024
@coastalwhite added the needs repro label and removed the needs triage label on Jul 9, 2024
@c-peters added the accepted label on Dec 29, 2024
@github-project-automation (bot) moved this to Ready in Backlog on Dec 29, 2024
@c-peters moved this from Ready to Done in Backlog on Dec 29, 2024