str.starts_with producing incorrect results on 1.15.0 #20003

Closed
2 tasks done
rhshadrach-8451 opened this issue Nov 26, 2024 · 1 comment · Fixed by #20006
Labels: bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), P-high (Priority: high), python (Related to Python Polars)


rhshadrach-8451 commented Nov 26, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

df = pl.DataFrame({"a": ["P 17:00", "P 17:00-20:00", "P 20:00"]})
print(df.select(pl.col("a").str.starts_with("P")))

Log output

shape: (3, 1)
┌───────┐
│ a     │
│ ---   │
│ bool  │
╞═══════╡
│ true  │
│ false │
│ true  │
└───────┘

Issue description

In Polars 1.15.0, str.starts_with produces incorrect results: in the example above, the second row ("P 17:00-20:00") starts with "P" but is reported as false. On 1.14.0, the results are correct.

Looking at the release notes, this might be related to #19583 (just a guess). cc @ritchie46

Expected behavior

shape: (3, 1)
┌───────┐
│ a     │
│ ---   │
│ bool  │
╞═══════╡
│ true  │
│ true  │
│ true  │
└───────┘

Installed versions

--------Version info---------
Polars:              1.15.0
Index type:          UInt32
Platform:            macOS-14.7-arm64-arm-64bit
Python:              3.10.15 (main, Sep  7 2024, 00:20:06) [Clang 15.0.0 (clang-1500.3.9.4)]
LTS CPU:             False

----Optional dependencies----
adbc_driver_manager  <not installed>
altair               <not installed>
boto3                <not installed>
cloudpickle          <not installed>
connectorx           <not installed>
deltalake            <not installed>
fastexcel            0.12.0
fsspec               2024.9.0
gevent               <not installed>
google.auth          2.35.0
great_tables         <not installed>
matplotlib           3.9.2
nest_asyncio         1.6.0
numpy                1.26.4
openpyxl             3.1.5
pandas               2.2.3
pyarrow              18.0.0
pydantic             2.9.2
pyiceberg            <not installed>
sqlalchemy           <not installed>
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           3.2.0
rhshadrach-8451 added the bug, needs triage, and python labels Nov 26, 2024

alexander-beedie (Collaborator) commented Nov 26, 2024

Confirmed; one of our unit tests at work just started failing on 1.15:

import polars as pl

pl.DataFrame({
    "s": ["_結局_きた_よ_", "你好%世界", "_x_"],
}).with_columns(
    s_starts_with_underscore=pl.col("s").str.starts_with("_"),
)
# ┌────────────────┬──────────────────────────┐
# │ s              ┆ s_starts_with_underscore │
# │ ---            ┆ ---                      │
# │ str            ┆ bool                     │
# ╞════════════════╪══════════════════════════╡
# │ _結局_きた_よ_ ┆ false                    │  # << incorrect
# │ 你好%世界      ┆ false                    │
# │ _x_            ┆ true                     │
# └────────────────┴──────────────────────────┘
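
For anyone who wants to check their own data against this regression, here is a minimal sketch of a reference result built on Python's built-in str.startswith. The helper name expected_starts_with is hypothetical (it is not part of Polars), and it assumes null values should stay null rather than become false.

```python
def expected_starts_with(values, prefix):
    """Reference result via Python's built-in str.startswith.

    Hypothetical helper for comparison only; None entries are kept as None.
    """
    return [None if v is None else v.startswith(prefix) for v in values]


# The two inputs reported in this thread.
times = ["P 17:00", "P 17:00-20:00", "P 20:00"]
mixed = ["_結局_きた_よ_", "你好%世界", "_x_"]

print(expected_starts_with(times, "P"))  # [True, True, True]
print(expected_starts_with(mixed, "_"))  # [True, False, True]
```

Comparing these lists with pl.Series(values).str.starts_with(prefix).to_list() on a given Polars version should show whether that version is affected.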

4 participants