
feat: Add SQL support for the NORMALIZE string function #20705

Merged (2 commits), Jan 15, 2025

@alexander-beedie (Collaborator) commented Jan 14, 2025

Closes #20692.

Adds SQL support (and docs) for the NORMALIZE[1] string function, building on the recently added normalize expression (see #20483).

Available forms[2]:

  • NFC: Canonical Decomposition, followed by Canonical Composition.
  • NFD: Canonical Decomposition.
  • NFKC: Compatibility Decomposition, followed by Canonical Composition.
  • NFKD: Compatibility Decomposition.

```sql
NORMALIZE(strcol, NFC)
NORMALIZE(strcol, NFD)
NORMALIZE(strcol, NFKC)
NORMALIZE(strcol, NFKD)
```
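To see what the four forms do independently of Polars, the sketch below uses Python's standard-library `unicodedata.normalize`, which implements the same Unicode normalization forms. It is only a reference illustration and assumes that Polars' `NORMALIZE` follows the standard Unicode definitions. It contrasts a precomposed "é", a decomposed "e" plus combining accent, and the "ﬁ" ligature, which has only a compatibility decomposition.

```python
import unicodedata

# "é" as one precomposed code point (U+00E9)
COMPOSED = "\u00e9"
# "é" as "e" followed by COMBINING ACUTE ACCENT (U+0301)
DECOMPOSED = "e\u0301"
# LATIN SMALL LIGATURE FI (U+FB01); only a compatibility decomposition exists
LIGATURE = "\ufb01"

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalized_lengths() -> dict[str, tuple[int, int, int]]:
    """Return the code-point length of each sample string under each form."""
    return {
        form: tuple(
            len(unicodedata.normalize(form, s))
            for s in (COMPOSED, DECOMPOSED, LIGATURE)
        )
        for form in FORMS
    }


if __name__ == "__main__":
    for form, lengths in normalized_lengths().items():
        print(form, lengths)
```

This prints `NFC (1, 1, 1)`, `NFD (2, 2, 1)`, `NFKC (1, 1, 2)` and `NFKD (2, 2, 2)`: the canonical forms only change how "é" is encoded, while the compatibility forms additionally expand the ligature into plain "fi".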

If the normalization form is not provided, NFC is used by default.
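A small reference model of the SQL signature makes the defaulting rule concrete. The function below is hypothetical (`normalize_ref` is not part of Polars); it uses Python's `unicodedata` and simply mirrors the behaviour described above, with an optional form argument that falls back to NFC. Rejecting unknown form names is an assumption of this sketch, not something the PR states.

```python
import unicodedata

# The four forms the SQL function accepts, per the PR description
VALID_FORMS = {"NFC", "NFD", "NFKC", "NFKD"}


def normalize_ref(value: str, form: str = "NFC") -> str:
    """Hypothetical reference model of NORMALIZE(value [, form]).

    When no form is given, NFC is used, matching the documented default.
    Rejecting unknown forms is an assumption of this sketch.
    """
    if form not in VALID_FORMS:
        raise ValueError(f"unsupported normalization form: {form!r}")
    return unicodedata.normalize(form, value)


if __name__ == "__main__":
    decomposed = "e\u0301"  # "e" + COMBINING ACUTE ACCENT
    # Omitting the form behaves exactly like passing NFC explicitly
    assert normalize_ref(decomposed) == normalize_ref(decomposed, "NFC") == "\u00e9"
```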

Example

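```python
import polars as pl

# Visually similar variants of "Test" built from different Unicode code points
# (plain ASCII, circled letters, and mathematical alphanumeric symbols)
pl.DataFrame({
    "txt": [
        "Test",
        "Ⓣⓔⓢⓣ",
        "𝕿𝖊𝖘𝖙",
        "𝕋𝕖𝕤𝕥",
        "𝗧𝗲𝘀𝘁",
    ],
}).sql("SELECT NORMALIZE(txt, NFKC) FROM self").to_series()

# shape: (5,)
# Series: 'txt' [str]
# [
#   "Test"
#   "Test"
#   "Test"
#   "Test"
#   "Test"
# ]
```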
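The choice of NFKC in this example matters: the styled letters are compatibility variants of plain ASCII, so a canonical form alone leaves them untouched. The check below runs Python's standard-library `unicodedata` (not Polars) over the same five strings, so it shows the expected Unicode behaviour under the assumption that Polars follows the standard. It suggests that `NORMALIZE(txt)` with the default NFC would return these strings unchanged.

```python
import unicodedata

# The same five strings as in the Polars example
SAMPLES = ["Test", "Ⓣⓔⓢⓣ", "𝕿𝖊𝖘𝖙", "𝕋𝕖𝕤𝕥", "𝗧𝗲𝘀𝘁"]


def normalize_all(form: str) -> list[str]:
    """Apply one Unicode normalization form to every sample string."""
    return [unicodedata.normalize(form, s) for s in SAMPLES]


if __name__ == "__main__":
    # Compatibility composition folds every variant to plain "Test"
    print(normalize_all("NFKC"))
    # Canonical composition leaves the styled variants as they are
    print(normalize_all("NFC"))
```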

Footnotes

  1. PostgreSQL string functions

  2. Unicode normalization forms

@github-actions bot added the enhancement, python, and rust labels Jan 14, 2025
@alexander-beedie added the A-sql label Jan 14, 2025

codecov bot commented Jan 14, 2025

Codecov Report

Attention: Patch coverage is 75.00000% with 4 lines in your changes missing coverage. Please review.

Project coverage is 79.84%. Comparing base (a0d96f2) to head (a8ac92d).
Report is 18 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/polars-sql/src/functions.rs | 75.00% | 4 Missing ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main   #20705      +/-   ##
==========================================
+ Coverage   79.03%   79.84%   +0.81%
==========================================
  Files        1559     1560       +1
  Lines      221238   221433     +195
  Branches     2529     2530       +1
==========================================
+ Hits       174851   176813    +1962
+ Misses      45806    44038    -1768
- Partials      581      582       +1
```
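The headline percentages can be reproduced from the Hits, Misses and Partials rows. The sketch below assumes Codecov's usual definition, coverage = hits / (hits + misses + partials), with the percentage rounded down to two decimals; the rounding mode is an assumption, chosen because it matches both reported values. `CoverageTotals` is a hypothetical helper, not part of any Codecov API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoverageTotals:
    hits: int
    misses: int
    partials: int

    @property
    def lines(self) -> int:
        # Every tracked line is exactly one of hit, missed, or partial
        return self.hits + self.misses + self.partials

    def percent(self) -> str:
        """Coverage as hits / lines, rounded down to two decimals (assumed)."""
        hundredths = self.hits * 10_000 // self.lines  # integer math, no float error
        return f"{hundredths // 100}.{hundredths % 100:02d}%"


# Figures copied from the Coverage Diff
BASE = CoverageTotals(hits=174_851, misses=45_806, partials=581)
HEAD = CoverageTotals(hits=176_813, misses=44_038, partials=582)

if __name__ == "__main__":
    print(BASE.lines, BASE.percent())  # 221238 79.03%
    print(HEAD.lines, HEAD.percent())  # 221433 79.84%
```

With conventional round-half-up, the head figure (about 79.8499%) would print as 79.85%, which is why the sketch floors instead.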


@alexander-beedie alexander-beedie marked this pull request as draft January 14, 2025 15:33
@alexander-beedie alexander-beedie marked this pull request as ready for review January 14, 2025 16:04
@ritchie46 merged commit 73cb2a2 into pola-rs:main Jan 15, 2025 (31 checks passed)
@alexander-beedie deleted the sql-string-normalize branch January 15, 2025 20:22
Linked issue: Add NORMALIZE to SQL String Functions while adding expanded Unicode Normalization support