perf(python): Much faster Series construction from subclasses of standard Python types #20166

Merged

@alexander-beedie (Collaborator) commented Dec 5, 2024

Took another look at Series construction following #20157, and found some additional large speedups (though in a more niche context).

Data composed of values whose types inherit from the standard builtin types (`int`, `float`, `str`, `bytes`) currently goes through the generic `new_object` constructor. However, because each such value is still an instance of its base type, the specialised fast-path constructors remain valid in this case. This PR makes a minor update to `py_type_to_constructor` so that it detects the inheritance chain and assigns the faster constructors.
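The dispatch idea can be illustrated with a minimal, stdlib-only sketch. It assumes a lookup table from base type to constructor and, on a miss, walks the type's method resolution order (`__mro__`) to find the nearest standard base. The names `_FAST_PATH`, `py_type_to_constructor_sketch` and the constructor strings are hypothetical labels, not the Polars API, and the actual change may detect the inheritance chain differently.

```python
from typing import Dict

# Hypothetical constructor names standing in for Polars' fast-path
# constructors; they are labels only, not the real Polars API.
_FAST_PATH: Dict[type, str] = {
    bool: "new_bool",  # listed explicitly, since bool is itself an int subclass
    int: "new_i64",
    float: "new_f64",
    str: "new_str",
    bytes: "new_binary",
}


def py_type_to_constructor_sketch(py_type: type) -> str:
    """Pick a (hypothetical) constructor name for values of ``py_type``."""
    # 1. Exact match: the common case, a single dict lookup.
    if py_type in _FAST_PATH:
        return _FAST_PATH[py_type]
    # 2. Subclass: walk the method resolution order and use the nearest
    #    standard base type that has a fast-path constructor.
    for base in py_type.__mro__[1:]:
        if base in _FAST_PATH:
            return _FAST_PATH[base]
    # 3. Anything else falls back to the generic object constructor.
    return "new_object"


class BinaryReprInt(int):
    """Stripped-down stand-in for the int subclass in the example below."""


class Name(str):
    """Hypothetical str subclass."""


print(py_type_to_constructor_sketch(BinaryReprInt))  # new_i64
print(py_type_to_constructor_sketch(Name))           # new_str
print(py_type_to_constructor_sketch(object))         # new_object
```

Checking the exact type first keeps the common case a single lookup. The MRO walk only runs for subclasses and unknown types.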

Example

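```python
import time

import polars as pl


class BinaryReprInt(int):
    """An int subclass that displays its value in binary."""

    def __new__(cls, value):
        return super().__new__(cls, value)

    def __repr__(self) -> str:
        v = f"{abs(int(self)):>08b}"
        return f"-{v}" if self < 0 else v

    __str__ = __repr__


bn = BinaryReprInt(12345)
print(bn)
# 11000000111001

print(bn == 12345)  # still compares equal to a plain int
# True

int_data = list(range(10_000_000))
custom_int_data = [BinaryReprInt(i) for i in int_data]

# time only the Series construction (replaces the undefined `Timer` helper)
start = time.perf_counter()
s = pl.Series(custom_int_data)
print(f"{time.perf_counter() - start:.4f} secs")
```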

Timings

Before: 0.2938 secs
 After: 0.0805 secs  (~3½ times faster)  🚀 

Type comparison

The speedup varies by base type, but it is significant in every case. The table below shows timings for loading 10,000,000 elements whose type inherits from the given base type:

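| base type | before (secs) | after (secs) | speedup |
|-----------|---------------|--------------|---------|
| bytes     | 0.9838        | 0.1071       | 9.2x    |
| float     | 0.2554        | 0.0533       | 4.8x    |
| int       | 0.3010        | 0.0816       | 3.7x    |
| str       | 0.5904        | 0.3167       | 1.9x    |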
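A comparison like the one in the table can be reproduced with a small harness. The sketch below uses only the standard library and makes several assumptions. It defines one minimal subclass per base type (`MyBytes`, `MyFloat`, `MyInt`, `MyStr` are hypothetical names), times a single call to any constructor, and computes the speedup column as before / after. It times whichever Polars version is installed, so the "before" and "after" columns require one run on each version.

```python
import time
from typing import Any, Callable, Dict, List, Sequence


# Hypothetical minimal subclasses of each standard base type in the table.
class MyBytes(bytes):
    pass


class MyFloat(float):
    pass


class MyInt(int):
    pass


class MyStr(str):
    pass


def make_data(n: int) -> Dict[str, List[Any]]:
    """Build n values per base type, each wrapped in its subclass."""
    return {
        "bytes": [MyBytes(str(i).encode()) for i in range(n)],
        "float": [MyFloat(i) for i in range(n)],
        "int": [MyInt(i) for i in range(n)],
        "str": [MyStr(i) for i in range(n)],
    }


def time_construction(build: Callable[[Sequence[Any]], Any], values: Sequence[Any]) -> float:
    """Return wall-clock seconds taken by a single call to build(values)."""
    start = time.perf_counter()
    build(values)
    return time.perf_counter() - start


def speedup(before: float, after: float) -> float:
    """Speedup as reported in the table: before / after, to one decimal."""
    return round(before / after, 1)


print(speedup(0.9838, 0.1071))  # 9.2, matching the bytes row
```

With Polars installed, `{k: time_construction(pl.Series, v) for k, v in make_data(10_000_000).items()}` would time `pl.Series` once per base type. A single call is noisy, so repeating it and taking the minimum gives steadier numbers.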

@github-actions bot added the performance and python labels Dec 5, 2024
@alexander-beedie changed the title from "perf(python): Much faster Series construction from subclasses of builtin Python types" to "perf(python): Much faster Series construction from subclasses of standard Python types" Dec 5, 2024

codecov bot commented Dec 5, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.63%. Comparing base (cbc0ea0) to head (434301e).
Report is 5 commits behind head on main.

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main   #20166      +/-   ##
==========================================
+ Coverage   79.58%   79.63%   +0.05%
==========================================
  Files        1564     1564
  Lines      217475   217448      -27
  Branches     2472     2473       +1
==========================================
+ Hits       173070   173162      +92
+ Misses      43837    43717     -120
- Partials      568      569       +1
```


@ritchie46 (Member)

I see you learned some Dutch.

@ritchie46 ritchie46 merged commit 7116b72 into pola-rs:main Dec 5, 2024
22 checks passed
@alexander-beedie alexander-beedie deleted the optimise-init-from-inheriting-types branch December 5, 2024 13:09
Labels
performance Performance issues or improvements python Related to Python Polars
2 participants