-
Notifications
You must be signed in to change notification settings - Fork 787
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pyfunction: refactor argument extraction #1440
Conversation
cc @1frag thanks for reporting on Gitter |
Given the clippy failure I might rework this PR to also provide a solution for #1439 at the same time. |
I've refactored the argument extraction in |
6e5e6af
to
8d4f8f4
Compare
c072b29
to
9a01a81
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, thanks!
I think required/total_positional_parameters
are nice but don't understand some parts in the implementation.
Thanks for the reviews. I pushed a couple of commits addressing the comments and making the keyword arguments portion clearer (hopefully). |
I'm sorry, but I'm struggling to understand all changes. Comments may delay. |
No problem. Feel free to ask as many questions on the code as necessary. I might need to add better inline comments & documentation. |
Just curious, why did you change the current for param in positional_params + keyword_params:
if param in kwargs:
... loop to for name, value in kwargs:
if name in positional_params:
....
elif name in keyword_params:
.... ? |
Great question. There's a few reasons I did this, however I am happy to change this design:
I know the algorithmic complexity of the new method looks worse. For most functions we're never going to have more than 10 parameters, so the fact that the new algorithm is O(n) whereas the old one was O(1) matters less. The cost of allocating the PyString (or even for Vectorcall, a whole new I'm primarily motivated by benchmarks - in the end we should pick whichever algorithm is fastest. But I think we should first implement vectorcall and the owned objects API, because they will also have big performance effects. So for now I wanted to keep the algorithm the same as cpython for simplicity of maintenance. |
Thank you for your explanation. I don't care about the computational complexity here, let's choose a simpler one :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I left some comments.
Thanks for the review! I will add the extra tests and try out all the suggested changes over the weekend. As (imo) this is performance-sensitive code I'll try each change in turn and see whether it has any noticeable effect on performance. I suspect that your suggested |
3e2a5a5
to
7c5b162
Compare
@@ -0,0 +1,33 @@ | |||
use pyo3::prelude::*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how this benchmark is useful.
It may be difficult to observe some speedup when trying to extract only a few args.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually although the variance is quite high, there's still a reasonably clear signal as to how good a change is using these benchmarks. I wouldn't take these numbers super seriously for any kind of scientific conclusion. However I can see (for example) that there's still some overhead calling pyo3 functions vs CPython functions, and also that returning Vec
from extract_keyword_arguments
leads to noticeable slowdown.
Some example numbers I see at the moment:
-------------------------------------------------------------------------------------------- benchmark: 6 tests -------------------------------------------------------------------------------------------
Name (time in ns) Min Max Mean StdDev Median IQR Outliers OPS (Mops/s) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_no_args_py 60.0000 (1.0) 4,658.0000 (2.86) 67.6030 (1.0) 34.3033 (1.08) 66.0000 (1.0) 2.0000 (inf) 1857;3556 14.7923 (1.0) 149254 100
test_no_args 70.0000 (1.17) 1,627.0000 (1.0) 78.7283 (1.16) 31.8182 (1.0) 74.0000 (1.12) 2.0000 (inf) 3261;8374 12.7019 (0.86) 128206 100
test_args_and_kwargs_py 219.0476 (3.65) 5,566.6667 (3.42) 245.2637 (3.63) 57.6773 (1.81) 242.8571 (3.68) 4.7619 (inf) 1156;10254 4.0772 (0.28) 196079 21
test_mixed_args_py 288.2353 (4.80) 6,329.4118 (3.89) 320.9264 (4.75) 83.6458 (2.63) 317.6471 (4.81) 5.8824 (inf) 1406;18940 3.1160 (0.21) 192308 17
test_args_and_kwargs 299.9999 (5.00) 86,200.0001 (52.98) 412.2963 (6.10) 466.4672 (14.66) 400.0000 (6.06) 0.0000 (1.0) 359;47106 2.4254 (0.16) 163935 1
test_mixed_args 340.0000 (5.67) 21,400.0000 (13.15) 385.6509 (5.70) 207.9585 (6.54) 380.0000 (5.76) 15.0000 (inf) 1187;1616 2.5930 (0.18) 128206 20
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
At the moment it's clear we're a bit slower than CPython for function calls, but I hope vectorcall support and #1308 will improve things.
Later I would like to add more numbers to these benchmarks, for example #661.
7c5b162
to
f74daa6
Compare
f74daa6
to
52ca30a
Compare
52ca30a
to
0e29ef5
Compare
0e29ef5
to
e86c16d
Compare
LGTM 💯 , thank you! |
This PR fixes a bug in
#[pyfunction]
where a keyword-only argument is incorrectly rejected when used in combination with*args
:This is an example of the problem code:
And this is the call which didn't work:
At the same time I also changed error messages from
argument: foo
toargument 'foo'
, so that they match what CPython errors look like.