Add Benchmarks for int_pow Methods. #119430
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @cuviper (or someone else) soon. Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (
|
Surprisingly, at least on AArch64, checked pow unwrapped is very slightly faster than wrapping pow, while checked pow unwrapped unchecked is among the slowest. My guess is that the assume statement produced by the unchecked unwrap is helping more than it is hurting. I found a way to unwrap it without generating an assume statement: `*option.as_slice().as_ptr().cast::<$t>()`. When I benchmarked it, though, it was significantly slower than the unchecked unwrap.
On AArch64 but not x86_64, overflowing i128 is dramatically (~5x) slower than overflowing u128. This is because on AArch64, signed overflowing multiply does a compiler-rt function call while unsigned overflowing multiply is inlined. I'll have to check what happens on other architectures with Compiler Explorer.
To work around this, we can take advantage of the fact that any integer to an even power is non-negative. I've written a similar algorithm in Rust before. |
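One way to read the workaround described above is to do all the multiplication on the unsigned magnitude and reapply the sign at the end. The following is a minimal sketch under that assumption; it is not the code from this PR. The function name `overflowing_pow_via_unsigned` is hypothetical, and whether it actually avoids the compiler-rt call on AArch64 would still need to be measured.

```rust
/// Hypothetical helper (not from the PR): computes the same result as
/// `i128::overflowing_pow`, but performs every multiply on `u128`.
pub fn overflowing_pow_via_unsigned(base: i128, exp: u32) -> (i128, bool) {
    // The result is negative only for a negative base and an odd exponent;
    // any integer raised to an even power is non-negative.
    let negative = base < 0 && exp % 2 == 1;

    // |base|^exp modulo 2^128, plus a flag set if it did not fit in u128.
    let (magnitude, unsigned_overflow) = base.unsigned_abs().overflowing_pow(exp);

    // Largest magnitude i128 can represent: 2^127 for a negative result
    // (i128::MIN), 2^127 - 1 for a non-negative one (i128::MAX).
    let limit = if negative { 1u128 << 127 } else { i128::MAX as u128 };
    let overflowed = unsigned_overflow || magnitude > limit;

    // Negating modulo 2^128 yields the two's-complement bits of the
    // wrapped signed result.
    let bits = if negative { magnitude.wrapping_neg() } else { magnitude };
    (bits as i128, overflowed)
}
```

For example, `overflowing_pow_via_unsigned(-2, 127)` returns `(i128::MIN, false)`, the same as `(-2i128).overflowing_pow(127)`, because a magnitude of 2^127 is still representable when the result is negative.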
It could also have been fixed by removing a semicolon instead.
Also simplified the macros
Force-pushed from f4742d7 to c65c35b
@cuviper Updated based on your feedback. I also rebased and force-pushed it. |
Looks good, thanks! @bors r+ rollup |
Add Benchmarks for int_pow Methods. There is quite a bit of room for improvement in performance of the `int_pow` family of methods. I added benchmarks for those functions. In particular, there are benchmarks for small compile-time bases to measure the effect of rust-lang#114390. ~~I added a lot (245), but all but 22 of them are marked with `#[ignore]`. There are a lot of macros, and I would appreciate feedback on how to simplify them.~~ ~~To run benches relevant to rust-lang#114390, use `./x bench core --stage 1 -- pow_base_const --include-ignored`.~~
|
💔 Test failed - checks-actions |
@bors retry |
Is this definitely a random error, or could it have anything to do with my changes? If I interpreted the logs correctly, it was the drift test that failed, and that seems unrelated to the changes. |
This message comes up occasionally on that particular runner, unrelated to any PR changes:
|
@cuviper Is there a way to run or view the results of the library benchmarks on all the platforms that the CI tests? |
☀️ Test successful - checks-actions |
Most of the targets are cross-compiled in CI, and AFAIK we don't even run those benchmarks on native targets in CI. They're really only useful for developers to run. For more holistic benchmarking, see https://perf.rust-lang.org/ (source: https://github.com/rust-lang/rustc-perf) |
Finished benchmarking commit (6029085): comparison URL.
Overall result: ❌ regressions - no action needed
@rustbot label: -perf-regression
Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary size: This benchmark run did not return any relevant results for this metric.
Bootstrap: 666.002s -> 666.091s (0.01%) |
There is quite a bit of room for improvement in performance of the `int_pow` family of methods. I added benchmarks for those functions. In particular, there are benchmarks for small compile-time bases to measure the effect of #114390. ~~I added a lot (245), but all but 22 of them are marked with `#[ignore]`. There are a lot of macros, and I would appreciate feedback on how to simplify them.~~ ~~To run benches relevant to #114390, use `./x bench core --stage 1 -- pow_base_const --include-ignored`.~~
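To make the idea of a benchmark with a small compile-time base concrete, here is a rough standalone sketch. It is not the harness added by this PR (the in-tree benchmarks are run through `./x bench`); it only uses the stable `std::hint::black_box` and `std::time::Instant`. The constant `BASE` and the names `sum_wrapping_pow`, `sum_checked_pow`, and `time_it` are hypothetical and chosen for illustration.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Hypothetical small base known at compile time; the exponent stays opaque.
const BASE: u64 = 3;

/// Sums BASE^e over the given exponents using `wrapping_pow`.
fn sum_wrapping_pow(exps: &[u32]) -> u64 {
    exps.iter()
        .fold(0u64, |acc, &e| acc.wrapping_add(BASE.wrapping_pow(black_box(e))))
}

/// Same sum, but through `checked_pow` followed by `unwrap`.
/// Panics if any power overflows u64.
fn sum_checked_pow(exps: &[u32]) -> u64 {
    exps.iter()
        .fold(0u64, |acc, &e| acc.wrapping_add(BASE.checked_pow(black_box(e)).unwrap()))
}

/// Runs `f` `iters` times and returns the last result with the elapsed time.
/// `black_box` keeps the optimizer from discarding or hoisting the work.
fn time_it(f: impl Fn(&[u32]) -> u64, exps: &[u32], iters: u32) -> (u64, Duration) {
    let start = Instant::now();
    let mut last = 0;
    for _ in 0..iters {
        last = black_box(f(black_box(exps)));
    }
    (last, start.elapsed())
}
```

Calling `time_it(sum_wrapping_pow, &exps, 1_000)` and `time_it(sum_checked_pow, &exps, 1_000)` with `exps` covering `0..=40` (the largest exponent for which 3^e fits in `u64`) gives two durations to compare. Timings from a sketch like this are noisy and only indicative.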