perf: fix encode_varint{32,64} #709
The previous code was full of jumps that aren't easily predictable. This version is less code, generates less code, and performs better (well, on x86_64 anyway, I am not an aarch64 expert). All while using safe Rust.
Interestingly, the miri failure is in std:
fn main() {
    match 2u64.checked_pow(64) {
        Some(n) => println!("2^64: {n}"),
        None => {}
    };
}
Given this is a
I have not reported this upstream nor looked for existing issues yet.
I opened an issue: rust-lang/rust#120537.
This greatly cleans up the assembly as well, but this function is not as likely to matter from a perf perspective.
Merged the part about
I merged in master, and Miri looks good now 👍🏻
OK, if this is still an issue, a benchmark is needed. Smaller machine code does not mean the code is faster. (The project is not very well maintained right now, hence the slow responses, sorry.)
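As a rough illustration of the kind of measurement being asked for, here is a minimal timing sketch using only the standard library. The name `time_encoder`, its parameters, and the closure-based shape are all hypothetical and not part of this project; any encoder under test is passed in as a closure. It is a sketch under those assumptions, not a rigorous benchmark.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `encode` over `values` for `rounds` passes and
/// returns the elapsed time plus the total number of bytes produced.
/// `black_box` is used so the optimizer cannot fold the inputs or drop the output.
fn time_encoder<F>(mut encode: F, values: &[u64], rounds: usize) -> (Duration, usize)
where
    F: FnMut(u64, &mut Vec<u8>),
{
    // Ten bytes is the longest possible varint encoding of a u64.
    let mut buf = Vec::with_capacity(values.len() * 10);
    let mut total_bytes = 0usize;

    let start = Instant::now();
    for _ in 0..rounds {
        buf.clear();
        for &v in values {
            encode(black_box(v), &mut buf);
        }
        total_bytes += black_box(&buf).len();
    }
    (start.elapsed(), total_bytes)
}
```

A single wall-clock reading like this is noisy; for a real comparison, a statistical harness such as Criterion and inputs covering a mix of short and long varints would give more trustworthy numbers.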
Re-analyzing this, I made a mistake previously: the jumps are actually quite predictable. With this in mind, I would expect this PR to be a slight regression (since it isn't unrolled), but it gave me some ideas to improve it... but then I realized that someone had already beaten me to these ideas: https://github.com/as-com/varint-simd. I think it's worth considering using their implementation on x86_64, but there's no point in doing it in this PR. So for now, we'll just count the
The previous code was full of jumps that aren't easily predictable. This version is less code, generates less code, and should perform better (well, on x86_64 anyway, I am not an aarch64 expert). All while using safe Rust.
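For readers unfamiliar with the encoding being discussed, the following is a minimal sketch of a compact, loop-based varint (LEB128) encoder in safe Rust. It is not the code from this PR; the function name `encode_varint64_sketch` and the choice of `Vec<u8>` as the output buffer are assumptions made for illustration.

```rust
/// Hypothetical helper: appends the varint encoding of `value` to `buf`.
/// Each output byte carries 7 payload bits, least significant group first;
/// the high bit is set on every byte except the last.
fn encode_varint64_sketch(mut value: u64, buf: &mut Vec<u8>) {
    while value >= 0x80 {
        // Emit the low 7 bits with the continuation bit set.
        buf.push(((value & 0x7F) as u8) | 0x80);
        value >>= 7;
    }
    // The final byte has the continuation bit clear.
    buf.push(value as u8);
}
```

A `u64` always encodes to between one and ten bytes with this scheme, so the loop runs at most nine times before the final push.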
You can compare assembly on godbolt. The current version of encode_varint64 is 99 lines long and has ~10 labels; the new version is 27 lines long with only 3 labels.
I haven't included benchmarking numbers because I wasn't sure what characteristics you'd be looking for. Based on the x86_64 assembly, it should definitely out-perform. Anything in particular you are looking for there?
Edit: I also simplified encoded_varint64_len while I was at it. It gets nicer assembly as a result as well but obviously matters less.
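To illustrate how a length computation of this kind can avoid branches entirely, here is one possible formulation based on counting significant bits. It is a sketch and not necessarily what this PR does; the name `varint64_len_sketch` is hypothetical.

```rust
/// Hypothetical helper: number of bytes needed to varint-encode `value`.
/// `value | 1` makes zero behave like a one-bit number, so the result is
/// never below one byte; each output byte then holds 7 significant bits.
fn varint64_len_sketch(value: u64) -> usize {
    let significant_bits = 64 - (value | 1).leading_zeros();
    // Ceiling division by 7 without a branch.
    ((significant_bits + 6) / 7) as usize
}
```

Because `leading_zeros` typically compiles down to a single instruction on x86_64 and aarch64, the whole function reduces to a handful of arithmetic operations with no jumps.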