Suboptimal debug codegen in rand
#43299
Comments
Also it looks like if the inner block is repeated in place Clang doesn't compile any more allocas (it's always 86) whereas Rust's number of alloca instances will grow linearly. |
Changing |
From a glance at the assembly, I suspect Now, EDIT: This doesn't seem to be the main source of the alloca statements inside the loops though, but it's probably still worth looking into. |
So there are some differences between the C++ version you posted and the Rust version.

Firstly there is the loop. In the C++ version, the loop is a loop over an array of 4 elements, rather than a range iterator as in Rust. See this. It looks as though it avoids using iterators entirely in this case (I don't see any calls to

Secondly, there are some differences in the functions on the wrapping type. In Rust the methods on

I made an altered C++ version that gives results that are closer to the Rust version, by force-inlining the method calls and calling wrapping add/sub through a function as in Rust (I didn't bother altering the loop iteration as that would be a bit more complex): https://gist.github.com/oyvindln/cc65fc6c479d3347708be972798d49f7

EDIT: Fixed this

This gives me 286 allocas, which is a fair bit closer to the Rust version. Interestingly, the wrapping_{add/sub} functions have two allocas in C++, and only one in Rust. Maybe there is some basic processing that could be done during this inlining to avoid all the redundant copies here.

Another interesting observation is that both the Rust and C++ versions treat the wrapping type as though it is an aggregate, and thus access the inner value by a pointer into the struct. Maybe |
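To make the pattern under discussion concrete, here is a minimal sketch in Rust. The newtype `W` and the function `mix` are hypothetical stand-ins invented for illustration; they are not the code from rand or from the standard library. The sketch only shows the shape being compared: a one-field wrapper struct whose operators forward to `wrapping_add`/`wrapping_sub`, used inside a range-based `for` loop. In an unoptimized build each of those operator calls is a real function call on an aggregate, which is where extra stack slots can come from.

```rust
use std::ops::{Add, Sub};

/// Hypothetical one-field wrapper, similar in shape to `std::num::Wrapping<u32>`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct W(u32);

impl Add for W {
    type Output = W;
    // Forwards to the integer's wrapping method, as described in the thread.
    fn add(self, rhs: W) -> W {
        W(self.0.wrapping_add(rhs.0))
    }
}

impl Sub for W {
    type Output = W;
    fn sub(self, rhs: W) -> W {
        W(self.0.wrapping_sub(rhs.0))
    }
}

/// Hypothetical loop body: a range iterator driving a few wrapped operations.
fn mix(a: W, b: W) -> W {
    let mut x = a;
    for _ in 0..4 {
        x = x + b;
        x = x - W(1);
    }
    x
}

fn main() {
    println!("{:?}", mix(W(u32::MAX), W(3)));
}
```

Repeating the two statements inside the loop body several times is a simple way to check whether the number of allocas grows with repetition, as reported above.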
Oh nice find on the I removed the annotations and the example dropped from 337 allocas to 129 allocas (yay!). Unfortunately though Rust still shows linear growth in terms of repetition while C++ still doesn't :( |
I've opened #43367 to remove |
Removing the redundant masking operation in |
We appear to generate a constant 46 allocas today in debug on stable for this program, including when repeating the body of the loop many times. I think this is fixed. The original program (random generator) produces a total of 103 allocas without filtering as well, so I think we're doing much better in this space. |
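For anyone wanting to reproduce counts like these, one rough approach is to emit textual LLVM IR (for example with `rustc --emit=llvm-ir`) and count the `alloca` instructions in it. The helper below is a sketch under that assumption; `count_allocas` is a hypothetical name, and the line-based match is approximate rather than a real IR parser.

```rust
/// Counts lines in textual LLVM IR that look like `%name = alloca ...`.
/// Hypothetical helper; it skips IR comment lines (starting with `;`)
/// and does not attempt to parse the IR properly.
fn count_allocas(ir: &str) -> usize {
    ir.lines()
        .map(|line| line.trim_start())
        .filter(|line| !line.starts_with(';') && line.contains("= alloca "))
        .count()
}

fn main() {
    // Small hand-written IR fragment used only as sample input.
    let ir = "\
define void @f() {
start:
  %a = alloca i32, align 4
  %b = alloca i64, align 8
  ; %c = alloca i8 (commented out)
  ret void
}
";
    println!("{} allocas", count_allocas(ir));
}
```

Running the same count on the IR for a program with the loop body repeated N times shows whether the total stays constant or grows linearly.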
I found in one of my crates that when compiling in debug mode (`cargo build`) the largest function in the whole executable looked like it was this one. That looked pretty innocuous so I dug a little deeper. It turns out that function creates 1000 `alloca` instructions using this program as `find.rs`, compiling this source:

Trying to winnow it down further, this source contains 337 allocas:
For comparison, the equivalent C++ program (I think? my C++ isn't very good) has only 86 allocas:
Interestingly enough the C++ program compiles in ~0.035s whereas the Rust program compiles in ~0.2s, over 4x slower. The major passes in Rust are 0.07s in translation, 0.03s in expansion, and 0.01s in item-bodies checking and LLVM passes. As to where the other 0.08s went in `-Z time-passes` I'm not sure!