Compiling hello world on RPi spends 4 seconds in coherence checking #22068
Comments
cc me |
Above test is on a fairly heavily loaded machine. With light load, it's 2.6 seconds in coherence, total real time 8.4s, user time 7.4s. |
Yeah, single core ARM devices are pretty slow. On my beaglebone, I get:
The RPi spends relatively more time in coherence checking than the other devices, but all of them spend a significant amount of time there: Raspberry Pi: 2.6s out of 8.4s (31%) (your numbers) No idea why, though. Given that the three ARM devices are executing the same binary/instructions, perhaps it's just a difference in CPU models (ARMv6 vs ARMv7a)? |
This might be due to the RBML/EBML decoding code that looks like it would cause lots of unaligned loads. Older ARMs probably handle those worse than newer ones. |
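To illustrate what "unaligned loads" means here, below is a minimal sketch, not the actual RBML/EBML decoder. Both function names are hypothetical. It reads a big-endian `u32` at an arbitrary byte offset in two ways: by assembling it from individual bytes (no alignment requirement at all), and by an explicit `ptr::read_unaligned`, which is the kind of access that older ARM cores may handle slowly.

```rust
/// Reads a big-endian u32 at `offset` by assembling it from individual bytes.
/// Hypothetical helper: it never performs a multi-byte load from a
/// misaligned address, so it behaves the same on every CPU.
fn read_u32_be(data: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    let b = data.get(offset..end)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

/// Same result, but through a single 4-byte load that may be misaligned.
/// Hypothetical helper: how expensive this load is depends on the CPU.
fn read_u32_be_unaligned(data: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    let b = data.get(offset..end)?;
    // SAFETY: `b` is exactly 4 readable bytes, and `read_unaligned`
    // places no alignment requirement on the pointer.
    let raw = unsafe { std::ptr::read_unaligned(b.as_ptr() as *const u32) };
    // The bytes were stored big-endian; convert to the native byte order.
    Some(u32::from_be(raw))
}

fn main() {
    // Offset 1 is not 4-byte aligned relative to the start of the buffer.
    let data = [0xFF, 0x12, 0x34, 0x56, 0x78];
    assert_eq!(read_u32_be(&data, 1), Some(0x1234_5678));
    assert_eq!(read_u32_be_unaligned(&data, 1), Some(0x1234_5678));
    // Out-of-range reads are rejected instead of reading past the buffer.
    assert_eq!(read_u32_be(&data, 2), None);
}
```

Whether the decoder's loads are actually the bottleneck on ARMv6 would still need profiling; the sketch only shows the two access patterns being discussed.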
Oh, interesting! Is coherence checking a hot spot for metadata parsing? |
I guess it's not in general, but the metadata is loaded on demand, and coherence seems to be the first thing that wants some. |
I've hit this on my rid3 demo: loading all 120 Turns out coherence checking is using 16MB per crate which uses I've looked at it, and I don't know what could be using all the external Here are "desktop" timings for
|
So it seems the "load all impls ever" logic in coherence is a workaround for the fact that However, my initial fix doesn't actually seem to load any impls outside of the trait's definition crate, and I have no idea where else to look. EDIT: turns out I was looking for a tag which wasn't getting encoded in the list of all the impls in a crate. Should have a patch soon, if nothing else comes up. |
The loop to load all the known impls from external crates seems to have been used because `ty::populate_implementations_for_trait_if_necessary` wasn't doing its job, and solely relying on it resulted in loading only impls in the same crate as the trait. Coherence for `librustc` was reduced from 18.310s to 0.610s, from stage1 to stage2. Interestingly, type checking also went from 46.232s to 42.003s, though that could be noise or unrelated improvements. On a smaller scale, `fn main() {}` now spends 0.003s in coherence instead of 0.368s, which fixes #22068. It also peaks at only 1.2MB, instead of 16MB of heap usage.
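The difference between the two strategies described above can be sketched with a toy model. This is not rustc's code: `Metadata`, `ImplTable`, `load_all`, and `populate_for_trait_if_necessary` are hypothetical stand-ins, and the per-crate, per-trait index is an assumption about how a lookup might be organized. The sketch contrasts decoding every impl up front with decoding only the impls of the trait being asked about, across all crates rather than just the trait's defining crate.

```rust
use std::collections::{BTreeMap, HashMap, HashSet};

// Hypothetical stand-ins for compiler identifiers.
type CrateId = u32;
type TraitId = &'static str;
type ImplId = &'static str;

/// Toy model of external crate metadata: per crate, impls indexed by trait.
struct Metadata {
    crates: BTreeMap<CrateId, BTreeMap<TraitId, Vec<ImplId>>>,
}

#[derive(Default)]
struct ImplTable {
    impls: HashMap<TraitId, Vec<ImplId>>,
    populated: HashSet<TraitId>,
    decoded: usize, // number of impl entries decoded so far
}

impl ImplTable {
    /// Eager strategy: decode every impl of every trait in every crate.
    fn load_all(&mut self, meta: &Metadata) {
        for per_trait in meta.crates.values() {
            for (&trait_id, impls) in per_trait {
                self.decoded += impls.len();
                self.impls.entry(trait_id).or_default().extend(impls.iter().copied());
                self.populated.insert(trait_id);
            }
        }
    }

    /// Lazy strategy: decode impls of one trait, from all crates, at most once.
    fn populate_for_trait_if_necessary(&mut self, meta: &Metadata, trait_id: TraitId) {
        if !self.populated.insert(trait_id) {
            return; // already populated for this trait
        }
        for per_trait in meta.crates.values() {
            if let Some(impls) = per_trait.get(trait_id) {
                self.decoded += impls.len();
                self.impls.entry(trait_id).or_default().extend(impls.iter().copied());
            }
        }
    }
}

/// Made-up sample data: crate 1 defines the traits, crate 2 adds more impls.
fn sample_metadata() -> Metadata {
    let mut defining = BTreeMap::new();
    defining.insert("Clone", vec!["Clone for u8", "Clone for bool"]);
    defining.insert("Debug", vec!["Debug for u8"]);
    let mut downstream = BTreeMap::new();
    downstream.insert("Clone", vec!["Clone for Foo"]);
    downstream.insert("Display", vec!["Display for Foo"]);
    let mut crates = BTreeMap::new();
    crates.insert(1, defining);
    crates.insert(2, downstream);
    Metadata { crates }
}

fn main() {
    let meta = sample_metadata();

    let mut eager = ImplTable::default();
    eager.load_all(&meta);

    let mut lazy = ImplTable::default();
    lazy.populate_for_trait_if_necessary(&meta, "Clone");
    lazy.populate_for_trait_if_necessary(&meta, "Clone"); // cached, no extra work

    println!("eager decoded {} impls, lazy decoded {}", eager.decoded, lazy.decoded);
}
```

In this toy data the lazy path still finds `Clone for Foo` from the second crate, which is the behavior the earlier on-demand lookup was missing.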
My `rustc` is from @japaric's nightlies. It's pretty slow overall, but what stands out is that we spend nearly 4s on coherence checking.