Performance regression in 1.48.0
#79246
I suspect a good next step would be to try to narrow down the regressed code, maybe with something like a cachegrind diff, to isolate the problem and perhaps propose a fix.
Just curious, are you using LTO when you notice the regression? I was looking at the code in the PR and there are a lot of small methods which aren't marked with Edit: after reading specifically that you're looking at
Yes, LTO is enabled for release builds, which is what is being measured here.
If it helps narrow down which code to test: the changes should only affect code using The primary goal of that PR is to reduce malloc/free traffic by reusing allocations where possible. I only measured with the system allocator. Do you use that or a custom one? Part of that optimization also involves switching from loops over There are also a bunch of smaller tweaks relating to allocations in various
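As a rough illustration of the allocation reuse described above, the sketch below turns a `Vec<u32>` back into a `Vec<u32>` through `into_iter().map(..).collect()`, a shape the in-place iteration specialization can apply to. The helper name `double_all` is hypothetical, and whether the buffer is actually reused is an unspecified implementation detail, so the program only reports it instead of relying on it.

```rust
/// Hypothetical helper: consumes `v` and returns every element doubled.
/// `Vec<u32> -> IntoIter -> Map -> Vec<u32>` keeps the same element size and
/// alignment, so the standard library *may* reuse the source allocation.
fn double_all(v: Vec<u32>) -> Vec<u32> {
    v.into_iter().map(|x| x * 2).collect()
}

fn main() {
    let source: Vec<u32> = (0..8).collect();
    // Remember the buffer address only for comparison; it is never dereferenced.
    let source_ptr = source.as_ptr();

    let doubled = double_all(source);

    println!("values: {:?}", doubled);
    // Not guaranteed either way: this is an optimization, not an API contract.
    println!("allocation reused: {}", doubled.as_ptr() == source_ptr);
}
```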
AFAIU, inline annotations are only needed on non-generic methods, and the SourceIter methods are all generic.
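To make the distinction concrete, here is a hypothetical pair of functions. Historically, a non-generic function's body was only available for inlining in downstream crates if it carried `#[inline]` (or when LTO let the optimizer see across crates), whereas a generic function is monomorphized in the crate that instantiates it, so its body is available there anyway. Newer compilers also apply some automatic cross-crate inlining for very small functions, so treat this as a simplified picture.

```rust
/// Non-generic: without `#[inline]` (or LTO), other crates historically could
/// only call this through its exported symbol, not inline its body.
#[inline]
pub fn add_one(x: u32) -> u32 {
    x + 1
}

/// Generic: instantiated (monomorphized) in whichever crate uses it, so that
/// crate's codegen already sees the body and can inline it without `#[inline]`.
pub fn sum_all<I: IntoIterator<Item = u32>>(items: I) -> u32 {
    items.into_iter().sum()
}

fn main() {
    println!("{}", add_one(41));
    println!("{}", sum_all(vec![1, 2, 3]));
}
```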
Small correction: it may also affect some iterators involving
OK, thanks! That is valuable information. FYI, the measured code makes some (maybe heavy?) use of
I continued my investigation and did a
This led me to focus on the
I managed to provide a self-contained and (relatively) minimized example at /~https://github.com/marmeladema/bitvec-perf-regression
I ran the bisection again and it failed with:
but this time it seems to point to I have also bisected the
The regression still triggers with the latest nightly, but the following patch
diff --git a/compiler/rustc_codegen_llvm/src/type_of.rs b/compiler/rustc_codegen_llvm/src/type_of.rs
index 8ea4768f77d..0876907e119 100644
--- a/compiler/rustc_codegen_llvm/src/type_of.rs
+++ b/compiler/rustc_codegen_llvm/src/type_of.rs
@@ -40,9 +40,7 @@ fn uncached_llvm_type<'a, 'tcx>(
// FIXME(eddyb) producing readable type names for trait objects can result
// in problematically distinct types due to HRTB and subtyping (see #47638).
// ty::Dynamic(..) |
- ty::Adt(..) | ty::Closure(..) | ty::Foreign(..) | ty::Generator(..) | ty::Str
- if !cx.sess().fewer_names() =>
- {
+ ty::Adt(..) | ty::Closure(..) | ty::Foreign(..) | ty::Generator(..) | ty::Str => {
let mut name = with_no_trimmed_paths(|| layout.ty.to_string());
if let (&ty::Adt(def, _), &Variants::Single { index }) =
(layout.ty.kind(), &layout.variants)
@@ -58,12 +56,6 @@ fn uncached_llvm_type<'a, 'tcx>(
}
Some(name)
}
- ty::Adt(..) => {
- // If `Some` is returned then a named struct is created in LLVM. Name collisions are
- // avoided by LLVM (with increasing suffixes). If rustc doesn't generate names then that
- // can improve perf.
- Some(String::new())
- }
_ => None,
};
fixes the regression. I am not sure what to do from there. cc @eddyb @davidtwco
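The following is a simplified model of the name-selection `match` before and after the patch, written only to show which branch each case takes. `Kind`, `name_before` and `name_after` are hypothetical stand-ins, not rustc items; the real code matches on `layout.ty.kind()`, also covers `Foreign` and `Generator`, and does more work to build the readable name.

```rust
/// Hypothetical, heavily simplified stand-in for the `ty::TyKind` cases in the patch.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind {
    Adt,
    Closure,
    Str,
    Other,
}

/// Before the patch: with `fewer_names` set, ADTs still get `Some("")`. Per the
/// removed comment, `Some` creates a named LLVM struct and LLVM avoids name
/// collisions by adding increasing suffixes.
fn name_before(kind: Kind, fewer_names: bool, readable: &str) -> Option<String> {
    match kind {
        Kind::Adt | Kind::Closure | Kind::Str if !fewer_names => Some(readable.to_string()),
        Kind::Adt => Some(String::new()),
        _ => None,
    }
}

/// After the patch: the readable name is produced regardless of `fewer_names`.
fn name_after(kind: Kind, _fewer_names: bool, readable: &str) -> Option<String> {
    match kind {
        Kind::Adt | Kind::Closure | Kind::Str => Some(readable.to_string()),
        Kind::Other => None,
    }
}

fn main() {
    for kind in [Kind::Adt, Kind::Closure, Kind::Str, Kind::Other] {
        println!(
            "{:?}: before={:?} after={:?}",
            kind,
            name_before(kind, true, "MyType"),
            name_after(kind, true, "MyType"),
        );
    }
}
```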
Ugh, choice of names shouldn't cause perf changes. At a glance, this looks like an LLVM bug. I was going to suggest testing with @marmeladema if you have the time, one thing you could try is "bisecting" for the name (i.e. choose between Then you pretty much want to diff the output of
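One way to read the suggested "bisecting" for the name is to emit real names for only half of the candidate types at a time and keep the half that avoids the regression. The sketch below assumes a hypothetical oracle `is_fast(subset)` (in practice a rebuild plus a benchmark run) that returns true exactly when the subset contains the single name that matters; both the oracle and the single-culprit assumption are illustrative.

```rust
/// Narrow `candidates` down to the one name that `is_fast` needs to see.
/// Assumes (hypothetically) that `is_fast(subset)` is true exactly when
/// `subset` contains that single name.
fn isolate_name<'a>(
    candidates: &[&'a str],
    mut is_fast: impl FnMut(&[&'a str]) -> bool,
) -> Option<&'a str> {
    // If emitting every candidate name does not help, there is nothing to find.
    if candidates.is_empty() || !is_fast(candidates) {
        return None;
    }
    let mut current = candidates;
    while current.len() > 1 {
        // Emit names for the left half only; if that is not enough,
        // the name must be in the right half (under the assumption above).
        let (left, right) = current.split_at(current.len() / 2);
        current = if is_fast(left) { left } else { right };
    }
    Some(current[0])
}

fn main() {
    // Hypothetical type names; "D" plays the role of the name that matters.
    let names = ["A", "B", "C", "D", "E"];
    let found = isolate_name(&names, |subset| subset.contains(&"D"));
    println!("{:?}", found);
}
```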
I have managed to isolate the precise name that needs to be generated to not cause the performance regression:
I have tried to decompile
--- bitvec.extend.slow.decompiled.c 2020-11-23 23:19:02.971052349 +0000
+++ bitvec.extend.fast.decompiled.c 2020-11-23 23:10:45.442938517 +0000
@@ -99,11 +99,13 @@
do {
bVar2 = *param_2;
if (bVar2 == 2) {
- /* WARNING: Subroutine does not return */
core::panicking::panic
(
"called `Option::unwrap()` on a `None`valuelibrary/std/src/panicking.rslibrary/std/src/env.rsfailed to get environmentvariable ``: already mutablyborrowedlibrary/std/src/sys_common/thread_info.rsassertion failed:c.borrow().is_none()Rust panics must be rethrownRust cannot catch foreignexceptionslibrary/std/src/sys/unix/rwlock.rs<unnamed>note: run with`RUST_BACKTRACE=1` environment variable to display a backtrace\n\' panicked at\'\', rwlock maximum reader count exceededrwlock read lock would result indeadlockthread panicked while panicking. aborting.\nthread panicked whileprocessing panic.aborting.\n.debug_library/std/src/../../backtrace/src/symbolize/gimli/elf.rs/home/adema/code/rust/library/alloc/src/vec.rslibrary/std/src/path.rsinternal error:entered unreachable code"
,0x2b,&PTR_UINT_0014f3c8);
+ do {
+ invalidInstructionException();
+ } while( true );
}
pointer::BitPtr<T>::into_bitslice(*param_1,param_1[1]);
pointer::BitPtr<T>::into_bitslice(*param_1);
Does that help at all, or am I not looking in the right place?
Are you sure you're looking at the right method? Previously said most of the time is spent in
I decompiled this one because that's actually the method that is called in my reproducible code.
LLVM bug report https://bugs.llvm.org/show_bug.cgi?id=51667 and proposed patch https://reviews.llvm.org/D109294.
@tmiasko I see the LLVM bug has been resolved. Has our LLVM fork been updated with the fix? If so, should this be closed?
@marmeladema No, the fix will be pulled in with the LLVM 14 upgrade.
Closing this issue now that we are at LLVM 15.
Hello everyone!
We detected a performance regression on a closed-source project after upgrading Rust from version 1.47.0 to version 1.48.0.
The benchmarks we run internally use the low-level perf_event_open Linux API to measure various hardware performance counters on specific code sections. The most stable and reliable counter we care about is the number of retired instructions, corresponding to instructions:u in classic perf output.
On certain datasets, the regressions are up to +16% in the number of instructions executed.
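For reference, a percentage like the one above is a relative change in the retired-instruction count between the two toolchains. A minimal helper, where the function name and the counts in `main` are hypothetical and not measurements from this issue:

```rust
/// Relative change, in percent, from `before` to `after`
/// (positive means more instructions were retired).
fn relative_change_pct(before: u64, after: u64) -> f64 {
    assert!(before > 0, "baseline count must be non-zero");
    (after as f64 - before as f64) * 100.0 / before as f64
}

fn main() {
    // Hypothetical counts, e.g. instructions:u with 1.47.0 and with 1.48.0.
    let before = 1_000_000;
    let after = 1_160_000;
    println!("{:+.1}%", relative_change_pct(before, after));
}
```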
I used cargo-bisect-rustc as instructed on https://rust-lang.zulipchat.com/#narrow/stream/122651-general/topic/Performance.20regressions.20in.201.2E48.2E0 in order to search for the commit that introduced the regression:
bisected with cargo-bisect-rustc v0.6.0
Host triple: x86_64-unknown-linux-gnu
Reproduce with:
The bisection points to #70793.
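The underlying idea of bisection can be sketched with the standard library's `slice::partition_point`: given commits ordered from oldest to newest and a test that is monotonic (every commit after the first regressing one also regresses), a binary search finds the first bad commit in roughly log2(n) test runs. This is a generic illustration with a hypothetical `is_bad` predicate, not cargo-bisect-rustc's actual implementation.

```rust
/// Index of the first commit for which `is_bad` returns true, assuming the
/// results are monotonic: all good commits come before all bad ones.
fn first_bad<T>(commits: &[T], mut is_bad: impl FnMut(&T) -> bool) -> Option<usize> {
    // `partition_point` binary-searches for the first element where the
    // predicate ("still good") becomes false.
    let idx = commits.partition_point(|c| !is_bad(c));
    (idx < commits.len()).then_some(idx)
}

fn main() {
    // Hypothetical history: commit 6 introduces the regression.
    let commits: Vec<u32> = (0..10).collect();
    let mut probes = 0;
    let culprit = first_bad(&commits, |&c| {
        probes += 1;
        c >= 6
    });
    println!("first bad commit: {:?} after {} probes", culprit, probes);
}
```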
I was able to reproduce the regression on two different machines.
I will also try to provide a minimized code example to reproduce the issue but that will take some time as the code being measured is quite big.
If anyone else has noticed this regression, or can point me to tools or ideas that could help reproduce the issue, it would be greatly appreciated!
EDIT:
MCVE at /~https://github.com/marmeladema/bitvec-perf-regression