-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Perform instsimplify before inline to eliminate some trivial calls #128265
Conversation
Some changes occurred to MIR optimizations cc @rust-lang/wg-mir-opt |
This comment has been minimized.
This comment has been minimized.
Hmm… @bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
…=<try> Perform instsimplify before inline to eliminate some trivial calls I am currently working on rust-lang#128081. In the current pipeline, we can get the following clone statements ([godbolt](https://rust.godbolt.org/z/931316fhP)): ``` bb0: { StorageLive(_2); _2 = ((*_1).0: i32); StorageLive(_3); _3 = ((*_1).1: u64); _0 = Foo { a: move _2, b: move _3 }; StorageDead(_3); StorageDead(_2); return; } ``` Analyzing such statements will be simple and fast. We don't need to consider branches or some interfering statements. However, this requires us to run `InstSimplify`, `ReferencePropagation`, and `SimplifyCFG` at least once. I can introduce a new pass, but I think the best place for it would be within `InstSimplify`. I put `InstSimplify` before `Inline`, which takes some of the burden away from `Inline`. r? `@saethlin`
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (3689ca8): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)Results (primary 0.3%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResults (primary -3.5%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResults (primary -0.1%, secondary -0.6%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 770.829s -> 772.303s (0.19%) |
_4 = _14; | ||
_3 = _4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to update the pass name in tests/mir-opt/dataflow-const-prop/slice_len.rs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I also changed all name from InstSimplify-before-inline
to InstSimplify-after-simplifycfg
, which can reduce some changes.
8686a55
to
95b83d7
Compare
Details
I'm not sure what happened here. I added |
Execution time difference for
It looks like this has triggered more optimization analysis by LLVM. I'm looking into it. I'm not sure what happened with |
Simplify the canonical clone method and the copy-like forms to copy Fixes rust-lang#128081. Currently being blocked by rust-lang#128265. `@rustbot` label +S-blocked r? `@saethlin`
Perf reports like the above (no changes to LLVM itself, but build time changes primariliy in opt builds) are usually due to MIR inlining changes. This change probably reduces the size of some functions, which makes them inlined, which is sometimes good and sometimes bad. |
Yeah, that's the usual experience with these tests. They should probably have a blessing mechanism rather than manually updating the except list. |
This makes sense to me. There have been many changes to the binary size. I tried different |
While it would be educational to know exactly what's up with the perf changes, whatever you find doesn't change whether the existing changes here are an improvement. I suspect it might motivate additional improvement, like what happened with #123174. I've wanted a tool that can generate a summary of the MIR inlining changes between two builds, because that would really explain exactly what's up here. I've had little luck attempting to slap something together by doing textual analysis of If you want to squash the 3 commits together, you can do that. I'm happy either way. @bors delegate=DianQk |
I can notice locally that the LLVM backend (instruction selection) time has increased, so I'm thinking either the inline has added code or triggered some kind of slow matching pattern. |
95b83d7
to
ac1c81b
Compare
@bors r=saethlin |
It may make sense for the LLVM-like |
☔ The latest upstream changes (presumably #125443) made this pull request unmergeable. Please resolve the merge conflicts. |
ac1c81b
to
ae681c9
Compare
@bors r=saethlin |
Considering the increasing time in LLVM, it might be useful to summarize the changes in IR: # before
$ opt -passes=instcount --stats llir-a526d7ce45fd2284e0e7c7556ccba2425b9d25e5-ripgrep-13.0.0-Opt-Full --disable-output
64915 instcount - Number of Call insts
53656 instcount - Number of basic blocks
5650 instcount - Number of non-external functions
332469 instcount - Number of instructions (of all types)
# after
$ opt -passes=instcount --stats llir-3689ca8f298a3bf6116ce7aaacb123f88df258df-ripgrep-13.0.0-Opt-Full --disable-output
64614 instcount - Number of Call insts
53643 instcount - Number of basic blocks
5648 instcount - Number of non-external functions
331044 instcount - Number of instructions (of all types) I can see that the IR has become smaller (note: this might not be the IR I want to examine). |
☀️ Test successful - checks-actions |
Finished benchmarking commit (4db3d12): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDNext Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)Results (primary 0.5%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResults (primary 1.0%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResults (primary -0.1%, secondary -0.6%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 768.493s -> 770.467s (0.26%) |
Simplify the canonical clone method and the copy-like forms to copy Fixes rust-lang#128081. Currently being blocked by rust-lang#128265. `@rustbot` label +S-blocked r? `@saethlin`
Visiting for weekly rustc-perf triage
@rustbot label: +perf-regression-triaged |
I am currently working on #128081. In the current pipeline, we can get the following clone statements (godbolt):
Analyzing such statements will be simple and fast. We don't need to consider branches or some interfering statements. However, this requires us to run
InstSimplify
,ReferencePropagation
, andSimplifyCFG
at least once. I can introduce a new pass, but I think the best place for it would be withinInstSimplify
.I put
InstSimplify
beforeInline
, which takes some of the burden away fromInline
.r? @saethlin