-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#[cold] on match arms #120193
base: master
Are you sure you want to change the base?
#[cold] on match arms #120193
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @davidtwco (or someone else) soon. Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (
|
These commits modify the If this was unintentional then you should revert the changes before this PR is merged. This PR changes MIR cc @oli-obk, @RalfJung, @JakobDegen, @davidtwco, @celinval, @vakaras |
There are merge commits (commits with multiple parents) in your changes. We have a no merge policy so these commits will need to be removed for this pull request to be merged. You can start a rebase with the following commands:
The following commits are merge commits: |
This should be in T-lang. I'm not sure how I can change it. There is discussion: https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/Allow.20.23.5Bcold.5D.20on.20match.20and.20if.20arms |
This comment has been minimized.
This comment has been minimized.
I have opinions on the impl side, so I wanna review the impl once the lang design is complete |
some thoughts:
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
#[cold] on match arms ### Edit This should be in T-lang. I'm not sure how I can change it. There is discussion: https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/Allow.20.23.5Bcold.5D.20on.20match.20and.20if.20arms ### Summary Adds the possibility to use `#[cold]` attribute on match arms to hint the optimizer that the arm is unlikely to be taken. ### Motivation These hints are sometimes thought to help branch prediction, but the effect is probably marginal. Modern CPUs don't support hints on conditional branch instructions. They either have the current branch in the BTB (branch prediction buffer), or not, in which case the branch is predicted not to be taken. These hints are, however, helpful in letting the compiler know what is the fast path, so it can be optimized at the expense of the slow path. `grep`-ing the LLVM code for BlockFrequencyInfo and BranchProbabilityInfo shows that these hints are used at many places in the optimizer. Such as: - block placement - improve locality by making the fast path compact and move everything else out of the way - inlining, loop unrolling - these optimizations can be less aggressive on the cold path therefore reducing code size - register allocation - preferably keep in registers the data needed on the fast path ### History RFC 1131 ( rust-lang#26179 ) added `likely` and `unlikely` intrinsics, which get converted to `llvm.expect.i8`. However this LLVM instruction is fragile and may get removed by some optimization passes. The problems with the intrinsics have been reported several times: rust-lang#96276 , rust-lang#96275 , rust-lang#88767 ### Other languages Clang and GCC C++ compilers provide `__builtin_expect`. Since C++20, it is also possible to use `[[likely]]` and `[[unlikely]]` attributes. Use: ``` if (__builtin_expect(condition, false)) { ... this branch is UNlikely ... } if (condition) [[likely]] { ... this branch is likely... } ``` Note that while clang provides `__builtin_expect`, it does not convert it to `llvm.expect.i8`. Instead, it looks at the surrounding code and if there is a condition, emits branch weight metadata for conditional branches. ### Design Implementing `likely`/`unlikely` type of functions properly to emit branch weights would add significant complexity to the compiler. Additionally, these functions are not easy to use with `match` arms. Replacing the functions with attributes is easier to implement and will also work with `match`. A question remains whether these attributes should be named `likely`/`unlikely` as in C++, or if we could reuse the already existing `#[cold]` attribute. `#[cold]` has the same meaning as `unlikely`, i.e., marking the slow path, but it can currently only be used on entire functions. I personally prefer `#[cold]` because it already exists in Rust and is a short word that looks better in code. It has one disadvantage though. This code: ``` if cond #[likely] { ... } ``` becomes: ``` if cond { ... } #[cold] { ... empty cold branch ... } ``` In this PR, I implemented the possibility to add `#[cold]` attribute on match arms. Use is as follows: ``` match x { #[cold] true => { ... } // the true arm is UNlikely _ => { ... } // the false arm is likely } ``` ### Limitations The implementation only works on bool, or integers with single value arm and an otherwise arm. Extending it to other types and to `if` statements should not be too difficult.
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
@oli-obk Thank you for comments!
I will look how the rust tests are done. What I could do is create code like this:
Then generate llvm file and it should contain something like
looking at the weights, %bb1 should contain call to hot_function(), %bb2 should contain call to cold_function().
I like the idea of using just an index. But I need to think what would be the complexity of the shuffling and if it is worth it, because the vector only allocates memory if cold branches are used.
If the performance tests find a regression, is there a way I can run them on my own? I would try to identify what is the bottleneck. |
Finished benchmarking commit (4d4892d): comparison URL. Overall result: no relevant changes - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesThis benchmark run did not return any relevant results for this metric. Binary sizeResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 663.975s -> 662.782s (-0.18%) |
Some changes occurred to MIR optimizations cc @rust-lang/wg-mir-opt |
There are merge commits (commits with multiple parents) in your changes. We have a no merge policy so these commits will need to be removed for this pull request to be merged. You can start a rebase with the following commands:
The following commits are merge commits: |
On the impl side, you'll need to review mir opt passes that modify switches to update the additional information. At least simplify-cfg and unreachable-propagation, probably additional ones. Alternate idea to encode this in MIR: in source scope data. This avoids changing the syntax and keeping passes in sync. I have not looked in depth to verify if feasible. |
Another possible idea -- no idea if it's good or not -- would be to put the data in the instruction stream in a better way than it currently is It looks like today the bb0: {
_2 = likely(_1) -> [return: bb1, unwind unreachable];
}
bb1: {
switchInt(move _2) -> [0: bb3, otherwise: bb2];
} But it could maybe be a https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/syntax/enum.NonDivergingIntrinsic.html instead, like bb0: {
likely(_1, const 0);
switchInt(_1) -> [0: bb3, otherwise: bb2];
} which might be fine for the backend to track. |
I've been thinking about this and you are right that the change is rather invasive. Plus as @cjgillot points out, we may need to update passes to preserve the information. One of the workarounds I considered initially before starting the implementation was this:
The idea being that if the optimizer notices a call to Unfortunately, it didn't work. But if we wanted to go in this direction, the change may be very localized. I could create an optimizer pass that looks for basic blocks that call cold functions, then emits weights for the conditional branches going to the cold basic blocks. If the pass runs sufficiently late in the compilation, we would not need to change any other passes. |
@scottmcm I will check if this can make |
@x17jiri I agree that emitting the branch metadata is a better approach, but I was thinking that you could do that with something like BB-local state in cg_llvm, or even saying that it only works when the statement(s) about the branch expectations are at the end of the block. Basically I'd hope it would be an easier version of the "is the target block cold?" since it would be in the caller block, and thus optimistically easier than looking at the callee block. |
@scottmcm I'll give that some thought. I looked into why the likely intrinsic is in a separate BB and it is because in MIR, any function call is BB terminator. The |
@oli-obk What is your opinion on this? b24a73e
Edit: The second part of the pass is not correct. I will update it |
@scottmcm In the simple tests I tried, the |
That's right. A few dozen of those calls do get lowered to simpler things in /~https://github.com/rust-lang/rust/blob/master/compiler/rustc_mir_transform/src/lower_intrinsics.rs |
I think it basically does need to be so, because the MIR building would always set up the MIR that way. If for some reason it's not:
|
@scottmcm I'll implement the change and if it seems reasonably reliable, I'll probably open a new PR for that. |
@x17jiri Sounds good! Just for clarity, though: I'm not on compiler, just proposing ideas. Feel free to push back or use another approach if you think something else is better, and if compiler folks like oli say to do something specific, you should probably listen to them instead of me 🙃 |
@scottmcm Sure. I don't yet have too much experience with the code and sometimes I tend to underestimate the size of a change. So I may push back later :-) |
Marking this as blocked on #120370 |
This should be unblocked now that #120370 is merged. With how #120370 evolved, the following code suggested by the current PR: match x {
#[cold] true => { ... }
_ => { ... }
} can now be written as either: match unlikely(x) {
true => { ... }
_ => { ... }
} or match x {
true => { cold_path(); ... }
_ => { ... }
} The question now is whether having attribute brings any additional value or not. If yes, I could reimplement this PR to put |
Edit
This should be in T-lang. I'm not sure how I can change it.
There is discussion: https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/Allow.20.23.5Bcold.5D.20on.20match.20and.20if.20arms
Summary
Adds the possibility to use
#[cold]
attribute on match arms to hint the optimizer that the arm is unlikely to be taken.Motivation
These hints are sometimes thought to help branch prediction, but the effect is probably marginal. Modern CPUs don't support hints on conditional branch instructions. They either have the current branch in the BTB (branch prediction buffer), or not, in which case the branch is predicted not to be taken.
These hints are, however, helpful in letting the compiler know what is the fast path, so it can be optimized at the expense of the slow path.
grep
-ing the LLVM code for BlockFrequencyInfo and BranchProbabilityInfo shows that these hints are used at many places in the optimizer. Such as:History
RFC 1131 ( #26179 ) added
likely
andunlikely
intrinsics, which get converted tollvm.expect.i8
. However this LLVM instruction is fragile and may get removed by some optimization passes. The problems with the intrinsics have been reported several times: #96276 , #96275 , #88767Other languages
Clang and GCC C++ compilers provide
__builtin_expect
. Since C++20, it is also possible to use[[likely]]
and[[unlikely]]
attributes.Use:
Note that while clang provides
__builtin_expect
, it does not convert it tollvm.expect.i8
. Instead, it looks at the surrounding code and if there is a condition, emits branch weight metadata for conditional branches.Design
Implementing
likely
/unlikely
type of functions properly to emit branch weights would add significant complexity to the compiler. Additionally, these functions are not easy to use withmatch
arms.Replacing the functions with attributes is easier to implement and will also work with
match
.A question remains whether these attributes should be named
likely
/unlikely
as in C++, or if we could reuse the already existing#[cold]
attribute.#[cold]
has the same meaning asunlikely
, i.e., marking the slow path, but it can currently only be used on entire functions.I personally prefer
#[cold]
because it already exists in Rust and is a short word that looks better in code. It has one disadvantage though.This code:
becomes:
In this PR, I implemented the possibility to add
#[cold]
attribute on match arms. Use is as follows:Limitations
The implementation only works on bool, or integers with single value arm and an otherwise arm. Extending it to other types and to
if
statements should not be too difficult.