#[cold] on match arms #120193

x17jiri · 2024-01-21T10:12:52Z

Edit

This should be in T-lang. I'm not sure how I can change it.

There is discussion: https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/Allow.20.23.5Bcold.5D.20on.20match.20and.20if.20arms

Summary

Adds the possibility to use #[cold] attribute on match arms to hint the optimizer that the arm is unlikely to be taken.

Motivation

These hints are sometimes thought to help branch prediction, but the effect is probably marginal. Modern CPUs don't support hints on conditional branch instructions. They either have the current branch in the BTB (branch prediction buffer), or not, in which case the branch is predicted not to be taken.

These hints are, however, helpful in letting the compiler know what is the fast path, so it can be optimized at the expense of the slow path.

grep-ing the LLVM code for BlockFrequencyInfo and BranchProbabilityInfo shows that these hints are used at many places in the optimizer. Such as:

block placement - improve locality by making the fast path compact and move everything else out of the way
inlining, loop unrolling - these optimizations can be less aggressive on the cold path therefore reducing code size
register allocation - preferably keep in registers the data needed on the fast path

History

RFC 1131 ( #26179 ) added likely and unlikely intrinsics, which get converted to llvm.expect.i8. However this LLVM instruction is fragile and may get removed by some optimization passes. The problems with the intrinsics have been reported several times: #96276 , #96275 , #88767

Other languages

Clang and GCC C++ compilers provide __builtin_expect. Since C++20, it is also possible to use [[likely]] and [[unlikely]] attributes.

Use:

if (__builtin_expect(condition, false)) { ... this branch is UNlikely ... }

if (condition) [[likely]] { ... this branch is likely... }

Note that while clang provides __builtin_expect, it does not convert it to llvm.expect.i8. Instead, it looks at the surrounding code and if there is a condition, emits branch weight metadata for conditional branches.

Design

Implementing likely/unlikely type of functions properly to emit branch weights would add significant complexity to the compiler. Additionally, these functions are not easy to use with match arms.

Replacing the functions with attributes is easier to implement and will also work with match.

A question remains whether these attributes should be named likely/unlikely as in C++, or if we could reuse the already existing #[cold] attribute. #[cold] has the same meaning as unlikely, i.e., marking the slow path, but it can currently only be used on entire functions.

I personally prefer #[cold] because it already exists in Rust and is a short word that looks better in code. It has one disadvantage though.
This code:

if cond #[likely] { ... }

becomes:

if cond { ... } #[cold] { ... empty cold branch ... }

In this PR, I implemented the possibility to add #[cold] attribute on match arms. Use is as follows:

match x {
    #[cold] true => { ... } // the true arm is UNlikely
    _ => { ... } // the false arm is likely
}

Limitations

The implementation only works on bool, or integers with single value arm and an otherwise arm. Extending it to other types and to if statements should not be too difficult.

rustbot · 2024-01-21T10:12:59Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @davidtwco (or someone else) soon.

Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (S-waiting-on-review and S-waiting-on-author) stays updated, invoking these commands when appropriate:

@rustbot author: the review is finished, PR author should check the comments and take action accordingly
@rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

rustbot · 2024-01-21T10:13:01Z

These commits modify the Cargo.lock file. Unintentional changes to Cargo.lock can be introduced when switching branches and rebasing PRs.

If this was unintentional then you should revert the changes before this PR is merged.
Otherwise, you can ignore this comment.

This PR changes MIR

cc @oli-obk, @RalfJung, @JakobDegen, @davidtwco, @celinval, @vakaras

rustbot · 2024-01-21T10:13:04Z

There are merge commits (commits with multiple parents) in your changes. We have a no merge policy so these commits will need to be removed for this pull request to be merged.

You can start a rebase with the following commands:

$ # rebase
$ git rebase -i master
$ # delete any merge commits in the editor that appears
$ git push --force-with-lease

The following commits are merge commits:

1024689

x17jiri · 2024-01-21T10:18:35Z

This should be in T-lang. I'm not sure how I can change it.

There is discussion: https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/Allow.20.23.5Bcold.5D.20on.20match.20and.20if.20arms

oli-obk · 2024-01-21T11:00:21Z

I have opinions on the impl side, so I wanna review the impl once the lang design is complete

traviscross · 2024-01-21T20:14:59Z

@rustbot labels +I-lang-nominated

Let's nominate so we can discuss this. @scottmcm has proposed we might want to charter this as an experiment in such a way that we first make this work again in the compiler and then work on syntax.

oli-obk · 2024-01-22T13:43:12Z

some thoughts:

You also need to implement the logic for codgen_gcc. It is totally fine to just not support it there and emit normal branches (and leave a FIXME comment for the codegen_gcc authors).
Is there some reasonable way we can test something here? How are we testing #[cold] functions beyond checking that they don't get inlined?
The ThinVec could also just be a single index, stating that all arms after that one are cold
- this will require shuffling arms around so that the cold ones are at the back
- may need an additional boolean to decide what to do here
we'll need to run perf to see if this affects performance when not used at all, because it touches a hot type in the MIR
I'd prefer if this change weren't so invasive (bubbling it through a bunch of places, needing to add special logic to backends that duplicates some of the existing branching logic, yes I'm aware these two concerns are contradictory). Not sure how to do it much better, I'll give it some more thought

@bors try @rust-timer queue

#[cold] on match arms ### Edit This should be in T-lang. I'm not sure how I can change it. There is discussion: https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/Allow.20.23.5Bcold.5D.20on.20match.20and.20if.20arms ### Summary Adds the possibility to use `#[cold]` attribute on match arms to hint the optimizer that the arm is unlikely to be taken. ### Motivation These hints are sometimes thought to help branch prediction, but the effect is probably marginal. Modern CPUs don't support hints on conditional branch instructions. They either have the current branch in the BTB (branch prediction buffer), or not, in which case the branch is predicted not to be taken. These hints are, however, helpful in letting the compiler know what is the fast path, so it can be optimized at the expense of the slow path. `grep`-ing the LLVM code for BlockFrequencyInfo and BranchProbabilityInfo shows that these hints are used at many places in the optimizer. Such as: - block placement - improve locality by making the fast path compact and move everything else out of the way - inlining, loop unrolling - these optimizations can be less aggressive on the cold path therefore reducing code size - register allocation - preferably keep in registers the data needed on the fast path ### History RFC 1131 ( rust-lang#26179 ) added `likely` and `unlikely` intrinsics, which get converted to `llvm.expect.i8`. However this LLVM instruction is fragile and may get removed by some optimization passes. The problems with the intrinsics have been reported several times: rust-lang#96276 , rust-lang#96275 , rust-lang#88767 ### Other languages Clang and GCC C++ compilers provide `__builtin_expect`. Since C++20, it is also possible to use `[[likely]]` and `[[unlikely]]` attributes. Use: ``` if (__builtin_expect(condition, false)) { ... this branch is UNlikely ... } if (condition) [[likely]] { ... this branch is likely... } ``` Note that while clang provides `__builtin_expect`, it does not convert it to `llvm.expect.i8`. Instead, it looks at the surrounding code and if there is a condition, emits branch weight metadata for conditional branches. ### Design Implementing `likely`/`unlikely` type of functions properly to emit branch weights would add significant complexity to the compiler. Additionally, these functions are not easy to use with `match` arms. Replacing the functions with attributes is easier to implement and will also work with `match`. A question remains whether these attributes should be named `likely`/`unlikely` as in C++, or if we could reuse the already existing `#[cold]` attribute. `#[cold]` has the same meaning as `unlikely`, i.e., marking the slow path, but it can currently only be used on entire functions. I personally prefer `#[cold]` because it already exists in Rust and is a short word that looks better in code. It has one disadvantage though. This code: ``` if cond #[likely] { ... } ``` becomes: ``` if cond { ... } #[cold] { ... empty cold branch ... } ``` In this PR, I implemented the possibility to add `#[cold]` attribute on match arms. Use is as follows: ``` match x { #[cold] true => { ... } // the true arm is UNlikely _ => { ... } // the false arm is likely } ``` ### Limitations The implementation only works on bool, or integers with single value arm and an otherwise arm. Extending it to other types and to `if` statements should not be too difficult.

bors · 2024-01-22T13:48:18Z

⌛ Trying commit 71b694c with merge 4d4892d...

bors · 2024-01-22T15:14:37Z

☀️ Try build successful - checks-actions
Build commit: 4d4892d (4d4892d72d55a8d6e12c6c7869c8737c5772ab9a)

x17jiri · 2024-01-22T15:52:23Z

@oli-obk Thank you for comments!

* Is there some reasonable way we can test something here? How are we testing `#[cold]` functions beyond checking that they don't get inlined?

I will look how the rust tests are done.

What I could do is create code like this:

match x {
    true => hot_function(),
    #[cold] false => cold_function(),
}

Then generate llvm file and it should contain something like

br i1 %x, label %bb1, label %bb2, !prof !3
...
!3 = !{!"branch_weights", i32 2000, i32 1}

looking at the weights, %bb1 should contain call to hot_function(), %bb2 should contain call to cold_function().

* The `ThinVec` could also just be a single index, stating that all arms after that one are cold
  
  * this will require shuffling arms around so that the cold ones are at the back
  * may need an additional boolean to decide what to do here

I like the idea of using just an index. But I need to think what would be the complexity of the shuffling and if it is worth it, because the vector only allocates memory if cold branches are used.

* we'll need to run perf to see if this affects performance when not used at all, because it touches a hot type in the MIR

If the performance tests find a regression, is there a way I can run them on my own? I would try to identify what is the bottleneck.

rust-timer · 2024-01-22T17:34:43Z

Finished benchmarking commit (4d4892d): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	2.6%	[1.6%, 3.6%]	2
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-	-	0

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.1%	[0.0%, 0.1%]	24
Regressions ❌ (secondary)	0.0%	[0.0%, 0.0%]	6
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.1%	[0.0%, 0.1%]	24

Bootstrap: 663.975s -> 662.782s (-0.18%)
Artifact size: 308.39 MiB -> 308.43 MiB (0.01%)

rustbot · 2024-01-22T22:59:45Z

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

rustbot · 2024-01-22T22:59:47Z

There are merge commits (commits with multiple parents) in your changes. We have a no merge policy so these commits will need to be removed for this pull request to be merged.

You can start a rebase with the following commands:

$ # rebase
$ git rebase -i master
$ # delete any merge commits in the editor that appears
$ git push --force-with-lease

The following commits are merge commits:

1024689

cjgillot · 2024-01-23T00:39:50Z

On the impl side, you'll need to review mir opt passes that modify switches to update the additional information. At least simplify-cfg and unreachable-propagation, probably additional ones.

Alternate idea to encode this in MIR: in source scope data. This avoids changing the syntax and keeping passes in sync. I have not looked in depth to verify if feasible.

scottmcm · 2024-01-23T08:28:01Z

Another possible idea -- no idea if it's good or not -- would be to put the data in the instruction stream in a better way than it currently is

It looks like today the likely is in a whole separate BB, which would plausibly be annoying for the codegen backend https://rust.godbolt.org/z/bzsWhdPdY

    bb0: {
        _2 = likely(_1) -> [return: bb1, unwind unreachable];
    }

    bb1: {
        switchInt(move _2) -> [0: bb3, otherwise: bb2];
    }

But it could maybe be a https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/syntax/enum.NonDivergingIntrinsic.html instead, like

    bb0: {
        likely(_1, const 0);
        switchInt(_1) -> [0: bb3, otherwise: bb2];
    }

which might be fine for the backend to track.

x17jiri · 2024-01-23T11:10:17Z

@oli-obk

* I'd prefer if this change weren't so invasive (bubbling it through a bunch of places, needing to add special logic to backends that duplicates some of the existing branching logic, yes I'm aware these two concerns are contradictory). Not sure how to do it much better, I'll give it some more thought

I've been thinking about this and you are right that the change is rather invasive. Plus as @cjgillot points out, we may need to update passes to preserve the information.

One of the workarounds I considered initially before starting the implementation was this:

#[cold]
fn cold() {}

pub fn test(x: bool) {
	match x {
	true => { hot_function(); }
	false => { cold(); cold_function(); }
	}
}

The idea being that if the optimizer notices a call to cold() function, it may pessimize the false arm.

Unfortunately, it didn't work.

But if we wanted to go in this direction, the change may be very localized. I could create an optimizer pass that looks for basic blocks that call cold functions, then emits weights for the conditional branches going to the cold basic blocks. If the pass runs sufficiently late in the compilation, we would not need to change any other passes.

x17jiri · 2024-01-23T11:19:49Z

@scottmcm I will check if this can make llvm.expect less fragile. I'm a little worried that it may again stop working silently, because it is not what clang uses.

scottmcm · 2024-01-23T11:35:07Z

@x17jiri I agree that emitting the branch metadata is a better approach, but I was thinking that you could do that with something like BB-local state in cg_llvm, or even saying that it only works when the statement(s) about the branch expectations are at the end of the block.

Basically I'd hope it would be an easier version of the "is the target block cold?" since it would be in the caller block, and thus optimistically easier than looking at the callee block.

x17jiri · 2024-01-24T11:36:05Z

@scottmcm I'll give that some thought.

I looked into why the likely intrinsic is in a separate BB and it is because in MIR, any function call is BB terminator. The NonDivergingIntrinsic may make it little easier on the codegen.

x17jiri · 2024-01-24T18:14:45Z

@oli-obk What is your opinion on this? b24a73e

for how it's used, see tests
I had to update bunch of files again, but only to default initialize the is_cold flag. Otherwise the change is very self-contained
the new function in the backend has default implementation, so it is opt-in for backends
for most functions, the new pass stops after the 'init' loop, i.e., it only traverses the list of basic blocks once
if it actually gets to the second part of the algorithm, it usually stops after a few iterations
I noticed that rust has its own inlining pass. Is it guaranteed that cold functions will not be inlined and that this will not change in the future? It would complicate things if the function got inlined before the pass is executed. But it would still be solvable with new intrinsic

Edit: The second part of the pass is not correct. I will update it

x17jiri · 2024-01-24T18:20:03Z

@scottmcm In the simple tests I tried, the likely is always right before the branch. But does it have to be so? If there were any instructions between the two, can I assume they don't change the probability?

scottmcm · 2024-01-24T19:14:14Z

and it is because in MIR, any function call is BB terminator

That's right. A few dozen of those calls do get lowered to simpler things in /~https://github.com/rust-lang/rust/blob/master/compiler/rustc_mir_transform/src/lower_intrinsics.rs

scottmcm · 2024-01-24T19:24:39Z

But does it have to be so?

I think it basically does need to be so, because the MIR building would always set up the MIR that way.

If for some reason it's not:

Commonly that would be because a MIR optimization collapsed the branch so it doesn't actually exist any more, and it's good that it's ignored since it's not that branch anyway.
Even if it's not that, this is always just a hint anyway so it's never a critical bug, just a QoI "this could be better" that we could then look into how much of an issue it would be.

x17jiri · 2024-01-25T11:40:30Z

@scottmcm I'll implement the change and if it seems reasonably reliable, I'll probably open a new PR for that.

scottmcm · 2024-01-25T11:44:35Z

@x17jiri Sounds good!

Just for clarity, though: I'm not on compiler, just proposing ideas. Feel free to push back or use another approach if you think something else is better, and if compiler folks like oli say to do something specific, you should probably listen to them instead of me 🙃

x17jiri · 2024-01-25T12:40:12Z

@scottmcm Sure. I don't yet have too much experience with the code and sometimes I tend to underestimate the size of a change. So I may push back later :-)

compiler/rustc_mir_build/src/build/matches/mod.rs

x17jiri · 2024-02-03T09:29:43Z

@oli-obk I added the comments. Note that I also opened a second PR to fix likely and unlikely ( #120370 ). If it is accepted, I would rebase and reuse the backend change.

Dylan-DPC · 2024-02-21T13:49:15Z

Marking this as blocked on #120370

x17jiri · 2024-11-19T05:06:50Z

This should be unblocked now that #120370 is merged.

With how #120370 evolved, the following code suggested by the current PR:

match x {
    #[cold] true => { ... }
    _ => { ... }
}

can now be written as either:

match unlikely(x) {
    true => { ... }
    _ => { ... }
}

or

match x {
    true => { cold_path(); ... }
    _ => { ... }
}

The question now is whether having attribute brings any additional value or not. If yes, I could reimplement this PR to put cold_path() as the first thing in arms marked with #[cold]. If not, we can probably close this.

x17jiri added 4 commits January 19, 2024 21:44

Initial implementation of #[cold] attribute on match arms

368ca13

Cargo.lock: forgotten file with dependencies

04e1f91

rebase

1024689

formatting changes to pass 'test tidy'

71b694c

rustbot assigned davidtwco Jan 21, 2024

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jan 21, 2024

rustbot added has-merge-commits PR has merge commits, merge with caution. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jan 21, 2024

This comment has been minimized.

Sign in to view

oli-obk self-assigned this Jan 21, 2024

oli-obk added T-lang Relevant to the language team, which will review and decide on the PR/issue. and removed T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jan 21, 2024

rustbot added the I-lang-nominated Nominated for discussion during a lang team meeting. label Jan 21, 2024

This comment has been minimized.

Sign in to view

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jan 22, 2024

This comment has been minimized.

Sign in to view

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jan 22, 2024

x17jiri added 2 commits January 22, 2024 23:51

removed redundant functions in the codegen

b8ffc96

formating change to pass 'test tidy'

3aef65d

initial tests

9d7d846

oli-obk requested changes Jan 29, 2024

View reviewed changes

compiler/rustc_mir_build/src/build/matches/mod.rs Outdated Show resolved Hide resolved

compiler/rustc_mir_build/src/build/matches/mod.rs Outdated Show resolved Hide resolved

compiler/rustc_mir_build/src/build/matches/mod.rs Outdated Show resolved Hide resolved

rustbot removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 29, 2024

added comments

e8b9ad5

typo

7472434

cjgillot mentioned this pull request Feb 18, 2024

Likely unlikely fix #120370

Merged

Dylan-DPC added S-blocked Status: Blocked on something else such as an RFC or other implementation work. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 21, 2024

tgross35 mentioned this pull request Dec 13, 2024

Reexport likely/unlikely in std::hint #133695

Merged

x17jiri mentioned this pull request Dec 21, 2024

Export cold_path() in std::hint rust-lang/libs-team#510

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

#[cold] on match arms #120193

#[cold] on match arms #120193

x17jiri commented Jan 21, 2024 •

edited

Loading

rustbot commented Jan 21, 2024

rustbot commented Jan 21, 2024

rustbot commented Jan 21, 2024

x17jiri commented Jan 21, 2024

This comment has been minimized.

oli-obk commented Jan 21, 2024

traviscross commented Jan 21, 2024

oli-obk commented Jan 22, 2024

This comment has been minimized.

bors commented Jan 22, 2024

bors commented Jan 22, 2024

This comment has been minimized.

x17jiri commented Jan 22, 2024

rust-timer commented Jan 22, 2024

rustbot commented Jan 22, 2024

rustbot commented Jan 22, 2024

cjgillot commented Jan 23, 2024

scottmcm commented Jan 23, 2024

x17jiri commented Jan 23, 2024

x17jiri commented Jan 23, 2024

scottmcm commented Jan 23, 2024

x17jiri commented Jan 24, 2024 •

edited

Loading

x17jiri commented Jan 24, 2024 •

edited

Loading

x17jiri commented Jan 24, 2024

scottmcm commented Jan 24, 2024

scottmcm commented Jan 24, 2024

x17jiri commented Jan 25, 2024

scottmcm commented Jan 25, 2024

x17jiri commented Jan 25, 2024

x17jiri commented Feb 3, 2024

Dylan-DPC commented Feb 21, 2024

x17jiri commented Nov 19, 2024

#[cold] on match arms #120193

Are you sure you want to change the base?

#[cold] on match arms #120193

Conversation

x17jiri commented Jan 21, 2024 • edited Loading

Edit

Summary

Motivation

History

Other languages

Design

Limitations

rustbot commented Jan 21, 2024

rustbot commented Jan 21, 2024

rustbot commented Jan 21, 2024

x17jiri commented Jan 21, 2024

This comment has been minimized.

oli-obk commented Jan 21, 2024

traviscross commented Jan 21, 2024

oli-obk commented Jan 22, 2024

This comment has been minimized.

bors commented Jan 22, 2024

bors commented Jan 22, 2024

This comment has been minimized.

x17jiri commented Jan 22, 2024

rust-timer commented Jan 22, 2024

Overall result: no relevant changes - no action needed

Instruction count

Max RSS (memory usage)

Cycles

Binary size

rustbot commented Jan 22, 2024

rustbot commented Jan 22, 2024

cjgillot commented Jan 23, 2024

scottmcm commented Jan 23, 2024

x17jiri commented Jan 23, 2024

x17jiri commented Jan 23, 2024

scottmcm commented Jan 23, 2024

x17jiri commented Jan 24, 2024 • edited Loading

x17jiri commented Jan 24, 2024 • edited Loading

x17jiri commented Jan 24, 2024

scottmcm commented Jan 24, 2024

scottmcm commented Jan 24, 2024

x17jiri commented Jan 25, 2024

scottmcm commented Jan 25, 2024

x17jiri commented Jan 25, 2024

x17jiri commented Feb 3, 2024

Dylan-DPC commented Feb 21, 2024

x17jiri commented Nov 19, 2024

x17jiri commented Jan 21, 2024 •

edited

Loading

x17jiri commented Jan 24, 2024 •

edited

Loading

x17jiri commented Jan 24, 2024 •

edited

Loading