-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generator optimization: Overlap locals that never have storage live at the same time #60187
Conversation
r? @davidtwco (rust_highfive has picked a reviewer for you, use r? to override) |
@bors try |
[WIP] Generator optimization: Overlap locals that never have storage live at the same time This depends on #59897 (which is currently failing). Only the commits from "Emit StorageDead for all locals" and on are relevant. The reason I'm opening this now is so we can get some perf numbers and discuss the approach.
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
☀️ Try build successful - checks-travis |
@rust-timer build faed5e2 |
Success: Queued faed5e2 with parent 004c549, comparison URL. |
Finished benchmarking try commit faed5e2 |
Benchmarking results do not look good. Next I'm trying an approach that doesn't create more basic blocks in the MIR. Instead it propagates If the benchmarks still do not look good, I'll try a smarter approach of assembling the |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
@bors try |
⌛ Trying commit 8bbe3449442738b1a545953a5bcf2a3ec20cdbd4 with merge 32434c7ac565769649a967e525da3d6dabfaca0a... |
☀️ Try build successful - checks-travis |
@rust-timer build 32434c7ac565769649a967e525da3d6dabfaca0a |
Success: Queued 32434c7ac565769649a967e525da3d6dabfaca0a with parent 4eff852, comparison URL. |
Finished benchmarking try commit 32434c7ac565769649a967e525da3d6dabfaca0a |
@tmandry Only changing |
Finished benchmarking try commit 32434c7ac565769649a967e525da3d6dabfaca0a |
7294f02
to
dcd5da8
Compare
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
☔ The latest upstream changes (presumably #60250) made this pull request unmergeable. Please resolve the merge conflicts. |
📌 Commit aeefbc4 has been approved by |
Generator optimization: Overlap locals that never have storage live at the same time The specific goal of this optimization is to optimize async fns which use `await!`. Notably, `await!` has an enclosing scope around the futures it awaits ([definition](/~https://github.com/rust-lang/rust/blob/08bfe16129b0621bc90184f8704523d4929695ef/src/libstd/macros.rs#L365-L381)), which we rely on to implement the optimization. More generally, the optimization allows overlapping the storage of some locals which are never storage-live at the same time. **We care about storage-liveness when computing the layout, because knowing a field is `StorageDead` is the only way to prove it will not be accessed, either directly or through a reference.** To determine whether we can overlap two locals in the generator layout, we look at whether they might *both* be `StorageLive` at any point in the MIR. We use the `MaybeStorageLive` dataflow analysis for this. We iterate over every location in the MIR, and build a bitset for each local of the locals it might potentially conflict with. Next, we assign every saved local to one or more variants. The variants correspond to suspension points, and we include the set of locals live across a given suspension point in the variant. (Note that we use liveness instead of storage-liveness here; this ensures that the local has actually been initialized in each variant it has been included in. If the local is not live across a suspension point, then it doesn't need to be included in that variant.). It's important to note that the variants are a "view" into our layout. For the layout computation, we use a simplified approach. 1. Start with the set of locals assigned to only one variant. The rest are disqualified. 2. For each pair of locals which may conflict *and are not assigned to the same variant*, we pick one local to disqualify from overlapping. Disqualified locals go into a non-overlapping "prefix" at the beginning of our layout. This means they always have space reserved for them. All the locals that are allowed to overlap in each variant are then laid out after this prefix, in the "overlap zone". So, if A and B were disqualified, and X, Y, and Z were all eligible for overlap, our generator might look something like this: You can think of a generator as an enum, where some fields are shared between variants. e.g. ```rust enum Generator { Unresumed, Poisoned, Returned, Suspend0(A, B, X), Suspend1(B), Suspend2(A, Y, Z), } ``` where every mention of `A` and `B` refer to the same field, which does not move when changing variants. Note that `A` and `B` would automatically be sent to the prefix in this example. Assuming that `X` is never `StorageLive` at the same time as either `Y` or `Z`, it would be allowed to overlap with them. Note that if two locals (`Y` and `Z` in this case) are assigned to the same variant in our generator, their memory would never overlap in the layout. Thus they can both be eligible for the overlapping section, even if they are storage-live at the same time. --- Depends on: - [x] #59897 Multi-variant layouts for generators - [x] #60840 Preserve local scopes in generator MIR - [x] #61373 Emit StorageDead along unwind paths for generators Before merging: - [x] ~Wrap the types of all generator fields in `MaybeUninitialized` in layout::ty::field~ (opened #60889) - [x] Make PR description more complete (e.g. explain why storage liveness is important and why we have to check every location) - [x] Clean up TODO - [x] Fix the layout code to enforce that the same field never moves around in the generator - [x] Add tests for async/await - [x] ~Reduce # bits we store by half, since the conflict relation is symmetric~ (note: decided not to do this, for simplicity) - [x] Store liveness information for each yield point in our `GeneratorLayout`, that way we can emit more useful debuginfo AND tell miri which fields are definitely initialized for a given variant (see discussion at #59897 (comment))
…ddyb Generator optimization: Overlap locals that never have storage live at the same time The specific goal of this optimization is to optimize async fns which use `await!`. Notably, `await!` has an enclosing scope around the futures it awaits ([definition](/~https://github.com/rust-lang/rust/blob/08bfe16129b0621bc90184f8704523d4929695ef/src/libstd/macros.rs#L365-L381)), which we rely on to implement the optimization. More generally, the optimization allows overlapping the storage of some locals which are never storage-live at the same time. **We care about storage-liveness when computing the layout, because knowing a field is `StorageDead` is the only way to prove it will not be accessed, either directly or through a reference.** To determine whether we can overlap two locals in the generator layout, we look at whether they might *both* be `StorageLive` at any point in the MIR. We use the `MaybeStorageLive` dataflow analysis for this. We iterate over every location in the MIR, and build a bitset for each local of the locals it might potentially conflict with. Next, we assign every saved local to one or more variants. The variants correspond to suspension points, and we include the set of locals live across a given suspension point in the variant. (Note that we use liveness instead of storage-liveness here; this ensures that the local has actually been initialized in each variant it has been included in. If the local is not live across a suspension point, then it doesn't need to be included in that variant.). It's important to note that the variants are a "view" into our layout. For the layout computation, we use a simplified approach. 1. Start with the set of locals assigned to only one variant. The rest are disqualified. 2. For each pair of locals which may conflict *and are not assigned to the same variant*, we pick one local to disqualify from overlapping. Disqualified locals go into a non-overlapping "prefix" at the beginning of our layout. This means they always have space reserved for them. All the locals that are allowed to overlap in each variant are then laid out after this prefix, in the "overlap zone". So, if A and B were disqualified, and X, Y, and Z were all eligible for overlap, our generator might look something like this: You can think of a generator as an enum, where some fields are shared between variants. e.g. ```rust enum Generator { Unresumed, Poisoned, Returned, Suspend0(A, B, X), Suspend1(B), Suspend2(A, Y, Z), } ``` where every mention of `A` and `B` refer to the same field, which does not move when changing variants. Note that `A` and `B` would automatically be sent to the prefix in this example. Assuming that `X` is never `StorageLive` at the same time as either `Y` or `Z`, it would be allowed to overlap with them. Note that if two locals (`Y` and `Z` in this case) are assigned to the same variant in our generator, their memory would never overlap in the layout. Thus they can both be eligible for the overlapping section, even if they are storage-live at the same time. --- Depends on: - [x] rust-lang#59897 Multi-variant layouts for generators - [x] rust-lang#60840 Preserve local scopes in generator MIR - [x] rust-lang#61373 Emit StorageDead along unwind paths for generators Before merging: - [x] ~Wrap the types of all generator fields in `MaybeUninitialized` in layout::ty::field~ (opened rust-lang#60889) - [x] Make PR description more complete (e.g. explain why storage liveness is important and why we have to check every location) - [x] Clean up TODO - [x] Fix the layout code to enforce that the same field never moves around in the generator - [x] Add tests for async/await - [x] ~Reduce # bits we store by half, since the conflict relation is symmetric~ (note: decided not to do this, for simplicity) - [x] Store liveness information for each yield point in our `GeneratorLayout`, that way we can emit more useful debuginfo AND tell miri which fields are definitely initialized for a given variant (see discussion at rust-lang#59897 (comment))
☀️ Test successful - checks-travis, status-appveyor |
Rollup of 9 pull requests Successful merges: - #60187 (Generator optimization: Overlap locals that never have storage live at the same time) - #61348 (Implement Clone::clone_from for Option and Result) - #61568 (Use Symbol, Span in libfmt_macros) - #61632 (ci: Collect CPU usage statistics on Azure) - #61654 (use pattern matching for slices destructuring) - #61671 (implement nth_back for Range(Inclusive)) - #61688 (is_fp and is_floating_point do the same thing, remove the former) - #61705 (Pass cflags rather than cxxflags to LLVM as CMAKE_C_FLAGS) - #61734 (Migrate rust-by-example to MdBook2) Failed merges: r? @ghost
rustc: correctly transform memory_index mappings for generators. Fixes #61793, closes #62011 (previous attempt at fixing #61793). During #60187, I made the mistake of suggesting that the (re-)computation of `memory_index` in `ty::layout`, after generator-specific logic split/recombined fields, be done off of the `offsets` of those fields (which needed to be computed anyway), as opposed to the `memory_index`. `memory_index` maps each field to its in-memory order index, which ranges over the same `0..n` values as the fields themselves, making it a bijective mapping, and more specifically a permutation (indeed, it's the permutation resulting from field reordering optimizations). Each field has an unique "memory index", meaning a sort based on them, even an unstable one, will not put them in the wrong order. But offsets don't have that property, because of ZSTs (which do not increase the offset), so sorting based on the offset of fields alone can (and did) result in wrong orders. Instead of going back to sorting based on (slices/subsets of) `memory_index`, or special-casing ZSTs to make sorting based on offsets produce the right results (presumably), as #62011 does, I opted to drop sorting altogether and focus on `O(n)` operations involving *permutations*: * a permutation is easily inverted (see the `invert_mapping` `fn`) * an `inverse_memory_index` was already employed in other parts of the `ty::layout` code (that is, a mapping from memory order to field indices) * inverting twice produces the original permutation, so you can invert, modify, and invert again, if it's easier to modify the inverse mapping than the direct one * you can modify/remove elements in a permutation, as long as the result remains dense (i.e. using every integer in `0..len`, without gaps) * for splitting a `0..n` permutation into disjoint `0..x` and `x..n` ranges, you can pick the elements based on a `i < x` / `i >= x` predicate, and for the latter, also subtract `x` to compact the range to `0..n-x` * in the general case, for taking an arbitrary subset of the permutation, you need a renumbering from that subset to a dense `0..subset.len()` - but notably, this is still `O(n)`! * you can merge permutations, as long as the result remains disjoint (i.e. each element is unique) * for concatenating two `0..n` and `0..m` permutations, you can renumber the elements in the latter to `n..n+m` * some of these operations can be combined, and an inverse mapping (be it a permutation or not) can still be used instead of a forward one by changing the "domain" of the loop performing the operation I wish I had a nicer / more mathematical description of the recombinations involved, but my focus was to fix the bug (in a way which preserves information more directly than sorting would), so I may have missed potential changes in the surrounding generator layout code, that would make this all more straight-forward. r? @tmandry
I tested the generator optimizations in rust-lang#60187 and rust-lang#61922 on the Fuchsia build, and noticed that some small generators (about 8% of the async fns in our build) increased in size slightly. This is because in rust-lang#60187 we split the fields into two groups, a "prefix" non-overlap region and an overlap region, and lay them out separately. This can introduce unnecessary padding bytes between the two groups. In every single case in the Fuchsia build, it was due to there being only a single variant being used in the overlap region. This means that we aren't doing any overlapping, period. So it's better to combine the two regions into one and lay out all the fields at once, which is what this change does.
…ssions, r=cramertj Fix generator size regressions due to optimization I tested the generator optimizations in rust-lang#60187 and rust-lang#61922 on the Fuchsia build, and noticed that some small generators (about 8% of the async fns in our build) increased in size slightly. This is because in rust-lang#60187 we split the fields into two groups, a "prefix" non-overlap region and an overlap region, and lay them out separately. This can introduce unnecessary padding bytes between the two groups. In every single case in the Fuchsia build, it was due to there being only a single variant being used in the overlap region. This means that we aren't doing any overlapping, period. So it's better to combine the two regions into one and lay out all the fields at once, which is what this change does. r? @cramertj cc @eddyb @Zoxc
The specific goal of this optimization is to optimize async fns which use
await!
. Notably,await!
has an enclosing scope around the futures it awaits (definition), which we rely on to implement the optimization.More generally, the optimization allows overlapping the storage of some locals which are never storage-live at the same time. We care about storage-liveness when computing the layout, because knowing a field is
StorageDead
is the only way to prove it will not be accessed, either directly or through a reference.To determine whether we can overlap two locals in the generator layout, we look at whether they might both be
StorageLive
at any point in the MIR. We use theMaybeStorageLive
dataflow analysis for this. We iterate over every location in the MIR, and build a bitset for each local of the locals it might potentially conflict with.Next, we assign every saved local to one or more variants. The variants correspond to suspension points, and we include the set of locals live across a given suspension point in the variant. (Note that we use liveness instead of storage-liveness here; this ensures that the local has actually been initialized in each variant it has been included in. If the local is not live across a suspension point, then it doesn't need to be included in that variant.). It's important to note that the variants are a "view" into our layout.
For the layout computation, we use a simplified approach.
Disqualified locals go into a non-overlapping "prefix" at the beginning of our layout. This means they always have space reserved for them. All the locals that are allowed to overlap in each variant are then laid out after this prefix, in the "overlap zone".
So, if A and B were disqualified, and X, Y, and Z were all eligible for overlap, our generator might look something like this:
You can think of a generator as an enum, where some fields are shared between variants. e.g.
where every mention of
A
andB
refer to the same field, which does not move when changing variants. Note thatA
andB
would automatically be sent to the prefix in this example. Assuming thatX
is neverStorageLive
at the same time as eitherY
orZ
, it would be allowed to overlap with them.Note that if two locals (
Y
andZ
in this case) are assigned to the same variant in our generator, their memory would never overlap in the layout. Thus they can both be eligible for the overlapping section, even if they are storage-live at the same time.Depends on:
Before merging:
Wrap the types of all generator fields in(opened Don't ignore generator fields in miri #60889)MaybeUninitialized
in layout::ty::fieldReduce # bits we store by half, since the conflict relation is symmetric(note: decided not to do this, for simplicity)GeneratorLayout
, that way we can emit more useful debuginfo AND tell miri which fields are definitely initialized for a given variant (see discussion at Multi-variant layouts for generators #59897 (comment))