Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[perf] cache type info for ParamEnv #123058

Merged
merged 2 commits into from
Apr 7, 2024
Merged

[perf] cache type info for ParamEnv #123058

merged 2 commits into from
Apr 7, 2024

Conversation

lukas-code
Copy link
Member

@lukas-code lukas-code commented Mar 25, 2024

This is an attempt to mitigate some of the perf regressions in #122553 (comment), but seems worth to test and land separately, since it is mostly unrelated to that PR.

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 25, 2024
Comment on lines 191 to 214
let data_ptr = ptr::addr_of!(self.skel.data).cast::<T>();
// SAFETY: `data_ptr` has the same provenance as `self` and can therefore
// access the `self.len` elements stored at `self.data`.
// Note that we specifically don't reborrow `&self.skel.data`, because that
// would give us a pointer with provenance over 0 bytes.
unsafe { slice::from_raw_parts(data_ptr, self.skel.len) }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was UB before.

@lqd
Copy link
Member

lqd commented Mar 25, 2024

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 25, 2024
@bors
Copy link
Contributor

bors commented Mar 25, 2024

⌛ Trying commit 4a99cdc with merge be2a772...

bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 25, 2024
[perf] cache type info for ParamEnv

This is an attempt to mitigate some of the perf regressions in rust-lang#122553 (comment), but seems worth to test and land separately.

r? `@ghost`
@bors
Copy link
Contributor

bors commented Mar 26, 2024

💥 Test timed out

@lukas-code
Copy link
Member Author

uuh idk what bors is on about, the artifact looks like it got built? Let's try this:

☀️ Try build successful - checks-actions
Build commit: be2a772 (be2a772ea7aa3f1114b45c8fb9dd2a2466a5c152)

@rust-timer

This comment has been minimized.

@lukas-code
Copy link
Member Author

well that worked i guess?

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (be2a772): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.4% [0.3%, 0.6%] 4
Regressions ❌
(secondary)
0.5% [0.5%, 0.5%] 3
Improvements ✅
(primary)
-1.7% [-5.0%, -0.2%] 20
Improvements ✅
(secondary)
-0.3% [-0.4%, -0.2%] 4
All ❌✅ (primary) -1.3% [-5.0%, 0.6%] 24

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.2% [0.6%, 1.5%] 4
Regressions ❌
(secondary)
2.7% [2.4%, 2.9%] 2
Improvements ✅
(primary)
-0.6% [-0.9%, -0.4%] 3
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.4% [-0.9%, 1.5%] 7

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.0% [1.0%, 1.0%] 1
Regressions ❌
(secondary)
5.6% [2.2%, 9.0%] 2
Improvements ✅
(primary)
-1.5% [-2.9%, -0.4%] 12
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -1.4% [-2.9%, 1.0%] 13

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 672.58s -> 671.55s (-0.15%)
Artifact size: 315.08 MiB -> 315.13 MiB (0.01%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Mar 26, 2024
@lukas-code lukas-code marked this pull request as ready for review March 26, 2024 13:16
@lukas-code
Copy link
Member Author

The regression in token-stream-stress looks like it's probably noise:

image

The regressions in bitmaps might be legit, but to me it seems like the improvements outweigh the regressions here.

r? @lcnr or compiler

@bors
Copy link
Contributor

bors commented Mar 26, 2024

☔ The latest upstream changes (presumably #123098) made this pull request unmergeable. Please resolve the merge conflicts.

@bors
Copy link
Contributor

bors commented Apr 2, 2024

☔ The latest upstream changes (presumably #123327) made this pull request unmergeable. Please resolve the merge conflicts.

#[repr(C)]
pub struct ListWithCachedTypeInfo<T> {
skel: ListWithCachedTypeInfoSkeleton<T>,
opaque: OpaqueListContents,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is that moved out of ListSkeleton?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because accessing a field with an extern type tail will always panic, because we don't know the align of the extern type and therefore the padding before the field. So basically if we just put List as the tail of ListWithCachedTypeInfo, then any attempt to access the list will panic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that does not seem right to me 🤔 it should be totally fine to access an unsized field which contains an extern type as long as you never try to do a place-to-value conversion. I don't think we should ever copy ListSkeleton to the stack. Which use of ListSkeleton causes issues if you keep the current setup of having OpaqueListContents as a field of it?

https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=a47ebf1e7a6ce3908b3ab3fcdd6ecc8d

Copy link
Member Author

@lukas-code lukas-code Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm apparently there is a special case if the field with an extern type tail is the only field. Which i guess makes sense since there can't be any padding in that case. If you change the outer type to have some extra data, your example will panic: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=92bafecde22b9fd72a3ddb9d9018bc68

I don't think we should ever copy ListSkeleton to the stack

I agree. If my code does that it's not intended.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added some comments to the *Skeleton types to hopefully clear this up a bit.

Relevant diff: /~https://github.com/rust-lang/rust/compare/c4f1f7aa586451154c669711abc9c4d9ad9b1075..fcc477fbd0871d8fab045bd7e9524172ab10e51c

Copy link
Contributor

@lcnr lcnr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r=me after explaining the changes in ty/list.rs to me 🤔

@lukas-code lukas-code force-pushed the clauses branch 2 times, most recently from c4f1f7a to 2cdccba Compare April 4, 2024 16:18
Copy link
Contributor

@lcnr lcnr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 the panic with extern types is annoying, but I understand the issue now.

Looking at this again, we now duplicate the unsafe code for List which is somewhat intricate and I am a bit worried that it ends up getting out of sync. Could we instead have a single type:

struct RawList<H, T> {
    header: H,
    length: usize,
    data: [T; 0],
    opaque: OpaqueListConten,
}

This should avoid both the "extern type alignment" issue and aloows us to remove the code duplication.

Sorry for not thinking of that during the first review round 😅

@lcnr
Copy link
Contributor

lcnr commented Apr 6, 2024

@bors r+ rollup=never

@bors
Copy link
Contributor

bors commented Apr 6, 2024

📌 Commit 94d61d8 has been approved by lcnr

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 6, 2024
@bors
Copy link
Contributor

bors commented Apr 7, 2024

⌛ Testing commit 94d61d8 with merge 087ae97...

@bors
Copy link
Contributor

bors commented Apr 7, 2024

☀️ Test successful - checks-actions
Approved by: lcnr
Pushing 087ae97 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Apr 7, 2024
@bors bors merged commit 087ae97 into rust-lang:master Apr 7, 2024
12 checks passed
@rustbot rustbot added this to the 1.79.0 milestone Apr 7, 2024
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (087ae97): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.9% [-4.9%, -0.2%] 47
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -0.9% [-4.9%, -0.2%] 47

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
1.0% [1.0%, 1.0%] 1
Improvements ✅
(primary)
-1.3% [-1.3%, -1.3%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -1.3% [-1.3%, -1.3%] 1

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-2.2% [-3.4%, -0.8%] 9
Improvements ✅
(secondary)
-2.8% [-2.9%, -2.7%] 2
All ❌✅ (primary) -2.2% [-3.4%, -0.8%] 9

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 669.732s -> 668.846s (-0.13%)
Artifact size: 318.22 MiB -> 318.48 MiB (0.08%)

@rustbot rustbot removed the perf-regression Performance regression. label Apr 7, 2024
@lukas-code lukas-code deleted the clauses branch April 7, 2024 08:57
@nnethercote
Copy link
Contributor

I stumbled across this today while looking at List. I see RawList has been added without much explanation in the comments. It seems to allow an arbitrary header to be included before the list length. How/why is this a speed win in some cases? It seems very strange to augment a generic container type with an arbitrary value of a different type.

@lukas-code
Copy link
Member Author

There are some functions in the trait solver that use the TypeVisitableExt methods on a ParamEnv to find out some property about the where clauses that are assumed to be true during trait solving.

For example here we disable caching here if the where clauses contain inference variables (which can happen due to canonicalization).

Or here we find out whether inference has happened on the where clauses by resolving the inference variables. This has a shortcut if there are no inference variables.

if this.infcx.resolve_vars_if_possible(goal) != goal {


Before this PR, the ParamEnv was represented roughly like this:

               ------------------------------------------------------------
Clause         | type flags | outer binder | stable hash | predicate kind |
               ------------------------------------------------------------
               ^
               |
               +------------+
                            |
               -------------------------------------
List<Clause>   | length | clause 1 | clause 2 | ... 
               -------------------------------------
                    ^
                    |
                    |
               --------------------------
ParamEnv       | pointer   | reveal bit |
               --------------------------

So to find out the type flags or outer binder of the whole ParamEnv we'd have to iterate over all of the clauses. For some crates, in particular diesel, the ParamEnvs can get rather large (order of magnitude 100..1000) and iterating over them several times for every trait that we have to check adds up.

The main thing this PR does is add ListWithCachedTypeInfo, which stores the type flags and outer binder directly in front of the List, and use that new type in ParamEnv:

               ------------------------------------------------------------
Clause         | type flags | outer binder | stable hash | predicate kind |
               ------------------------------------------------------------
               ^
               |
               +--------------------------------------------------------+
                                                                        |
                                 -----------------------------------------------------------------
ListWithCachedTypeInfo<Clause>   | type flags | outer binder | length | clause 1 | clause 2 | ... 
                                 -----------------------------------------------------------------
                                  ^
                                  |
                    +-------------+
                    |
               --------------------------
ParamEnv       | pointer   | reveal bit |
               --------------------------

This way, we can compute e.g. param_env.has_infer() in O(1) rather than O(n).

RawList only exists to share code between List<T> (= RawList<(), T>) and ListWithCachedTypeInfo<T> (= RawList<TypeInfo, T>).

@nnethercote
Copy link
Contributor

Thanks for the explanation. In this bit:

                                 -----------------------------------------------------------------
ListWithCachedTypeInfo<Clause>   | type flags | outer binder | length | clause 1 | clause 2 | ... 
                                 -----------------------------------------------------------------

are the type flags/outer binder computed as a combination of the same-named fields from all the clauses?

@lukas-code
Copy link
Member Author

are the type flags/outer binder computed as a combination of the same-named fields from all the clauses?

Exactly. The flags of the list is the bit-or of all the flags from the clauses and the outer binder of the list is the maximum of the outer binders of the clauses.

They are computed when the ListWithCachedTypeInfo<Clauses> is constructed here:

let flags = super::flags::FlagComputation::for_clauses(clauses);
InternedInSet(ListWithCachedTypeInfo::from_arena(
&*self.arena,
flags.into(),
clauses,
))

pub fn for_clauses(clauses: &[ty::Clause<'_>]) -> FlagComputation {
let mut result = FlagComputation::new();
for c in clauses {
result.add_flags(c.as_predicate().flags());
result.add_exclusive_binder(c.as_predicate().outer_exclusive_binder());
}
result
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants