-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Unspill normalize-on-load variables using exact small type #90318
Conversation
The existing logic would sometimes unspill using the type of the local that is being unspilled. This type is often wider than the exact small type in the LclVarDsc, since NOL locals are normally expanded as CAST(<small type>, LCL_VAR<int>). This causes problems since morph will in some cases avoid inserting normalization for NOL locals when it has a subrange assertion available. This optimization relies on the backend always ensuring that the local will be normalized as part of unspilling and args homing. Size-wise regressions are expected on xarch since the encoding of the normalizing load is larger. However, as we have seen before, using wide loads can cause significant store-forwarding stalls which can have large negative impact on performance, so overall there is an expected perf benefit of using the small loads (in addition to the correctness fix). Fix dotnet#90219
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsThe existing logic would sometimes unspill using the type of the local This causes problems since morph will in some cases avoid inserting Size-wise regressions are expected on xarch since the encoding of the Fix #90219
|
/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, Fuzzlyn |
Azure Pipelines successfully started running 3 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/azp run runtime-coreclr libraries-jitstress |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run runtime-coreclr libraries-jitstress, Fuzzlyn |
Azure Pipelines successfully started running 2 pipeline(s). |
cc @dotnet/jit-contrib PTAL @BruceForstall Diffs. Not quite as bad as the ones I displayed during the meeting after fixing my bug, but still somewhat bad. But as mentioned this is not expected to be a perf regression, but rather an improvement (but let's see in the perf lab). We could probably fix the size regression by not rex prefixing |
I was surprised to see a few examples in arm64 diffs of removing a register from byref GC reporting, but no codegen diff. The cases seemed benign since we don't need to report register pointers to the stack, but is it expected that this PR changes GC reporting? Example:
actually, this change exists on win-x64 as well. |
Hmm no -- it happens because |
On win-x64, the (non-GC) diffs seem to be converting from
Would that be expected? (the diffs are not all this way; there are some [Edit] looks like you enabled the new unspill code for |
src/coreclr/jit/codegenlinear.cpp
Outdated
// general, the type of 'lcl' does not have any relation to the | ||
// type of 'varDsc': | ||
// | ||
// * For NOL locals it is wider under normal circumstances, where |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: spell out NOL the first time
// * For NOL locals it is wider under normal circumstances, where | |
// * For normalize-on-load (NOL) locals it is wider under normal circumstances, where |
src/coreclr/jit/codegenlinear.cpp
Outdated
// | ||
// * For NOL locals it is wider under normal circumstances, where | ||
// morph has added a cast on top | ||
// * For all locals it can be narrower narrower in some cases, e.g. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// * For all locals it can be narrower narrower in some cases, e.g. | |
// * For all locals it can be narrower in some cases, e.g. |
It happens when unspilling normalize-on-store locals (the LCL_VAR will be the small type, but the stack slot is wide and normalized). I unified the logic because I do not think ARM32 needs or should be special cased here. But let me do that change for .NET 9 instead, and scope this change down to only normalize-on-load locals for the other architectures in the interest of reducing risk. |
/azp run runtime-coreclr superpmi-diffs, runtime-coreclr superpmi-replay |
Azure Pipelines successfully started running 2 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
The existing logic would sometimes unspill using the type of the local
that is being unspilled. This type is often wider than the exact small
type in the
LclVarDsc
, since NOL locals are normally expanded asCAST(<small type>, LCL_VAR<int>)
.This causes problems since morph will in some cases avoid inserting
normalization for NOL locals when it has a subrange assertion available.
This optimization relies on the backend always ensuring that the local
will be normalized as part of unspilling and args homing.
Size-wise regressions are expected on xarch since the encoding of the
normalizing load is larger. However, as we have seen before, using wide
loads can cause significant store-forwarding stalls which can have large
negative impact on performance, so overall there is an expected perf
benefit of using the small loads (in addition to the correctness fix).
Fix #90219