
JIT: Unspill normalize-on-load variables using exact small type #90318

Merged · 8 commits · Aug 14, 2023

Conversation

@jakobbotsch (Member) commented Aug 10, 2023:

The existing logic would sometimes unspill using the type of the local
that is being unspilled. This type is often wider than the exact small
type in the LclVarDsc, since NOL locals are normally expanded as
CAST(<small type>, LCL_VAR<int>).

This causes problems since morph will in some cases avoid inserting
normalization for NOL locals when it has a subrange assertion available.
This optimization relies on the backend always ensuring that the local
will be normalized as part of unspilling and args homing.

Size-wise regressions are expected on xarch since the encoding of the
normalizing load is larger. However, as we have seen before, wide loads
can cause significant store-forwarding stalls with a large negative
impact on performance, so overall the small loads are expected to be a
perf benefit (in addition to the correctness fix).

Fix #90219

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Aug 10, 2023
@ghost ghost assigned jakobbotsch Aug 10, 2023
@ghost commented Aug 10, 2023:

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.


@jakobbotsch (Member, Author):

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, Fuzzlyn

@azure-pipelines:

Azure Pipelines successfully started running 3 pipeline(s).

@sultrifork left a comment:

LGTM

@jakobbotsch (Member, Author):

jitstress failures are #90144. I've opened #90376 for the libraries-jitstress failures, but there were some that look like infrastructure, so I will rerun it. I've opened #90377 for the Fuzzlyn failure.

@jakobbotsch (Member, Author):

/azp run runtime-coreclr libraries-jitstress

@azure-pipelines:

Azure Pipelines successfully started running 1 pipeline(s).

@jakobbotsch (Member, Author):

/azp run runtime-coreclr libraries-jitstress, Fuzzlyn

@azure-pipelines:

Azure Pipelines successfully started running 2 pipeline(s).

@jakobbotsch jakobbotsch marked this pull request as ready for review August 11, 2023 19:44
@jakobbotsch (Member, Author):

cc @dotnet/jit-contrib PTAL @BruceForstall

Diffs. Not quite as bad as the ones I displayed during the meeting (after fixing my bug), but still somewhat bad. As mentioned, though, this is expected to be a perf improvement rather than a regression (but let's see in the perf lab).

We could probably fix the size regression by not REX-prefixing movzx unnecessarily, but I don't think it's something we need to do for .NET 8. I will open a separate issue for it.

@BruceForstall (Member) commented Aug 11, 2023:

I was surprised to see a few examples in arm64 diffs of removing a register from byref GC reporting, but no codegen diff. The cases seemed benign since we don't need to report register pointers to the stack, but is it expected that this PR changes GC reporting?

Example:

@@ -436,14 +436,12 @@ G_M8490_IG31:        ; bbWeight=0, gcrefRegs=D800000 {x23 x24 x26 x27}, byrefReg
             blr     x2
             ; gcr arg pop 0
             ldr     x2, [fp, #0x18]	// [V22 tmp14]
-            ; byrRegs +[x2]
             ldr     w0, [x2, #0x04]
             cmp     w0, #24
             mov     x3, x2
             bls     G_M8490_IG05
 						;; size=68 bbWeight=0 PerfScore 0.00
 G_M8490_IG32:        ; bbWeight=0, gcrefRegs=D800000 {x23 x24 x26 x27}, byrefRegs=100000 {x20}, byref
-            ; byrRegs -[x2]
             mov     x0, x27
             ; gcrRegs +[x0]
             mov     w1, wzr

actually, this change exists on win-x64 as well.

@jakobbotsch (Member, Author):

The cases seemed benign since we don't need to report register pointers to the stack, but is it expected that this PR changes GC reporting?

Hmm no -- it happens because GetStackSlotHomeType uses the type from the LclVarDsc, while we used to use the type from the GenTreeLclVarCommon that we are unspilling because of. I don't think it will be a problem, but I don't think we need to take the risk for .NET 8, so let me special case this for byrefs/GC pointers to use the original behavior.

@BruceForstall (Member) commented Aug 11, 2023:

On win-x64, the (non-GC) diffs seem to be converting from mov DWORD to movzx BYTE or WORD. But on linux-arm, it seems like the opposite:

-            ldrb    r10, [sp+0x14]	// [V06 tmp3]
+            ldr     r10, [sp+0x14]	// [V06 tmp3]

Would that be expected?

(the diffs are not all this way; there are some ldr to ldrb or ldrh cases as well)

[Edit] looks like you enabled the new unspill code for TARGET_ARM where that target previously had different behavior.

// general, the type of 'lcl' does not have any relation to the
// type of 'varDsc':
//
// * For NOL locals it is wider under normal circumstances, where
A reviewer (Member) commented:

nit: spell out NOL the first time

Suggested change
// * For NOL locals it is wider under normal circumstances, where
// * For normalize-on-load (NOL) locals it is wider under normal circumstances, where

//
// * For NOL locals it is wider under normal circumstances, where
// morph has added a cast on top
// * For all locals it can be narrower narrower in some cases, e.g.
A reviewer (Member) commented:

Suggested change
// * For all locals it can be narrower narrower in some cases, e.g.
// * For all locals it can be narrower in some cases, e.g.

@jakobbotsch (Member, Author) commented Aug 12, 2023:

On win-x64, the (non-GC) diffs seem to be converting from mov DWORD to movzx BYTE or WORD. But on linux-arm, it seems like the opposite:

-            ldrb    r10, [sp+0x14]	// [V06 tmp3]
+            ldr     r10, [sp+0x14]	// [V06 tmp3]

Would that be expected?

(the diffs are not all this way; there are some ldr to ldrb or ldrh cases as well)

[Edit] looks like you enabled the new unspill code for TARGET_ARM where that target previously had different behavior.

It happens when unspilling normalize-on-store locals (the LCL_VAR will be the small type, but the stack slot is wide and normalized). I unified the logic because I do not think ARM32 needs or should be special cased here. But let me do that change for .NET 9 instead, and scope this change down to only normalize-on-load locals for the other architectures in the interest of reducing risk.

@jakobbotsch (Member, Author):

/azp run runtime-coreclr superpmi-diffs, runtime-coreclr superpmi-replay

@azure-pipelines:

Azure Pipelines successfully started running 2 pipeline(s).

@jakobbotsch jakobbotsch added this to the 8.0.0 milestone Aug 14, 2023
@BruceForstall (Member) left a comment:

LGTM

@jakobbotsch jakobbotsch merged commit 6d0e3e5 into dotnet:main Aug 14, 2023
@jakobbotsch jakobbotsch deleted the fix-90219 branch August 14, 2023 17:20
shushanhf added a commit to shushanhf/runtime that referenced this pull request Aug 17, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Sep 13, 2023
Successfully merging this pull request may close these issues.

JIT: Bad codegen on win-x86
3 participants