[Experiment] Inline "is on heap?" checks for checked write barriers #111561

EgorBo · 2025-01-18T00:13:21Z

Let's see if it works out

MyLargeStruct Largestruct(MyLargeStruct ms) => ms;

public struct MyLargeStruct
{
    public object Fld1;
    public object Fld2;
    public object Fld3;
    public object Fld4;
    public object Fld5;
    public object Fld6;
    public object Fld7;
    public object Fld8;
}

Current codegen:

; Method Benchmark:Largestruct(MyLargeStruct):MyLargeStruct (FullOpts)
G_M19878_IG01:
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
            mov     x1, x0
G_M19878_IG02:
            movz    x0, #0x5758      ;; code for CORINFO_HELP_BULK_WRITEBARRIER
            movk    x0, #0x6530 LSL #16
            movk    x0, #0x7FFA LSL #32
            ldr     x3, [x0]
            ldrsb   wzr, [x8]
            mov     x0, x8
            mov     x2, #64
            blr     x3 ;; CORINFO_HELP_BULK_WRITEBARRIER
G_M19878_IG03:
            ldp     fp, lr, [sp], #0x10
            ret     lr
; Total bytes of code: 52

New codegen:

; Method Benchmark:Largestruct(MyLargeStruct):MyLargeStruct (FullOpts)
G_M19878_IG01:
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
            mov     x1, x0
G_M19878_IG02:
            movz    x0, #0x5788
            movk    x0, #0xC993 LSL #16
            movk    x0, #0x7FFA LSL #32
            ldp     x2, x0, [x0]
            cmp     x8, x2
            ccmp    x8, x0, z, ge    ;;; is retBuf on GC heap? 
            blt     G_M19878_IG05
G_M19878_IG03:
            ldp     q16, q17, [x1]  ;;; it is on stack - do copy with 4 SIMD stores (stp)
            stp     q16, q17, [x8]
            ldp     q16, q17, [x1, #0x20]
            stp     q16, q17, [x8, #0x20]
G_M19878_IG04:
            ldp     fp, lr, [sp], #0x10
            ret     lr
G_M19878_IG05:  ;; is on heap
            movz    x0, #0x5758 ;; code for CORINFO_HELP_BULK_WRITEBARRIER
            movk    x0, #0x6A63 LSL #16
            movk    x0, #0x7FFA LSL #32
            ldr     x3, [x0]
            ldrsb   wzr, [x8]
            mov     x0, x8
            mov     x2, #64
            blr     x3 ;; CORINFO_HELP_BULK_WRITEBARRIER  ;; we might want a new helper here
            b       G_M19878_IG04
; Total bytes of code: 100

dotnet-policy-service · 2025-01-18T00:14:04Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

EgorBo · 2025-01-18T13:41:06Z

@EgorBot -azure_ampere -azure_cobalt100 -profiler

using System.Runtime.CompilerServices;
using BenchmarkDotNet.Attributes;

public class Benchmark
{
    private MyStruct _heap1;
    private MyLargeStruct _heap2;

    [Benchmark]
    public void ReturnBufferWB_Smallstruct_stack() => Smallstruct(default);

    [Benchmark]
    public void ReturnBufferWB_Smallstruct_heap() => _heap1 = Smallstruct(default);

    [Benchmark]
    public void ReturnBufferWB_Largestruct_stack() => Largestruct(default);

    [Benchmark]
    public void ReturnBufferWB_Largestruct_heap() => _heap2 = Largestruct(default);

    [MethodImpl(MethodImplOptions.NoInlining)]
    static MyStruct Smallstruct(MyStruct ms) => ms;

    [MethodImpl(MethodImplOptions.NoInlining)]
    static MyLargeStruct Largestruct(MyLargeStruct ms) => ms;
}

public struct MyStruct
{
    public object Fld1;
    public object Fld2;
    public object Fld3;
}

public struct MyLargeStruct
{
    public object Fld1;
    public object Fld2;
    public object Fld3;
    public object Fld4;
    public object Fld5;
    public object Fld6;
    public object Fld7;
    public object Fld8;
}

jkotas · 2025-01-18T15:02:12Z

            ldp     x2, x0, [x0]
            cmp     x8, x2
            ccmp    x8, x0, z, ge    ;;; is retBuf on GC heap?

If GC is triggered right after ldp x2, x0, [x0], expands the heap and moves the object into newly expanded part of the heap, we will compare the reference against a stale GC heap limit and things will go wrong there. The entire expansion may need to be marked as no-gc region to avoid this problem.

NinoFloris · 2025-01-18T17:03:10Z

Size diffs won't be nice, but preliminary arm64 perf results look very encouraging?

When I brought up return barrier cost I had assumed keeping returns unmanaged would help but it's more than I expected. (and there's still a heap check in here!)

Flipping the trade-off would help with the code size issue, assuming the copy mostly trades out with the removal of the write barrier call. Though I understand if such a broad change is just too involved or risky. This PR is nice in any case, as I have also seen things like try patterns show up in my profiling.

Thanks for picking this up @EgorBo

EgorBo · 2025-01-18T17:41:42Z

If GC is triggered right after ldp x2, x0, [x0], expands the heap and moves the object into newly expanded part of the heap, we will compare the reference against a stale GC heap limit and things will go wrong there. The entire expansion may need to be marked as no-gc region to avoid this problem.

Yep, I had it in mind, this expansion happens late enough to be able to arrange nogc, but I wonder if it's possible for the GC boundaries to be just constant? Like GC reserves a huge memory region and the boundaries represent reserved, not comitted? thus, we can bake them as constant

When I brought up return barrier cost I had assumed keeping returns unmanaged would help but it's more than I expected.

It's important to keep in mind that struct copies for large blocks aren't cheap even without write barriers, so an extra struct copy might ruin the idea

Overall, I am not planning to continue for now, I was mostly interested in perf numbers. I think we need something realistic to measure impact of it on. Also, @AndyAyersMS mentioned that at some point we might start emitting more (if not all) checked barriers due to IPO escape analsysis, so this might help there as well

jkotas · 2025-01-18T18:00:52Z

I wonder if it's possible for the GC boundaries to be just constant?

GC does that in some modes today. We would want to add information about this to WriteBarrierParameters.

NinoFloris · 2025-01-18T18:10:29Z

It's important to keep in mind that struct copies for large blocks aren't cheap even without write barriers, so an extra struct copy might ruin the idea

Yes larger structs may get hit by this, though I'm not sure how common those are, they're advised against either way. The perf numbers show most gains for the smaller structs, this is aligns with my own investigations.

Just to reiterate, it's frustrating to have managed struct hot paths today, if not *everything* can be inlined perf reduces significantly. It's not what you expect defining something as simple as (unmanaged header, ROSeq buffer). Using a pooled class to avoid these return costs is not always possible, and can easily end up being more expensive due to the actual heap barriers you now incur to update the fields.

EgorBo added 2 commits January 18, 2025 00:41

inline wb

0a8246f

resolve conflicts

fa07550

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 18, 2025

dotnet-policy-service bot assigned EgorBo Jan 18, 2025

EgorBot mentioned this pull request Jan 18, 2025

EgorBot for EgorBo in #111561 EgorBot/runtime-utils#232

Closed

fix conflicts

147eac4

EgorBot mentioned this pull request Jan 18, 2025

EgorBot for EgorBo in #111561 EgorBot/runtime-utils#233

Closed

small opt

69aec85

EgorBot mentioned this pull request Jan 18, 2025

EgorBot for EgorBo in #111561 EgorBot/runtime-utils#234

Closed

optimization

d1b5491

EgorBot mentioned this pull request Jan 18, 2025

EgorBot for EgorBo in #111561 EgorBot/runtime-utils#235

Closed

set weights

9313784

build-analysis bot mentioned this pull request Jan 18, 2025

Error while retrieving client Settings for PipelineArtifact. HttpRequestException: nodename nor servname provided, or not known (vsblobprodcus3.vsblob.visualstudio.com:443) #96798

Open

test

71e08bf

This was referenced Jan 18, 2025

EgorBot for EgorBo in #111561 EgorBot/runtime-utils#236

Closed

EgorBot for EgorBo in #111561 EgorBot/runtime-utils#237

Closed

EgorBot mentioned this pull request Jan 18, 2025

EgorBot for EgorBo in #111561 EgorBot/runtime-utils#238

Open

test

22d9984

EgorBo mentioned this pull request Jan 18, 2025

Return buffer write barrier trade-off #111127

Open

EgorBo closed this Jan 18, 2025

This was referenced Jan 18, 2025

slow macOS - "##[error]The job running on agent Azure Pipelines 9 ran longer than the maximum time of 60 minutes." dotnet/dnceng#1883

Open

The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#3008

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Experiment] Inline "is on heap?" checks for checked write barriers #111561

[Experiment] Inline "is on heap?" checks for checked write barriers #111561

EgorBo commented Jan 18, 2025 •

edited

Loading

dotnet-policy-service bot commented Jan 18, 2025

EgorBo commented Jan 18, 2025

jkotas commented Jan 18, 2025

NinoFloris commented Jan 18, 2025

EgorBo commented Jan 18, 2025 •

edited

Loading

jkotas commented Jan 18, 2025

NinoFloris commented Jan 18, 2025

[Experiment] Inline "is on heap?" checks for checked write barriers #111561

[Experiment] Inline "is on heap?" checks for checked write barriers #111561

Conversation

EgorBo commented Jan 18, 2025 • edited Loading

dotnet-policy-service bot commented Jan 18, 2025

EgorBo commented Jan 18, 2025

jkotas commented Jan 18, 2025

NinoFloris commented Jan 18, 2025

EgorBo commented Jan 18, 2025 • edited Loading

jkotas commented Jan 18, 2025

NinoFloris commented Jan 18, 2025

EgorBo commented Jan 18, 2025 •

edited

Loading

EgorBo commented Jan 18, 2025 •

edited

Loading