StringBuilder: use Span.Fill in Append repeating char #86287

yesmey · 2023-05-16T00:40:10Z

StringBuilder.Append(char value, int repeatCount) currently use a loop to add each character in sequence.
We can instead use Span<T>.Fill that is better optimized for this.

Note:
Append(char value) was calling this method as a fallback to when it needed to allocate. But with the new changes, Append(char value, int repeatCount) might be inlined, therefor I added a specific AppendWithExpansion(char) with MethodImplOptions.NoInlining.

Benchmark code
Benchmark result:

BenchmarkDotNet=v0.13.5, OS=Windows 11 (10.0.22621.1778/22H2/2022Update/SunValley2)
AMD Ryzen 7 3700X, 1 CPU, 16 logical and 8 physical cores
.NET SDK=8.0.100-preview.6.23276.3
  [Host]     : .NET 8.0.0 (8.0.23.27214), X64 RyuJIT AVX2
  Job-WZCNRH : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-RFXJAL : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2

Method	Toolchain	amount	c	Mean	Ratio
AppendRepeated	main	1	a	2.169 ns	1.00
AppendRepeated	PR	1	a	2.623 ns	1.21

AppendRepeated	main	2	a	4.057 ns	1.00
AppendRepeated	PR	2	a	2.577 ns	0.64

AppendRepeated	main	4	a	7.857 ns	1.00
AppendRepeated	PR	4	a	3.035 ns	0.39

AppendRepeated	main	8	a	15.549 ns	1.00
AppendRepeated	PR	8	a	3.694 ns	0.24

AppendRepeated	main	16	a	31.002 ns	1.00
AppendRepeated	PR	16	a	3.979 ns	0.13

ghost · 2023-05-16T01:19:39Z

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

Issue Details

StringBuilder.Append(char value, int repeatCount) currently use a loop to add each character in sequence.
We can instead use Span<T>.Fill that is better optimized for this.

Note:
Append(char value) was calling this method as a fallback to when it needed to allocate. But with the new changes, Append(char value, int repeatCount) might be inlined, therefor I added a specific AppendWithExpansion(char) with MethodImplOptions.NoInlining.

Benchmark code
Benchmark result:

BenchmarkDotNet=v0.13.5, OS=Windows 11 (10.0.22621.1702/22H2/2022Update/SunValley2)
AMD Ryzen 7 3700X, 1 CPU, 16 logical and 8 physical cores
.NET SDK=8.0.100-preview.5.23255.2
  [Host]     : .NET 8.0.0 (8.0.23.25213), X64 RyuJIT AVX2
  Job-FJHCLO : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-DXTBEB : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2

Method	Toolchain	amount	c	Mean	Ratio
AppendRepeatedConstAmount	main	?	?	8.393 ns	1.00
AppendRepeatedConstAmount	PR	?	?	3.210 ns	0.38

AppendRepeatedConst	main	1	?	3.488 ns	1.00
AppendRepeatedConst	PR	1	?	2.884 ns	0.83

AppendRepeatedConst	main	2	?	5.465 ns	1.00
AppendRepeatedConst	PR	2	?	2.733 ns	0.50

AppendRepeatedConst	main	4	?	8.640 ns	1.00
AppendRepeatedConst	PR	4	?	3.410 ns	0.39

AppendRepeatedConst	main	8	?	15.649 ns	1.00
AppendRepeatedConst	PR	8	?	3.849 ns	0.25

AppendRepeatedConst	main	16	?	30.945 ns	1.00
AppendRepeatedConst	PR	16	?	4.028 ns	0.13

AppendRepeated	main	1	a	3.491 ns	1.00
AppendRepeated	PR	1	a	4.069 ns	1.17

AppendRepeated	main	2	a	5.345 ns	1.00
AppendRepeated	PR	2	a	3.730 ns	0.70

AppendRepeated	main	4	a	8.662 ns	1.00
AppendRepeated	PR	4	a	4.304 ns	0.50

AppendRepeated	main	8	a	14.983 ns	1.00
AppendRepeated	PR	8	a	4.807 ns	0.32

AppendRepeated	main	16	a	29.593 ns	1.00
AppendRepeated	PR	16	a	5.170 ns	0.17

Author:	yesmey
Assignees:	-
Labels:	`area-System.Runtime`, `community-contribution`
Milestone:	-

src/libraries/System.Private.CoreLib/src/System/Text/StringBuilder.cs

IDisposable · 2023-05-19T22:04:37Z

src/libraries/System.Private.CoreLib/src/System/Text/StringBuilder.cs

+            int firstLength = chunkChars.Length - chunkLength;
+            if (firstLength > 0)
+            {
+                chunkChars.AsSpan(chunkLength, firstLength).Fill(value);


Insert

Debug.Assert(firstLength < repeatCount, "We shouldn't be called if there was enough space for the entire run.");

src/libraries/System.Private.CoreLib/src/System/Text/StringBuilder.cs

adamsitnik

LGTM, thank you @yesmey not just for the contribution but also for providing benchmark numbers and their source code!

Would you be interested in contributing a benchmark for Append(char, int) to /~https://github.com/dotnet/performance/blob/main/src/benchmarks/micro/libraries/System.Text/Perf.StringBuilder.cs? Currently we have no benchmarks for it and our reporting system won't report any improvements once I merge your PR.

src/libraries/System.Private.CoreLib/src/System/Text/StringBuilder.cs

yesmey · 2023-07-12T09:43:30Z

@adamsitnik The regression in #88673 comes from marking the allocating path of Append(char) with MethodImplOptions.NoInlining. That benchmark in particular is allocating fairly often, and it makes sense the slow path is faster when it previously was being inlined more aggressively. This pattern is fairly common across the runtime, but if you prefer I can remove it and see if its better?

adamsitnik · 2023-07-17T08:44:41Z

I can remove it and see if its better?

@yesmey could you please give it a try?

I was also thinking about doing sth like this:

public StringBuilder Append(char value)
{
    int nextCharIndex = m_ChunkLength;
    char[] chars = m_ChunkChars;
    
    if ((uint)chars.Length == (uint)nextCharIndex)
    {
        ExpandByABlock(1);
        nextCharIndex = 0;
        chars = m_ChunkChars;
    }
    
    chars[nextCharIndex] = value;
    m_ChunkLength++;

    return this;
}

but I am not sure if array boundaries check would got removed.

MichalPetryka · 2023-07-17T16:58:45Z

but I am not sure if array boundaries check would got removed.

You can avoid them like this:

Unsafe.Add(ref MemoryMarshal.GetArrayDataReference(chars), (uint)nextCharIndex) = value;

yesmey · 2023-07-23T13:06:38Z

Sorry about the delay. I'm gonna need some help on this one.

Having trouble reproducing this locally. I think its because most of the benchmark is spent allocating and the Error column is sometimes very high. I tried to turn off as much as possible on my machine while running the benchmarks.

but I am not sure if array boundaries check would got removed.

You can avoid them like this:
Unsafe.Add(ref MemoryMarshal.GetArrayDataReference(chars), (uint)nextCharIndex) = value;

I doubt avoiding an array boundaries check would even be noticeable here since the allocation size doubles every time, so the times that code is executed is gonna be rare, and sacrificing bounds safety for that is likely not worth it

My hypothesis is that the slowdown is coming from the benchmark calling this very frequently in a loop, and the jit is able to inline so the read/write of m_ChunkLength better in the previous version.

Here's how I build and run the benchmark locally

.\build.cmd Clr+Clr.Aot+Libs -c Release -rc Release

and

python3 .\scripts\benchmarks_ci.py -f net8.0 --filter System.Text.Tests.Perf_StringBuilder.Append_Char --corerun "D:\dotnet\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\CoreRun.exe" "D:\dotnet\yesmey_runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\CoreRun.exe" --bdn-artifacts C:\results\after_test

\yesmey_runtime...CoreRun.exe built from ff57624
\runtime...CoreRun.exe built from 5a03596 (parent)

BenchmarkDotNet=v0.13.2.2052-nightly, OS=Windows 11 (10.0.22621.1992)
AMD Ryzen 7 7800X3D, 1 CPU, 16 logical and 8 physical cores
.NET SDK=8.0.100-preview.7.23364.32
  [Host]     : .NET 8.0.0 (8.0.23.36403), X64 RyuJIT AVX2
  Job-DWSUBF : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-YNMWPS : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2

PowerPlanMode=00000000-0000-0000-0000-000000000000  Arguments=/p:EnableUnsafeBinaryFormatterSerialization=true  IterationTime=250.0000 ms  
MaxIterationCount=20  MinIterationCount=15  WarmupCount=1

Method	Job	Toolchain	length	Mean	Error	StdDev	Median	Min	Max	Ratio	Gen0	Gen1	Allocated	Alloc Ratio
Append_Char	Job-DWSUBF	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\CoreRun.exe	100	90.69 ns	0.659 ns	0.584 ns	90.50 ns	89.93 ns	91.68 ns	1.00	0.0108	-	544 B	1.00
Append_Char	Job-YNMWPS	\yesmey_runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\CoreRun.exe	100	97.78 ns	0.240 ns	0.201 ns	97.72 ns	97.61 ns	98.18 ns	1.08	0.0106	-	544 B	1.00

Append_Char	Job-DWSUBF	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\CoreRun.exe	100000	51,838.73 ns	170.468 ns	142.348 ns	51,837.67 ns	51,624.53 ns	52,088.30 ns	1.00	4.1254	3.5066	209968 B	1.00
Append_Char	Job-YNMWPS	\yesmey_runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\CoreRun.exe	100000	51,924.36 ns	184.830 ns	163.847 ns	51,939.05 ns	51,721.98 ns	52,278.99 ns	1.00	4.1391	3.5182	209968 B	1.00

Here's the asm output:

Before

; System.Text.Tests.Perf_StringBuilder.Append_Char(Int32)
       push      rdi
       push      rsi
       push      rbp
       push      rbx
       sub       rsp,28
       mov       ebx,edx
       mov       rcx,offset MT_System.Text.StringBuilder
       call      CORINFO_HELP_NEWSFAST
       mov       rsi,rax
       mov       dword ptr [rsi+20],7FFFFFFF
       mov       rcx,offset MT_System.Char[]
       mov       edx,10
       call      CORINFO_HELP_NEWARR_1_VC
       lea       rcx,[rsi+8]
       mov       rdx,rax
       call      CORINFO_HELP_ASSIGN_REF
       xor       edi,edi
       test      ebx,ebx
       jle       short M00_L02
M00_L00:
       mov       ecx,[rsi+18]
       mov       edx,ecx
       mov       rax,[rsi+8]
       mov       r8d,[rax+8]
       cmp       r8d,edx
       jbe       short M00_L03
       mov       edx,edx
       mov       word ptr [rax+rdx*2+10],61
       inc       ecx
       mov       [rsi+18],ecx
M00_L01:
       inc       edi
       cmp       edi,ebx
       jl        short M00_L00
M00_L02:
       mov       rax,rsi
       add       rsp,28
       pop       rbx
       pop       rbp
       pop       rsi
       pop       rdi
       ret
M00_L03:
       mov       ebp,1
       mov       edx,[rsi+1C]
       lea       edx,[rdx+rcx+1]
       cmp       edx,[rsi+20]
       jg        short M00_L07
       test      edx,edx
       jle       short M00_L07
M00_L04:
       mov       rdx,[rsi+8]
       cmp       [rdx+8],ecx
       jle       short M00_L05
       lea       eax,[rcx+1]
       cmp       ecx,[rdx+8]
       jae       short M00_L08
       mov       ecx,ecx
       mov       word ptr [rdx+rcx*2+10],61
       dec       ebp
       jmp       short M00_L06
M00_L05:
       mov       [rsi+18],ecx
       mov       rcx,rsi
       mov       edx,ebp
       call      qword ptr [7FFE1853D518]; System.Text.StringBuilder.ExpandByABlock(Int32)
       xor       ecx,ecx
       mov       eax,ecx
M00_L06:
       test      ebp,ebp
       mov       ecx,eax
       jg        short M00_L04
       mov       [rsi+18],ecx
       jmp       short M00_L01
M00_L07:
       mov       rcx,offset MT_System.ArgumentOutOfRangeException
       call      CORINFO_HELP_NEWSFAST
       mov       rbx,rax
       mov       ecx,18331
       mov       rdx,7FFE180A4000
       call      CORINFO_HELP_STRCNS
       mov       rsi,rax
       call      qword ptr [7FFE18786790]
       mov       r8,rax
       mov       rdx,rsi
       mov       rcx,rbx
       call      qword ptr [7FFE182B6F28]
       mov       rcx,rbx
       call      CORINFO_HELP_THROW
       int       3
M00_L08:
       call      CORINFO_HELP_RNGCHKFAIL
       int       3
; Total bytes of code 280

After

; System.Text.Tests.Perf_StringBuilder.Append_Char(Int32)
       push      rdi
       push      rsi
       push      rbx
       sub       rsp,20
       mov       ebx,edx
       mov       rcx,offset MT_System.Text.StringBuilder
       call      CORINFO_HELP_NEWSFAST
       mov       rsi,rax
       mov       dword ptr [rsi+20],7FFFFFFF
       mov       rcx,offset MT_System.Char[]
       mov       edx,10
       call      CORINFO_HELP_NEWARR_1_VC
       lea       rcx,[rsi+8]
       mov       rdx,rax
       call      CORINFO_HELP_ASSIGN_REF
       xor       edi,edi
       test      ebx,ebx
       jle       short M00_L02
M00_L00:
       mov       ecx,[rsi+18]
       mov       edx,ecx
       mov       rax,[rsi+8]
       mov       r8d,[rax+8]
       cmp       r8d,edx
       jbe       short M00_L03
       mov       edx,edx
       mov       word ptr [rax+rdx*2+10],61
       inc       ecx
       mov       [rsi+18],ecx
M00_L01:
       inc       edi
       cmp       edi,ebx
       jl        short M00_L00
M00_L02:
       mov       rax,rsi
       add       rsp,20
       pop       rbx
       pop       rsi
       pop       rdi
       ret
M00_L03:
       mov       rcx,rsi
       mov       edx,61
       call      qword ptr [7FFE1853C3F0]; System.Text.StringBuilder.AppendWithExpansion(Char)
       jmp       short M00_L01
; Total bytes of code 137

; System.Text.StringBuilder.AppendWithExpansion(Char)
       push      rsi
       push      rbx
       sub       rsp,28
       mov       rbx,rcx
       mov       esi,edx
       mov       rcx,rbx
       mov       edx,1
       call      qword ptr [7FFE1853CB58]; System.Text.StringBuilder.ExpandByABlock(Int32)
       mov       rax,[rbx+8]
       cmp       dword ptr [rax+8],0
       jbe       short M01_L00
       mov       [rax+10],si
       inc       dword ptr [rbx+18]
       add       rsp,28
       pop       rbx
       pop       rsi
       ret
M01_L00:
       call      CORINFO_HELP_RNGCHKFAIL
       int       3
; Total bytes of code 55

StringBuilder: use Span.Fill when adding repeating characters

0945de5

dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label May 16, 2023

ghost added the community-contribution Indicates that the PR has been added by a community member label May 16, 2023

vcsjones added area-System.Runtime and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels May 16, 2023

This was referenced May 16, 2023

Failed USB connection via port 54050, error 61, in tvOS arm64 Release AllSubsets_Mono #82637

Open

[mono][wasm] System.Numerics.Vectors.Tests assert equal failure #84775

Closed

Update test

6acfd23

build-analysis bot mentioned this pull request May 16, 2023

Several XML tests are sporadically failing in CI due to OOMs #82894

Closed

runfoapp bot mentioned this pull request May 16, 2023

Infra improvements for Helix #68176

Closed

MichalPetryka reviewed May 16, 2023

View reviewed changes

src/libraries/System.Private.CoreLib/src/System/Text/StringBuilder.cs Show resolved Hide resolved

IDisposable reviewed May 19, 2023

View reviewed changes

yesmey commented May 20, 2023

View reviewed changes

src/libraries/System.Private.CoreLib/src/System/Text/StringBuilder.cs Outdated Show resolved Hide resolved

yesmey added 2 commits May 28, 2023 10:43

Merge branch 'main' into fill-append

4143418

Use Span<T>.Slice check

7ee3ffa

adamsitnik approved these changes Jul 7, 2023

View reviewed changes

src/libraries/System.Private.CoreLib/src/System/Text/StringBuilder.cs Outdated Show resolved Hide resolved

adamsitnik self-assigned this Jul 7, 2023

adamsitnik added the tenet-performance Performance related issue label Jul 7, 2023

Apply suggestions from code review

8222f57

adamsitnik merged commit ff57624 into dotnet:main Jul 7, 2023

adamsitnik added this to the 8.0.0 milestone Jul 7, 2023

cincuranet mentioned this pull request Jul 11, 2023

[Perf] Linux/x64: 2 Regressions on 7/7/2023 5:58:29 PM #88673

Closed

kotlarmilos mentioned this pull request Jul 19, 2023

[Perf] Linux/arm64: 4 Improvements on 7/7/2023 8:59:21 PM dotnet/perf-autofiling-issues#19927

Closed

kotlarmilos mentioned this pull request Aug 11, 2023

.NET 8 Per-Preview Performance report on WASM, Mono AOT, and Interpreter #84302

Closed

ghost locked as resolved and limited conversation to collaborators Aug 22, 2023

yesmey deleted the fill-append branch August 24, 2023 20:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

StringBuilder: use Span.Fill in Append repeating char #86287

StringBuilder: use Span.Fill in Append repeating char #86287

yesmey commented May 16, 2023 •

edited

Loading

ghost commented May 16, 2023

IDisposable May 19, 2023

adamsitnik left a comment

yesmey commented Jul 12, 2023 •

edited by adamsitnik

Loading

adamsitnik commented Jul 17, 2023

MichalPetryka commented Jul 17, 2023

yesmey commented Jul 23, 2023 •

edited

Loading

StringBuilder: use Span.Fill in Append repeating char #86287

StringBuilder: use Span.Fill in Append repeating char #86287

Conversation

yesmey commented May 16, 2023 • edited Loading

ghost commented May 16, 2023

IDisposable May 19, 2023

Choose a reason for hiding this comment

adamsitnik left a comment

Choose a reason for hiding this comment

yesmey commented Jul 12, 2023 • edited by adamsitnik Loading

adamsitnik commented Jul 17, 2023

MichalPetryka commented Jul 17, 2023

yesmey commented Jul 23, 2023 • edited Loading

Before

After

yesmey commented May 16, 2023 •

edited

Loading

yesmey commented Jul 12, 2023 •

edited by adamsitnik

Loading

yesmey commented Jul 23, 2023 •

edited

Loading