-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
StringBuilder: use Span.Fill in Append repeating char #86287
Conversation
Tagging subscribers to this area: @dotnet/area-system-runtime Issue Details
Note: Benchmark code BenchmarkDotNet=v0.13.5, OS=Windows 11 (10.0.22621.1702/22H2/2022Update/SunValley2)
AMD Ryzen 7 3700X, 1 CPU, 16 logical and 8 physical cores
.NET SDK=8.0.100-preview.5.23255.2
[Host] : .NET 8.0.0 (8.0.23.25213), X64 RyuJIT AVX2
Job-FJHCLO : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2
Job-DXTBEB : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2
|
int firstLength = chunkChars.Length - chunkLength; | ||
if (firstLength > 0) | ||
{ | ||
chunkChars.AsSpan(chunkLength, firstLength).Fill(value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Insert
Debug.Assert(firstLength < repeatCount, "We shouldn't be called if there was enough space for the entire run.");
src/libraries/System.Private.CoreLib/src/System/Text/StringBuilder.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you @yesmey not just for the contribution but also for providing benchmark numbers and their source code!
Would you be interested in contributing a benchmark for Append(char, int)
to /~https://github.com/dotnet/performance/blob/main/src/benchmarks/micro/libraries/System.Text/Perf.StringBuilder.cs? Currently we have no benchmarks for it and our reporting system won't report any improvements once I merge your PR.
src/libraries/System.Private.CoreLib/src/System/Text/StringBuilder.cs
Outdated
Show resolved
Hide resolved
@adamsitnik The regression in #88673 comes from marking the allocating path of Append(char) with MethodImplOptions.NoInlining. That benchmark in particular is allocating fairly often, and it makes sense the slow path is faster when it previously was being inlined more aggressively. This pattern is fairly common across the runtime, but if you prefer I can remove it and see if its better? |
@yesmey could you please give it a try? I was also thinking about doing sth like this: public StringBuilder Append(char value)
{
int nextCharIndex = m_ChunkLength;
char[] chars = m_ChunkChars;
if ((uint)chars.Length == (uint)nextCharIndex)
{
ExpandByABlock(1);
nextCharIndex = 0;
chars = m_ChunkChars;
}
chars[nextCharIndex] = value;
m_ChunkLength++;
return this;
} but I am not sure if array boundaries check would got removed. |
You can avoid them like this: Unsafe.Add(ref MemoryMarshal.GetArrayDataReference(chars), (uint)nextCharIndex) = value; |
Sorry about the delay. I'm gonna need some help on this one. Having trouble reproducing this locally. I think its because most of the benchmark is spent allocating and the Error column is sometimes very high. I tried to turn off as much as possible on my machine while running the benchmarks.
I doubt avoiding an array boundaries check would even be noticeable here since the allocation size doubles every time, so the times that code is executed is gonna be rare, and sacrificing bounds safety for that is likely not worth it My hypothesis is that the slowdown is coming from the benchmark calling this very frequently in a loop, and the jit is able to inline so the read/write of m_ChunkLength better in the previous version. Here's how I build and run the benchmark locally .\build.cmd Clr+Clr.Aot+Libs -c Release -rc Release and python3 .\scripts\benchmarks_ci.py -f net8.0 --filter System.Text.Tests.Perf_StringBuilder.Append_Char --corerun "D:\dotnet\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\CoreRun.exe" "D:\dotnet\yesmey_runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\CoreRun.exe" --bdn-artifacts C:\results\after_test \yesmey_runtime...CoreRun.exe built from ff57624 BenchmarkDotNet=v0.13.2.2052-nightly, OS=Windows 11 (10.0.22621.1992)
AMD Ryzen 7 7800X3D, 1 CPU, 16 logical and 8 physical cores
.NET SDK=8.0.100-preview.7.23364.32
[Host] : .NET 8.0.0 (8.0.23.36403), X64 RyuJIT AVX2
Job-DWSUBF : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2
Job-YNMWPS : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2
PowerPlanMode=00000000-0000-0000-0000-000000000000 Arguments=/p:EnableUnsafeBinaryFormatterSerialization=true IterationTime=250.0000 ms
MaxIterationCount=20 MinIterationCount=15 WarmupCount=1
Here's the asm output: Before; System.Text.Tests.Perf_StringBuilder.Append_Char(Int32)
push rdi
push rsi
push rbp
push rbx
sub rsp,28
mov ebx,edx
mov rcx,offset MT_System.Text.StringBuilder
call CORINFO_HELP_NEWSFAST
mov rsi,rax
mov dword ptr [rsi+20],7FFFFFFF
mov rcx,offset MT_System.Char[]
mov edx,10
call CORINFO_HELP_NEWARR_1_VC
lea rcx,[rsi+8]
mov rdx,rax
call CORINFO_HELP_ASSIGN_REF
xor edi,edi
test ebx,ebx
jle short M00_L02
M00_L00:
mov ecx,[rsi+18]
mov edx,ecx
mov rax,[rsi+8]
mov r8d,[rax+8]
cmp r8d,edx
jbe short M00_L03
mov edx,edx
mov word ptr [rax+rdx*2+10],61
inc ecx
mov [rsi+18],ecx
M00_L01:
inc edi
cmp edi,ebx
jl short M00_L00
M00_L02:
mov rax,rsi
add rsp,28
pop rbx
pop rbp
pop rsi
pop rdi
ret
M00_L03:
mov ebp,1
mov edx,[rsi+1C]
lea edx,[rdx+rcx+1]
cmp edx,[rsi+20]
jg short M00_L07
test edx,edx
jle short M00_L07
M00_L04:
mov rdx,[rsi+8]
cmp [rdx+8],ecx
jle short M00_L05
lea eax,[rcx+1]
cmp ecx,[rdx+8]
jae short M00_L08
mov ecx,ecx
mov word ptr [rdx+rcx*2+10],61
dec ebp
jmp short M00_L06
M00_L05:
mov [rsi+18],ecx
mov rcx,rsi
mov edx,ebp
call qword ptr [7FFE1853D518]; System.Text.StringBuilder.ExpandByABlock(Int32)
xor ecx,ecx
mov eax,ecx
M00_L06:
test ebp,ebp
mov ecx,eax
jg short M00_L04
mov [rsi+18],ecx
jmp short M00_L01
M00_L07:
mov rcx,offset MT_System.ArgumentOutOfRangeException
call CORINFO_HELP_NEWSFAST
mov rbx,rax
mov ecx,18331
mov rdx,7FFE180A4000
call CORINFO_HELP_STRCNS
mov rsi,rax
call qword ptr [7FFE18786790]
mov r8,rax
mov rdx,rsi
mov rcx,rbx
call qword ptr [7FFE182B6F28]
mov rcx,rbx
call CORINFO_HELP_THROW
int 3
M00_L08:
call CORINFO_HELP_RNGCHKFAIL
int 3
; Total bytes of code 280 After; System.Text.Tests.Perf_StringBuilder.Append_Char(Int32)
push rdi
push rsi
push rbx
sub rsp,20
mov ebx,edx
mov rcx,offset MT_System.Text.StringBuilder
call CORINFO_HELP_NEWSFAST
mov rsi,rax
mov dword ptr [rsi+20],7FFFFFFF
mov rcx,offset MT_System.Char[]
mov edx,10
call CORINFO_HELP_NEWARR_1_VC
lea rcx,[rsi+8]
mov rdx,rax
call CORINFO_HELP_ASSIGN_REF
xor edi,edi
test ebx,ebx
jle short M00_L02
M00_L00:
mov ecx,[rsi+18]
mov edx,ecx
mov rax,[rsi+8]
mov r8d,[rax+8]
cmp r8d,edx
jbe short M00_L03
mov edx,edx
mov word ptr [rax+rdx*2+10],61
inc ecx
mov [rsi+18],ecx
M00_L01:
inc edi
cmp edi,ebx
jl short M00_L00
M00_L02:
mov rax,rsi
add rsp,20
pop rbx
pop rsi
pop rdi
ret
M00_L03:
mov rcx,rsi
mov edx,61
call qword ptr [7FFE1853C3F0]; System.Text.StringBuilder.AppendWithExpansion(Char)
jmp short M00_L01
; Total bytes of code 137 ; System.Text.StringBuilder.AppendWithExpansion(Char)
push rsi
push rbx
sub rsp,28
mov rbx,rcx
mov esi,edx
mov rcx,rbx
mov edx,1
call qword ptr [7FFE1853CB58]; System.Text.StringBuilder.ExpandByABlock(Int32)
mov rax,[rbx+8]
cmp dword ptr [rax+8],0
jbe short M01_L00
mov [rax+10],si
inc dword ptr [rbx+18]
add rsp,28
pop rbx
pop rsi
ret
M01_L00:
call CORINFO_HELP_RNGCHKFAIL
int 3
; Total bytes of code 55 |
StringBuilder.Append(char value, int repeatCount)
currently use a loop to add each character in sequence.We can instead use
Span<T>.Fill
that is better optimized for this.Note:
Append(char value)
was calling this method as a fallback to when it needed to allocate. But with the new changes,Append(char value, int repeatCount)
might be inlined, therefor I added a specificAppendWithExpansion(char)
withMethodImplOptions.NoInlining
.Benchmark code
Benchmark result: