Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

eliminate ActorPath.ToSerializationFormat UID allocations #6195

Merged
merged 13 commits into from
Nov 29, 2022

Conversation

Aaronontheweb
Copy link
Member

Changes

Used some more Span<char> magic to avoid additional allocations when string-ifying ActorPath components. Large source of allocations during outbound serialization.

Checklist

For significant changes, please ensure that the following have been completed (delete if not relevant):

Latest dev Benchmarks

Using RemotePingPong.

NET Core 3.1

OSVersion: Microsoft Windows NT 6.2.9200.0
ProcessorCount: 16
ClockSpeed: 0 MHZ
Actor Count: 32
Messages sent/received per client: 200000 (2e5)
Is Server GC: True
Thread count: 112

Num clients, Total [msg], Msgs/sec, Total [ms], Start Threads, End Threads
1, 200000, 160129, 1249.60, 112, 134
5, 1000000, 315160, 3173.45, 144, 160
10, 2000000, 330088, 6059.34, 168, 168
15, 3000000, 326727, 9182.70, 176, 159
20, 4000000, 329327, 12146.22, 168, 153
25, 5000000, 331544, 15081.96, 161, 142
30, 6000000, 320582, 18716.53, 150, 137

NET 6.0

OSVersion: Microsoft Windows NT 10.0.19044.0
ProcessorCount: 16
ClockSpeed: 0 MHZ
Actor Count: 32
Messages sent/received per client: 200000 (2e5)
Is Server GC: True
Thread count: 112

Num clients, Total [msg], Msgs/sec, Total [ms], Start Threads, End Threads
1, 200000, 169349, 1181.20, 112, 138
5, 1000000, 338181, 2957.13, 148, 161
10, 2000000, 357016, 5602.10, 169, 169
15, 3000000, 357484, 8392.14, 177, 169
20, 4000000, 282307, 14169.69, 177, 152
25, 5000000, 252922, 19769.79, 161, 139
30, 6000000, 332153, 18064.79, 149, 140

This PR's Benchmarks

NET Core 3.1

OSVersion: Microsoft Windows NT 6.2.9200.0
ProcessorCount: 16
ClockSpeed: 0 MHZ
Actor Count: 32
Messages sent/received per client: 200000 (2e5)
Is Server GC: True
Thread count: 112

Num clients, Total [msg], Msgs/sec, Total [ms], Start Threads, End Threads
1, 200000, 167365, 1195.77, 112, 134
5, 1000000, 314565, 3179.28, 149, 166
10, 2000000, 328031, 6097.24, 174, 174
15, 3000000, 320890, 9349.82, 182, 165
20, 4000000, 322763, 12393.38, 174, 157
25, 5000000, 316196, 15813.78, 166, 141
30, 6000000, 313677, 19128.38, 150, 136

NET 6.0

OSVersion: Microsoft Windows NT 10.0.19044.0
ProcessorCount: 16
ClockSpeed: 0 MHZ
Actor Count: 32
Messages sent/received per client: 200000 (2e5)
Is Server GC: True
Thread count: 112

Num clients, Total [msg], Msgs/sec, Total [ms], Start Threads, End Threads
1, 200000, 187091, 1069.74, 112, 137
5, 1000000, 345424, 2895.43, 146, 166
10, 2000000, 351371, 5692.77, 174, 174
15, 3000000, 345265, 8689.21, 182, 173
20, 4000000, 260095, 15379.95, 182, 147
25, 5000000, 280521, 17824.88, 155, 143
30, 6000000, 282526, 21237.83, 151, 140

Benchmarks (ActorPath in Benchmark.NET)

These are all on .NET 6

dev

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.2130 (21H2)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK=6.0.201
  [Host]     : .NET 6.0.3 (6.0.322.12309), X64 RyuJIT
  DefaultJob : .NET 6.0.3 (6.0.322.12309), X64 RyuJIT

Method Mean Error StdDev Gen 0 Allocated
ActorPath_Parse 256.868 ns 0.8422 ns 0.7466 ns 0.1106 464 B
ActorPath_Concat 38.275 ns 0.5407 ns 0.4515 ns 0.0268 112 B
ActorPath_Equals 4.248 ns 0.0264 ns 0.0206 ns - -
ActorPath_ToString 48.342 ns 0.1882 ns 0.1571 ns 0.0268 112 B
ActorPath_ToSerializationFormat 60.137 ns 1.2127 ns 1.5337 ns 0.0267 112 B
ActorPath_ToSerializationFormatWithAddress 57.091 ns 0.3018 ns 0.2675 ns 0.0267 112 B

This PR

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.2130 (21H2)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK=6.0.201
  [Host]     : .NET 6.0.3 (6.0.322.12309), X64 RyuJIT
  DefaultJob : .NET 6.0.3 (6.0.322.12309), X64 RyuJIT

Method Mean Error StdDev Gen 0 Allocated
ActorPath_Parse 271.712 ns 0.9350 ns 0.7808 ns 0.1106 464 B
ActorPath_Concat 37.990 ns 0.1582 ns 0.1402 ns 0.0268 112 B
ActorPath_Equals 5.280 ns 0.1052 ns 0.0878 ns - -
ActorPath_ToString 50.854 ns 1.0870 ns 1.1631 ns 0.0268 112 B
ActorPath_ToSerializationFormat 57.832 ns 1.1664 ns 1.0910 ns 0.0267 112 B
ActorPath_ToSerializationFormatWithAddress 61.906 ns 0.4740 ns 0.4202 ns 0.0267 112 B

The allocations definitely disappeared from my profiler in Rider, but the total benchmark impact is negligible. Probably going to push this change to v1.4.45.

Used some more `Span<char>` magic to avoid additional allocations when string-ifying `ActorPath` components.
i = Math.Abs(i);

// count sig figs
while (i > 0)
Copy link
Member

@to11mtm to11mtm Oct 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would an if or Jump based table based on length potentially be better here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might very well be - I haven't benchmarked these methods themselves, only in aggregate

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, looking at .NET, one thing that we -can- do here, is something like:

while (i > 100)
{
   i = i / 100;
   startLen +=2;
}
if (i > 10)
{
   startLen++;
}
return startLen +1; //Could alternatively 'preload' the first character in the negative ternary.

The advantage is that division happens half as often, even if the ASM -may- be a little bit larger... but we can elide elsewhere. :)

@Aaronontheweb Aaronontheweb modified the milestones: 1.4.45, 1.4.46 Oct 19, 2022
@Aaronontheweb Aaronontheweb modified the milestones: 1.4.46, 1.4.47 Nov 15, 2022
src/core/Akka/Util/SpanHacks.cs Outdated Show resolved Hide resolved

while (i > 0)
{
span[startPos + targetLength - index++ - 1] = Numbers[i % 10];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we need startPos here? i.e. would it be better to have callers .Slice() the input span to the correct starting position?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Question: would it be better to do = (i%10)+'0' here? I guess it depends on whether you'd rather throw (slower but potentially safer) or overflow (faster but potentially unsafer.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a dedicated benchmark for this function so we can measure it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we need startPos here? i.e. would it be better to have callers .Slice() the input span to the correct starting position?

I don't think it hurts anything - I just didn't wanted all changes to be made on the same input Span

@Aaronontheweb
Copy link
Member Author

@to11mtm think this is worth picking back up again? It shows up in my profiler for sure but doesn't make much of an impact on the benchmarks.

Copy link
Member

@to11mtm to11mtm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some food for thought here.

Also, here is a sharplab exampling some of the ASM improvements possible with minimal lift.


// UID calculation
var uidSizeHint = 0;
if (uid != null)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether it makes more sense to do something where we can avoid the branch in AppendUidSpan.

Ideas that come to mind:

  • Pad the allocated span by uint maxvalue+1 (21), that would let us just do the padding call within the if block.
  • Move this alloc stuff into a function that returns the span of proper length, either with UID filled or unfilled (and do the UID calc there)

src/core/Akka/Actor/ActorPath.cs Show resolved Hide resolved
src/core/Akka/Util/SpanHacks.cs Outdated Show resolved Hide resolved
i = Math.Abs(i);

// count sig figs
while (i > 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, looking at .NET, one thing that we -can- do here, is something like:

while (i > 100)
{
   i = i / 100;
   startLen +=2;
}
if (i > 10)
{
   startLen++;
}
return startLen +1; //Could alternatively 'preload' the first character in the negative ternary.

The advantage is that division happens half as often, even if the ASM -may- be a little bit larger... but we can elide elsewhere. :)

@to11mtm
Copy link
Member

to11mtm commented Nov 25, 2022

@to11mtm think this is worth picking back up again? It shows up in my profiler for sure but doesn't make much of an impact on the benchmarks.

I think it's worth picking back up, but it needs to get done well enough to show a benefit. The PR comments I added -should- help to some extent (especially the sharplab examples, they are less branchy/loopy)

The other major hurdle to perf, is the way we are setting in the Span, is resulting in extra bounds checks and throws when we ('may') get out of bounds. I'm not 100% certain of the best way to handle that, but there -should- be a way to do so. It's mostly tricky because we are going backwards on top of potentially getting an span of arbitrary size (That may be either too small, or larger.)

@Aaronontheweb
Copy link
Member Author

@to11mtm ok, I'll get to work on this

@Aaronontheweb
Copy link
Member Author

Benchmarks for the Int64CharCount method, as it is currently written:

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.2251 (21H2)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK=7.0.100
  [Host]     : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT
  DefaultJob : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT

Method Formatted Mean Error StdDev Allocated
Int64CharCountBenchmark 1 3.792 ns 0.0248 ns 0.0220 ns -
Int64CharCountBenchmark 1000 3.661 ns 0.0149 ns 0.0132 ns -
Int64CharCountBenchmark 9223372036854775807 25.671 ns 0.2450 ns 0.2292 ns -

@Aaronontheweb
Copy link
Member Author

Aaronontheweb commented Nov 28, 2022

Using some simplified math:

public static int Int64SizeInCharacters(long i)
        {
            // still need 1 char to represent '0'
            if(i == 0) return 1;
	
            // account for negative characters
            int padding = 1;
            if (i < 0)
            {
                i *= -1;
                padding = 2;
            }

            // count sig figs
            return (int)Math.Log10(i) + padding;
        }
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.2251 (21H2)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK=7.0.100
  [Host]     : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT
  DefaultJob : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT

Method Formatted Mean Error StdDev Allocated
Int64CharCountBenchmark 1 15.02 ns 0.198 ns 0.175 ns -
Int64CharCountBenchmark 1000 11.93 ns 0.052 ns 0.046 ns -
Int64CharCountBenchmark 9223372036854775807 11.94 ns 0.066 ns 0.059 ns -

@Aaronontheweb
Copy link
Member Author

Implemented a jump table and I'm pretty happy with the results @to11mtm - the largest UID today is int32.MaxSize and the jump table will cover it:

 /// <summary>
        /// How many characters do we need to represent this int as a string?
        /// </summary>
        /// <param name="i">The int.</param>
        /// <returns>Character length.</returns>
        public static int Int64SizeInCharacters(long i)
        {
	        // account for negative characters
            var padding = 0;
            if (i < 0)
            {
                i *= -1;
                padding = 1;
            }

            switch (i)
            {
                case 0:
                    return 1;
                case < 10:
                    return 2 + padding;
                case < 100:
                    return 3 + padding;
                case < 1000:
                    return 4 + padding;
                case < 10000:
                    return 5 + padding;
                case < 100000:
                    return 6 + padding;
                case < 1_000_000:
                    return 7 + padding;
                case < 10_000_000:
                    return 8 + padding;
                case < 100_000_000:
                    return 9 + padding;
                case < 1_000_000_000:
                    return 10 + padding;
                case < 10_000_000_000:
                    return 11 + padding;
                case 100_000_000_000:
                    return 12 + padding;
                default:
                    return (int)Math.Log10(i) + 1 + padding;
            }
        }

This required upgrading Akka.NET to support C#9, FYI

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.2251 (21H2)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK=7.0.100
  [Host]     : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT
  DefaultJob : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT

Method Formatted Mean Error StdDev Allocated
Int64CharCountBenchmark -1 5.041 ns 0.0280 ns 0.0234 ns -
Int64CharCountBenchmark 0 2.416 ns 0.0189 ns 0.0176 ns -
Int64CharCountBenchmark 1 2.738 ns 0.0268 ns 0.0251 ns -
Int64CharCountBenchmark 1000 2.774 ns 0.0416 ns 0.0390 ns -
Int64CharCountBenchmark 2147483647 3.062 ns 0.0313 ns 0.0292 ns -
Int64CharCountBenchmark 9223372036854775807 11.220 ns 0.0973 ns 0.0910 ns -

@Aaronontheweb
Copy link
Member Author

You could probably make this even more efficient doing a binary-search style operation but it wouldn't save you much IMHO.

@Aaronontheweb
Copy link
Member Author

Pretty sure my last changes broke the UID function (I think it's an N+1 error), so I'll fix that. In the meantime - added some benchies for the formatting function itself:

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.2251 (21H2)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK=7.0.100
  [Host]     : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT
  DefaultJob : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT

Method Formatted Mean Error StdDev Allocated
Int64CharCountBenchmark -1 5.058 ns 0.0484 ns 0.0429 ns -
TryFormatBenchmark -1 10.986 ns 0.0121 ns 0.0094 ns -
Int64CharCountBenchmark 0 2.471 ns 0.0495 ns 0.0463 ns -
TryFormatBenchmark 0 6.593 ns 0.0398 ns 0.0332 ns -
Int64CharCountBenchmark 1 2.752 ns 0.0057 ns 0.0048 ns -
TryFormatBenchmark 1 10.641 ns 0.0058 ns 0.0049 ns -
Int64CharCountBenchmark 1000 2.737 ns 0.0107 ns 0.0089 ns -
TryFormatBenchmark 1000 16.273 ns 0.0942 ns 0.0835 ns -
Int64CharCountBenchmark 2147483647 3.089 ns 0.0502 ns 0.0470 ns -
TryFormatBenchmark 2147483647 27.632 ns 0.2332 ns 0.2067 ns -
Int64CharCountBenchmark 9223372036854775807 11.167 ns 0.1172 ns 0.1039 ns -
TryFormatBenchmark 9223372036854775807 57.502 ns 0.1208 ns 0.0943 ns -

@Aaronontheweb
Copy link
Member Author

Implemented some of @to11mtm 's suggestions and sped up the formatting function itself:

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.2251 (21H2)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK=7.0.100
  [Host]     : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT
  DefaultJob : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT

Method Formatted Mean Error StdDev Allocated
Int64CharCountBenchmark -1 5.034 ns 0.0101 ns 0.0090 ns -
TryFormatBenchmark -1 10.644 ns 0.0072 ns 0.0060 ns -
Int64CharCountBenchmark 0 2.417 ns 0.0024 ns 0.0021 ns -
TryFormatBenchmark 0 6.725 ns 0.0099 ns 0.0082 ns -
Int64CharCountBenchmark 1 2.752 ns 0.0025 ns 0.0021 ns -
TryFormatBenchmark 1 6.782 ns 0.0194 ns 0.0152 ns -
Int64CharCountBenchmark 1000 2.743 ns 0.0125 ns 0.0105 ns -
TryFormatBenchmark 1000 15.053 ns 0.1593 ns 0.1490 ns -
Int64CharCountBenchmark 2147483647 3.072 ns 0.0383 ns 0.0320 ns -
TryFormatBenchmark 2147483647 25.817 ns 0.4027 ns 0.4135 ns -
Int64CharCountBenchmark 9223372036854775807 11.124 ns 0.1126 ns 0.0998 ns -
TryFormatBenchmark 9223372036854775807 54.919 ns 0.0679 ns 0.0602 ns -

@Aaronontheweb
Copy link
Member Author

ActorPath Benchmarks on .NET 7.0

This PR

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.2251 (21H2)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK=7.0.100
  [Host]     : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT
  DefaultJob : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT

Method Mean Error StdDev Gen 0 Allocated
ActorPath_Parse 224.320 ns 1.3475 ns 1.1946 ns 0.1109 464 B
ActorPath_Concat 39.405 ns 0.6358 ns 0.5636 ns 0.0268 112 B
ActorPath_Equals 4.267 ns 0.0182 ns 0.0161 ns - -
ActorPath_ToString 49.135 ns 0.6010 ns 0.5622 ns 0.0268 112 B
ActorPath_ToSerializationFormat 57.184 ns 0.6982 ns 0.6531 ns 0.0267 112 B
ActorPath_ToSerializationFormatWithAddress 64.348 ns 0.2643 ns 0.2472 ns 0.0267 112 B

v1.4 Branch

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.2251 (21H2)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK=7.0.100
  [Host]     : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT
  DefaultJob : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT

Method Mean Error StdDev Gen 0 Allocated
ActorPath_Parse 227.059 ns 4.3677 ns 4.0856 ns 0.1106 464 B
ActorPath_Concat 39.632 ns 0.6738 ns 0.6617 ns 0.0268 112 B
ActorPath_Equals 4.318 ns 0.0544 ns 0.0425 ns - -
ActorPath_ToString 48.681 ns 1.0389 ns 2.1456 ns 0.0268 112 B
ActorPath_ToSerializationFormat 60.018 ns 1.1744 ns 1.0985 ns 0.0267 112 B
ActorPath_ToSerializationFormatWithAddress 57.864 ns 0.7122 ns 0.6314 ns 0.0267 112 B

Looks like the current implementation is winning out on perf - or it's all in the margin of error

Aaronontheweb added a commit to Aaronontheweb/akka.net that referenced this pull request Nov 29, 2022
While working on akkadotnet#6195 I realized that none of those `ActorPath`s actually have a set UID, thus we're missing that entire facet from both the parsing and serialization benchmarks.
Aaronontheweb added a commit that referenced this pull request Nov 29, 2022
While working on #6195 I realized that none of those `ActorPath`s actually have a set UID, thus we're missing that entire facet from both the parsing and serialization benchmarks.
@Aaronontheweb
Copy link
Member Author

ActorPath Benchmarks

This PR

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.2251 (21H2)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK=7.0.100
  [Host]     : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT
  DefaultJob : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT

Method Uid Mean Error StdDev Gen 0 Allocated
ActorPath_Parse 1 286.090 ns 5.4953 ns 5.8799 ns 0.0992 416 B
ActorPath_Concat 1 39.046 ns 0.8300 ns 0.8151 ns 0.0268 112 B
ActorPath_Equals 1 4.276 ns 0.0173 ns 0.0135 ns - -
ActorPath_ToString 1 56.383 ns 0.7110 ns 0.6650 ns 0.0268 112 B
ActorPath_ToSerializationFormat 1 68.930 ns 0.9070 ns 0.8040 ns 0.0286 120 B
ActorPath_ToSerializationFormatWithAddress 1 75.431 ns 0.8508 ns 0.7958 ns 0.0286 120 B
ActorPath_Parse 100000 281.287 ns 2.4783 ns 2.0695 ns 0.0992 416 B
ActorPath_Concat 100000 38.671 ns 0.1395 ns 0.1089 ns 0.0268 112 B
ActorPath_Equals 100000 4.267 ns 0.0162 ns 0.0126 ns - -
ActorPath_ToString 100000 57.425 ns 0.7558 ns 0.6700 ns 0.0267 112 B
ActorPath_ToSerializationFormat 100000 87.786 ns 0.2636 ns 0.2058 ns 0.0305 128 B
ActorPath_ToSerializationFormatWithAddress 100000 84.566 ns 0.7803 ns 0.7299 ns 0.0305 128 B
ActorPath_Parse 2147483647 295.088 ns 2.3138 ns 1.9321 ns 0.0992 416 B
ActorPath_Concat 2147483647 39.235 ns 0.5909 ns 0.5527 ns 0.0268 112 B
ActorPath_Equals 2147483647 4.290 ns 0.0326 ns 0.0305 ns - -
ActorPath_ToString 2147483647 56.219 ns 0.9611 ns 0.8990 ns 0.0267 112 B
ActorPath_ToSerializationFormat 2147483647 108.536 ns 2.2346 ns 3.2049 ns 0.0324 136 B
ActorPath_ToSerializationFormatWithAddress 2147483647 93.275 ns 1.7111 ns 1.6006 ns 0.0324 136 B

v1.4 Branch

Latest v1.4 Benchmarks

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.2251 (21H2)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK=7.0.100
  [Host]     : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT
  DefaultJob : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT

Method Uid Mean Error StdDev Gen 0 Allocated
ActorPath_Parse 1 268.316 ns 1.5227 ns 1.2715 ns 0.0992 416 B
ActorPath_Concat 1 38.400 ns 0.1241 ns 0.1036 ns 0.0268 112 B
ActorPath_Equals 1 4.377 ns 0.0361 ns 0.0320 ns - -
ActorPath_ToString 1 50.872 ns 0.1519 ns 0.1269 ns 0.0268 112 B
ActorPath_ToSerializationFormat 1 168.876 ns 1.6593 ns 1.4709 ns 0.0610 256 B
ActorPath_ToSerializationFormatWithAddress 1 168.242 ns 2.5289 ns 2.1117 ns 0.0610 256 B
ActorPath_Parse 100000 281.081 ns 1.4482 ns 1.2838 ns 0.0992 416 B
ActorPath_Concat 100000 38.480 ns 0.1676 ns 0.1309 ns 0.0268 112 B
ActorPath_Equals 100000 4.377 ns 0.0479 ns 0.0425 ns - -
ActorPath_ToString 100000 50.507 ns 0.1441 ns 0.1203 ns 0.0268 112 B
ActorPath_ToSerializationFormat 100000 180.087 ns 0.3269 ns 0.2552 ns 0.0629 264 B
ActorPath_ToSerializationFormatWithAddress 100000 185.135 ns 3.4828 ns 3.0874 ns 0.0629 264 B
ActorPath_Parse 2147483647 300.098 ns 3.0413 ns 2.6960 ns 0.0992 416 B
ActorPath_Concat 2147483647 39.325 ns 0.8076 ns 0.7159 ns 0.0268 112 B
ActorPath_Equals 2147483647 4.360 ns 0.0337 ns 0.0315 ns - -
ActorPath_ToString 2147483647 51.842 ns 1.0444 ns 1.0725 ns 0.0268 112 B
ActorPath_ToSerializationFormat 2147483647 193.560 ns 2.6633 ns 2.2240 ns 0.0648 272 B
ActorPath_ToSerializationFormatWithAddress 2147483647 194.647 ns 3.9673 ns 6.1767 ns 0.0648 272 B

@Aaronontheweb
Copy link
Member Author

ActorPath.ToSerializationFormat and ActorPath.ToSerializationFormatWithAddress are 40-45% faster and instead of allocating 272b per call only allocate 128b.

This looks like success to me - going to merge it into v1.4.

@Aaronontheweb Aaronontheweb enabled auto-merge (squash) November 29, 2022 15:44
@Aaronontheweb
Copy link
Member Author

RemotePingPong

Running on .NET 7.0 with Dynamic PGO enabled

This PR

OSVersion:                         Microsoft Windows NT 10.0.19044.0
ProcessorCount:                    16
ClockSpeed:                        0 MHZ
Actor Count:                       32
Messages sent/received per client: 200000  (2e5)
Is Server GC:                      True
Thread count:                      110

Num clients, Total [msg], Msgs/sec, Total [ms], Start Threads, End Threads
         1,  200000,    121213,    1650.68,           110,             130
         5, 1000000,    433276,    2308.27,           138,             144
        10, 2000000,    466745,    4285.30,           152,             153
        15, 3000000,    464829,    6454.08,           161,             155
        20, 4000000,    459243,    8710.20,           163,             158
        25, 5000000,    456496,   10953.30,           166,             156
        30, 6000000,    458226,   13094.08,           164,             151

v1.4 Branch

OSVersion:                         Microsoft Windows NT 10.0.19044.0
ProcessorCount:                    16
ClockSpeed:                        0 MHZ
Actor Count:                       32
Messages sent/received per client: 200000  (2e5)
Is Server GC:                      True
Thread count:                      109

Num clients, Total [msg], Msgs/sec, Total [ms], Start Threads, End Threads
         1,  200000,    108402,    1845.71,           109,             129
         5, 1000000,    459771,    2175.87,           138,             146
        10, 2000000,    458191,    4365.38,           154,             166
        15, 3000000,    453515,    6615.62,           174,             166
        20, 4000000,    455011,    8791.93,           174,             158
        25, 5000000,    447909,   11163.87,           166,             149
        30, 6000000,    446595,   13435.92,           157,             149

Results

Looks like a modest but measurable improvement in this PR - peak numbers are 1-2% higher here than in v1.4.

@Aaronontheweb Aaronontheweb merged commit 37179fe into akkadotnet:v1.4 Nov 29, 2022
@Aaronontheweb Aaronontheweb deleted the inline-uid branch November 29, 2022 20:48
Aaronontheweb added a commit to Aaronontheweb/akka.net that referenced this pull request Nov 30, 2022
While working on akkadotnet#6195 I realized that none of those `ActorPath`s actually have a set UID, thus we're missing that entire facet from both the parsing and serialization benchmarks.
Aaronontheweb added a commit to Aaronontheweb/akka.net that referenced this pull request Nov 30, 2022
…et#6195)

* eliminate `ActorPath.ToSerializationFormat` UID allocations

Used some more `Span<char>` magic to avoid additional allocations when string-ifying `ActorPath` components.

* adding `SpanHacks` benchmarks

* sped up `Int64SizeInCharacters`

* added `TryFormat` benchmarks

* fixed n+1 error in jump table

* cleaned up `TryFormat` inside `SpanHacks`

* fixed `SpanHacks` index calculation

* removed BDN results

* Update SpanHacks.cs
Aaronontheweb added a commit that referenced this pull request Dec 1, 2022
* cleaned up duplicate System.Collections.Immutable package reference (#6264)

also standardized all System.* packages on a common version

* converted build system to .NET 7.0 (#6263)

* converted build system to .NET 7.0

* upgrade to Incrementalist.Cmd v0.8.0

* upgraded MNTR to support .NET 7.0

* fixed build system to target .NET 7.0

* upgrade to latest version of DocFx

* add .NET 6 SDK back to build system

* fixed HyperionConfigTests

* Akka.Streams: `ReuseLatest` stage to repeatedly emit the most recent value until a newer one is pushed (#6262)

* code cleanup in Akka.Streams `Attributes`

* added `RepeatPrevious{T}` stage

* WIP - debugging `RepeatPreviousSpecs`

* fixed tests and added documentation

* fixed documentation

* API approvals

* fixed markdown linting

* removed `SwapPrevious<T>` delegate.

* renamed stage from `RepeatPrevious` to `ReuseLatest`

* remove BDN results

* added real UID to `ActorPathBenchmarks` (#6276)

While working on #6195 I realized that none of those `ActorPath`s actually have a set UID, thus we're missing that entire facet from both the parsing and serialization benchmarks.

* Enable dynamic PGO for RemotePingPong and PingPong (#6277)

* eliminate `ActorPath.ToSerializationFormat` UID allocations (#6195)

* eliminate `ActorPath.ToSerializationFormat` UID allocations

Used some more `Span<char>` magic to avoid additional allocations when string-ifying `ActorPath` components.

* adding `SpanHacks` benchmarks

* sped up `Int64SizeInCharacters`

* added `TryFormat` benchmarks

* fixed n+1 error in jump table

* cleaned up `TryFormat` inside `SpanHacks`

* fixed `SpanHacks` index calculation

* removed BDN results

* Update SpanHacks.cs

* compilation fixes and V1.5 api approval
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants