-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: chained guarded devirtualization #51890
JIT: chained guarded devirtualization #51890
Conversation
When expanding a guarded devirtualization call site, scout ahead to see if there's another site just after this one. If so, expand the second site so that the frequently-taken paths from the first and second tests form a join-free "hot path". Continue scouting and chaining candidates along this path until we run out of candidates, find a big enough stretch without a candidate, or find a candidate with low likelihood of success. Chaining enhances the abilities of the redundant branch optimizer to clean up redundant type tests, so if there are multiple virtual or interface calls in a short span, all dispatching off the same object, one test will suffice to cover all the calls.
SPMI diff summary -- diffs in asp.net only. Despite this being a code-expanding transformation (since we're duplicating S1) it usually reduces method size. Current heuristic requires a likelihood >= 75 to start or continue a chain, and no more than 2 interposing statements. I may look at increasing the interposing count as there are some opportunities out there if we're willing to copy 3 or more statements
|
Love seeing ASP.NET in here 👍🏾 |
Very much appreciating that are including aspnet dlls in the Jit comparisions now 💜 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a nice win!
It's too bad your detailed PR comment, including pictures, can't be part of the code comments. That would probably be helpful later on. Maybe there would be a way to add some textual formulation in the comments?
// Various policies for GuardedDevirtualization | ||
CONFIG_INTEGER(JitGuardedDevirtualizationChainLikelihood, W("JitGuardedDevirtualizationChainLikelihood"), 0x4B) // 75 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears you actually want this in Release builds? For now for forever?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure. I'd like to be able to experiment with it for a while in actual product builds.
src/coreclr/jit/compiler.h
Outdated
@@ -5416,7 +5416,8 @@ class Compiler | |||
|
|||
void fgRemoveEmptyBlocks(); | |||
|
|||
void fgRemoveStmt(BasicBlock* block, Statement* stmt); | |||
void fgRemoveStmt(BasicBlock* block, Statement* stmt, bool isUnlink = false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isUnlink
should use DEBUGARG
@@ -513,7 +513,8 @@ class IndirectCallTransformer | |||
{ | |||
origCall = GetCall(stmt); | |||
|
|||
JITDUMP("*** %s contemplating [%06u]\n", Name(), compiler->dspTreeID(origCall)); | |||
JITDUMP("\n----------------\n\n*** %s contemplating [%06u] in " FMT_BB " \n", Name(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like indirectcalltransformer.cpp
is a poor name for this file now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it is still "just" transforming indirect calls...
src/coreclr/jit/fgstmt.cpp
Outdated
// fgUnlinkStmt: unlink a statement from a block's statement list | ||
// | ||
// Arguments: | ||
// block - the block into which 'stmt' will be inserted; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comments here, and in fgRemoveStmt
, about "be inserted" are wrong; some copy/paste error
Would it work for this (though not suggesting its a common case) unsafe class Program
{
int SumFunc(int[] a, int k)
{
static bool where(int x, int k) => x > k;
int sum = 0;
for (var i = 1; i <= 1000; i++)
{
sum += Filter(i, k, &where);
}
return sum;
}
unsafe int Filter(int i, int k, delegate*<int, int, bool> f)
{
if (f(i, k)) { return 1; }
return 0;
}
} As it looks to resolve Program.SumFunc(Int32[], Int32)
L0000: push ebp
L0001: mov ebp, esp
L0003: push edi
L0004: push esi
L0005: push ebx
L0006: mov esi, [ebp+8]
L0009: xor edi, edi
L000b: mov ebx, 1
L0010: mov ecx, ebx
L0012: mov edx, esi
L0014: mov eax, Program.<SumFunc>g__where|0_0(Int32, Int32) ; where resolved
L0019: call eax
L001b: test eax, eax
L001d: je short L0026
L001f: mov eax, 1
L0024: jmp short L0028
L0026: xor eax, eax
L0028: add edi, eax
L002a: inc ebx
L002b: cmp ebx, 0x3e8
L0031: jle short L0010
L0033: mov eax, edi
L0035: pop ebx
L0036: pop esi
L0037: pop edi
L0038: pop ebp
L0039: ret 4 |
No, not yet. Currently indirect call profiling is done at the class level, so indirect calls that aren't virtual or interface calls don't get profiled or optimized. It would not be hard to profile the remaining indirect call targets, but there are some complications because there are multiple "entry points" for methods. So when we profile such a calls site, we'd either have to pay the price of translating the various entry points to the MethodDesc (not sure how doable/costly this is) or defer and do it later on when we got to externalize or analyze the collected profile data. Assuming we ,managed to translate to a MethodDesc, when went to use the data to make a guarded test we'd have to pick one entry to test for (likely the multi-callable entry). Collectible assemblies also complicate things. Even when the target is obvious the jit is not yet able to optimize indirect calls (see #44610). So it seems like fixing that is a prerequisite. |
Jit-experimental does not run cleanly, but there are only three tests that fail (across several legs) so it should be simple enough to check that there are no unexpected failures. |
// Various policies for GuardedDevirtualization | ||
CONFIG_INTEGER(JitGuardedDevirtualizationChainLikelihood, W("JitGuardedDevirtualizationChainLikelihood"), 0x4B) // 75 | ||
CONFIG_INTEGER(JitGuardedDevirtualizationChainStatements, W("JitGuardedDevirtualizationChainStatements"), 2) | ||
#if defined(DEBUG) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it mean "how many unrelated statements are allowed between two GDV candidates"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes -- it's a (crude) limit on how much other code we're willing to duplicate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm asking because I had:
obj.GDV();
Console.WriteLine();
obj.GDV();
worked
obj.GDV();
Console.WriteLine();
Console.WriteLine();
obj.GDV();
did not 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I set the statement limit to 2 based on some scouting in SPMI. But given the overall size decrease it might be ok to pick a larger value.
Rough numbers of opportunities / cost vs stmt limit:
limit | chained segments | dup count |
---|---|---|
0 | 29 | 0 |
1 | 218 | 189 |
2 | 262 | 251 |
3 | 307 | 366 |
4 | 313 | 390 |
6 | 320 | 432 |
8 | 321 | 440 |
Nice! I've cloned your branch and played with it locally, looks cool! |
Upping the statement limit to 4 reduces the size improvement a bit, but it's still an improvement overall. So we'll go with that initially.
|
Not sure what the best solution is here. A big glob of ascii art? https://dot-to-ascii.ggerganov.com/ looked promising, but doesn't seem to like the embedded HTML layouts for the more complex modes (dot "record" nodes don't work out here, unfortunately). For example:
|
If I use multi line labels instead of going for the record format it works out somewhat better...
|
The dot-to-ascii idea is cool, but neither display you show are particularly enlightening, compared to your original. I wonder if we should accept markdown + PNG "sidecar" documentation in the repo, with "links" from the source code (maybe in a doc/ subdir)? |
There's also a design note on guarded devirtualization, so perhaps I should update that and refer to it from the code. |
When expanding a guarded devirtualization call site, scout ahead to see if
there's another site just after this one. If so, expand the second site
so that the frequently-taken paths from the first and second tests form a
join-free "hot path". Continue scouting and chaining candidates along this
path until we run out of candidates, find a big enough stretch without a
candidate, or find a candidate with low likelihood of success.
Chaining enhances the abilities of the redundant branch optimizer to clean
up redundant type tests, so if there are multiple virtual or interface
calls in a short span, all dispatching off the same object, one test will
suffice to cover all the calls.