Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement fake hot/cold splitting and corresponding stress mode #69763

Merged
merged 3 commits into from
May 27, 2022

Conversation

amanasifkhalid
Copy link
Member

@amanasifkhalid amanasifkhalid commented May 24, 2022

The COMPlus_JitFakeProcedureSplitting configuration flag enables testing the JIT's hot/cold splitting functionality independent of the VM. This configuration "fakes" splitting by requesting only a hot section from the VM with the following layout: hot code + 4KB buffer + cold code. Hot/Cold code pointers are manually set to their respective sections of the buffer, and the JIT continues to operate as if the VM allocated separate sections. This implementation does not currently split unwind information, and only reserves/allocates it for the hot section -- this breaks stack walks and, thus, the GC. When using this configuration, suppress the GC with a large memory threshold (ex: set COMPlus_GCgen0size=1000000).

The COMPlus_JitStressProcedureSplitting configuration flag runs a stress mode for hot/cold code splitting by always splitting a method after its first basic block. This mode exposed the following behaviors incompatible with splitting that have been corrected:

  • Long branches between hot and cold sections cannot be optimized to short branches. Such optimization now only occurs for branches within a section.
  • The decision to align a loop can be invalidated after moving part of it to the cold section. Thus, loop alignment is disabled for cold blocks.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 24, 2022
@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label May 24, 2022
@ghost
Copy link

ghost commented May 24, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

The COMPlus_JitFakeProcedureSplitting configuration flag enables testing the JIT's hot/cold splitting functionality independent of the VM. This configuration "fakes" splitting by requesting only a hot section from the VM with the following layout: hot code + 4KB buffer + cold code. Hot/Cold code pointers are manually set to their respective sections of the buffer, and the JIT continues to operate as if the VM allocated separate sections. This implementation does not currently split unwind information, and only reserves/allocates it for the hot section -- this breaks stack walks and, thus, the GC. When using this configuration, suppress the GC with a large memory threshold (ex: set COMPlus_GCgen0size=1000000).

The COMPlus_JitStressProcedureSplitting configuration flag runs a stress mode for hot/cold code splitting by always splitting a method after its first basic block. This mode exposed the following behaviors incompatible with splitting that have been corrected:

  • Long branches between hot and cold sections cannot be optimized to short branches. Such optimization now only occurs for branches within a section.
  • The decision to align a loop can be invalidated after moving blocks to a cold section. Thus, loop alignment is disabled for cold blocks.
Author: amanasifkhalid
Assignees: -
Labels:

area-CodeGen-coreclr, community-contribution

Milestone: -

@amanasifkhalid amanasifkhalid marked this pull request as ready for review May 25, 2022 02:13
@JulieLeeMSFT JulieLeeMSFT added this to the 7.0.0 milestone May 25, 2022
@JulieLeeMSFT
Copy link
Member

cc @dotnet/jit-contrib.

Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments

Comment on lines 5161 to 5163
// Loop alignment is disabled for cold blocks
assert((fgFirstBB->bbFlags & BBF_COLD) == 0);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like there's no need for this assert, since we're not aligning this loop. In fact, we're specifically choosing to NOT align it, here.

@@ -1122,9 +1122,39 @@ void Compiler::eeDispLineInfos()
* (e.g., host AMD64, target ARM64), then VM will get confused anyway.
*/

void Compiler::eeAllocMem(AllocMemArgs* args, UNATIVE_OFFSET hotSizeRequest, UNATIVE_OFFSET coldSizeRequest)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function should have the same signature as allocMem: no need to pass hotSizeRequest or coldSizeRequest since they can be found from args->hotCodeSize and args->coldCodeSize.

{
#ifdef DEBUG
// Fake splitting implementation: hot section = hot code + 4K buffer + cold code
const UNATIVE_OFFSET buffer = 4096;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: buffer is a pretty generic name. How about:

Suggested change
const UNATIVE_OFFSET buffer = 4096;
const UNATIVE_OFFSET fakeSplittingBuffer = 4096;

Comment on lines 1132 to 1136
args->hotCodeSize = hotSizeRequest + buffer + coldSizeRequest;
args->coldCodeSize = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that changing args fields works out because the caller doesn't (I think) consult these values afterwards. To be perfectly clean, you might want to copy args to a local temp before modifying its fields.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha; for brevity, I simply copied and restored hotCodeSize and coldCodeSize, so args' input members behave like they're read-only.

Comment on lines 1152 to 1157
// Fake splitting currently does not handle unwind info for cold code
if (isColdCode && JitConfig.JitFakeProcedureSplitting())
{
return;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be better to move this immediately before the call to reserveUnwindInfo, below, and add:

JITDUMP("reserveUnwindInfo for cold code with JitFakeProcedureSplitting enabled: ignoring cold unwind info\n");

which would add a line to the JitDump in this case, but also let the normal reserveUnwindInfo printf be executed first.

Comment on lines 1180 to 1185
// Fake splitting currently does not handle unwind info for cold code
if (pColdCode && JitConfig.JitFakeProcedureSplitting())
{
return;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here as for reserveUnwindInfo: move to just before the allocUnwindInfo call, and add a JITDUMP printout.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the earlier out is only for DEBUG, is that intentional?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the JitFakeProcedureSplitting flag is defined only on Debug/Checked builds. For now, we skip unwind info for cold sections only when fake-splitting.

// For now, this disables unwind info for
// cold sections, breaking stack walks.
// Set COMPlus_GCgen0size=1000000 to avoid
// running the GC and breaking things.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// running the GC and breaking things.
// running the GC which requires stack walking.

// JitFakeProcedureSplitting overrides JitNoProcedureSplitting with a fake splitting implementation
if (JitConfig.JitFakeProcedureSplitting())
{
opts.compProcedureSplitting = true;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this is necessary. If someone sets COMPlus_JitFakeProcedureSplitting=1 and COMPlus_JitNoProcedureSplitting=Main, it seems like that should work: do fake splitting on all functions except Main.

It seems like the real issue is:

 opts.compProcedureSplitting = !opts.compDbgCode;

maybe should be:

if (opts.compDbgCode && !JitConfig.JitFakeProcedureSplitting())
{
 opts.compProcedureSplitting = false;
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, looks like I misunderstood what precedence JitFakeProcedureSplitting should (not) take over other configurations. I've changed the logic so JitFakeProcedureSplitting can be partially overriden by other configs, but can still override opts.compDbgCode if enabled.

@ghost ghost added the needs-author-action An issue or pull request that requires more info or actions from the author. label May 26, 2022
src/coreclr/jit/compiler.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/emit.cpp Outdated Show resolved Hide resolved
Comment on lines 1180 to 1185
// Fake splitting currently does not handle unwind info for cold code
if (pColdCode && JitConfig.JitFakeProcedureSplitting())
{
return;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the earlier out is only for DEBUG, is that intentional?

@ghost ghost removed the needs-author-action An issue or pull request that requires more info or actions from the author. label May 26, 2022
@amanasifkhalid
Copy link
Member Author

The 20 failing checks all seem to build/run correctly, but fail when sending the results to Helix -- an Azure DevOps API is returning a 401 error, citing a bad System.AccessToken value.

@BruceForstall
Copy link
Member

The 20 failing checks all seem to build/run correctly, but fail when sending the results to Helix -- an Azure DevOps API is returning a 401 error, citing a bad System.AccessToken value.

That was a general infrastructure problem, now fixed. You can request the testing to be rerun. (See the "Re-run" "button" on "runtime" here: /~https://github.com/dotnet/runtime/pull/69763/checks?check_run_id=6607345705 (rerunning the "parent" will cause rerunning all the failed "children"))

Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Implementation splits after first basic block in method, assuming
there is more than one block. Accompanying this implementation are
the following fixes:
- Loop alignment is disabled for cold blocks, as moving blocks
into the cold section may invalidate the initial decision to align.
- Long jumps are no longer reduced to short jumps if crossing
hot/cold sections.
@amanasifkhalid
Copy link
Member Author

Thank you! I couldn't find any re-run button in the Checks UI, but force-pushing an empty commit triggered the run.

@amanasifkhalid amanasifkhalid merged commit 70fd5dc into dotnet:main May 27, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Jun 26, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants