Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more iterators to JIT #52515

Merged
merged 20 commits into from
Jun 7, 2021
Merged

Conversation

BruceForstall
Copy link
Member

@BruceForstall BruceForstall commented May 9, 2021

Add more iterators compatible with range-based for syntax for various data structures. These iterators all assume (and some check) that the underlying data structures determining the iteration are not changed during the iteration. For example, don't use these to iterator over the block list if you are also modifying the block list. Similarly, don't use these to iterate over the predecessor edges if you are changing the order or contents of the predecessor edge list.

  • BasicBlock: iterate over all blocks in the function, a subset starting not at the first block, or a specified range of blocks. Removed uses of the foreach_block macro. E.g.:
for (BasicBlock* const block : Blocks()) // all blocks in function
for (BasicBlock* const block : BasicBlockSimpleList(fgFirstBB->bbNext)) // all blocks starting at fgFirstBB->bbNext
for (BasicBlock* const testBlock : BasicBlockRangeList(firstNonLoopBlock, lastNonLoopBlock)) // all blocks in range (inclusive)
  • block predecessors: iterate over all predecessor edges, or all predecessor blocks, e.g.:
for (flowList* const edge : block->PredEdges())
for (BasicBlock* const predBlock : block->PredBlocks())
  • block successors: iterate over all block successors using the NumSucc()/GetSucc(), or NumSucc(Compiler*)/GetSucc(Compiler*) pairs, e.g.:
for (BasicBlock* const succ : Succs())
for (BasicBlock* const succ : Succs(compiler))

Note that there already exists the "AllSuccessorsIter" which iterates over block successors including possible EH successors, e.g.:

for (BasicBlock* succ : block->GetAllSuccs(m_pCompiler))
  • switch targets, (namely, the successors of BBJ_SWITCH blocks), e.g.:
for (BasicBlock* const bTarget : block->SwitchTargets())
  • loops blocks: iterate over all the blocks in a loop, e.g.:
for (BasicBlock* const blk : optLoopTable[loopInd].LoopBlocks())
  • Statements: added an iterator shortcut for the non-phi statements, e.g.:
for (Statement* const stmt : block->NonPhiStatements())

Note that there already exists an iterator over all statements, e.g.:

for (Statement* const stmt : block->Statements())
  • EH clauses, e.g.:
for (EHblkDsc* const HBtab : EHClauses(this))
  • GenTree in linear order (but not LIR, which already has an iterator), namely, using the gtNext links, e.g.:
for (GenTree* const call : stmt->TreeList())

This is a no-diff change.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 9, 2021
@dotnet dotnet deleted a comment from azure-pipelines bot May 9, 2021
@dotnet dotnet deleted a comment from azure-pipelines bot May 9, 2021
@BruceForstall
Copy link
Member Author

/azp run runtime-coreclr jitstress

@dotnet dotnet deleted a comment from azure-pipelines bot May 12, 2021
@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@BruceForstall BruceForstall changed the title Add iterators Add more iterators to JIT May 12, 2021
@BruceForstall
Copy link
Member Author

BruceForstall commented May 12, 2021

Here are PIN throughput measurements over a superpmi replay of the benchmarks collection:

Collection: benchmarks                
  Debug Checked Release Release (no PGO) Debug/Checked Debug/Release Checked/Release Rel-no-PGO/Rel
Baseline 531,491,152,090 168,090,820,492 32,748,694,066 36,500,757,131        
  531,492,080,507 168,073,928,845 32,747,953,699 36,516,286,268        
  531,466,358,038 168,085,854,512 32,734,810,088 36,514,926,941        
Average: 531,483,196,878 168,083,534,616 32,743,819,284 36,510,656,780 3.162018208 16.23155785 5.133290444 1.11503965
(max - min) / min 4.83991E-05 0.000100501 0.000424135 0.000425447        
                 
Diff 544,010,189,065 168,089,266,473 33,131,445,166 36,515,701,230        
  544,021,549,738 168,110,984,589 33,138,743,064 36,528,148,318        
  544,023,088,633 168,114,904,944 33,132,977,072 36,509,253,833        
Average: 544,018,275,812 168,105,052,002 33,134,388,434 36,517,701,127 3.236180408 16.41853982 5.07343156 1.102108802
(max - min) / min 2.3712E-05 0.000152529 0.000220271 0.000517526        
                 
Diff/Base (Average) 1.02358509 1.000128016 1.011928027 1.000192939        

There is a "normal" Release build and "no-PGO" Release build, which is ~11% slower, but the "diff" case doesn't suffer from out-of-date PGO training data.

There is about a %0.02 TP regression in Release. There is also a 2% regression in Debug, which is more expected, as function calls are done for the iteration ++ and != operations.

@BruceForstall
Copy link
Member Author

@dotnet/jit-contrib Comments? Ideas? Suggestions? Should this be pursued? Do you like how the code looks? Does it seem like a useful abstraction improvement?

@EgorBo
Copy link
Member

EgorBo commented May 12, 2021

As for the regressions I wonder if it should be exactly the same on clang/gcc
E.g. we recently discovered that MSVC doesn't unroll loops with iterators https://godbolt.org/z/enhcafn88

@AndyAyersMS
Copy link
Member

For the release regression measurement, were you using non-PGO optimized builds?

@@ -2194,7 +2185,7 @@ void Compiler::fgDumpTrees(BasicBlock* firstBlock, BasicBlock* lastBlock)

// Note that typically we have already called fgDispBasicBlocks()
// so we don't need to print the preds and succs again here.
for (BasicBlock* block = firstBlock; block; block = block->bbNext)
for (BasicBlock* block = firstBlock; block != nullptr; block = block->bbNext)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be for (BasicBlock* const block : BasicBlockSimpleList(firstBlock)) ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that would work. I had converted the whole thing to:

for (BasicBlock* const block : BasicBlockRangeList(firstBlock, lastBlock))
{
    fgDumpBlock(block);
}

but some callers pass in a nullptr for lastBlock (and maybe firstBlock?) and I didn't track down if that's ok.

@BruceForstall
Copy link
Member Author

For the release regression measurement, were you using non-PGO optimized builds?

Good point. This was for default Release builds. I need to figure out the current method for disabling PGO (any suggestions?).

@sandreenko
Copy link
Contributor

sandreenko commented May 12, 2021

Good point. This was for default Release builds. I need to figure out the current method for disabling PGO (any suggestions?).

build %SUBSET% -rc %MODE% -lc release -arch %ARCH% /p:NoPgoOptimize=true

@BruceForstall
Copy link
Member Author

I updated the throughput table above with a Release-no-PGO run, and it shows only a 0.02% TP regression. The noise level appears to be about 0.05%, so this is in the noise.

@AndyAyersMS
Copy link
Member

it shows only a 0.02% TP regression

Great! I would have been somewhat surprised to find this sort of thing still causes problems for C++ compilers when optimizing.

@AndyAyersMS
Copy link
Member

This is a nice improvement to the code. I assume this is no-diff?

@chuck-mitchell would also approve.

@BruceForstall
Copy link
Member Author

/azp run runtime-coreclr jitstress

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@BruceForstall
Copy link
Member Author

I assume this is no-diff?

Correct. I ran x86 and x64 SPMI asm diffs.

@BruceForstall BruceForstall marked this pull request as ready for review May 13, 2021 00:36
@BruceForstall
Copy link
Member Author

@dotnet/jit-contrib This is ready for review

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks great.

I'm still concerned about unexpected behavior if a collection is modified. Not sure if it's possible or worth keeping extra state in the debug builds iterators to try and catch when this happens.

Am also tempted by the idea that the block iterators could be expressed by overloads, eg

Blocks()
Blocks(BasicBlock* fromBlock)
Blocks(BasicBlock* fromBlock, BasicBlock* toBlock)

@BruceForstall
Copy link
Member Author

I'm still concerned about unexpected behavior if a collection is modified. Not sure if it's possible or worth keeping extra state in the debug builds iterators to try and catch when this happens.

What kind of checking do you think might be useful? One idea I had is caching the "next" pointer, then verifying it when we go to ++. For something like BasicBlock, we could assert that this->bbNext->bbPrev == this, to check that the block hasn't been spliced out, for example.

Am also tempted by the idea that the block iterators could be expressed by overloads

I like that; I'll make the change.

@AndyAyersMS
Copy link
Member

What kind of checking do you think might be useful? One idea I had is caching the "next" pointer...

That would be a good start -- probably enough to catch the most obvious failure pattern.

@BruceForstall
Copy link
Member Author

It turns out that the EfficientEdgeCountInstrumentor, when inserting counts for critical edges, can create new basic blocks. So, asserting that the bbNext block is what we expected it to be when we first iterated to a block fails: the bbNext has changed.

One option is to back off and use to old-style "manual" iteration for anything that changes to block list (as an example; some kind of checking might make sense for the other iterators). But I wonder if we should just be ok with changes to the flow graph. If the iterator pre-calculated next, it would be a problem. Maybe it's ok to assume bbNext will only be consulted when ++ is used.

There was another case for a forward walk was used in fgUnlinkRange, and that didn't work with my assert that block->bbNext->bbPrev == block, since the bbPrev links outside of the range had been fixed up, and the range was the spliced-out blocks. Once again, it works fine without the asserts, but perhaps the asserts will actually catch something useful. Not sure what the best balance is.

Comments?

@BruceForstall
Copy link
Member Author

Here's an updated PIN over superpmi replay of benchmarks collection for the current code:

Collection: benchmarks            
  Debug Checked Release (no PGO) Debug/Checked Debug/Release Checked/Release
Baseline 524,976,811,958 166,458,195,773 36,624,759,188      
  525,029,406,347 166,472,846,364 36,617,857,951      
  524,982,131,581 166,455,678,324 36,626,907,592      
Average: 524,996,116,629 166,462,240,154 36,623,174,910 3.153845077 14.3350793 4.545270599
(max - min) / min 0.000100184 0.000103139 0.000247137      
             
Diff 542,820,582,754 166,413,960,622 36,633,871,905      
  542,871,277,889 166,380,588,108 36,630,646,502      
  542,837,533,343 166,380,383,253 36,684,350,538      
Average: 542,843,131,329 166,391,643,994 36,649,622,982 3.262442262 14.81169756 4.540064275
(max - min) / min 9.33921E-05 0.000201811 0.001466096      
             
Diff/Base (Average) 1.033994565 0.999575903 1.000722168      

So the Release build is 0.01% slower, which appears to be in the noise. The Debug build is 3% slower, which is expected.

1. Make more BasicBlock members `const`.
`for (Statement* stmt : blk->NonPhiStatements())`
This matches the style used for block and GenTree iterators, and prevents
errors where changing the stmt pointer expects the iterator location to change.
With this change, all block iteration can start from Compiler::Block():
```
for (BasicBlock* const block : compiler->Blocks()) ...
for (BasicBlock* const block : compiler->Blocks(startBlock)) ...
for (BasicBlock* const block : compiler->Blocks(startBlock, endBlock)) ...
```

The second two could be `static`, but I haven't done that.
E.g.:
```
for (BasicBlock* const target : block->SwitchTargets())
```

Also, (1) reverted one LSRA block iterator, (2) tweaked
LSRA block ordering dump.
e.g.,

    for (BasicBlock* const succ : block->Succs())

This is equivalent to looping using NumSucc() and GetSucc() using the
versions which don't take a `Compiler*`.
If fall-through and branch destinations are identical, only
iterate over one.
@BruceForstall
Copy link
Member Author

/azp run runtime-coreclr jitstress, runtime-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@BruceForstall
Copy link
Member Author

@dotnet/jit-contrib This is ready for a final review. I've updated the top comment to describe all the added iterators.

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not an easy change to review -- I looked over several of the files I'm more familiar with and it looks good.

@BruceForstall BruceForstall merged commit 5788ea0 into dotnet:main Jun 7, 2021
@BruceForstall BruceForstall deleted the AddIterators branch June 7, 2021 23:16
@ghost ghost locked as resolved and limited conversation to collaborators Jul 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants