Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement DivRem intrinsic for X86 #66551

Merged
merged 85 commits into from
Feb 15, 2023
Merged
Show file tree
Hide file tree
Changes from 21 commits
Commits
Show all changes
85 commits
Select commit Hold shift + click to select a range
00da1f4
Add managed api for divrem
huoyaoyuan Mar 7, 2022
d679cca
Add NI definition of DivRem
huoyaoyuan Mar 7, 2022
4eba5c5
Fix DivRem to be static
huoyaoyuan Mar 11, 2022
6f8fbbf
Implement DivRem in clrjit
huoyaoyuan Mar 11, 2022
997d8a5
Add tests for DivRem
huoyaoyuan Mar 11, 2022
c0c3dd6
Adjust lsra and RMW
huoyaoyuan Mar 12, 2022
c08cee7
Use DivRem intrinsic in Math
huoyaoyuan Mar 12, 2022
1804350
Bring lower change from coreclr#37928
huoyaoyuan Mar 12, 2022
46c3f78
Fix signedness of DIV
huoyaoyuan Mar 13, 2022
2dbd2e9
Revert RMW change and fix reg allocation
huoyaoyuan Mar 13, 2022
1b6d09b
Fix import of X64 intrinsic
huoyaoyuan Mar 13, 2022
3ef3600
Fix static in PNSE version
huoyaoyuan Mar 13, 2022
acaf211
Fix accidential indent change
huoyaoyuan Mar 13, 2022
728440d
Apply format patch
huoyaoyuan Mar 13, 2022
4eace5d
op3 candidate should be different from op1 and op2
huoyaoyuan Mar 13, 2022
3182ed8
Add guard over AsHWIntrinsic
huoyaoyuan Mar 14, 2022
48c12a4
Disable DivRem intrinsic use for Mono
huoyaoyuan Mar 14, 2022
be5b33c
Fix LSRA and invalid assertion
huoyaoyuan Mar 15, 2022
d868194
Set RWM information correctly, although totally unused.
huoyaoyuan Mar 15, 2022
ff65608
Move multi-reg temp allocation to common helper
huoyaoyuan Mar 15, 2022
b31f896
Disable DivRem intrinsic test for Mono.
huoyaoyuan Mar 16, 2022
0bd1327
Fix typo of quotient
huoyaoyuan Mar 23, 2022
51959c0
Merge branch 'main'
huoyaoyuan May 20, 2022
8f1d9a2
Merge branch 'main' into divrem
huoyaoyuan Jun 8, 2022
f4aece2
Merge branch 'main'
huoyaoyuan Jul 15, 2022
75dda8d
Merge branch 'main' into divrem
huoyaoyuan Sep 18, 2022
43049a9
Use impAssignMultiRegTypeToVar
huoyaoyuan Sep 19, 2022
d485b61
Update #ifdef
huoyaoyuan Sep 19, 2022
aeb114a
Remove change in LowerBlockStore
huoyaoyuan Sep 19, 2022
aa57f26
Adjust assert and helper method
huoyaoyuan Sep 23, 2022
9345e57
Update CheckMultiRegLclVar
huoyaoyuan Sep 23, 2022
65c0af6
fix build errors
kunalspathak Sep 23, 2022
2bfa332
Merge branch 'main' into divrem
huoyaoyuan Sep 24, 2022
7b41fd6
Merge remote-tracking branch 'origin/main' into pr-66551
kunalspathak Nov 7, 2022
df922de
Fix some merge conflicts
kunalspathak Nov 7, 2022
924fc42
Fix some errors
kunalspathak Nov 7, 2022
1e7ff05
Fix GCC build error
kunalspathak Nov 8, 2022
62e6c59
jit format
kunalspathak Nov 8, 2022
a1b4802
Make the size depending on the struct
kunalspathak Nov 8, 2022
d547337
Merge remote-tracking branch 'origin/main' into divrem
kunalspathak Dec 5, 2022
8199acb
Fix the formatting of summary
kunalspathak Dec 6, 2022
ed12edb
Remove trailing spaces
kunalspathak Dec 7, 2022
532a8fd
Fix the tests to adopt new model
kunalspathak Dec 7, 2022
281c1b0
Fix MAX_RET_REG_COUNT -> MAX_MULTIREG_COUNT
kunalspathak Dec 8, 2022
97630a2
Exclude from mono run
kunalspathak Dec 9, 2022
491f82e
Add RequiresProcessIsolation
kunalspathak Dec 12, 2022
7487dfc
Merge remote-tracking branch 'origin/main' into divrem
kunalspathak Dec 28, 2022
7a862b2
Merge remote-tracking branch 'origin/main' into divrem
kunalspathak Jan 6, 2023
0fc11c9
Real fix for build break
kunalspathak Jan 7, 2023
8ab6023
Fix the test cases to adopt to the new system
kunalspathak Jan 11, 2023
8d67890
Disable for llvm-fullaot
kunalspathak Jan 12, 2023
0d85848
Also continue disabling for mono
kunalspathak Jan 12, 2023
1fd2b74
Merge remote-tracking branch 'origin/main' into divrem
kunalspathak Jan 12, 2023
18c9787
Disable the test in a different group
kunalspathak Jan 12, 2023
b63dea3
Merge remote-tracking branch 'origin/main' into divrem
kunalspathak Jan 13, 2023
1b0e670
fix merge conflict
kunalspathak Jan 13, 2023
7631033
format
kunalspathak Jan 13, 2023
c0be00a
fix another merge conflict error
kunalspathak Jan 14, 2023
83192b1
Merge remote-tracking branch 'origin/main' into divrem
kunalspathak Jan 17, 2023
b29e1b4
misc changes from review
kunalspathak Jan 17, 2023
82c38a3
fix the replay errors
kunalspathak Jan 18, 2023
61fabe2
jit format
kunalspathak Jan 18, 2023
eea804c
Add exclude list in mono/llvmfullaot
kunalspathak Jan 18, 2023
ebe8781
Review feedback and fix bug in CheckMultiRegLclVar
kunalspathak Jan 20, 2023
9dc8522
Pass registerCount
kunalspathak Jan 20, 2023
0093b79
review feedback
kunalspathak Jan 20, 2023
8260134
Remove the extra comments
kunalspathak Jan 24, 2023
320270a
Disable tests on mono llvmfullaot runs as well
akoeplinger Jan 24, 2023
a98dbde
Add missing case for upper save/restore
kunalspathak Jan 24, 2023
06f0460
Add missing RequiresProcessIsolation
kunalspathak Jan 25, 2023
1b3e851
Revert "Add missing case for upper save/restore"
kunalspathak Jan 25, 2023
9f192ac
Merge remote-tracking branch 'huoyaoyuan/divrem' into divrem
kunalspathak Jan 25, 2023
537f678
Remove RequiresProcessIsolation property
kunalspathak Jan 25, 2023
ca32b24
Merge branch 'main' into divrem
kunalspathak Feb 9, 2023
e1b5fb3
Replace ref with fakelibs
kunalspathak Feb 9, 2023
6589700
Add references of fakelib to test project
kunalspathak Feb 10, 2023
2a90b4b
Fix CompatibilitySuppressions.xml
kunalspathak Feb 10, 2023
aad300a
fix test builds
kunalspathak Feb 10, 2023
eb3272c
Remove unneeded extern/using
kunalspathak Feb 10, 2023
39ae98d
Revert "Replace ref with fakelibs"
kunalspathak Feb 13, 2023
f8473ec
Revert "Revert "Replace ref with fakelibs""
kunalspathak Feb 13, 2023
e00a8c1
Exclude DivRem.csproj
kunalspathak Feb 14, 2023
9e28279
Handle the case to delay-free op3 for op1/op2
kunalspathak Feb 14, 2023
2d4d8d9
Create the DivRem.RefOnly fake CoreLib as a reference assembly
kunalspathak Feb 14, 2023
b5e3dd3
Unify AddDelayFreeUses
kunalspathak Feb 15, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions src/coreclr/jit/compiler.h
Original file line number Diff line number Diff line change
Expand Up @@ -3357,6 +3357,8 @@ class Compiler
NamedIntrinsic hwIntrinsicID);
GenTreeHWIntrinsic* gtNewScalarHWIntrinsicNode(
var_types type, GenTree* op1, GenTree* op2, GenTree* op3, NamedIntrinsic hwIntrinsicID);
GenTreeLclVar* gtNewLclvForMultiRegIntrinsicNode(GenTreeHWIntrinsic* intrinsicNode, CORINFO_SIG_INFO* sig);

CORINFO_CLASS_HANDLE gtGetStructHandleForHWSIMD(var_types simdType, CorInfoType simdBaseJitType);
CorInfoType getBaseJitTypeFromArgIfNeeded(NamedIntrinsic intrinsic,
CORINFO_CLASS_HANDLE clsHnd,
Expand Down Expand Up @@ -4497,6 +4499,10 @@ class Compiler
CorInfoType simdBaseJitType,
var_types retType,
unsigned simdSize);
GenTree* impX86BaseIntrinsic(NamedIntrinsic intrinsic,
CORINFO_METHOD_HANDLE method,
CORINFO_SIG_INFO* sig,
CorInfoType simdBaseJitType);
GenTree* impSSEIntrinsic(NamedIntrinsic intrinsic, CORINFO_METHOD_HANDLE method, CORINFO_SIG_INFO* sig);
GenTree* impSSE2Intrinsic(NamedIntrinsic intrinsic, CORINFO_METHOD_HANDLE method, CORINFO_SIG_INFO* sig);
GenTree* impAvxOrAvx2Intrinsic(NamedIntrinsic intrinsic, CORINFO_METHOD_HANDLE method, CORINFO_SIG_INFO* sig);
Expand Down
20 changes: 20 additions & 0 deletions src/coreclr/jit/gentree.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -17956,6 +17956,8 @@ bool GenTree::isRMWHWIntrinsic(Compiler* comp)
case NI_FMA_MultiplySubtractNegated:
case NI_FMA_MultiplySubtractNegatedScalar:
case NI_FMA_MultiplySubtractScalar:
case NI_X86Base_DivRem:
case NI_X86Base_X64_DivRem:
{
return true;
}
Expand Down Expand Up @@ -21635,6 +21637,24 @@ GenTreeHWIntrinsic* Compiler::gtNewScalarHWIntrinsicNode(
/* isSimdAsHWIntrinsic */ false, op1, op2, op3);
}

GenTreeLclVar* Compiler::gtNewLclvForMultiRegIntrinsicNode(GenTreeHWIntrinsic* intrinsicNode, CORINFO_SIG_INFO* sig)
huoyaoyuan marked this conversation as resolved.
Show resolved Hide resolved
{
assert(HWIntrinsicInfo::IsMultiReg(intrinsicNode->GetHWIntrinsicId()));

const unsigned lclNum = lvaGrabTemp(true DEBUGARG("Return value temp for multireg intrinsic"));
impAssignTempGen(lclNum, intrinsicNode, sig->retTypeSigClass, (unsigned)CHECK_SPILL_ALL);

LclVarDsc* varDsc = lvaGetDesc(lclNum);
// The following is to exclude the fields of the local to have SSA.
varDsc->lvIsMultiRegRet = true;

GenTreeLclVar* lclVar = gtNewLclvNode(lclNum, varDsc->lvType);
lclVar->SetDoNotCSE();
lclVar->SetMultiReg();

return lclVar;
}

// Returns true for the HW Intrinsic instructions that have MemoryLoad semantics, false otherwise
bool GenTreeHWIntrinsic::OperIsMemoryLoad() const
{
Expand Down
4 changes: 3 additions & 1 deletion src/coreclr/jit/gentree.h
Original file line number Diff line number Diff line change
Expand Up @@ -8130,6 +8130,7 @@ inline var_types GenTree::GetRegTypeByIndex(int regIndex) const
#endif // !defined(TARGET_64BIT)
#endif // FEATURE_MULTIREG_RET

#ifdef FEATURE_HW_INTRINSICS
if (OperIsHWIntrinsic())
{
assert(TypeGet() == TYP_STRUCT);
Expand All @@ -8146,9 +8147,10 @@ inline var_types GenTree::GetRegTypeByIndex(int regIndex) const
#elif defined(TARGET_XARCH)
// At this time, the only multi-reg HW intrinsics all return the type of their
// arguments. If this changes, we will need a way to record or determine this.
return gtGetOp1()->TypeGet();
return AsHWIntrinsic()->Op(1)->TypeGet();
tannergooding marked this conversation as resolved.
Show resolved Hide resolved
#endif
}
#endif // FEATURE_HW_INTRINSICS

if (OperIsScalarLocal())
{
Expand Down
6 changes: 6 additions & 0 deletions src/coreclr/jit/hwintrinsic.h
Original file line number Diff line number Diff line change
Expand Up @@ -744,6 +744,12 @@ struct HWIntrinsicInfo
return 2;
#endif

#ifdef TARGET_XARCH
case NI_X86Base_DivRem:
case NI_X86Base_X64_DivRem:
return 2;
#endif // TARGET_XARCH

default:
unreached();
}
Expand Down
18 changes: 3 additions & 15 deletions src/coreclr/jit/hwintrinsicarm64.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1582,27 +1582,15 @@ GenTree* Compiler::impSpecialIntrinsic(NamedIntrinsic intrinsic,
}
}

GenTree* loadIntrinsic = gtNewSimdHWIntrinsicNode(retType, op1, intrinsic, simdBaseJitType, simdSize);
GenTreeHWIntrinsic* loadIntrinsic =
gtNewSimdHWIntrinsicNode(retType, op1, intrinsic, simdBaseJitType, simdSize);
// This operation contains an implicit indirection
// it could point into the global heap or
// it could throw a null reference exception.
//
loadIntrinsic->gtFlags |= (GTF_GLOB_REF | GTF_EXCEPT);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can delete this line, it should not be necessary now (but check we still set the flags in the constructor for the HWI node).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks somehow out of scope. I abstracted the common code path from arm64, but didn't meant to modify it. I don't have the environment to verify this change either.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add back the assert assert(HWIntrinsicInfo::IsMultiReg(intrinsic));?

Also, I think you should use the code what used to be here inside impAssignMultiRegTypeToVar() because this code looked much cleaner. E.g.:

 lclVar->SetDoNotCSE();
lclVar->SetMultiReg();

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SetMultiReg and lvIsMultiRegRet in fact doesn't do the same thing and replacing would cause failures.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SetMultiReg() goes on GenTree node while lvIsMultiRegRet goes on LclVarDsc. You missed setting the flag on LclVarDsc.


assert(HWIntrinsicInfo::IsMultiReg(intrinsic));

const unsigned lclNum = lvaGrabTemp(true DEBUGARG("Return value temp for multireg intrinsic"));
impAssignTempGen(lclNum, loadIntrinsic, sig->retTypeSigClass, (unsigned)CHECK_SPILL_ALL);

LclVarDsc* varDsc = lvaGetDesc(lclNum);
// The following is to exclude the fields of the local to have SSA.
varDsc->lvIsMultiRegRet = true;

GenTreeLclVar* lclVar = gtNewLclvNode(lclNum, varDsc->lvType);
lclVar->SetDoNotCSE();
lclVar->SetMultiReg();

retNode = lclVar;
retNode = gtNewLclvForMultiRegIntrinsicNode(loadIntrinsic, sig);
break;
}

Expand Down
37 changes: 37 additions & 0 deletions src/coreclr/jit/hwintrinsiccodegenxarch.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1175,6 +1175,43 @@ void CodeGen::genX86BaseIntrinsic(GenTreeHWIntrinsic* node)
break;
}

case NI_X86Base_DivRem:
case NI_X86Base_X64_DivRem:
{
assert(node->GetOperandCount() == 3);

// SIMD base type is from signature and can distinguish signed and unsigned
var_types targetType = node->GetSimdBaseType();
GenTree* op1 = node->Op(1);
GenTree* op2 = node->Op(2);
GenTree* op3 = node->Op(3);
instruction ins = HWIntrinsicInfo::lookupIns(intrinsicId, targetType);

regNumber op1Reg = op1->GetRegNum();
regNumber op2Reg = op2->GetRegNum();
regNumber op3Reg = op3->GetRegNum();

emitAttr attr = emitTypeSize(targetType);
emitter* emit = GetEmitter();

// op1: EAX, op2: EDX, op3: free
kunalspathak marked this conversation as resolved.
Show resolved Hide resolved
assert(op1Reg != REG_EDX);
assert(op2Reg != REG_EAX);
if (op3->isUsedFromReg())
{
assert(op3Reg != REG_EDX);
assert(op3Reg != REG_EAX);
}

emit->emitIns_Mov(INS_mov, attr, REG_EAX, op1Reg, /* canSkip */ true);
emit->emitIns_Mov(INS_mov, attr, REG_EDX, op2Reg, /* canSkip */ true);

// emit the DIV/IDIV instruction
emit->emitInsBinary(ins, attr, node, op3);

break;
}

default:
unreached();
break;
Expand Down
2 changes: 2 additions & 0 deletions src/coreclr/jit/hwintrinsiclistxarch.h
Original file line number Diff line number Diff line change
Expand Up @@ -224,6 +224,7 @@ HARDWARE_INTRINSIC(Vector256, Xor,
HARDWARE_INTRINSIC(X86Base, BitScanForward, 0, 1, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_bsf, INS_bsf, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_Scalar, HW_Flag_NoFloatingPointUsed|HW_Flag_NoRMWSemantics)
HARDWARE_INTRINSIC(X86Base, BitScanReverse, 0, 1, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_bsr, INS_bsr, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_Scalar, HW_Flag_NoFloatingPointUsed|HW_Flag_NoRMWSemantics)
HARDWARE_INTRINSIC(X86Base, Pause, 0, 0, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_Special, HW_Flag_NoContainment|HW_Flag_NoRMWSemantics)
HARDWARE_INTRINSIC(X86Base, DivRem, 0, 3, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_idiv, INS_div, INS_idiv, INS_div, INS_invalid, INS_invalid}, HW_Category_Scalar, HW_Flag_NoFloatingPointUsed|HW_Flag_BaseTypeFromSecondArg|HW_Flag_MultiReg|HW_Flag_SpecialImport|HW_Flag_SpecialCodeGen)

// ***************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************
// ISA Function name SIMD size NumArg Instructions Category Flags
Expand All @@ -232,6 +233,7 @@ HARDWARE_INTRINSIC(X86Base, Pause,
// X86Base 64-bit-only Intrinsics
HARDWARE_INTRINSIC(X86Base_X64, BitScanForward, 0, 1, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_bsf, INS_bsf, INS_invalid, INS_invalid}, HW_Category_Scalar, HW_Flag_NoFloatingPointUsed|HW_Flag_NoRMWSemantics)
HARDWARE_INTRINSIC(X86Base_X64, BitScanReverse, 0, 1, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_bsr, INS_bsr, INS_invalid, INS_invalid}, HW_Category_Scalar, HW_Flag_NoFloatingPointUsed|HW_Flag_NoRMWSemantics)
HARDWARE_INTRINSIC(X86Base_X64, DivRem, 0, 3, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_idiv, INS_div, INS_invalid, INS_invalid}, HW_Category_Scalar, HW_Flag_NoFloatingPointUsed|HW_Flag_BaseTypeFromSecondArg|HW_Flag_MultiReg|HW_Flag_SpecialImport|HW_Flag_SpecialCodeGen)

// ***************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************
// ISA Function name SIMD size NumArg Instructions Category Flags
Expand Down
63 changes: 53 additions & 10 deletions src/coreclr/jit/hwintrinsicxarch.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -491,8 +491,10 @@ GenTree* Compiler::impSpecialIntrinsic(NamedIntrinsic intrinsic,
{
case InstructionSet_Vector256:
case InstructionSet_Vector128:
case InstructionSet_X86Base:
return impBaseIntrinsic(intrinsic, clsHnd, method, sig, simdBaseJitType, retType, simdSize);
case InstructionSet_X86Base:
case InstructionSet_X86Base_X64:
return impX86BaseIntrinsic(intrinsic, method, sig, simdBaseJitType);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This had an interesting side effect: if FeatureSIMD is turned off, X86Base.Pause will also be turned off.

case InstructionSet_SSE:
return impSSEIntrinsic(intrinsic, method, sig);
case InstructionSet_SSE2:
Expand Down Expand Up @@ -550,13 +552,8 @@ GenTree* Compiler::impBaseIntrinsic(NamedIntrinsic intrinsic,
return nullptr;
}

var_types simdBaseType = TYP_UNKNOWN;

if (intrinsic != NI_X86Base_Pause)
{
simdBaseType = JitType2PreciseVarType(simdBaseJitType);
assert(varTypeIsArithmetic(simdBaseType));
}
var_types simdBaseType = JitType2PreciseVarType(simdBaseJitType);
assert(varTypeIsArithmetic(simdBaseType));

switch (intrinsic)
{
Expand Down Expand Up @@ -2194,16 +2191,62 @@ GenTree* Compiler::impBaseIntrinsic(NamedIntrinsic intrinsic,
break;
}

default:
{
return nullptr;
}
}

return retNode;
}

GenTree* Compiler::impX86BaseIntrinsic(NamedIntrinsic intrinsic,
CORINFO_METHOD_HANDLE method,
CORINFO_SIG_INFO* sig,
CorInfoType simdBaseJitType)
{
GenTree* retNode = nullptr;
GenTree* op1 = nullptr;
GenTree* op2 = nullptr;
GenTree* op3 = nullptr;
GenTree* op4 = nullptr;

var_types retType = JITtype2varType(sig->retType);

switch (intrinsic)
{

case NI_X86Base_Pause:
{
assert(sig->numArgs == 0);
assert(JITtype2varType(sig->retType) == TYP_VOID);
assert(simdSize == 0);
assert(retType == TYP_VOID);
assert(simdBaseJitType == CORINFO_TYPE_UNDEF);

retNode = gtNewScalarHWIntrinsicNode(TYP_VOID, intrinsic);
break;
}

case NI_X86Base_DivRem:
case NI_X86Base_X64_DivRem:
{
assert(sig->numArgs == 3);
assert(HWIntrinsicInfo::IsMultiReg(intrinsic));
assert(retType == TYP_STRUCT);
assert(simdBaseJitType != CORINFO_TYPE_UNDEF);

op3 = impPopStack().val;
op2 = impPopStack().val;
op1 = impPopStack().val;

GenTreeHWIntrinsic* divRemIntrinsic = gtNewScalarHWIntrinsicNode(retType, op1, op2, op3, intrinsic);

// Store the type from signature into SIMD base type for convenience
divRemIntrinsic->SetSimdBaseJitType(simdBaseJitType);

retNode = gtNewLclvForMultiRegIntrinsicNode(divRemIntrinsic, sig);
break;
}

default:
{
return nullptr;
Expand Down
8 changes: 4 additions & 4 deletions src/coreclr/jit/lower.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -6749,7 +6749,7 @@ bool Lowering::NodesAreEquivalentLeaves(GenTree* tree1, GenTree* tree2)
bool Lowering::CheckMultiRegLclVar(GenTreeLclVar* lclNode, const ReturnTypeDesc* retTypeDesc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider the following:

struct V00 { int a, int b, int c, int d }; // Independently promoted

V00 = DivRem(); // DivRem is { long, long }

This logic will fail to reject the enregistration if V00 (or, more accurately, its fields) even as it should (see also my earlier comment about passing the register count instead of retTypeDesc to fix this case).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed the logic of CheckMultiRegLclVar() the way you suggested. I am not fully sure about this example though. Can you give C# example where I can see things goes wrong?

Copy link
Contributor

@SingleAccretion SingleAccretion Jan 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give C# example where I can see things goes wrong?

It is not unlikely this cannot actually be reproduced right now because of the constrained way multi-reg locals are used, but here would be the idea for a reproduction:

// V00: struct { long, long }
// V01: struct { int, int, int, int }

V00 = DivDem()
*(typeof(V00)*)V01 = V00

Then V00 and V01 need to be promoted and forward subtitution needs to happen s.t. we end with:

V01 = DivRem()

The more general point though is that this can happen according to the IR validity rules and so needs to be handled.

{
bool canEnregister = false;
#if FEATURE_MULTIREG_RET
#if FEATURE_MULTIREG_RET || FEATURE_HW_INTRINSICS
huoyaoyuan marked this conversation as resolved.
Show resolved Hide resolved
LclVarDsc* varDsc = comp->lvaGetDesc(lclNode->GetLclNum());
if ((comp->lvaEnregMultiRegVars) && varDsc->lvPromoted)
{
Expand All @@ -6776,10 +6776,10 @@ bool Lowering::CheckMultiRegLclVar(GenTreeLclVar* lclNode, const ReturnTypeDesc*
}
}
#ifdef TARGET_XARCH
// For local stores on XARCH we only handle mismatched src/dest register count for
// calls of SIMD type. If the source was another lclVar similarly promoted, we would
// For local stores on XARCH we can't handle another lclVar source.
// If the source was another lclVar similarly promoted, we would
// have broken it into multiple stores.
if (lclNode->OperIs(GT_STORE_LCL_VAR) && !lclNode->gtGetOp1()->OperIs(GT_CALL))
if (lclNode->OperIs(GT_STORE_LCL_VAR) && lclNode->gtGetOp1()->OperIs(GT_LCL_VAR))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what this is doing, but not doing this will cause crossgen2 failing for Utf8Formatter.TryFormat(DateTime).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you try to find the root cause of this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't have time for this. I brought it from another PR mentioned earlier and it just works. It's better to ask the original author.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CarolEidt can you help explaining this? Not doing this isn't causing failures now, but the code looks necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@huoyaoyuan - Carol doesn't work on this code base anymore. As the comment suggests, " If the source was another lclVar similarly promoted, we would have broken it into multiple stores.". So, I think you should revert this change, given that it works now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that was just a means to try which nodes are caught if op1 != GT_CALL. So the reason it is failing (as expected) is because the tree now contains the op1 == GT_HWINTRINSIC.

                                                            ┌──▌  t10    int    
                                                            ├──▌  t13    int    
                                                            ├──▌  t58    int    
N007 (  8,  9) [000014] -----------                   t14 = ▌  HWINTRINSIC struct int DivRem $140
                                                            ┌──▌  t14    struct 
N009 ( 18, 16) [000017] MA---------                         ▌  STORE_LCL_VAR struct<System.ValueTuple`2[System.Int32, System.Int32], 8>(P) V05 tmp3         
                                                            ▌    int    V05.Item1 (offs=0x00) -> V13 tmp11        
                                                            ▌    int    V05.Item2 (offs=0x04) -> V14 tmp12   

We do want to enregister in such cases and not doing it (retaining op1 != GT_CALL) spills the result on stack.

https://www.diffchecker.com/TUY5sGjV

Copy link
Contributor

@SingleAccretion SingleAccretion Sep 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this the same code on xarch?

Yes and no. My suggestion includes the (significant) addition that we will DNER all locals, not just promoted ones. It is to prevent the case I described above from arising: struct retyped as long = multi-reg-node and other assigns codegen doesn't support. Alternatively, we could of course support them in codegen, but that seems out of scope for this change.

Also, for the below code, do you mean to say use the current check of independent promotion too?

Yes, the independent promotion check (and the field count check as well) must stay. In fact, I think they should be expanded. Consider struct { promoted field<double> } = HWI(int, int) and other mismatched cases. Right now they would pass, but we need to DNER them, by passing the register count to the method instead of the return type descriptor.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's now new failures and I don't have enough time to investigate in more depth.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will take a look hopefully next week.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the code as per @SingleAccretion suggestion.

{
canEnregister = false;
}
Expand Down
24 changes: 24 additions & 0 deletions src/coreclr/jit/lowerxarch.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -384,6 +384,16 @@ void Lowering::LowerBlockStore(GenTreeBlk* blkNode)

ContainBlockStoreAddress(blkNode, size, dstAddr);
}
else if (src->OperIsHWIntrinsic())
{
assert(!blkNode->AsObj()->GetLayout()->HasGCPtr());
if (blkNode->OperIs(GT_STORE_OBJ))
{
blkNode->SetOper(GT_STORE_BLK);
}
blkNode->gtBlkOpKind = GenTreeBlk::BlkOpKindUnroll;
ContainBlockStoreAddress(blkNode, size, dstAddr);
}
huoyaoyuan marked this conversation as resolved.
Show resolved Hide resolved
else
{
assert(blkNode->OperIs(GT_STORE_BLK, GT_STORE_DYN_BLK));
Expand Down Expand Up @@ -6478,6 +6488,20 @@ void Lowering::ContainCheckHWIntrinsic(GenTreeHWIntrinsic* node)
}
break;
}
case NI_X86Base_DivRem:
case NI_X86Base_X64_DivRem:
{
// DIV only allows divisor (op3) in memory
if (TryGetContainableHWIntrinsicOp(node, &op3, &supportsRegOptional))
{
MakeSrcContained(node, op3);
tannergooding marked this conversation as resolved.
Show resolved Hide resolved
}
else if (supportsRegOptional)
{
op3->SetRegOptional();
kunalspathak marked this conversation as resolved.
Show resolved Hide resolved
}
break;
}

default:
{
Expand Down
41 changes: 39 additions & 2 deletions src/coreclr/jit/lsraxarch.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2015,7 +2015,23 @@ int LinearScan::BuildHWIntrinsic(GenTreeHWIntrinsic* intrinsicTree, int* pDstCou
}

int srcCount = 0;
int dstCount = intrinsicTree->IsValue() ? 1 : 0;
int dstCount;

if (intrinsicTree->IsValue())
{
if (HWIntrinsicInfo::IsMultiReg(intrinsicId))
{
dstCount = HWIntrinsicInfo::GetMultiRegCount(intrinsicId);
}
else
{
dstCount = 1;
}
}
else
{
dstCount = 0;
}

regMaskTP dstCandidates = RBM_NONE;

Expand Down Expand Up @@ -2193,6 +2209,26 @@ int LinearScan::BuildHWIntrinsic(GenTreeHWIntrinsic* intrinsicTree, int* pDstCou
}
#endif // TARGET_X86

case NI_X86Base_DivRem:
case NI_X86Base_X64_DivRem:
{
assert(numArgs == 3);
assert(dstCount == 2);
assert(isRMW);

// DIV implicitly put op1(lower) to EAX and op2(upper) to EDX
srcCount += BuildOperandUses(op1, RBM_EAX);
srcCount += BuildOperandUses(op2, RBM_EDX);
srcCount += op3->isContained() ? BuildOperandUses(op3) : BuildDelayFreeUses(op3, op1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this going to work as expected?

Normally the second operand to BuildDelayFreeUses is the rmw node. However, in this case we have two RMW nodes (op1 and op2) and so we should likely need to indicate that op3 is delay free with regards to both op1 and op2.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it remains delayRegFree for the entire treeNode. We reset pendingDelayFree when processing the next treeNode.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, @kunalspathak, there is special logic that looks at node (op3) compared to rmwNode (op1)

and a note specifically around:

// Note: we don't handle multi-reg vars here. It's not clear that there are any cases
// where we'd encounter a multi-reg var in an RMW context.
assert(!rmwNode->AsLclVar()->IsMultiReg());

So my guess is we either don't have sufficient tests covering this case or we aren't correctly tracking the return value as multi-reg. There is also has the later logic that says if (node == rmwNode) then it might not set the delay free.

So, I think we need another overload of BuildDelayFreeUses that takes both rmwNodes and does the right general checks.


// result put in EAX and EDX
BuildDef(intrinsicTree, RBM_EAX, 0);
BuildDef(intrinsicTree, RBM_EDX, 1);

buildUses = false;
break;
}

case NI_BMI2_MultiplyNoFlags:
case NI_BMI2_X64_MultiplyNoFlags:
{
Expand Down Expand Up @@ -2463,7 +2499,8 @@ int LinearScan::BuildHWIntrinsic(GenTreeHWIntrinsic* intrinsicTree, int* pDstCou
}
else
{
assert(dstCount == 0);
// Currently dstCount = 2 is only used for DivRem, which has special constriants and handled above
assert((dstCount == 0) || (dstCount == 2));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably worth including (dstCout == 2) && (intrinsicId == DivRem)?

}

*pDstCount = dstCount;
Expand Down
Loading