-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Added SVE GetFfr
, SetFfr
, LoadVectorFirstFaulting
, GatherVectorFirstFaulting
#104502
Conversation
Note regarding the
|
1 similar comment
Note regarding the
|
Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics |
src/libraries/Common/tests/TestUtilities/System/Buffers/BoundedMemory.Windows.cs
Show resolved
Hide resolved
/// svbool_t svrdffr() | ||
/// RDFFR Presult.B | ||
/// </summary> | ||
public static unsafe Vector<byte> GetFfr() => GetFfr(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tannergooding @terrajobst I think the APIs for GetFfr
need another look. We can't provide overloads by return type, so if we were to expose a way to do that, we need to use suffix versions, i.e. GetFfrByte
, GetFfrUInt16
, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What you've done is actually how API review discussed and approved it, it just didn't get captured in the approval notes.
We have a standard convention of suffixing hwintrinsic APIs with the .NET type name where we need to disambiguate, such as ConvertToSingle
or LoadVector128
. This will always be a Vector<T>
return and so we only needed to encode the T
, such as Byte
, Int16
, Int32
, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, thinking about it, we do not have individual type variants in C++ APIs. wondering why we have them at all? https://developer.arm.com/architectures/instruction-sets/intrinsics/#q=svrdffr. There should be just 1 API for this IMO - Vector<byte> GetFfr()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wondering why we have them at all?
C/C++ doesn't use templates and has an explicit mask register type (svbool_t
). By default you'd presume that svbool_t
is byte
, but that isn't always the case and how that svbool_t
is interpreted depends on the consuming operation. For example CNTP
which finds the number of active elements set reads every nth
bit:
if !HaveSVE() && !HaveSME() then UNDEFINED;
constant integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Pn);
integer d = UInt(Rd);
CheckSVEEnabled();
constant integer VL = CurrentVL;
constant integer PL = VL DIV 8;
constant integer elements = VL DIV esize;
bits(PL) mask = P[g, PL];
bits(PL) operand = P[n, PL];
bits(64) sum = Zeros(64);
for e = 0 to elements-1
if ActivePredicateElement(mask, e, esize) && ActivePredicateElement(operand, e, esize) then
sum = sum + 1;
X[d, 64] = sum;
boolean ActivePredicateElement(bits(N) pred, integer e, integer esize)
assert esize IN {8, 16, 32, 64, 128};
integer n = e * (esize DIV 8);
assert n >= 0 && n < N;
return pred<n> == '1';
And so, a svbool_t
for int
and for byte
are both the same size, which is 1/8th
the size of Vector<T>
(thus for a 2048-bit Vector, we have 256-bit VectorMask<T>
). Where VectorMask<byte>
means check every bit and VectorMask<int>
means check every 4th
bit.
Since .NET uses generics and doesn't have an explicit mask type, we want to explicitly expose the underlying T
so that devs can take the mask produced by GetFfr
and pass it down to other APIs without casting, which is the same thing they could do in C/C++.
@dotnet/arm64-contrib @kunalspathak @tannergooding this is ready. stress test results:
|
Ok, this isn't quite as ready as I would like. I had my test wrong for GatherVector. |
@dotnet/arm64-contrib @kunalspathak @tannergooding Now this is ready. :) stress results:
|
@@ -105,6 +105,7 @@ HARDWARE_INTRINSIC(Sve, GatherPrefetch64Bit, | |||
HARDWARE_INTRINSIC(Sve, GatherPrefetch8Bit, -1, -1, false, {INS_sve_prfb, INS_sve_prfb, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_Special, HW_Flag_Scalable|HW_Flag_BaseTypeFromFirstArg|HW_Flag_SpecialCodeGen|HW_Flag_ExplicitMaskedOperation|HW_Flag_LowMaskedOperation|HW_Flag_HasImmediateOperand|HW_Flag_HasEnumOperand) | |||
HARDWARE_INTRINSIC(Sve, GatherVector, -1, -1, false, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_sve_ld1w, INS_sve_ld1w, INS_sve_ld1d, INS_sve_ld1d, INS_sve_ld1w, INS_sve_ld1d}, HW_Category_MemoryLoad, HW_Flag_Scalable|HW_Flag_SpecialCodeGen|HW_Flag_ExplicitMaskedOperation|HW_Flag_LowMaskedOperation) | |||
HARDWARE_INTRINSIC(Sve, GatherVectorByteZeroExtend, -1, -1, false, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_sve_ld1b, INS_sve_ld1b, INS_sve_ld1b, INS_sve_ld1b, INS_invalid, INS_invalid}, HW_Category_MemoryLoad, HW_Flag_Scalable|HW_Flag_SpecialCodeGen|HW_Flag_ExplicitMaskedOperation|HW_Flag_LowMaskedOperation) | |||
HARDWARE_INTRINSIC(Sve, GatherVectorFirstFaulting, -1, 3, true, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_sve_ldff1w, INS_sve_ldff1w, INS_sve_ldff1d, INS_sve_ldff1d, INS_sve_ldff1w, INS_sve_ldff1d}, HW_Category_MemoryLoad, HW_Flag_Scalable|HW_Flag_SpecialCodeGen|HW_Flag_ExplicitMaskedOperation|HW_Flag_LowMaskedOperation|HW_Flag_SpecialSideEffectMask) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does this have SpecialEffects
flag? Having memoryload should be good enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It causes a side-effect. If I do not have that flag on there, this API can get dead-code eliminated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is different for this API from other API? Do you have an example where this gets dead-code eliminated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, it's simple to write something that will be dead-code eliminated:
void SomeMethod()
{
var notUsingResult = GatherVectorFirstFaulting*; // or LoadVectorFirstFaulting
}
@@ -136,6 +145,7 @@ HARDWARE_INTRINSIC(Sve, LoadVectorByteZeroExtendToInt64, | |||
HARDWARE_INTRINSIC(Sve, LoadVectorByteZeroExtendToUInt16, -1, 2, false, {INS_invalid, INS_invalid, INS_invalid, INS_sve_ld1b, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_MemoryLoad, HW_Flag_Scalable|HW_Flag_ExplicitMaskedOperation|HW_Flag_LowMaskedOperation) | |||
HARDWARE_INTRINSIC(Sve, LoadVectorByteZeroExtendToUInt32, -1, 2, false, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_sve_ld1b, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_MemoryLoad, HW_Flag_Scalable|HW_Flag_ExplicitMaskedOperation|HW_Flag_LowMaskedOperation) | |||
HARDWARE_INTRINSIC(Sve, LoadVectorByteZeroExtendToUInt64, -1, 2, false, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_sve_ld1b, INS_invalid, INS_invalid}, HW_Category_MemoryLoad, HW_Flag_Scalable|HW_Flag_ExplicitMaskedOperation|HW_Flag_LowMaskedOperation) | |||
HARDWARE_INTRINSIC(Sve, LoadVectorFirstFaulting, -1, 2, true, {INS_sve_ldff1b, INS_sve_ldff1b, INS_sve_ldff1h, INS_sve_ldff1h, INS_sve_ldff1w, INS_sve_ldff1w, INS_sve_ldff1d, INS_sve_ldff1d, INS_sve_ldff1w, INS_sve_ldff1d}, HW_Category_MemoryLoad, HW_Flag_Scalable|HW_Flag_ExplicitMaskedOperation|HW_Flag_LowMaskedOperation|HW_Flag_SpecialCodeGen|HW_Flag_SpecialSideEffectMask) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here. why do we need SpecialEffect
flag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because this API causes a side-effect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, so the MemoryLoad is going to force it to be a global ref, so it should take care of it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In liveness, there is this logic:
#ifdef FEATURE_HW_INTRINSICS
case GT_HWINTRINSIC:
{
GenTreeHWIntrinsic* hwintrinsic = node->AsHWIntrinsic();
NamedIntrinsic intrinsicId = hwintrinsic->GetHWIntrinsicId();
if (hwintrinsic->OperIsMemoryStore())
{
// Never remove these nodes, as they are always side-effecting.
break;
}
else if (HWIntrinsicInfo::HasSpecialSideEffect(intrinsicId))
{
// Never remove these nodes, as they are always side-effecting
// or have a behavioral semantic that is undesirable to remove
break;
}
fgTryRemoveNonLocal(node, &blockRange);
break;
}
#endif // FEATURE_HW_INTRINSICS
So, this is what I believe I encountered without the flag. The nodes would get removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The handling of GatherVectorFirstFaulting
should be exactly similar to GatherVectorByteZeroExtend
or any other GatherVector*
APIs in terms of liveness, etc. Can you double check why other APIs are not removed while this one is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FirstFaulting is side-effectful because it sets the FFR register. The non-FirstFaulting APIs do not do this, therefore, we don't care if they get dead-code eliminated.
src/tests/JIT/HardwareIntrinsics/Arm/Shared/SveGatherVectorFirstFaultingIndices.template
Show resolved
Hide resolved
|
||
public void RunStructLclFldScenario() | ||
{ | ||
TestLibrary.TestFramework.BeginScenario(nameof(RunStructLclFldScenario)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is different in this template that other templates do not have?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to have a new validation method for first-faulting and I also need to initialize the inBounded
.
If we really want to start re-using stuff from other templates without all the duplication, we shouldn't continue to write tests this way. We should try to extract more and more stuff out into helpers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Knowing the original template that was referenced for creating this one helps me in diffing to spot what was the new addition that was added in the new template.
break; | ||
} | ||
} | ||
var succeeded = Helpers.CheckGatherVectorBehavior<{RetBaseType}, {ExtendedElementType}, {Op3BaseType}>(firstOp, secondOp, thirdOp, result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was there anything we were missing for this test, or is the change just trying to reuse the new logic that is written in Helpers.cs
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-using the logic. If we can extract out how we validate/check the APIs in the helpers, we should.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Please run all the tests (with stress) that uses this template and confirm that they all pass.
I don't disagree, the user can always convert it to the |
@TIHan - I have pushed a change to handle the intrinsics correctly with FFR register handling. Please validate the tests with these changes, specifically around |
return TFault.One; | ||
} | ||
|
||
var index = int.CreateChecked(indices[i]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how does this validation work? I was hoping that we check if the address that we want to read for ith
lane is invalid and then we would say hasFaulted
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is not checking the bases
which contain the addresses, the CheckGatherVectorBasesFirstFaultingBehavior
is the one that looks at the address.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you point me to the line of code that does it?
return (maskResult != T.Zero) ? result : falseResult; | ||
} | ||
|
||
private static T ConditionalSelectTrueResult<T>(T maskResult, T result, T trueResult) where T : INumberBase<T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its ok to add this in a follow-up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a follow-up PR would be fine. We should do this for all test cases that use ConditionalSelectTrueResult.
|
||
Vector<{VectorBaseType}> loadMask = Sve.CreateTrueMask{VectorBaseType}(SveMaskPattern.All); | ||
|
||
Sve.SetFfr( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return TFault.One; | ||
} | ||
|
||
var index = int.CreateChecked(indices[i]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is not checking the bases
which contain the addresses, the CheckGatherVectorBasesFirstFaultingBehavior
is the one that looks at the address.
return (maskResult != T.Zero) ? result : falseResult; | ||
} | ||
|
||
private static T ConditionalSelectTrueResult<T>(T maskResult, T result, T trueResult) where T : INumberBase<T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a follow-up PR would be fine. We should do this for all test cases that use ConditionalSelectTrueResult.
Yes, for the FirstFaulting and probably
I actually went ahead and did the validation. Spotted few issues that I pushed on this PR. All tests are passing - https://gist.github.com/kunalspathak/ad873d6b11f8e9959c3ecdf1df992e85 |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks!
/ba-g Build Analysis seems to be stuck |
This was merged with number of failures that are clearly caused by the PR. For example:
in number of libraries legs. I am reverting the change. |
I see this is related to the BoundedMemory Unix changes. I think we should just have a separate impl for the FirstFaulting tests to reduce risk of other tests failing. |
Contributes to #99957
Adds