-
Notifications
You must be signed in to change notification settings - Fork 729
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support Java Behaviour w.r.t Math.max and Math.min for Floating Points #20185
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1367,6 +1367,11 @@ bool J9::RecognizedCallTransformer::isInlineable(TR::TreeTop* treetop) | |
case TR::java_lang_Math_max_L: | ||
case TR::java_lang_Math_min_L: | ||
return !comp()->getOption(TR_DisableMaxMinOptimization); | ||
case TR::java_lang_Math_max_F: | ||
case TR::java_lang_Math_min_F: | ||
case TR::java_lang_Math_max_D: | ||
case TR::java_lang_Math_min_D: | ||
return !comp()->getOption(TR_DisableMaxMinOptimization) && cg()->getSupportsInlineMath_MaxMin_FD(); | ||
case TR::java_lang_Math_multiplyHigh: | ||
return cg()->getSupportsLMulHigh(); | ||
case TR::java_lang_StringUTF16_toBytes: | ||
|
@@ -1495,6 +1500,18 @@ void J9::RecognizedCallTransformer::transform(TR::TreeTop* treetop) | |
case TR::java_lang_Math_min_L: | ||
processIntrinsicFunction(treetop, node, TR::lmin); | ||
break; | ||
case TR::java_lang_Math_max_F: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does other platform supports There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. it should only transform it when the option Other platforms define their own There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
My comment was more in relationship with floating point max and min operation. All platforms (Including Z) had such floating point max and min evaluators but they were not recognizing floating point max and min operation, as those methods prior to this change was not checked in recognized call transformation. With this change, we would recognized java method, and then will convert the call to fmax/dmax/fmin/dmin node on all platform. I do not know if the code we generate on other platform would be OK with Java specification or not. This is something we should verify. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. it looks like the current behaviour for Semeru on x86 and Power is the same as Z (the normal Java spec). testing the inlined versions now to see it they are different |
||
processIntrinsicFunction(treetop, node, TR::fmax); | ||
break; | ||
case TR::java_lang_Math_min_F: | ||
processIntrinsicFunction(treetop, node, TR::fmin); | ||
break; | ||
case TR::java_lang_Math_max_D: | ||
processIntrinsicFunction(treetop, node, TR::dmax); | ||
break; | ||
case TR::java_lang_Math_min_D: | ||
processIntrinsicFunction(treetop, node, TR::dmin); | ||
break; | ||
case TR::java_lang_Math_multiplyHigh: | ||
processIntrinsicFunction(treetop, node, TR::lmulh); | ||
break; | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -906,76 +906,48 @@ allocateWriteBarrierInternalPointerRegister(TR::CodeGenerator * cg, TR::Node * s | |
} | ||
|
||
|
||
extern TR::Register * | ||
doubleMaxMinHelper(TR::Node *node, TR::CodeGenerator *cg, bool isMaxOp) | ||
TR::Register * | ||
J9::Z::TreeEvaluator::dmaxEvaluator(TR::Node * node, TR::CodeGenerator * cg) | ||
{ | ||
TR_ASSERT(node->getNumChildren() >= 1 || node->getNumChildren() <= 2, "node has incorrect number of children"); | ||
|
||
/* ===================== Allocating Registers ===================== */ | ||
|
||
TR::Register * v16 = cg->allocateRegister(TR_VRF); | ||
TR::Register * v17 = cg->allocateRegister(TR_VRF); | ||
TR::Register * v18 = cg->allocateRegister(TR_VRF); | ||
|
||
/* ===================== Generating instructions ===================== */ | ||
|
||
/* ====== LD FPR0,16(GPR5) Load a ====== */ | ||
TR::Register * v0 = cg->fprClobberEvaluate(node->getFirstChild()); | ||
|
||
/* ====== LD FPR2, 0(GPR5) Load b ====== */ | ||
TR::Register * v2 = cg->evaluate(node->getSecondChild()); | ||
|
||
/* ====== WFTCIDB V16,V0,X'F' a == NaN ====== */ | ||
generateVRIeInstruction(cg, TR::InstOpCode::VFTCI, node, v16, v0, 0xF, 8, 3); | ||
|
||
/* ====== For Max: WFCHE V17,V0,V2 Compare a >= b ====== */ | ||
if(isMaxOp) | ||
if (cg->getSupportsVectorRegisters()) | ||
{ | ||
generateVRRcInstruction(cg, TR::InstOpCode::VFCH, node, v17, v0, v2, 0, 8, 3); | ||
cg->generateDebugCounter("z13/simd/doubleMax", 1, TR::DebugCounter::Free); | ||
return OMR::Z::TreeEvaluator::fpMinMaxVectorHelper(node, cg); | ||
} | ||
/* ====== For Min: WFCHE V17,V0,V2 Compare a <= b ====== */ | ||
else | ||
return OMR::Z::TreeEvaluator::xmaxxminHelper(node, cg); | ||
} | ||
|
||
TR::Register * | ||
J9::Z::TreeEvaluator::dminEvaluator(TR::Node * node, TR::CodeGenerator * cg) | ||
{ | ||
if (cg->getSupportsVectorRegisters()) | ||
{ | ||
generateVRRcInstruction(cg, TR::InstOpCode::VFCH, node, v17, v2, v0, 0, 8, 3); | ||
cg->generateDebugCounter("z13/simd/doubleMin", 1, TR::DebugCounter::Free); | ||
return OMR::Z::TreeEvaluator::fpMinMaxVectorHelper(node, cg); | ||
} | ||
return OMR::Z::TreeEvaluator::xmaxxminHelper(node, cg); | ||
} | ||
|
||
/* ====== VO V16,V16,V17 (a >= b) || (a == NaN) ====== */ | ||
generateVRRcInstruction(cg, TR::InstOpCode::VO, node, v16, v16, v17, 0, 0, 0); | ||
|
||
/* ====== For Max: WFTCIDB V17,V0,X'800' a == +0 ====== */ | ||
if(isMaxOp) | ||
{ | ||
generateVRIeInstruction(cg, TR::InstOpCode::VFTCI, node, v17, v0, 0x800, 8, 3); | ||
} | ||
/* ====== For Min: WFTCIDB V17,V0,X'400' a == -0 ====== */ | ||
else | ||
{ | ||
generateVRIeInstruction(cg, TR::InstOpCode::VFTCI, node, v17, v0, 0x400, 8, 3); | ||
} | ||
/* ====== WFTCIDB V18,V2,X'C00' b == 0 ====== */ | ||
generateVRIeInstruction(cg, TR::InstOpCode::VFTCI, node, v18, v2, 0xC00, 8, 3); | ||
|
||
/* ====== VN V17,V17,V18 (a == -0) && (b == 0) ====== */ | ||
generateVRRcInstruction(cg, TR::InstOpCode::VN, node, v17, v17, v18, 0, 0, 0); | ||
|
||
/* ====== VO V16,V16,V17 (a >= b) || (a == NaN) || ((a == -0) && (b == 0)) ====== */ | ||
generateVRRcInstruction(cg, TR::InstOpCode::VO, node, v16, v16, v17, 0, 0, 0); | ||
|
||
/* ====== VSEL V0,V0,V2,V16 ====== */ | ||
generateVRReInstruction(cg, TR::InstOpCode::VSEL, node, v0, v0, v2, v16); | ||
|
||
/* ===================== Deallocating Registers ===================== */ | ||
cg->stopUsingRegister(v2); | ||
cg->stopUsingRegister(v16); | ||
cg->stopUsingRegister(v17); | ||
cg->stopUsingRegister(v18); | ||
|
||
node->setRegister(v0); | ||
|
||
cg->decReferenceCount(node->getFirstChild()); | ||
cg->decReferenceCount(node->getSecondChild()); | ||
TR::Register * | ||
J9::Z::TreeEvaluator::fmaxEvaluator(TR::Node * node, TR::CodeGenerator * cg) | ||
{ | ||
if (cg->getSupportsVectorRegisters()) | ||
{ | ||
cg->generateDebugCounter("z13/simd/floatMax", 1, TR::DebugCounter::Free); | ||
return OMR::Z::TreeEvaluator::fpMinMaxVectorHelper(node, cg); | ||
} | ||
return OMR::Z::TreeEvaluator::xmaxxminHelper(node, cg); | ||
} | ||
|
||
return node->getRegister(); | ||
TR::Register * | ||
J9::Z::TreeEvaluator::fminEvaluator(TR::Node * node, TR::CodeGenerator * cg) | ||
{ | ||
if (cg->getSupportsVectorRegisters()) | ||
{ | ||
cg->generateDebugCounter("z13/simd/floatMin", 1, TR::DebugCounter::Free); | ||
return OMR::Z::TreeEvaluator::fpMinMaxVectorHelper(node, cg); | ||
} | ||
return OMR::Z::TreeEvaluator::xmaxxminHelper(node, cg); | ||
} | ||
|
||
TR::Register* | ||
|
@@ -2945,19 +2917,7 @@ J9::Z::TreeEvaluator::toLowerIntrinsic(TR::Node *node, TR::CodeGenerator *cg, bo | |
return caseConversionHelper(node, cg, false, isCompressedString); | ||
} | ||
|
||
TR::Register* | ||
J9::Z::TreeEvaluator::inlineDoubleMax(TR::Node *node, TR::CodeGenerator *cg) | ||
{ | ||
cg->generateDebugCounter("z13/simd/doubleMax", 1, TR::DebugCounter::Free); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Original exploitation of SIMD was generating a debug counter - Your refactoring has removed it I think. |
||
return doubleMaxMinHelper(node, cg, true); | ||
} | ||
|
||
TR::Register* | ||
J9::Z::TreeEvaluator::inlineDoubleMin(TR::Node *node, TR::CodeGenerator *cg) | ||
{ | ||
cg->generateDebugCounter("z13/simd/doubleMin", 1, TR::DebugCounter::Free); | ||
return doubleMaxMinHelper(node, cg, false); | ||
} | ||
|
||
TR::Register * | ||
J9::Z::TreeEvaluator::inlineMathFma(TR::Node *node, TR::CodeGenerator *cg) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@r30shah I confirmed that with this change the min/max calls will only be replaced on Z (so no crash on x86), but that means that other platforms that do define the fmin/fmax/dmin/dmax opcodes will still not use them.
Is this a separate issue, or should the change for the other platforms go in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am assuming, you tried your test to see if it handles all the case on x86 platform, can you share what was your observation? i.e, which test case failed with inlining floating point min and max on x86 or any other platform you saw this failing ?
I think we can create separate issue for that, but like to know specifics first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get this without the change
same for the other opcodes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I am ok with opening up separate issue for X not supporting max and min, FYI @hzongaro.
@matthewhall2 regarding to other platform, does your unit test pass (On P and aarch64) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It passed on P.
However looking at aarch64, it appears that the quiet bit may be automatically set by the cpu. It failed some (but not all) of the tests with
-Xjit:count=1
(and count=0) set (aarch64 test). Looking into itThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@matthewhall2 @r30shah i have this item in P's backlog: https://github.ibm.com/runtimes/openj9-jit-power/issues/416
will this PR end up with code of comparable performance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes in this PR enables recognition of Math.max and min for floating point types with transformation to
fmax/min, dmax/min
. It hasn't made any changes to code-gen evaluator and looking at the P code for max/min [1], and if I have understood correctly, the limitation you have noted in the issue from previous comment with +0/-0 and NaNs should still be there.@matthewhall2 would it be possible to share what code P generates with this change, as I am wondering, how does P passes your unit tests.
[1]. https://github.com/eclipse/omr/blob/870b1c9be57147290e8ced68c40a9096290c753d/compiler/p/codegen/ControlFlowEvaluator.cpp#L4561
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, that code is not active currently (re min/max_FD), due to client's complaint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@r30shah looks like my test didn't inline. It fails on P with inlining enabled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, We should add P as well to the issue for fixing and enabling min/maxFD on other platform as recommended in #20185 (comment) So please open an issue for that.
The change in its current state (after you addressed concern and added the check for Code-Gen Flag, It should only be inlining it on Z. So we should be good.