[LV][EVL] Support call instruction with EVL-vectorization #110412

Open · wants to merge 11 commits into base: main

Conversation

LiqinWeng (Contributor)

No description provided.

@llvmbot (Member)

llvmbot commented Sep 29, 2024

@llvm/pr-subscribers-backend-risc-v
@llvm/pr-subscribers-llvm-analysis
@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-llvm-transforms

Author: LiqinWeng (LiqinWeng)

Changes

Patch is 32.18 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/110412.diff

13 Files Affected:

  • (modified) llvm/include/llvm/Analysis/VectorUtils.h (+6)
  • (modified) llvm/include/llvm/IR/VectorBuilder.h (+2-2)
  • (modified) llvm/lib/Analysis/VectorUtils.cpp (+7)
  • (modified) llvm/lib/IR/VectorBuilder.cpp (+4-5)
  • (modified) llvm/lib/Transforms/Utils/LoopUtils.cpp (+2-2)
  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+1-1)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+96-4)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+81)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+11-3)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.h (+2-1)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanValue.h (+1)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp (+3)
  • (added) llvm/test/Transforms/LoopVectorize/RISCV/vplan-vp-call-intrinsics.ll (+223)
diff --git a/llvm/include/llvm/Analysis/VectorUtils.h b/llvm/include/llvm/Analysis/VectorUtils.h
index e2dd4976f39065..4c636a073a2c0a 100644
--- a/llvm/include/llvm/Analysis/VectorUtils.h
+++ b/llvm/include/llvm/Analysis/VectorUtils.h
@@ -160,6 +160,12 @@ bool isVectorIntrinsicWithOverloadTypeAtArg(Intrinsic::ID ID, int OpdIdx);
 Intrinsic::ID getVectorIntrinsicIDForCall(const CallInst *CI,
                                           const TargetLibraryInfo *TLI);
 
+/// Returns VP intrinsic ID for call.
+/// For the input call instruction it finds the matching intrinsic and
+/// returns its intrinsic ID; if none is found, it returns not_intrinsic.
+Intrinsic::ID getVPIntrinsicIDForCall(const CallInst *CI,
+                                      const TargetLibraryInfo *TLI);
+
 /// Given a vector and an element number, see if the scalar value is
 /// already around as a register, for example if it were inserted then extracted
 /// from the vector.
diff --git a/llvm/include/llvm/IR/VectorBuilder.h b/llvm/include/llvm/IR/VectorBuilder.h
index b0277c2b52595e..da1f48e7b6047e 100644
--- a/llvm/include/llvm/IR/VectorBuilder.h
+++ b/llvm/include/llvm/IR/VectorBuilder.h
@@ -99,11 +99,11 @@ class VectorBuilder {
                                  const Twine &Name = Twine());
 
   /// Emit a VP reduction intrinsic call for recurrence kind.
-  /// \param RdxID       The intrinsic ID of llvm.vector.reduce.*
+  /// \param ID           The intrinsic ID of Call Intrinsic
   /// \param ValTy       The type of operand which the reduction operation is
   ///                    performed.
   /// \param VecOpArray  The operand list.
-  Value *createSimpleReduction(Intrinsic::ID RdxID, Type *ValTy,
+  Value *createSimpleIntrinsic(Intrinsic::ID ID, Type *ValTy,
                                ArrayRef<Value *> VecOpArray,
                                const Twine &Name = Twine());
 };
diff --git a/llvm/lib/Analysis/VectorUtils.cpp b/llvm/lib/Analysis/VectorUtils.cpp
index dbffbb8a5f81d9..d827a83e5368e3 100644
--- a/llvm/lib/Analysis/VectorUtils.cpp
+++ b/llvm/lib/Analysis/VectorUtils.cpp
@@ -169,6 +169,13 @@ Intrinsic::ID llvm::getVectorIntrinsicIDForCall(const CallInst *CI,
   return Intrinsic::not_intrinsic;
 }
 
+Intrinsic::ID llvm::getVPIntrinsicIDForCall(const CallInst *CI,
+                                            const TargetLibraryInfo *TLI) {
+  Intrinsic::ID ID = getIntrinsicForCallSite(*CI, TLI);
+
+  return VPIntrinsic::getForIntrinsic(ID);
+}
+
 /// Given a vector and an element number, see if the scalar value is
 /// already around as a register, for example if it were inserted then extracted
 /// from the vector.
diff --git a/llvm/lib/IR/VectorBuilder.cpp b/llvm/lib/IR/VectorBuilder.cpp
index f42948ba89042f..84647741f11a1e 100644
--- a/llvm/lib/IR/VectorBuilder.cpp
+++ b/llvm/lib/IR/VectorBuilder.cpp
@@ -60,13 +60,12 @@ Value *VectorBuilder::createVectorInstruction(unsigned Opcode, Type *ReturnTy,
   return createVectorInstructionImpl(VPID, ReturnTy, InstOpArray, Name);
 }
 
-Value *VectorBuilder::createSimpleReduction(Intrinsic::ID RdxID,
-                                            Type *ValTy,
+Value *VectorBuilder::createSimpleIntrinsic(Intrinsic::ID ID, Type *ValTy,
                                             ArrayRef<Value *> InstOpArray,
                                             const Twine &Name) {
-  auto VPID = VPIntrinsic::getForIntrinsic(RdxID);
-  assert(VPReductionIntrinsic::isVPReduction(VPID) &&
-         "No VPIntrinsic for this reduction");
+  auto VPID = VPIntrinsic::getForIntrinsic(ID);
+  assert(VPIntrinsic::isVPIntrinsic(VPID) &&
+         "No VPIntrinsic for this Intrinsic");
   return createVectorInstructionImpl(VPID, ValTy, InstOpArray, Name);
 }
 
diff --git a/llvm/lib/Transforms/Utils/LoopUtils.cpp b/llvm/lib/Transforms/Utils/LoopUtils.cpp
index 9a4289e1a30da0..a09ba1dee51a8e 100644
--- a/llvm/lib/Transforms/Utils/LoopUtils.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUtils.cpp
@@ -1299,7 +1299,7 @@ Value *llvm::createSimpleReduction(VectorBuilder &VBuilder, Value *Src,
   Type *SrcEltTy = SrcTy->getElementType();
   Value *Iden = getRecurrenceIdentity(Kind, SrcEltTy, Desc.getFastMathFlags());
   Value *Ops[] = {Iden, Src};
-  return VBuilder.createSimpleReduction(Id, SrcTy, Ops);
+  return VBuilder.createSimpleIntrinsic(Id, SrcTy, Ops);
 }
 
 Value *llvm::createReduction(IRBuilderBase &B,
@@ -1342,7 +1342,7 @@ Value *llvm::createOrderedReduction(VectorBuilder &VBuilder,
   Intrinsic::ID Id = getReductionIntrinsicID(RecurKind::FAdd);
   auto *SrcTy = cast<VectorType>(Src->getType());
   Value *Ops[] = {Start, Src};
-  return VBuilder.createSimpleReduction(Id, SrcTy, Ops);
+  return VBuilder.createSimpleIntrinsic(Id, SrcTy, Ops);
 }
 
 void llvm::propagateIRFlags(Value *I, ArrayRef<Value *> VL, Value *OpValue,
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 17c7bc55c4b928..5caae0732c8b05 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -8635,7 +8635,7 @@ void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF,
       // TODO: try to put it close to addActiveLaneMask().
       // Discard the plan if it is not EVL-compatible
       if (CM.foldTailWithEVL() &&
-          !VPlanTransforms::tryAddExplicitVectorLength(*Plan))
+          !VPlanTransforms::tryAddExplicitVectorLength(*Plan, *TLI))
         break;
       assert(verifyVPlanIsValid(*Plan) && "VPlan is invalid");
       VPlans.push_back(std::move(Plan));
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index c4567362eaffc7..6b22c333b6c6a9 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -883,6 +883,7 @@ class VPSingleDefRecipe : public VPRecipeBase, public VPValue {
     case VPRecipeBase::VPScalarIVStepsSC:
     case VPRecipeBase::VPVectorPointerSC:
     case VPRecipeBase::VPWidenCallSC:
+    case VPRecipeBase::VPWidenCallEVLSC:
     case VPRecipeBase::VPWidenCanonicalIVSC:
     case VPRecipeBase::VPWidenCastSC:
     case VPRecipeBase::VPWidenGEPSC:
@@ -1610,6 +1611,7 @@ class VPScalarCastRecipe : public VPSingleDefRecipe {
 
 /// A recipe for widening Call instructions.
 class VPWidenCallRecipe : public VPSingleDefRecipe {
+public:
   /// ID of the vector intrinsic to call when widening the call. If set the
   /// Intrinsic::not_intrinsic, a library call will be used instead.
   Intrinsic::ID VectorIntrinsicID;
@@ -1619,26 +1621,48 @@ class VPWidenCallRecipe : public VPSingleDefRecipe {
   /// VF with a valid variant.
   Function *Variant;
 
-public:
+protected:
   template <typename IterT>
-  VPWidenCallRecipe(Value *UV, iterator_range<IterT> CallArguments,
+  VPWidenCallRecipe(unsigned VPDefOpcode, Value *UV,
+                    iterator_range<IterT> CallArguments,
                     Intrinsic::ID VectorIntrinsicID, DebugLoc DL = {},
                     Function *Variant = nullptr)
-      : VPSingleDefRecipe(VPDef::VPWidenCallSC, CallArguments, UV, DL),
+      : VPSingleDefRecipe(VPDefOpcode, CallArguments, UV, DL),
         VectorIntrinsicID(VectorIntrinsicID), Variant(Variant) {
     assert(
         isa<Function>(getOperand(getNumOperands() - 1)->getLiveInIRValue()) &&
         "last operand must be the called function");
   }
 
+public:
+  template <typename IterT>
+  VPWidenCallRecipe(Value *UV, iterator_range<IterT> CallArguments,
+                    Intrinsic::ID VectorIntrinsicID, DebugLoc DL)
+      : VPWidenCallRecipe(VPDef::VPWidenCallSC, UV, CallArguments,
+                          VectorIntrinsicID, DL) {}
+
+  template <typename IterT>
+  VPWidenCallRecipe(Value *UV, iterator_range<IterT> CallArguments,
+                    Intrinsic::ID VectorIntrinsicID, DebugLoc DL,
+                    Function *Variant)
+      : VPWidenCallRecipe(VPDef::VPWidenCallSC, UV, CallArguments,
+                          VectorIntrinsicID, DL, Variant) {}
+
   ~VPWidenCallRecipe() override = default;
 
   VPWidenCallRecipe *clone() override {
     return new VPWidenCallRecipe(getUnderlyingValue(), operands(),
                                  VectorIntrinsicID, getDebugLoc(), Variant);
   }
+  static inline bool classof(const VPRecipeBase *R) {
+    return R->getVPDefID() == VPRecipeBase::VPWidenCallSC ||
+           R->getVPDefID() == VPRecipeBase::VPWidenCallEVLSC;
+  }
 
-  VP_CLASSOF_IMPL(VPDef::VPWidenCallSC)
+  static inline bool classof(const VPUser *U) {
+    auto *R = dyn_cast<VPRecipeBase>(U);
+    return R && classof(R);
+  }
 
   /// Produce a widened version of the call instruction.
   void execute(VPTransformState &State) override;
@@ -1665,6 +1689,74 @@ class VPWidenCallRecipe : public VPSingleDefRecipe {
 #endif
 };
 
+/// A recipe for widening Call instructions  with vector-predication intrinsics
+/// with explicit vector length (EVL).
+class VPWidenCallEVLRecipe : public VPWidenCallRecipe {
+  // using VPRecipeWithIRFlags::transferFlags;
+  // Intrinsic::ID VectorIntrinsicID;
+
+public:
+  template <typename IterT>
+  VPWidenCallEVLRecipe(Value *UV, iterator_range<IterT> CallArguments,
+                       Intrinsic::ID VectorIntrinsicID, DebugLoc DL,
+                       VPValue &EVL)
+      : VPWidenCallRecipe(VPDef::VPWidenCallEVLSC, UV, CallArguments,
+                          VectorIntrinsicID, DL) {
+    addOperand(&EVL);
+  }
+
+  VPWidenCallEVLRecipe(VPWidenCallRecipe &W, Intrinsic::ID VectorIntrinsicID,
+                       DebugLoc DL, VPValue &EVL)
+      : VPWidenCallEVLRecipe(W.getUnderlyingValue(), W.operands(),
+                             VectorIntrinsicID, DL, EVL) {}
+
+  ~VPWidenCallEVLRecipe() override = default;
+
+  VPWidenCallEVLRecipe *clone() override {
+    llvm_unreachable("VPWidenCallEVLRecipe cannot be cloned");
+    return nullptr;
+  }
+
+  VPValue *getEVL() { return getOperand(getNumOperands() - 1); }
+  const VPValue *getEVL() const { return getOperand(getNumOperands() - 1); }
+
+  // Intrinsic::ID getVectorIntrinsicID() {
+  //   return VectorIntrinsicID;
+  // }
+
+  VP_CLASSOF_IMPL(VPDef::VPWidenCallEVLSC)
+
+  InstructionCost computeCost(ElementCount VF, VPCostContext &Ctx) const final;
+
+  Function *getCalledScalarFunction() const {
+    return cast<Function>(getOperand(getNumOperands() - 2)->getLiveInIRValue());
+  }
+
+  operand_range arg_operands() {
+    return make_range(op_begin(), op_begin() + getNumOperands() - 2);
+  }
+  const_operand_range arg_operands() const {
+    return make_range(op_begin(), op_begin() + getNumOperands() - 2);
+  }
+  /// Produce a widened version of the call instruction.
+  void execute(VPTransformState &State) final;
+
+  /// Returns true if the recipe only uses the first lane of operand \p Op.
+  bool onlyFirstLaneUsed(const VPValue *Op) const override {
+    assert(is_contained(operands(), Op) &&
+           "Op must be an operand of the recipe");
+    // EVL in that recipe is always the last operand, thus any use before means
+    // the VPValue should be vectorized.
+    return getEVL() == Op;
+  }
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+  /// Print the recipe.
+  void print(raw_ostream &O, const Twine &Indent,
+             VPSlotTracker &SlotTracker) const final;
+#endif
+};
+
 /// A recipe representing a sequence of load -> update -> store as part of
 /// a histogram operation. This means there may be aliasing between vector
 /// lanes, which is handled by the llvm.experimental.vector.histogram family
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 75908638532950..cd665a8aa546a2 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -76,6 +76,7 @@ bool VPRecipeBase::mayWriteToMemory() const {
     return cast<Instruction>(getVPSingleValue()->getUnderlyingValue())
         ->mayWriteToMemory();
   case VPWidenCallSC:
+    // case VPWidenCallEVLSC:
     return !cast<VPWidenCallRecipe>(this)
                 ->getCalledScalarFunction()
                 ->onlyReadsMemory();
@@ -117,6 +118,7 @@ bool VPRecipeBase::mayReadFromMemory() const {
     return cast<Instruction>(getVPSingleValue()->getUnderlyingValue())
         ->mayReadFromMemory();
   case VPWidenCallSC:
+    // case VPWidenCallEVLSC:
     return !cast<VPWidenCallRecipe>(this)
                 ->getCalledScalarFunction()
                 ->onlyWritesMemory();
@@ -158,6 +160,7 @@ bool VPRecipeBase::mayHaveSideEffects() const {
   case VPInstructionSC:
     return mayWriteToMemory();
   case VPWidenCallSC: {
+    // case VPWidenCallEVLSC: {
     Function *Fn = cast<VPWidenCallRecipe>(this)->getCalledScalarFunction();
     return mayWriteToMemory() || !Fn->doesNotThrow() || !Fn->willReturn();
   }
@@ -951,6 +954,52 @@ void VPWidenCallRecipe::execute(VPTransformState &State) {
   State.addMetadata(V, CI);
 }
 
+void VPWidenCallEVLRecipe::execute(VPTransformState &State) {
+  Function *CalledScalarFn = getCalledScalarFunction();
+  assert(!isDbgInfoIntrinsic(CalledScalarFn->getIntrinsicID()) &&
+         "DbgInfoIntrinsic should have been dropped during VPlan construction");
+  State.setDebugLocFrom(getDebugLoc());
+
+  bool UseIntrinsic = VectorIntrinsicID != Intrinsic::not_intrinsic;
+
+  // TODO: more intrinsics to support , Now only support the
+  // llvm.smax/llvm.smin/llvm.umax/llvm.umin
+  auto *TysForDecl = VectorType::get(
+      CalledScalarFn->getReturnType()->getScalarType(), State.VF);
+
+  SmallVector<Value *, 4> Args;
+  for (const auto &I : enumerate(arg_operands())) {
+    Value *Arg;
+    if (UseIntrinsic && isVectorIntrinsicWithScalarOpAtArg(
+                            CalledScalarFn->getIntrinsicID(), I.index()))
+      Arg = State.get(I.value(), VPLane(0));
+    else
+      Arg = State.get(I.value());
+    Args.push_back(Arg);
+  }
+
+  IRBuilderBase &BuilderIR = State.Builder;
+  VectorBuilder VBuilder(BuilderIR);
+  Value *Mask = BuilderIR.CreateVectorSplat(State.VF, BuilderIR.getTrue());
+  VBuilder.setMask(Mask).setEVL(State.get(getEVL(), /*NeedsScalar=*/true));
+
+  auto VPInst = VBuilder.createSimpleIntrinsic(VectorIntrinsicID, TysForDecl,
+                                               Args, "vp.call");
+  // FIXME: IR/Recipe/EVLRecipe has same the flags. Can copy from IR?
+  if (VPInst) {
+    if (auto *VecOp = dyn_cast<CallInst>(VPInst))
+      VecOp->copyIRFlags(getUnderlyingInstr());
+  }
+
+  auto *CI = cast_or_null<CallInst>(getUnderlyingInstr());
+  SmallVector<OperandBundleDef, 1> OpBundles;
+  if (CI)
+    CI->getOperandBundlesAsDefs(OpBundles);
+
+  State.set(this, VPInst);
+  State.addMetadata(VPInst, CI);
+}
+
 InstructionCost VPWidenCallRecipe::computeCost(ElementCount VF,
                                                VPCostContext &Ctx) const {
   TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
@@ -998,6 +1047,12 @@ InstructionCost VPWidenCallRecipe::computeCost(ElementCount VF,
   return Ctx.TTI.getIntrinsicInstrCost(CostAttrs, CostKind);
 }
 
+// TODO: Reimplement of the computeCost
+InstructionCost VPWidenCallEVLRecipe::computeCost(ElementCount VF,
+                                                  VPCostContext &Ctx) const {
+  return VPRecipeBase::computeCost(VF, Ctx);
+}
+
 #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
 void VPWidenCallRecipe::print(raw_ostream &O, const Twine &Indent,
                               VPSlotTracker &SlotTracker) const {
@@ -1115,6 +1170,32 @@ void VPHistogramRecipe::print(raw_ostream &O, const Twine &Indent,
   }
 }
 
+void VPWidenCallEVLRecipe::print(raw_ostream &O, const Twine &Indent,
+                                 VPSlotTracker &SlotTracker) const {
+  O << Indent << "WIDEN-CALL ";
+
+  Function *CalledFn = getCalledScalarFunction();
+  if (CalledFn->getReturnType()->isVoidTy())
+    O << "void ";
+  else {
+    printAsOperand(O, SlotTracker);
+    O << " = ";
+  }
+
+  O << "vp.call @" << CalledFn->getName() << "(";
+  interleaveComma(arg_operands(), O, [&O, &SlotTracker](VPValue *Op) {
+    Op->printAsOperand(O, SlotTracker);
+  });
+  O << ")";
+
+  if (VectorIntrinsicID)
+    O << " (using vector intrinsic)";
+  else {
+    O << " (using library function";
+    O << ")";
+  }
+}
+
 void VPWidenSelectRecipe::print(raw_ostream &O, const Twine &Indent,
                                 VPSlotTracker &SlotTracker) const {
   O << Indent << "WIDEN-SELECT ";
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index a878613c4ba483..afb0b7f740aa5a 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1350,7 +1350,8 @@ void VPlanTransforms::addActiveLaneMask(
 }
 
 /// Replace recipes with their EVL variants.
-static void transformRecipestoEVLRecipes(VPlan &Plan, VPValue &EVL) {
+static void transformRecipestoEVLRecipes(VPlan &Plan, VPValue &EVL,
+                                         const TargetLibraryInfo &TLI) {
   SmallVector<VPValue *> HeaderMasks = collectAllHeaderMasks(Plan);
   for (VPValue *HeaderMask : collectAllHeaderMasks(Plan)) {
     for (VPUser *U : collectUsersRecursively(HeaderMask)) {
@@ -1379,6 +1380,12 @@ static void transformRecipestoEVLRecipes(VPlan &Plan, VPValue &EVL) {
                   return nullptr;
                 return new VPWidenEVLRecipe(*W, EVL);
               })
+              .Case<VPWidenCallRecipe>([&](VPWidenCallRecipe *W) {
+                auto *CI = cast<CallInst>(W->getUnderlyingInstr());
+                Intrinsic::ID VPID = getVPIntrinsicIDForCall(CI, &TLI);
+                return new VPWidenCallEVLRecipe(*W, VPID, CI->getDebugLoc(),
+                                                EVL);
+              })
               .Case<VPReductionRecipe>([&](VPReductionRecipe *Red) {
                 VPValue *NewMask = GetNewMask(Red->getCondOp());
                 return new VPReductionEVLRecipe(*Red, EVL, NewMask);
@@ -1429,7 +1436,8 @@ static void transformRecipestoEVLRecipes(VPlan &Plan, VPValue &EVL) {
 /// %NextEVLIV = add IVSize (cast i32 %VPEVVL to IVSize), %EVLPhi
 /// ...
 ///
-bool VPlanTransforms::tryAddExplicitVectorLength(VPlan &Plan) {
+bool VPlanTransforms::tryAddExplicitVectorLength(VPlan &Plan,
+                                                 const TargetLibraryInfo &TLI) {
   VPBasicBlock *Header = Plan.getVectorLoopRegion()->getEntryBasicBlock();
   // The transform updates all users of inductions to work based on EVL, instead
   // of the VF directly. At the moment, widened inductions cannot be updated, so
@@ -1481,7 +1489,7 @@ bool VPlanTransforms::tryAddExplicitVectorLength(VPlan &Plan) {
   NextEVLIV->insertBefore(CanonicalIVIncrement);
   EVLPhi->addOperand(NextEVLIV);
 
-  transformRecipestoEVLRecipes(Plan, *VPEVL);
+  transformRecipestoEVLRecipes(Plan, *VPEVL, TLI);
 
   // Replace all uses of VPCanonicalIVPHIRecipe by
   // VPEVLBasedIVPHIRecipe except for the canonical IV increment.
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
index f4a17aec42b244..f424685ba206c6 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
@@ -108,7 +108,8 @@ struct VPlanTransforms {
   /// VPCanonicalIVPHIRecipe is only used to control the loop after
   /// this transformation.
   /// \returns true if the transformation succeeds, or false if it doesn't.
-  static bool tryAddExplicitVectorLength(VPlan &Plan);
+  static bool tryAddExplicitVectorLength(VPlan &Plan,
+                                         const TargetLibraryInfo &TLI);
 
   // For each Interleave Group in \p InterleaveGroups replace the Recipes
   // widening its memory instructions with a single VPInterleaveRecipe at its
diff --git a/llvm/lib/Transforms/Vectorize/VPlanValue.h b/llvm/lib/Transforms/Vectorize/VPlanValue.h
index 4c383244f96f1a..b78f6be90282f6 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanValue.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanValue.h
@@ -347,6 +347,7 @@ class VPDef {
     VPScalarIVStepsSC,
     VPVectorPointerSC,
     VPWidenCallSC,
+    VPWidenCallEVLSC,
     VPWidenCanonicalIVSC,
     VPWidenCastSC,
     VPWidenGEPSC,
diff --git a/llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp b/llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
index 99bc4c38a3c3cd..76f5df39d89438 1006...
[truncated]

Comment on lines 1401 to 1498
VPID, Ops, TypeInfo.inferScalarType(CInst), false,
false, false);
Member:

The flags must be read from the original recipe

Contributor Author:

It seems that the addition of this metadata still needs to be done through getUnderlyingValue. Do we really not need to pass getUnderlyingValue?
State.addMetadata(VPInst, dyn_cast_or_null<Instruction>(getUnderlyingValue()));

Contributor:

For now, it would need to look up the flags from the underlying call

Comment on lines 1045 to 1054
std::optional<Intrinsic::ID> ID =
VPIntrinsic::getFunctionalIntrinsicIDForVP(VectorIntrinsicID);
if (ID) {
FID = ID.value();
Member:

Why do you need this?

Contributor Author:

  1. After the VP instruction adds the EVL parameter through the VPlan transform, Arguments will be cleared and the query falls into BasicTTIImpl, which checks whether the VP intrinsic is a type-based intrinsic. All type-based intrinsics fall into getTypeBasedIntrinsicInstrCost.
  2. VP intrinsics should have the same cost as their non-vp counterparts.
  3. To ensure the accuracy of the CostAttrs parameters passed to TTI, such as the number of parameters, parameter types, ID, etc.

Contributor:

Is this something that needs to be fixed in the cost model?

Contributor Author:

I tried but failed. The main problems I found were:

  1. We have no way to obtain the VP Inst. Some VP intrinsics need the Inst information when calculating costs, such as vp_icmp/vp_fcmp.
    [screenshot]
  2. The size of Arguments is wrong; it will be cleared. I tried adding a mask (nullptr or one obtained from getBlockInMask), but I think the VP cost doesn't care about masks, so I didn't do it.
    [screenshot]
  3. I think VP intrinsics should have the same cost as their non-vp counterparts.
    I am not sure about some of these questions. If there are any problems, please point them out.

Contributor:

@LiqinWeng I am aware that getCmpSelectInstrCost requires a predicate to estimate its cost. However, getIntrinsicInstrCost currently lacks a mechanism to provide this information.
Besides these, are there other vp intrinsics that necessitate their non-vp counterpart for cost estimation?

@LiqinWeng (Contributor Author), Oct 30, 2024:

Mainly call intrinsics, e.g. llvm.smax/llvm.smin...

Contributor:

for 1), shouldn't it be possible to retrieve the predicate from the intrinsic opcode?

for 2), would it be sufficient to just not clear the arguments for VP intrinsics, if needed for cost computations?

Contributor Author:

for 1) Implemented in #108533
for 2) The root cause is that there is no way to get the EVL Value and mask Value at this stage, so the VP intrinsic cannot configure the correct arguments size in CostAttrs.
[screenshot]

Comment on lines 996 to 1010
if (VPIntrinsic::isVPIntrinsic(VectorIntrinsicID)) {
  // Use vector version of the vector predicate Intrinsic
  IRBuilderBase &BuilderIR = State.Builder;
  VectorBuilder VBuilder(BuilderIR);
  Value *Mask = BuilderIR.CreateVectorSplat(State.VF, BuilderIR.getTrue());
  VBuilder.setMask(Mask).setEVL(Args.back());
  // Remove the EVL from Args
  Args.pop_back();
  Value *VPInst = VBuilder.createSimpleIntrinsic(
      VectorIntrinsicID, TysForDecl[0], Args, "vp.call");
  if (!VPInst->getType()->isVoidTy())
    State.set(this, VPInst);
  State.addMetadata(VPInst,
                    dyn_cast_or_null<Instruction>(getUnderlyingValue()));
} else {
Member:

Do you really need this code?

@LiqinWeng (Contributor Author), Oct 29, 2024:

In fact, it is not necessary; I will fix it!

Comment on lines 999 to 1004
Value *Mask =
    State.Builder.CreateVectorSplat(State.VF, State.Builder.getTrue());
Value *EVL = Args.back();
Args.pop_back();
Args.push_back(Mask);
Args.push_back(EVL);
Contributor:

Why is this needed and for recipes != vp_select? Can't the mask be provided when the recipe is constructed, like for vp_select?

Contributor:

@fhahn The VPValue BlockInMask currently represents an all-one mask as a no-mask (nullptr). It seems we have two options: either model the all-one mask directly in the plan, which may require adding a new recipe, or convert the nullptr (no-mask) to an all-one mask specifically for VP intrinsics in VPWidenIntrinsic. Which approach would you recommend?

Contributor:

I don't think we need a recipe for all-true, it should be possible to use a live-in when creating the EVL recipe using something like Plan.getOrAddLiveIn(ConstantInt::getTrue(Ctx))?
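
A minimal sketch of that suggestion, assuming Plan, CI, VPID, Ops, and EVL are the values already in scope in the EVL transform (the recipe-construction code reviewed later in this conversation has the same shape):

// Materialize the all-true mask as a plan live-in instead of a new recipe,
// then append mask/EVL operands where the vp intrinsic expects them.
VPValue *AllTrueMask = Plan.getOrAddLiveIn(
    ConstantInt::getTrue(IntegerType::getInt1Ty(CI->getContext())));
if (VPIntrinsic::getMaskParamPos(VPID))
  Ops.push_back(AllTrueMask);
if (VPIntrinsic::getVectorLengthParamPos(VPID))
  Ops.push_back(&EVL);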

@LiqinWeng (Contributor Author), Oct 31, 2024:

Why is this needed and for recipes != vp_select? Can't the mask be provided when the recipe is constructed, like for vp_select?

vp_select doesn't have a mask operand

Contributor:

Why is this needed and for recipes != vp_select? Can't the mask be provided when the recipe is constructed, like for vp_select?

vp_select doesn't have a mask operand

Use VPIntrinsic::getMaskParamPos to check whether the vp intrinsic requires a mask.
Use VPIntrinsic::getVectorLengthParamPos to check whether the vp intrinsic requires an EVL.
Then we can pass in the call arguments explicitly.

Contributor Author:

fixed

Contributor:

You may have misunderstood what I meant.
What I mean is to directly pass the mask and evl as call arguments when creating VPWidenIntrinsicRecipe. VPIntrinsic::getMaskParamPos and VPIntrinsic::getVectorLengthParamPos only provide a way for you to check whether a specific vp intrinsic requires a mask or evl.
If you need to pass an all-true mask, you can use

VPValue *AllTrue = Plan.getOrAddLiveIn(ConstantInt::getTrue(Ctx));

, as Florian mentioned.

Contributor Author:

Sorry, fixed again~


Comment on lines 1040 to 1052
std::optional<Intrinsic::ID> ID =
VPIntrinsic::getFunctionalIntrinsicIDForVP(VectorIntrinsicID);
Contributor:

So this looks up the non-VP intrinsic ID and computes the cost using it? At least needs a comment, but if they have the same costs, the cost model should return the expected cost?

Contributor Author:

Annotation added


@LiqinWeng (Contributor Author):

Can it be merged? Any other questions? @fhahn @alexey-bataev @arcbbb

@Mel-Chen (Contributor) left a comment:

I noticed that we need to pass not only the debug location, but possibly also the fast math flags (FMF). This might require modifying the constructor.
Possible solutions could be:

  1. Pass FMF directly, similar to how we pass the debug location.
  2. Pass the underlying instruction as Instruction &.
  3. Pass VPWidenIntrinsicRecipe &.

@fhahn What do you think?

Comment on lines 1487 to 1496
.Case<VPWidenIntrinsicRecipe>(
    [&](VPWidenIntrinsicRecipe *CInst) -> VPRecipeBase * {
      auto *CI = cast<CallInst>(CInst->getUnderlyingInstr());
      SmallVector<VPValue *> Ops(CInst->operands());
      Ops.push_back(&EVL);

      Intrinsic::ID VPID = VPIntrinsic::getForIntrinsic(
          CI->getCalledFunction()->getIntrinsicID());
      if (VPID == Intrinsic::not_intrinsic)
        return nullptr;
      return new VPWidenIntrinsicRecipe(
          *CI, VPID, Ops, CI->getType(), CI->getDebugLoc());
    })
@Mel-Chen (Contributor), Nov 4, 2024:

Break early if the intrinsic cannot be transformed into a vp intrinsic.

Suggested change

Before:
.Case<VPWidenIntrinsicRecipe>(
    [&](VPWidenIntrinsicRecipe *CInst) -> VPRecipeBase * {
      auto *CI = cast<CallInst>(CInst->getUnderlyingInstr());
      SmallVector<VPValue *> Ops(CInst->operands());
      Ops.push_back(&EVL);
      Intrinsic::ID VPID = VPIntrinsic::getForIntrinsic(
          CI->getCalledFunction()->getIntrinsicID());
      if (VPID == Intrinsic::not_intrinsic)
        return nullptr;
      return new VPWidenIntrinsicRecipe(
          *CI, VPID, Ops, CI->getType(), CI->getDebugLoc());
    })

After:
.Case<VPWidenIntrinsicRecipe>(
    [&](VPWidenIntrinsicRecipe *CInst) -> VPRecipeBase * {
      auto *CI = cast<CallInst>(CInst->getUnderlyingInstr());
      Intrinsic::ID VPID = VPIntrinsic::getForIntrinsic(
          CI->getCalledFunction()->getIntrinsicID());
      if (VPID == Intrinsic::not_intrinsic)
        return nullptr;
      SmallVector<VPValue *> Ops(CInst->operands());
      Ops.push_back(&EVL);
      return new VPWidenIntrinsicRecipe(
          *CI, VPID, Ops, CI->getType(), CI->getDebugLoc());
    })

Contributor Author:

fixed


Comment on lines 1082 to 1020
IntrinsicCostAttributes CostAttrs(
VectorIntrinsicID, RetTy, Arguments, ParamTys, FMF,
FID, RetTy, Arguments, ParamTys, FMF,
dyn_cast_or_null<IntrinsicInst>(getUnderlyingValue()));
return Ctx.TTI.getIntrinsicInstrCost(CostAttrs, CostKind);
Contributor:

I'm concerned that we still have some VP intrinsics without an implemented cost.
Could we try to get the cost from llvm.<intrinsic name> if vp.<intrinsic name> returns Invalid?
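
A rough sketch of that fallback, assuming Ctx, RetTy, Arguments, ParamTys, FMF, and CostKind are the same values used in the recipe's computeCost above (illustrative only, not part of this patch):

// Ask TTI for the vp.* cost first; if it is Invalid, retry with the
// functional llvm.* counterpart of the vp intrinsic.
IntrinsicCostAttributes VPAttrs(VectorIntrinsicID, RetTy, Arguments, ParamTys,
                                FMF);
InstructionCost Cost = Ctx.TTI.getIntrinsicInstrCost(VPAttrs, CostKind);
if (!Cost.isValid()) {
  if (std::optional<Intrinsic::ID> FID =
          VPIntrinsic::getFunctionalIntrinsicIDForVP(VectorIntrinsicID)) {
    IntrinsicCostAttributes FnAttrs(*FID, RetTy, Arguments, ParamTys, FMF);
    Cost = Ctx.TTI.getIntrinsicInstrCost(FnAttrs, CostKind);
  }
}
return Cost;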


Contributor:

I don't understand. The issue with the cost model should only be that getIntrinsicInstrCost may not have implemented the cost for all vp intrinsics. Alternatively, you could choose one of the vp intrinsics for which getIntrinsicInstrCost has already been implemented as a demo.
Then we can discuss what to do about invalid costs.

@LiqinWeng (Contributor Author), Nov 6, 2024:

The cost is based on the non-vp intrinsic. Considering the legacy model, I use the non-vp intrinsic ID instead of the vp intrinsic ID.

Contributor:

So, why not use vp intrinsic ID directly?

Contributor:

Is it because the cost of llvm.vp.cttz is not implemented yet, so you get Invalid, or something else?

Contributor:

Is the VPlan-based cost more accurate? In that case, can the legacy cost model be fixed?

Contributor Author:

The cost of scalable llvm.cttz is invalid in the legacy model.
[screenshot]

Contributor Author:

After debugging, I found that ISD::cttz is an extended type and ISD::ctlz is a custom type. I'm trying to solve it.
[screenshot]

Contributor Author:

Is it because the cost of llvm.vp.cttz is not implemented yet, so you get Invalid, or something else?

The VPlan cost is based on llvm.vp.cttz, which gets the correct cost, but the cost of llvm.cttz is invalid in the legacy cost model.

Comment on lines +120 to 129
case Intrinsic::vp_abs:
case Intrinsic::ctlz:
case Intrinsic::vp_ctlz:
case Intrinsic::cttz:
case Intrinsic::vp_cttz:
case Intrinsic::is_fpclass:
case Intrinsic::vp_is_fpclass:
case Intrinsic::powi:
return (ScalarOpdIdx == 1);
@arcbbb (Contributor), Nov 7, 2024:

In the example,

%r = call <4 x i32> @llvm.vp.abs.v4i32(<4 x i32> %a, <4 x i1> %mask, i32 %evl, i1 false)

the scalar operands are %evl and false, located at indexes 2 and 3,
but the code only checks (ScalarOpdIdx == 1); is this correct?

It looks like VectorUtils doesn't work well with vp intrinsics.
I'm unsure of the best practice here, but in #115274, I handle %evl by adding

 if (VPIntrinsic::isVPIntrinsic(ID) &&
      (ScalarOpdIdx == VPIntrinsic::getVectorLengthParamPos(ID)))
    return true;
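
For context, folding that EVL check into llvm::isVectorIntrinsicWithScalarOpAtArg in VectorUtils.cpp might look roughly like this; a sketch only, keeping just the cases quoted in this thread (the real function handles more intrinsics):

bool llvm::isVectorIntrinsicWithScalarOpAtArg(Intrinsic::ID ID,
                                              unsigned ScalarOpdIdx) {
  // Treat the EVL operand of any vp intrinsic as scalar, as in #115274.
  if (VPIntrinsic::isVPIntrinsic(ID) &&
      ScalarOpdIdx == VPIntrinsic::getVectorLengthParamPos(ID))
    return true;
  switch (ID) {
  case Intrinsic::vp_abs:
  case Intrinsic::ctlz:
  case Intrinsic::vp_ctlz:
  case Intrinsic::cttz:
  case Intrinsic::vp_cttz:
  case Intrinsic::is_fpclass:
  case Intrinsic::vp_is_fpclass:
  case Intrinsic::powi:
    return (ScalarOpdIdx == 1);
  default:
    return false;
  }
}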

Contributor:

Seems like the documentation needs to be updated! 🫠

commit 419948fe6708021524f86e15d755719d634ab8d2

Refs: llvmorg-17-init-2994-g419948fe6708
Author:     Yeting Kuo <[email protected]>
AuthorDate: Wed Feb 22 14:50:11 2023 +0800
Commit:     Yeting Kuo <[email protected]>
CommitDate: Thu Feb 23 13:58:21 2023 +0800

    [VP] Reorder is_int_min_poison/is_zero_poison operand before mask for vp.abs/ctlz/cttz.

    The patch ensures last two operands of vp.abs/ctlz/cttz are mask and evl.

    Reviewed By: craig.topper

    Differential Revision: https://reviews.llvm.org/D144536

; RUN: -prefer-predicate-over-epilogue=predicate-dont-vectorize \
; RUN: -mtriple=riscv64 -mattr=+v -riscv-v-vector-bits-max=128 -disable-output < %s 2>&1 | FileCheck --check-prefix=IF-EVL %s

define void @vp_smax(ptr noalias %a, ptr noalias %b, ptr noalias %c, i64 %N) {
Contributor:

Ditto. Please remove noalias if this PR doesn't need it.

Contributor Author:

fixed

Comment on lines +1030 to +998
if (VPIntrinsic::isVPIntrinsic(VectorIntrinsicID)) {
  Arguments.push_back(V);
  break;
}
Contributor:

Why gather the nullptr into Arguments for vp intrinsic only?
Is it for the all-true mask, or for other reasons?

Contributor Author:

I want to set the correct argument size in CostAttrs. When passing the EVL parameter, the arguments will be cleared, and some vp cost computations check the number of parameters. Do you have any better suggestions?
[screenshot]

@Mel-Chen (Contributor), Nov 13, 2024:

I see, but I don't have a better idea for this, since both the all-true mask and the EVL are not Values when we compute the cost. :(

Contributor:

Please document at least. Also, a bit surprising to just push 1 nullptr instead of 2. Are any cost implementations actually using the IR arguments?

Contributor Author:

I see, but I don't have a better idea for this, since both the all-true mask and the EVL are not Values when we compute the cost. :(

So before, I took the cost to be based on the non-vp intrinsics. From the BasicTTIImpl.h && RISCVTargetTransformInfo implementations: VP intrinsics should have the same cost as their non-vp counterparts. Are we sure we should use a cost based on the vp intrinsic?

Contributor Author:

Please document at least. Also, a bit surprising to just push 1 nullptr instead of 2. Are any cost implementations actually using the IR arguments?

e.g. llvm.vp.smax/llvm.vp.smin ...

Contributor:

Why don't you push both a null mask and a null EVL?

Suggested change

Before:
if (VPIntrinsic::isVPIntrinsic(VectorIntrinsicID)) {
  Arguments.push_back(V);
  break;
}

After:
if (VPIntrinsic::isVPIntrinsic(VectorIntrinsicID) &&
    (VPIntrinsic::getMaskParamPos(VectorIntrinsicID) == Idx ||
     VPIntrinsic::getVectorLengthParamPos(VectorIntrinsicID) == Idx)) {
  Arguments.push_back(V);
  continue;
}

Contributor Author:

The mask is not null! It is an all-ones mask.

Comment on lines 1496 to 1502
if (VPIntrinsic::getMaskParamPos(VPID)) {
  VPValue *Mask = Plan.getOrAddLiveIn(ConstantInt::getTrue(
      IntegerType::getInt1Ty(CI->getContext())));
  Ops.push_back(Mask);
}
if (VPIntrinsic::getVectorLengthParamPos(VPID))
  Ops.push_back(&EVL);
Member:

Should these ifs actually be assertions?

Contributor Author:

fixed

Comment on lines +120 to +127
case Intrinsic::vp_abs:
case Intrinsic::ctlz:
case Intrinsic::vp_ctlz:
case Intrinsic::cttz:
case Intrinsic::vp_cttz:
case Intrinsic::is_fpclass:
case Intrinsic::vp_is_fpclass:
Member:

Do you still need these changes?

Member:

Comments for this?

Contributor Author:

The second parameter of this intrinsic is a scalar, which should remain in scalar form after vectorization.
llvm.vp.abs.nxv1i8(<vscale x 1 x i8> %va, i1 false, <vscale x 1 x i1> splat (i1 true), i32 %evl)

@@ -141,10 +145,13 @@ bool llvm::isVectorIntrinsicWithOverloadTypeAtArg(Intrinsic::ID ID,
case Intrinsic::fptoui_sat:
case Intrinsic::lrint:
case Intrinsic::llrint:
case Intrinsic::vp_lrint:
case Intrinsic::vp_llrint:
case Intrinsic::ucmp:
case Intrinsic::scmp:
return OpdIdx == -1 || OpdIdx == 0;
Contributor:

I don't think the EVL operand can be overloaded; should this just be OpdIdx == 0 for vp_lrint/vp_llrint?

Contributor:

IIUC, -1 means the return type. The change in this patch is correct, I think.

@LiqinWeng (Contributor Author):

Any other problems? @fhahn @Mel-Chen @arcbbb @alexey-bataev
