[SLP] Ensure TreeCost is scaled for ordered fadd reductions by KavinTheG · Pull Request #199388 · llvm/llvm-project

KavinTheG · 2026-05-24T01:14:03Z

Resolves #199267

Addresses an issue where getScaleToLoopIterations() can return 1 on isolated SLP trees because UserTreeIndex is invalid. This prevents TreeCost from scaling alongside ReductionCost, causing the cost model to incorrectly treat an unprofitable vector reduction as profitable.

This patch passes the reduction root instruction down into calculateTreeCostAndTrimNonProfitable and the underlying scale calculation so getScaleToLoopIterations can get the correct block context.

Addresses an issue where `getScaleToLoopIterations()` returns 1 on isolated SLP trees because `UserTreeIndex` is invalid. This prevents `TreeCost` from scaling alongside `ReductionCost`, causing the cost model to incorrectly treat an unprofitable vector reduction as profitable.

llvmorg-github-actions · 2026-05-24T01:14:37Z

@llvm/pr-subscribers-vectorizers
@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-risc-v

Author: Kavin Gnanapandithan (KavinTheG)

Changes

Resolves #199267

Addresses an issue where getScaleToLoopIterations() can return 1 on isolated SLP trees because UserTreeIndex is invalid. This prevents TreeCost from scaling alongside ReductionCost, causing the cost model to incorrectly treat an unprofitable vector reduction as profitable.

This patch passes the reduction root instruction down into calculateTreeCostAndTrimNonProfitable and the underlying scale calculation so getScaleToLoopIterations can get the correct block context.

Full diff: https://github.com/llvm/llvm-project/pull/199388.diff

2 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp (+16-10)
(added) llvm/test/Transforms/SLPVectorizer/RISCV/ordered-reduction.ll (+76)

diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 6bd595c86d55a..e83f3fcbe65c4 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -2153,7 +2153,8 @@ class slpvectorizer::BoUpSLP {
   /// Calculates the cost of the subtrees, trims non-profitable ones and returns
   /// final cost.
   InstructionCost
-  calculateTreeCostAndTrimNonProfitable(ArrayRef<Value *> VectorizedVals = {});
+  calculateTreeCostAndTrimNonProfitable(ArrayRef<Value *> VectorizedVals = {},
+                                        Instruction *RdxRoot = nullptr);
 
   /// \returns the vectorization cost of the subtree that starts at \p VL.
   /// A negative number means that this is profitable.
@@ -3973,7 +3974,8 @@ class slpvectorizer::BoUpSLP {
   /// LICM hoisting that optimizeGatherSequence() performs after vectorization
   /// for inserts with loop-invariant operands. Falls back to the whole-entry
   /// scale when per-lane information is unavailable or the feature is off.
-  uint64_t getGatherNodeEffectiveScale(const TreeEntry &TE);
+  uint64_t getGatherNodeEffectiveScale(const TreeEntry &TE,
+                                       Instruction *U = nullptr);
 
   /// Get the loop nest for the given loop \p L.
   ArrayRef<const Loop *> getLoopNest(const Loop *L);
@@ -16283,14 +16285,15 @@ uint64_t BoUpSLP::getLoopNestScale(const Loop *L) {
   return std::max<uint64_t>(1, Scale);
 }
 
-uint64_t BoUpSLP::getGatherNodeEffectiveScale(const TreeEntry &TE) {
+uint64_t BoUpSLP::getGatherNodeEffectiveScale(const TreeEntry &TE,
+                                              Instruction *U) {
   // Only meaningful for gather/buildvector-like entries; the per-lane
   // insertelements that make up such an entry are LICM-hoistable by
   // optimizeGatherSequence() when their operand is loop-invariant.
   assert((TE.isGather() || TE.State == TreeEntry::SplitVectorize) &&
          "Expected gather/split tree entry.");
 
-  uint64_t BaseScale = getScaleToLoopIterations(TE);
+  uint64_t BaseScale = getScaleToLoopIterations(TE, nullptr, U);
   if (!PerLaneGatherScale || LoopAwareTripCount == 0 || BaseScale <= 1)
     return BaseScale;
 
@@ -16315,7 +16318,8 @@ uint64_t BoUpSLP::getGatherNodeEffectiveScale(const TreeEntry &TE) {
     if (isConstant(V))
       continue;
     ++N;
-    uint64_t LaneScale = std::min(getScaleToLoopIterations(TE, V), BaseScale);
+    uint64_t LaneScale =
+        std::min(getScaleToLoopIterations(TE, V, U), BaseScale);
     Sum = SaturatingAdd(Sum, LaneScale, &Overflow);
     if (Overflow)
       return BaseScale;
@@ -18642,8 +18646,9 @@ static T *performExtractsShuffleAction(
   return Prev;
 }
 
-InstructionCost BoUpSLP::calculateTreeCostAndTrimNonProfitable(
-    ArrayRef<Value *> VectorizedVals) {
+InstructionCost
+BoUpSLP::calculateTreeCostAndTrimNonProfitable(ArrayRef<Value *> VectorizedVals,
+                                               Instruction *RdxRoot) {
   SmallDenseMap<const TreeEntry *, InstructionCost> NodesCosts;
   SmallPtrSet<Value *, 4> CheckedExtracts;
   SmallSetVector<TreeEntry *, 4> GatheredLoadsNodes;
@@ -18752,8 +18757,8 @@ InstructionCost BoUpSLP::calculateTreeCostAndTrimNonProfitable(
       }
     }
     if (!CostIsFree && !Scale) {
-      Scale = IsGatherLike ? getGatherNodeEffectiveScale(TE)
-                           : getScaleToLoopIterations(TE);
+      Scale = IsGatherLike ? getGatherNodeEffectiveScale(TE, RdxRoot)
+                           : getScaleToLoopIterations(TE, nullptr, RdxRoot);
       C *= Scale;
       EntryToScale.try_emplace(&TE, Scale);
       if (!TE.isGather() && TE.hasState()) {
@@ -29490,7 +29495,8 @@ class HorizontalReduction {
 
       V.transformNodes();
       V.computeMinimumValueSizes();
-      InstructionCost TreeCost = V.calculateTreeCostAndTrimNonProfitable(VL);
+      InstructionCost TreeCost =
+          V.calculateTreeCostAndTrimNonProfitable(VL, RdxRootInst);
       V.buildExternalUses(LocalExternallyUsedValues);
 
       InstructionCost ReductionCost =
diff --git a/llvm/test/Transforms/SLPVectorizer/RISCV/ordered-reduction.ll b/llvm/test/Transforms/SLPVectorizer/RISCV/ordered-reduction.ll
new file mode 100644
index 0000000000000..aa21b0f98f8a6
--- /dev/null
+++ b/llvm/test/Transforms/SLPVectorizer/RISCV/ordered-reduction.ll
@@ -0,0 +1,76 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6
+; RUN: opt -S --passes=slp-vectorizer -mtriple=riscv64-unknown-linux-gnu -mattr="+v" < %s | FileCheck %s
+
+target triple = "riscv64-unknown-linux-gnu"
+
+define void @main(ptr %arrayidx.i60, ptr %24, ptr %arrayidx44.i61, double %.pre117.i70, double %.pre116.i68, double %.pre115.i67, double %.pre114.i65, double %.pre113.i64, double %.pre.i62) {
+; CHECK-LABEL: define void @main(
+; CHECK-SAME: ptr [[ARRAYIDX_I60:%.*]], ptr [[TMP0:%.*]], ptr [[ARRAYIDX44_I61:%.*]], double [[DOTPRE117_I70:%.*]], double [[DOTPRE116_I68:%.*]], double [[DOTPRE115_I67:%.*]], double [[DOTPRE114_I65:%.*]], double [[DOTPRE113_I64:%.*]], double [[DOTPRE_I62:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:  [[FOR_COND5_PREHEADER_I58:.*]]:
+; CHECK-NEXT:    br label %[[FOR_BODY8_I71:.*]]
+; CHECK:       [[FOR_BODY8_I71]]:
+; CHECK-NEXT:    [[TMP1:%.*]] = phi double [ [[DOTPRE117_I70]], %[[FOR_COND5_PREHEADER_I58]] ], [ [[TMP9:%.*]], %[[FOR_BODY8_I71]] ]
+; CHECK-NEXT:    [[TMP2:%.*]] = phi double [ [[DOTPRE116_I68]], %[[FOR_COND5_PREHEADER_I58]] ], [ [[TMP1]], %[[FOR_BODY8_I71]] ]
+; CHECK-NEXT:    [[TMP3:%.*]] = phi double [ [[DOTPRE115_I67]], %[[FOR_COND5_PREHEADER_I58]] ], [ [[TMP8:%.*]], %[[FOR_BODY8_I71]] ]
+; CHECK-NEXT:    [[TMP4:%.*]] = phi double [ [[DOTPRE114_I65]], %[[FOR_COND5_PREHEADER_I58]] ], [ [[DIV_I86:%.*]], %[[FOR_BODY8_I71]] ]
+; CHECK-NEXT:    [[TMP5:%.*]] = phi double [ [[DOTPRE113_I64]], %[[FOR_COND5_PREHEADER_I58]] ], [ [[TMP7:%.*]], %[[FOR_BODY8_I71]] ]
+; CHECK-NEXT:    [[TMP6:%.*]] = phi double [ [[DOTPRE_I62]], %[[FOR_COND5_PREHEADER_I58]] ], [ [[TMP5]], %[[FOR_BODY8_I71]] ]
+; CHECK-NEXT:    [[INDVARS_IV_I72:%.*]] = phi i64 [ 1, %[[FOR_COND5_PREHEADER_I58]] ], [ [[INDVARS_IV_NEXT_I74:%.*]], %[[FOR_BODY8_I71]] ]
+; CHECK-NEXT:    [[ADD_I73:%.*]] = fadd double [[TMP5]], [[TMP6]]
+; CHECK-NEXT:    [[INDVARS_IV_NEXT_I74]] = add nuw nsw i64 [[INDVARS_IV_I72]], 1
+; CHECK-NEXT:    [[ARRAYIDX23_I75:%.*]] = getelementptr inbounds nuw [8 x i8], ptr [[ARRAYIDX_I60]], i64 [[INDVARS_IV_NEXT_I74]]
+; CHECK-NEXT:    [[TMP7]] = load double, ptr [[ARRAYIDX23_I75]], align 8
+; CHECK-NEXT:    [[ADD24_I76:%.*]] = fadd double [[ADD_I73]], [[TMP7]]
+; CHECK-NEXT:    [[ADD30_I77:%.*]] = fadd double [[TMP4]], [[ADD24_I76]]
+; CHECK-NEXT:    [[ARRAYIDX34_I78:%.*]] = getelementptr inbounds nuw [8 x i8], ptr [[TMP0]], i64 [[INDVARS_IV_I72]]
+; CHECK-NEXT:    [[ADD35_I79:%.*]] = fadd double [[TMP3]], [[ADD30_I77]]
+; CHECK-NEXT:    [[ARRAYIDX40_I80:%.*]] = getelementptr inbounds nuw [8 x i8], ptr [[TMP0]], i64 [[INDVARS_IV_NEXT_I74]]
+; CHECK-NEXT:    [[TMP8]] = load double, ptr [[ARRAYIDX40_I80]], align 8
+; CHECK-NEXT:    [[ADD41_I81:%.*]] = fadd double [[TMP8]], [[ADD35_I79]]
+; CHECK-NEXT:    [[ADD48_I82:%.*]] = fadd double [[TMP2]], [[ADD41_I81]]
+; CHECK-NEXT:    [[ADD54_I83:%.*]] = fadd double [[TMP1]], [[ADD48_I82]]
+; CHECK-NEXT:    [[ARRAYIDX60_I84:%.*]] = getelementptr inbounds nuw [8 x i8], ptr [[ARRAYIDX44_I61]], i64 [[INDVARS_IV_NEXT_I74]]
+; CHECK-NEXT:    [[TMP9]] = load double, ptr [[ARRAYIDX60_I84]], align 8
+; CHECK-NEXT:    [[TMP18:%.*]] = fadd double [[TMP9]], [[ADD54_I83]]
+; CHECK-NEXT:    [[DIV_I86]] = fdiv double [[TMP18]], 9.000000e+00
+; CHECK-NEXT:    store double [[DIV_I86]], ptr [[ARRAYIDX34_I78]], align 8
+; CHECK-NEXT:    [[EXITCOND_NOT_I87:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT_I74]], 1999
+; CHECK-NEXT:    br i1 [[EXITCOND_NOT_I87]], label %[[FOR_INC66_I88:.*]], label %[[FOR_BODY8_I71]]
+; CHECK:       [[FOR_INC66_I88]]:
+; CHECK-NEXT:    ret void
+;
+for.cond.preheader:
+  br label %for.body
+
+for.body:
+  %25 = phi double [ %.pre117.i70, %for.cond.preheader ], [ %33, %for.body ]
+  %26 = phi double [ %.pre116.i68, %for.cond.preheader ], [ %25, %for.body ]
+  %27 = phi double [ %.pre115.i67, %for.cond.preheader ], [ %32, %for.body ]
+  %28 = phi double [ %.pre114.i65, %for.cond.preheader ], [ %div.i86, %for.body ]
+  %29 = phi double [ %.pre113.i64, %for.cond.preheader ], [ %31, %for.body ]
+  %30 = phi double [ %.pre.i62, %for.cond.preheader ], [ %29, %for.body ]
+  %indvars.iv.i72 = phi i64 [ 1, %for.cond.preheader ], [ %indvars.iv.next.i74, %for.body ]
+  %add.i73 = fadd double %29, %30
+  %indvars.iv.next.i74 = add nuw nsw i64 %indvars.iv.i72, 1
+  %arrayidx23.i75 = getelementptr inbounds nuw [8 x i8], ptr %arrayidx.i60, i64 %indvars.iv.next.i74
+  %31 = load double, ptr %arrayidx23.i75, align 8
+  %add24.i76 = fadd double %add.i73, %31
+  %add30.i77 = fadd double %28, %add24.i76
+  %arrayidx34.i78 = getelementptr inbounds nuw [8 x i8], ptr %24, i64 %indvars.iv.i72
+  %add35.i79 = fadd double %27, %add30.i77
+  %arrayidx40.i80 = getelementptr inbounds nuw [8 x i8], ptr %24, i64 %indvars.iv.next.i74
+  %32 = load double, ptr %arrayidx40.i80, align 8
+  %add41.i81 = fadd double %32, %add35.i79
+  %add48.i82 = fadd double %26, %add41.i81
+  %add54.i83 = fadd double %25, %add48.i82
+  %arrayidx60.i84 = getelementptr inbounds nuw [8 x i8], ptr %arrayidx44.i61, i64 %indvars.iv.next.i74
+  %33 = load double, ptr %arrayidx60.i84, align 8
+  %add61.i85 = fadd double %33, %add54.i83
+  %div.i86 = fdiv double %add61.i85, 9.000000e+00
+  store double %div.i86, ptr %arrayidx34.i78, align 8
+  %exitcond.not.i87 = icmp eq i64 %indvars.iv.next.i74, 1999
+  br i1 %exitcond.not.i87, label %for.end, label %for.body
+
+for.end:
+  ret void
+}

KavinTheG · 2026-05-24T02:32:39Z

@alexey-bataev, would appreciate a review!

Extended calculateTreeCostAndTrimNonProfitable and getGatherNodeEffectiveScale so that getScaleToLoopIterations can deduce valid Parent value.

alexey-bataev · 2026-05-24T12:27:03Z

-      Scale = IsGatherLike ? getGatherNodeEffectiveScale(TE)
-                           : getScaleToLoopIterations(TE);
+      Scale = IsGatherLike ? getGatherNodeEffectiveScale(TE, RdxRoot)
+                           : getScaleToLoopIterations(TE, nullptr, RdxRoot);


No, do not pass RdxRoot here for getScaleToLoopIterations, must be passed only for gathers/buildvectors

alexey-bataev · 2026-05-24T12:28:13Z

@@ -0,0 +1,76 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6


Precommit the test in a separate NFC patch

KavinTheG added 2 commits May 23, 2026 20:17

Precommit

43f6dad

llvmorg-github-actions Bot added backend:RISC-V vectorizers llvm:transforms labels May 24, 2026

alexey-bataev reviewed May 24, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SLP] Ensure TreeCost is scaled for ordered fadd reductions#199388

[SLP] Ensure TreeCost is scaled for ordered fadd reductions#199388
KavinTheG wants to merge 2 commits into
llvm:mainfrom
KavinTheG:slp-keep-ordered-fadd-chain

KavinTheG commented May 24, 2026

Uh oh!

llvmorg-github-actions Bot commented May 24, 2026 •

edited

Loading

Uh oh!

KavinTheG commented May 24, 2026

Uh oh!

alexey-bataev May 24, 2026

Uh oh!

alexey-bataev May 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		@@ -0,0 +1,76 @@
		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6

Conversation

KavinTheG commented May 24, 2026

Uh oh!

llvmorg-github-actions Bot commented May 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

KavinTheG commented May 24, 2026

Uh oh!

alexey-bataev May 24, 2026

Choose a reason for hiding this comment

Uh oh!

alexey-bataev May 24, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

llvmorg-github-actions Bot commented May 24, 2026 •

edited

Loading