planner: Add memory overhead for HashAgg on TiKV #67953
ti-chi-bot[bot] merged 16 commits into pingcap:master from
Conversation
…ency division

Previously, getPlanCostVer24PhysicalHashAgg divided ALL costs (agg, group, hash-build, hash-probe) by the concurrency factor (default 5). This gave HashAgg a 5x artificial discount over StreamAgg even for single-threaded scenarios where aggregation work is not reduced by parallelism — each input row is processed exactly once regardless of worker count.

Root cause: threads share the same input stream, so aggCost and groupCost represent the total work done, not per-thread work. Only the hash table operations (build + probe) genuinely benefit from parallel execution.

Fix: for TiDB root and TiKV cop tasks, move aggCost and groupCost outside the concurrency division, matching the pattern already used by hash join, where build-side work sits outside /concurrency. The new formula is:

plan-cost = child-cost + agg-cost + group-cost + (hash-build-cost + hash-probe-cost) / concurrency

For TiFlash MPP tasks, data is truly partitioned across nodes so each node handles a disjoint subset of rows — the original formula (divide everything by concurrency) is preserved to avoid disrupting MPP plan choices.

Impact: high-NDV GROUP BY on an indexed column now correctly prefers StreamAgg over HashAgg, matching the optimizer's intent and reducing memory consumption from potentially 100MB-1GB (full hash table) to ~1.5MB (streaming over ordered index output). Low-NDV cases see minimal cost change.

Add regression test TestHashAggMemCostNotDividedByConcurrency that verifies StreamAgg is cheaper than HashAgg for a 100%-NDV GROUP BY with an available index.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
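For intuition, the commit's two branches can be sketched with plain float64 stand-ins for TiDB's costusage.CostVer2 terms (a simplification for illustration; the numbers and the function signature are made up, not the real code):

```go
package main

import "fmt"

// hashAggPlanCost mirrors the commit message's two formulas with scalar costs.
func hashAggPlanCost(childCost, aggCost, groupCost, hashBuildCost, hashProbeCost,
	concurrency float64, isMPP bool) float64 {
	if isMPP {
		// TiFlash MPP: rows are truly partitioned across nodes, so every term
		// shrinks with parallelism (original formula, preserved).
		return childCost + (aggCost+groupCost+hashBuildCost+hashProbeCost)/concurrency
	}
	// TiDB root / TiKV cop: workers share one input stream, so agg/group cost
	// is total work; only the hash table build and probe genuinely parallelize.
	return childCost + aggCost + groupCost + (hashBuildCost+hashProbeCost)/concurrency
}

func main() {
	// Same inputs, default concurrency 5: the root/cop branch no longer grants
	// a blanket 5x discount to the aggregation and grouping work.
	fmt.Println(hashAggPlanCost(1000, 500, 200, 100, 100, 5, false)) // 1740
	fmt.Println(hashAggPlanCost(1000, 500, 200, 100, 100, 5, true))  // 1180
}
```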
@terry1purcell I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details. ⏳ This process typically takes 10-30 minutes depending on the complexity of the changes. ℹ️ Learn more details on Pantheon AI.

Hi @terry1purcell. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Note: Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the corresponding setting.
📝 Walkthrough

This pull request modifies the hash aggregation cost calculation in the query planner to differentiate memory and CPU cost handling between MPP and non-MPP task types, causing plan optimization decisions to shift from hash aggregation to stream aggregation in certain scenarios. Test fixtures across multiple test suites are updated to reflect the new expected plan outputs.

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed

❌ Failed checks (1 warning)

✅ Passed checks (4 passed)
🧹 Nitpick comments (2)
pkg/planner/core/plan_cost_ver2_test.go (1)
815-860: LGTM — test directly validates the cost-model fix.

The test correctly targets the claim: with 100% NDV on an indexed column, HashAgg's hash-table memory cost (now not divided by concurrency) makes StreamAgg on the ordered index scan cheaper. Running under `RunTestUnderCascadesWithDomain` also confirms the behavior holds under both cascades off and on, and the per-iteration store/domain isolates session state — so no explicit cost-factor reset is needed.

Optional nit: the hot-loop `buf.WriteString(fmt.Sprintf(...))` on lines 836–841 can be `fmt.Fprintf(&buf, "(%d,%d)", i, i)` to avoid the intermediate string per row. Trivial, purely stylistic.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@pkg/planner/core/plan_cost_ver2_test.go` around lines 815-860: Replace the hot-loop string allocation inside TestHashAggMemCostNotDividedByConcurrency: locate the loop that appends each row using `buf.WriteString(fmt.Sprintf("(%d,%d)", i, i))` and change it to use `fmt.Fprintf(&buf, "(%d,%d)", i, i)` to avoid an intermediate fmt.Sprintf allocation per iteration while preserving exact output.
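For reference, a tiny standalone illustration of the suggested swap (the loop body here is a stand-in for the test's insert-building loop):

```go
package main

import (
	"fmt"
	"strings"
)

func main() {
	var buf strings.Builder
	for i := 0; i < 3; i++ {
		// buf.WriteString(fmt.Sprintf("(%d,%d)", i, i)) builds an intermediate
		// string per iteration; Fprintf writes the formatted bytes directly
		// into the builder (strings.Builder implements io.Writer).
		fmt.Fprintf(&buf, "(%d,%d)", i, i)
	}
	fmt.Println(buf.String()) // (0,0)(1,1)(2,2)
}
```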
pkg/planner/core/plan_cost_ver2.go (1)

622-637: Align the explanatory comments with the implemented formula.

Line 631 and lines 673-674 say the hash-join/build-side work sits outside `/concurrency`, but lines 675-676 still divide `hashBuildCost`. The formulas also omit `startCost`, which is added in both branches.

Proposed comment cleanup

```diff
-// plan-cost = child-cost + agg-cost + group-cost + (hash-build-cost + hash-probe-cost) / concurrency
+// plan-cost = start-cost + child-cost + agg-cost + group-cost + (hash-build-cost + hash-probe-cost) / concurrency
@@
-// Only the hash table operations (build + probe) benefit from parallel execution.
-// This matches the hash join pattern where build-side work is placed outside /concurrency.
+// Hash table build/probe costs remain concurrency-scaled by this model.
@@
-// For TiFlash MPP tasks, data is truly partitioned across nodes so each node handles a
-// disjoint subset of rows. All costs are divided by the MPP concurrency:
+// For TiFlash MPP tasks, data is partitioned across TiFlash nodes, so keep
+// concurrency scaling for the aggregation/grouping/hash portion:
@@
-// plan-cost = child-cost + (agg-cost + group-cost + hash-build-cost + hash-probe-cost) / mppConcurrency
+// plan-cost = start-cost + child-cost + (agg-cost + group-cost + hash-build-cost + hash-probe-cost) / mppConcurrency
@@
-	// Root (TiDB) and cop (TiKV) tasks: threads share the same input so aggregation
-	// work is not reduced by concurrency — each row is still processed once regardless
-	// of how many workers run in parallel. Only hash table operations benefit from
-	// parallel execution. This matches the hash join pattern where build-side work
-	// (including aggregation) sits outside /concurrency.
+	// Root (TiDB) and cop (TiKV) tasks: aggregation/grouping work is charged
+	// outside concurrency, while hash build/probe costs remain concurrency-scaled.
```

As per coding guidelines, "Comments SHOULD explain non-obvious intent, constraints, invariants, concurrency guarantees, SQL/compatibility contracts, or important performance trade-offs, and SHOULD NOT restate what the code already makes clear."
Also applies to: 663-677
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/planner/core/plan_cost_ver2.go` around lines 622 - 637, The comment block above getPlanCostVer24PhysicalHashAgg is inconsistent with the implemented calculation: update the explanatory comments to accurately reflect the implemented formulas (including that startCost is added in both TiDB/TiKV and TiFlash/MPP branches) and correct the statement about hash-build/hash-probe treatment — if hashBuildCost (and hashProbeCost) are divided by concurrency in the code they must be described as benefiting from parallelism, otherwise note they sit outside the /concurrency division; ensure the text around the two branches and the earlier comment section (the area covering the rationale currently spanning lines ~663-677) explicitly matches the code’s use of hashBuildCost, hashProbeCost and startCost so the comment no longer restates code but explains the intended concurrency semantics implemented by getPlanCostVer24PhysicalHashAgg.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: c741fa5e-db51-419f-9123-c7966eddc28c
📒 Files selected for processing (30)
- pkg/planner/core/casetest/binaryplan/testdata/binary_plan_suite_out.json
- pkg/planner/core/casetest/binaryplan/testdata/binary_plan_suite_xut.json
- pkg/planner/core/casetest/cbotest/testdata/analyze_suite_out.json
- pkg/planner/core/casetest/cbotest/testdata/analyze_suite_xut.json
- pkg/planner/core/casetest/enforcempp/testdata/enforce_mpp_suite_out.json
- pkg/planner/core/casetest/enforcempp/testdata/enforce_mpp_suite_xut.json
- pkg/planner/core/casetest/hint/testdata/integration_suite_out.json
- pkg/planner/core/casetest/hint/testdata/integration_suite_xut.json
- pkg/planner/core/casetest/mpp/testdata/integration_suite_out.json
- pkg/planner/core/casetest/mpp/testdata/integration_suite_xut.json
- pkg/planner/core/casetest/partition/testdata/partition_pruner_out.json
- pkg/planner/core/casetest/partition/testdata/partition_pruner_xut.json
- pkg/planner/core/casetest/physicalplantest/testdata/plan_suite_out.json
- pkg/planner/core/casetest/physicalplantest/testdata/plan_suite_xut.json
- pkg/planner/core/casetest/rule/testdata/join_reorder_suite_out.json
- pkg/planner/core/casetest/rule/testdata/join_reorder_suite_xut.json
- pkg/planner/core/casetest/rule/testdata/outer2inner_out.json
- pkg/planner/core/casetest/rule/testdata/outer2inner_xut.json
- pkg/planner/core/casetest/scalarsubquery/testdata/plan_suite_out.json
- pkg/planner/core/casetest/scalarsubquery/testdata/plan_suite_xut.json
- pkg/planner/core/casetest/testdata/integration_suite_out.json
- pkg/planner/core/casetest/testdata/integration_suite_xut.json
- pkg/planner/core/casetest/testdata/json_plan_suite_out.json
- pkg/planner/core/casetest/testdata/json_plan_suite_xut.json
- pkg/planner/core/casetest/testdata/plan_normalized_suite_out.json
- pkg/planner/core/casetest/testdata/plan_normalized_suite_xut.json
- pkg/planner/core/casetest/tpch/testdata/tpch_suite_out.json
- pkg/planner/core/casetest/tpch/testdata/tpch_suite_xut.json
- pkg/planner/core/plan_cost_ver2.go
- pkg/planner/core/plan_cost_ver2_test.go
```go
//
// For TiDB root and TiKV cop tasks:
//
// plan-cost = child-cost + agg-cost + group-cost + (hash-build-cost + hash-probe-cost) / concurrency
```
Clear comment, makes sense.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/planner/core/plan_cost_ver2.go`:
- Around line 664-686: The code only checks the immediate child for
*physicalop.PhysicalTableReader when setting childIsTiFlash, so plans with unary
wrappers (e.g., Projection/Selection) hide a TiFlash TableReader; change the
detection to walk down unary single-child operators starting from
p.Children()[0] until you either find a *physicalop.PhysicalTableReader (set
childIsTiFlash = true) or hit a node with zero or multiple children (leave
false). Implement this by replacing the direct type assertion with a small loop
that examines p.Children()[0], repeatedly assigns child = child.Children()[0]
while len(child.Children())==1 and the child is one of the expected unary
wrapper types (e.g., physicalop.PhysicalProjection,
physicalop.PhysicalSelection, etc.), and then checks for
*physicalop.PhysicalTableReader to set childIsTiFlash.
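Rendered as code, the suggested walk could look like this fragment (a sketch against the plan interfaces named in the prompt; the wrapper set shown is only the two types mentioned there):

```go
// Walk down through known unary pass-through wrappers before testing for a
// TiFlash TableReader hidden underneath them.
child := p.Children()[0]
walk:
for len(child.Children()) == 1 {
	switch child.(type) {
	case *physicalop.PhysicalProjection, *physicalop.PhysicalSelection:
		child = child.Children()[0] // descend through the unary wrapper
	default:
		break walk // unrecognized operator: stop at this node
	}
}
_, childIsTiFlash := child.(*physicalop.PhysicalTableReader)
_ = childIsTiFlash // consumed by the cost branch in the real code
```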
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: f65c5efc-0223-4cf8-b4f0-f002e2ddcb15
📒 Files selected for processing (9)
- pkg/planner/core/casetest/enforcempp/testdata/enforce_mpp_suite_out.json
- pkg/planner/core/casetest/enforcempp/testdata/enforce_mpp_suite_xut.json
- pkg/planner/core/casetest/tpch/testdata/tpch_suite_out.json
- pkg/planner/core/casetest/tpch/testdata/tpch_suite_xut.json
- pkg/planner/core/plan_cost_ver2.go
- tests/integrationtest/r/globalindex/aggregate.result
- tests/integrationtest/r/index_merge.result
- tests/integrationtest/r/infoschema/infoschema.result
- tests/integrationtest/r/planner/cascades/integration.result
✅ Files skipped from review due to trivial changes (1)
- tests/integrationtest/r/infoschema/infoschema.result
🧹 Nitpick comments (3)
pkg/planner/core/casetest/testdata/json_plan_suite_out.json (1)
108-153: Consider excluding volatile runtime fields from recorded JSON fixtures.

`executeInfo` and related timing/memory values in lines 108-153 are runtime-volatile, but the current verifier (pkg/planner/core/casetest/plan_test.go:365-396) only asserts stable fields (`ID`, `EstRows`, `ActRows`, `TaskType`, `AccessObject`, `OperatorInfo`). Stripping or normalizing volatile fields during record would reduce testdata churn.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@pkg/planner/core/casetest/testdata/json_plan_suite_out.json` around lines 108-153: The recorded JSON fixture json_plan_suite_out.json contains volatile runtime fields (executeInfo, memoryInfo, diskInfo and similar timing/memory strings) that cause churn; update the recorder to strip or normalize these fields when writing fixtures so only stable fields (ID, EstRows, ActRows, TaskType, AccessObject, OperatorInfo, etc.) are kept, i.e., remove or replace executeInfo/memoryInfo/diskInfo (and nested equivalents under subOperators/IndexFullScan_*/IndexReader_*) with a stable sentinel (empty string or null) during serialization so the existing verifier that asserts stable fields continues to pass without noisy runtime differences.

pkg/planner/core/plan_cost_ver2.go (1)
686-695: Extract the CPU-only hash-build term instead of rebuilding it inline.

This branch now carries a second copy of the hash-build formula, while the full `hashBuildCost` is still created earlier and ignored here. Pulling the CPU-only part into a helper would keep the traced expressions in one place and avoid branch drift the next time `hashBuildCostVer2` changes.

Refactor sketch

```diff
-func hashBuildCostVer2(option *costusage.PlanCostOption, buildRows, buildRowSize, nKeys float64, cpuFactor, memFactor costusage.CostVer2Factor) costusage.CostVer2 {
-	// TODO: 1) consider types of keys, 2) dedicated factor for build-probe hash table
+func hashBuildCPUCostVer2(option *costusage.PlanCostOption, buildRows, nKeys float64, cpuFactor costusage.CostVer2Factor) costusage.CostVer2 {
 	hashKeyCost := costusage.NewCostVer2(option, cpuFactor,
 		buildRows*nKeys*cpuFactor.Value,
 		func() string { return fmt.Sprintf("hashkey(%v*%v*%v)", buildRows, nKeys, cpuFactor) })
-	hashMemCost := costusage.NewCostVer2(option, memFactor,
-		buildRows*buildRowSize*memFactor.Value,
-		func() string { return fmt.Sprintf("hashmem(%v*%v*%v)", buildRows, buildRowSize, memFactor) })
 	hashBuildCost := costusage.NewCostVer2(option, cpuFactor,
 		buildRows*cpuFactor.Value,
 		func() string { return fmt.Sprintf("hashbuild(%v*%v)", buildRows, cpuFactor) })
+	return costusage.SumCostVer2(hashKeyCost, hashBuildCost)
+}
+
+func hashBuildCostVer2(option *costusage.PlanCostOption, buildRows, buildRowSize, nKeys float64, cpuFactor, memFactor costusage.CostVer2Factor) costusage.CostVer2 {
+	// TODO: 1) consider types of keys, 2) dedicated factor for build-probe hash table
+	hashMemCost := costusage.NewCostVer2(option, memFactor,
+		buildRows*buildRowSize*memFactor.Value,
+		func() string { return fmt.Sprintf("hashmem(%v*%v*%v)", buildRows, buildRowSize, memFactor) })
-	return costusage.SumCostVer2(hashKeyCost, hashMemCost, hashBuildCost)
+	return costusage.SumCostVer2(hashBuildCPUCostVer2(option, buildRows, nKeys, cpuFactor), hashMemCost)
 }
```

As per coding guidelines, "Code SHOULD remain maintainable for future readers with basic TiDB familiarity" and "Code SHOULD be self-documenting through clear naming and structure."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/planner/core/plan_cost_ver2.go` around lines 686 - 695, The branch duplicates the hash-build CPU formula by recreating hashBuildCPUCost inline while the full hashBuildCost created earlier is ignored; extract the CPU-only portion into a helper (e.g., cpuOnlyHashBuildCost or a method on costusage like hashBuildCostCPUOnly) that returns the CPU-only CostVer2 for the hash build and replace the inline construction of hashBuildCPUCost in this file with a call to that helper, and update any other places that should use the same CPU-only term (and keep the original hashBuildCost creation intact so traced expressions remain consistent).pkg/planner/core/plan_cost_ver2_test.go (1)
832-845: Make the “wide row” setup less margin-sensitive.

This regression depends on `outputRowSize` being large enough to keep HashAgg above StreamAgg, but the inserted literals are still fairly short relative to the declared `varchar(200)` columns. Filling `c`/`d` closer to their declared width would give the assertion more headroom against future factor tuning.

Suggested adjustment

```diff
 	// Insert rows where every b value is unique (100% NDV).
+	wideC := strings.Repeat("c", 180)
+	wideD := strings.Repeat("d", 180)
 	var buf strings.Builder
 	buf.WriteString("insert into t_high_ndv values ")
 	for i := 0; i < 1000; i++ {
 		if i > 0 {
 			buf.WriteString(",")
 		}
-		buf.WriteString(fmt.Sprintf("(%d,%d,'%s','%s')", i, i, "padding-data-for-wide-rows", "more-padding-data-here"))
+		buf.WriteString(fmt.Sprintf("(%d,%d,'%s','%s')", i, i, wideC, wideD))
 	}
```

As per coding guidelines, "Keep test changes minimal and deterministic; avoid broad golden/testdata churn unless required."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/planner/core/plan_cost_ver2_test.go` around lines 832 - 845, The test's "wide row" insert into table t_high_ndv is too short to reliably bias cost toward HashAgg; modify the value generation used in the insert builder inside plan_cost_ver2_test.go so the c and d literals are filled close to their declared varchar(200) length (e.g., use strings.Repeat or a precomputed longPad variable and insert longPad[:200] or similar) when building the loop that writes values into t_high_ndv; update the loop that appends fmt.Sprintf("(%d,%d,'%s','%s')", ...) to use those padded strings for columns c and d so outputRowSize is clearly large and the test becomes less margin-sensitive.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: ca39a838-8117-43ea-a3a1-e0cb05eb8a5a
📒 Files selected for processing (11)
- pkg/planner/core/casetest/binaryplan/testdata/binary_plan_suite_out.json
- pkg/planner/core/casetest/binaryplan/testdata/binary_plan_suite_xut.json
- pkg/planner/core/casetest/physicalplantest/testdata/plan_suite_out.json
- pkg/planner/core/casetest/physicalplantest/testdata/plan_suite_xut.json
- pkg/planner/core/casetest/testdata/json_plan_suite_out.json
- pkg/planner/core/casetest/testdata/json_plan_suite_xut.json
- pkg/planner/core/casetest/tpch/testdata/tpch_suite_out.json
- pkg/planner/core/casetest/tpch/testdata/tpch_suite_xut.json
- pkg/planner/core/plan_cost_ver2.go
- pkg/planner/core/plan_cost_ver2_test.go
- tests/integrationtest/r/planner/cascades/integration.result
✅ Files skipped from review due to trivial changes (2)
- pkg/planner/core/casetest/binaryplan/testdata/binary_plan_suite_xut.json
- pkg/planner/core/casetest/binaryplan/testdata/binary_plan_suite_out.json
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/planner/core/casetest/physicalplantest/testdata/plan_suite_xut.json
Restore mpp integration_suite testdata to match master — the stream_count removal was picked up as collateral during re-recording and is unrelated to the HashAgg cost model change. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```diff
@@             Coverage Diff              @@
##             master     #67953       +/-   ##
================================================
- Coverage   77.7565%   77.1269%   -0.6297%
================================================
  Files          1990       1972        -18
  Lines        551624     554928      +3304
================================================
- Hits         428924     427999       -925
- Misses       121780     126755      +4975
+ Partials        920        174       -746
```
Flags with carried forward coverage won't be shown.
The hashMemCost term added by the HashAgg cost-model fix was applied unconditionally for root tasks, which over-penalized HashAgg whenever a StreamAgg alternative would itself need an explicit Sort to consume the child rows in GROUP BY order. In those cases the planner switched to Sort+StreamAgg even though paying for a sort over the join/agg output is strictly more expensive than HashAgg's hash table.

Gate hashMemCost on a structural check of the HashAgg's child plan: only charge the memory penalty when the child reaches a base-table reader (IndexReader / IndexLookUpReader / IndexMergeReader / TableReader) through pass-through operators. Joins, aggregations, and other order-breaking operators skip the penalty since the StreamAgg alternative there carries its own (correctly-costed) Sort.

Add a companion unit test that exercises the gate on a hash-join input to ensure HashAgg stays cheaper than Sort+StreamAgg in that scenario, alongside the existing test that keeps StreamAgg cheaper for the indexed high-NDV path.

Re-record planner casetest fixtures (binary_plan, json_plan, tpch); the previously-recorded TPCH MPP plan flip reverts to its master shape, confirming the gate's effect.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
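One hypothetical rendering of that structural gate (reader and wrapper type names follow the list above; the exact pass-through set and helper name in the merged code may differ):

```go
// reachesBaseTableReader reports whether a HashAgg child reaches a base-table
// reader through order-preserving pass-through operators, i.e. the situation
// where a StreamAgg alternative could consume ordering for free.
func reachesBaseTableReader(p base.PhysicalPlan) bool {
	for {
		switch p.(type) {
		case *physicalop.PhysicalTableReader, *physicalop.PhysicalIndexReader,
			*physicalop.PhysicalIndexLookUpReader, *physicalop.PhysicalIndexMergeReader:
			return true // base-table access: ordering may come for free
		case *physicalop.PhysicalProjection, *physicalop.PhysicalSelection:
			if len(p.Children()) != 1 {
				return false
			}
			p = p.Children()[0] // pass-through: keep walking down
		default:
			return false // joins, aggs, applies: order-breaking, skip the penalty
		}
	}
}
```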
Append two cases to plan_cost_ver2.test that capture the plan-shape outcomes of the HashAgg memory-penalty gating as visible result diffs, making the optimizer behavior change easy to review.

Case 1 (intended win): high-NDV GROUP BY on a clustered-PK table with wide payload columns. The TableFullScan delivers rows in PK prefix order for free, so the planner picks StreamAgg over HashAgg.

Case 2 (gating prevents regression): high-NDV GROUP BY over a hash-join output. No free ordering on the group key, so the gating skips the HashAgg memory penalty and the planner picks HashAgg rather than Sort+StreamAgg.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
/retest-required

@terry1purcell: PRs from untrusted users cannot be marked as trusted with

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

/ok-to-test
henrybw left a comment
LGTM. Just had a few nitpicks about making the comments more concise and less redundant, but those are nitpicks and thus nonblocking.
```go
// requiring an explicit Sort. The check is structural: walk through
// order-preserving operators and accept only base-table access paths
// (IndexReader / IndexLookUpReader / IndexMergeReader / TableReader). Joins,
// aggregations, applies, and other order-breaking operators return false —
// for those, the StreamAgg alternative would need a Sort, and applying the
// HashAgg memory penalty would unfairly favour Sort+StreamAgg over HashAgg.
```
Nit: I don't think comments that just paraphrase what the code does add much value. This particular point about the memory cost also seems to be repeated three times (once in the comment above `getPlanCostVer24PhysicalHashAgg` and another time inside the `if taskType == property.RootTaskType {` block).
```go
// Root (TiDB) tasks: partial workers each process a disjoint subset of input
// rows, so all CPU work (agg, group, hash key, probe) is genuinely parallelized
// and divided by concurrency. However, each partial worker maintains its own
// hash table, so total memory scales with the number of workers. We model this
// as concurrency * outputRows * rowSize * memFactor, placed outside the
// concurrency division. The penalty is gated on whether the child plan can
// provide ordering on the GROUP BY keys naturally — only then is the StreamAgg
// alternative free of additional Sort cost and able to benefit from HashAgg's
// memory penalty. When no free ordering is available, we skip the penalty so
// the optimizer doesn't get steered into a Sort+StreamAgg plan that costs more
// than HashAgg in practice.
```
Nit: I'm not sure this comment is adding anything more than what the comment above this function has already said. At the very least, I think it could be made more concise.
/retest-required

1 similar comment

/retest-required
0xPoe left a comment

Thanks!
/hold
for https://github.com/pingcap/tidb/pull/67953/changes#r3178508370
```go
	})
}

func TestHashAggMemCostGatedOnFreeOrdering(t *testing.T) {
```
This test appears to pass without your fix. Perhaps we should redesign the case to ensure it can verify that the gate functions correctly. I typically verify it by running these regression tests without fixes to check if it truly serves as the regression test.
Please correct me if I misunderstood this case.
/retest-required

0xPoe left a comment

Thanks!

Feel free to unhold. Thank you for addressing my comments.
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: 0xPoe, henrybw

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Details

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing

/unhold

/retest
What problem does this PR solve?
Issue Number: close #68083
Problem Summary:
What changed and how does it work?
planner: fix HashAgg cost model — place hash table memory cost outside concurrency division
Previously, getPlanCostVer24PhysicalHashAgg divided ALL costs (agg, group, hash-build, hash-probe) by the concurrency factor (default 5). This gave HashAgg an artificial
discount on hash table memory, even though each partial worker maintains its own hash table and total memory does not decrease with more workers.
Root cause
The cost model treated hash table memory the same as CPU work — dividing it by concurrency. But while CPU work (aggregation, grouping, hash key computation, probing) is
genuinely parallelized across partial workers that each process a disjoint subset of input rows, memory is not: each partial worker builds its own hash table, so total memory
scales with concurrency * outputRows * rowSize.
Fix
For TiDB root and TiKV cop tasks, split the hash build cost into its CPU and memory components:
New formula (root/cop):
plan-cost = child-cost + concurrency * hashMemCost + (aggCost + groupCost + hashBuildCPUCost + hashProbeCost) / concurrency
MPP (unchanged):
plan-cost = child-cost + (aggCost + groupCost + hashBuildCost + hashProbeCost) / concurrency
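To see the effect numerically, here is a toy comparison of the old and new root/cop formulas with made-up magnitudes (default concurrency 5; real values come from costusage factors and cardinality estimates, so only the shape of the change is meaningful):

```go
package main

import "fmt"

func main() {
	const (
		childCost        = 1000.0
		aggCost          = 500.0
		groupCost        = 200.0
		hashBuildCPUCost = 100.0
		hashProbeCost    = 100.0
		hashMemCost      = 300.0
		concurrency      = 5.0
	)

	// Old model: memory was divided by concurrency along with the CPU work.
	oldCost := childCost +
		(aggCost+groupCost+hashBuildCPUCost+hashMemCost+hashProbeCost)/concurrency

	// New model (root/cop): each partial worker keeps its own hash table, so
	// the memory term is multiplied by concurrency instead of divided by it.
	newCost := childCost + concurrency*hashMemCost +
		(aggCost+groupCost+hashBuildCPUCost+hashProbeCost)/concurrency

	fmt.Printf("old=%.0f new=%.0f\n", oldCost, newCost) // old=1240 new=2680
}
```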
Impact
Regression test
TestHashAggMemCostNotDividedByConcurrency verifies that StreamAgg is cheaper than HashAgg for a 100%-NDV GROUP BY with wide output rows and an available index.
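A rough skeleton of such a test under common TiDB testkit conventions (the review thread indicates the real test runs under RunTestUnderCascadesWithDomain; the helper calls and assertions here are illustrative, not a copy of the merged test):

```go
func TestHashAggMemCostNotDividedByConcurrency(t *testing.T) {
	store := testkit.CreateMockStore(t)
	tk := testkit.NewTestKit(t, store)
	tk.MustExec("use test")
	tk.MustExec("create table t_high_ndv (a int, b int, c varchar(200), d varchar(200), key idx_b (b))")
	// ... insert ~1000 rows with unique b values (100% NDV), then:
	tk.MustExec("analyze table t_high_ndv")

	// With the hash-table memory term no longer discounted by concurrency,
	// the planner should prefer StreamAgg over the ordered idx_b scan.
	plan := tk.MustQuery("explain format='brief' select b, count(*) from t_high_ndv group by b").Rows()
	require.Contains(t, fmt.Sprintf("%v", plan), "StreamAgg")
}
```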
Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.
Summary by CodeRabbit
Bug Fixes
Tests