
[CORE] Fix multi-key DPP support in ColumnarSubqueryBroadcastExec #11795

Draft: yaooqinn wants to merge 1 commit into apache:main from yaooqinn:fix/multi-key-dpp



@yaooqinn (Member)

What changes were proposed in this pull request?

Fix the BuildSideRelation path in ColumnarSubqueryBroadcastExec so that it handles multiple filtering keys instead of using only the first key (indices(0)).

Problem

The existing code carried a TODO(): fixme at line 92 and used only indices(0), so when Spark 4.0 generates multi-key DPP subqueries (SPARK-46946), every filtering key after the first was silently dropped:

val index = indices(0) // TODO(): fixme

As a result, multi-key Dynamic Partition Pruning was silently ineffective: only the first key was used for partition filtering.
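To make the failure mode concrete, here is a toy Scala model. It is hypothetical: it is not the Gluten code and uses no Spark classes. A build-side row is modeled as a Seq[Any], and indices lists the filtering-key columns. projectFirstKeyOnly mirrors the old indices(0) behavior, while projectAllKeys keeps every key.

```scala
// Toy model (hypothetical, not Gluten/Spark code): a build-side row is a
// Seq[Any], and `indices` selects the DPP filtering-key columns.
object KeyProjectionSketch {
  type Row = Seq[Any]

  // Old behavior: only the first filtering key is projected; the rest are dropped.
  def projectFirstKeyOnly(row: Row, indices: Seq[Int]): Row =
    Seq(row(indices(0)))

  // Fixed behavior: every filtering key is projected, in `indices` order.
  def projectAllKeys(row: Row, indices: Seq[Int]): Row =
    indices.map(i => row(i))

  def main(args: Array[String]): Unit = {
    val row: Row = Seq(2023, "US", 42L) // illustrative (year, country, amount)
    val indices = Seq(0, 1)             // two filtering keys
    println(projectFirstKeyOnly(row, indices)) // List(2023)
    println(projectAllKeys(row, indices))      // List(2023, US)
  }
}
```

With two filtering keys, the first projection keeps only the year, so the country key never reaches the pruning filter in this model.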

Fix

  • Single-key DPP: behavior is unchanged (the key is projected directly).
  • Multi-key DPP: all keys are projected together via CreateStruct, matching the multi-key support in the HashedRelation path.
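The single-key/multi-key branching described above can be sketched with a tiny expression tree. KeyRef and StructOf are hypothetical stand-ins (StructOf plays the role of CreateStruct); this illustrates the idea under those assumptions and is not the actual ColumnarSubqueryBroadcastExec change.

```scala
// Hypothetical, minimal expression tree; NOT the Spark or Gluten API.
// StructOf stands in for Catalyst's CreateStruct.
object FilteringKeySketch {
  sealed trait Expr
  final case class KeyRef(ordinal: Int) extends Expr
  final case class StructOf(children: Seq[Expr]) extends Expr

  // One key is referenced directly; several keys are wrapped in one struct.
  def filteringKeyExpr(indices: Seq[Int]): Expr = {
    require(indices.nonEmpty, "DPP needs at least one filtering key")
    if (indices.size == 1) KeyRef(indices.head)
    else StructOf(indices.map(KeyRef(_)))
  }

  // Evaluates an expression against a build-side row modeled as Seq[Any].
  def eval(e: Expr, row: Seq[Any]): Any = e match {
    case KeyRef(i)    => row(i)
    case StructOf(cs) => cs.map(c => eval(c, row))
  }

  def main(args: Array[String]): Unit = {
    val row = Seq(2023, "US", 42L)
    println(eval(filteringKeyExpr(Seq(0)), row))    // 2023
    println(eval(filteringKeyExpr(Seq(0, 1)), row)) // List(2023, US)
  }
}
```

In this sketch, keeping the single-key case as a direct reference means single-key plans produce the same value as before, while multi-key plans yield one struct value per row that carries every filtering key.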

This resolves potential DPP loss in queries like TPC-DS q23a/q23b with multi-column partition join keys.

How was this patch tested?

Existing CI tests. The multi-key DPP path is exercised by Spark 4.0's DPP test suites.

@github-actions

Run Gluten Clickhouse CI on x86

github-actions bot added the CORE (works for Gluten Core) label on Mar 19, 2026
yaooqinn force-pushed the fix/multi-key-dpp branch from 0c33816 to 9e88e1e on March 19, 2026, 17:47

yaooqinn marked this pull request as draft on March 19, 2026, 17:50
yaooqinn force-pushed the fix/multi-key-dpp branch from 9e88e1e to f8cc6bb on March 19, 2026, 17:51

Fix the BuildSideRelation path to handle multiple filtering keys
instead of only using the first key (indices(0)). This resolves the
TODO/FIXME that caused multi-key DPP to silently drop extra keys.

For single-key DPP, behavior is unchanged. For multi-key DPP
(SPARK-46946), all keys are now projected via CreateStruct, matching
the HashedRelation path's multi-key support.

This fixes potential DPP loss in queries like TPC-DS q23a/q23b that
have multi-column partition join keys.
yaooqinn force-pushed the fix/multi-key-dpp branch from f8cc6bb to fc1fe80 on March 19, 2026, 18:34
