fix: emit proper CTE column for ARRAY JOIN scalar aliases (#253)#257
fix: emit proper CTE column for ARRAY JOIN scalar aliases (#253)#257
Conversation
Two fixes for bi-8 person.id bug: 1. Priority 3 in alias_to_id: find_unwind_aliases() traverses plan tree to detect UNWIND aliases, sets alias_to_id[alias] = alias (scalar IS the ID). Enables downstream Phase D detection. 2. Phase D: emit person_tag.person AS p6_person_id instead of skipping. Creates proper CTE column that downstream CTEs can resolve through standard property mapping. Before: person.id AS p6_person_id (invalid — person is scalar) After: person_tag.person AS p6_person_id (correct) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
genezhang
left a comment
There was a problem hiding this comment.
PR Review: Emit proper CTE column for ARRAY JOIN scalar aliases (#253)
Follow-up to the ARRAY JOIN scalar detection in PR #254. Changes the Phase D behavior from "skip" to "emit correct column reference."
Changes
1. Phase D: emit instead of skip (plan_builder_utils.rs:10006-10032)
Previously, detecting an ARRAY JOIN scalar caused a continue (skip adding the ID column). Now it emits from_alias.scalar_col AS "pN_alias_id" — so person_tag.person AS "p6_person_id". This is correct: downstream CTEs need the p6_person_id column to resolve person.id through standard property mapping.
The fallback path (no FROM alias → log and skip) is reasonable.
2. Priority 3 in alias_to_id: find_unwind_aliases() (plan_builder_utils.rs:11253-11289)
Traverses the plan tree to find Unwind nodes and collects their aliases. Sets alias_to_id[alias] = alias for detected UNWIND scalars. This feeds into the Phase D detection (is_self_id_in_upstream_cte).
One observation: find_unwind_aliases handles a specific set of plan node variants (Filter, Projection, OrderBy, Limit, Skip, GroupBy, WithClause). If a new wrapping plan node is added in the future, it won't be traversed. This is the same pattern used elsewhere in the codebase though (per CLAUDE.md §5 — plan traversal consistency), so it's consistent. Just needs to be remembered when adding new LogicalPlan variants.
3. Missing continue after Priority 2 (plan_builder_utils.rs:11227)
The added continue after Priority 2's alias_to_id_column.insert() is important — without it, Priority 3 would also run and potentially overwrite a more specific mapping. Good catch.
4. Regression test (ldbc_regression_tests.rs:472-478)
Adds a positive assertion: sql.contains("person_tag.person AS \"p6_person_id\""). Combined with the existing negative assertion (!sql.contains("person.id AS ...")), this fully validates the fix. Good.
Verdict
Clean fix that evolves the "skip" approach from PR #254 into the correct "emit proper column" approach. The find_unwind_aliases traversal is consistent with existing patterns. LGTM.
Fixes bi-8 person.id by tracking UNWIND aliases through alias_to_id and emitting correct CTE column in Phase D.