fix: qualify bare Column expressions in GraphNode.extract_select_items (#252)#254
fix: qualify bare Column expressions in GraphNode.extract_select_items (#252)#254
Conversation
Promotes bi-6 from FAIL to PASS. Embedded benchmark: 27/36 (75%). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
#252) ViewScan returns unqualified Column("name") from extract_select_items(). The SQL generator's fallback heuristic assigned "t" as default alias — which is the VLP CTE internal alias, making the SQL appear to reference VLP scope from outside. Fix: qualify bare Columns with GraphNode's alias (e.g., "friend.name" instead of "t.name"). Same pattern already used in GraphNode.to_render_plan. Fixes complex-12 "Unknown expression identifier t.length" on chdb. Adds LDBC complex-12 regression test. Closes #252 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…#253) UNWIND produces a scalar alias (person), not a table. Pattern comprehension correlation was generating person.id treating it as a table with an id column. Two fixes in plan_builder_utils.rs: 1. Phase D: skip adding ID column for ARRAY JOIN scalar aliases (detected via bare alias match in CTE SELECT + alias_to_id self-map) 2. find_pc_cte_join_column: use bare scalar reference instead of non-existent p6_person_id column Regression test updated to use official bi-8 query with assertions for ARRAY JOIN presence and absence of person.id. Closes #253 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
genezhang
left a comment
There was a problem hiding this comment.
PR Review: Fix VLP CTE Alias Leak + ARRAY JOIN Scalar Detection (#252, #253)
Three commits fixing two related query generation bugs. Both are in the render plan / select builder — high-risk area per CLAUDE.md.
Commit 1: Benchmark comment stripping (trivial)
Regex fix for // comments at EOF without trailing newline. (\n|$) is correct. No concerns.
Commit 2: Qualify bare Column in GraphNode.extract_select_items (#252)
The fix is correct and well-scoped. ViewScan returns unqualified Column("name") → SQL generator defaults to t (VLP internal alias) → broken SQL. Fix qualifies bare Column with graph_node.alias by converting to PropertyAccessExp.
One observation on the regression test (ldbc_complex_12_official, lines 313-343):
The test checks for t. leak by scanning lines with starts_with("t."). This is somewhat fragile — it would miss cases like SELECT t.foo (where t. isn't at the start of the line) or false-positive on indented lines. A simpler assertion might be:
// After extracting inner_select:
assert!(!inner_select.contains(" t."), "VLP alias leak");But this is a minor nit — the test catches the specific regression.
Commit 3: ARRAY JOIN scalar detection (#253)
This is the more complex fix. Two changes in plan_builder_utils.rs:
Phase D fix (lines 9983-10012): Detects UNWIND/ARRAY JOIN scalars by checking:
- CTE SELECT has a column whose alias exactly matches the exported alias (bare match)
alias_to_idmaps the alias to itself (self-referential = scalar IS the ID)
Both conditions together is a reasonable heuristic. The comment explains the logic well.
find_pc_cte_join_column fix (lines 17449-17480): Checks if ViewScan.property_mapping has a bare column matching var_name, and if so emits from_alias."var_name" instead of from_alias.p6_var_name_id.
Concern: The ViewScan property_mapping check at line 17456 iterates all values looking for a Column variant matching var_name. This works for the UNWIND case but could theoretically false-positive if a ViewScan happens to have a property column named the same as a non-scalar variable. The dual-condition guard in Phase D (bare alias + self-id) is more robust — should a similar guard be applied here too?
Test quality: The updated ldbc_bi_8 test (line 447) is good — switches from the adapted workaround query to the official one and asserts both the presence of ARRAY JOIN and the absence of the invalid person.id AS "p6_person_id". Concrete negative assertion is the right approach.
Verdict
Both fixes are correct and well-targeted. The Phase D dual-condition guard is solid. Minor concern about the find_pc_cte_join_column heuristic being single-condition vs dual — consider adding the alias_to_id self-map check there too for consistency. Not a blocker.
…oin_column Per review: added CTE SELECT bare-alias check (condition 1) alongside the existing ViewScan bare-column check (condition 2) for consistency with Phase D's dual-condition guard. Prevents false positives if a ViewScan property column happens to share a name with a non-scalar var. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
VLP CTE alias
tleaked into outer SELECT —t.lengthinstead offriend.length. Root cause:GraphNode.extract_select_items()returned bareColumn("name")without table qualifier, and SQL generator defaulted tot.Fix: qualify bare Columns with GraphNode alias. Closes #252.
Test plan
t.in outer SELECT🤖 Generated with Claude Code