Skip to content

fix: qualify bare Column expressions in GraphNode.extract_select_items (#252)#254

Merged
genezhang merged 5 commits intomainfrom
fix/complex-12-cte-alias-leak
Mar 28, 2026
Merged

fix: qualify bare Column expressions in GraphNode.extract_select_items (#252)#254
genezhang merged 5 commits intomainfrom
fix/complex-12-cte-alias-leak

Conversation

@genezhang
Copy link
Copy Markdown
Owner

Summary

VLP CTE alias t leaked into outer SELECT — t.length instead of friend.length. Root cause: GraphNode.extract_select_items() returned bare Column("name") without table qualifier, and SQL generator defaulted to t.

Fix: qualify bare Columns with GraphNode alias. Closes #252.

Test plan

  • 1,600+ tests pass (1 new regression test)
  • complex-12 SQL no longer contains t. in outer SELECT

🤖 Generated with Claude Code

genezhang and others added 3 commits March 27, 2026 20:09
Promotes bi-6 from FAIL to PASS. Embedded benchmark: 27/36 (75%).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
#252)

ViewScan returns unqualified Column("name") from extract_select_items().
The SQL generator's fallback heuristic assigned "t" as default alias —
which is the VLP CTE internal alias, making the SQL appear to reference
VLP scope from outside.

Fix: qualify bare Columns with GraphNode's alias (e.g., "friend.name"
instead of "t.name"). Same pattern already used in GraphNode.to_render_plan.

Fixes complex-12 "Unknown expression identifier t.length" on chdb.
Adds LDBC complex-12 regression test.

Closes #252

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…#253)

UNWIND produces a scalar alias (person), not a table. Pattern
comprehension correlation was generating person.id treating it as
a table with an id column.

Two fixes in plan_builder_utils.rs:
1. Phase D: skip adding ID column for ARRAY JOIN scalar aliases
   (detected via bare alias match in CTE SELECT + alias_to_id self-map)
2. find_pc_cte_join_column: use bare scalar reference instead of
   non-existent p6_person_id column

Regression test updated to use official bi-8 query with assertions
for ARRAY JOIN presence and absence of person.id.

Closes #253

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Owner Author

@genezhang genezhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Fix VLP CTE Alias Leak + ARRAY JOIN Scalar Detection (#252, #253)

Three commits fixing two related query generation bugs. Both are in the render plan / select builder — high-risk area per CLAUDE.md.

Commit 1: Benchmark comment stripping (trivial)

Regex fix for // comments at EOF without trailing newline. (\n|$) is correct. No concerns.

Commit 2: Qualify bare Column in GraphNode.extract_select_items (#252)

The fix is correct and well-scoped. ViewScan returns unqualified Column("name") → SQL generator defaults to t (VLP internal alias) → broken SQL. Fix qualifies bare Column with graph_node.alias by converting to PropertyAccessExp.

One observation on the regression test (ldbc_complex_12_official, lines 313-343):

The test checks for t. leak by scanning lines with starts_with("t."). This is somewhat fragile — it would miss cases like SELECT t.foo (where t. isn't at the start of the line) or false-positive on indented lines. A simpler assertion might be:

// After extracting inner_select:
assert!(!inner_select.contains(" t."), "VLP alias leak");

But this is a minor nit — the test catches the specific regression.

Commit 3: ARRAY JOIN scalar detection (#253)

This is the more complex fix. Two changes in plan_builder_utils.rs:

Phase D fix (lines 9983-10012): Detects UNWIND/ARRAY JOIN scalars by checking:

  1. CTE SELECT has a column whose alias exactly matches the exported alias (bare match)
  2. alias_to_id maps the alias to itself (self-referential = scalar IS the ID)

Both conditions together is a reasonable heuristic. The comment explains the logic well.

find_pc_cte_join_column fix (lines 17449-17480): Checks if ViewScan.property_mapping has a bare column matching var_name, and if so emits from_alias."var_name" instead of from_alias.p6_var_name_id.

Concern: The ViewScan property_mapping check at line 17456 iterates all values looking for a Column variant matching var_name. This works for the UNWIND case but could theoretically false-positive if a ViewScan happens to have a property column named the same as a non-scalar variable. The dual-condition guard in Phase D (bare alias + self-id) is more robust — should a similar guard be applied here too?

Test quality: The updated ldbc_bi_8 test (line 447) is good — switches from the adapted workaround query to the official one and asserts both the presence of ARRAY JOIN and the absence of the invalid person.id AS "p6_person_id". Concrete negative assertion is the right approach.

Verdict

Both fixes are correct and well-targeted. The Phase D dual-condition guard is solid. Minor concern about the find_pc_cte_join_column heuristic being single-condition vs dual — consider adding the alias_to_id self-map check there too for consistency. Not a blocker.

genezhang and others added 2 commits March 28, 2026 08:24
…oin_column

Per review: added CTE SELECT bare-alias check (condition 1) alongside
the existing ViewScan bare-column check (condition 2) for consistency
with Phase D's dual-condition guard. Prevents false positives if a
ViewScan property column happens to share a name with a non-scalar var.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@genezhang genezhang merged commit 20e5247 into main Mar 28, 2026
4 checks passed
@genezhang genezhang deleted the fix/complex-12-cte-alias-leak branch March 28, 2026 15:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: complex-12 VLP CTE alias leaks into outer SELECT scope

1 participant