DEV-1410: expand bare same-model derived refs in Column.sql#123
Conversation
Bare identifiers in Column.sql that name a sibling derived column now inline parenthesised, identical to the qualified <host>.<col> form. The two branches in _process_column_node are unified — bare refs are treated as host-alias-qualified, routing through the existing _walk_path_to_target + target_col lookup + recursion path. Inlining is scope-guarded via sqlglot.optimizer.scope.traverse_scope: derived-column rewrites only fire on root-scope exp.Column nodes. Refs inside sub-queries, set-op branches, CTEs, VALUES, etc. are left alone because they belong to inner rowsets. Window OVER() partition/order columns remain root-scope. Cycle detection now runs at storage.save_model time as well as compile time. StorageBackend.save_model is a template method that calls a new validate_no_column_cycles helper before delegating to the backend's new _save_model_impl abstract method. Cycles raise ColumnCycleError, which multi-inherits SlayerError and ValueError so existing call sites that catch ValueError keep working. The migration write-back at _migrate_and_refine_on_load passes _validate=False so legacy cyclic models remain loadable. A small fix in _walk_path_to_target: literal match against the host's FROM alias / model name comes before any __-splitting, so a bare ref on a multi-hop FROM alias like "B__C" routes back to the host instead of being mis-parsed as a join walk. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughImplements DEV-1410: add save-time derived-column cycle detection (ColumnCycleError), make compile-time derived inlining root-scope aware (skip nested scopes), convert Storage.save_model to a template with _save_model_impl hook, and add docs + tests covering these behaviors. ChangesDEV-1410: Derived-column cycle detection and scope-aware expansion
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (3)
slayer/engine/column_dependency.py (3)
145-173: 💤 Low valueUse
collections.dequefor O(1) popleft instead oflist.pop(0).
queue.pop(0)is O(n) for lists. For BFS over join graphs, this is unlikely to matter in practice (models rarely have many joins), butdeque.popleft()is the idiomatic choice.♻️ Suggested improvement
+from collections import deque + async def _prefetch_reachable_models( *, model: SlayerModel, storage: "StorageBackend", ) -> Dict[str, SlayerModel]: out: Dict[str, SlayerModel] = {model.name: model} - queue: List[SlayerModel] = [model] + queue: deque[SlayerModel] = deque([model]) while queue: - current = queue.pop(0) + current = queue.popleft() for join in current.joins: ... - queue.append(target) + queue.append(target) # append is O(1) for deque too🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@slayer/engine/column_dependency.py` around lines 145 - 173, The BFS in _prefetch_reachable_models uses a Python list and queue.pop(0) which is O(n); replace the list-based queue with collections.deque for O(1) popleft: import deque (or collections.deque) and change the queue type to a deque (optionally annotate with Deque[SlayerModel]), initialize with deque([model]), replace queue.pop(0) with queue.popleft(), and keep queue.append(target) unchanged so behavior of _prefetch_reachable_models and its use of storage.get_model and out mapping remains the same.
105-142: ⚖️ Poor tradeoffConsider extracting the inner
gofunction or simplifying control flow.SonarCloud flags cognitive complexity 18 (allowed 15). The nested
gofunction with multiple exit paths contributes to the score. The DFS logic is correct and well-structured for its purpose. If you want to address the linter, consider flattening the recursion into an iterative stack-based approach, though the current recursive form is idiomatic for cycle detection.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@slayer/engine/column_dependency.py` around lines 105 - 142, The cognitive complexity comes from the nested inner function go inside _detect_cycle_dfs; extract that DFS into a separate helper (e.g., _dfs_from or _detect_cycle_dfs_helper) that accepts start (Tuple[str,str]), reachable (Dict[str,SlayerModel]) and the traversal state (on_stack, on_stack_set, visited) as parameters, move the recursion logic out of _detect_cycle_dfs so _detect_cycle_dfs only initializes on_stack/on_stack_set/visited and calls the new helper, and ensure you preserve the same return semantics (Optional[List[Tuple[str,str]]]) and calls to _column_dependencies so behavior of _detect_cycle_dfs, host.get_column, and the cycle return value remain unchanged.
61-102: 💤 Low valueConsider extracting early-exit checks to reduce cognitive complexity.
SonarCloud flags cognitive complexity 16 (allowed 15). The function is cohesive and the complexity is inherent to AST traversal with multiple filter conditions. The current structure is readable, but if you want to address the linter, consider extracting the node-filtering logic into a small helper.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@slayer/engine/column_dependency.py` around lines 61 - 102, The function _column_dependencies has several nested checks that raise cognitive complexity; extract the per-node filter logic into a small helper (e.g., _is_relevant_root_column(node, root_ids, host, reachable)) and use early returns for simple cases; specifically, keep the existing early exits for column.sql being None or trivial base and parse exception, then replace the loop body that checks root membership, multi-part qualifiers (db/catalog), resolves table_alias via table_id, calls _resolve_target_for_ref, gets target.get_column and _is_trivial_base with a single helper that returns True/False, and only when False append (target.name, target_col.name) — this moves the conditions out of _column_dependencies while preserving calls to _is_trivial_base, _root_scope_column_ids and _resolve_target_for_ref.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@slayer/engine/column_dependency.py`:
- Around line 145-173: The BFS in _prefetch_reachable_models uses a Python list
and queue.pop(0) which is O(n); replace the list-based queue with
collections.deque for O(1) popleft: import deque (or collections.deque) and
change the queue type to a deque (optionally annotate with Deque[SlayerModel]),
initialize with deque([model]), replace queue.pop(0) with queue.popleft(), and
keep queue.append(target) unchanged so behavior of _prefetch_reachable_models
and its use of storage.get_model and out mapping remains the same.
- Around line 105-142: The cognitive complexity comes from the nested inner
function go inside _detect_cycle_dfs; extract that DFS into a separate helper
(e.g., _dfs_from or _detect_cycle_dfs_helper) that accepts start
(Tuple[str,str]), reachable (Dict[str,SlayerModel]) and the traversal state
(on_stack, on_stack_set, visited) as parameters, move the recursion logic out of
_detect_cycle_dfs so _detect_cycle_dfs only initializes
on_stack/on_stack_set/visited and calls the new helper, and ensure you preserve
the same return semantics (Optional[List[Tuple[str,str]]]) and calls to
_column_dependencies so behavior of _detect_cycle_dfs, host.get_column, and the
cycle return value remain unchanged.
- Around line 61-102: The function _column_dependencies has several nested
checks that raise cognitive complexity; extract the per-node filter logic into a
small helper (e.g., _is_relevant_root_column(node, root_ids, host, reachable))
and use early returns for simple cases; specifically, keep the existing early
exits for column.sql being None or trivial base and parse exception, then
replace the loop body that checks root membership, multi-part qualifiers
(db/catalog), resolves table_alias via table_id, calls _resolve_target_for_ref,
gets target.get_column and _is_trivial_base with a single helper that returns
True/False, and only when False append (target.name, target_col.name) — this
moves the conditions out of _column_dependencies while preserving calls to
_is_trivial_base, _root_scope_column_ids and _resolve_target_for_ref.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e2796a55-85d1-4fa7-9772-0409cff65cd4
📒 Files selected for processing (12)
.claude/skills/slayer-models.mdCLAUDE.mddocs/concepts/models.mdslayer/core/errors.pyslayer/engine/column_dependency.pyslayer/engine/column_expansion.pyslayer/storage/base.pyslayer/storage/join_sync.pyslayer/storage/sqlite_storage.pyslayer/storage/yaml_storage.pytests/test_column_dependency.pytests/test_cross_model_derived_columns.py
…guard) Three fixes from the codex review of PR #123: 1. Save-time validator's alias resolution now mirrors the compile-time _walk_path_to_target: a bare ``B.col`` ref is only a dependency if the host has a direct join to B, and canonical multi-hop refs (``B__C.col``) are walked through the join chain. Previously the validator accepted any reachable-via-BFS model and missed canonical path aliases entirely — false-positive on indirect references, false-negative on multi-hop path syntax. 2. _root_scope_column_ids fail-closed: if the wrapper/parsed pairing ever drifts, suppress derived-column inlining for the fragment instead of treating everything as root-scope (which would silently splice nested-scope refs). 3. JoinSyncStorage._save_model_impl docstring corrected: the inner save intentionally skips re-validation (cycle validation runs ONCE at the outer decorator layer), the previous wording was the opposite. Two new tests in tests/test_column_dependency.py pin the alias-walk fix: - indirect-join target ref is not a dep (false-positive guard) - canonical multi-hop path alias resolves and detects cycles (false-negative guard) Full non-integration suite: 2250 passed. ruff clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (3)
slayer/engine/column_dependency.py (3)
83-124: ⚖️ Poor tradeoffAcknowledge SonarCloud cognitive complexity flag (16 vs 15 allowed).
The complexity here is largely inherent to the algorithm (parsing, filtering, resolving). If you want to address the linter, consider extracting the inner loop body (lines 107-123) into a small helper like
_resolve_single_column(node, host, reachable) -> Optional[Tuple[str, str]]. Otherwise, the code is readable and the overage is minor.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@slayer/engine/column_dependency.py` around lines 83 - 124, The SonarCloud cognitive complexity can be reduced by extracting the inner loop logic of _column_dependencies into a helper; create a new function _resolve_single_column(node: exp.Column, host: SlayerModel, reachable: Dict[str, SlayerModel]) -> Optional[Tuple[str, str]] that performs the checks currently inside the for loop (skip non-root ids, multi-part qualifiers, resolve table_alias via _resolve_target_for_ref, fetch target_col and drop trivial/base or missing columns) and returns (target.name, target_col.name) or None; then simplify _column_dependencies to iterate parsed.find_all(exp.Column), call _resolve_single_column for each node and append non-None results to deps. Ensure the helper uses existing symbols _resolve_target_for_ref, _is_trivial_base, and _root_scope_column_ids remains unchanged.
176-180: ⚡ Quick winUse
collections.dequefor O(1) BFS operations.
list.pop(0)is O(n) due to element shifting. For BFS,collections.dequewithpopleft()provides O(1) removal.♻️ Proposed fix
Add to imports at line 20:
from collections import dequeThen update the function:
out: Dict[str, SlayerModel] = {model.name: model} - queue: List[SlayerModel] = [model] + queue: deque[SlayerModel] = deque([model]) while queue: - current = queue.pop(0) + current = queue.popleft()🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@slayer/engine/column_dependency.py` around lines 176 - 180, Replace the BFS list usage with a deque to avoid O(n) pop(0): add "from collections import deque" to imports, initialize queue as deque([model]) instead of list, and replace queue.pop(0) with queue.popleft() in the traversal loop (references: variables queue, out and the SlayerModel traversal in column_dependency.py).
127-164: ⚖️ Poor tradeoffAcknowledge SonarCloud cognitive complexity flag (18 vs 15 allowed).
The nested
gofunction is idiomatic for DFS closures over shared state (on_stack,on_stack_set,visited). Converting to an iterative approach or extractinggoas a standalone function would require threading more state through parameters. Given the algorithm's inherent complexity and the code's readability, this is acceptable to leave as-is or defer.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@slayer/engine/column_dependency.py` around lines 127 - 164, SonarCloud flags _detect_cycle_dfs for cognitive complexity but the nested DFS closure (inner function go with shared state on_stack/on_stack_set/visited) is intentional and readable; add a local Sonar suppression by annotating the function definition of _detect_cycle_dfs with a NOSONAR comment (e.g., append "# NOSONAR" or equivalent Sonar ignore marker) to silence the cognitive-complexity alert while keeping the nested go implementation as-is.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@slayer/engine/column_dependency.py`:
- Around line 83-124: The SonarCloud cognitive complexity can be reduced by
extracting the inner loop logic of _column_dependencies into a helper; create a
new function _resolve_single_column(node: exp.Column, host: SlayerModel,
reachable: Dict[str, SlayerModel]) -> Optional[Tuple[str, str]] that performs
the checks currently inside the for loop (skip non-root ids, multi-part
qualifiers, resolve table_alias via _resolve_target_for_ref, fetch target_col
and drop trivial/base or missing columns) and returns (target.name,
target_col.name) or None; then simplify _column_dependencies to iterate
parsed.find_all(exp.Column), call _resolve_single_column for each node and
append non-None results to deps. Ensure the helper uses existing symbols
_resolve_target_for_ref, _is_trivial_base, and _root_scope_column_ids remains
unchanged.
- Around line 176-180: Replace the BFS list usage with a deque to avoid O(n)
pop(0): add "from collections import deque" to imports, initialize queue as
deque([model]) instead of list, and replace queue.pop(0) with queue.popleft() in
the traversal loop (references: variables queue, out and the SlayerModel
traversal in column_dependency.py).
- Around line 127-164: SonarCloud flags _detect_cycle_dfs for cognitive
complexity but the nested DFS closure (inner function go with shared state
on_stack/on_stack_set/visited) is intentional and readable; add a local Sonar
suppression by annotating the function definition of _detect_cycle_dfs with a
NOSONAR comment (e.g., append "# NOSONAR" or equivalent Sonar ignore marker) to
silence the cognitive-complexity alert while keeping the nested go
implementation as-is.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 49e28516-e724-487b-9088-a95b951ac2a3
📒 Files selected for processing (4)
slayer/engine/column_dependency.pyslayer/engine/column_expansion.pyslayer/storage/join_sync.pytests/test_column_dependency.py
🚧 Files skipped from review as they are similar to previous changes (2)
- slayer/engine/column_expansion.py
- tests/test_column_dependency.py
Two refactors in slayer/engine/column_dependency.py to drop the file under SonarCloud's 15-cognitive-complexity threshold (was 16 and 18): 1. Extract _resolve_single_column from _column_dependencies' for-loop body. The outer function becomes a flat append of non-None results. 2. Lift _detect_cycle_dfs's nested closure into a top-level _dfs_visit helper that takes the visit-state lists as parameters. The closure was inflating complexity by double-counting via the outer def scope. Also switches _prefetch_reachable_models' BFS queue from list.pop(0) (O(n)) to collections.deque.popleft (O(1)) — idiomatic Python BFS; addresses CodeRabbit nitpick. No behaviour change; full non-integration suite still 2250 passed, ruff clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SonarCloud quality gate was failing on new_duplicated_lines_density (5.1% vs 3% threshold) — driven entirely by repeated cross-model fixture boilerplate in the new DEV-1410 tests, not by any individual Sonar issue. Two new factory helpers at the top of tests/test_column_dependency.py: - _model_a_to_b(*, foo_sql) — A↔B scaffold, used by two cycle tests. - _abc_chain(*, a_foo_sql, c_x_sql, c_joins_back_to_a=False) — A→B→C scaffold, used by the strict-alias-walk tests; the back-join is an optional parameter because only the canonical multi-hop test needs it. tests/test_cross_model_derived_columns.py: route the disambiguation test through the existing _save_a_b helper by injecting the extra foo_raw column via a_columns=, instead of constructing the A/B pair inline. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|



Summary
Column.sqlnaming a sibling derived column now inline parenthesised, identical to the qualified<host>.<col>form. Closes the DEV-1410 repro (iqs.sql = "CAST(wateraccess_score AS REAL) / 3.0"no longer emitsinfrastructure.wateraccess_score).sqlglot.optimizer.scope.traverse_scope: rewrites only fire on root-scopeexp.Columnnodes. Sub-queries, set-op branches, CTEs, andVALUESrowsets are left alone — they belong to inner scopes. WindowOVER()partition/order columns remain root-scope.StorageBackend.save_modelis a template method calling a newvalidate_no_column_cycleshelper before delegating to_save_model_impl(new abstract method, implemented by every concrete backend). Cycles raiseColumnCycleError(SlayerError, ValueError), so existingexcept ValueErrorcall sites keep working. The migration write-back passes_validate=Falseto keep legacy cyclic data loadable.Test plan
tests/test_cross_model_derived_columns.py(same-model, three-deep, mixed base+derived, CASE/COALESCE/NULLIF/CAST, sub-query, UNION, VALUES, window OVER, qualified-collision, string-literal lookalike, double-underscore JSON-leaf name, compile-time cycle, exact DEV-1410 repro).tests/test_column_dependency.py(same-model cycle, three-deep, self-referential, cross-model within datasource, save-B-second completes cycle, acyclic DAG, unresolved joined ref, subquery-scope refs not counted, migration write-back tolerated,_validate=Falseescape hatch, YAML+SQLite template-method parameterisation).poetry run pytest -m "not integration"— 2248 passed, 0 failed.poetry run ruff check slayer/ tests/clean.🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
New Features
Documentation
Tests