refactor(ops): split 5,300-line ops.py into a package with focused submodules by hussainsultan · Pull Request #256 · boringdata/boring-semantic-layer

hussainsultan · 2026-05-04T12:52:05Z

Stacked PR — 3 of 3 (top of stack)

This PR is stacked on top of #255, which is stacked on #254. It targets refactor/except-audit, not main. Read in order:

1 of 3: refactor: route all xorq imports through single _xorq shim #254 (refactor/xorq-shim → main) — must land first
2 of 3: refactor: audit broad except sites in ops.py #255 (refactor/except-audit → refactor/xorq-shim)
3 of 3: this PR (refactor/split-ops → refactor/except-audit)

Each PR in the stack is independently green. Once #254 and #255 merge, GitHub will retarget this PR to main automatically.

Summary

Convert src/boring_semantic_layer/ops.py (5,330 lines, 19 classes, 100+ functions) to a package, and pull self-contained sections into focused submodules. _core.py reduced from 5,330 → 2,908 lines (45%), with 13 new submodules carrying the rest.

All extractions are pure moves — no logic changes. ops/__init__.py re-exports every name external callers reference, so existing from boring_semantic_layer.ops import X imports keep working unchanged.

New module layout

Module	Lines	Contents
`_core.py`	2,908	Aggregate machinery, `SemanticJoinOp`, top-level utilities (left here on purpose — see below)
`_root_models.py`	483	Root-model traversal, join column-rename machinery
`_basic_ops.py`	436	`SemanticTableOp`, `SemanticFilterOp`, `SemanticProjectOp`, `SemanticGroupByOp`
`_column_extraction.py`	369	`ColumnTracker`, `TableColumnRequirements`, projection-pushdown extraction
`_index_op.py`	321	`SemanticIndexOp` + 6 fragment helpers
`_measure_helpers.py`	314	Measure classification, column-error formatting
`_values.py`	166	Public `Dimension`, `Measure` value objects
`_mutate_unnest.py`	161	`SemanticMutateOp`, `SemanticUnnestOp`
`_format.py`	160	Pipeline `__repr__` (`_semantic_repr` etc.)
`_xorq_compat.py`	127	ibis↔xorq bridge shims
`_order_limit.py`	118	`SemanticOrderByOp`, `SemanticLimitOp`
`_normalize.py`	105	`_normalize_to_name`, `_normalize_join_predicate`
`_callable.py`	60	`_CallableWrapper` cluster
`__init__.py`	67	Re-exports the public + private surface

Why

ops.py had become a god-file mixing:

Value objects (Dimension, Measure)
11 Op classes
Join resolution & column-rename machinery
Aggregation planning
Column-requirement extraction for projection pushdown
xorq compatibility shims
repr / format helpers

A cross-cutting change touched 5,000+ lines of context. Recent fixes documented in MEMORY.md (multi-way join column ambiguity, pre-aggregation join direction, resolver hash fix) span this file. Splitting along natural seams makes future changes easier to scope and review.

Why `_core.py` is still ~2,900 lines

SemanticAggregateOp (~1,690 lines incl. 12 planning helpers) and SemanticJoinOp (~900 lines incl. join-tree machinery) are deeply coupled to each other and to the central helpers (_to_untagged, _resolve_expr, _get_merged_fields, _classify_dependencies). Extracting them in this PR risks subtle behavior changes in the intricate fixes documented in MEMORY.md.

A future focused PR should split these — but it's the kind of pass that benefits from a fresh review window, not piling onto today's stack. Suggested follow-up:

Extract Aggregate machinery + SemanticAggregateOp to _aggregate.py
Extract SemanticJoinOp + helpers (_RenamedResolver, _BSL_JOIN_KEY_TMP_PREFIX) to _join_op.py

Test plan

930 unit tests pass on every commit in this branch (no regressions vs. parent branch)
Each commit is a focused move (column extraction, value objects, callable wrapper, xorq compat, normalize, root models, format/repr, Index op, OrderBy/Limit, Mutate/Unnest, measure helpers, basic ops)
Public API unchanged — from boring_semantic_layer.ops import ... works for all 11 Op classes plus Dimension, Measure, and the 15+ private helpers external modules depend on
Internal lazy imports inside functions break the circular dependency between _core.py and the extracted submodules

🤖 Generated with Claude Code

Convert src/boring_semantic_layer/ops.py to a package (ops/) and extract two cleanly-bounded chunks into submodules: - ops/_values.py (166 lines): Dimension, Measure value objects plus the prefix proxies that support model-prefix navigation in joined dimension lambdas. Imports nothing from the relational core; the only callback to _core (_format_column_error in the AttributeError path) is a lazy in-method import to avoid circular module loading. - ops/_column_extraction.py (369 lines): ColumnTracker, ColumnExtractionResult, JoinColumnExtractionResult, TableColumnRequirements and the projection-pushdown extraction helpers. Self-contained except for one lazy _unwrap import inside _extract_requirements_from_measures. ops/__init__.py re-exports every name external callers reference, so all existing `from boring_semantic_layer.ops import X` imports keep working unchanged. _core.py imports the moved names back so its own internal call sites are unchanged too. The remaining 4,860 lines in _core.py are the relational op classes (Semantic*Op) and their tightly-coupled helpers (root-model traversal, join-key/rename machinery, aggregation planning). Those resist a clean split: helpers reference SemanticTableOp which is defined mid-file, and the join logic threads through several intricate fixes documented in MEMORY.md (multi-way join column ambiguity, _RenamedResolver, pre-aggregation join direction). Splitting them further would risk subtle behavior changes for limited structural gain. A future pass could pull these apart but it warrants its own focused branch. No behavior change. 930 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… models Pull four more self-contained chunks out of _core.py: - _callable.py (60 lines): _CallableWrapper / _ensure_wrapped / _infer_unnest. Tiny module, used by serialization & utils externally. - _xorq_compat.py (127 lines): _ensure_xorq_table, _patch_xorq_sortkey_compat, _rebind_to_backend, _rebind_to_canonical_backend. Pure ibis↔xorq bridge logic. - _normalize.py (105 lines): _normalize_to_name and _normalize_join_predicate, the two arg-coercion helpers used by the user-facing API. - _root_models.py (456 lines): root-model traversal (_find_all_root_models et al) and the join-aware field-prefixing / _right column-rename machinery. SemanticTableOp / SemanticJoinOp are imported lazily inside each function to break the import cycle with _core. _core.py is now 4,186 lines (down from 4,621 last commit, 5,330 at start of branch). All extractions are pure moves; _core.py re-imports the names back from each submodule so internal call sites are unchanged. 930 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two more chunks out of _core.py: - _format.py (159 lines): _collect_chain, _format_op_summary, _format_root, _semantic_repr — the helpers that build the multi-line pipeline repr used by every Semantic*Op.__repr__. All Op classes are imported lazily inside each helper to dodge the circular dependency. - _index_op.py (343 lines): SemanticIndexOp plus its 6 fragment builders (_get_field_type_str, _get_weight_expr, _build_string_index_fragment, _build_numeric_index_fragment, _resolve_selector, _get_fields_to_index). Self-contained except for lazy imports of _get_merged_fields/_to_untagged from _core. _core.py is now 3,775 lines. All extractions are pure moves with re-imports back so internal call sites are unchanged. 930 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two more pairs of small "transparent" Op classes pulled out: - _order_limit.py (122 lines): SemanticOrderByOp + SemanticLimitOp. Both delegate every metadata accessor to self.source and only override to_untagged() and __repr__. - _mutate_unnest.py (167 lines): SemanticMutateOp + SemanticUnnestOp. Same pass-through pattern; struct unpacking lives in Unnest's to_untagged. Helpers used inside to_untagged (_to_untagged, _resolve_expr, _unwrap) are imported lazily to keep the modules independent of _core import order. _core.py now 3,551 lines. 930 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

New _measure_helpers.py (~280 lines): - _extract_measure_metadata: read user-supplied measure forms (dict/Measure/raw) into a uniform tuple - _is_calculated_measure / _matches_aggregation_pattern / _find_matching_measure: detect calc measures vs base measures and match aggregation patterns against known measure definitions - _make_base_measure: wrap a callable/Deferred/AggregationExpr into a Measure - _classify_measure: top-level dispatch — used externally by expr.py - _build_json_definition: dim/measure JSON-export helper - _format_column_error: friendly diagnostic for missing-column errors _values.py's lazy import of _format_column_error switches to the new home so the value-objects → core back-ref is gone. _core.py is now 3,283 lines (down from 3,551). 930 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

New _basic_ops.py (~430 lines) holds the four foundation Op classes: - SemanticTableOp: leaf relation that carries dimension/measure metadata. Uses lazy imports for _make_schema, _mutate_dimensions_with_dependencies, and the module logger from _core. - SemanticFilterOp: predicate filtering with derived-dimension enrichment. Lazy-imports _Resolver/_get_merged_fields/_resolve_expr/ _to_untagged/_unwrap and SemanticAggregateOp for its post-agg check. - SemanticProjectOp + helpers _classify_fields, _process_nested_access_marker, _evaluate_measures_with_unnesting, _build_select_or_aggregate. - SemanticGroupByOp: thin pass-through. All call sites that used to live in _core re-import from _basic_ops so external `from boring_semantic_layer.ops import X` keeps working. _core.py is now 2,908 lines (down from 5,330 at start of branch). 930 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

hussainsultan and others added 6 commits May 4, 2026 07:31

hachej merged commit ba9606e into refactor/except-audit May 5, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(ops): split 5,300-line ops.py into a package with focused submodules#256

refactor(ops): split 5,300-line ops.py into a package with focused submodules#256
hachej merged 6 commits intorefactor/except-auditfrom
refactor/split-ops

hussainsultan commented May 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

hussainsultan commented May 4, 2026

Stacked PR — 3 of 3 (top of stack)

Summary

New module layout

Why

Why _core.py is still ~2,900 lines

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Why `_core.py` is still ~2,900 lines