Skip to content

refactor(ops): split 5,300-line ops.py into a package with focused submodules#256

Merged
hachej merged 6 commits intorefactor/except-auditfrom
refactor/split-ops
May 5, 2026
Merged

refactor(ops): split 5,300-line ops.py into a package with focused submodules#256
hachej merged 6 commits intorefactor/except-auditfrom
refactor/split-ops

Conversation

@hussainsultan
Copy link
Copy Markdown
Collaborator

Stacked PR — 3 of 3 (top of stack)

This PR is stacked on top of #255, which is stacked on #254. It targets refactor/except-audit, not main. Read in order:

Each PR in the stack is independently green. Once #254 and #255 merge, GitHub will retarget this PR to main automatically.

Summary

Convert src/boring_semantic_layer/ops.py (5,330 lines, 19 classes, 100+ functions) to a package, and pull self-contained sections into focused submodules. _core.py reduced from 5,330 → 2,908 lines (45%), with 13 new submodules carrying the rest.

All extractions are pure moves — no logic changes. ops/__init__.py re-exports every name external callers reference, so existing from boring_semantic_layer.ops import X imports keep working unchanged.

New module layout

Module Lines Contents
_core.py 2,908 Aggregate machinery, SemanticJoinOp, top-level utilities (left here on purpose — see below)
_root_models.py 483 Root-model traversal, join column-rename machinery
_basic_ops.py 436 SemanticTableOp, SemanticFilterOp, SemanticProjectOp, SemanticGroupByOp
_column_extraction.py 369 ColumnTracker, TableColumnRequirements, projection-pushdown extraction
_index_op.py 321 SemanticIndexOp + 6 fragment helpers
_measure_helpers.py 314 Measure classification, column-error formatting
_values.py 166 Public Dimension, Measure value objects
_mutate_unnest.py 161 SemanticMutateOp, SemanticUnnestOp
_format.py 160 Pipeline __repr__ (_semantic_repr etc.)
_xorq_compat.py 127 ibis↔xorq bridge shims
_order_limit.py 118 SemanticOrderByOp, SemanticLimitOp
_normalize.py 105 _normalize_to_name, _normalize_join_predicate
_callable.py 60 _CallableWrapper cluster
__init__.py 67 Re-exports the public + private surface

Why

ops.py had become a god-file mixing:

  • Value objects (Dimension, Measure)
  • 11 Op classes
  • Join resolution & column-rename machinery
  • Aggregation planning
  • Column-requirement extraction for projection pushdown
  • xorq compatibility shims
  • repr / format helpers

A cross-cutting change touched 5,000+ lines of context. Recent fixes documented in MEMORY.md (multi-way join column ambiguity, pre-aggregation join direction, resolver hash fix) span this file. Splitting along natural seams makes future changes easier to scope and review.

Why _core.py is still ~2,900 lines

SemanticAggregateOp (~1,690 lines incl. 12 planning helpers) and SemanticJoinOp (~900 lines incl. join-tree machinery) are deeply coupled to each other and to the central helpers (_to_untagged, _resolve_expr, _get_merged_fields, _classify_dependencies). Extracting them in this PR risks subtle behavior changes in the intricate fixes documented in MEMORY.md.

A future focused PR should split these — but it's the kind of pass that benefits from a fresh review window, not piling onto today's stack. Suggested follow-up:

  • Extract Aggregate machinery + SemanticAggregateOp to _aggregate.py
  • Extract SemanticJoinOp + helpers (_RenamedResolver, _BSL_JOIN_KEY_TMP_PREFIX) to _join_op.py

Test plan

  • 930 unit tests pass on every commit in this branch (no regressions vs. parent branch)
  • Each commit is a focused move (column extraction, value objects, callable wrapper, xorq compat, normalize, root models, format/repr, Index op, OrderBy/Limit, Mutate/Unnest, measure helpers, basic ops)
  • Public API unchanged — from boring_semantic_layer.ops import ... works for all 11 Op classes plus Dimension, Measure, and the 15+ private helpers external modules depend on
  • Internal lazy imports inside functions break the circular dependency between _core.py and the extracted submodules

🤖 Generated with Claude Code

hussainsultan and others added 6 commits May 4, 2026 07:31
Convert src/boring_semantic_layer/ops.py to a package (ops/) and extract
two cleanly-bounded chunks into submodules:

- ops/_values.py (166 lines): Dimension, Measure value objects plus
  the prefix proxies that support model-prefix navigation in joined
  dimension lambdas. Imports nothing from the relational core; the only
  callback to _core (_format_column_error in the AttributeError path)
  is a lazy in-method import to avoid circular module loading.

- ops/_column_extraction.py (369 lines): ColumnTracker,
  ColumnExtractionResult, JoinColumnExtractionResult,
  TableColumnRequirements and the projection-pushdown extraction
  helpers. Self-contained except for one lazy _unwrap import inside
  _extract_requirements_from_measures.

ops/__init__.py re-exports every name external callers reference, so
all existing `from boring_semantic_layer.ops import X` imports keep
working unchanged. _core.py imports the moved names back so its own
internal call sites are unchanged too.

The remaining 4,860 lines in _core.py are the relational op classes
(Semantic*Op) and their tightly-coupled helpers (root-model traversal,
join-key/rename machinery, aggregation planning). Those resist a clean
split: helpers reference SemanticTableOp which is defined mid-file,
and the join logic threads through several intricate fixes documented
in MEMORY.md (multi-way join column ambiguity, _RenamedResolver,
pre-aggregation join direction). Splitting them further would risk
subtle behavior changes for limited structural gain. A future pass
could pull these apart but it warrants its own focused branch.

No behavior change. 930 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… models

Pull four more self-contained chunks out of _core.py:

- _callable.py (60 lines): _CallableWrapper / _ensure_wrapped /
  _infer_unnest. Tiny module, used by serialization & utils externally.
- _xorq_compat.py (127 lines): _ensure_xorq_table,
  _patch_xorq_sortkey_compat, _rebind_to_backend,
  _rebind_to_canonical_backend. Pure ibis↔xorq bridge logic.
- _normalize.py (105 lines): _normalize_to_name and
  _normalize_join_predicate, the two arg-coercion helpers used by
  the user-facing API.
- _root_models.py (456 lines): root-model traversal
  (_find_all_root_models et al) and the join-aware field-prefixing
  / _right column-rename machinery. SemanticTableOp / SemanticJoinOp
  are imported lazily inside each function to break the import
  cycle with _core.

_core.py is now 4,186 lines (down from 4,621 last commit, 5,330 at
start of branch). All extractions are pure moves; _core.py re-imports
the names back from each submodule so internal call sites are unchanged.

930 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two more chunks out of _core.py:

- _format.py (159 lines): _collect_chain, _format_op_summary,
  _format_root, _semantic_repr — the helpers that build the multi-line
  pipeline repr used by every Semantic*Op.__repr__. All Op classes are
  imported lazily inside each helper to dodge the circular dependency.

- _index_op.py (343 lines): SemanticIndexOp plus its 6 fragment
  builders (_get_field_type_str, _get_weight_expr,
  _build_string_index_fragment, _build_numeric_index_fragment,
  _resolve_selector, _get_fields_to_index). Self-contained except for
  lazy imports of _get_merged_fields/_to_untagged from _core.

_core.py is now 3,775 lines. All extractions are pure moves with
re-imports back so internal call sites are unchanged. 930 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two more pairs of small "transparent" Op classes pulled out:

- _order_limit.py (122 lines): SemanticOrderByOp + SemanticLimitOp.
  Both delegate every metadata accessor to self.source and only
  override to_untagged() and __repr__.
- _mutate_unnest.py (167 lines): SemanticMutateOp + SemanticUnnestOp.
  Same pass-through pattern; struct unpacking lives in Unnest's
  to_untagged.

Helpers used inside to_untagged (_to_untagged, _resolve_expr, _unwrap)
are imported lazily to keep the modules independent of _core import order.

_core.py now 3,551 lines. 930 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New _measure_helpers.py (~280 lines):

- _extract_measure_metadata: read user-supplied measure forms
  (dict/Measure/raw) into a uniform tuple
- _is_calculated_measure / _matches_aggregation_pattern /
  _find_matching_measure: detect calc measures vs base measures and
  match aggregation patterns against known measure definitions
- _make_base_measure: wrap a callable/Deferred/AggregationExpr into a
  Measure
- _classify_measure: top-level dispatch — used externally by expr.py
- _build_json_definition: dim/measure JSON-export helper
- _format_column_error: friendly diagnostic for missing-column errors

_values.py's lazy import of _format_column_error switches to the new
home so the value-objects → core back-ref is gone. _core.py is now
3,283 lines (down from 3,551). 930 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New _basic_ops.py (~430 lines) holds the four foundation Op classes:

- SemanticTableOp: leaf relation that carries dimension/measure
  metadata. Uses lazy imports for _make_schema,
  _mutate_dimensions_with_dependencies, and the module logger from
  _core.
- SemanticFilterOp: predicate filtering with derived-dimension
  enrichment. Lazy-imports _Resolver/_get_merged_fields/_resolve_expr/
  _to_untagged/_unwrap and SemanticAggregateOp for its post-agg check.
- SemanticProjectOp + helpers _classify_fields,
  _process_nested_access_marker, _evaluate_measures_with_unnesting,
  _build_select_or_aggregate.
- SemanticGroupByOp: thin pass-through.

All call sites that used to live in _core re-import from _basic_ops
so external `from boring_semantic_layer.ops import X` keeps working.

_core.py is now 2,908 lines (down from 5,330 at start of branch). 930
tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@hachej hachej merged commit ba9606e into refactor/except-audit May 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants