refactor(ops): split 5,300-line ops.py into a package with focused submodules#256
Merged
hachej merged 6 commits intorefactor/except-auditfrom May 5, 2026
Merged
refactor(ops): split 5,300-line ops.py into a package with focused submodules#256hachej merged 6 commits intorefactor/except-auditfrom
hachej merged 6 commits intorefactor/except-auditfrom
Conversation
Convert src/boring_semantic_layer/ops.py to a package (ops/) and extract two cleanly-bounded chunks into submodules: - ops/_values.py (166 lines): Dimension, Measure value objects plus the prefix proxies that support model-prefix navigation in joined dimension lambdas. Imports nothing from the relational core; the only callback to _core (_format_column_error in the AttributeError path) is a lazy in-method import to avoid circular module loading. - ops/_column_extraction.py (369 lines): ColumnTracker, ColumnExtractionResult, JoinColumnExtractionResult, TableColumnRequirements and the projection-pushdown extraction helpers. Self-contained except for one lazy _unwrap import inside _extract_requirements_from_measures. ops/__init__.py re-exports every name external callers reference, so all existing `from boring_semantic_layer.ops import X` imports keep working unchanged. _core.py imports the moved names back so its own internal call sites are unchanged too. The remaining 4,860 lines in _core.py are the relational op classes (Semantic*Op) and their tightly-coupled helpers (root-model traversal, join-key/rename machinery, aggregation planning). Those resist a clean split: helpers reference SemanticTableOp which is defined mid-file, and the join logic threads through several intricate fixes documented in MEMORY.md (multi-way join column ambiguity, _RenamedResolver, pre-aggregation join direction). Splitting them further would risk subtle behavior changes for limited structural gain. A future pass could pull these apart but it warrants its own focused branch. No behavior change. 930 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… models Pull four more self-contained chunks out of _core.py: - _callable.py (60 lines): _CallableWrapper / _ensure_wrapped / _infer_unnest. Tiny module, used by serialization & utils externally. - _xorq_compat.py (127 lines): _ensure_xorq_table, _patch_xorq_sortkey_compat, _rebind_to_backend, _rebind_to_canonical_backend. Pure ibis↔xorq bridge logic. - _normalize.py (105 lines): _normalize_to_name and _normalize_join_predicate, the two arg-coercion helpers used by the user-facing API. - _root_models.py (456 lines): root-model traversal (_find_all_root_models et al) and the join-aware field-prefixing / _right column-rename machinery. SemanticTableOp / SemanticJoinOp are imported lazily inside each function to break the import cycle with _core. _core.py is now 4,186 lines (down from 4,621 last commit, 5,330 at start of branch). All extractions are pure moves; _core.py re-imports the names back from each submodule so internal call sites are unchanged. 930 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two more chunks out of _core.py: - _format.py (159 lines): _collect_chain, _format_op_summary, _format_root, _semantic_repr — the helpers that build the multi-line pipeline repr used by every Semantic*Op.__repr__. All Op classes are imported lazily inside each helper to dodge the circular dependency. - _index_op.py (343 lines): SemanticIndexOp plus its 6 fragment builders (_get_field_type_str, _get_weight_expr, _build_string_index_fragment, _build_numeric_index_fragment, _resolve_selector, _get_fields_to_index). Self-contained except for lazy imports of _get_merged_fields/_to_untagged from _core. _core.py is now 3,775 lines. All extractions are pure moves with re-imports back so internal call sites are unchanged. 930 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two more pairs of small "transparent" Op classes pulled out: - _order_limit.py (122 lines): SemanticOrderByOp + SemanticLimitOp. Both delegate every metadata accessor to self.source and only override to_untagged() and __repr__. - _mutate_unnest.py (167 lines): SemanticMutateOp + SemanticUnnestOp. Same pass-through pattern; struct unpacking lives in Unnest's to_untagged. Helpers used inside to_untagged (_to_untagged, _resolve_expr, _unwrap) are imported lazily to keep the modules independent of _core import order. _core.py now 3,551 lines. 930 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New _measure_helpers.py (~280 lines): - _extract_measure_metadata: read user-supplied measure forms (dict/Measure/raw) into a uniform tuple - _is_calculated_measure / _matches_aggregation_pattern / _find_matching_measure: detect calc measures vs base measures and match aggregation patterns against known measure definitions - _make_base_measure: wrap a callable/Deferred/AggregationExpr into a Measure - _classify_measure: top-level dispatch — used externally by expr.py - _build_json_definition: dim/measure JSON-export helper - _format_column_error: friendly diagnostic for missing-column errors _values.py's lazy import of _format_column_error switches to the new home so the value-objects → core back-ref is gone. _core.py is now 3,283 lines (down from 3,551). 930 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New _basic_ops.py (~430 lines) holds the four foundation Op classes: - SemanticTableOp: leaf relation that carries dimension/measure metadata. Uses lazy imports for _make_schema, _mutate_dimensions_with_dependencies, and the module logger from _core. - SemanticFilterOp: predicate filtering with derived-dimension enrichment. Lazy-imports _Resolver/_get_merged_fields/_resolve_expr/ _to_untagged/_unwrap and SemanticAggregateOp for its post-agg check. - SemanticProjectOp + helpers _classify_fields, _process_nested_access_marker, _evaluate_measures_with_unnesting, _build_select_or_aggregate. - SemanticGroupByOp: thin pass-through. All call sites that used to live in _core re-import from _basic_ops so external `from boring_semantic_layer.ops import X` keeps working. _core.py is now 2,908 lines (down from 5,330 at start of branch). 930 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Stacked PR — 3 of 3 (top of stack)
This PR is stacked on top of #255, which is stacked on #254. It targets
refactor/except-audit, notmain. Read in order:refactor/xorq-shim→main) — must land firstrefactor/except-audit→refactor/xorq-shim)refactor/split-ops→refactor/except-audit)Each PR in the stack is independently green. Once #254 and #255 merge, GitHub will retarget this PR to
mainautomatically.Summary
Convert
src/boring_semantic_layer/ops.py(5,330 lines, 19 classes, 100+ functions) to a package, and pull self-contained sections into focused submodules._core.pyreduced from 5,330 → 2,908 lines (45%), with 13 new submodules carrying the rest.All extractions are pure moves — no logic changes.
ops/__init__.pyre-exports every name external callers reference, so existingfrom boring_semantic_layer.ops import Ximports keep working unchanged.New module layout
_core.pySemanticJoinOp, top-level utilities (left here on purpose — see below)_root_models.py_basic_ops.pySemanticTableOp,SemanticFilterOp,SemanticProjectOp,SemanticGroupByOp_column_extraction.pyColumnTracker,TableColumnRequirements, projection-pushdown extraction_index_op.pySemanticIndexOp+ 6 fragment helpers_measure_helpers.py_values.pyDimension,Measurevalue objects_mutate_unnest.pySemanticMutateOp,SemanticUnnestOp_format.py__repr__(_semantic_repretc.)_xorq_compat.py_order_limit.pySemanticOrderByOp,SemanticLimitOp_normalize.py_normalize_to_name,_normalize_join_predicate_callable.py_CallableWrappercluster__init__.pyWhy
ops.pyhad become a god-file mixing:Dimension,Measure)A cross-cutting change touched 5,000+ lines of context. Recent fixes documented in
MEMORY.md(multi-way join column ambiguity, pre-aggregation join direction, resolver hash fix) span this file. Splitting along natural seams makes future changes easier to scope and review.Why
_core.pyis still ~2,900 linesSemanticAggregateOp(~1,690 lines incl. 12 planning helpers) andSemanticJoinOp(~900 lines incl. join-tree machinery) are deeply coupled to each other and to the central helpers (_to_untagged,_resolve_expr,_get_merged_fields,_classify_dependencies). Extracting them in this PR risks subtle behavior changes in the intricate fixes documented inMEMORY.md.A future focused PR should split these — but it's the kind of pass that benefits from a fresh review window, not piling onto today's stack. Suggested follow-up:
SemanticAggregateOpto_aggregate.pySemanticJoinOp+ helpers (_RenamedResolver,_BSL_JOIN_KEY_TMP_PREFIX) to_join_op.pyTest plan
from boring_semantic_layer.ops import ...works for all 11 Op classes plusDimension,Measure, and the 15+ private helpers external modules depend on_core.pyand the extracted submodules🤖 Generated with Claude Code