Skip to content

refactor: convert ibis→xorq once at SemanticModel construction#250

Merged
hachej merged 4 commits intomainfrom
hussain/refactor/xorq-boundary
May 4, 2026
Merged

refactor: convert ibis→xorq once at SemanticModel construction#250
hachej merged 4 commits intomainfrom
hussain/refactor/xorq-boundary

Conversation

@hussainsultan
Copy link
Copy Markdown
Collaborator

Summary

  • Move ibis→xorq conversion from lazy execute-time fixup to a single boundary at SemanticModel.__init__. Internal paths can assume xorq when supported, plain ibis otherwise.
  • Drop dead try/except ImportError for xorq imports (xorq is required).
  • Rename _unify_backends_rebind_to_canonical_backend; extract _rebind_to_backend primitive and dedup with serialization/reconstruct.py.

Surfaces two latent bugs that the lazy conversion was hiding:

  • SemanticTableOp.values built the calc-measure dummy table with plain ibis.table() and silently swallowed the mismatch — fixed via _get_ibis_module.
  • server/api.py called ibis.desc(...) on a xorq aggregate, hitting infinite recursion in xorq's bind — same fix.

Follow-up: cache from_ibis() per source backend so the runtime rebind pass can be deleted.

Test plan

  • pytest src/boring_semantic_layer/tests/ — 920 passed
  • Updated test_join_one_unsupported_backend to monkeypatch before construction

hussainsultan and others added 4 commits May 3, 2026 08:45
The boundary between user-supplied ibis tables and BSL's internal xorq
representation was scattered. ``_ensure_xorq_table`` was called lazily
inside ``SemanticTableOp.to_untagged()`` on every traversal, and
ops.py wrapped the xorq imports in a ``try/except ImportError`` that
could never trigger because ``xorq.api`` is already imported
unconditionally above it.

Move the conversion to ``SemanticModel.__init__`` so it runs once at
construction. Internal code paths can then assume a xorq table when the
backend is supported, and a plain ibis fallback otherwise (e.g.
Databricks). The conversion is idempotent, so re-construction via
``with_dimensions`` / ``with_measures`` is cheap.

Surfaces two latent mixing bugs that the lazy conversion was hiding:

- ``SemanticTableOp.values`` built a dummy table for calc-measure type
  inference using plain ``ibis.table()`` even when the base table was
  xorq, then silently swallowed the resulting compile error. Switch to
  the matching ibis module via ``_get_ibis_module``.
- The HTTP search-dimension-values endpoint called ``ibis.desc(...)``
  on what is now a xorq aggregate, triggering infinite recursion in
  xorq's ``bind``. Same fix.

Rename ``_unify_backends`` → ``_rebind_to_canonical_backend`` and split
out the rebind primitive as ``_rebind_to_backend``, deduplicating with
the copy in ``serialization/reconstruct.py``. The dead ``except
ImportError`` branch in ops.py is removed; xorq is required.

A follow-up PR can remove the runtime canonical-rebind once
``from_ibis()`` results are cached per source backend (currently each
call returns a distinct Backend, so joins between separately-converted
models still need the rebind pass).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User-supplied callables produce expressions against exactly one of
``ibis`` or ``xorq.vendor.ibis`` — never a mix — so isinstance checks
against ``(ibis_ops.X, xorq_ops.X)`` unions were always going to match
exactly one tuple element. Replace the unions with a small
``_reductions_for_expr`` helper that picks the right ``reductions``
module by inspecting the expression's class module, and rewrite the
three reduction-shape predicates (``_is_mean_expr``,
``_is_count_distinct_expr``, ``_reagg_op_for_expr``) to use it.

No behavior change; just narrows each isinstance to a single concrete
type and removes the four module-level union tuples.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three small fixes around calc-measure construction:

- Add ``validate_calc_ast`` in ``measure_scope`` and call it from
  ``_classify_measure``. Rejects illegal AllOf shapes (e.g.
  ``AllOf(BinOp(...))``) at construction with a message that names the
  offending calc measure, instead of letting them parse and fail deep
  inside the compiler. The existing valid shapes — ``AllOf(MeasureRef)``
  and ``AllOf(AggregationExpr)`` (handled by the rewrite pipeline) —
  pass through untouched.

- Add ``infer_calc_dtype`` in ``compile_all``. The type-inference path
  in ``SemanticTableOp.values`` now mirrors the ``AggregationExpr``
  rewrite step from ``compile_grouped_with_all`` so calc measures
  containing inline aggregations (``t.value.sum() / t.all(t.value.sum())``)
  get a known dtype on ``schema`` instead of being silently dropped.

- Replace the bare ``except Exception: pass`` in
  ``SemanticTableOp.values`` with a debug-level log naming the calc
  measure and the exception, so legitimate type-inference failures are
  at least discoverable.

Three tests added covering the validator (rejects, accepts) and the
dtype-inference round-trip.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same recursion bug the construction-time conversion surfaced in
``server/api.py`` — ``ibis.desc(...)`` from plain ibis on a
xorq-vendored aggregate sends xorq's ``bind`` into infinite recursion.
The MCP ``search_dimension_values`` tool had its own copy of the
pattern that CI caught. Pick the matching module via
``_get_ibis_module(tbl)`` like the FastAPI handler now does.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@hachej hachej merged commit 4d8e6f4 into main May 4, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants