chore(tesseract): Collapse logical plan into Query + LogicalPlan root by waralexrom · Pull Request #10921 · cube-js/cube

waralexrom · 2026-05-20T18:26:33Z

Summary

Collapse-then-decompose refactor of the Tesseract logical plan: the
former zoo of body types (AggregateMultipliedSubquery, KeysSubQuery,
MeasureSubquery, DimensionSubQuery) merges into a single Query
node with QueryKind role marker and an explicit LogicalPlan root
that owns the flat CTE pool. Multi-stage dimension planning moves to
the leaves and goes through a unified DimensionSubqueryPlanner
dispatcher; the physical builder switches off its context-based
multi-stage-dim lookup in favour of explicit MultiStageDimensionRef
joins.

Reshapes the cubesqlplanner logical plan around two ideas: - collapse specialised body types (KeysSubQuery, MeasureSubquery, AggregateMultipliedSubquery, DimensionSubQuery, multi-stage Calculation/Aggregation) into a single `Query` over a uniform `QuerySource` (`LogicalJoin` | `FullKeyAggregate` | `PreAggregation`); - decompose Query's role via a `QueryKind` enum (`TopLevelOverCtes` | `Stage(StageKind)` | `AggregateMultiplied` | `LeafOverJoin` | `InternalFact(FactKind)` | `PreAggregationLeaf`); - introduce a `LogicalPlan { ctes, root: PlanNode }` root container — not part of `PlanNode` — that owns the WITH-clause pool. Every CTE body (multi-stage stages, multiplied keys/measures, ex-DSQ bodies) surfaces as a `LogicalMultiStageMember` whose `.body` is always a `Rc<LogicalPlan>`; tree walkers cross the boundary explicitly. Migration touchpoints: - DimensionSubQuery is gone; ex-DSQ dimensions flow as `MultiStageDimensionRef` (`OnPrimaryKeys` / `OnOuterDimensions`). - Per-Query CTE refs live on `Query.multi_stage_dimensions`; CTE bodies live on the surrounding `LogicalPlan.ctes`. - New `PlanProcessor` renders the WITH clause; `QueryProcessor` no longer emits CTEs. - `PreAggregationOptimizer.try_optimize` and `OriginalSqlCollector` accept `Rc<LogicalPlan>` and walk it via `collect_cube_names_from_plan`. Re-enables `test_propagated_with_category_filter` (previously ignored due to DSQ ungrouped-measure wrapping). All 900 active tests pass.

`LogicalPlan.root` is now `Rc<Query>` (strict) — the previous `PlanNode` root forced runtime guards every time a caller wanted `.schema()` / `.modifers()` off the root. `LogicalMultiStageMember.body` is now a `MultiStageMemberBody` enum: - `Plan(Rc<LogicalPlan>)` — Query-rooted bodies (leaf, DSQ, MS inode, multiplied keys/measures/agg-multiplied). - `TimeSeries(Rc<MultiStageTimeSeries>)` — time-axis scaffold. - `RollingWindow(Rc<MultiStageRollingWindow>)` — window CTE. TimeSeries and RollingWindow no longer need an empty-CTE-pool wrapper. Pre-agg and `PlanProcessor` dispatch on the enum directly; `expect_query_root`-style runtime guards disappear.

`QueryPlanner` now exposes two entry points: - `plan() -> Rc<LogicalPlan>` — top-level, owns a fresh `CteState`, packs the result. - `plan_into(cte_state) -> Rc<Query>` — sub-plan that flattens every CTE it produces into the caller's pool, returns just the root Query. DSQ and multi-stage leaf bodies switch to `plan_into`. Inner CTEs (multiplied-measure keys/measure/agg-multiplied bodies for cross-cube leaves) now live alongside outer CTEs in a single top-level pool — no more nested `LogicalPlan` inside member bodies. `MultiStageMemberBody::Plan(Rc<LogicalPlan>)` is gone — bodies are `Query(Rc<Query>)` / `TimeSeries(...)` / `RollingWindow(...)`. `PreAggregationOptimizer` becomes a graph-walk: starts at `plan.root` FK refs, follows them by name through `plan.ctes`, dispatching by each member body's role (Leaf → simple-match, WholeSubtree → schema+root-filter match, PassThrough → keep + recurse, NoRewrite → keep). After the walk, unreachable members (orphans of replaced bodies, e.g. keys/measure CTEs of a replaced agg-multiplied) are pruned from the result. This makes flattening safe — pre-agg no longer double-counts inner CTEs as separate matches.

CTE naming is now uniformly `cte_N` via `CteState.next_cte_name()`. Previously DSQ bodies used `"{cube}_{dim}_dimension_subquery"` and multiplied KS/MS bodies used `"{cube}_keys_subquery_{seq}"` etc.; the semantic names were neither read by consumers nor deduped reliably, and the cross-planner `QueryTools.next_cte_seq_id` counter is gone. DSQ body now projects its dimension as a synthetic `MeasureSymbol` built off the dim's own `compiled_path` — the new helper `MeasureSymbol::new_synthetic_from_dimension(dim)` returns a Calculated measure whose `full_name`/`alias` match the dim's. Errors if the dim has no `sql` (e.g. Geo) or its type doesn't map to a `CalculatedMeasureType`. That kills the `MemberExpression`-with-`expr:`-prefix round-trip the DSQ planner used to build, which in turn collapses `MultiStageDimensionRef.exposed` into `body_column`: now the body's projection alias and the outer-scope full_name are the same symbol, so substitution needs only one ref. Dedup of identical CTE bodies (by `kind` + `eq_as_state`) deferred — see phase-B backlog.

Cleaner field name now that the body's projection symbol and the outer-scope reference are the same `MemberSymbol` by construction — no need to distinguish "the column the body projects" from "the member the outer scope reads". Also drops the stale FIXME and commented-out DSQ-planning block in `aggregate_subquery_plan` — DSQ for outer dims is correctly handled inside `keys_query` (via `collect_sub_query_dimensions_from_symbols` over the outer dim set); the outer Query{AggregateMultiplied} itself never needs DSQ refs of its own.

Replaces the manual `for pk in dimensions { if !already_in { push }}` with `outer_dims.chain(pk_dims).unique_by(|d| d.full_name())`. Same semantics, fewer moving parts.

`Query.filter` and `Query.modifers` are `#[builder(default)]` now — when a caller doesn't set them, the builder constructs an empty `LogicalFilter` / `LogicalQueryModifiers` instead of failing. The MeasureSubquery (`InternalFact(Measures)`) and the outer `AggregateMultiplied` bodies were always passing `Rc::new(LogicalFilter::default())` + `Rc::new(LogicalQueryModifiers::default())` through explicitly — matching the old `MeasureSubquery` struct, which had no such fields. Same for the time-series-get-range, Stage and DimensionCalc bodies (modifers are set, filter is default), and for the pre-agg-rewrite `schema_and_filter` shape (filter is set, modifers is default). Drop those redundant setters at the call sites; builder semantics stay identical.

The classic-compiler idiom `measures: { foo: { type: 'number', sql: 'count(*)' } }` previously fell through to `MeasureKind::Calculated` in the tesseract planner, so under a multiplied join the planner spun up a full aggregate-multiplied pipeline (keys/measure CTEs, fk_aggregate). The measure CTE ended up projecting `pk` next to `count(*)` without a GROUP BY, which Postgres rejects with `column "..." must appear in the GROUP BY clause`. Mirror the classic-compiler hack: detect the `count(*)` literal and promote the measure to `MeasureKind::Count` with the cube's primary-key sqls behind it. From that point on the regular Count path applies — additive in multiplied joins, rendered as `COUNT(DISTINCT pk)` when needed. - New `SqlCall::is_count_star()` — true only for an empty-dependency, empty-filter-param/group, empty-security-context `SqlCall` whose template is the literal `count(*)` (case-insensitive, trim-safe). - `MeasureKind::from_type_str` promotes a `number` + `count(*)` shape to `Count(Auto(pk_sqls))`. - `MeasureSymbolFactory.build` now collects `pk_sqls` when the SQL is absent OR is `count(*)`, so the promoted Count has the keys it needs for `COUNT(DISTINCT pk)`. Repro test: `tests::integration::chained_subquery:: test_or_and_filter_with_cross_cube_sub_query_dim` — mirrors the JS sql-generation-logic.test.ts "where filter with operators OR & AND" test on a visitors → visitor_checkins → cards chain with a sub_query dim and a count(*) measure.

…er_references The OnPrimaryKeys DSQ join was built as `cte_ref.pk_alias = Expr::new_member(pk_dim)`. RHS, a `MemberExpression`, renders via `render_references`. By the time the join condition was rendered, the outer SELECT had already resolved the same PK dim through `ReferencesBuilder` and — since `SingleSource::Cube` returns no column ref — picked it up from the joined CTE's `TableReference` schema, registering `pk_dim → cte.pk_alias` in `render_references`. The ON then collapsed to `cte.pk_alias = cte.pk_alias`, a self-equal that turned the LEFT JOIN into a CTE × cube cartesian and broke any subsequent join condition referencing the DSQ dim. Build each side per-source instead — same pattern the full-key-aggregate strategies use: - CTE side: `Expr::Reference(cte_alias, cte_schema.resolve_member_alias(dim))`. - Cube side: `Expr::Reference(cube_alias, dim.name())` — direct column reference, bypassing `render_references` entirely. Cube alias is threaded through `add_multi_stage_dimension_pk_join` from `LogicalJoinProcessor`, where it's already known via `default_alias_with_prefix`. Repro test: `tests::integration::subquery_in_join:: test_sub_query_dim_in_join_condition` — mirrors the JS sub-query-dimensions.test.ts "inserted at the right place of a join" test with B → C join filtered by a sub_query dim from A.

When a measure is queried through a view, the view's `join_path:` is baked into the underlying member's path during construction (e.g. a view that exposes `child_val_avg` via `join_path: many_to_one_root.many_to_one_child` produces an underlying measure symbol with `path=[root, child]`, even though `is_view=false`). The view-wrapper sits one level above and carries `is_view=true`; `resolve_reference_chain` drops to the underlying, picking up its inflated path on the way. `collect_join_hints_for_measures` then pushes `Vector([root, child])`, and `aggregate_subquery_measure` built the MeasureSubquery's source `FROM root LEFT JOIN child` — multiplying child rows back through root and breaking the very dedup the subquery exists to do (e.g. `child.val_avg` over (foo, foo) came back as 166.66 instead of 200). Strip the join-chain prefix from each measure before collecting its hints. Same helper as in `assert_measures_not_multiplied`. After that the MeasureSubquery source is `FROM child` alone, matching what the classic compiler emits for the same shape. Repro test: `test_many_to_one_view_one_sum_multiplied` in `member_expressions_on_views.rs` — mirrors the JS member-expressions-on-views.test.ts "many_to_one_view > one_sum" case with a multiplied seed.

When a multiplied measure list contains a member-expression with an inner aggregate (e.g. `COUNT(DISTINCT {view.child_test_dim})`), the MeasureSubquery CTE used to emit the expression verbatim, producing invalid SQL (aggregate without GROUP BY) and breaking outer re-aggregation by row-multiplying through the join chain. The MeasureSubquery now splits member-expressions into their dim/measure deps via `get_dependencies` + `resolve_reference_chain`, projecting dim-deps as plain dimension columns and measure-deps alongside the native measures. The member-expression itself stays on the outer `Query{AggregateMultiplied}` schema; `QueryProcessor` resolves render-references for it explicitly so its inner dim refs bind to the CTE columns (`q_N.cube__col`) instead of the raw source table.

Replaces the flat `Vec<LogicalMultiStageMember>` in `CteState` with typed `CteEntry { role, members, state, body }` records. Each CTE now carries: - `CteRole` (Keys / MultiStageDimension / MultiStageMeasure / FactMeasure / MultipliedMeasureSubquery) — DSQ folds into MultiStageDimension as a degenerate one-level case; - the members it projects (Vec, since fact/multiplied CTEs cover several measures at once); - the `Rc<QueryProperties>` state it was planned under. `next_cte_name(role)` takes a role and pulls from a per-kind counter, so generated aliases are `keys_0`, `fact_measure_0`, `multi_stage_measure_0`, etc. `add_member(...)` returns the final CTE name — `body.name` for fresh entries, the existing entry's name on same-body-name dedup — so callers can't pin a name that won't appear in the WITH clause. `find_matching(role, members, state)` exposes a `(role, members, state)` lookup over `QueryProperties::eq_as_state` + member-set comparison; no callers yet, wired up for future planner-level dedup. All ten call-sites (multiplied x4, dim_subq, multi_stage tree x5) pass role + members + state and use the returned name for downstream refs.

Stores the consumer-facing `Rc<MultiStageSubqueryRef>` alongside each CTE in `CteState` so future dedup hits (via `find_matching`) can hand back the same ref a fresh builder would have produced. Required for the upcoming bottom-up rewrite of the multi-stage planner where the recursion returns refs directly instead of going through a description tree + second-pass render. All six `add_member` call-sites (multiplied x4, dim_subq, multi_stage descr loop) build the ref up-front and pass it in. Existing behaviour is unchanged — `cte_ref` is not consumed by anything yet.

Replaces the two-pass multi-stage planner (build a tree of `MultiStageQueryDescription`s, then render each into a `LogicalMultiStageMember` in a second loop) with a single bottom-up recursion in `MultiStageQueryPlanner::build_multi_stage_cte` that returns `Rc<MultiStageSubqueryRef>` directly. The body is rendered and registered in `cte_state` on the way back up each stack frame; dedup is now `cte_state.find_matching(role, [member], state)`. `MultiStageMemberQueryPlanner` is gone — its render methods are now free `pub(crate) fn build_*` in `member_query_planner.rs`, taking explicit parameters and `&[Rc<MultiStageSubqueryRef>]` for children. `MultiStageQueryDescription` (and `query_description.rs`) is deleted along with the description tree; the time-series / time-series-range literal-alias cache (`descriptions.iter().find(d.alias() == ...)`) migrates to a local `TimeSeriesCache` on the planner. Three bucketing tests with chained / multi-dim multi-stage dimension setups are `#[ignore]`'d: the bottom-up `child.schema().dimensions` union in `build_for_cte_dimension_query` doesn't yet match the old `collect_all_non_multi_stage_dimension` walk. Tracked in FIXME comments at the test sites; everything else (917 tests) stays green.

…nner dispatcher Collector now surfaces both `sub_query: true` and `multi_stage: true` dimensions; DimensionSubqueryPlanner dispatches on the flavour and currently only builds DSQ — multi-stage path will follow. No behavior change: top-level MultiStageQueryPlanner still owns the multi-stage dim build, and the dispatcher silently skips them.

DimensionSubqueryPlanner now owns the multi-stage dim build path via the dispatcher introduced in the previous commit: when a dim is flagged multi_stage, it delegates to MultiStageQueryPlanner::build_multi_stage_dim_ref and returns a MultiStageDimensionRef with the OnOuterDimensions join placeholder. Top-level MultiStageQueryPlanner.plan_queries now skips dimension- flavoured multi-stage members; only measures are walked there. Physical builder is unchanged — multi-stage dim joins still go through the schema().multi_stage_dimensions() walk.

…ns JOIN Multi-stage dim CTEs now LEFT-JOIN onto the cube-join chain through the MultiStageDimensionRef::OnOuterDimensions descriptor — populated in build_multi_stage_dim_ref via the existing LogicalSchema::multi_stage_join_dimensions. LogicalJoinProcessor partitions ms-dim refs by join shape and applies pk vs outer accordingly. Removes the legacy combined-key context lookup: - MultiStageDimensionContext + the multi_stage_dimension_schemas / multi_stage_dimensions slots on PushDownBuilderContext. - remove_multi_stage_dimensions / add_multi_stage_dimension / get_multi_stage_dimensions / add_multi_stage_dimension_schema. - The schema-walk hack in QueryProcessor + DimensionCalc registration in PlanProcessor. QueryProcessor still drops ms-dim refs already exposed by the first FullKeyAggregate data input to avoid double-joining. Three bucketing tests with chained multi-stage dims remain ignored — separate recursive-walk regression in build_for_cte_dimension_query.

…tate ref_for_member rebuilt the LogicalSchema from the upper state, so extension dims added by deeper `add_group_by` directives never reached outer consumers — chained multi-stage dim CTEs lost their join keys. Replace with ref_for_body, which takes the just-rendered LogicalMultiStageMember and copies its body's own output schema (via new MultiStageMemberBody::schema accessor). Knocks out two of the three bucketing regressions (with_dimension_over_complex_dimension, with_two_dimensions). with_two_dims_concated remains ignored — separate pk_aggregate_keys_source UNION ALL column-set mismatch when chained ms-dim CTEs project disjoint dim sets.

…hape When a multi-stage dimension's only multi-stage dependencies are other multi-stage *dimensions* (no measure children), build its body as a leaf-style query over the cube join with ms-dim CTEs attached as OnOuterDimensions LEFT JOINs, rather than a FullKeyAggregate UNION over heterogeneous dim-child schemas (which can't merge disjoint dim sets across cube boundaries). `build_for_cte_dim_only_query` in member_query_planner; routing in build_multi_stage_cte via `dim_only_deps`/`build_dim_only_child_refs`. Third bucketing test (`with_two_dims_concated`) advances past the UNION mismatch but trips on a render-references issue (`customer_id` resolved from LEFT-joined ms-dim CTE pollutes cube JOIN ON). Remains ignored with FIXME describing the new failure.

…-only DimensionCalc For StageKind::DimensionCalc with a cube-join source (the new dim-only shape), only resolve render-references for the stage's own multi_stage_dimension. Extension/payload dims (e.g. `orders.customer_id`) projected for consumer JOIN keys are projected without resolving: otherwise `references_builder` finds them in the LEFT-joined ms-dim CTE first (cube sources return None), sets a render_reference, and the inner cube JOIN ON ends up rendered as `ms_dim_X.col = ms_dim_Y.col` instead of `cube.col = cube.col`. Unignores `test_bucketing_with_two_dims_concated`.

…into FullKeyAggregate path Mirrors the JS `Calc-Groups > source product_category cross join` case as a Rust integration test (currently ignored — see FIXME). Plumbing to advance the case beyond "no FROM clause at all": - FullKeyAggregateQueryPlanner now runs DimensionSubqueryPlanner over the outer query's dimensions and populates Query.multi_stage_dimensions. Without this, queries that have only multi-stage dimensions and no measures (top-level MultiStageQueryPlanner filters dims out) never triggered DSQ planner anywhere. - YAML test-fixture view parser now supports inline `dimensions:` and `measures:` blocks, matching the JS view shape (cubes: becomes optional). Switch dims / multi-stage case-switch dims can now live directly on a view without an artificial host cube. Test advances from "column 'source' does not exist" (no SQL) to "column 'source_b__product_category' does not exist" (disjoint column sets in case-switch branch CTEs hitting the FullKeyAggregate UNION). FIXME describes the remaining gap.

claude · 2026-05-20T18:26:55Z

Claude finished @waralexrom's task in 3m 13s —— View job

Review: Collapse logical plan into Query + LogicalPlan root

Gather context and understand the refactoring scope
Review new core types (Query, QueryKind, LogicalPlan)
Review logical plan changes and deleted types
Review physical plan builder changes
Review planner changes (dimension subquery, multi-stage, multiplied measures)
Review test coverage and new integration tests
Post comprehensive review

This is a well-structured collapse-then-decompose refactoring that consolidates the former zoo of logical plan body types (AggregateMultipliedSubquery, KeysSubQuery, MeasureSubquery, DimensionSubQuery) into a single Query node with an explicit QueryKind role marker and a flat LogicalPlan root owning the CTE pool. Overall, the design is sound and the code quality is high. Below are specific observations and suggestions.

Architecture & Design — Positive

Explicit role marking via QueryKind is a major clarity win. The previous approach inferred roles from structural predicates (source / multistage_members shape); the new enum variants (TopLevelOverCtes, Stage(StageKind), AggregateMultiplied, LeafOverJoin, InternalFact(FactKind), PreAggregationLeaf) make every Query's pipeline position self-documenting.
Flat CTE pool (LogicalPlan.ctes + CteState accumulator) eliminates nested pool bookkeeping. The drain_subquery_refs_from(baseline) pattern for scoped sub-planner calls (query_planner.rs:56-68) is clean and prevents inner refs from leaking.
Unified MultiStageDimensionRef merging DSQ and multi-stage-dim patterns under one descriptor (OnPrimaryKeys / OnOuterDimensions) removes a whole category of join-wiring duplication in the physical builder.
LogicalNode trait + PlanNode enum + match_plan_node! macro gives a solid foundation for generic tree traversal. The multi_stage_kind() method on PlanNode is a nice semantic-classification hook for passes that need leaf/stage distinction.

Issues & Suggestions

1. Typo: `modifers` → `modifiers` (persistent across codebase)

The field name modifers (missing an i) is used consistently throughout the PR:

query.rs:24 — modifers: Rc<LogicalQueryModifiers>
query.rs:46-47 — accessor fn modifers()
query.rs:58 — with_modifers
All consumer sites (query.rs, multiplied_measures_query_planner.rs:439,499, etc.)

This appears to be a pre-existing typo carried forward. Since this is a large refactor that introduces these fields freshly, now would be the ideal time to fix it before it spreads further. A codebase-wide rename of modifers → modifiers would prevent confusion.

Fix this →

2. `CteRole::MultipliedMeasureSubquery` conflation risk (`cte_state.rs:21-23`)

The doc comment on CteRole::MultipliedMeasureSubquery acknowledges that both the inner MeasureSubquery CTE and the outer Query{AggregateMultiplied} CTE share the same role, relying on member-set divergence to prevent dedup conflation:

/// `MultipliedMeasureSubquery` — MS-CTE (raw projection of dedup-by-pk
/// measures) and the outer `Query{AggregateMultiplied}` CTE that
/// stitches it to KS-CTE. They share the role today; the outer's
/// member-set differs from the inner's so dedup wouldn't conflate
/// them in practice.

This is fragile — if a future change causes the member sets to align, the dedup will silently produce incorrect plans. Consider splitting into two distinct roles (MultipliedMeasureInner / MultipliedMeasureOuter or similar) to make the invariant structural rather than relying on a coincidence.

3. `find_matching()` has no callers (`cte_state.rs:152`)

The doc comment says "Returned for future dedup wiring — no callers yet." However, find_matching is called from multi_stage_query_planner.rs:214:

if let Some(existing) = cte_state.find_matching(role, &[member.clone()], &state) {
    return Ok(existing.cte_ref.clone());
}

The doc comment is stale — it should be updated to reflect the actual usage. (Cosmetic, low priority.)

4. `FullKeyAggregate::with_inputs` reconstructs via builder but ignores semantic identity

In full_key_aggregate.rs:108-119, with_inputs checks for exactly 0 inputs then reconstructs itself through the builder. This is correct for tree-rewrite passes, but the reconstruction clones every field. Since FullKeyAggregate is a leaf in the LogicalNode tree (no child PlanNodes), the with_inputs(vec![]) call effectively returns a clone. Consider just returning self directly (via Ok(self)) when inputs are empty — avoids the allocation.

5. Missing `Debug` derive on several new types

MultiStageDimensionRef has #[derive(Debug)] but LogicalPlan, Query, QueryKind, FullKeyAggregate, and MultiStageSubqueryRef don't. While PrettyPrint is the primary debugging mechanism, Debug is useful for unwrap() / expect() error messages and assert_eq! diagnostics in tests. Consider adding Debug derives to the core types.

6. `let _ = me;` dead binding in `split_member_expression_deps` (`multiplied_measures_query_planner.rs:322`)

let _ = me;

This line has no effect — me was already moved into the if let binding and the for dep loop consumed it. This appears to be a leftover from development. It can be removed.

Fix this →

7. Pre-aggregation optimizer: `NotMatched` on any single CTE rolls back all usages

In optimizer.rs:226-229, if any CTE body fails to match a pre-aggregation, the optimizer truncates all usages and returns None for the entire plan:

CteRewriteOutcome::NotMatched => {
    self.usages.truncate(saved_usages_len);
    self.usage_counter = saved_counter;
    return Ok(None);
}

This all-or-nothing semantics is intentional (you can't partially rewrite a CTE graph), but it's worth documenting explicitly in the method doc that partial pre-agg matches are rejected — this isn't obvious from the function signature alone.

8. `PhysycalNode` typo in trait

The associated type is spelled PhysycalNode (missing i) in LogicalNodeProcessor:

type PhysycalNode = Rc<Select>;

This is pre-existing, but since the PR touches these lines, it could be corrected to PhysicalNode. (Cosmetic.)

Performance

The refactoring doesn't introduce new algorithmic bottlenecks. Key observations:

members_equivalent in cte_state.rs:165-179 sorts two Vec<String> on every find_matching call. For the typical multi-stage tree (< 20 CTEs), this is fine. If CTE counts grow significantly, consider normalizing member names at registration time.
Rc-heavy cloning is prevalent but matches the existing codebase idiom and avoids the complexity of lifetime-threaded references in a tree-rewriting context. No regression here.

Security

No security concerns. This is internal query-planning logic with no user-facing input handling beyond what the existing framework already validates. SQL generation goes through the existing template/parameterization pipeline.

Testing

Good coverage for a refactoring PR:

3 new integration test files (chained_subquery.rs, subquery_in_join.rs, calc_groups.rs) covering DSQ in complex filters, DSQ in join conditions, and calc-group cross-joins.
Extended combinations.rs with test_aggregating_cross_cube_measure_with_fanout_dim (correctly #[ignore]d with a detailed FIXME).
New many_to_one_view_one_sum_multiplied snapshot test for member-expression multiplied-measure dedup.
SQL seed files + YAML schemas are well-structured for reproducibility.
Snapshot tests (insta) validate output stability.

Observation: The test_source_product_category_cross_join test in calc_groups.rs is #[ignore]d with a FIXME about disjoint column sets in FullKeyAggregate UNION ALL. This is a known limitation — consider tracking it as a follow-up issue if not already done.

Summary

This is a high-quality structural refactoring that significantly improves the readability and maintainability of the Tesseract logical plan. The QueryKind enum, flat CTE pool, and unified dimension-ref model are well-designed. The main actionable items are:

Priority	Item
Low	Fix `modifers` → `modifiers` typo (widespread)
Low	Split `CteRole::MultipliedMeasureSubquery` into two roles
Trivial	Remove dead `let _ = me;` binding
Trivial	Update stale `find_matching` doc comment
Trivial	Fix `PhysycalNode` → `PhysicalNode` typo

None of these are blocking. The PR is well-structured and ready for merge from a code quality perspective.

codecov · 2026-05-20T18:38:15Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.53%. Comparing base (3477776) to head (e551a35).
⚠️ Report is 10 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #10921      +/-   ##
==========================================
+ Coverage   78.87%   83.53%   +4.65%     
==========================================
  Files         470      254     -216     
  Lines       92483    75869   -16614     
  Branches     3449        0    -3449     
==========================================
- Hits        72948    63374    -9574     
+ Misses      19031    12495    -6536     
+ Partials      504        0     -504

Flag	Coverage Δ
cube-backend	`?`
cubesql	`83.53% <ø> (+0.05%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

waralexrom added 21 commits May 19, 2026 21:06

chore(tesseract): Dedup keys_query schema_dimensions via chain+unique_by

20c9b4f

Replaces the manual `for pk in dimensions { if !already_in { push }}` with `outer_dims.chain(pk_dims).unique_by(|d| d.full_name())`. Same semantics, fewer moving parts.

github-actions Bot added the rust Pull requests that update Rust code label May 20, 2026

vercel Bot deployed to Preview May 20, 2026 18:28 View deployment

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore(tesseract): Collapse logical plan into Query + LogicalPlan root#10921

chore(tesseract): Collapse logical plan into Query + LogicalPlan root#10921
waralexrom wants to merge 21 commits into
masterfrom
tesseract-logical-plan-collapse

waralexrom commented May 20, 2026

Uh oh!

claude Bot commented May 20, 2026 •

edited

Loading

Uh oh!

codecov Bot commented May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

waralexrom commented May 20, 2026

Summary

Uh oh!

claude Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review: Collapse logical plan into Query + LogicalPlan root

Architecture & Design — Positive

Issues & Suggestions

1. Typo: modifers → modifiers (persistent across codebase)

2. CteRole::MultipliedMeasureSubquery conflation risk (cte_state.rs:21-23)

3. find_matching() has no callers (cte_state.rs:152)

4. FullKeyAggregate::with_inputs reconstructs via builder but ignores semantic identity

5. Missing Debug derive on several new types

6. let _ = me; dead binding in split_member_expression_deps (multiplied_measures_query_planner.rs:322)

7. Pre-aggregation optimizer: NotMatched on any single CTE rolls back all usages

8. PhysycalNode typo in trait

Performance

Security

Testing

Summary

Uh oh!

codecov Bot commented May 20, 2026

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude Bot commented May 20, 2026 •

edited

Loading

1. Typo: `modifers` → `modifiers` (persistent across codebase)

2. `CteRole::MultipliedMeasureSubquery` conflation risk (`cte_state.rs:21-23`)

3. `find_matching()` has no callers (`cte_state.rs:152`)

4. `FullKeyAggregate::with_inputs` reconstructs via builder but ignores semantic identity

5. Missing `Debug` derive on several new types

6. `let _ = me;` dead binding in `split_member_expression_deps` (`multiplied_measures_query_planner.rs:322`)

7. Pre-aggregation optimizer: `NotMatched` on any single CTE rolls back all usages

8. `PhysycalNode` typo in trait