Fix lazy materialize retain#193

Merged
singaraiona merged 2 commits into master from fix-lazy-materialize-retain on May 5, 2026

Conversation

@singaraiona
Collaborator

No description provided.

`(select {s: (sum a) from: t})` was returning N copies of the same
value instead of a single row.  The projection-only path lowered
aggregates as ordinary column expressions, so OP_SELECT saw a scalar
atom and broadcast it to the input row count (exec.c: vec->type<0 ->
broadcast_scalar).

Route the all-aggregate / no-by case through ray_group(n_keys=0),
which already has a 1-row scalar-aggregate fast path.  WHERE is
pre-executed (same pattern as the by-with-where fuse path) so the
lazy g->selection bitmap reaches the reduction.
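
As a rough illustration of why the zero-key case should yield a single row, the sketch below reduces a column under an optional selection bitmap. It is not the ray_group implementation: `sum_selected`, `sel_test`, and the one-bit-per-row bitmap layout are assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bit i set means row i passed the WHERE filter. */
static int sel_test(const uint8_t *sel, size_t i) {
    return (sel[i >> 3] >> (i & 7)) & 1;
}

/* Zero-key "group": all selected rows fall into one implicit group, so the
 * result is one value, not one value per input row.
 * sel == NULL is taken to mean "no WHERE clause, every row selected". */
int64_t sum_selected(const int64_t *col, size_t nrows, const uint8_t *sel) {
    assert(col != NULL || nrows == 0);
    int64_t acc = 0;
    for (size_t i = 0; i < nrows; i++) {
        if (sel == NULL || sel_test(sel, i)) {
            acc += col[i];
        }
    }
    return acc;
}
```

The caller would wrap the returned value in a 1-row result instead of handing a bare scalar to a projection step that broadcasts it.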

The n_keys==0 parallel scalar path was effectively dead code before
this change, and its FIRST/LAST merge silently relied on worker-id order
matching row-index order, which does not hold under work-stealing dispatch.
Force serial execution when FIRST/LAST is in play; the DA path stays
parallel and already tracks per-slot first_row/last_row.
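
For contrast with the serial fallback chosen here, the sketch below shows what an order-independent FIRST/LAST merge could look like when each partial carries its own row indexes, in the spirit of the per-slot first_row/last_row tracking mentioned for the DA path. The `fl_partial` struct and both functions are hypothetical names for this example, not code from the repository.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-worker partial for FIRST/LAST over one column. */
typedef struct {
    int     seen;       /* 0 if this worker processed no rows */
    size_t  first_row;  /* smallest row index seen so far */
    size_t  last_row;   /* largest row index seen so far */
    int64_t first_val;  /* value at first_row */
    int64_t last_val;   /* value at last_row */
} fl_partial;

/* Fold one (row, value) pair into a partial. Rows may arrive in any order. */
void fl_update(fl_partial *p, size_t row, int64_t val) {
    if (!p->seen || row < p->first_row) { p->first_row = row; p->first_val = val; }
    if (!p->seen || row > p->last_row)  { p->last_row  = row; p->last_val  = val; }
    p->seen = 1;
}

/* Merge partials by comparing row indexes, so the answer does not depend on
 * which worker (array slot) happened to process which morsel. */
fl_partial fl_merge(const fl_partial *parts, size_t n) {
    assert(parts != NULL || n == 0);
    fl_partial out = {0, 0, 0, 0, 0};
    for (size_t w = 0; w < n; w++) {
        if (!parts[w].seen) continue;
        fl_update(&out, parts[w].first_row, parts[w].first_val);
        fl_update(&out, parts[w].last_row,  parts[w].last_val);
    }
    return out;
}
```

A merge that instead takes FIRST from worker 0 and LAST from the highest-numbered worker is only correct if workers are handed row ranges in ascending order, which work stealing does not guarantee.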

Two existing tests asserted the buggy broadcast row count
(groupby_aggregators.rfl:64, group_coverage.rfl:417); updated to the
correct 1-row expectation.
…t), LIKE on dict SYM

Lands the four findings + bonus from RAYFORCE_BOTTLENECKS.md, taking
ClickBench hot-run total from ~1.6 M ms to ~14 K ms across 40
measurable queries (≈99% reduction).

* Fused `select { … asc/desc: c take: K }` lowers to bounded-heap
  top-K when k << nrows and keys resolve to plain column refs.
  Single-key uses the radix-encoded fast path; multi-key falls back
  to the comparator-based heap.  Q26 SearchPhrase: 5 186 → 72 ms.

* Grouped `count(distinct)` no longer routed through per-group
  eval-fallback — the fused OP_COUNT_DISTINCT runs per group-slice.
  Scaling moves from 94×/decade to ≈4.6×/decade between 100 K and
  1 M rows (essentially linear).

* LIKE on dict-encoded SYM scans the dictionary once and lifts the
  result through the codes vector instead of re-evaluating per row.
  Low-card SYM (54-unique BrowserCountry): 52 → 3.65 ms (14×).
  High-card SYM (1.73 M-unique URL): 498 → 220 ms (2.3×).

* Unifies the previously-divergent glob matchers (eval used `*?[abc]`,
  DAG used SQL `%_`; one variant blew up exponentially on
  `a*a*…a*b` against an a-only string) behind a single iterative
  two-pointer implementation in src/ops/glob.{c,h}.  Both call sites
  delegate.

* Bonus: `(at table (iasc table.col))` no longer crashes on tables;
  it now re-indexes each column and returns a TABLE.
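
The glob and dictionary-LIKE items above can be sketched together. This is a minimal illustration, not the contents of src/ops/glob.{c,h}: it handles only `*` and `?` (no `[abc]` classes, no SQL `%_` translation), and the names `glob_match` and `like_dict`, the `uint32_t` code type, and the byte-per-row output are all assumptions for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Iterative two-pointer glob match: '*' matches any run, '?' any one char.
 * On a mismatch it backtracks only to the most recent '*', so the worst
 * case is O(len(pat) * len(str)) rather than exponential. */
int glob_match(const char *pat, const char *str) {
    assert(pat != NULL && str != NULL);
    const char *star = NULL;    /* most recent '*' in pat, if any */
    const char *resume = NULL;  /* position in str to retry from after that '*' */
    while (*str) {
        if (*pat == '*') {
            star = pat++;
            resume = str;
        } else if (*pat == '?' || *pat == *str) {
            pat++;
            str++;
        } else if (star) {
            pat = star + 1;     /* let the last '*' absorb one more char */
            str = ++resume;
        } else {
            return 0;
        }
    }
    while (*pat == '*') pat++;  /* trailing stars match the empty string */
    return *pat == '\0';
}

/* LIKE over a dictionary-encoded column: evaluate the pattern once per
 * dictionary entry, then map each row's code to the cached verdict.
 * dict_hit is caller-provided scratch of ndict bytes; out has nrows bytes. */
void like_dict(const char *pat, const char *const *dict, size_t ndict,
               const uint32_t *codes, size_t nrows,
               uint8_t *dict_hit, uint8_t *out) {
    for (size_t d = 0; d < ndict; d++) {
        dict_hit[d] = (uint8_t)glob_match(pat, dict[d]);
    }
    for (size_t r = 0; r < nrows; r++) {
        assert(codes[r] < ndict);
        out[r] = dict_hit[codes[r]];
    }
}
```

With this shape the matcher runs once per distinct value, which fits the reported gap between the low-cardinality and high-cardinality timings: the fewer distinct strings there are relative to rows, the more work the dictionary pass saves.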

Tests: query_coverage / read_csv / reserved_namespace updated for the
new dispatch paths; cross_type_workout / collection/at extended.
@singaraiona singaraiona merged commit d5c2cac into master May 5, 2026
0 of 4 checks passed