fix(zql): execute sibling exists from selective roots by Karavil · Pull Request #1 · Karavil/mono

Karavil · 2026-04-21T04:42:11Z

This PR makes selective whereExists plans executable without app code manually setting {flip: true}. The base of this stack teaches the planner to cost mixed OR branches more honestly. This PR adds the missing physical shapes: a root union for independent OR branches, and a child-key intersection for sibling EXISTS branches on the same relationship.

I regenerated the SQL below against current origin/main and this branch with the same hypothetical education app schema inspired by goblinsapp.com. The data is synthetic, but the shape is the one that mattered for us: 2,000 assignments, sparse assignment_to_student rows, and permission-style filters expressed as EXISTS. There are no personal names or real customer rows.

Current Zero already handles a simple single EXISTS well. This is not where the PR wins:

assignment
  .where('archived_at', 'IS', null)
  .whereExists('assignment_to_student', q =>
    q.where('student_id', '=', 'student-1'),
  );

Current origin/main and this branch both generate the child-root shape:

SELECT "assignment_id","student_id","created_at"
FROM "assignment_to_student"
WHERE "student_id" = ?
ORDER BY "assignment_id" asc, "student_id" asc

SELECT "id","teacher_id","archived_at","created_at"
FROM "assignment"
WHERE "id" = ? AND "archived_at" IS ?
ORDER BY "created_at" desc, "id" asc

The first real win is a mixed parent predicate and child EXISTS under OR.

assignment.where(({and, cmp, exists, or}) =>
  and(
    cmp('archived_at', 'IS', null),
    or(
      cmp('teacher_id', '=', 1),
      exists('assignment_to_student', q =>
        q.where('student_id', '=', 'student-1'),
      ),
    ),
  ),
);

Current origin/main chooses a semi-join for the membership branch. In the 2,000-assignment scenario, that is one assignment scan plus 2,003 membership probes, or 2,004 SQL calls in total:

SELECT "id","teacher_id","archived_at","created_at"
FROM "assignment"
WHERE "archived_at" IS ?
ORDER BY "created_at" desc, "id" asc

-- repeated 2,003 times
SELECT "assignment_id","student_id","created_at"
FROM "assignment_to_student"
WHERE "assignment_id" = ? AND "student_id" = ?
ORDER BY "assignment_id" asc, "student_id" asc

With this stack, the membership branch starts from the student index and fetches only matching parents. SQL work drops from 2,004 calls to 5 (one assignment scan, one membership scan, and three parent lookups):

SELECT "id","teacher_id","archived_at","created_at"
FROM "assignment"
WHERE "archived_at" IS ?
ORDER BY "created_at" desc, "id" asc

SELECT "assignment_id","student_id","created_at"
FROM "assignment_to_student"
WHERE "student_id" = ?
ORDER BY "assignment_id" asc, "student_id" asc

-- repeated 3 times
SELECT "id","teacher_id","archived_at","created_at"
FROM "assignment"
WHERE "id" = ? AND "archived_at" IS ?
ORDER BY "created_at" desc, "id" asc
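
To make the difference between the two strategies concrete, here is a minimal in-memory sketch. The types and the `parentDriven` / `childDriven` functions are hypothetical illustrations, not Zero's operators, and the toy counters are not meant to reproduce the exact call counts in the traces above. They only show that one strategy scales with the number of parents and the other with the number of matching child rows.

```typescript
// Toy in-memory model; these types and functions are hypothetical, not Zero's operators.
type Assignment = {id: number; teacherId: number};
type Membership = {assignmentId: number; studentId: string};

// Parent-driven semi-join: every parent row triggers one child probe.
function parentDriven(
  assignments: Assignment[],
  memberships: Membership[],
  studentId: string,
): {ids: number[]; childProbes: number} {
  let childProbes = 0;
  const ids = assignments
    .filter(a => {
      childProbes++;
      return memberships.some(
        m => m.assignmentId === a.id && m.studentId === studentId,
      );
    })
    .map(a => a.id);
  return {ids, childProbes};
}

// Child-driven (flipped): scan the student's rows, then one parent lookup per match.
function childDriven(
  assignments: Assignment[],
  memberships: Membership[],
  studentId: string,
): {ids: number[]; parentLookups: number} {
  const byId = new Map<number, Assignment>();
  for (const a of assignments) byId.set(a.id, a);

  let parentLookups = 0;
  const ids: number[] = [];
  for (const m of memberships) {
    if (m.studentId !== studentId) continue;
    parentLookups++;
    const parent = byId.get(m.assignmentId);
    if (parent !== undefined) ids.push(parent.id);
  }
  return {ids, parentLookups};
}

// 2,000 synthetic assignments, with student-1 attached to 101 and 102.
const assignments: Assignment[] = [];
for (let id = 1; id <= 2000; id++) assignments.push({id, teacherId: id % 100});
const memberships: Membership[] = [
  {assignmentId: 101, studentId: 'student-1'},
  {assignmentId: 102, studentId: 'student-1'},
];

const slow = parentDriven(assignments, memberships, 'student-1');
const fast = childDriven(assignments, memberships, 'student-1');
console.log(slow.ids, slow.childProbes); // same ids, 2000 child probes
console.log(fast.ids, fast.parentLookups); // same ids, 2 parent lookups
```

Both functions return the same assignment ids; only the amount of lookup work differs.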

The second win is root union for a plain parent OR EXISTS shape.

assignment.where(({cmp, exists, or}) =>
  or(
    cmp('teacher_id', '=', 1),
    exists('assignment_to_student', q =>
      q.where('student_id', '=', 'student-1'),
    ),
  ),
);

Current origin/main and the stacked base both use the child index for the EXISTS, but the parent branch is still a full assignment scan:

SELECT "id","teacher_id","archived_at","created_at"
FROM "assignment"
ORDER BY "created_at" desc, "id" asc

SELECT "assignment_id","student_id","created_at"
FROM "assignment_to_student"
WHERE "student_id" = ?
ORDER BY "assignment_id" asc, "student_id" asc

-- repeated 3 times
SELECT "id","teacher_id","archived_at","created_at"
FROM "assignment"
WHERE "id" = ?
ORDER BY "created_at" desc, "id" asc

This PR turns that into two selective roots. In the scenario seed, the old parent branch reads 2,000 assignments. The new parent branch reads the 20 assignments with teacher_id = 1.

SELECT "id","teacher_id","archived_at","created_at"
FROM "assignment"
WHERE "teacher_id" = ?
ORDER BY "created_at" desc, "id" asc

SELECT "assignment_id","student_id","created_at"
FROM "assignment_to_student"
WHERE "student_id" = ?
ORDER BY "assignment_id" asc, "student_id" asc

-- repeated 3 times
SELECT "id","teacher_id","archived_at","created_at"
FROM "assignment"
WHERE "id" = ?
ORDER BY "created_at" desc, "id" asc
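
The root union shape can be sketched as follows. This is an illustration under assumptions, not the code in this PR: `rootUnion` and the row type are hypothetical, and the sketch simply assumes each OR branch has already produced its own parent rows. It deduplicates by id and restores the `created_at desc, id asc` order seen in the SQL above.

```typescript
// Hypothetical row type and helper; not the operator added by this PR.
type Assignment = {id: number; teacherId: number; createdAt: number};

// Merge the rows of independent OR branches, keeping each parent once,
// then restore the query order: created_at desc, id asc.
function rootUnion(branches: Assignment[][]): Assignment[] {
  const seen = new Set<number>();
  const merged: Assignment[] = [];
  for (const branch of branches) {
    for (const row of branch) {
      if (!seen.has(row.id)) {
        seen.add(row.id);
        merged.push(row);
      }
    }
  }
  return merged.sort((a, b) => b.createdAt - a.createdAt || a.id - b.id);
}

// Branch 1: rows matching teacher_id = 1 (synthetic values).
const teacherBranch: Assignment[] = [
  {id: 7, teacherId: 1, createdAt: 30},
  {id: 102, teacherId: 1, createdAt: 20},
];
// Branch 2: parents reached from the student's membership rows.
const existsBranch: Assignment[] = [
  {id: 101, teacherId: 2, createdAt: 20},
  {id: 102, teacherId: 1, createdAt: 20},
];

const unionIds = rootUnion([teacherBranch, existsBranch]).map(r => r.id);
console.log(unionIds); // ids 7, 101, 102: assignment 102 appears once
```

Assignment 102 satisfies both branches but is emitted once, which is the property an OR over two roots has to preserve.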

The third win is sibling EXISTS on the same relationship.

assignment
  .whereExists('assignment_to_student', q =>
    q.where('student_id', '=', 'student-1'),
  )
  .whereExists('assignment_to_student', q =>
    q.where('student_id', '=', 'student-2'),
  );

Current origin/main and the stacked base partially flip this. They scan the student-1 memberships, fetch two parent assignments, then probe the second membership predicate once per fetched parent, plus one final stream-exhaustion probe:

SELECT "assignment_id","student_id","created_at"
FROM "assignment_to_student"
WHERE "student_id" = ?
ORDER BY "assignment_id" asc, "student_id" asc

-- repeated 2 times
SELECT "id","teacher_id","archived_at","created_at"
FROM "assignment"
WHERE "id" = ? AND TRUE
ORDER BY "created_at" desc, "id" asc

-- repeated 3 times
SELECT "assignment_id","student_id","created_at"
FROM "assignment_to_student"
WHERE "assignment_id" = ? AND "student_id" = ?
ORDER BY "assignment_id" asc, "student_id" asc

This PR scans both child predicates first, intersects by assignment_id, and loads the one surviving parent. SQL work drops from 6 calls on current Zero to 3, and from 2,005 calls to 3 when compared with a parent-pinned baseline:

-- repeated 2 times, once per student bind value
SELECT "assignment_id","student_id","created_at"
FROM "assignment_to_student"
WHERE "student_id" = ?
ORDER BY "assignment_id" asc, "student_id" asc

SELECT "id","teacher_id","archived_at","created_at"
FROM "assignment"
WHERE "id" = ?
ORDER BY "created_at" desc, "id" asc

The scenario seed makes the correctness check concrete: student-1 is attached to assignments 101 and 102, student-2 is attached to assignments 102 and 1500, and the only assignment returned is 102.
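
The intersection step on that seed can be sketched in a few lines. `intersectByKey` is a hypothetical helper, not the operator in this PR, and it assumes each child scan yields at most one row per assignment_id (the uniqueness condition the intersection path requires).

```typescript
// Hypothetical helper: intersect the correlation keys returned by each child scan.
// Assumes every scan is unique on the key, so a plain set membership test is enough.
function intersectByKey(scans: number[][]): number[] {
  if (scans.length === 0) return [];
  return scans.reduce((acc, scan) => {
    const keys = new Set<number>(scan);
    return acc.filter(k => keys.has(k));
  });
}

// assignment_id values from the scenario seed described above.
const student1Scan = [101, 102];
const student2Scan = [102, 1500];

const survivors = intersectByKey([student1Scan, student2Scan]);
console.log(survivors); // only assignment 102 survives, so one parent is loaded
```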

The implementation stays conservative. Root union refuses start, limit, and root related rows. The intersection path refuses nested related rows, nested subqueries, cursors, limits, explicit flip: false, incompatible relationship shapes, and child scans that are not unique for the correlation key. Root union also strips condition-only relationship payloads before merging branches, so the union schema stays honest.
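
As a rough illustration of the intersection guards listed above, the check could be shaped like the predicate below. Every name here (`ExistsBranch`, its fields, `canIntersect`) is hypothetical and invented for this sketch; the real guards live in the PR's planner code and may be structured differently.

```typescript
// Hypothetical summary of one sibling EXISTS branch; field names are illustrative only.
type ExistsBranch = {
  relationship: string;
  hasNestedRelated: boolean;
  hasNestedSubquery: boolean;
  hasCursor: boolean;
  hasLimit: boolean;
  flip?: boolean; // an explicit `false` opts out
  uniqueOnCorrelationKey: boolean;
};

// Allow the intersection path only when every branch passes every guard.
function canIntersect(branches: ExistsBranch[]): boolean {
  const first = branches[0];
  if (first === undefined || branches.length < 2) return false;
  return branches.every(
    b =>
      b.relationship === first.relationship &&
      !b.hasNestedRelated &&
      !b.hasNestedSubquery &&
      !b.hasCursor &&
      !b.hasLimit &&
      b.flip !== false &&
      b.uniqueOnCorrelationKey,
  );
}

const plain: ExistsBranch = {
  relationship: 'assignment_to_student',
  hasNestedRelated: false,
  hasNestedSubquery: false,
  hasCursor: false,
  hasLimit: false,
  uniqueOnCorrelationKey: true,
};

const allowed = canIntersect([plain, {...plain}]);
const refusedByLimit = canIntersect([plain, {...plain, hasLimit: true}]);
const refusedByFlip = canIntersect([plain, {...plain, flip: false}]);
console.log(allowed, refusedByLimit, refusedByFlip); // true false false
```

Failing any single guard falls back to the existing plan, which is what keeps the rewrite conservative.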

The scenario harness now asserts optimized AST fragments, planner debug output, generated SQL, compacted SQL call counts, and returned rows. That gives us regression coverage for the thing we actually care about: same results, much less physical SQL work.
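
A compacted call count can be derived from a raw SQL log along these lines. This is a sketch under assumptions: `compactSqlLog` is a hypothetical helper, the SQL strings are abbreviated, and it assumes compaction groups consecutive calls with identical SQL text regardless of bind values, matching the "repeated N times" comments in the traces above.

```typescript
// Hypothetical helper: collapse consecutive identical statements into one entry with a count.
type CompactedCall = {sql: string; count: number};

function compactSqlLog(log: string[]): CompactedCall[] {
  const out: CompactedCall[] = [];
  for (const sql of log) {
    const last = out[out.length - 1];
    if (last !== undefined && last.sql === sql) {
      last.count++;
    } else {
      out.push({sql, count: 1});
    }
  }
  return out;
}

// Abbreviated statements for the sibling EXISTS trace: two child scans, one parent load.
const childScan = 'SELECT ... FROM "assignment_to_student" WHERE "student_id" = ?';
const parentLoad = 'SELECT ... FROM "assignment" WHERE "id" = ?';

const compacted = compactSqlLog([childScan, childScan, parentLoad]);
const totalCalls = compacted.reduce((sum, c) => sum + c.count, 0);
console.log(compacted.length, totalCalls); // 2 distinct entries, 3 calls in total
```

Asserting on both the compacted entries and the total keeps a regression that reintroduces per-parent probes from slipping through with unchanged result rows.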

Verified with:

npm --workspace=zql run format
npm --workspace=zqlite run format
npm --workspace=zql run check-types
npm --workspace=zqlite run check-types
npm --workspace=zql run lint
npm --workspace=zqlite run lint
npm --workspace=zql run test
npm --workspace=zqlite run test

I am keeping this as a draft because it is stacked, and because the main review question is architectural. The optimization is good database-engine behavior, similar in spirit to SQLite's OR-by-union and PostgreSQL's bitmap-style key combination, but these physical alternatives probably want to move out of builder.ts if Zero keeps growing this planner surface.

Why: makes the query optimizer rules easier to audit and extend.
* Replace parallel OR and AND merge bookkeeping with a shared column domain rewrite rule
* Add coverage for overlapping IN predicate unions

Why: pin the edge cases that can quietly reintroduce broad scans or dead child branches.
* Absorb stricter OR branches even when the shared predicate is not common to every branch
* Collapse impossible non-scalar EXISTS branches and add a scenario that proves the child scan disappears
* Extend idempotence coverage to generated correlated subquery filters

Karavil added 5 commits April 21, 2026 00:41

fix(zql): normalize planner filters before costing

b343f6e

test(zql): harden planner normalization edge cases

561ad13

♻️ refactor: model planner predicates as domains

7dd2095

fix(zql): execute sibling exists from selective roots

f3e4e47

Karavil changed the title ~~fix(zql): normalize planner filters before costing~~ fix(zql): execute sibling exists from selective roots Apr 21, 2026

Karavil added 8 commits April 21, 2026 10:40

docs(zql): explain selective root operators

dc9bfb5

docs(zqlite): explain planner scenario SQL

c1501ff

refactor(zql): clarify intersectable exists guards

0912711

docs(zql): clarify planner scenarios

d35795f

docs(zql): make optimizer comments generic

758822b

docs(zql): improve optimizer ascii diagrams

8d9b1d4

docs(zqlite): clarify planner scenario traces

96d4844

fix(zql): harden set operator planner rewrites

54a53ff

Karavil wants to merge 13 commits into codex/mixed-or-costing from codex/query-engine-normalization
