fix(zql): execute sibling exists from selective roots #1
Draft
Karavil wants to merge 13 commits into `codex/mixed-or-costing` from
Why: makes the query optimizer rules easier to audit and extend.

* Replace parallel OR and AND merge bookkeeping with a shared column domain rewrite rule
* Add coverage for overlapping IN predicate unions

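A minimal sketch of what a shared column-domain rule can look like, assuming a hypothetical `InPredicate` shape (`column IN (values)`) rather than Zero's actual AST types, and a hypothetical `mergeColumnDomains` helper: under OR the value sets for a column are unioned, under AND they are intersected, so both merges go through one code path.

```typescript
// Hypothetical simplified predicate: `column IN (values)`. An equality
// `column = v` is the single-value case. Zero's real AST is richer.
type Value = string | number;
type InPredicate = { column: string; values: readonly Value[] };

// Merge IN predicates that constrain the same column into one domain.
//   "or":  x IN A OR x IN B   ==>  x IN (A ∪ B)
//   "and": x IN A AND x IN B  ==>  x IN (A ∩ B)
// Predicates on different columns stay separate terms under the same
// connective. Under "and", an empty domain means the conjunction can never
// match, which a planner could use to drop the branch entirely.
function mergeColumnDomains(
  preds: readonly InPredicate[],
  mode: "or" | "and",
): InPredicate[] {
  const domains = new Map<string, Set<Value>>();
  for (const pred of preds) {
    const incoming = new Set<Value>(pred.values);
    const existing = domains.get(pred.column);
    if (existing === undefined) {
      domains.set(pred.column, incoming);
    } else if (mode === "or") {
      for (const v of incoming) existing.add(v);
    } else {
      // Copy before deleting so the iteration is easy to reason about.
      for (const v of [...existing]) {
        if (!incoming.has(v)) existing.delete(v);
      }
    }
  }
  return [...domains].map(([column, values]) => ({ column, values: [...values] }));
}

// Overlapping IN unions collapse into one list without duplicates.
const merged = mergeColumnDomains(
  [
    { column: "id", values: [1, 2, 3] },
    { column: "id", values: [2, 3, 4] },
  ],
  "or",
); // one `id` domain: 1, 2, 3, 4
```

Keeping one rule for both connectives means the overlap handling (de-duplication under OR, emptiness under AND) is written and tested once.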

Why: pin the edge cases that can quietly reintroduce broad scans or dead child branches.

* Absorb stricter OR branches even when the shared predicate is not common to every branch
* Collapse impossible non-scalar EXISTS branches and add a scenario that proves the child scan disappears
* Extend idempotence coverage to generated correlated subquery filters

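Absorbing stricter OR branches is the Boolean absorption law `A OR (A AND B) = A`, applied pairwise. The sketch below models each `OR` branch as a set of conjunct keys; `Branch` and `absorbStricterBranches` are hypothetical names, and a real planner would compare structured predicates rather than strings.

```typescript
// Hypothetical representation: an OR of branches, each branch an AND of
// atomic predicate keys such as "teacher_id = 1".
type Branch = ReadonlySet<string>;

function isSubset(a: Branch, b: Branch): boolean {
  for (const x of a) if (!b.has(x)) return false;
  return true;
}

// A branch is dropped when some other branch's conjuncts are a subset of
// its own: the other branch is looser and already matches every row the
// stricter one matches. Because the check is pairwise, the shared predicate
// does not need to appear in every branch.
function absorbStricterBranches(branches: readonly Branch[]): Branch[] {
  const kept: Branch[] = [];
  branches.forEach((candidate, i) => {
    const absorbed = branches.some((other, j) => {
      if (j === i || !isSubset(other, candidate)) return false;
      // Strictly looser branch absorbs; for identical branches keep the first.
      return other.size < candidate.size || j < i;
    });
    if (!absorbed) kept.push(candidate);
  });
  return kept;
}

// Keeps the first and third branches; the second is absorbed by the first.
const simplified = absorbStricterBranches([
  new Set(["teacher_id = 1"]),
  new Set(["teacher_id = 1", "archived = false"]),
  new Set(["student_id = 'student-1'"]),
]);
```

Here `a OR (a AND b) OR c` reduces to `a OR c` even though `a` is absent from the `c` branch.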
Stacked on rocicorp#5851.
This PR makes selective `whereExists` plans executable without app code manually setting `{flip: true}`. The previous stack teaches the planner to cost mixed `OR` branches more honestly. This PR adds the missing physical shapes: a root union for independent `OR` branches, and a child-key intersection for sibling `EXISTS` branches on the same relationship.

I regenerated the SQL below against current
`origin/main` and this branch with the same hypothetical education app schema inspired by goblinsapp.com. The data is synthetic, but the shape is the one that mattered for us: 2,000 assignments, sparse `assignment_to_student` rows, and permission-style filters expressed as `EXISTS`. There are no personal names or real customer rows.

Current Zero already handles a simple single
`EXISTS` well. This is not where the PR wins:

Current `origin/main` and this branch both generate the child-root shape:

The first real win is a mixed parent predicate and child
`EXISTS` under `OR`.

Current `origin/main` chooses a semi-join for the membership branch. On the 2,000-assignment scenario, that is one assignment scan plus 2,003 membership probes:

With this stack, the membership branch starts from the student index and fetches only matching parents. The SQL work drops from 2,004 SQL calls to 5:
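To make the call-count difference concrete, here is a toy model under stated assumptions: a hypothetical in-memory `CountingDb` in which every method counts as one SQL call, a semi-join plan that scans every assignment and probes membership per row, and a root-union plan in which each `OR` branch starts from its own selective access path. It is not Zero's builder code, and its counts come from its own toy data, not from the PR's scenario.

```typescript
type Assignment = { id: number; teacherId: number };
type Membership = { assignmentId: number; studentId: string };

// Hypothetical in-memory stand-in for the database. Every method counts as
// one "SQL call", loosely mirroring the compacted call counts above.
class CountingDb {
  calls = 0;
  private readonly assignments: readonly Assignment[];
  private readonly memberships: readonly Membership[];

  constructor(assignments: readonly Assignment[], memberships: readonly Membership[]) {
    this.assignments = assignments;
    this.memberships = memberships;
  }

  scanAssignments(): Assignment[] {
    this.calls++;
    return [...this.assignments];
  }

  assignmentsByTeacher(teacherId: number): Assignment[] {
    this.calls++;
    return this.assignments.filter((a) => a.teacherId === teacherId);
  }

  assignmentById(id: number): Assignment | undefined {
    this.calls++;
    return this.assignments.find((a) => a.id === id);
  }

  membershipsByStudent(studentId: string): Membership[] {
    this.calls++;
    return this.memberships.filter((m) => m.studentId === studentId);
  }

  hasMembership(assignmentId: number, studentId: string): boolean {
    this.calls++;
    return this.memberships.some(
      (m) => m.assignmentId === assignmentId && m.studentId === studentId,
    );
  }
}

// teacher_id = ? OR EXISTS(membership) as a semi-join: scan every parent and
// probe the child table for each row the parent predicate does not match.
function semiJoinPlan(db: CountingDb, teacherId: number, studentId: string): number[] {
  const out: number[] = [];
  for (const a of db.scanAssignments()) {
    if (a.teacherId === teacherId || db.hasMembership(a.id, studentId)) out.push(a.id);
  }
  return out.sort((x, y) => x - y);
}

// Same predicate as a root union: each branch uses its own index, and rows
// are de-duplicated by primary key before being returned.
function rootUnionPlan(db: CountingDb, teacherId: number, studentId: string): number[] {
  const ids = new Set<number>();
  for (const a of db.assignmentsByTeacher(teacherId)) ids.add(a.id);
  for (const m of db.membershipsByStudent(studentId)) {
    if (ids.has(m.assignmentId)) continue; // parent branch already produced it
    const parent = db.assignmentById(m.assignmentId);
    if (parent !== undefined) ids.add(parent.id);
  }
  return [...ids].sort((x, y) => x - y);
}
```

Both plans return the same rows; the union plan's work scales with the matching rows instead of the parent table, which is the effect the SQL counts above describe.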
The second win is root union for a plain parent `OR EXISTS` shape.

Current
`origin/main` and the stacked base both use the child index for the `EXISTS`, but the parent branch is still a full assignment scan:

This PR turns that into two selective roots. In the scenario seed, the old parent branch reads 2,000 assignments. The new parent branch reads the 20 assignments with `teacher_id = 1`.

The third win is sibling
`EXISTS` on the same relationship.

Current `origin/main` and the stacked base partially flip this. They scan `student-1`, fetch two parent assignments, then probe the second membership predicate once per fetched parent, plus the final stream exhaustion probe:

This PR scans both child predicates first, intersects by
`assignment_id`, and loads the one surviving parent. The SQL work drops from 6 calls to 3 compared to current Zero, and from 2,005 calls to 3 compared to a parent-pinned baseline:

The scenario seed makes the correctness check concrete: `student-1` is attached to assignments 101 and 102, `student-2` is attached to assignments 102 and 1500, and the only assignment returned is 102.

The implementation stays conservative. Root union refuses
`start`, `limit`, and root `related` rows. The intersection path refuses nested related rows, nested subqueries, cursors, limits, explicit `flip: false`, incompatible relationship shapes, and child scans that are not unique for the correlation key. Root union also strips condition-only relationship payloads before merging branches, so the union schema stays honest.

The scenario harness now asserts optimized AST fragments, planner debug output, generated SQL, compacted SQL call counts, and returned rows. That gives us regression coverage for the thing we actually care about: same results, much less physical SQL work.
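The sibling-`EXISTS` intersection can be sketched in the same toy style. This assumes hypothetical helpers (`canIntersect`, `intersectChildKeys`, `siblingExistsByIntersection`) and callback-based data access; the guard covers only part of the refusal list above. With the seed rows from the description (`student-1` on 101 and 102, `student-2` on 102 and 1500), it scans each child predicate once, intersects by `assignment_id`, and loads only assignment 102.

```typescript
type MembershipRow = { assignmentId: number; studentId: string };

// Hypothetical guard mirroring part of the refusal list: the intersection
// path only runs when none of these complicating features apply.
type IntersectionShape = {
  limit?: number;
  cursor?: unknown;
  flip?: boolean;
  hasNestedSubquery: boolean;
  childScanUniqueOnKey: boolean;
};

function canIntersect(shape: IntersectionShape): boolean {
  return (
    shape.limit === undefined &&
    shape.cursor === undefined &&
    shape.flip !== false &&
    !shape.hasNestedSubquery &&
    shape.childScanUniqueOnKey
  );
}

// Intersect the correlation keys (assignment_id) returned by each sibling
// EXISTS child scan. Requires at least one child scan.
function intersectChildKeys(childScans: readonly (readonly MembershipRow[])[]): number[] {
  const [first, ...rest] = childScans;
  if (first === undefined) throw new Error("need at least one child scan");
  let keys = new Set(first.map((m) => m.assignmentId));
  for (const scan of rest) {
    const next = new Set(scan.map((m) => m.assignmentId));
    keys = new Set([...keys].filter((k) => next.has(k)));
  }
  return [...keys].sort((a, b) => a - b);
}

// Scan each child predicate once, intersect, then load only surviving parents.
function siblingExistsByIntersection(
  studentIds: readonly string[],
  scanByStudent: (studentId: string) => MembershipRow[],
  loadAssignments: (ids: readonly number[]) => { id: number }[],
): { id: number }[] {
  const keys = intersectChildKeys(studentIds.map(scanByStudent));
  return keys.length === 0 ? [] : loadAssignments(keys);
}

// Seed rows as described above; `calls` counts child scans plus the parent load.
const seed: MembershipRow[] = [
  { assignmentId: 101, studentId: "student-1" },
  { assignmentId: 102, studentId: "student-1" },
  { assignmentId: 102, studentId: "student-2" },
  { assignmentId: 1500, studentId: "student-2" },
];
let calls = 0;
const rows = siblingExistsByIntersection(
  ["student-1", "student-2"],
  (studentId) => {
    calls++;
    return seed.filter((m) => m.studentId === studentId);
  },
  (ids) => {
    calls++;
    return ids.map((id) => ({ id }));
  },
);
```

Parents are fetched only after the key sets agree, so a sibling predicate that eliminates most candidates never triggers a per-parent probe.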
Verified with:
I am keeping this as a draft because it is stacked, and because the main review question is architectural. The optimization is good database-engine behavior, similar in spirit to SQLite's OR-by-union and PostgreSQL's bitmap-style key combination, but these physical alternatives probably want to move out of `builder.ts` if Zero keeps growing this planner surface.