[codex] Document backend determinism contract#155
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Code Review
This pull request establishes a formal Search Determinism Contract and updates the codebase to guarantee deterministic search results across different backends and query paths. Notably, it modifies the TopK collector to break score ties using global row IDs instead of local candidate-list positions during subset scans, and updates candidate prefilters to use composite-key sorting. Feedback on this PR highlights a potential edge case in TopK::finalize_into where duplicate candidate IDs with identical scores and tie keys could still be ordered nondeterministically due to the unstable sort; adding the local index as a final tie-breaker is recommended to ensure absolute determinism.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
Review Summary by QodoDocument backend determinism contract and fix subset tie-breaking
WalkthroughsDescription• Document backend determinism contract with golden fixtures • Fix RankQuant::search_asymmetric_subset tie-breaking to use global row IDs • Update TopK to support caller-supplied tie keys for subset scans • Add comprehensive determinism tests covering full search, subset, and batched paths Diagramflowchart LR
A["TopK struct"] -->|"add tie_keys field"| B["Support configurable tie keys"]
B -->|"used by"| C["search_asymmetric_subset"]
C -->|"now breaks ties by"| D["Global row ID ascending"]
D -->|"matches"| E["Full search policy"]
E -->|"documented in"| F["determinism.md"]
F -->|"verified by"| G["Golden fixtures"]
File Changes1. docs/determinism.md
|
Code Review by Qodo
Context used 1. Tolerance value missing in docs
|
There was a problem hiding this comment.
Pull request overview
This PR documents and enforces a cross-backend determinism/ordering contract for ordvec search APIs, and aligns RankQuant::search_asymmetric_subset tie-breaking with the same global row-id ordering used by full-index search, the C ABI, and Python bindings.
Changes:
- Added
docs/determinism.mddescribing score/order/tie-key guarantees, backend tolerance scope, batched-vs-single parity, and compatibility-note requirements. - Changed
RankQuant::search_asymmetric_subsetto break equal-score ties by global row ID ascending (not candidate-list position) by introducing tie-key support in the sharedTopKcollector. - Added deterministic golden tests/fixtures (Rust + Python) covering ties, subset ordering, batched parity, and empty/zero-
kshapes; updated older docs/comments to point at the new contract.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| tests/redteam_beta.rs | Clarifies why set-equality is used for random/tolerance-boundary checks and points to determinism fixtures for ordering. |
| tests/index/quant.rs | Same clarification for quant reference tests; avoids asserting strict order where tolerance can affect near-ties. |
| tests/determinism_contract.rs | New golden tests pinning deterministic ordering/tie behavior across primitives and edge-case shapes. |
| src/util.rs | Extends TopK to support caller-supplied tie keys, enabling subset scans to order by global row IDs while returning local indices for mapping. |
| src/quant.rs | Updates search_asymmetric_subset to use TopK::new_with_tie_keys(...) and documents the global-row tie policy. |
| README.md | Adds link to the determinism contract doc. |
| ordvec-python/tests/test_sign_bitmap.py | Strengthens parity assertion: batched candidate selection must match single-query results including ties. |
| ordvec-python/tests/test_rank_quant.py | Requires subset search ordering to match full search (and adds explicit subset tie-order test). |
| ordvec-ffi/src/lib.rs | Updates C ABI commentary to reflect new core subset tie behavior and normalization expectations. |
| docs/RANK_MODES.md | Removes over-strong claims about exact scalar/SIMD equivalence; points readers to determinism contract. |
| docs/FOLLOWUP_BODY_KERNEL_TIE_BREAK.md | Marks prior tie-break follow-up as resolved and references the new determinism contract as the source of truth. |
| docs/determinism.md | New compatibility contract document for ordering, ties, tolerance scope, and compatibility-note triggers. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
225a8c1 to
5504784
Compare
Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
|
Addressed Qodo's determinism tolerance-doc finding in d7fba4ab9fd2be7194eb9d4348c935d11b6b69d0.
Local validation:
The inline Qodo thread has also been replied to and resolved. |
Summary
Closes #127.
docs/determinism.mdcovering score ordering, tie keys, backend scope, scalar/SIMD tolerance language, batched/single-query parity, FastScan's approximate status, and compatibility-note requirements.RankQuant::search_asymmetric_subsetequal-score ties from local candidate position to global row ID, matching full search, candidate prefilters, Python, and C ABI ordering.Compatibility Note
This PR intentionally changes
RankQuant::search_asymmetric_subsetbehavior for unsorted equal-score candidate lists: ties now resolve by global row ID ascending, not candidate-list position. Duplicate candidate IDs are still accepted, scored separately, and may produce duplicate hits.Scope Notes
This stays on
origin/mainand does not stack on PR #153 or PR #154. It does not change manifest verification, persisted format bytes, release policy, or storage semantics.RankQuantFastscanremains explicitly separate as an approximate hidden pre-ranker rather than an exact RankQuant parity surface.Validation
cargo fmt --checkcargo test --test determinism_contractcargo test --test indexcargo test --test redteam_betacargo test --test redteam_deltacargo test -p ordvec-ffiVIRTUAL_ENV=/home/ndspence/GitHub/ordvec/ordvec-python/.venv PATH=/home/ndspence/GitHub/ordvec/ordvec-python/.venv/bin:$PATH maturin develop -m ordvec-python/Cargo.tomlordvec-python/.venv/bin/python -m pytest ordvec-python/tests/test_rank_quant.py::test_search_asymmetric_subset_matches_full_when_candidates_eq_all ordvec-python/tests/test_rank_quant.py::test_search_asymmetric_subset_ties_use_global_row_ids ordvec-python/tests/test_sign_bitmap.py::test_batched_matches_scalar_for_each_rowcargo test -p ordvec --all-featurescargo test --workspace --exclude ordvec-python --all-featurescargo check -p ordvec-python --all-featurescargo clippy --workspace --all-targets --all-features -- -D warningscargo deny checkgit diff --checkAdversarial Review
A read-only adversarial review found two documentation blockers: the current subset tie-key behavior change needed an explicit compatibility note, and
docs/RANK_MODES.mdoverpromised exact SIMD/scalar top-k equivalence. Both were fixed before publishing this PR.