Project-Navi · Navi Bot (project-navi-bot) · Jun 3, 2026 · Jun 2, 2026 · Jun 3, 2026 · Jun 3, 2026
@@ -156,6 +156,7 @@ The runtime dependency floor is `numpy>=2.2`.
   [`docs/ALTERNATIVES_CONSIDERED.md`](https://github.com/Fieldnote-Echo/ordvec/blob/main/docs/ALTERNATIVES_CONSIDERED.md)
 - **Index-file trust model:**
   [`docs/INDEX_PROVENANCE.md`](https://github.com/Fieldnote-Echo/ordvec/blob/main/docs/INDEX_PROVENANCE.md),
+  [`docs/determinism.md`](https://github.com/Fieldnote-Echo/ordvec/blob/main/docs/determinism.md),
   [`THREAT_MODEL.md`](https://github.com/Fieldnote-Echo/ordvec/blob/main/THREAT_MODEL.md)
 - **Repo-local manifest verifier, C ABI, and Go wrapper:**
   available from the full GitHub checkout. These sidecars are not part of the

@@ -1,17 +1,11 @@
-# Follow-up: deterministic tie-breaking for body bitmap candidate selection
+# Resolved: deterministic tie-breaking for body bitmap candidate selection
 
-`Bitmap::top_m_candidates` and `top_m_candidates_batched`
-(in `src/bitmap.rs`) currently partition on
-bitmap overlap score alone. Boundary ties are not rare — overlap
-scores are small integers (`0..n_top`, e.g. `0..256`), so multiple
-docs frequently share the cutoff score, and `select_nth_unstable_by`
-may then choose different equal-scored docs at the boundary across
-runs or dispatch paths.
+`Bitmap::top_m_candidates` and `top_m_candidates_batched` now partition and
+sort by the composite key `(score desc, doc_id asc)`. Boundary ties are not
+rare because overlap scores are small integers (`0..n_top`, e.g. `0..256`), so
+the candidate set at the cutoff must be fully determined by score and row ID.
 
-**Fix**: add composite-key ordering `(score desc, doc_id asc)` to
-both the partition predicate (`select_nth_unstable_by`) and the
-post-partition sort (`sort_unstable_by`), so the candidate set at any
-given M is fully determined by `(score, doc_id)`.
+The fixed comparator is:
 
 ```rust
 let mut cmp = |&a: &u32, &b: &u32| {
@@ -23,9 +17,6 @@ idx.select_nth_unstable_by(m_eff - 1, &mut cmp);
 idx[..m_eff].sort_unstable_by(&mut cmp);
 ```
 
-**Keep it as a standalone change.** Rolling the determinism fix into
-an unrelated benchmark or kernel change would muddy attribution — if
-recall/latency numbers move, it should be clear whether the kernel
-changed or only the tie-break at the candidate-set boundary changed.
-The fix is behaviour-preserving on score ordering and only pins the
-boundary, so it is safe to land on its own.
+The broader search-output policy is now tracked in
+[`determinism.md`](determinism.md). Future changes to golden row IDs, tie keys,
+or duplicate-candidate behavior need an explicit compatibility note.
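
The hunk above shows only part of the comparator. As a self-contained illustration of the composite key `(score desc, doc_id asc)`, the following sketch applies the same partition-then-sort pattern to a plain slice of overlap scores. The function name `top_m_by_score_then_id` and its signature are assumptions made for this example; they are not the signatures used in `src/bitmap.rs`.

```rust
use std::cmp::Ordering;

/// Hypothetical helper (not the ordvec API): returns the top-`m` row IDs for
/// `scores`, ordered by score descending, then row ID ascending.
fn top_m_by_score_then_id(scores: &[u32], m: usize) -> Vec<u32> {
    // Clamp m to the number of rows, and return early for an empty result.
    let m_eff = m.min(scores.len());
    if m_eff == 0 {
        return Vec::new();
    }
    let mut idx: Vec<u32> = (0..scores.len() as u32).collect();
    // Composite key: higher score first; on equal scores, lower row ID first.
    // Row IDs are unique, so this is a strict total order over `idx`.
    let cmp = |a: &u32, b: &u32| -> Ordering {
        scores[*b as usize]
            .cmp(&scores[*a as usize])
            .then(a.cmp(b))
    };
    // Partition: the first m_eff slots now hold the m_eff best keys.
    idx.select_nth_unstable_by(m_eff - 1, cmp);
    idx.truncate(m_eff);
    // Sort only the selected prefix into its final order.
    idx.sort_unstable_by(cmp);
    idx
}
```

Because the key never compares equal for two distinct rows, the unstable partition and sort cannot reorder ties, so the selected set and its order depend only on the scores and row IDs.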
@@ -328,10 +328,10 @@ facts qualify this:
   rank-mode README recommends, and where the structural prior pays
   off.
 - **The asymmetric AVX-512 kernel is an exact packed scan, not an ANN
-  approximation.** It returns identical top-k to the scalar RankQuant
-  scorer and agrees within 1e-4 on scores (verified by
-  `rankquant_asymmetric_matches_reference_b{1,2,4}` in
-  `tests/index/quant.rs`).
+  approximation.** It is checked against the scalar RankQuant scorer using
+  score tolerances and deterministic golden tie fixtures (see
+  [`determinism.md`](determinism.md)); the randomized reference tests do not
+  pin exact top-k order where scores sit within the tolerance of each other.
 
 The byte-LUT scorer remains in the codebase as a labelled reference
 path (`ordvec::search_asymmetric_byte_lut`,
@@ -435,6 +435,9 @@ single-pass b=2 fast path; it supports `add`/`search` but not
 bilinear bucket-overlap decomposition and is reachable only behind the
 `experimental` feature.
 
+Search result ordering, backend score-equivalence expectations, tie keys, and
+empty-result shapes are specified in [`determinism.md`](determinism.md).
+
 ## Test coverage
 
 `cargo test --lib` — unit tests for the primitives in

@@ -0,0 +1,86 @@
+# Search Determinism Contract
+
+This document states the compatibility contract for ordvec search output:
+scores, ordering, tie handling, backend dispatch, and empty-result shape. It
+covers the primitive retrieval surface only. It does not define distributed
+merge order, replication, storage manifests, or deployment policy.
+
+## Global Ordering Rule
+
+For public top-k search results, ordvec orders hits by:
+
+1. score descending;
+2. row ID ascending when scores compare equal.
+
+The row ID is the internal zero-based insertion row. Subset APIs receive row
+IDs from the caller and return the same global row IDs. Duplicate candidate IDs
+are scored as duplicate candidate entries and may produce duplicate hits.
+
+`k` is clamped to the size of the search space before result buffers are
+allocated. For a full-index search, that size is the number of indexed rows;
+for a subset search, it is the candidate-list length. If the effective `k` is
+zero, or the search space is empty, search returns an empty result shape rather than padded sentinel hits.
+
+## Backend Scope
+
+Backend selection must not change the documented ordering rule. Exact integer
+popcount primitives are bit-exact across scalar, AVX-512, aarch64 NEON, and
+wasm `simd128` implementations. Floating-point score equivalence uses an
+absolute tolerance of `1e-4` and no relative tolerance (`rtol = 0`) unless a
+row in the Primitive Contracts table states that the score is integer-exact.
+Some tests use tighter tolerances for specific scalar helper comparisons, but
+`1e-4` is the public cross-backend, cross-architecture tolerance. Intentional changes
+to that tolerance or to golden top-k output are compatibility-affecting and
+must be called out in the PR and release notes.
+
+Query-level parallelism may change scheduling, but each query is scored and
+finalized independently. Batched APIs must match the corresponding single-query
+API for the same query rows, modulo the primitive-specific tolerance stated
+below. Floating-point comparison tolerances apply only to score equivalence;
+the public hit order still follows the global ordering rule above.
+
+## Primitive Contracts
+
+| Surface | Score contract | Tie key | Backend contract |
+| --- | --- | --- | --- |
+| `Rank::search` | Normalized Spearman-style rank cosine; floating scores are tolerance-based with absolute tolerance `1e-4`, `rtol = 0`. | Global row ID ascending. | Fixed scalar arithmetic per row; query parallelism does not affect per-query output. |
+| `Rank::search_asymmetric` | Float query against stored ranks; floating scores are tolerance-based with absolute tolerance `1e-4`, `rtol = 0`. | Global row ID ascending. | Fixed scalar arithmetic per row; query parallelism does not affect per-query output. |
+| `RankQuant::search` | Symmetric bucketed-rank score; floating scores are tolerance-based with absolute tolerance `1e-4`, `rtol = 0`. | Global row ID ascending. | Scalar packed-byte LUT path; query parallelism does not affect per-query output. |
+| `RankQuant::search_asymmetric` | Float query against stored buckets; floating scores are tolerance-based with absolute tolerance `1e-4`, `rtol = 0`. | Global row ID ascending. | AVX-512, AVX2, and scalar-LUT dispatch must agree with the scalar reference within the public tolerance and preserve top-k order for the golden fixtures. |
+| `RankQuant::search_asymmetric_subset` | Same score as `RankQuant::search_asymmetric`, restricted to caller-supplied candidates; floating scores use the same `1e-4` absolute tolerance and `rtol = 0`. | Global row ID ascending, not candidate-list position. Duplicate candidate IDs remain duplicate entries. | Uses the same AVX-512, AVX2, or scalar dispatch as full asymmetric search over a gathered scratch buffer. |
+| `Bitmap::search` | Exact `popcount(Q AND D)` represented as `f32`; score is integer-exact, not tolerance-based. | Global row ID ascending. | Popcount scores are integer-exact across scalar and SIMD implementations. |
+| `Bitmap::top_m_candidates` | Exact `popcount(Q AND D)` candidate ordering; score key is integer-exact, not tolerance-based. | Global row ID ascending. | Single-query and batched candidate APIs must return the same ordered candidates. |
+| `Bitmap::search_subset` | Exact subset `popcount(Q AND D)` represented as `f32`; score is integer-exact, not tolerance-based. | Global row ID ascending. Duplicate candidate IDs remain duplicate entries. | Subset score kernels must agree bit-exactly with scalar popcount. |
+| `SignBitmap::top_m_candidates` | Lowest Hamming distance, equivalently highest sign agreement; score key is integer-exact, not tolerance-based. | Global row ID ascending. | Single-query and batched candidate APIs must return the same ordered candidates. |
+| `SignBitmap::score_all` | Dense sign-agreement counts aligned by row ID; scores are `u32` integer-exact, not tolerance-based. | Not a top-k API. | Popcount scores are integer-exact across scalar and SIMD implementations. |
+
+## FastScan
+
+`RankQuantFastscan` is a hidden, optional b=2 pre-ranker. It is deterministic
+for a fixed index, query, and backend dispatch, and its scalar and AVX-512
+FastScan kernels operate on the same quantized LUT inputs. It is not
+score-equivalent to exact `RankQuant::search_asymmetric`: the global 8-bit LUT
+quantization is intentional and can change scores or boundary ordering. Callers
+that need exact RankQuant scores should use `RankQuant::search_asymmetric` or
+`RankQuant::search_asymmetric_subset`.
+
+## Compatibility Notes
+
+Intentional changes to any of these are compatibility-affecting:
+
+- golden top-k row IDs;
+- tie keys or duplicate-candidate behavior;
+- empty-result or `k` clamping shape;
+- scalar/SIMD score tolerance;
+- whether an API is exact or approximate;
+- whether a backend is covered by this contract.
+
+Such changes need a compatibility note in the PR and release notes. Performance
+changes that preserve the same scores, row ordering, tie keys, and empty-result
+shape are not search-contract breaks.
+
+Compatibility note for this contract PR: `RankQuant::search_asymmetric_subset`
+now breaks equal-score ties by global row ID instead of local candidate-list
+position. That matches full-index search, C ABI hit ordering, Python binding
+ordering, and the candidate prefilters. Duplicate candidate IDs are still
+scored as duplicate entries and may still produce duplicate hits.
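
Two rules in the contract above reduce to one-line checks: the score tolerance (absolute `1e-4`, `rtol = 0`) and the clamping of `k` to the search-space size. The sketch below shows both under stated assumptions. `ATOL`, `scores_equivalent`, and `effective_k` are hypothetical names for this example and are not taken from the ordvec test suite or API.

```rust
/// Public cross-backend tolerance quoted in the contract (absolute, rtol = 0).
const ATOL: f32 = 1e-4;

/// Hypothetical helper: absolute-tolerance score comparison with no relative
/// term, so the allowed difference does not grow with score magnitude.
fn scores_equivalent(a: f32, b: f32) -> bool {
    (a - b).abs() <= ATOL
}

/// Hypothetical helper: clamp the requested `k` to the search-space size
/// (indexed rows for full search, candidate-list length for subset search).
fn effective_k(k: usize, search_space: usize) -> usize {
    k.min(search_space)
}
```

With `rtol = 0`, two scores near `1000.0` that differ by `0.01` are not equivalent, even though their relative difference is small. An effective `k` of zero corresponds to the empty result shape described in the contract.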
@@ -833,11 +833,9 @@ pub unsafe extern "C" fn ordvec_index_search(
             (LoadedIndex::RankQuant(index), Some(rows)) => {
                 // Ask the core for every candidate score, then normalize by the
                 // ABI's global row-id tie policy before truncating. The core
-                // subset helper breaks ties by candidate position before
-                // mapping back to global row IDs, so requesting only k could
-                // drop a boundary-tied lower row ID from an unsorted candidate
-                // list. Materializing all candidates preserves the ABI ordering
-                // contract until core exposes a global-row top-k scorer.
+                // subset helper uses global row IDs as score-tie keys; keeping
+                // the ABI normalization centralized preserves duplicate and
+                // boundary handling for caller-supplied candidate lists.
                 let (scores, indices) =
                     index.search_asymmetric_subset(validation.query, rows, rows.len());
                 normalize_global_order(scores, indices, validation.required_hits)
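
For illustration, ordering every candidate by the global row-ID tie policy and then truncating can be sketched as below. This is not the body of `normalize_global_order`, which the diff does not show; `order_by_global_row` is a hypothetical stand-in, and using `f32::total_cmp` for the score comparison is an assumption of this sketch (it orders `-0.0` below `+0.0`, which the real code may or may not do).

```rust
/// Hypothetical stand-in (not the ABI helper): order (score, global row ID)
/// pairs by score descending, then row ID ascending, and keep the first `k`.
fn order_by_global_row(scores: &[f32], row_ids: &[u32], k: usize) -> (Vec<f32>, Vec<u32>) {
    assert_eq!(scores.len(), row_ids.len());
    let mut hits: Vec<(f32, u32)> = scores
        .iter()
        .copied()
        .zip(row_ids.iter().copied())
        .collect();
    // Assumption: total_cmp supplies a total order over f32 scores.
    // Ties on score fall back to the global row ID, not the list position.
    hits.sort_by(|a, b| b.0.total_cmp(&a.0).then(a.1.cmp(&b.1)));
    // Truncate only after the full ordering, so a boundary-tied lower row ID
    // is never dropped. Duplicate row IDs stay as separate entries.
    hits.truncate(k.min(hits.len()));
    hits.into_iter().unzip()
}
```

Applied to four equal scores with row IDs `[9, 3, 7, 1]` and `k = 2`, this keeps rows `1` and `3`, which is the outcome the new Python tie test in this PR asserts for the subset API.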

@@ -299,8 +299,7 @@ def test_search_asymmetric_subset_matches_full_when_candidates_eq_all():
     # When the candidate set is every doc, the subset path must agree
     # with full `search_asymmetric` on the top-k. Both use the
     # asymmetric kernel; the subset path just iterates the candidate
-    # list instead of all N docs. (Allow set equality — ties may
-    # permute within the same scoring tier.)
+    # list instead of all N docs.
     vectors = unit_vectors(40, 128, seed=0)
     idx = RankQuant(dim=128, bits=2)
     idx.add(vectors)
@@ -310,7 +309,21 @@ def test_search_asymmetric_subset_matches_full_when_candidates_eq_all():
     _, subset_ids = idx.search_asymmetric_subset(query, candidates, k=10)
 
     _, full_ids = idx.search_asymmetric(query[None, :], k=10)
-    assert set(int(i) for i in subset_ids) == set(int(i) for i in full_ids[0])
+    np.testing.assert_array_equal(subset_ids, full_ids[0])
+
+
+def test_search_asymmetric_subset_ties_use_global_row_ids():
+    vectors = np.ones((12, 64), dtype=np.float32)
+    idx = RankQuant(dim=64, bits=2)
+    idx.add(vectors)
+
+    candidates = np.array([9, 3, 7, 1], dtype=np.uint32)
+    scores, ids = idx.search_asymmetric_subset(
+        np.zeros(64, dtype=np.float32), candidates, k=2
+    )
+
+    np.testing.assert_array_equal(ids, np.array([1, 3], dtype=np.int64))
+    np.testing.assert_array_equal(scores, np.array([0.0, 0.0], dtype=np.float32))
 
 
 def test_search_asymmetric_subset_k_caps_at_candidate_count():

@@ -80,22 +80,16 @@ def test_top_m_candidates_batched_shape():
 
 
 def test_batched_matches_scalar_for_each_row():
-    # The batched AVX-512 VPOPCNTDQ kernel must agree with the scalar
-    # path on the same query at top-1; we check the leading match for a
-    # small batch (boundary ties at deeper ranks may diverge — see the
-    # body-kernel tie-break follow-up, separate from sign-bitmap).
+    # The batched kernel must agree with the single-query path for the full
+    # ordered top-m row, including boundary ties.
     idx = SignBitmap(dim=128)
     idx.add(unit_vectors(60, 128, seed=0))
     queries = unit_vectors(6, 128, seed=99)
 
     batched = idx.top_m_candidates_batched(queries, m=5)
     for i in range(6):
         scalar = idx.top_m_candidates(queries[i], m=5)
-        # Top-1 must agree exactly across both code paths.
-        assert int(batched[i, 0]) == int(scalar[0]), (
-            f"batched vs scalar disagree on top-1 for query {i}: "
-            f"batched={batched[i, 0]} scalar={scalar[0]}"
-        )
+        np.testing.assert_array_equal(batched[i], scalar)
 
 
 def test_empty_batch_returns_consistent_column_count():

@@ -536,7 +536,9 @@ impl RankQuant {
     /// subset (e.g., the top-M from a bitmap probe). Returns
     /// `(scores, indices)`: the top-`k` scores and their corresponding
     /// **global** doc IDs (the local candidate positions are mapped back
-    /// to global IDs before returning).
+    /// to global IDs before returning). Results are ordered by score
+    /// descending, then global row ID ascending, matching the full-index
+    /// search tie policy even when `candidates` is unsorted.
     ///
     /// Uses the same AVX-512 → AVX2 → scalar dispatch as
     /// [`Self::search_asymmetric`] and the same centre-drop math, just
@@ -606,7 +608,7 @@ impl RankQuant {
         // never reaches a kernel that would drop its tail chunk.
         #[cfg_attr(not(target_arch = "x86_64"), allow(unused_variables))]
         let simd_tier = select_simd_tier(dim, bits);
-        let mut top = TopK::new(k_eff);
+        let mut top = TopK::new_with_tie_keys(k_eff, candidates);
         #[cfg_attr(not(target_arch = "x86_64"), allow(unused_mut))]
         let mut centre_drop_used = false;
         #[cfg(target_arch = "x86_64")]
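
The diff does not show `TopK::new_with_tie_keys`, so its internals are unknown here. As a rough sketch of what a bounded top-k accumulator keyed on `(score desc, global row ID asc)` could look like, the example below keeps the current worst hit at the top of a max-heap and replaces it whenever a better hit arrives. `Hit` and `top_k_with_tie_keys` are hypothetical names, and the use of `BinaryHeap` and `f32::total_cmp` is an assumption of this sketch rather than a description of ordvec's `TopK`.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Hypothetical hit record: a score plus the global row ID used as tie key.
#[derive(Clone, Copy, Debug)]
struct Hit {
    score: f32,
    row: u32,
}

impl Ord for Hit {
    // "Greater" means "worse": lower score, or equal score and higher row ID.
    fn cmp(&self, other: &Self) -> Ordering {
        other
            .score
            .total_cmp(&self.score)
            .then(self.row.cmp(&other.row))
    }
}
impl PartialOrd for Hit {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl PartialEq for Hit {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for Hit {}

/// Hypothetical helper: stream candidates through a heap of at most `k` hits.
/// `scores[i]` is the score of global row `candidates[i]`.
fn top_k_with_tie_keys(scores: &[f32], candidates: &[u32], k: usize) -> Vec<Hit> {
    assert_eq!(scores.len(), candidates.len());
    let k_eff = k.min(candidates.len());
    if k_eff == 0 {
        return Vec::new();
    }
    let mut heap: BinaryHeap<Hit> = BinaryHeap::with_capacity(k_eff);
    for (&score, &row) in scores.iter().zip(candidates) {
        let hit = Hit { score, row };
        if heap.len() < k_eff {
            heap.push(hit);
        } else if let Some(mut worst) = heap.peek_mut() {
            // Replace the worst kept hit only if the new hit is strictly better.
            if hit < *worst {
                *worst = hit;
            }
        }
    }
    // Ascending order under `Ord` is best-first: score desc, then row ID asc.
    heap.into_sorted_vec()
}
```

Because the tie key is the global row ID rather than the position in `candidates`, the result is the same for any permutation of an unsorted candidate list.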