diff --git a/README.md b/README.md
index b4e9846b..971e9063 100644
--- a/README.md
+++ b/README.md
@@ -156,6 +156,7 @@ The runtime dependency floor is `numpy>=2.2`.
   [`docs/ALTERNATIVES_CONSIDERED.md`](https://github.com/Fieldnote-Echo/ordvec/blob/main/docs/ALTERNATIVES_CONSIDERED.md)
 - **Index-file trust model:**
   [`docs/INDEX_PROVENANCE.md`](https://github.com/Fieldnote-Echo/ordvec/blob/main/docs/INDEX_PROVENANCE.md),
+  [`docs/determinism.md`](https://github.com/Fieldnote-Echo/ordvec/blob/main/docs/determinism.md),
   [`THREAT_MODEL.md`](https://github.com/Fieldnote-Echo/ordvec/blob/main/THREAT_MODEL.md)
 - **Repo-local manifest verifier, C ABI, and Go wrapper:**
   available from the full GitHub checkout. These sidecars are not part of the
diff --git a/docs/FOLLOWUP_BODY_KERNEL_TIE_BREAK.md b/docs/FOLLOWUP_BODY_KERNEL_TIE_BREAK.md
index 0994a25d..ac0452ff 100644
--- a/docs/FOLLOWUP_BODY_KERNEL_TIE_BREAK.md
+++ b/docs/FOLLOWUP_BODY_KERNEL_TIE_BREAK.md
@@ -1,17 +1,11 @@
-# Follow-up: deterministic tie-breaking for body bitmap candidate selection
+# Resolved: deterministic tie-breaking for body bitmap candidate selection
 
-`Bitmap::top_m_candidates` and `top_m_candidates_batched`
-(in `src/bitmap.rs`) currently partition on
-bitmap overlap score alone. Boundary ties are not rare — overlap
-scores are small integers (`0..n_top`, e.g. `0..256`), so multiple
-docs frequently share the cutoff score, and `select_nth_unstable_by`
-may then choose different equal-scored docs at the boundary across
-runs or dispatch paths.
+`Bitmap::top_m_candidates` and `top_m_candidates_batched` now partition and
+sort by the composite key `(score desc, doc_id asc)`. Boundary ties are not
+rare because overlap scores are small integers (`0..n_top`, e.g. `0..256`), so
+the candidate set at the cutoff must be fully determined by score and row ID.
 
-**Fix**: add composite-key ordering `(score desc, doc_id asc)` to
-both the partition predicate (`select_nth_unstable_by`) and the
-post-partition sort (`sort_unstable_by`), so the candidate set at any
-given M is fully determined by `(score, doc_id)`.
+The fixed comparator is:
 
 ```rust
 let mut cmp = |&a: &u32, &b: &u32| {
@@ -23,9 +17,6 @@ idx.select_nth_unstable_by(m_eff - 1, &mut cmp);
 idx[..m_eff].sort_unstable_by(&mut cmp);
 ```
 
-**Keep it as a standalone change.** Rolling the determinism fix into
-an unrelated benchmark or kernel change would muddy attribution — if
-recall/latency numbers move, it should be clear whether the kernel
-changed or only the tie-break at the candidate-set boundary changed.
-The fix is behaviour-preserving on score ordering and only pins the
-boundary, so it is safe to land on its own.
+The broader search-output policy is now tracked in
+[`determinism.md`](determinism.md). Future changes to golden row IDs, tie keys,
+or duplicate-candidate behavior need an explicit compatibility note.
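
Reviewer sketch for the comparator above: a minimal, standalone version of the `(score desc, doc_id asc)` partition-then-sort pattern. The function name `top_m_by_score_then_id` and the plain `&[u32]` score slice are assumptions made for illustration; the crate's real `top_m_candidates` computes overlap scores itself and is not reproduced here.

```rust
/// Hypothetical helper (not part of ordvec): return the `m` best doc IDs
/// under the composite key `(score desc, doc_id asc)`.
fn top_m_by_score_then_id(scores: &[u32], m: usize) -> Vec<u32> {
    // Clamp m to the number of docs; nothing to select when it is zero.
    let m_eff = m.min(scores.len());
    if m_eff == 0 {
        return Vec::new();
    }
    let mut idx: Vec<u32> = (0..scores.len() as u32).collect();
    // Higher score first; on equal scores the lower doc ID first.
    let cmp = |a: &u32, b: &u32| {
        scores[*b as usize]
            .cmp(&scores[*a as usize])
            .then_with(|| a.cmp(b))
    };
    // Partition so the first m_eff slots hold the top set, then order them.
    idx.select_nth_unstable_by(m_eff - 1, cmp);
    idx.truncate(m_eff);
    idx.sort_unstable_by(cmp);
    idx
}
```

With scores `[3, 5, 5, 1, 5, 3]` and `m = 2`, three docs share the cutoff score 5 and the sketch keeps IDs `[1, 2]`. Because doc IDs are unique, the comparator is a total order, so the unstable partition and sort cannot reorder equal keys.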
diff --git a/docs/RANK_MODES.md b/docs/RANK_MODES.md
index bb47a98b..f36fed14 100644
--- a/docs/RANK_MODES.md
+++ b/docs/RANK_MODES.md
@@ -328,10 +328,10 @@ facts qualify this:
   rank-mode README recommends, and where the structural prior pays
   off.
 - **The asymmetric AVX-512 kernel is an exact packed scan, not an ANN
-  approximation.** It returns identical top-k to the scalar RankQuant
-  scorer and agrees within 1e-4 on scores (verified by
-  `rankquant_asymmetric_matches_reference_b{1,2,4}` in
-  `tests/index/quant.rs`).
+  approximation.** It is checked against the scalar RankQuant scorer within
+  the `1e-4` score tolerance and on deterministic golden tie fixtures (see
+  [`determinism.md`](determinism.md)); the random reference tests compare
+  top-k sets, not order, because near-ties can fall inside that tolerance.
 
 The byte-LUT scorer remains in the codebase as a labelled reference
 path (`ordvec::search_asymmetric_byte_lut`,
@@ -435,6 +435,9 @@ single-pass b=2 fast path; it supports `add`/`search` but not
 bilinear bucket-overlap decomposition and is reachable only behind the
 `experimental` feature.
 
+Search result ordering, backend score-equivalence expectations, tie keys, and
+empty-result shapes are specified in [`determinism.md`](determinism.md).
+
 ## Test coverage
 
 `cargo test --lib` — unit tests for the primitives in
diff --git a/docs/determinism.md b/docs/determinism.md
new file mode 100644
index 00000000..40a64722
--- /dev/null
+++ b/docs/determinism.md
@@ -0,0 +1,86 @@
+# Search Determinism Contract
+
+This document states the compatibility contract for ordvec search output:
+scores, ordering, tie handling, backend dispatch, and empty-result shape. It
+covers the primitive retrieval surface only. It does not define distributed
+merge order, replication, storage manifests, or deployment policy.
+
+## Global Ordering Rule
+
+For public top-k search results, ordvec orders hits by:
+
+1. score descending;
+2. row ID ascending when scores compare equal.
+
+The row ID is the internal zero-based insertion row. Subset APIs receive row
+IDs from the caller and return the same global row IDs. Duplicate candidate IDs
+are scored as duplicate candidate entries and may produce duplicate hits.
+
+`k` is clamped to the size of the search space before result buffers are
+allocated: the number of indexed rows for a full-index search, or the
+candidate-list length for a subset search. If the effective `k` is zero or the
+search space is empty, search returns an empty result shape, not sentinel padding.
+
+## Backend Scope
+
+Backend selection must not change the documented ordering rule. Exact integer
+popcount primitives are bit-exact across scalar, AVX-512, aarch64 NEON, and
+wasm `simd128` implementations. Floating-point score equivalence uses an
+absolute tolerance of `1e-4` and no relative tolerance (`rtol = 0`) unless a
+row below explicitly states that the score is integer-exact. Some tests use
+tighter tolerances for specific scalar helper comparisons, but `1e-4` is the
+public cross-backend and cross-architecture compatibility tolerance. Intentional changes
+to that tolerance or to golden top-k output are compatibility-affecting and
+must be called out in the PR and release notes.
+
+Query-level parallelism may change scheduling, but each query is scored and
+finalized independently. Batched APIs must match the corresponding single-query
+API for the same query rows, modulo the primitive-specific tolerance stated
+below. Floating-point comparison tolerances apply only to score equivalence;
+the public hit order still follows the global ordering rule above.
+
+## Primitive Contracts
+
+| Surface | Score contract | Tie key | Backend contract |
+| --- | --- | --- | --- |
+| `Rank::search` | Normalized Spearman-style rank cosine; floating scores are tolerance-based with absolute tolerance `1e-4`, `rtol = 0`. | Global row ID ascending. | Fixed scalar arithmetic per row; query parallelism does not affect per-query output. |
+| `Rank::search_asymmetric` | Float query against stored ranks; floating scores are tolerance-based with absolute tolerance `1e-4`, `rtol = 0`. | Global row ID ascending. | Fixed scalar arithmetic per row; query parallelism does not affect per-query output. |
+| `RankQuant::search` | Symmetric bucketed-rank score; floating scores are tolerance-based with absolute tolerance `1e-4`, `rtol = 0`. | Global row ID ascending. | Scalar packed-byte LUT path; query parallelism does not affect per-query output. |
+| `RankQuant::search_asymmetric` | Float query against stored buckets; floating scores are tolerance-based with absolute tolerance `1e-4`, `rtol = 0`. | Global row ID ascending. | AVX-512, AVX2, and scalar-LUT dispatch must agree with the scalar reference within the public tolerance and preserve top-k order for the golden fixtures. |
+| `RankQuant::search_asymmetric_subset` | Same score as `RankQuant::search_asymmetric`, restricted to caller-supplied candidates; floating scores use the same `1e-4` absolute tolerance and `rtol = 0`. | Global row ID ascending, not candidate-list position. Duplicate candidate IDs remain duplicate entries. | Uses the same AVX-512, AVX2, or scalar dispatch as full asymmetric search over a gathered scratch buffer. |
+| `Bitmap::search` | Exact `popcount(Q AND D)` represented as `f32`; score is integer-exact, not tolerance-based. | Global row ID ascending. | Popcount scores are integer-exact across scalar and SIMD implementations. |
+| `Bitmap::top_m_candidates` | Exact `popcount(Q AND D)` candidate ordering; score key is integer-exact, not tolerance-based. | Global row ID ascending. | Single-query and batched candidate APIs must return the same ordered candidates. |
+| `Bitmap::search_subset` | Exact subset `popcount(Q AND D)` represented as `f32`; score is integer-exact, not tolerance-based. | Global row ID ascending. Duplicate candidate IDs remain duplicate entries. | Subset score kernels must agree bit-exactly with scalar popcount. |
+| `SignBitmap::top_m_candidates` | Lowest Hamming distance, equivalently highest sign agreement; score key is integer-exact, not tolerance-based. | Global row ID ascending. | Single-query and batched candidate APIs must return the same ordered candidates. |
+| `SignBitmap::score_all` | Dense sign-agreement counts aligned by row ID; scores are `u32` integer-exact, not tolerance-based. | Not a top-k API. | Popcount scores are integer-exact across scalar and SIMD implementations. |
+
+## FastScan
+
+`RankQuantFastscan` is a hidden, optional b=2 pre-ranker. It is deterministic
+for a fixed index, query, and backend dispatch, and its scalar and AVX-512
+FastScan kernels operate on the same quantized LUT inputs. It is not
+score-equivalent to exact `RankQuant::search_asymmetric`: the global 8-bit LUT
+quantization is intentional and can change scores or boundary ordering. Callers
+that need exact RankQuant scores should use `RankQuant::search_asymmetric` or
+`RankQuant::search_asymmetric_subset`.
+
+## Compatibility Notes
+
+Intentional changes to any of these are compatibility-affecting:
+
+- golden top-k row IDs;
+- tie keys or duplicate-candidate behavior;
+- empty-result or `k` clamping shape;
+- scalar/SIMD score tolerance;
+- whether an API is exact or approximate;
+- whether a backend is covered by this contract.
+
+Such changes need a compatibility note in the PR and release notes. Performance
+changes that preserve the same scores, row ordering, tie keys, and empty-result
+shape are not search-contract breaks.
+
+Compatibility note for this contract PR: `RankQuant::search_asymmetric_subset`
+now breaks equal-score ties by global row ID instead of local candidate-list
+position. That matches full-index search, C ABI hit ordering, Python binding
+ordering, and the candidate prefilters. Duplicate candidate IDs are still
+scored as duplicate entries and may still produce duplicate hits.
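
Reviewer sketch for the Backend Scope tolerance rule in `docs/determinism.md`: one way a test helper might express "absolute tolerance `1e-4`, `rtol = 0`". The names `ATOL` and `scores_equivalent` are hypothetical, and treating the bound as inclusive (`<=`) is an assumption; the contract text does not say whether the boundary itself passes.

```rust
/// Public cross-backend absolute tolerance stated in the contract.
const ATOL: f32 = 1e-4;

/// Hypothetical helper (not part of ordvec): true when two score vectors
/// have the same length and every pair differs by at most `ATOL`.
/// No relative term is applied, so large scores get no extra slack.
fn scores_equivalent(reference: &[f32], candidate: &[f32]) -> bool {
    reference.len() == candidate.len()
        && reference
            .iter()
            .zip(candidate)
            .all(|(r, c)| (r - c).abs() <= ATOL)
}
```

A pair such as `1000.0` and `1000.01` fails this check even though the relative difference is tiny, which is what `rtol = 0` implies. Integer-exact surfaces such as `Bitmap::search` would be compared with plain equality instead.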
diff --git a/ordvec-ffi/src/lib.rs b/ordvec-ffi/src/lib.rs
index 763c3c0c..03f660d3 100644
--- a/ordvec-ffi/src/lib.rs
+++ b/ordvec-ffi/src/lib.rs
@@ -833,11 +833,9 @@ pub unsafe extern "C" fn ordvec_index_search(
             (LoadedIndex::RankQuant(index), Some(rows)) => {
                 // Ask the core for every candidate score, then normalize by the
                 // ABI's global row-id tie policy before truncating. The core
-                // subset helper breaks ties by candidate position before
-                // mapping back to global row IDs, so requesting only k could
-                // drop a boundary-tied lower row ID from an unsorted candidate
-                // list. Materializing all candidates preserves the ABI ordering
-                // contract until core exposes a global-row top-k scorer.
+                // subset helper uses global row IDs as score-tie keys; keeping
+                // the ABI normalization centralized preserves duplicate and
+                // boundary handling for caller-supplied candidate lists.
                 let (scores, indices) =
                     index.search_asymmetric_subset(validation.query, rows, rows.len());
                 normalize_global_order(scores, indices, validation.required_hits)
diff --git a/ordvec-python/tests/test_rank_quant.py b/ordvec-python/tests/test_rank_quant.py
index cc2893e3..24d0bd96 100644
--- a/ordvec-python/tests/test_rank_quant.py
+++ b/ordvec-python/tests/test_rank_quant.py
@@ -299,8 +299,7 @@ def test_search_asymmetric_subset_matches_full_when_candidates_eq_all():
     # When the candidate set is every doc, the subset path must agree
     # with full `search_asymmetric` on the top-k. Both use the
     # asymmetric kernel; the subset path just iterates the candidate
-    # list instead of all N docs. (Allow set equality — ties may
-    # permute within the same scoring tier.)
+    # list instead of all N docs.
     vectors = unit_vectors(40, 128, seed=0)
     idx = RankQuant(dim=128, bits=2)
     idx.add(vectors)
@@ -310,7 +309,21 @@ def test_search_asymmetric_subset_matches_full_when_candidates_eq_all():
     _, subset_ids = idx.search_asymmetric_subset(query, candidates, k=10)
 
     _, full_ids = idx.search_asymmetric(query[None, :], k=10)
-    assert set(int(i) for i in subset_ids) == set(int(i) for i in full_ids[0])
+    np.testing.assert_array_equal(subset_ids, full_ids[0])
+
+
+def test_search_asymmetric_subset_ties_use_global_row_ids():
+    vectors = np.ones((12, 64), dtype=np.float32)
+    idx = RankQuant(dim=64, bits=2)
+    idx.add(vectors)
+
+    candidates = np.array([9, 3, 7, 1], dtype=np.uint32)
+    scores, ids = idx.search_asymmetric_subset(
+        np.zeros(64, dtype=np.float32), candidates, k=2
+    )
+
+    np.testing.assert_array_equal(ids, np.array([1, 3], dtype=np.int64))
+    np.testing.assert_array_equal(scores, np.array([0.0, 0.0], dtype=np.float32))
 
 
 def test_search_asymmetric_subset_k_caps_at_candidate_count():
diff --git a/ordvec-python/tests/test_sign_bitmap.py b/ordvec-python/tests/test_sign_bitmap.py
index 000378b1..ba3d8d0c 100644
--- a/ordvec-python/tests/test_sign_bitmap.py
+++ b/ordvec-python/tests/test_sign_bitmap.py
@@ -80,10 +80,8 @@ def test_top_m_candidates_batched_shape():
 
 
 def test_batched_matches_scalar_for_each_row():
-    # The batched AVX-512 VPOPCNTDQ kernel must agree with the scalar
-    # path on the same query at top-1; we check the leading match for a
-    # small batch (boundary ties at deeper ranks may diverge — see the
-    # body-kernel tie-break follow-up, separate from sign-bitmap).
+    # The batched kernel must agree with the single-query path for the full
+    # ordered top-m row, including boundary ties.
     idx = SignBitmap(dim=128)
     idx.add(unit_vectors(60, 128, seed=0))
     queries = unit_vectors(6, 128, seed=99)
@@ -91,11 +89,7 @@ def test_batched_matches_scalar_for_each_row():
     batched = idx.top_m_candidates_batched(queries, m=5)
     for i in range(6):
         scalar = idx.top_m_candidates(queries[i], m=5)
-        # Top-1 must agree exactly across both code paths.
-        assert int(batched[i, 0]) == int(scalar[0]), (
-            f"batched vs scalar disagree on top-1 for query {i}: "
-            f"batched={batched[i, 0]} scalar={scalar[0]}"
-        )
+        np.testing.assert_array_equal(batched[i], scalar)
 
 
 def test_empty_batch_returns_consistent_column_count():
diff --git a/src/quant.rs b/src/quant.rs
index 0e4a2ffc..f7700433 100644
--- a/src/quant.rs
+++ b/src/quant.rs
@@ -536,7 +536,9 @@ impl RankQuant {
     /// subset (e.g., the top-M from a bitmap probe). Returns
     /// `(scores, indices)`: the top-`k` scores and their corresponding
     /// **global** doc IDs (the local candidate positions are mapped back
-    /// to global IDs before returning).
+    /// to global IDs before returning). Results are ordered by score
+    /// descending, then global row ID ascending, matching the full-index
+    /// search tie policy even when `candidates` is unsorted.
     ///
     /// Uses the same AVX-512 → AVX2 → scalar dispatch as
     /// [`Self::search_asymmetric`] and the same centre-drop math, just
@@ -606,7 +608,7 @@ impl RankQuant {
         // never reaches a kernel that would drop its tail chunk.
         #[cfg_attr(not(target_arch = "x86_64"), allow(unused_variables))]
         let simd_tier = select_simd_tier(dim, bits);
-        let mut top = TopK::new(k_eff);
+        let mut top = TopK::new_with_tie_keys(k_eff, candidates);
         #[cfg_attr(not(target_arch = "x86_64"), allow(unused_mut))]
         let mut centre_drop_used = false;
         #[cfg(target_arch = "x86_64")]
diff --git a/src/util.rs b/src/util.rs
index 1684ae76..0229f72e 100644
--- a/src/util.rs
+++ b/src/util.rs
@@ -352,28 +352,31 @@ fn xor_popcount_simd128(doc: &[u64], q: &[u64]) -> u32 {
 /// partial sort.
 ///
 /// **Tie-break (deterministic across CPUs).** Ranking is by the
-/// composite key `(score desc, doc_id asc)`: on equal scores the
-/// LOWER doc_id wins, both for eviction and in the final order. SIMD
-/// vs scalar f32 summation-order differences can flip genuine
-/// near-ties between hosts; the composite key removes that
-/// nondeterminism and matches the candidate-gen paths
-/// (`top_m_candidates`) which already partition on `(score, doc_id)`.
-/// The "worst kept" entry — the one evicted first — is therefore the
-/// one with the lowest score and, among equal-score entries, the
-/// HIGHEST doc_id.
+/// composite key `(score desc, tie_key asc)`: on equal scores the lower
+/// tie key wins, both for eviction and in the final order. Full-index
+/// scans use `doc_id` as the tie key. Subset scans may emit local scratch
+/// indices while supplying global row IDs as the tie keys. SIMD vs scalar
+/// f32 summation-order differences can still flip genuine near-ties between
+/// hosts; the composite key removes only exact-tie nondeterminism, matching
+/// the candidate-gen paths (`top_m_candidates`), which already partition on
+/// `(score, doc_id)`. The "worst kept" entry — the one evicted first — is
+/// therefore the one with the lowest score and, among equal-score entries,
+/// the highest tie key.
 pub(crate) struct TopK {
     k: usize,
     scores: Vec<f32>,
     indices: Vec<i64>,
+    tie_keys: Vec<i64>,
+    tie_key_by_index: Option<Vec<i64>>,
     filled: usize,
-    /// Slot holding the worst kept entry under `(score asc, doc_id
+    /// Slot holding the worst kept entry under `(score asc, tie_key
     /// desc)` — the next to be evicted.
     worst_pos: usize,
     /// Score of the worst kept entry.
     worst_val: f32,
-    /// doc_id of the worst kept entry (used to break score ties:
-    /// among equal scores the higher doc_id is worse to keep).
-    worst_idx: i64,
+    /// Tie key of the worst kept entry. Among equal scores, the higher
+    /// tie key is worse to keep.
+    worst_tie_key: i64,
 }
 
 impl TopK {
@@ -382,13 +385,27 @@ impl TopK {
             k,
             scores: vec![f32::NEG_INFINITY; k],
             indices: vec![-1; k],
+            tie_keys: vec![i64::MAX; k],
+            tie_key_by_index: None,
             filled: 0,
             worst_pos: 0,
             worst_val: f32::INFINITY,
-            worst_idx: i64::MAX,
+            worst_tie_key: i64::MAX,
         }
     }
 
+    /// Construct a top-k collector whose emitted indices are local scan
+    /// positions but whose score ties are broken by caller-supplied keys.
+    ///
+    /// This is used by subset scans: SIMD kernels still emit local candidate
+    /// positions into the gathered scratch buffer, while ties must follow the
+    /// public global row-id policy.
+    pub(crate) fn new_with_tie_keys(k: usize, tie_key_by_index: &[u32]) -> Self {
+        let mut top = Self::new(k);
+        top.tie_key_by_index = Some(tie_key_by_index.iter().map(|&id| i64::from(id)).collect());
+        top
+    }
+
     #[inline]
     pub(crate) fn maybe_insert(&mut self, score: f32, idx: usize) {
         // Convert the doc_id to its i64 storage form once, up front. doc_ids
@@ -401,52 +418,59 @@ impl TopK {
         // stays clippy-clean on 32-bit, where `idx <= i64::MAX as usize` would
         // be an always-true `absurd_extreme_comparison`.
         let id = i64::try_from(idx).expect("ordvec: doc_id exceeds i64::MAX");
+        let tie_key = self
+            .tie_key_by_index
+            .as_ref()
+            .map(|keys| keys[idx])
+            .unwrap_or(id);
         if self.filled < self.k {
             self.scores[self.filled] = score;
             self.indices[self.filled] = id;
+            self.tie_keys[self.filled] = tie_key;
             self.filled += 1;
             if self.filled == self.k {
                 self.recompute_worst();
             }
         } else {
-            // Replace the worst kept entry iff the incoming `(score, id)` is
-            // strictly better to keep under the `(score desc, doc_id asc)`
-            // order: a higher score, or an equal score with a lower doc_id.
-            // doc_ids are unique per scan, so this is a total order — the
-            // greedy eviction keeps exactly the top-k set under the composite
-            // key.
-            let better = score > self.worst_val || (score == self.worst_val && id < self.worst_idx);
+            // Replace the worst kept entry iff the incoming `(score, tie_key)`
+            // is strictly better to keep under the `(score desc, tie_key asc)`
+            // order: a higher score, or an equal score with a lower tie key.
+            // Full-index scans use `doc_id` as the tie key. Subset scans use
+            // global row IDs while still emitting local scratch-buffer indices.
+            let better =
+                score > self.worst_val || (score == self.worst_val && tie_key < self.worst_tie_key);
             if better {
                 self.scores[self.worst_pos] = score;
                 self.indices[self.worst_pos] = id;
+                self.tie_keys[self.worst_pos] = tie_key;
                 self.recompute_worst();
             }
         }
     }
 
-    /// Locate the worst kept entry under `(score asc, doc_id desc)`:
-    /// lowest score, and among equal scores the highest doc_id. That
-    /// is the entry a strictly-better incoming candidate evicts.
+    /// Locate the worst kept entry under `(score asc, tie_key desc)`:
+    /// lowest score, and among equal scores the highest tie key. That is the
+    /// entry a strictly-better incoming candidate evicts.
     fn recompute_worst(&mut self) {
         let mut wv = f32::INFINITY;
-        let mut wi = i64::MIN;
+        let mut wt = i64::MIN;
         let mut wp = 0;
         for i in 0..self.filled {
             let s = self.scores[i];
-            let id = self.indices[i];
-            if s < wv || (s == wv && id > wi) {
+            let tie_key = self.tie_keys[i];
+            if s < wv || (s == wv && tie_key > wt) {
                 wv = s;
-                wi = id;
+                wt = tie_key;
                 wp = i;
             }
         }
         self.worst_val = wv;
-        self.worst_idx = wi;
+        self.worst_tie_key = wt;
         self.worst_pos = wp;
     }
 
     /// Drain into `out_scores` / `out_indices` sorted by the composite
-    /// key `(score desc, doc_id asc)`. `out_scores.len()` is the
+    /// key `(score desc, tie_key asc)`. `out_scores.len()` is the
     /// user-requested `k`; positions beyond `self.filled` are left as
     /// sentinels.
     pub(crate) fn finalize_into(&self, out_scores: &mut [f32], out_indices: &mut [i64]) {
@@ -457,26 +481,32 @@ impl TopK {
         for i in out_indices.iter_mut() {
             *i = -1;
         }
-        let mut pairs: Vec<(f32, i64)> = self
+        let mut pairs: Vec<(f32, i64, i64, usize)> = self
             .scores
             .iter()
             .zip(self.indices.iter())
+            .zip(self.tie_keys.iter())
+            .enumerate()
             .take(self.filled)
-            .map(|(&s, &i)| (s, i))
+            .map(|(slot, ((&s, &i), &tie_key))| (s, i, tie_key, slot))
             .collect();
-        // Composite key: score descending, then doc_id ascending. The
-        // doc_id tie-break makes the final order deterministic when
-        // scores are equal.
+        // Composite key: score descending, then tie key ascending. The kept
+        // slot is only a final deterministic tie-break when duplicate
+        // candidate entries are otherwise indistinguishable. For full-index
+        // scans the tie key is the doc_id; for subset scans it is the global
+        // row id associated with the emitted local index.
         pairs.sort_unstable_by(|a, b| {
             // `total_cmp` is a true total order (IEEE-754 `totalOrder`), so the
             // sort stays well-defined even if a non-finite score ever slipped
             // past the finite-input guards — `partial_cmp(..).unwrap_or(Equal)`
             // is not a total order and can mis-sort around NaN. For the finite
-            // scores we actually have, the two agree. doc_id ascending breaks
-            // score ties (unchanged).
-            b.0.total_cmp(&a.0).then_with(|| a.1.cmp(&b.1))
+            // scores we actually have, the two agree. The ascending tie key
+            // makes score ties deterministic.
+            b.0.total_cmp(&a.0)
+                .then_with(|| a.2.cmp(&b.2))
+                .then_with(|| a.3.cmp(&b.3))
         });
-        for (slot, (s, i)) in pairs.into_iter().enumerate() {
+        for (slot, (s, i, _, _)) in pairs.into_iter().enumerate() {
             if slot >= out_scores.len() {
                 break;
             }
@@ -533,6 +563,21 @@ mod tests {
         assert!(scores.is_empty() && indices.is_empty());
     }
 
+    #[test]
+    fn topk_duplicate_candidate_ties_have_total_final_order() {
+        let mut top = TopK::new_with_tie_keys(2, &[7, 7, 7]);
+        top.maybe_insert(0.0, 0);
+        top.maybe_insert(0.0, 1);
+        top.maybe_insert(0.0, 2);
+
+        let mut scores = [f32::NEG_INFINITY; 2];
+        let mut indices = [-1; 2];
+        top.finalize_into(&mut scores, &mut indices);
+
+        assert_eq!(scores, [0.0, 0.0]);
+        assert_eq!(indices, [0, 1]);
+    }
+
     #[test]
     fn checked_new_len_accepts_up_to_max() {
         use crate::rank_io::MAX_VECTORS;
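
Reviewer sketch for the `TopK` tie-key change in `src/util.rs`: a simplified, standalone collector that scans local candidate positions but breaks score ties by the caller-supplied global row IDs, with the local position as a last resort for duplicate candidates. The names `rank` and `subset_top_k` are hypothetical, scores are passed in precomputed, and the linear worst-entry search is a simplification; this is not the crate's `TopK`.

```rust
use std::cmp::Ordering;

/// `Ordering::Less` means `a` ranks ahead of `b`: higher score first, then
/// lower global row ID, then lower local position (duplicate candidates).
fn rank(tie_keys: &[u32], a: (f32, usize), b: (f32, usize)) -> Ordering {
    b.0.total_cmp(&a.0)
        .then_with(|| tie_keys[a.1].cmp(&tie_keys[b.1]))
        .then_with(|| a.1.cmp(&b.1))
}

/// Hypothetical subset top-k: `scores[i]` is the score of `candidates[i]`.
/// Returns `(score, global_row_id)` pairs in contract order.
fn subset_top_k(scores: &[f32], candidates: &[u32], k: usize) -> Vec<(f32, u32)> {
    assert_eq!(scores.len(), candidates.len());
    let k_eff = k.min(candidates.len());
    let mut kept: Vec<(f32, usize)> = Vec::with_capacity(k_eff);
    for (local, &score) in scores.iter().enumerate() {
        let entry = (score, local);
        if kept.len() < k_eff {
            kept.push(entry);
            continue;
        }
        // The worst kept entry is the one that ranks last.
        let worst = (0..kept.len()).max_by(|&i, &j| rank(candidates, kept[i], kept[j]));
        if let Some(w) = worst {
            // Evict only when the incoming entry ranks strictly ahead.
            if rank(candidates, entry, kept[w]) == Ordering::Less {
                kept[w] = entry;
            }
        }
    }
    kept.sort_by(|&a, &b| rank(candidates, a, b));
    // Map local scan positions back to global row IDs.
    kept.into_iter().map(|(s, local)| (s, candidates[local])).collect()
}
```

With four equal scores and candidates `[9, 3, 7, 1]`, `k = 2` yields row IDs 1 and 3 regardless of candidate-list order, and candidates `[7, 8, 7]` yield `7, 7`, mirroring the fixtures in `tests/determinism_contract.rs`.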
diff --git a/tests/determinism_contract.rs b/tests/determinism_contract.rs
new file mode 100644
index 00000000..cddffd10
--- /dev/null
+++ b/tests/determinism_contract.rs
@@ -0,0 +1,142 @@
+use ordvec::{search_asymmetric_byte_lut, Bitmap, Rank, RankQuant, SignBitmap};
+
+fn repeated_docs(n: usize, dim: usize, value: f32) -> Vec<f32> {
+    vec![value; n * dim]
+}
+
+fn assert_ids(actual: &[i64], expected: &[i64]) {
+    assert_eq!(actual, expected, "ids {actual:?} != expected {expected:?}");
+}
+
+fn assert_u32_ids(actual: &[u32], expected: &[u32]) {
+    assert_eq!(actual, expected, "ids {actual:?} != expected {expected:?}");
+}
+
+#[test]
+fn full_search_ties_return_lowest_row_ids() {
+    const DIM: usize = 64;
+    const N: usize = 8;
+    let docs = repeated_docs(N, DIM, 1.0);
+    let query = vec![1.0; DIM];
+    let zero_query = vec![0.0; DIM];
+
+    let mut rank = Rank::new(DIM);
+    rank.add(&docs);
+    assert_ids(rank.search(&query, 4).indices_for_query(0), &[0, 1, 2, 3]);
+    let rank_asym = rank.search_asymmetric(&zero_query, 4);
+    assert_ids(rank_asym.indices_for_query(0), &[0, 1, 2, 3]);
+    assert!(rank_asym.scores_for_query(0).iter().all(|&s| s == 0.0));
+
+    let mut rankquant = RankQuant::new(DIM, 2);
+    rankquant.add(&docs);
+    assert_ids(
+        rankquant.search(&query, 4).indices_for_query(0),
+        &[0, 1, 2, 3],
+    );
+    let rq_asym = rankquant.search_asymmetric(&zero_query, 4);
+    assert_ids(rq_asym.indices_for_query(0), &[0, 1, 2, 3]);
+    assert!(rq_asym.scores_for_query(0).iter().all(|&s| s == 0.0));
+
+    let mut bitmap = Bitmap::new(DIM, DIM / 4);
+    bitmap.add(&docs);
+    let bitmap_hits = bitmap.search(&query, 4);
+    assert_ids(bitmap_hits.indices_for_query(0), &[0, 1, 2, 3]);
+    let bitmap_score = bitmap_hits.scores_for_query(0)[0];
+    assert!(bitmap_hits
+        .scores_for_query(0)
+        .iter()
+        .all(|&s| s == bitmap_score));
+}
+
+#[test]
+fn rankquant_dispatch_matches_scalar_reference_on_ordered_ties() {
+    for &dim in &[20usize, 64] {
+        let docs = repeated_docs(8, dim, 1.0);
+        let query = vec![0.0; dim];
+        let mut index = RankQuant::new(dim, 2);
+        index.add(&docs);
+
+        let production = index.search_asymmetric(&query, 6);
+        let scalar = search_asymmetric_byte_lut(&index, &query, 6);
+
+        assert_ids(production.indices_for_query(0), &[0, 1, 2, 3, 4, 5]);
+        assert_eq!(production.indices, scalar.indices, "dim={dim}");
+        assert_eq!(production.scores, scalar.scores, "dim={dim}");
+    }
+}
+
+#[test]
+fn rankquant_subset_ties_use_global_row_ids() {
+    const DIM: usize = 64;
+    let docs = repeated_docs(12, DIM, 1.0);
+    let query = vec![0.0; DIM];
+    let mut index = RankQuant::new(DIM, 2);
+    index.add(&docs);
+
+    let (scores, ids) = index.search_asymmetric_subset(&query, &[9, 3, 7, 1], 2);
+    assert_eq!(scores, vec![0.0, 0.0]);
+    assert_ids(&ids, &[1, 3]);
+
+    let (duplicate_scores, duplicate_ids) = index.search_asymmetric_subset(&query, &[7, 8, 7], 2);
+    assert_eq!(duplicate_scores, vec![0.0, 0.0]);
+    assert_ids(&duplicate_ids, &[7, 7]);
+}
+
+#[test]
+fn candidate_prefilters_preserve_order_across_single_and_batched_paths() {
+    const DIM: usize = 64;
+    const N: usize = 10;
+    let docs = repeated_docs(N, DIM, 1.0);
+    let query = vec![1.0; DIM];
+    let queries = [query.clone(), query.clone()].concat();
+
+    let mut bitmap = Bitmap::new(DIM, DIM / 4);
+    bitmap.add(&docs);
+    let bitmap_expected = vec![0, 1, 2, 3, 4];
+    assert_u32_ids(&bitmap.top_m_candidates(&query, 5), &bitmap_expected);
+    for row in bitmap.top_m_candidates_batched(&queries, 5) {
+        assert_u32_ids(&row, &bitmap_expected);
+    }
+
+    let mut sign = SignBitmap::new(DIM);
+    sign.add(&docs);
+    let sign_expected = vec![0, 1, 2, 3, 4];
+    assert_u32_ids(&sign.top_m_candidates(&query, 5), &sign_expected);
+    for row in sign.top_m_candidates_batched(&queries, 5) {
+        assert_u32_ids(&row, &sign_expected);
+    }
+}
+
+#[test]
+fn empty_and_zero_k_result_shapes_are_empty() {
+    const DIM: usize = 64;
+    let query = vec![1.0; DIM];
+
+    let rank = Rank::new(DIM);
+    let rank_empty = rank.search(&query, 10);
+    assert_eq!(rank_empty.k, 0);
+    assert!(rank_empty.scores.is_empty());
+    assert!(rank_empty.indices.is_empty());
+
+    let rankquant = RankQuant::new(DIM, 2);
+    let rq_empty = rankquant.search_asymmetric(&query, 10);
+    assert_eq!(rq_empty.k, 0);
+    assert!(rq_empty.scores.is_empty());
+    assert!(rq_empty.indices.is_empty());
+
+    let bitmap = Bitmap::new(DIM, DIM / 4);
+    let bitmap_empty = bitmap.search(&query, 10);
+    assert_eq!(bitmap_empty.k, 0);
+    assert!(bitmap_empty.scores.is_empty());
+    assert!(bitmap_empty.indices.is_empty());
+
+    let sign = SignBitmap::new(DIM);
+    assert!(sign.top_m_candidates(&query, 10).is_empty());
+
+    let mut nonempty = RankQuant::new(DIM, 2);
+    nonempty.add(&repeated_docs(2, DIM, 1.0));
+    let zero_k = nonempty.search_asymmetric(&query, 0);
+    assert_eq!(zero_k.k, 0);
+    assert!(zero_k.scores.is_empty());
+    assert!(zero_k.indices.is_empty());
+}
diff --git a/tests/index/quant.rs b/tests/index/quant.rs
index bf99a50e..fc8e5450 100644
--- a/tests/index/quant.rs
+++ b/tests/index/quant.rs
@@ -164,8 +164,9 @@ fn rankquant_asymmetric_matches_reference(bits: u8) {
         );
     }
 
-    // And the top-10 set must match (we allow tied scores to permute
-    // within ties — same set, possibly different order).
+    // This random reference check uses set equality to avoid overfitting a
+    // near-tolerance boundary. Exact score-tie ordering is pinned by
+    // tests/determinism_contract.rs.
     let mut ref_sorted: Vec<(usize, f32)> = ref_scores
         .iter()
         .enumerate()
diff --git a/tests/redteam_beta.rs b/tests/redteam_beta.rs
index a884d041..12d0e663 100644
--- a/tests/redteam_beta.rs
+++ b/tests/redteam_beta.rs
@@ -87,7 +87,9 @@ fn assert_asym_matches_byte_lut(dim: usize, bits: u8, seed: u64) {
     let prod_idx = prod.indices_for_query(0);
     let ref_idx = reference.indices_for_query(0);
 
-    // Top-k *set* must match (ties may permute within equal scores).
+    // This dispatch-grid red-team check uses set equality because random
+    // near-ties can sit inside the scalar/SIMD tolerance. Exact score-tie
+    // ordering is pinned by tests/determinism_contract.rs.
     let prod_set: std::collections::HashSet<i64> = prod_idx.iter().copied().collect();
     let ref_set: std::collections::HashSet<i64> = ref_idx.iter().copied().collect();
     assert_eq!(