Merged
1 change: 1 addition & 0 deletions README.md
@@ -156,6 +156,7 @@ The runtime dependency floor is `numpy>=2.2`.
[`docs/ALTERNATIVES_CONSIDERED.md`](https://github.com/Fieldnote-Echo/ordvec/blob/main/docs/ALTERNATIVES_CONSIDERED.md)
- **Index-file trust model:**
[`docs/INDEX_PROVENANCE.md`](https://github.com/Fieldnote-Echo/ordvec/blob/main/docs/INDEX_PROVENANCE.md),
[`docs/determinism.md`](https://github.com/Fieldnote-Echo/ordvec/blob/main/docs/determinism.md),
[`THREAT_MODEL.md`](https://github.com/Fieldnote-Echo/ordvec/blob/main/THREAT_MODEL.md)
- **Repo-local manifest verifier, C ABI, and Go wrapper:**
available from the full GitHub checkout. These sidecars are not part of the
27 changes: 9 additions & 18 deletions docs/FOLLOWUP_BODY_KERNEL_TIE_BREAK.md
@@ -1,17 +1,11 @@
# Follow-up: deterministic tie-breaking for body bitmap candidate selection
# Resolved: deterministic tie-breaking for body bitmap candidate selection

`Bitmap::top_m_candidates` and `top_m_candidates_batched`
(in `src/bitmap.rs`) currently partition on
bitmap overlap score alone. Boundary ties are not rare — overlap
scores are small integers (`0..n_top`, e.g. `0..256`), so multiple
docs frequently share the cutoff score, and `select_nth_unstable_by`
may then choose different equal-scored docs at the boundary across
runs or dispatch paths.
`Bitmap::top_m_candidates` and `top_m_candidates_batched` now partition and
sort by the composite key `(score desc, doc_id asc)`. Boundary ties are not
rare because overlap scores are small integers (`0..n_top`, e.g. `0..256`), so
the candidate set at the cutoff must be fully determined by score and row ID.

**Fix**: add composite-key ordering `(score desc, doc_id asc)` to
both the partition predicate (`select_nth_unstable_by`) and the
post-partition sort (`sort_unstable_by`), so the candidate set at any
given M is fully determined by `(score, doc_id)`.
The fixed comparator is:

```rust
let mut cmp = |&a: &u32, &b: &u32| {
@@ -23,9 +17,6 @@ idx.select_nth_unstable_by(m_eff - 1, &mut cmp);
idx[..m_eff].sort_unstable_by(&mut cmp);
```

**Keep it as a standalone change.** Rolling the determinism fix into
an unrelated benchmark or kernel change would muddy attribution — if
recall/latency numbers move, it should be clear whether the kernel
changed or only the tie-break at the candidate-set boundary changed.
The fix is behaviour-preserving on score ordering and only pins the
boundary, so it is safe to land on its own.
The broader search-output policy is now tracked in
[`determinism.md`](determinism.md). Future changes to golden row IDs, tie keys,
or duplicate-candidate behavior need an explicit compatibility note.
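
To illustrate why the boundary needs a second key, here is a minimal Python sketch, not the Rust code in `src/bitmap.rs`. The function below is a hypothetical stand-in that selects with `heapq.nsmallest` instead of `select_nth_unstable_by`, and the overlap scores are made up. Because the key `(-score, doc_id)` orders every pair of docs strictly, the selected set does not depend on the order in which equal scores are visited.

```python
import heapq


def top_m_candidates(scores, m):
    """Return the m best doc IDs ordered by (score desc, doc_id asc).

    Hypothetical stand-in for illustration: scores[i] is the integer
    overlap score of doc i. The real kernel partitions and sorts in Rust.
    """
    m_eff = min(m, len(scores))
    # Composite key: higher score first, then lower doc ID on equal scores.
    return heapq.nsmallest(m_eff, range(len(scores)), key=lambda d: (-scores[d], d))


# Made-up overlap scores: docs 1, 3, 4 and 6 all tie at the cutoff score 5.
scores = [7, 5, 2, 5, 5, 9, 5]
candidates = top_m_candidates(scores, m=4)
print(candidates)  # [5, 0, 1, 3]
```

Four docs share the cutoff score but only two slots remain after docs 5 and 0, so a score-only comparator could legally return any two of them. The composite key always keeps docs 1 and 3.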
11 changes: 7 additions & 4 deletions docs/RANK_MODES.md
@@ -328,10 +328,10 @@ facts qualify this:
rank-mode README recommends, and where the structural prior pays
off.
- **The asymmetric AVX-512 kernel is an exact packed scan, not an ANN
approximation.** It returns identical top-k to the scalar RankQuant
scorer and agrees within 1e-4 on scores (verified by
`rankquant_asymmetric_matches_reference_b{1,2,4}` in
`tests/index/quant.rs`).
approximation.** It is checked against the scalar RankQuant scorer with
score tolerances and deterministic golden tie fixtures (see
[`determinism.md`](determinism.md)); the random reference tests avoid
overfitting top-k order at near-tolerance boundaries.

The byte-LUT scorer remains in the codebase as a labelled reference
path (`ordvec::search_asymmetric_byte_lut`,
@@ -435,6 +435,9 @@ single-pass b=2 fast path; it supports `add`/`search` but not
bilinear bucket-overlap decomposition and is reachable only behind the
`experimental` feature.

Search result ordering, backend score-equivalence expectations, tie keys, and
empty-result shapes are specified in [`determinism.md`](determinism.md).

## Test coverage

`cargo test --lib` — unit tests for the primitives in
86 changes: 86 additions & 0 deletions docs/determinism.md
@@ -0,0 +1,86 @@
# Search Determinism Contract

This document states the compatibility contract for ordvec search output:
scores, ordering, tie handling, backend dispatch, and empty-result shape. It
covers the primitive retrieval surface only. It does not define distributed
merge order, replication, storage manifests, or deployment policy.

## Global Ordering Rule

For public top-k search results, ordvec orders hits by:

1. score descending;
2. row ID ascending when scores compare equal.

The row ID is the internal zero-based insertion row. Subset APIs receive row
IDs from the caller and return the same global row IDs. Duplicate candidate IDs
are scored as duplicate candidate entries and may produce duplicate hits.

`k` is clamped to the search space before result buffers are allocated. A
full-index search space is the number of indexed rows. A subset search space is
the candidate-list length. If the effective `k` is zero, or the search space is
empty, search returns an empty result shape rather than padded sentinel hits.
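
The ordering, duplicate, and clamping rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not ordvec's implementation: `search_topk` is a hypothetical helper, and the scores are made up.

```python
def search_topk(scores, row_ids, k):
    """Order hits by (score desc, row_id asc) after clamping k.

    Hypothetical helper: scores[i] belongs to row_ids[i]. For a full-index
    search row_ids would be 0..n; for a subset search it is the caller's
    candidate list, duplicates included.
    """
    k_eff = min(k, len(row_ids))  # clamp k to the search space
    if k_eff == 0:
        return [], []  # empty shape, no padded sentinel hits
    hits = sorted(zip(scores, row_ids), key=lambda h: (-h[0], h[1]))[:k_eff]
    return [s for s, _ in hits], [r for _, r in hits]


# Unsorted candidates with one duplicate ID (7) and a three-way score tie.
top_scores, top_ids = search_topk([0.5, 0.5, 0.9, 0.5, 0.9], [7, 2, 7, 4, 1], k=10)
print(top_ids)  # [1, 7, 2, 4, 7]
```

Here `k=10` is clamped to the five candidate entries, the duplicate ID 7 appears twice because each entry is scored separately, and equal scores are ordered by row ID rather than by position in the candidate list.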

## Backend Scope

Backend selection must not change the documented ordering rule. Exact integer
popcount primitives are bit-exact across scalar, AVX-512, aarch64 NEON, and
wasm `simd128` implementations. Floating-point score equivalence uses an
absolute tolerance of `1e-4` and no relative tolerance (`rtol = 0`) unless a
row below explicitly states that the score is integer-exact. Some tests use
tighter tolerances for specific scalar helper comparisons, but `1e-4` is the
public cross-backend/architecture compatibility tolerance. Intentional changes
to that tolerance or to golden top-k output are compatibility-affecting and
must be called out in the PR and release notes.

Query-level parallelism may change scheduling, but each query is scored and
finalized independently. Batched APIs must match the corresponding single-query
API for the same query rows, modulo the primitive-specific tolerance stated
below. Floating-point comparison tolerances apply only to score equivalence;
the public hit order still follows the global ordering rule above.
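
As a rough sketch of what the tolerance rule means in practice, the check below compares two score vectors with an absolute tolerance of `1e-4` and no relative term. `scores_equivalent` and the score values are hypothetical; only the tolerance values come from this contract.

```python
import math

ATOL = 1e-4  # public cross-backend tolerance stated in this contract


def scores_equivalent(reference, candidate, atol=ATOL):
    """True when every score pair differs by at most atol (rtol = 0)."""
    if len(reference) != len(candidate):
        return False
    return all(
        math.isclose(r, c, rel_tol=0.0, abs_tol=atol)
        for r, c in zip(reference, candidate)
    )


scalar = [0.81250, 0.40625, -0.12500]  # made-up reference scores
simd = [0.81253, 0.40620, -0.12500]    # made-up backend scores

print(scores_equivalent(scalar, simd))                        # True
print(scores_equivalent(scalar, [0.8130, 0.40625, -0.125]))   # False
```

With `rel_tol=0.0`, large scores get no extra slack: a difference of `5e-4` fails regardless of magnitude. The check says nothing about hit order, which is still governed by the global ordering rule.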

## Primitive Contracts

| Surface | Score contract | Tie key | Backend contract |
| --- | --- | --- | --- |
| `Rank::search` | Normalized Spearman-style rank cosine; floating scores are tolerance-based with absolute tolerance `1e-4`, `rtol = 0`. | Global row ID ascending. | Fixed scalar arithmetic per row; query parallelism does not affect per-query output. |
| `Rank::search_asymmetric` | Float query against stored ranks; floating scores are tolerance-based with absolute tolerance `1e-4`, `rtol = 0`. | Global row ID ascending. | Fixed scalar arithmetic per row; query parallelism does not affect per-query output. |
| `RankQuant::search` | Symmetric bucketed-rank score; floating scores are tolerance-based with absolute tolerance `1e-4`, `rtol = 0`. | Global row ID ascending. | Scalar packed-byte LUT path; query parallelism does not affect per-query output. |
| `RankQuant::search_asymmetric` | Float query against stored buckets; floating scores are tolerance-based with absolute tolerance `1e-4`, `rtol = 0`. | Global row ID ascending. | AVX-512, AVX2, and scalar-LUT dispatch must agree with the scalar reference within the public tolerance and preserve top-k order for the golden fixtures. |
| `RankQuant::search_asymmetric_subset` | Same score as `RankQuant::search_asymmetric`, restricted to caller-supplied candidates; floating scores use the same `1e-4` absolute tolerance and `rtol = 0`. | Global row ID ascending, not candidate-list position. Duplicate candidate IDs remain duplicate entries. | Uses the same AVX-512, AVX2, or scalar dispatch as full asymmetric search over a gathered scratch buffer. |
| `Bitmap::search` | Exact `popcount(Q AND D)` represented as `f32`; score is integer-exact, not tolerance-based. | Global row ID ascending. | Popcount scores are integer-exact across scalar and SIMD implementations. |
| `Bitmap::top_m_candidates` | Exact `popcount(Q AND D)` candidate ordering; score key is integer-exact, not tolerance-based. | Global row ID ascending. | Single-query and batched candidate APIs must return the same ordered candidates. |
| `Bitmap::search_subset` | Exact subset `popcount(Q AND D)` represented as `f32`; score is integer-exact, not tolerance-based. | Global row ID ascending. Duplicate candidate IDs remain duplicate entries. | Subset score kernels must agree bit-exactly with scalar popcount. |
| `SignBitmap::top_m_candidates` | Lowest Hamming distance, equivalently highest sign agreement; score key is integer-exact, not tolerance-based. | Global row ID ascending. | Single-query and batched candidate APIs must return the same ordered candidates. |
| `SignBitmap::score_all` | Dense sign-agreement counts aligned by row ID; scores are `u32` integer-exact, not tolerance-based. | Not a top-k API. | Popcount scores are integer-exact across scalar and SIMD implementations. |
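
The integer-exact rows in the table reduce to bit counting, which is why no tolerance is needed. The sketch below uses Python integers as stand-in bitmaps, which is an assumption for illustration only; the real kernels work on packed words with scalar or SIMD popcount, and both function names are hypothetical.

```python
def popcount_overlap(query_bits, doc_bits):
    """Exact popcount(Q AND D) for bitmaps stored as Python ints."""
    return bin(query_bits & doc_bits).count("1")


def sign_agreement(query_bits, doc_bits, dim):
    """Number of matching sign bits: dim minus the Hamming distance."""
    return dim - bin(query_bits ^ doc_bits).count("1")


q = 0b1011_0110
d = 0b0011_1100

print(popcount_overlap(q, d))       # 3
print(sign_agreement(q, d, dim=8))  # 5
```

Because agreement is `dim` minus the Hamming distance, ranking by lowest distance and ranking by highest agreement give the same order.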

## FastScan

`RankQuantFastscan` is a hidden, optional b=2 pre-ranker. It is deterministic
for a fixed index, query, and backend dispatch, and its scalar and AVX-512
FastScan kernels operate on the same quantized LUT inputs. It is not
score-equivalent to exact `RankQuant::search_asymmetric`: the global 8-bit LUT
quantization is intentional and can change scores or boundary ordering. Callers
that need exact RankQuant scores should use `RankQuant::search_asymmetric` or
`RankQuant::search_asymmetric_subset`.
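
To see how 8-bit LUT quantization can move a boundary, consider the toy sketch below. The affine min/max scheme in `quantize_lut_u8` is an assumption chosen for illustration; this document only states that the quantization is global and 8-bit, so the real quantizer may differ. The LUT values and bucket codes are made up.

```python
def quantize_lut_u8(lut):
    """Map every LUT entry to 0..255 with one global affine scale.

    Assumed scheme for illustration only; the real quantizer may differ.
    """
    flat = [v for row in lut for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid dividing by zero for a constant LUT
    return [[round((v - lo) * 255 / span) for v in row] for row in lut]


def rank_rows(lut, codes):
    """Score each row as a sum of LUT entries; order by (score desc, row asc)."""
    scores = [sum(lut[d][c] for d, c in enumerate(row)) for row in codes]
    return sorted(range(len(codes)), key=lambda r: (-scores[r], r))


lut = [[0.0, 0.400, 0.401, 1.0], [0.0, 0.2, 0.6, 0.8]]  # made-up float LUT
codes = [[1, 2], [2, 2]]  # 2-bit bucket codes for rows 0 and 1

print(rank_rows(lut, codes))                   # [1, 0]
print(rank_rows(quantize_lut_u8(lut), codes))  # [0, 1]
```

In float arithmetic row 1 wins by 0.001. After quantization both `0.400` and `0.401` map to 102, the rows tie, and the row-ID tie key puts row 0 first. Both orderings are deterministic, but they are not the same ordering.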

## Compatibility Notes

Intentional changes to any of these are compatibility-affecting:

- golden top-k row IDs;
- tie keys or duplicate-candidate behavior;
- empty-result or `k` clamping shape;
- scalar/SIMD score tolerance;
- whether an API is exact or approximate;
- whether a backend is covered by this contract.

Such changes need a compatibility note in the PR and release notes. Performance
changes that preserve the same scores, row ordering, tie keys, and empty-result
shape are not search-contract breaks.

Compatibility note for this contract PR: `RankQuant::search_asymmetric_subset`
now breaks equal-score ties by global row ID instead of local candidate-list
position. That matches full-index search, C ABI hit ordering, Python binding
ordering, and the candidate prefilters. Duplicate candidate IDs are still
scored as duplicate entries and may still produce duplicate hits.
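
The difference between the two tie keys can be shown with a small Python sketch. `subset_top_k` is a hypothetical helper, not the Rust scorer; the candidate list mirrors the one used in the Python binding test.

```python
def subset_top_k(scores, candidates, k, tie_by_row_id=True):
    """Top-k over caller-supplied candidates.

    scores[i] is the score of candidates[i]. With tie_by_row_id=False the
    tie key is the candidate-list position (the earlier behaviour described
    in the note); with True it is the global row ID.
    """
    k_eff = min(k, len(candidates))

    def key(pos):
        tie = candidates[pos] if tie_by_row_id else pos
        return (-scores[pos], tie)

    order = sorted(range(len(candidates)), key=key)[:k_eff]
    return [candidates[pos] for pos in order]


candidates = [9, 3, 7, 1]      # unsorted caller-supplied row IDs
scores = [0.0, 0.0, 0.0, 0.0]  # every candidate ties

print(subset_top_k(scores, candidates, k=2, tie_by_row_id=False))  # [9, 3]
print(subset_top_k(scores, candidates, k=2))                       # [1, 3]
```

With position as the tie key, the result depends on how the caller happened to order the candidate list. With the global row ID, any permutation of the same candidates yields `[1, 3]`.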
8 changes: 3 additions & 5 deletions ordvec-ffi/src/lib.rs
@@ -833,11 +833,9 @@ pub unsafe extern "C" fn ordvec_index_search(
(LoadedIndex::RankQuant(index), Some(rows)) => {
// Ask the core for every candidate score, then normalize by the
// ABI's global row-id tie policy before truncating. The core
// subset helper breaks ties by candidate position before
// mapping back to global row IDs, so requesting only k could
// drop a boundary-tied lower row ID from an unsorted candidate
// list. Materializing all candidates preserves the ABI ordering
// contract until core exposes a global-row top-k scorer.
// subset helper uses global row IDs as score-tie keys; keeping
// the ABI normalization centralized preserves duplicate and
// boundary handling for caller-supplied candidate lists.
let (scores, indices) =
index.search_asymmetric_subset(validation.query, rows, rows.len());
normalize_global_order(scores, indices, validation.required_hits)
19 changes: 16 additions & 3 deletions ordvec-python/tests/test_rank_quant.py
@@ -299,8 +299,7 @@ def test_search_asymmetric_subset_matches_full_when_candidates_eq_all():
# When the candidate set is every doc, the subset path must agree
# with full `search_asymmetric` on the top-k. Both use the
# asymmetric kernel; the subset path just iterates the candidate
# list instead of all N docs. (Allow set equality — ties may
# permute within the same scoring tier.)
# list instead of all N docs.
vectors = unit_vectors(40, 128, seed=0)
idx = RankQuant(dim=128, bits=2)
idx.add(vectors)
@@ -310,7 +309,21 @@ def test_search_asymmetric_subset_matches_full_when_candidates_eq_all():
_, subset_ids = idx.search_asymmetric_subset(query, candidates, k=10)

_, full_ids = idx.search_asymmetric(query[None, :], k=10)
assert set(int(i) for i in subset_ids) == set(int(i) for i in full_ids[0])
np.testing.assert_array_equal(subset_ids, full_ids[0])


def test_search_asymmetric_subset_ties_use_global_row_ids():
    vectors = np.ones((12, 64), dtype=np.float32)
    idx = RankQuant(dim=64, bits=2)
    idx.add(vectors)

    candidates = np.array([9, 3, 7, 1], dtype=np.uint32)
    scores, ids = idx.search_asymmetric_subset(
        np.zeros(64, dtype=np.float32), candidates, k=2
    )

    np.testing.assert_array_equal(ids, np.array([1, 3], dtype=np.int64))
    np.testing.assert_array_equal(scores, np.array([0.0, 0.0], dtype=np.float32))


def test_search_asymmetric_subset_k_caps_at_candidate_count():
12 changes: 3 additions & 9 deletions ordvec-python/tests/test_sign_bitmap.py
@@ -80,22 +80,16 @@ def test_top_m_candidates_batched_shape():


def test_batched_matches_scalar_for_each_row():
# The batched AVX-512 VPOPCNTDQ kernel must agree with the scalar
# path on the same query at top-1; we check the leading match for a
# small batch (boundary ties at deeper ranks may diverge — see the
# body-kernel tie-break follow-up, separate from sign-bitmap).
# The batched kernel must agree with the single-query path for the full
# ordered top-m row, including boundary ties.
idx = SignBitmap(dim=128)
idx.add(unit_vectors(60, 128, seed=0))
queries = unit_vectors(6, 128, seed=99)

batched = idx.top_m_candidates_batched(queries, m=5)
for i in range(6):
scalar = idx.top_m_candidates(queries[i], m=5)
# Top-1 must agree exactly across both code paths.
assert int(batched[i, 0]) == int(scalar[0]), (
f"batched vs scalar disagree on top-1 for query {i}: "
f"batched={batched[i, 0]} scalar={scalar[0]}"
)
np.testing.assert_array_equal(batched[i], scalar)


def test_empty_batch_returns_consistent_column_count():
6 changes: 4 additions & 2 deletions src/quant.rs
@@ -536,7 +536,9 @@ impl RankQuant {
/// subset (e.g., the top-M from a bitmap probe). Returns
/// `(scores, indices)`: the top-`k` scores and their corresponding
/// **global** doc IDs (the local candidate positions are mapped back
/// to global IDs before returning).
/// to global IDs before returning). Results are ordered by score
/// descending, then global row ID ascending, matching the full-index
/// search tie policy even when `candidates` is unsorted.
///
/// Uses the same AVX-512 → AVX2 → scalar dispatch as
/// [`Self::search_asymmetric`] and the same centre-drop math, just
@@ -606,7 +608,7 @@ impl RankQuant {
// never reaches a kernel that would drop its tail chunk.
#[cfg_attr(not(target_arch = "x86_64"), allow(unused_variables))]
let simd_tier = select_simd_tier(dim, bits);
let mut top = TopK::new(k_eff);
let mut top = TopK::new_with_tie_keys(k_eff, candidates);
#[cfg_attr(not(target_arch = "x86_64"), allow(unused_mut))]
let mut centre_drop_used = false;
#[cfg(target_arch = "x86_64")]