Skip to content

[codex] Pin RankQuant subset duplicate contract#167

Merged
Navi Bot (project-navi-bot) merged 3 commits into
mainfrom
codex/rankquant-determinism-followups
Jun 9, 2026
Merged

[codex] Pin RankQuant subset duplicate contract#167
Navi Bot (project-navi-bot) merged 3 commits into
mainfrom
codex/rankquant-determinism-followups

Conversation

@Fieldnote-Echo

@Fieldnote-Echo Fieldnote-Echo commented Jun 4, 2026

Copy link
Copy Markdown
Member

Summary

  • assert full score-descending/global-row-id tie ordering in RankQuant fuzz targets
  • document duplicate subset-candidate behavior at the Rust, Python, C ABI, and Go wrapper call sites
  • add duplicate-candidate regression tests for Python, C ABI, and Go RankQuant subset search
  • enforce RankQuant asymmetric public ordering after SIMD center-offset reapplication so exposed score ties are sorted by row ID

Review loop

  • Scouted fuzz ordering gaps and public duplicate-candidate contract surfaces before implementation.
  • Ran adversarial review after implementation.
  • Remediated review findings by splitting score monotonicity from equal-score tie assertions and adding Rust-side ABI field docs to back the generated C header comment.
  • Remediated CI/Gemini findings by tightening full-index fuzz ties to strict row-ID increase, regenerating the C header with CI-pinned cbindgen 0.29.2, and fixing the exposed-score ordering hole found by the two-stage fuzz target.

Validation

  • cargo test -p ordvec --test determinism_contract
  • cargo check --manifest-path fuzz/Cargo.toml --all-targets
  • ASAN_OPTIONS=detect_leaks=0 LSAN_OPTIONS=detect_leaks=0 cargo +nightly-2025-08-15 fuzz run signbitmap_rankquant_twostage -- -max_total_time=15 -rss_limit_mb=4096
  • /tmp/ordvec-cbindgen-0.29.2/bin/cbindgen ordvec-ffi --config ordvec-ffi/cbindgen.toml --output ordvec-ffi/include/ordvec.h --verify
  • cargo test -p ordvec-ffi full_and_subset_search_rankquant
  • cargo test -p ordvec-ffi
  • cargo build -p ordvec-ffi --release
  • GOCACHE=/tmp/ordvec-gocache GOMODCACHE=/tmp/ordvec-gomodcache go test -count=1 ./...
  • VIRTUAL_ENV=/home/ndspence/GitHub/ordvec/ordvec-python/.venv PATH=/home/ndspence/GitHub/ordvec/ordvec-python/.venv/bin:$PATH maturin develop -m ordvec-python/Cargo.toml
  • VIRTUAL_ENV=/home/ndspence/GitHub/ordvec/ordvec-python/.venv PATH=/home/ndspence/GitHub/ordvec/ordvec-python/.venv/bin:$PATH python -m pytest ordvec-python/tests/test_rank_quant.py
  • cargo clippy -p ordvec --all-targets -- -D warnings
  • cargo clippy -p ordvec-ffi --all-targets -- -D warnings
  • cargo fmt --all --check
  • git diff --check

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request documents and tests the behavior of subset candidate searches when duplicate or unsorted global row IDs are provided, ensuring they are scored independently and can produce duplicate hits. The feedback suggests tightening an assertion in the full index search fuzz target to require strictly ascending document IDs on equal scores, since full index search document IDs are unique.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread fuzz/fuzz_targets/search_rankquant.rs
@codecov

codecov Bot commented Jun 4, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@Fieldnote-Echo Nelson Spence (Fieldnote-Echo) force-pushed the codex/rankquant-determinism-followups branch from 4f035a0 to e06b4c9 Compare June 4, 2026 01:54
Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
@qodo-code-review

Copy link
Copy Markdown

Review Summary by Qodo

Pin RankQuant subset duplicate contract with deterministic tie ordering

🐞 Bug fix ✨ Enhancement

Grey Divider

Walkthroughs

Description
• Enforce deterministic RankQuant subset search ordering by score descending, then row ID ascending
• Add center-offset reapplication sort to handle score ties after SIMD offset adjustment
• Document duplicate candidate behavior across Rust, Python, C ABI, and Go wrapper interfaces
• Add regression tests for duplicate candidate handling in Python, C ABI, and Go bindings
Diagram
flowchart LR
  A["RankQuant subset search"] --> B["Score candidates"]
  B --> C["Apply center offset"]
  C --> D["Re-sort by score desc, row ID asc"]
  D --> E["Return ordered results"]
  F["Duplicate candidates"] --> A
  F --> G["Each scored independently"]
  G --> E

Loading

Grey Divider

File Changes

1. src/quant.rs 🐞 Bug fix +33/-1

Add deterministic tie-breaking sort after center offset

src/quant.rs


2. fuzz/fuzz_targets/search_rankquant.rs 🧪 Tests +6/-1

Enhance assertions to verify score-desc and row-ID-asc ordering

fuzz/fuzz_targets/search_rankquant.rs


3. fuzz/fuzz_targets/signbitmap_rankquant_twostage.rs 🧪 Tests +19/-1

Add dedicated ordering validation function for full contract

fuzz/fuzz_targets/signbitmap_rankquant_twostage.rs


View more (7)
4. ordvec-ffi/src/lib.rs 📝 Documentation +32/-0

Document duplicate candidate behavior and add C ABI test

ordvec-ffi/src/lib.rs


5. ordvec-ffi/include/ordvec.h 📝 Documentation +4/-0

Document duplicate candidate semantics in C header

ordvec-ffi/include/ordvec.h


6. ordvec-python/src/lib.rs 📝 Documentation +5/-0

Document duplicate candidate handling in Python docstring

ordvec-python/src/lib.rs


7. ordvec-python/tests/test_rank_quant.py 🧪 Tests +14/-0

Add Python regression test for duplicate candidates

ordvec-python/tests/test_rank_quant.py


8. ordvec-go/ordvec.go 📝 Documentation +3/-0

Document duplicate candidate semantics in Go wrapper

ordvec-go/ordvec.go


9. ordvec-go/ordvec_test.go 🧪 Tests +22/-0

Add Go regression test for duplicate candidate hits

ordvec-go/ordvec_test.go


10. ordvec-go/README.md 📝 Documentation +5/-0

Document duplicate candidate behavior in Go README

ordvec-go/README.md


Grey Divider

Qodo Logo

@qodo-code-review

qodo-code-review Bot commented Jun 4, 2026

Copy link
Copy Markdown

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0) 🎨 UX issues (0) 🔗 Cross-repo conflicts (0)

Grey Divider


Remediation recommended

1. Extra post-offset sort ✓ Resolved 🐞 Bug ➹ Performance
Description
When centre_drop_used is true, RankQuant::search_asymmetric and search_asymmetric_subset now
allocate and sort an additional time via sort_by_exposed_score_then_row_id, even though
TopK::finalize_into already allocated+sorted the top-k slice. This adds per-query heap churn and
O(k log k) work on the SIMD path used by public Python/C ABI/Go entrypoints.
Code

src/quant.rs[R695-716]

+// Re-applying a query-constant center offset after TopK finalization can round
+// distinct internal scores onto the same exposed score; enforce the public
+// `(score desc, row ID asc)` order on the finalized top-k slice.
+fn sort_by_exposed_score_then_row_id(scores: &mut [f32], indices: &mut [i64]) {
+    debug_assert_eq!(scores.len(), indices.len());
+    let mut pairs: Vec<(usize, f32, i64)> = scores
+        .iter()
+        .copied()
+        .zip(indices.iter().copied())
+        .enumerate()
+        .map(|(slot, (score, row_id))| (slot, score, row_id))
+        .collect();
+    pairs.sort_unstable_by(|a, b| {
+        b.1.total_cmp(&a.1)
+            .then_with(|| a.2.cmp(&b.2))
+            .then_with(|| a.0.cmp(&b.0))
+    });
+    for (slot, (_, score, row_id)) in pairs.into_iter().enumerate() {
+        scores[slot] = score;
+        indices[slot] = row_id;
+    }
+}
Evidence
The SIMD centre-drop path calls TopK::finalize_into(...) and then (new in this PR) immediately
performs another allocation+sort over the finalized top-k slice to enforce the public ordering after
offset reapplication. TopK::finalize_into already allocates and sorts pairs, so the new helper
duplicates that work and is on the public call paths used by Python and the C ABI.

src/quant.rs[351-420]
src/quant.rs[663-691]
src/quant.rs[695-716]
src/util.rs[472-516]
ordvec-python/src/lib.rs[491-513]
ordvec-ffi/src/lib.rs[830-844]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`RankQuant` currently performs two sorts on the SIMD centre-drop path:
1) `TopK::finalize_into(...)` allocates+sorts the top-k.
2) After adding the center offset, `sort_by_exposed_score_then_row_id(...)` allocates+sorts again.

The second sort is correctness-motivated (offset reapplication can round distinct internal scores onto identical exposed scores), but doing it unconditionally adds avoidable overhead.

## Issue Context
This affects user-facing APIs because Python and the C ABI forward into `RankQuant::search_asymmetric` / `search_asymmetric_subset`.

## Fix Focus Areas
- src/quant.rs[400-420]
- src/quant.rs[663-691]
- src/quant.rs[695-716]
- src/util.rs[472-516]

## Suggested fix approach
Implement one of the following optimizations (keep correctness contract):
- **Only sort when needed**: after applying the offset, do a linear pass to check whether the slice already satisfies `(score desc, row_id asc for equal exposed scores)` (use `total_cmp` equality for the score tie check). Call `sort_by_exposed_score_then_row_id` only if a violation is detected.
- **Sort once instead of twice**: extend `TopK::finalize_into` (or add an alternative finalize) to accept a `score_adjustment` (e.g., add `offset` to each finalized score before sorting), so the final order is produced in a single allocation+sort.
- **Reduce allocations**: use a stack-backed small buffer (e.g., `SmallVec`) for common `k` values, or reuse a preallocated buffer per call if feasible.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Qodo Logo

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9019815077

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "Codex (@codex) review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".

Comment thread src/quant.rs Outdated
@Fieldnote-Echo Nelson Spence (Fieldnote-Echo) force-pushed the codex/rankquant-determinism-followups branch from 925a5d6 to 9db3b36 Compare June 4, 2026 02:50
Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
@Fieldnote-Echo Nelson Spence (Fieldnote-Echo) force-pushed the codex/rankquant-determinism-followups branch from 9db3b36 to bc6ef84 Compare June 4, 2026 02:59
@project-navi-bot Navi Bot (project-navi-bot) merged commit cc89ae9 into main Jun 9, 2026
38 checks passed
@project-navi-bot Navi Bot (project-navi-bot) deleted the codex/rankquant-determinism-followups branch June 9, 2026 15:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants