
feat: capability-gated b=8 RankQuant (research/evidence width) (#221) #228

Merged
Navi Bot (project-navi-bot) merged 11 commits into main from feat/rankquant-b8 on Jun 15, 2026

Conversation

@Fieldnote-Echo

Member

Closes #221.

What

Adds b=8 as a capability-gated RankQuant width — a stable, documented evidence/refinement surface (asymmetric quant after repair flows, edge-case rerank healing), not experimental-unstable, and not a broad retrieval-quant method.

Capability matrix (the constant-composition rule dim % 2^bits exists only for the symmetric analytical norm):

| path | b=8 |
| --- | --- |
| bucket-code generation, pair-evidence/contingency, asymmetric scoring | any dim |
| symmetric scoring + analytical norm | only dim % 256 == 0 |

b=1/2/4 are unchanged — they stay the stable headline retrieval surface.

API (additive — no breaking changes)

  • RankQuantCapability { AsymmetricOnly, SymmetricAndAsymmetric } + capability() + symmetric_supported().
  • RankQuant::new(dim, 8) requires dim % 256 == 0 (full capability; fail-loud otherwise, directing to new_asymmetric). RankQuant::new_asymmetric(dim, 8) — any dim, AsymmetricOnly (auto-upgrades when 256-aligned).
  • search() on an AsymmetricOnly instance fails loud: "RankQuant b=8 symmetric scoring requires dim % 256 == 0; dim={dim} supports asymmetric/evidence APIs only." (documented; query symmetric_supported() first).
  • validate_params(dim, 8) = code-validity at any dim.
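The constructor rules above can be sketched as follows. This is a minimal model under stated assumptions, not the crate's code: `B8Sketch` is a hypothetical b=8-only stand-in for `RankQuant`, and only the enum variant names and method names are taken from this description.

```rust
/// Stand-in for the capability enum named in this PR (variant names from the description).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RankQuantCapability {
    AsymmetricOnly,
    SymmetricAndAsymmetric,
}

/// Hypothetical b=8-only model; the real `RankQuant` also owns codes and kernels.
#[derive(Debug)]
pub struct B8Sketch {
    dim: usize,
    capability: RankQuantCapability,
}

impl B8Sketch {
    /// Models `RankQuant::new(dim, 8)`: full capability or a loud failure.
    pub fn new(dim: usize) -> Self {
        assert!(
            dim > 0 && dim % 256 == 0,
            "b=8 full capability requires dim % 256 == 0 (dim={}); use new_asymmetric",
            dim
        );
        Self { dim, capability: RankQuantCapability::SymmetricAndAsymmetric }
    }

    /// Models `RankQuant::new_asymmetric(dim, 8)`: any dim, upgraded when 256-aligned.
    pub fn new_asymmetric(dim: usize) -> Self {
        assert!(dim > 0, "dim must be non-zero");
        let capability = if dim % 256 == 0 {
            RankQuantCapability::SymmetricAndAsymmetric
        } else {
            RankQuantCapability::AsymmetricOnly
        };
        Self { dim, capability }
    }

    pub fn dim(&self) -> usize {
        self.dim
    }

    pub fn capability(&self) -> RankQuantCapability {
        self.capability
    }

    /// Callers are expected to check this before invoking symmetric search.
    pub fn symmetric_supported(&self) -> bool {
        self.capability == RankQuantCapability::SymmetricAndAsymmetric
    }
}
```

With this model, `B8Sketch::new_asymmetric(384)` reports `AsymmetricOnly`, while `B8Sketch::new_asymmetric(768)` upgrades to `SymmetricAndAsymmetric`.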

Kernel

b=8 asymmetric = a dim*256 float-LUT gather. AVX-512 vgatherdps kernel (runtime-dispatched, explicit tail handling), scalar LUT fallback. ~1.23× over scalar on an AVX-512 host — honest (gather/LUT-latency bound, not the ~10× of b=2/4 which touch no LUT); shipped because it reproducibly wins. Parity vs scalar within the crate's 1e-4 tolerance across dims 384/400/768/1024/1536.
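For readers unfamiliar with the gather formulation, here is a rough sketch of what a scalar LUT path could look like. The function names and the centre formula `code - 127.5` are assumptions made for illustration; only the `dim * 256` LUT shape and the one-byte-per-coordinate layout come from this PR.

```rust
/// Build a per-coordinate LUT: entry `i * 256 + c` holds `query[i] * centre(c)`.
/// The centre formula `c - 127.5` is an assumption, not the crate's definition.
fn build_b8_lut(query: &[f32]) -> Vec<f32> {
    let mut lut = Vec::with_capacity(query.len() * 256);
    for &q in query {
        for c in 0..256u32 {
            lut.push(q * (c as f32 - 127.5));
        }
    }
    lut
}

/// Score one document: gather one LUT entry per coordinate and sum them.
/// `codes` holds one bucket code (0..=255) per coordinate.
fn score_b8_scalar(codes: &[u8], lut: &[f32]) -> f32 {
    assert_eq!(lut.len(), codes.len() * 256, "LUT must be dim * 256 entries");
    codes
        .iter()
        .enumerate()
        .map(|(i, &c)| lut[i * 256 + c as usize])
        .sum()
}
```

Each coordinate costs one dependent load from the LUT, which is consistent with the gather/LUT-latency bound described above; a SIMD gather performs the same lookups 16 lanes at a time.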

Verified (locally, my own runs)

fmt / clippy -D warnings (default + experimental) / test (196 default + 206 experimental + no-default-features) / MSRV 1.89 — all green. b=1/2/4 behaviour and every prior test unchanged. Unsafe gather bounds proven by hand. Test matrix covers dim=384 (asym pass, symmetric rejects with exact message), 768/1024/1536 (full), b=4 dim=384 unchanged, capability reporting, brute-force asymmetric parity.

⚠️ Decisions for maintainer sign-off

  1. b=8 persistence is deferred (most important for your use case). RankQuant::write() returns io::Error(Unsupported) and writes no file for a b=8 index (the .tvrq loader admits {1,2,4} only; refusing avoids a silent broken round-trip). b=8 is in-memory only this phase. Your stated use case is "asymmetric quant storage"; if that means persisting b=8 to disk, say so and I'll add b=8 to the .tvrq format (trivial: 1 byte/coord) as a follow-up or in this PR.
  2. Symmetric-on-asym-only = fail-loud panic (not Result) — matches the crate's fail-loud guard style and avoids a breaking search signature; the new_asymmetric name + symmetric_supported() make it non-surprising. Confirm.
  3. Persistence/Python: the Python binding keeps its {1,2,4} guard (b=8 is Rust-core-only; Python parity is tracked separately in "Add Python/CLI parity utilities for ordinal pair evidence", #223).
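The refusal in decision 1 can be sketched like this. It is a minimal illustration, assuming a hypothetical `write_index_sketch` helper that writes raw bytes; the real `.tvrq` writer and its header layout are not shown in this PR and are not modelled here.

```rust
use std::io;
use std::path::Path;

/// Hypothetical writer: refuses any width the loader would not admit,
/// and does so before touching the filesystem so no partial file is left.
fn write_index_sketch(path: &Path, bits: u8, packed: &[u8]) -> io::Result<()> {
    if !matches!(bits, 1 | 2 | 4) {
        return Err(io::Error::new(
            io::ErrorKind::Unsupported,
            format!("b={} persistence is not supported by this format", bits),
        ));
    }
    // Placeholder payload: a real format would write a header first.
    std::fs::write(path, packed)
}
```

Checking the width before opening the file is what guarantees "writes no file": a caller that ignores the error cannot later load a half-written index.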

Coupling

Once this lands, #226 (RankQuantSpec, #220) gets a small follow-up to accept b=8 at any dim + report symmetric_supported().

Targets 0.6 (may land in 0.5.0 — maintainer decides).

b=8 is an evidence/refinement-oriented RankQuant width (asymmetric quant after
repair flows, edge-case rerank healing) — a stable, documented surface, NOT
experimental-unstable. Capability matrix:
  - bucket-code generation / pair-evidence / asymmetric scoring: ANY dim
  - symmetric scoring + analytical norm: ONLY dim % 256 == 0

API (additive, no breaking changes to existing signatures):
- RankQuantCapability { AsymmetricOnly, SymmetricAndAsymmetric } + capability() +
  symmetric_supported().
- RankQuant::new(dim, 8) requires dim % 256 == 0 (full capability; fail-loud else,
  directing to new_asymmetric). RankQuant::new_asymmetric(dim, 8): any dim,
  AsymmetricOnly (auto-upgrades to full when 256-aligned). b=1/2/4 unchanged.
- search() on an AsymmetricOnly instance fails loud with the exact message
  'RankQuant b=8 symmetric scoring requires dim % 256 == 0; dim={dim} supports
  asymmetric/evidence APIs only.' (documented; check symmetric_supported() first).
- validate_params(dim, 8): code-validity any dim (no dim%256). Primitives widened
  to bits<=8 (mask in u16 to avoid 1u8<<8 overflow); b=8 packs 1 byte/coord.
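The u16 mask detail can be illustrated as follows. The helper names are hypothetical; the `dim * bits / 8` storage formula is the one quoted in the module docs of this PR.

```rust
/// Bucket mask (number of buckets minus one) for `bits` in 1..=8.
fn bucket_mask(bits: u8) -> u8 {
    assert!((1..=8).contains(&bits), "bits must be in 1..=8");
    // `1u8 << 8` overflows the shift width, so shift in u16 and narrow
    // only after subtracting one (255 fits in u8).
    ((1u16 << bits) - 1) as u8
}

/// Packed bytes per vector: `dim * bits / 8`, i.e. one byte per coordinate at b=8.
fn bytes_per_vec(dim: usize, bits: u8) -> usize {
    dim * bits as usize / 8
}
```

At `bits = 8` the mask is 255 and `bytes_per_vec(384, 8)` is 384, matching the one-byte-per-coordinate packing.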

Kernel: b=8 asymmetric uses an AVX-512 vgatherdps kernel (dim*256 LUT gather),
runtime-dispatched (avx512f + dim%16==0, explicit tail handling), scalar LUT
fallback; ~1.23x over scalar on this host (gather/LUT-latency bound — honest,
shipped). Parity vs scalar within 1e-4 across dims 384/400/768/1024/1536.

Persistence: b=8 write() returns io::Error(Unsupported) (no file) — the .tvrq
loader admits {1,2,4} only; b=8 is in-memory this phase (FLAGGED, see PR).

Verified: fmt/clippy(-D warnings, default+experimental)/test(196 default + 206
experimental + no-default-features)/MSRV 1.89 green; b=1/2/4 unchanged; unsafe
gather bounds proven. Closes #221.

Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
@qodo-code-review

qodo-code-review Bot commented Jun 14, 2026


Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0)

Action required

1. rankquant_eval_search contradicts b=8 gating ✓ Resolved 📎 Requirement gap ≡ Correctness
Description
The public docs state b=8 symmetric scoring is only supported when dim % 256 == 0, but
rankquant_eval_search now explicitly supports bits=8 at non-256-aligned dims via an empirical
norm. This creates a semantic mismatch in the crate’s b=8 capability matrix and can mislead
downstream users relying on consistent b=8 symmetric behavior.
Code

src/quant.rs[R44-48]

fn check_eval_bits(bits: u8) {
-    assert!((1..=7).contains(&bits), "bits must be in 1..=7");
+    // b=8 codes still fit a u8 (0..=255); the eval norm is computed empirically
+    // (not the analytical b=8 norm), so it is valid at any dim. b=9 is the first
+    // width whose codes overflow u8.
+    assert!((1..=8).contains(&bits), "bits must be in 1..=8");
Evidence
src/lib.rs documents b=8 symmetric scoring as only valid when dim % 256 == 0, but
check_eval_bits was widened to allow bits=8 specifically noting the eval norm is empirical and
valid at any dim, and a new test asserts rankquant_eval_search(..., dim=384, bits=8, ...) works.
This demonstrates a mismatch between the documented b=8 symmetric gating and the public eval-search
behavior.

Define and document b=8 RankQuant evidence surface and ensure no bit-width semantic mismatch
src/lib.rs[16-19]
src/quant.rs[44-48]
tests/index/quant_b8.rs[451-467]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`rankquant_eval_search` accepts `bits=8` at non-`256`-aligned dims (via an empirical norm), while the crate-level documentation states `b=8` symmetric scoring is only supported when `dim % 256 == 0`. This creates an internal/public semantic mismatch for b=8.

## Issue Context
- `src/lib.rs` documents b=8 symmetric scoring as `dim % 256 == 0` only.
- `check_eval_bits` and the new integration test explicitly assert eval-search supports `bits=8` at non-256-aligned dims.
- Downstream users (incl. OrdGraph migration) may treat `rankquant_eval_search` as a reference for symmetric scoring semantics.

## Fix Focus Areas
- src/quant.rs[44-48]
- tests/index/quant_b8.rs[451-467]
- src/lib.rs[16-19]



2. Docs omit b=8 semantics ✓ Resolved 📎 Requirement gap ⚙ Maintainability
Description
The PR adds and documents b=8 RankQuant support, but top-level docs still state RankQuant supports
only bits ∈ {1, 2, 4}. This violates the requirement to clearly document retrieval-supported vs
evidence-only widths and can mislead downstream users about b=8 availability and intended use.
Code

src/quant.rs[R3-19]

+//! Storage is `dim * bits / 8` bytes per document at `bits ∈ {1, 2, 4, 8}`
+//! (`b=8` is one byte per coordinate). Symmetric search uses a per-query,
+//! per-coord LUT; asymmetric search dispatches AVX-512 → AVX2 → scalar via
+//! the kernels in [`crate::quant_kernels`].
+//!
+//! `b=8` is an evidence/refinement-oriented width: it is supported for
+//! asymmetric scoring and code/projection generation at **any** dimension,
+//! but symmetric scoring uses the equal-bucket analytical norm and therefore
+//! requires `dim % 256 == 0`. For `b ∈ {1, 2, 4}` the existing retrieval
+//! modes remain the stable headline surface; `b=8` is an opt-in,
+//! explicitly-documented high-precision evidence/refinement surface
+//! (e.g. asymmetric quant storage after repair flows, edge-case rerank
+//! healing), not a broad retrieval-quant method. It is **not**
+//! unstable-experimental. See [`RankQuantCapability`] and
+//! [`RankQuant::new_asymmetric`]. Its asymmetric path is a per-coordinate
+//! gather against the `dim * 256` LUT: an AVX-512 `vgatherdps` kernel when
+//! available (`avx512f` + `dim % 16 == 0`), else the portable scalar LUT.
Evidence
Rule 11 requires documentation to clearly distinguish retrieval-supported widths from any
research/evidence-only widths and to align with the implementation. The PR updates src/quant.rs to
describe b=8 support and semantics, but README.md and docs/RANK_MODES.md still claim only
{1,2,4} are supported, creating an inconsistency.

Document bit-width semantics (retrieval-supported vs research/evidence-only)
src/quant.rs[3-19]
README.md[29-37]
docs/RANK_MODES.md[70-75]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
After introducing and documenting `b=8` in `src/quant.rs`, the repository documentation still describes RankQuant bit widths as `bits ∈ {1, 2, 4}` only. This creates conflicting guidance about which widths are supported and what `b=8` means (retrieval vs evidence-only surface).

## Issue Context
- `src/quant.rs` now documents `bits ∈ {1, 2, 4, 8}` and describes `b=8` as an evidence/refinement-oriented width with capability gating.
- `README.md` and `docs/RANK_MODES.md` still state RankQuant supports only `{1,2,4}`.

## Fix Focus Areas
- README.md[29-37]
- docs/RANK_MODES.md[70-75]
- src/quant.rs[3-19]



3. AVX512 gather misgated ✓ Resolved 🐞 Bug ☼ Reliability
Description
scan_b8_asym dispatches to the AVX-512 gather kernel when only avx512f is detected, but the
kernel widens 16 u8 doc bytes to 16 i32 lanes; on CPUs lacking the required AVX-512 byte/word subset
this can crash with an illegal-instruction at runtime. The fix is to gate dispatch (and the kernel’s
#[target_feature]) on the full feature set actually used (at least avx512bw in addition to
avx512f).
Code

src/quant_kernels.rs[R575-586]

+        if is_x86_feature_detected!("avx512f") && dim.is_multiple_of(16) {
+            // SAFETY: `avx512f` is confirmed by the runtime detection above
+            // and `dim % 16 == 0` satisfies the kernel's lane invariant;
+            // `packed.len() == n * dim` and `lut.len() == dim * 256` hold by
+            // construction (b=8 packs one byte/coord; the LUT is built just
+            // above). The explicit block is required by
+            // `#![deny(unsafe_op_in_unsafe_fn)]`.
+            unsafe {
+                scan_b8_asym_avx512_gather(packed, n, dim, &lut, scale, top);
+            }
+            return;
+        }
Evidence
The dispatch currently checks only avx512f before calling the gather kernel, while the kernel
explicitly uses a byte-to-dword widening intrinsic; elsewhere in the codebase, similar byte-widening
AVX-512 code is gated on avx512bw, indicating that subset is expected to be checked.

src/quant_kernels.rs[555-616]
src/quant_kernels.rs[678-706]
src/fastscan.rs[440-450]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The new b=8 AVX-512 gather path is dispatched when only `avx512f` is detected, but the kernel performs byte-lane widening (`_mm512_cvtepu8_epi32`) which typically requires `avx512bw`. On x86_64 hosts that have AVX-512F but not AVX-512BW, this can lead to a runtime SIGILL when `search_asymmetric` / `search_asymmetric_subset` hits the gather path.

### Issue Context
The repo already treats AVX-512 subsets carefully elsewhere (e.g., FastScan gates on `avx512f` + `avx512bw` + `avx512dq`). The b=8 gather kernel should follow the same pattern: both the `#[target_feature(...)]` annotation and the runtime `is_x86_feature_detected!` checks must cover every required ISA subset.
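A minimal sketch of the gating pattern described above, assuming hypothetical names (`B8Kernel`, `pick_b8_kernel`, `has_avx512_f_bw`). It only selects a kernel; the gather kernel itself is not reproduced.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum B8Kernel {
    Avx512Gather,
    Scalar,
}

/// Runtime check covering every ISA subset the SIMD path is gated on.
#[cfg(target_arch = "x86_64")]
fn has_avx512_f_bw() -> bool {
    std::arch::is_x86_feature_detected!("avx512f")
        && std::arch::is_x86_feature_detected!("avx512bw")
}

/// Non-x86_64 targets never take the AVX-512 path.
#[cfg(not(target_arch = "x86_64"))]
fn has_avx512_f_bw() -> bool {
    false
}

/// Pick the SIMD kernel only when both the lane invariant and the
/// feature checks hold; otherwise fall back to the scalar LUT path.
fn pick_b8_kernel(dim: usize) -> B8Kernel {
    if dim % 16 == 0 && has_avx512_f_bw() {
        B8Kernel::Avx512Gather
    } else {
        B8Kernel::Scalar
    }
}
```

The same feature list would also need to appear in the kernel's `#[target_feature]` attribute, so the runtime check and the compiled code agree on what is required.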

### Fix Focus Areas
- src/quant_kernels.rs[564-589]
- src/quant_kernels.rs[618-707]
- src/fastscan.rs[440-450]




Remediation recommended

4. b=8 norm mis-scaled ✓ Resolved 🐞 Bug ≡ Correctness
Description
For bits=8 at non-256-aligned dims, RankQuant::search_asymmetric (and
search_asymmetric_subset) normalizes scores using rankquant_norm(dim, 8), even though
rankquant_norm is documented as exact only when dim % 256 == 0. This makes b=8 asymmetric scores
for AsymmetricOnly instances systematically mis-normalized (potentially outside the intended score
scale/range).
Code

↗ src/quant.rs

            .for_each(|((q, out_scores), out_indices)| {
Evidence
validate_params explicitly allows b=8 at any dimension (no dim % 256 requirement), but the
asymmetric scoring paths still derive inv_norm from rankquant_norm, which is documented as exact
only when dim % 256 == 0 for bits==8. This creates a mismatch where the newly-supported
b=8/asymmetric-only configuration uses a normalization constant that is explicitly stated to be
inexact for that configuration.

src/quant.rs[239-300]
src/quant.rs[601-615]
src/quant.rs[905-916]
src/rank.rs[287-313]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`RankQuant::validate_params(dim, 8)` explicitly allows any `dim`, and `RankQuant::new_asymmetric(dim, 8)` enables asymmetric scoring at non-256-aligned dimensions. However, both `search_asymmetric` and `search_asymmetric_subset` still compute `inv_norm` using `rankquant_norm(dim, bits)`, whose contract (and doc) is the *symmetric analytical* closed-form norm that is only exact when bucket occupancy is equal (for b=8: `dim % 256 == 0`).

This leads to incorrect normalization/scaling of b=8 asymmetric scores for the newly-supported non-256-aligned dimensions.

### Issue Context
There is already an in-crate implementation pattern for computing an exact norm by iterating ranks and accumulating `bucket_centre(rank_to_bucket(rank))^2` (see the eval helpers). A similar approach can be applied for b=8 (or more generally) in the RankQuant asymmetric path.

### Fix Focus Areas
- src/quant.rs[601-616]
- src/quant.rs[901-917]
- src/rank.rs[287-313]

### Suggested fix
- Introduce an *exact* norm computation for `(dim, bits)` when `bits == 8 && dim % 256 != 0` (or more generally for any non-constant-composition case), e.g.:
 - `rankquant_norm_exact(dim, bits)` that loops `rank in 0..dim`, computes `b = rank_to_bucket(rank as u16, dim, bits)`, `c = bucket_centre(b, bits)`, and accumulates `c*c`.
- In `search_asymmetric` and `search_asymmetric_subset`, when `bits == 8` and `!dim.is_multiple_of(256)`, use the exact norm instead of the closed-form `rankquant_norm`.
- (Optional but robust) Cache the computed norm (or `inv_norm`) in the `RankQuant` instance for `b=8` asymmetric-only to avoid recomputation across calls.
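The exact-norm loop suggested above might look like the sketch below. The bucketing rule `rank * 2^bits / dim` and the centre `bucket - (2^bits - 1) / 2` are assumptions made so the example is self-contained; the crate's `rank_to_bucket` and `bucket_centre` may differ, so the helpers here carry a `toy_` prefix.

```rust
/// Assumed equal-width bucketing of ranks; not the crate's `rank_to_bucket`.
fn toy_rank_to_bucket(rank: usize, dim: usize, bits: u8) -> usize {
    rank * (1usize << bits) / dim
}

/// Assumed zero-mean bucket centre; not the crate's `bucket_centre`.
fn toy_bucket_centre(bucket: usize, bits: u8) -> f64 {
    bucket as f64 - ((1usize << bits) as f64 - 1.0) / 2.0
}

/// Closed-form squared norm, valid when every bucket holds the same count.
fn norm_sq_closed_form(dim: usize, bits: u8) -> f64 {
    let k = (1usize << bits) as f64;
    dim as f64 * (k * k - 1.0) / 12.0
}

/// Exact squared norm: sum the squared centre actually assigned to each rank.
fn norm_sq_exact(dim: usize, bits: u8) -> f64 {
    (0..dim)
        .map(|rank| {
            let c = toy_bucket_centre(toy_rank_to_bucket(rank, dim, bits), bits);
            c * c
        })
        .sum()
}

/// Exact norm used to scale scores at any dim.
fn norm_exact(dim: usize, bits: u8) -> f64 {
    norm_sq_exact(dim, bits).sqrt()
}
```

Under these assumptions the two forms agree at `dim = 768` and diverge slightly at `dim = 400`, where bucket occupancy is uneven. How large the gap is in the real crate depends on its actual bucketing rule.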



5. Eval rejects b=8 ✓ Resolved 🐞 Bug ≡ Correctness
Description
rankquant_eval_search still validates bits via check_eval_bits which rejects bits=8, so
calling rankquant_eval_search(..., bits=8, ...) will panic even though the PR expands the
supported widths to include 8 in RankQuant::validate_params and related APIs. This creates an
inconsistent public surface where b=8 is supported for RankQuant but not for the standalone eval
helper.
Code

src/quant.rs[R256-262]

    pub fn validate_params(dim: usize, bits: u8) -> Result<(), OrdvecError> {
-        if !matches!(bits, 1 | 2 | 4) {
+        if !matches!(bits, 1 | 2 | 4 | 8) {
            return Err(OrdvecError::InvalidParameter {
                name: "bits",
-                message: "must be 1, 2, or 4".to_string(),
+                message: "must be 1, 2, 4, or 8".to_string(),
            });
        }
Evidence
The eval path hard-rejects 8 via check_eval_bits(1..=7), while the updated validation and docs in
the same module now explicitly accept bits ∈ {1,2,4,8}, making rankquant_eval_search
unexpectedly panic for a newly-supported width.

src/quant.rs[44-50]
src/quant.rs[241-261]
src/quant.rs[1093-1101]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`rankquant_eval_search` still uses `check_eval_bits` which asserts `bits` is in `1..=7`, so `bits=8` panics. After this PR, other public entry points (e.g., `RankQuant::validate_params`) explicitly accept `bits=8`, so the eval API is now inconsistent with the supported bit-width domain.

### Issue Context
The underlying primitives (`rank_to_bucket`, `bucket_centre`) now support `bits<=8`, so `rankquant_eval_search` can safely support `bits=8` as well.

### Fix Focus Areas
- src/quant.rs[44-46]
- src/quant.rs[1093-1101]
- src/quant.rs[241-262]




@chatgpt-codex-connector


You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@qodo-code-review


PR Summary by Qodo

Add capability-gated b=8 RankQuant with asymmetric gather kernel and tests
✨ Enhancement 🧪 Tests 🕐 40+ Minutes


Walkthroughs

Description
• Add b=8 RankQuant with capability gating for symmetric vs asymmetric scoring.
• Implement AVX-512 gather-based b=8 asymmetric scan with scalar fallback.
• Add integration/unit tests that pin capability rules and scoring parity.
Diagram
graph TD
  A["Public API (lib.rs)"] --> B["RankQuant (quant.rs)"] --> C["Rank utils (rank.rs)"]
  B --> D["Kernels (quant_kernels.rs)"]
  B --> E["TVRQ IO (rank_io)"]
  F["b=8 tests (tests/index/quant_b8.rs)"] --> B

  subgraph Legend
    direction LR
    _api["API Module"] ~~~ _core["Core Logic"] ~~~ _kern["SIMD/Scalar Kernels"]
  end
High-Level Assessment

The following are alternative approaches to this PR:

1. Make symmetric search return Result for unsupported capability
  • ➕ Avoids panics for AsymmetricOnly instances and enables error handling in libraries/services
  • ➕ Allows clearer propagation when consumers accidentally call symmetric search
  • ➖ Breaking API change (search signature) unless introduced as a parallel method
  • ➖ Would diverge from current crate’s fail-loud guard style
2. Split types: RankQuantAsymmetricOnly vs RankQuantFull
  • ➕ Capability becomes a compile-time property; impossible to call symmetric search on asymmetric-only
  • ➕ Reduces runtime branching and docs burden
  • ➖ More API surface and conversions; may complicate ergonomics and generics
  • ➖ Still needs persistence/versioning decisions for b=8
3. Version/extend the .tvrq format to include b=8 immediately
  • ➕ Unblocks the stated asymmetric quant storage use case with an end-to-end story
  • ➕ Avoids the current 'in-memory only' limitation for b=8
  • ➖ Requires format bump and compatibility strategy across Rust/Python loaders
  • ➖ Increases rollout coordination and review surface beyond kernel + API

Recommendation: The capability-gated design is a reasonable way to add b=8 without weakening the correctness contract of symmetric scoring. If the project strongly prefers fail-loud invariants, keeping the panic on unsupported symmetric search is consistent; otherwise consider adding a non-breaking try_search() variant returning Result. The biggest product decision is persistence: the current choice to hard-reject write() for b=8 avoids broken round-trips, but if storage is a core use case it’s worth prioritizing a .tvrq format extension soon.
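The non-breaking `try_search()` idea can be sketched as below. Everything here is hypothetical (`B8Index`, `SymmetricUnsupported`, the placeholder result); only the error message text is taken from the PR description.

```rust
use std::fmt;

/// Hypothetical error for symmetric search on an asymmetric-only index.
#[derive(Debug, PartialEq, Eq)]
pub struct SymmetricUnsupported {
    pub dim: usize,
}

impl fmt::Display for SymmetricUnsupported {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "RankQuant b=8 symmetric scoring requires dim % 256 == 0; dim={} supports asymmetric/evidence APIs only.",
            self.dim
        )
    }
}

impl std::error::Error for SymmetricUnsupported {}

/// Toy b=8 index that only tracks its dimension.
pub struct B8Index {
    dim: usize,
}

impl B8Index {
    pub fn new_asymmetric(dim: usize) -> Self {
        Self { dim }
    }

    pub fn symmetric_supported(&self) -> bool {
        self.dim % 256 == 0
    }

    /// Fallible variant: returns an error instead of panicking.
    pub fn try_search(&self, k: usize) -> Result<Vec<usize>, SymmetricUnsupported> {
        if !self.symmetric_supported() {
            return Err(SymmetricUnsupported { dim: self.dim });
        }
        // Placeholder result: real scoring is omitted from this sketch.
        Ok((0..k).collect())
    }

    /// Fail-loud entry point keeps its signature and delegates.
    pub fn search(&self, k: usize) -> Vec<usize> {
        self.try_search(k).unwrap_or_else(|e| panic!("{}", e))
    }
}
```

Keeping `search()` as a thin wrapper means existing callers see the same panic message, while services that prefer error handling can opt into `try_search()`.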


File Changes

Enhancement (4)
lib.rs Re-export RankQuantCapability in public API +1/-1


• Extends the crate’s public exports to include the new RankQuantCapability enum alongside RankQuant, enabling downstream callers to inspect supported scoring modes.

src/lib.rs


quant.rs Add b=8 RankQuant capability gating and asymmetric-only constructor +328/-57


• Introduces RankQuantCapability and stores it in RankQuant instances. Adds 'new_asymmetric()', capability helpers, and symmetric gating that panics with a pinned message for non-256-aligned b=8. Routes b=8 symmetric scanning through a dedicated scan function and routes b=8 asymmetric scoring through a new kernel dispatch. Rejects persistence for b=8 via 'write()' returning Unsupported and documents the limitation.

src/quant.rs


quant_kernels.rs Implement b=8 scans and AVX-512 gather asymmetric kernel +450/-0


• Adds a b=8 per-coordinate LUT builder, a scalar reference scan ('scan_b8_to_topk'), and a runtime-dispatched asymmetric entry point ('scan_b8_asym'). Implements an AVX-512 'vgatherdps' kernel with explicit invariants and test coverage for parity against the scalar reference, plus an ignored micro-benchmark for kernel-level performance validation.

src/quant_kernels.rs


rank.rs Widen bucketing/packing primitives to support bits=8 +141/-28


• Extends 'rank_to_bucket', 'bucket_ranks', pack/unpack helpers, bytes-per-vec, and norm helpers to accept bits=8, including fixing mask calculation to avoid '1u8<<8' overflow. Adds documentation clarifying that 'dim % 256 == 0' is a symmetric-norm requirement (not code validity), and expands unit tests for b=8 behavior and invariants.

src/rank.rs


Tests (3)
main.rs Register new b=8 integration test module +1/-0


• Adds the 'quant_b8' integration test module to the test suite entrypoint so capability and parity tests run under the index test harness.

tests/index/main.rs


quant_b8.rs Add b=8 capability matrix and parity integration tests +449/-0


• Adds comprehensive integration tests that pin: constructor capability behavior, 'validate_params' semantics, fail-loud symmetric gating message for non-aligned dims, symmetric correctness on 256-aligned dims, and asymmetric parity vs a naive reference (including subset rerank path).

tests/index/quant_b8.rs


redteam_gamma.rs Update redteam guard expectation for bits<=8 +3/-2


• Updates the redteam test commentary and expectation to reflect that bits=8 is now a supported width and the ‘bits too large’ guard begins above 8.

tests/redteam_gamma.rs



@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces support for 8-bit (b=8) bucketed-rank quantization (RankQuant) as a high-precision evidence and refinement surface. It implements capability gating via the new RankQuantCapability enum, allowing asymmetric scoring and code generation at any dimension, while restricting symmetric scoring to dimensions aligned to 256 (dim % 256 == 0). The changes include a new new_asymmetric constructor, updated parameter validation, an optimized AVX-512 gather kernel for asymmetric scans with a portable scalar fallback, and comprehensive integration tests. Since there are no review comments, I have no additional feedback to provide on this pull request.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread src/quant_kernels.rs Outdated
@Fieldnote-Echo

Member Author

/agentic_review

@qodo-code-review

qodo-code-review Bot commented Jun 14, 2026


Code review by qodo was updated up to the latest commit 790f478

Comment thread src/quant.rs Outdated
check_eval_bits capped at 1..=7, rejecting b=8 from rankquant_eval_search — but
b=8 codes fit u8 and the eval norm is computed empirically (valid at any dim,
no dim%256). Widen to 1..=8 (b=9 is the first u8-overflowing width). Test: eval
search with b=8 at a non-256-aligned dim (384).

Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
qodo flagged the b=8 gather (scan_b8_asym_avx512_gather, uses
_mm512_cvtepu8_epi32) dispatching on avx512f alone. VPMOVZXBD is AVX-512F per
Intel, but gating on avx512f+avx512bw matches the rest of the crate's AVX-512
kernels (which require avx512dq), keeps the byte-widening conservatively gated,
and adds no real exclusion (F-without-BW CPUs like KNL/KNM are already excluded
by the dq requirement). Updated both the runtime dispatch and the
#[target_feature], plus the direct test/bench callers' guards.

Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
Two rustdoc/comment sites (scan_b8_asym dispatch note; RankQuant::search_asymmetric
b=8 doc) still described the gate as avx512f-only after 124b5a1 widened it to
avx512f+avx512bw. Docs now match the code.

Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
Remaining stale sites after 44f1f2d: the quant.rs module-level b=8 dispatch note,
and the three SAFETY comments at the b=8 gather's test/bench call sites. All now
say avx512f+avx512bw, matching the dispatch + #[target_feature]. Non-b8 kernels
(bitmap vpopcntdq, b2/b4 dq, fastscan, sign) are unchanged.

Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
qodo: crate-level lib.rs and README still described RankQuant widths as
bits ∈ {1,2,4}. Add the b=8 note (capability-gated evidence/refinement width:
asymmetric + code/projection at any dim; symmetric only when dim % 256 == 0),
so the headline docs match the new surface and don't mislead on b=8 scope.

Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
@Fieldnote-Echo

Member Author

/agentic_review

@qodo-code-review

qodo-code-review Bot commented Jun 14, 2026


Code review by qodo was updated up to the latest commit 9c7923a

Comment thread src/quant.rs
The closed-form `rankquant_norm` (`sqrt(dim * var)`, `var = (2^bits-1... )`)
assumes exactly-uniform bucket occupancy, which only holds for b in {1,2,4}
and for b=8 when `dim % 256 == 0`. At a b=8 dim not divisible by 256 the
buckets are unequally occupied, so the closed form mis-scales the absolute
asymmetric scores. The ranking is unaffected (the norm is one global constant
shared by every document), but `search_asymmetric` / `search_asymmetric_subset`
report cosine-like scores that must be correctly scaled.

Add `asymmetric_norm(dim, bits)`: closed form for the uniform regimes,
exact empirical norm (`rankquant_eval_norm`, summing realised squared bucket
centres) for b=8 at non-256 dims. Wire it into both asymmetric scoring sites.
The symmetric path is untouched (it is gated to dim % 256 == 0, where the
closed form is exact).

Update `ref_b8_asymmetric` to compute the exact per-codes norm so the parity
tests validate against the true cosine at dim=384 (previously both production
and reference shared the same wrong norm, masking the mis-scale).

Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
…m gating

qodo flagged an apparent contradiction: the crate docs state b=8 symmetric
scoring requires dim % 256 == 0, yet `rankquant_eval_search` supports b=8 at
non-256 dims. These are two distinct surfaces and there is no correctness bug —
clarify the docs so the capability matrix reads consistently:

- `rankquant_eval_search` rustdoc: fix the inaccurate 'analytical norm' (it has
  always used the *empirical* norm) and state explicitly that the empirical
  norm is exact under any bucket occupancy, which is why this path is unbound
  by the dim % 256 gate that the analytical-norm `RankQuant::search` carries.
- lib.rs crate doc: scope the dim % 256 restriction to analytical-norm
  symmetric `RankQuant::search`; note the empirical eval path has no such limit.
- check_eval_bits + the eval-at-any-dim test: spell out the relationship to the
  gated symmetric path.

No functional change; doc-only.

Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
Signed-off-by: Nelson Spence <nelson@projectnavi.ai>

# Conflicts:
#	src/lib.rs
#	src/quant.rs
The merge of main (SubsetScratch batched rerank) brought the b=8 asymmetric
routing into `search_asymmetric_subset_batched_serial_into` via the reused
scratch buffers. Add a parity test through the public
`search_asymmetric_subset_batched_serial` entry point covering both a
non-256-aligned dim (384, empirical asymmetric norm) and an aligned dim (768),
with two queries on distinct CSR candidate rows so scratch reuse across rows is
exercised. Every returned score matches the naive per-doc reference.

Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
@codecov

codecov Bot commented Jun 14, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.


@project-navi-bot Navi Bot (project-navi-bot) merged commit 57787d9 into main Jun 15, 2026
38 checks passed
@project-navi-bot Navi Bot (project-navi-bot) deleted the feat/rankquant-b8 branch June 15, 2026 00:22

Development

Successfully merging this pull request may close these issues.

Decide b=8 RankQuant evidence surface for OrdGraph/Ordscope use
