feat: capability-gated b=8 RankQuant (research/evidence width) (#221)#228
Conversation
b=8 is an evidence/refinement-oriented RankQuant width (asymmetric quant after
repair flows, edge-case rerank healing) — a stable, documented surface, NOT
experimental-unstable. Capability matrix:
- bucket-code generation / pair-evidence / asymmetric scoring: ANY dim
- symmetric scoring + analytical norm: ONLY dim % 256 == 0
API (additive, no breaking changes to existing signatures):
- RankQuantCapability { AsymmetricOnly, SymmetricAndAsymmetric } + capability() +
symmetric_supported().
- RankQuant::new(dim, 8) requires dim % 256 == 0 (full capability; fail-loud else,
directing to new_asymmetric). RankQuant::new_asymmetric(dim, 8): any dim,
AsymmetricOnly (auto-upgrades to full when 256-aligned). b=1/2/4 unchanged.
- search() on an AsymmetricOnly instance fails loud with the exact message
'RankQuant b=8 symmetric scoring requires dim % 256 == 0; dim={dim} supports
asymmetric/evidence APIs only.' (documented; check symmetric_supported() first).
- validate_params(dim, 8): code-validity any dim (no dim%256). Primitives widened
to bits<=8 (mask in u16 to avoid 1u8<<8 overflow); b=8 packs 1 byte/coord.
Kernel: b=8 asymmetric uses an AVX-512 vgatherdps kernel (dim*256 LUT gather),
runtime-dispatched (avx512f + dim%16==0, explicit tail handling), scalar LUT
fallback; ~1.23x over scalar on this host (gather/LUT-latency bound — honest,
shipped). Parity vs scalar within 1e-4 across dims 384/400/768/1024/1536.
Persistence: b=8 write() returns io::Error(Unsupported) (no file) — the .tvrq
loader admits {1,2,4} only; b=8 is in-memory this phase (FLAGGED, see PR).
Verified: fmt/clippy(-D warnings, default+experimental)/test(196 default + 206
experimental + no-default-features)/MSRV 1.89 green; b=1/2/4 unchanged; unsafe
gather bounds proven. Closes #221.
Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
Code Review by Qodo
Context used 1.
|
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
PR Summary by QodoAdd capability-gated b=8 RankQuant with asymmetric gather kernel and tests WalkthroughsDescription• Add b=8 RankQuant with capability gating for symmetric vs asymmetric scoring. • Implement AVX-512 gather-based b=8 asymmetric scan with scalar fallback. • Add integration/unit tests that pin capability rules and scoring parity. Diagramgraph TD
A["Public API (lib.rs)"] --> B["RankQuant (quant.rs)"] --> C["Rank utils (rank.rs)"]
B --> D["Kernels (quant_kernels.rs)"]
B --> E["TVRQ IO (rank_io)"]
F["b=8 tests (tests/index/quant_b8.rs)"] --> B
subgraph Legend
direction LR
_api["API Module"] ~~~ _core["Core Logic"] ~~~ _kern["SIMD/Scalar Kernels"]
end
High-Level AssessmentThe following are alternative approaches to this PR: 1. Make symmetric search return Result for unsupported capability
2. Split types: RankQuantAsymmetricOnly vs RankQuantFull
3. Version/extend the .tvrq format to include b=8 immediately
Recommendation: The capability-gated design is a reasonable way to add b=8 without weakening the correctness contract of symmetric scoring. If the project strongly prefers fail-loud invariants, keeping the panic on unsupported symmetric search is consistent; otherwise consider adding a non-breaking File ChangesEnhancement (4)
Tests (3)
|
There was a problem hiding this comment.
Code Review
This pull request introduces support for 8-bit (b=8) bucketed-rank quantization (RankQuant) as a high-precision evidence and refinement surface. It implements capability gating via the new RankQuantCapability enum, allowing asymmetric scoring and code generation at any dimension, while restricting symmetric scoring to dimensions aligned to 256 (dim % 256 == 0). The changes include a new new_asymmetric constructor, updated parameter validation, an optimized AVX-512 gather kernel for asymmetric scans with a portable scalar fallback, and comprehensive integration tests. Since there are no review comments, I have no additional feedback to provide on this pull request.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
|
/agentic_review |
|
Code review by qodo was updated up to the latest commit 790f478 |
check_eval_bits capped at 1..=7, rejecting b=8 from rankquant_eval_search — but b=8 codes fit u8 and the eval norm is computed empirically (valid at any dim, no dim%256). Widen to 1..=8 (b=9 is the first u8-overflowing width). Test: eval search with b=8 at a non-256-aligned dim (384). Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
qodo flagged the b=8 gather (scan_b8_asym_avx512_gather, uses _mm512_cvtepu8_epi32) dispatching on avx512f alone. VPMOVZXBD is AVX-512F per Intel, but gating on avx512f+avx512bw matches the rest of the crate's AVX-512 kernels (which require avx512dq), keeps the byte-widening conservatively gated, and adds no real exclusion (F-without-BW CPUs like KNL/KNM are already excluded by the dq requirement). Updated both the runtime dispatch and the Signed-off-by: Nelson Spence <nelson@projectnavi.ai> #[target_feature], plus the direct test/bench callers' guards.
Two rustdoc/comment sites (scan_b8_asym dispatch note; RankQuant::search_asymmetric b=8 doc) still described the gate as avx512f-only after 124b5a1 widened it to avx512f+avx512bw. Docs now match the code. Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
Remaining stale sites after 44f1f2d: the quant.rs module-level b=8 dispatch note, and the three SAFETY comments at the b=8 gather's test/bench call sites. All now say avx512f+avx512bw, matching the dispatch + #[target_feature]. Non-b8 kernels (bitmap vpopcntdq, b2/b4 dq, fastscan, sign) are unchanged. Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
qodo: crate-level lib.rs and README still described RankQuant widths as
bits ∈ {1,2,4}. Add the b=8 note (capability-gated evidence/refinement width:
asymmetric + code/projection at any dim; symmetric only when dim % 256 == 0),
so the headline docs match the new surface and don't mislead on b=8 scope.
Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
|
/agentic_review |
|
Code review by qodo was updated up to the latest commit 9c7923a |
The closed-form `rankquant_norm` (`sqrt(dim * var)`, `var = (2^bits-1... )`)
assumes exactly-uniform bucket occupancy, which only holds for b in {1,2,4}
and for b=8 when `dim % 256 == 0`. At a b=8 dim not divisible by 256 the
buckets are unequally occupied, so the closed form mis-scales the absolute
asymmetric scores. The ranking is unaffected (the norm is one global constant
shared by every document), but `search_asymmetric` / `search_asymmetric_subset`
report cosine-like scores that must be correctly scaled.
Add `asymmetric_norm(dim, bits)`: closed form for the uniform regimes,
exact empirical norm (`rankquant_eval_norm`, summing realised squared bucket
centres) for b=8 at non-256 dims. Wire it into both asymmetric scoring sites.
The symmetric path is untouched (it is gated to dim % 256 == 0, where the
closed form is exact).
Update `ref_b8_asymmetric` to compute the exact per-codes norm so the parity
tests validate against the true cosine at dim=384 (previously both production
and reference shared the same wrong norm, masking the mis-scale).
Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
…m gating qodo flagged an apparent contradiction: the crate docs state b=8 symmetric scoring requires dim % 256 == 0, yet `rankquant_eval_search` supports b=8 at non-256 dims. These are two distinct surfaces and there is no correctness bug — clarify the docs so the capability matrix reads consistently: - `rankquant_eval_search` rustdoc: fix the inaccurate 'analytical norm' (it has always used the *empirical* norm) and state explicitly that the empirical norm is exact under any bucket occupancy, which is why this path is unbound by the dim % 256 gate that the analytical-norm `RankQuant::search` carries. - lib.rs crate doc: scope the dim % 256 restriction to analytical-norm symmetric `RankQuant::search`; note the empirical eval path has no such limit. - check_eval_bits + the eval-at-any-dim test: spell out the relationship to the gated symmetric path. No functional change; doc-only. Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
Signed-off-by: Nelson Spence <nelson@projectnavi.ai> # Conflicts: # src/lib.rs # src/quant.rs
The merge of main (SubsetScratch batched rerank) brought the b=8 asymmetric routing into `search_asymmetric_subset_batched_serial_into` via the reused scratch buffers. Add a parity test through the public `search_asymmetric_subset_batched_serial` entry point covering both a non-256-aligned dim (384, empirical asymmetric norm) and an aligned dim (768), with two queries on distinct CSR candidate rows so scratch reuse across rows is exercised. Every returned score matches the naive per-doc reference. Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
Closes #221.
What
Adds b=8 as a capability-gated RankQuant width — a stable, documented evidence/refinement surface (asymmetric quant after repair flows, edge-case rerank healing), not experimental-unstable, and not a broad retrieval-quant method.
Capability matrix (the constant-composition rule
dim % 2^bitsexists only for the symmetric analytical norm):dim % 256 == 0b=1/2/4are unchanged — they stay the stable headline retrieval surface.API (additive — no breaking changes)
RankQuantCapability { AsymmetricOnly, SymmetricAndAsymmetric }+capability()+symmetric_supported().RankQuant::new(dim, 8)requiresdim % 256 == 0(full capability; fail-loud otherwise, directing tonew_asymmetric).RankQuant::new_asymmetric(dim, 8)— any dim,AsymmetricOnly(auto-upgrades when 256-aligned).search()on anAsymmetricOnlyinstance fails loud: "RankQuant b=8 symmetric scoring requires dim % 256 == 0; dim={dim} supports asymmetric/evidence APIs only." (documented; querysymmetric_supported()first).validate_params(dim, 8)= code-validity at any dim.Kernel
b=8 asymmetric = a
dim*256float-LUT gather. AVX-512vgatherdpskernel (runtime-dispatched, explicit tail handling), scalar LUT fallback. ~1.23× over scalar on an AVX-512 host — honest (gather/LUT-latency bound, not the ~10× of b=2/4 which touch no LUT); shipped because it reproducibly wins. Parity vs scalar within the crate's1e-4tolerance across dims 384/400/768/1024/1536.Verified (locally, my own runs)
fmt / clippy
-D warnings(default + experimental) / test (196 default + 206 experimental + no-default-features) / MSRV 1.89 — all green. b=1/2/4 behaviour and every prior test unchanged. Unsafe gather bounds proven by hand. Test matrix covers dim=384 (asym pass, symmetric rejects with exact message), 768/1024/1536 (full), b=4 dim=384 unchanged, capability reporting, brute-force asymmetric parity.RankQuant::write()returnsio::Error(Unsupported)and writes no file for a b=8 index (the.tvrqloader admits{1,2,4}only; refusing avoids a silent broken round-trip). b=8 is in-memory only this phase. Your stated use case is "asymmetric quant storage" — if that means persisting b=8 to disk, say so and I'll add b=8 to the.tvrqformat (trivial: 1 byte/coord) as a follow-up or in this PR.Result) — matches the crate's fail-loud guard style and avoids a breakingsearchsignature; thenew_asymmetricname +symmetric_supported()make it non-surprising. Confirm.{1,2,4}guard (b=8 is Rust-core-only; Python parity is the separate Add Python/CLI parity utilities for ordinal pair evidence #223).Coupling
Once this lands, #226 (
RankQuantSpec, #220) gets a small follow-up to accept b=8 at any dim + reportsymmetric_supported().Targets 0.6 (may land in 0.5.0 — maintainer decides).