[codex] Reuse scalar subset rerank LUT scratch by Fieldnote-Echo · Pull Request #250 · Project-Navi/ordvec

Fieldnote-Echo · 2026-06-19T15:57:29Z

Summary

add scratch-owned scalar LUT storage for caller-owned subset rerank
route scalar and b=8 subset rerank through reusable LUT buffers
remove the scalar fallback carve-out from the allocation-free rerank contract

Closes #235.

Validation

cargo fmt --check
cargo test --features test-utils --test alloc_free -- --nocapture
cargo test --features test-utils --test index batched_subset_rerank_matches_scalar_reference_across_tiers -- --nocapture
cargo test --features test-utils --test index batched_into_is_allocation_free_after_warmup -- --nocapture
cargo test --test index quant_b8 -- --nocapture
cargo check
git diff --check

codecov · 2026-06-19T15:58:32Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

gemini-code-assist

Code Review

This pull request makes the scalar fallback path for subset reranking allocation-free by reusing a pre-allocated LUT buffer in SubsetScratch, matching the behavior of the SIMD path. Documentation and tests are updated to reflect that both paths are now allocation-free after warmup. The review feedback suggests using chunks_exact_mut in build_asym_lut_into and build_b8_asym_lut_into to avoid manual indexing and eliminate bounds checks, making the code more idiomatic and potentially more performant.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Signed-off-by: Nelson Spence <nelson@projectnavi.ai>

chatgpt-codex-connector · 2026-06-19T16:07:00Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

qodo-code-review · 2026-06-19T16:07:46Z

PR Summary by Qodo

Reuse scalar subset rerank LUT scratch to make rerank allocation-free
✨ Enhancement 📝 Documentation 🧪 Tests 🕐 40+ Minutes

Description

• Add scratch-owned scalar LUT storage to reuse buffers across subset rerank calls.
• Route scalar and b=8 asymmetric rerank through caller-provided LUT buffers.
• Update allocation-free rerank contract and tests to remove scalar fallback carve-out.

Diagram

graph TD
  A["Caller"] --> B["RankQuant _into"] --> C["SubsetScratch"] --> D["scalar_lut Vec<f32>"]
  B --> E["b=8 rerank"] --> F["scan_b8_asym_with_lut"] --> G["TopK results"]
  B --> H["scalar rerank"] --> I["scan_via_lut_scalar_with_lut"] --> G

  subgraph Legend
    direction LR
    _api["API/entrypoint"] ~~~ _buf["Reusable buffer"] ~~~ _ker["Kernel"]
  end

High-Level Assessment

The following are alternative approaches to this PR:

1. Cache LUT inside RankQuant instance

➕ No need to expand SubsetScratch API surface
➕ Simplifies scratch ownership for single-threaded callers
➖ Harder to use safely with external parallelism (would need per-thread/per-worker storage)
➖ Blurs separation between immutable index state and per-call scratch buffers

2. Use thread-local LUT cache

➕ Avoids passing/owning LUT in scratch explicitly
➕ Works well when caller uses OS threads with stable lifetimes
➖ Less predictable memory behavior for embedders (TLS lifetime and teardown)
➖ Complicates testing and allocation-free guarantees across runtimes (e.g., WASM, custom thread pools)

3. Specialize scalar-only scratch type

➕ Keeps scalar concerns out of the main SubsetScratch used by SIMD paths
➖ Adds API and type complexity (more generics/branching in call sites)
➖ Callers would need additional dispatch logic to choose scratch type

Recommendation: Current approach (adding a reusable LUT buffer to SubsetScratch and threading it through scalar and b=8 scans) is the best fit for the stated contract: allocation-free rerank after warmup in caller-managed parallel runtimes. It keeps ownership explicit, works cross-platform (including scalar-only targets), and avoids implicit TLS or shared mutable state concerns.

Files changed (5) +88 / -48

Enhancement (1) +14 / -8

quant.rsAdd scalar LUT storage to SubsetScratch and route rerank through LUT-reuse kernels +14/-8

Add scalar LUT storage to SubsetScratch and route rerank through LUT-reuse kernels

• Extends SubsetScratch with a 'scalar_lut: Vec<f32>' buffer and exposes its capacity in the test helper. Updates subset rerank dispatch to call the new '*_with_lut' kernels for b=8 and scalar paths, and updates doc comments to reflect allocation-free scalar rerank after warmup.

src/quant.rs

Refactor (1) +64 / -16

quant_kernels.rsIntroduce LUT-into builders and with-LUT scan entrypoints for scalar and b=8 +64/-16

Introduce LUT-into builders and with-LUT scan entrypoints for scalar and b=8

• Refactors LUT construction to support caller-provided buffers ('build_asym_lut_into', 'build_b8_asym_lut_into') and adds 'scan_via_lut_scalar_with_lut' and 'scan_b8_asym_with_lut' to reuse capacity after warmup. Keeps the original entrypoints as thin wrappers allocating a fresh LUT for convenience, and updates AVX-512 gather tests to use the new LUT builder.

src/quant_kernels.rs

Tests (2) +4 / -16

alloc_free.rsRemove SIMD-only gating from allocation-free rerank test +0/-15

Remove SIMD-only gating from allocation-free rerank test

• Deletes the conditional skip that previously exempted scalar fallback from the strict zero-allocation assertion. The test now enforces the allocation-free contract uniformly after warmup.

tests/alloc_free.rs

quant_b8.rsMake panic payload extraction explicit and robust +4/-1

Make panic payload extraction explicit and robust

• Replaces 'res.err().expect(...)' with an explicit 'match' to make the test logic clearer and avoid relying on chained unwraps when extracting the panic payload string.

tests/index/quant_b8.rs

Documentation (1) +6 / -8

README.mdUpdate allocation-free rerank contract to include scalar path +6/-8

Update allocation-free rerank contract to include scalar path

• Removes documentation statements claiming scalar fallback allocates a per-query LUT. Clarifies that '_into' rerank is allocation-free after warmup for both SIMD and scalar by reusing SubsetScratch and output buffers.

README.md

qodo-code-review · 2026-06-19T16:11:04Z

Code Review by Qodo

🐞 Bugs (1) 📘 Rule violations (0) 📎 Requirement gaps (0) 📜 Skill insights (0)

Context used

✅ Tickets: 🎫 perf(runtime): reuse scalar subset-rerank LUTs through SubsetScratch

1. capacities_for_test() return changed ✓ Resolved 📎 Requirement gap ⚙ Maintainability

Description

SubsetScratch::capacities_for_test changes its public return type from a 4-tuple to a 5-tuple,
which is a breaking API change for any downstream code using this (even if #[doc(hidden)]). This
violates the requirement to keep public APIs unchanged for callers.

Code

src/quant.rs[R87-94]

+    pub fn capacities_for_test(&self) -> (usize, usize, usize, usize, usize) {
        (
            self.q_unit.capacity(),
            self.sub_packed.capacity(),
+            self.scalar_lut.capacity(),
            self.local_indices.capacity(),
            self.final_order.capacity(),
        )

Relevance

⭐⭐⭐ High
Team previously accepted limiting doc(hidden) test APIs to avoid semver commitments; breaking pub
return likely blocked (PR215).
PR-#215

ⓘ Recommendations generated based on similar findings in past PRs

Evidence

PR Compliance ID 2 forbids breaking public API changes. The diff shows capacities_for_test is
pub and its return type is modified to add an additional element for scalar_lut capacity.

Public APIs and external behavior remain unchanged
src/quant.rs[82-95]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`SubsetScratch::capacities_for_test` is `pub` and its return type changed from `(usize, usize, usize, usize)` to `(usize, usize, usize, usize, usize)`, which is a breaking change.

## Issue Context
Although it is marked `#[doc(hidden)]`, it is still part of the crate's public API surface and can be used by downstream crates (including those enabling `test-utils`). The compliance checklist requires no breaking/user-visible API changes.

## Fix Focus Areas
- src/quant.rs[82-95]

## Suggested approach
- Keep the existing `capacities_for_test()` signature unchanged (return the original 4-tuple), and add a new method (e.g. `capacities_for_test_v2()` or `capacities_for_test_with_scalar_lut()`) that returns the 5-tuple including `scalar_lut` capacity.
- Update internal/integration tests to call the new method while preserving backward compatibility.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

2. ~~b8 LUT shape unchecked~~ ✓ Resolved 🐞 Bug ☼ Reliability

Description

scan_b8_asym_with_lut builds the LUT sized from q_unit.len() but then uses the separate dim
parameter to drive both the scalar scan and the AVX-512 gather path. If a future internal caller
passes mismatched dim/q_unit, the AVX-512 gather kernel can read past the LUT in release builds
because it only debug_asserts the lut.len() == dim * 256 invariant.

Code

src/quant_kernels.rs[R604-616]

+pub(crate) fn scan_b8_asym_with_lut(
+    packed: &[u8],
+    n: usize,
+    dim: usize,
+    q_unit: &[f32],
+    scale: f32,
+    top: &mut TopK,
+    lut: &mut Vec<f32>,
+) {
+    build_b8_asym_lut_into(lut, q_unit);
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx512f")

Relevance

⭐⭐⭐ High
Team previously accepted replacing debug-only preconditions to prevent release-build OOB/UB in SIMD
paths (PR #19, #228).
PR-#19
PR-#228

ⓘ Recommendations generated based on similar findings in past PRs

Evidence

The new reusable-LUT entry point builds the LUT from q_unit length, not dim, and then calls the
AVX-512 gather kernel which relies on lut.len() == dim * 256 but only checks it with
debug_assert_eq! before using lut.as_ptr() for gathers.

src/quant_kernels.rs[604-633]
src/quant_kernels.rs[169-177]
src/quant_kernels.rs[666-699]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`scan_b8_asym_with_lut` does not validate that `q_unit.len()` matches the `dim` parameter, even though it forwards `dim` to the scan routines and the AVX-512 gather kernel’s safety relies on `lut.len() == dim * 256`.

## Issue Context
- The LUT builder sizes from `q_unit.len()`.
- The AVX-512 gather kernel uses `lut.as_ptr()` and only `debug_assert`s the length invariant, so a mismatch could become an OOB read in release.

## Fix Focus Areas
- Add a hard invariant check early in `scan_b8_asym_with_lut`, e.g. `assert_eq!(q_unit.len(), dim, "...")`, before building/using the LUT.
- (Optional) After building, you may also `debug_assert_eq!(lut.len(), dim * 256)` for clarity.

- src/quant_kernels.rs[604-633]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

3. ~~Weak LUT invariant check~~ ✓ Resolved 🐞 Bug ≡ Correctness

Description

build_asym_lut_into now only debug_asserts q_unit.len() == dim and then uses zip, so in
release builds a length mismatch will silently leave part of the LUT zeroed (or ignore extra
q_unit), producing incorrect scores without failing fast. This turns a would-be panic into a
silent mis-scoring failure mode that is harder to detect during future refactors.

Code

src/quant_kernels.rs[R46-50]

+    debug_assert_eq!(q_unit.len(), dim);
+    lut.resize(dim * n_buckets, 0.0);
+    for (&qd, row) in q_unit.iter().zip(lut.chunks_exact_mut(n_buckets)) {
+        for (b, slot) in row.iter_mut().enumerate() {
+            *slot = qd * bucket_centre(b as u8, bits);

Relevance

⭐⭐⭐ High
Repo favors fail-loud release checks over debug_assert for invariants (e.g., made length
preconditions unconditional in PR #19).
PR-#19
PR-#26

ⓘ Recommendations generated based on similar findings in past PRs

Evidence

The code shows a debug_assert (compiled out in release) and a zip() iteration, which will not
panic on length mismatch and will instead stop early, leaving trailing LUT entries at their resized
default 0.0 values.

src/quant_kernels.rs[39-53]
src/quant_kernels.rs[55-75]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`build_asym_lut_into()` relies on `debug_assert_eq!(q_unit.len(), dim)` and then iterates via `q_unit.iter().zip(lut.chunks_exact_mut(n_buckets))`. In release builds, `debug_assert` is disabled and `zip()` truncates to the shorter iterator, so a mismatched `(dim, q_unit.len())` can silently produce a partially-zero LUT and wrong scores.

## Issue Context
This function takes both `dim` and `q_unit` as independent inputs, so a future call site or refactor could desynchronize them. Previously, indexing-style loops would naturally panic on a short slice; the new iterator form removes that fail-fast behavior in release.

## Fix Focus Areas
- src/quant_kernels.rs[39-53]

## Suggested fix
- Replace `debug_assert_eq!(q_unit.len(), dim)` with `assert_eq!(q_unit.len(), dim)` (or otherwise return an error if you prefer non-panicking internal APIs).
- Alternatively, remove the `dim` parameter and derive `dim` from `q_unit.len()` at the boundary, so the function cannot be called with inconsistent shapes.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

4. Test probe leaks into API 🐞 Bug ⚙ Maintainability

Description

SubsetScratch::capacities_with_scalar_lut_for_test is a new pub method compiled into normal
builds (only #[doc(hidden)]), expanding the public API surface with a test-only diagnostic. This
increases the chance downstream code accidentally relies on it and constrains future refactors of
scratch internals.

Code

src/quant.rs[R95-98]

+
+    /// Test-only capacity probe that includes the scalar LUT scratch buffer.
+    #[doc(hidden)]
+    pub fn capacities_with_scalar_lut_for_test(&self) -> (usize, usize, usize, usize, usize) {

Relevance

⭐⭐⭐ High
Team previously accepted feature-gating doc-hidden diagnostic APIs to avoid public surface (PR
#215).
PR-#215

ⓘ Recommendations generated based on similar findings in past PRs

Evidence

The new capacity probe is declared pub without a feature gate, while a similar test-only dispatch
probe in the same file is explicitly gated behind feature = "test-utils", indicating the intended
pattern for keeping diagnostics out of the stable surface.

src/quant.rs[70-106]
src/quant.rs[329-354]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`capacities_with_scalar_lut_for_test()` is a test-only diagnostic but is currently a `pub` method available in all builds (it is only hidden from docs). This unnecessarily expands the public API surface.

## Issue Context
The crate already uses `#[cfg(feature = "test-utils")]` to keep other test/diagnostic surfaces (e.g. `subset_rerank_uses_simd`) out of normal builds.

## Fix Focus Areas
- src/quant.rs[96-106]

## Suggested fix
- Add `#[cfg(feature = "test-utils")]` to `capacities_with_scalar_lut_for_test` (and likely to `capacities_for_test` for consistency), and ensure integration tests that call it are run with `--features test-utils` (as they appear to be already).
- If you want the probe callable without features, consider moving it behind a dedicated, clearly non-stable module/trait to reduce accidental downstream use.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

5. ~~Scalar path not forced in tests~~ ✓ Resolved 📎 Requirement gap ☼ Reliability

Description

The allocation-free test change removes conditional skipping but still does not force the scalar
rerank path, so CI on SIMD-capable x86 hosts may never exercise scalar fallback allocation behavior.
This fails the requirement to isolate/force scalar allocation testing independent of host CPU SIMD
capabilities.

Code

tests/alloc_free.rs[L61-74]

-    // The zero-allocation guarantee holds only when the rerank takes a SIMD
-    // kernel: the scalar LUT fallback (`scan_via_lut_scalar`) allocates a
-    // per-query LUT. Gate on the SAME dispatch decision the rerank reads — via
-    // `subset_rerank_uses_simd`, so the gate cannot drift from the actual
-    // dispatch — and skip the strict check on hosts that fall to scalar
-    // (aarch64, or x86 without AVX2+FMA / AVX-512).
-    if !ordvec::subset_rerank_uses_simd(dim, bits) {
-        eprintln!(
-            "alloc_free: rerank uses the scalar LUT fallback for \
-             (dim={dim}, bits={bits}) — it allocates a per-query LUT; \
-             skipping strict zero-alloc check"
-        );
-        return;
-    }

Relevance

⭐ Low
Alloc-free test previously gated on SIMD because scalar allocated; contract changed to alloc-free
scalar too, so no need to force scalar (PR215).
PR-#215

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
PR Compliance ID 4 requires tests to force or isolate scalar fallback allocation behavior
independent of host SIMD. The diff shows the test’s SIMD-dispatch-dependent skip logic was removed,
but no mechanism was added to force scalar fallback execution on SIMD-capable hosts.
Tests isolate and cover scalar allocation behavior independent of host CPU SIMD capabilities
tests/alloc_free.rs[53-105]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The test suite does not reliably force the scalar fallback rerank path; allocation-free assertions can still run exclusively on SIMD paths depending on the host CPU.

## Issue Context
The compliance requirement is to validate scalar fallback allocation behavior in an automated, host-independent way. Simply removing the skip/gate does not guarantee scalar coverage on AVX2/AVX-512 machines.

## Fix Focus Areas
- tests/alloc_free.rs[53-105]

## Suggested approach
- Choose a `(dim, bits)` that is constructor-valid but intentionally routes to scalar on all x86 tiers (e.g. `bits=2` with `dim` divisible by 4 but NOT by 16, such as `dim=132`).
- Add an assertion like `assert!(!ordvec::subset_rerank_uses_simd(dim, bits));` (behind `feature = "test-utils"`) to ensure the test is truly exercising scalar fallback.
- Keep the warmup + allocation-counter window assertion as-is to validate steady-state zero allocations on the scalar path.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Previous review results

Review updated until commit 5fd5c7e

Results up to commit a909ad5

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0) 🎨 UX issues (0) 🔗 Cross-repo conflicts (0) 📜 Skill insights (0)

1. capacities_for_test() return changed ✓ Resolved 📎 Requirement gap ⚙ Maintainability

Description

SubsetScratch::capacities_for_test changes its public return type from a 4-tuple to a 5-tuple,
which is a breaking API change for any downstream code using this (even if #[doc(hidden)]). This
violates the requirement to keep public APIs unchanged for callers.

Code

src/quant.rs[R87-94]

+    pub fn capacities_for_test(&self) -> (usize, usize, usize, usize, usize) {
        (
            self.q_unit.capacity(),
            self.sub_packed.capacity(),
+            self.scalar_lut.capacity(),
            self.local_indices.capacity(),
            self.final_order.capacity(),
        )

Relevance

⭐⭐⭐ High
Team previously accepted limiting doc(hidden) test APIs to avoid semver commitments; breaking pub
return likely blocked (PR215).
PR-#215

ⓘ Recommendations generated based on similar findings in past PRs

Evidence

PR Compliance ID 2 forbids breaking public API changes. The diff shows capacities_for_test is
pub and its return type is modified to add an additional element for scalar_lut capacity.

Public APIs and external behavior remain unchanged
src/quant.rs[82-95]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`SubsetScratch::capacities_for_test` is `pub` and its return type changed from `(usize, usize, usize, usize)` to `(usize, usize, usize, usize, usize)`, which is a breaking change.

## Issue Context
Although it is marked `#[doc(hidden)]`, it is still part of the crate's public API surface and can be used by downstream crates (including those enabling `test-utils`). The compliance checklist requires no breaking/user-visible API changes.

## Fix Focus Areas
- src/quant.rs[82-95]

## Suggested approach
- Keep the existing `capacities_for_test()` signature unchanged (return the original 4-tuple), and add a new method (e.g. `capacities_for_test_v2()` or `capacities_for_test_with_scalar_lut()`) that returns the 5-tuple including `scalar_lut` capacity.
- Update internal/integration tests to call the new method while preserving backward compatibility.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

2. ~~Scalar path not forced in tests~~ ✓ Resolved 📎 Requirement gap ☼ Reliability

Description

The allocation-free test change removes conditional skipping but still does not force the scalar
rerank path, so CI on SIMD-capable x86 hosts may never exercise scalar fallback allocation behavior.
This fails the requirement to isolate/force scalar allocation testing independent of host CPU SIMD
capabilities.

Code

tests/alloc_free.rs[L61-74]

-    // The zero-allocation guarantee holds only when the rerank takes a SIMD
-    // kernel: the scalar LUT fallback (`scan_via_lut_scalar`) allocates a
-    // per-query LUT. Gate on the SAME dispatch decision the rerank reads — via
-    // `subset_rerank_uses_simd`, so the gate cannot drift from the actual
-    // dispatch — and skip the strict check on hosts that fall to scalar
-    // (aarch64, or x86 without AVX2+FMA / AVX-512).
-    if !ordvec::subset_rerank_uses_simd(dim, bits) {
-        eprintln!(
-            "alloc_free: rerank uses the scalar LUT fallback for \
-             (dim={dim}, bits={bits}) — it allocates a per-query LUT; \
-             skipping strict zero-alloc check"
-        );
-        return;
-    }

Relevance

⭐ Low
Alloc-free test previously gated on SIMD because scalar allocated; contract changed to alloc-free
scalar too, so no need to force scalar (PR215).
PR-#215

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
PR Compliance ID 4 requires tests to force or isolate scalar fallback allocation behavior
independent of host SIMD. The diff shows the test’s SIMD-dispatch-dependent skip logic was removed,
but no mechanism was added to force scalar fallback execution on SIMD-capable hosts.
Tests isolate and cover scalar allocation behavior independent of host CPU SIMD capabilities
tests/alloc_free.rs[53-105]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The test suite does not reliably force the scalar fallback rerank path; allocation-free assertions can still run exclusively on SIMD paths depending on the host CPU.

## Issue Context
The compliance requirement is to validate scalar fallback allocation behavior in an automated, host-independent way. Simply removing the skip/gate does not guarantee scalar coverage on AVX2/AVX-512 machines.

## Fix Focus Areas
- tests/alloc_free.rs[53-105]

## Suggested approach
- Choose a `(dim, bits)` that is constructor-valid but intentionally routes to scalar on all x86 tiers (e.g. `bits=2` with `dim` divisible by 4 but NOT by 16, such as `dim=132`).
- Add an assertion like `assert!(!ordvec::subset_rerank_uses_simd(dim, bits));` (behind `feature = "test-utils"`) to ensure the test is truly exercising scalar fallback.
- Keep the warmup + allocation-counter window assertion as-is to validate steady-state zero allocations on the scalar path.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Results up to commit 020882f

🐞 Bugs (1) 📘 Rule violations (0) 📎 Requirement gaps (0) 🎨 UX issues (0) 🔗 Cross-repo conflicts (0) 📜 Skill insights (0)

1. ~~Weak LUT invariant check~~ ✓ Resolved 🐞 Bug ≡ Correctness

Description

build_asym_lut_into now only debug_asserts q_unit.len() == dim and then uses zip, so in
release builds a length mismatch will silently leave part of the LUT zeroed (or ignore extra
q_unit), producing incorrect scores without failing fast. This turns a would-be panic into a
silent mis-scoring failure mode that is harder to detect during future refactors.

Code

src/quant_kernels.rs[R46-50]

+    debug_assert_eq!(q_unit.len(), dim);
+    lut.resize(dim * n_buckets, 0.0);
+    for (&qd, row) in q_unit.iter().zip(lut.chunks_exact_mut(n_buckets)) {
+        for (b, slot) in row.iter_mut().enumerate() {
+            *slot = qd * bucket_centre(b as u8, bits);

Relevance

⭐⭐⭐ High
Repo favors fail-loud release checks over debug_assert for invariants (e.g., made length
preconditions unconditional in PR #19).
PR-#19
PR-#26

ⓘ Recommendations generated based on similar findings in past PRs

Evidence

The code shows a debug_assert (compiled out in release) and a zip() iteration, which will not
panic on length mismatch and will instead stop early, leaving trailing LUT entries at their resized
default 0.0 values.

src/quant_kernels.rs[39-53]
src/quant_kernels.rs[55-75]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`build_asym_lut_into()` relies on `debug_assert_eq!(q_unit.len(), dim)` and then iterates via `q_unit.iter().zip(lut.chunks_exact_mut(n_buckets))`. In release builds, `debug_assert` is disabled and `zip()` truncates to the shorter iterator, so a mismatched `(dim, q_unit.len())` can silently produce a partially-zero LUT and wrong scores.

## Issue Context
This function takes both `dim` and `q_unit` as independent inputs, so a future call site or refactor could desynchronize them. Previously, indexing-style loops would naturally panic on a short slice; the new iterator form removes that fail-fast behavior in release.

## Fix Focus Areas
- src/quant_kernels.rs[39-53]

## Suggested fix
- Replace `debug_assert_eq!(q_unit.len(), dim)` with `assert_eq!(q_unit.len(), dim)` (or otherwise return an error if you prefer non-panicking internal APIs).
- Alternatively, remove the `dim` parameter and derive `dim` from `q_unit.len()` at the boundary, so the function cannot be called with inconsistent shapes.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

2. Test probe leaks into API 🐞 Bug ⚙ Maintainability

Description

SubsetScratch::capacities_with_scalar_lut_for_test is a new pub method compiled into normal
builds (only #[doc(hidden)]), expanding the public API surface with a test-only diagnostic. This
increases the chance downstream code accidentally relies on it and constrains future refactors of
scratch internals.

Code

src/quant.rs[R95-98]

+
+    /// Test-only capacity probe that includes the scalar LUT scratch buffer.
+    #[doc(hidden)]
+    pub fn capacities_with_scalar_lut_for_test(&self) -> (usize, usize, usize, usize, usize) {

Relevance

⭐⭐⭐ High
Team previously accepted feature-gating doc-hidden diagnostic APIs to avoid public surface (PR
#215).
PR-#215

ⓘ Recommendations generated based on similar findings in past PRs

Evidence

The new capacity probe is declared pub without a feature gate, while a similar test-only dispatch
probe in the same file is explicitly gated behind feature = "test-utils", indicating the intended
pattern for keeping diagnostics out of the stable surface.

src/quant.rs[70-106]
src/quant.rs[329-354]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`capacities_with_scalar_lut_for_test()` is a test-only diagnostic but is currently a `pub` method available in all builds (it is only hidden from docs). This unnecessarily expands the public API surface.

## Issue Context
The crate already uses `#[cfg(feature = "test-utils")]` to keep other test/diagnostic surfaces (e.g. `subset_rerank_uses_simd`) out of normal builds.

## Fix Focus Areas
- src/quant.rs[96-106]

## Suggested fix
- Add `#[cfg(feature = "test-utils")]` to `capacities_with_scalar_lut_for_test` (and likely to `capacities_for_test` for consistency), and ensure integration tests that call it are run with `--features test-utils` (as they appear to be already).
- If you want the probe callable without features, consider moving it behind a dedicated, clearly non-stable module/trait to reduce accidental downstream use.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Results up to commit 7d492ea

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0) 🎨 UX issues (0) 🔗 Cross-repo conflicts (0) 📜 Skill insights (0)

1. ~~b8 LUT shape unchecked~~ ✓ Resolved 🐞 Bug ☼ Reliability

Description

scan_b8_asym_with_lut builds the LUT sized from q_unit.len() but then uses the separate dim
parameter to drive both the scalar scan and the AVX-512 gather path. If a future internal caller
passes mismatched dim/q_unit, the AVX-512 gather kernel can read past the LUT in release builds
because it only debug_asserts the lut.len() == dim * 256 invariant.

Code

src/quant_kernels.rs[R604-616]

+pub(crate) fn scan_b8_asym_with_lut(
+    packed: &[u8],
+    n: usize,
+    dim: usize,
+    q_unit: &[f32],
+    scale: f32,
+    top: &mut TopK,
+    lut: &mut Vec<f32>,
+) {
+    build_b8_asym_lut_into(lut, q_unit);
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx512f")

Relevance

⭐⭐⭐ High
Team previously accepted replacing debug-only preconditions to prevent release-build OOB/UB in SIMD
paths (PR #19, #228).
PR-#19
PR-#228

ⓘ Recommendations generated based on similar findings in past PRs

Evidence

The new reusable-LUT entry point builds the LUT from q_unit length, not dim, and then calls the
AVX-512 gather kernel which relies on lut.len() == dim * 256 but only checks it with
debug_assert_eq! before using lut.as_ptr() for gathers.

src/quant_kernels.rs[604-633]
src/quant_kernels.rs[169-177]
src/quant_kernels.rs[666-699]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`scan_b8_asym_with_lut` does not validate that `q_unit.len()` matches the `dim` parameter, even though it forwards `dim` to the scan routines and the AVX-512 gather kernel’s safety relies on `lut.len() == dim * 256`.

## Issue Context
- The LUT builder sizes from `q_unit.len()`.
- The AVX-512 gather kernel uses `lut.as_ptr()` and only `debug_assert`s the length invariant, so a mismatch could become an OOB read in release.

## Fix Focus Areas
- Add a hard invariant check early in `scan_b8_asym_with_lut`, e.g. `assert_eq!(q_unit.len(), dim, "...")`, before building/using the LUT.
- (Optional) After building, you may also `debug_assert_eq!(lut.len(), dim * 256)` for clarity.

- src/quant_kernels.rs[604-633]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Fieldnote-Echo · 2026-06-19T16:34:20Z

/agentic_review

Fieldnote-Echo · 2026-06-19T16:36:18Z

/agentic_review

qodo-code-review · 2026-06-19T16:39:27Z

Code review by qodo was updated up to the latest commit 020882f

Fieldnote-Echo · 2026-06-19T16:47:23Z

/agentic_review

qodo-code-review · 2026-06-19T16:51:34Z

Code review by qodo was updated up to the latest commit 7d492ea

Signed-off-by: Nelson Spence <nelson@projectnavi.ai>

Fieldnote-Echo · 2026-06-19T16:57:22Z

/agentic_review

qodo-code-review · 2026-06-19T17:00:45Z

Code review by qodo was updated up to the latest commit 5fd5c7e

gemini-code-assist Bot reviewed Jun 19, 2026

View reviewed changes

Comment thread src/quant_kernels.rs

Comment thread src/quant_kernels.rs

Nelson Spence (Fieldnote-Echo) force-pushed the codex/prerelease-scalar-subset-scratch branch from d916ca8 to f1cd40e Compare June 19, 2026 16:00

Reuse scalar subset rerank LUT scratch

a909ad5

Signed-off-by: Nelson Spence <nelson@projectnavi.ai>

Nelson Spence (Fieldnote-Echo) force-pushed the codex/prerelease-scalar-subset-scratch branch from f1cd40e to a909ad5 Compare June 19, 2026 16:04

Nelson Spence (Fieldnote-Echo) marked this pull request as ready for review June 19, 2026 16:06

Nelson Spence (Fieldnote-Echo) requested a review from Navi Bot (project-navi-bot) as a code owner June 19, 2026 16:06

qodo-code-review Bot reviewed Jun 19, 2026

View reviewed changes

Comment thread src/quant.rs Outdated

Nelson Spence (Fieldnote-Echo) force-pushed the codex/prerelease-scalar-subset-scratch branch from 1e74798 to 020882f Compare June 19, 2026 16:35

Nelson Spence (Fieldnote-Echo) force-pushed the codex/prerelease-scalar-subset-scratch branch from 020882f to 7d492ea Compare June 19, 2026 16:47

Fix scalar scratch review findings

5fd5c7e

Signed-off-by: Nelson Spence <nelson@projectnavi.ai>

Nelson Spence (Fieldnote-Echo) force-pushed the codex/prerelease-scalar-subset-scratch branch from 7d492ea to 5fd5c7e Compare June 19, 2026 16:57

project-navi-bot approved these changes Jun 19, 2026

View reviewed changes

Navi Bot (project-navi-bot) merged commit 9b90197 into main Jun 19, 2026
38 checks passed

Navi Bot (project-navi-bot) deleted the codex/prerelease-scalar-subset-scratch branch June 19, 2026 17:06

Uh oh!

Conversation

Fieldnote-Echo commented Jun 19, 2026

Summary

Validation

Uh oh!

codecov Bot commented Jun 19, 2026

Codecov Report

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot commented Jun 19, 2026

Uh oh!

qodo-code-review Bot commented Jun 19, 2026

PR Summary by Qodo

Uh oh!

qodo-code-review Bot commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review by Qodo

Previous review results

Uh oh!

Uh oh!

Fieldnote-Echo commented Jun 19, 2026

Uh oh!

Fieldnote-Echo commented Jun 19, 2026

Uh oh!

qodo-code-review Bot commented Jun 19, 2026

Uh oh!

Fieldnote-Echo commented Jun 19, 2026

Uh oh!

qodo-code-review Bot commented Jun 19, 2026

Uh oh!

Fieldnote-Echo commented Jun 19, 2026

Uh oh!

qodo-code-review Bot commented Jun 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

qodo-code-review Bot commented Jun 19, 2026 •

edited

Loading