fix(beir-bench): streaming row-bounded npy loader (--max-docs honored, ~2x less peak RAM) by toadkicker · Pull Request #258 · Project-Navi/ordvec

toadkicker · 2026-06-19T20:03:28Z

What

load_npy_f32 in benchmarks/beir-bench had two issues that make large scaling sweeps slow and memory-hungry:

~2× memory peak — std::fs::read loads the whole file into a Vec<u8>, then a second full Vec<f32> is allocated while the bytes are still alive. On an 8.8M × 1024 f32 corpus that's ~72 GB peak.
--max-docs ignored at load — the full corpus was read off disk and only sliced afterward, so every sub-sampled point in a --max-docs scaling sweep paid the full read + single-threaded byte-by-byte parse.

Change

New load_npy_f32_rows(path, max_rows): seeks past the header and reads only the kept rows, then parses the payload in parallel (rayon) directly into the output Vec<f32> — no intermediate full-size buffer. Peak ≈ 1× kept data instead of 2× whole file.
load_npy_f32 keeps its signature (max_rows = None); the corpus loader passes Some(n_docs) so --max-docs bounds the disk read.
Added npy_row_count (header-only read) for the corpus_ids length assertion.

Single-file change, no API/behavior change for the full-corpus path.

Verification

cargo build --release -p beir-bench clean.
--max-docs 5000 now loads exactly 5000 rows (n_docs=5000 (sub-sampled)); full-corpus run unchanged.
Used to run a 100k→2M MS MARCO scaling sweep that previously hit the memory/parse wall.

🤖 Generated with Claude Code

…, ~2x less peak RAM) load_npy_f32 had two issues that made large scaling sweeps painful: - ~2x memory peak: std::fs::read pulled the whole file into a Vec<u8>, then a second full Vec<f32> was allocated while the bytes were still alive (~72GB peak on an 8.8M x 1024 f32 corpus). - --max-docs was ignored at load time: the full corpus was read off disk and only sliced afterward, so every sub-sampled point in a scaling sweep paid the full read + single-threaded parse. New load_npy_f32_rows(path, max_rows) seeks past the header and reads ONLY the kept rows, parses the payload in parallel (rayon) directly into the output Vec<f32> with no intermediate full-size copy. Added npy_row_count (header-only) for the corpus_ids length assertion. load_npy_f32 keeps its signature (max_rows=None). Verified: builds clean; --max-docs 5000 loads exactly 5000 rows; full-corpus path unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Todd Baur <todd@baursoftware.com>

qodo-code-review · 2026-06-19T20:03:34Z

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

Fieldnote-Echo · 2026-06-19T20:43:21Z

/agentic_review

qodo-code-review · 2026-06-19T20:46:58Z

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0) 🎨 UX issues (0) 🔗 Cross-repo conflicts (0) 📜 Skill insights (0)

1. ~~Extra buffer doubles RAM~~ ✓ Resolved 🐞 Bug ➹ Performance

Description

load_npy_f32_rows allocates a full-size Vec<u8> payload buffer (raw) and then a same-sized
Vec<f32> output buffer (out), so peak memory is still ~2× the loaded rows (not ~1× as the
comment/doc states). This can still OOM for large --max-docs values and defeats the documented
memory optimization.

Code

benchmarks/beir-bench/src/main.rs[R235-243]

+    let mut raw = vec![0u8; n * dim * 4];
+    f.read_exact(&mut raw)
+        .unwrap_or_else(|e| panic!("read npy payload {path}: {e}"));
+    // parallel parse, no second full-size buffer beyond the f32 output
+    let mut out = vec![0.0f32; n * dim];
+    out.par_iter_mut().enumerate().for_each(|(i, v)| {
+        let c = &raw[i * 4..i * 4 + 4];
+        *v = f32::from_le_bytes([c[0], c[1], c[2], c[3]]);
+    });

Relevance

⭐⭐⭐ High
Team has accepted memory/allocation optimizations in beir-bench; misleading “no extra buffer” claim
likely fixed or refactored.
PR-#244
PR-#237

ⓘ Recommendations generated based on similar findings in past PRs

Evidence

The function comment claims there is no intermediate Vec<u8> and ~1× peak memory, but the
implementation allocates raw (bytes) and out (f32) for the same payload size at the same time,
making peak memory ~2× the kept rows.

benchmarks/beir-bench/src/main.rs[187-244]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`load_npy_f32_rows` currently allocates `raw: Vec<u8>` of size `n*dim*4` and then `out: Vec<f32>` of size `n*dim`, which keeps two full copies of the kept payload in memory at once. This contradicts the function comment (“no intermediate full `Vec<u8>` copy”) and keeps peak RAM at ~2× kept data.

### Issue Context
The function already validates `<f4`, C-order, and computes `(n_full, dim)` from the header. The goal is to avoid the intermediate `raw` buffer while preserving correctness.

### Fix Focus Areas
- benchmarks/beir-bench/src/main.rs[187-244]

### Suggested fix approach
- Allocate `out: Vec<f32>` first.
- Read the payload directly into `out`’s backing memory by viewing it as `&mut [u8]` (requires a small `unsafe` block), e.g. `std::slice::from_raw_parts_mut(out.as_mut_ptr() as *mut u8, out.len()*4)`.
- If `cfg!(target_endian = "big")`, byte-swap in-place (can be done in parallel with rayon); otherwise no per-float parse is needed.
- Update the function doc/comment to match the actual memory behavior if you choose to keep `raw` for simplicity.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

2. ~~Inconsistent NPY version checks~~ ✓ Resolved 🐞 Bug ☼ Reliability

Description

npy_row_count treats any version with major != 1 as having a 4-byte header-length field and may
succeed on v3+ files, while load_npy_f32_rows explicitly rejects non-(v1|v2). Since main() calls
npy_row_count before load_npy_f32_rows, unsupported files can pass the first check and then
panic later with a less direct error path.

Code

benchmarks/beir-bench/src/main.rs[R169-174]

+    let header_len = if pre[6] == 1 {
+        u16::from_le_bytes([pre[8], pre[9]]) as usize
+    } else {
+        f.read_exact(&mut pre[10..12]).unwrap();
+        u32::from_le_bytes([pre[8], pre[9], pre[10], pre[11]]) as usize
+    };

Relevance

⭐⭐⭐ High
Team historically accepts defensive consistency guards (e.g., underflow guard) in beir-bench; likely
align npy version checks.
PR-#237

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
npy_row_count branches solely on pre[6] == 1 and otherwise assumes 4-byte header length, while
load_npy_f32_rows explicitly asserts only v1/v2 are supported; main() uses npy_row_count to
compute n_docs immediately before calling the loader.
benchmarks/beir-bench/src/main.rs[162-174]
benchmarks/beir-bench/src/main.rs[192-209]
benchmarks/beir-bench/src/main.rs[752-760]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`npy_row_count` and `load_npy_f32_rows` validate NPY versions differently: the loader asserts `major == 1 || major == 2`, but `npy_row_count` does not. This creates inconsistent behavior where an unsupported NPY version can be partially accepted and only fail later.

### Issue Context
`main()` now uses `npy_row_count()` to compute `n_docs` prior to loading the corpus.

### Fix Focus Areas
- benchmarks/beir-bench/src/main.rs[162-185]
- benchmarks/beir-bench/src/main.rs[192-209]
- benchmarks/beir-bench/src/main.rs[752-760]

### Suggested fix approach
- Add the same `major == 1 || major == 2` assertion to `npy_row_count`, or
- Factor header parsing/version validation into a shared helper used by both functions so they cannot drift.
- (Optional) Improve `expect(...)` messages in `npy_row_count` to include `{path}` for easier debugging.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Fieldnote-Echo · 2026-06-19T21:02:02Z

/codereview

Todd Baur (toadkicker) requested review from Nelson Spence (Fieldnote-Echo) and Navi Bot (project-navi-bot) as code owners June 19, 2026 20:03

Merge branch 'main' into fix/beir-bench-npy-loader

7f0ff61

Fix BEIR NPY row loader review findings

0f8554e

project-navi-bot approved these changes Jun 19, 2026

View reviewed changes

Fieldnote-Echo approved these changes Jun 19, 2026

View reviewed changes

Nelson Spence (Fieldnote-Echo) merged commit 35d8acc into Project-Navi:main Jun 19, 2026
26 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(beir-bench): streaming row-bounded npy loader (--max-docs honored, ~2x less peak RAM)#258

fix(beir-bench): streaming row-bounded npy loader (--max-docs honored, ~2x less peak RAM)#258
Nelson Spence (Fieldnote-Echo) merged 3 commits into
Project-Navi:mainfrom
toadkicker:fix/beir-bench-npy-loader

toadkicker commented Jun 19, 2026

Uh oh!

qodo-code-review Bot commented Jun 19, 2026

Uh oh!

Fieldnote-Echo commented Jun 19, 2026

Uh oh!

qodo-code-review Bot commented Jun 19, 2026 •

edited

Loading

Uh oh!

Fieldnote-Echo commented Jun 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

toadkicker commented Jun 19, 2026

What

Change

Verification

Uh oh!

qodo-code-review Bot commented Jun 19, 2026

Qodo reviews are paused for this user.

Uh oh!

Fieldnote-Echo commented Jun 19, 2026

Uh oh!

qodo-code-review Bot commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review by Qodo

Uh oh!

Fieldnote-Echo commented Jun 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

qodo-code-review Bot commented Jun 19, 2026 •

edited

Loading