Skip to content

fix(beir-bench): streaming row-bounded npy loader (--max-docs honored, ~2x less peak RAM)#258

Merged
Nelson Spence (Fieldnote-Echo) merged 3 commits into
Project-Navi:mainfrom
toadkicker:fix/beir-bench-npy-loader
Jun 19, 2026
Merged

fix(beir-bench): streaming row-bounded npy loader (--max-docs honored, ~2x less peak RAM)#258
Nelson Spence (Fieldnote-Echo) merged 3 commits into
Project-Navi:mainfrom
toadkicker:fix/beir-bench-npy-loader

Conversation

@toadkicker

Copy link
Copy Markdown
Collaborator

What

load_npy_f32 in benchmarks/beir-bench had two issues that make large scaling sweeps slow and memory-hungry:

  1. ~2× memory peakstd::fs::read loads the whole file into a Vec<u8>, then a second full Vec<f32> is allocated while the bytes are still alive. On an 8.8M × 1024 f32 corpus that's ~72 GB peak.
  2. --max-docs ignored at load — the full corpus was read off disk and only sliced afterward, so every sub-sampled point in a --max-docs scaling sweep paid the full read + single-threaded byte-by-byte parse.

Change

  • New load_npy_f32_rows(path, max_rows): seeks past the header and reads only the kept rows, then parses the payload in parallel (rayon) directly into the output Vec<f32> — no intermediate full-size buffer. Peak ≈ 1× kept data instead of 2× whole file.
  • load_npy_f32 keeps its signature (max_rows = None); the corpus loader passes Some(n_docs) so --max-docs bounds the disk read.
  • Added npy_row_count (header-only read) for the corpus_ids length assertion.

Single-file change, no API/behavior change for the full-corpus path.

Verification

  • cargo build --release -p beir-bench clean.
  • --max-docs 5000 now loads exactly 5000 rows (n_docs=5000 (sub-sampled)); full-corpus run unchanged.
  • Used to run a 100k→2M MS MARCO scaling sweep that previously hit the memory/parse wall.

🤖 Generated with Claude Code

…, ~2x less peak RAM)

load_npy_f32 had two issues that made large scaling sweeps painful:
- ~2x memory peak: std::fs::read pulled the whole file into a Vec<u8>, then a
  second full Vec<f32> was allocated while the bytes were still alive (~72GB peak
  on an 8.8M x 1024 f32 corpus).
- --max-docs was ignored at load time: the full corpus was read off disk and only
  sliced afterward, so every sub-sampled point in a scaling sweep paid the full
  read + single-threaded parse.

New load_npy_f32_rows(path, max_rows) seeks past the header and reads ONLY the kept
rows, parses the payload in parallel (rayon) directly into the output Vec<f32> with
no intermediate full-size copy. Added npy_row_count (header-only) for the
corpus_ids length assertion. load_npy_f32 keeps its signature (max_rows=None).

Verified: builds clean; --max-docs 5000 loads exactly 5000 rows; full-corpus path
unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Todd Baur <todd@baursoftware.com>
@qodo-code-review

Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@Fieldnote-Echo

Copy link
Copy Markdown
Member

/agentic_review

@qodo-code-review

qodo-code-review Bot commented Jun 19, 2026

Copy link
Copy Markdown

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0) 🎨 UX issues (0) 🔗 Cross-repo conflicts (0) 📜 Skill insights (0)

Grey Divider


Remediation recommended

1. Extra buffer doubles RAM ✓ Resolved 🐞 Bug ➹ Performance
Description
load_npy_f32_rows allocates a full-size Vec<u8> payload buffer (raw) and then a same-sized
Vec<f32> output buffer (out), so peak memory is still ~2× the loaded rows (not ~1× as the
comment/doc states). This can still OOM for large --max-docs values and defeats the documented
memory optimization.
Code

benchmarks/beir-bench/src/main.rs[R235-243]

+    let mut raw = vec![0u8; n * dim * 4];
+    f.read_exact(&mut raw)
+        .unwrap_or_else(|e| panic!("read npy payload {path}: {e}"));
+    // parallel parse, no second full-size buffer beyond the f32 output
+    let mut out = vec![0.0f32; n * dim];
+    out.par_iter_mut().enumerate().for_each(|(i, v)| {
+        let c = &raw[i * 4..i * 4 + 4];
+        *v = f32::from_le_bytes([c[0], c[1], c[2], c[3]]);
+    });
Relevance

⭐⭐⭐ High

Team has accepted memory/allocation optimizations in beir-bench; misleading “no extra buffer” claim
likely fixed or refactored.

PR-#244
PR-#237

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
The function comment claims there is no intermediate Vec<u8> and ~1× peak memory, but the
implementation allocates raw (bytes) and out (f32) for the same payload size at the same time,
making peak memory ~2× the kept rows.

benchmarks/beir-bench/src/main.rs[187-244]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`load_npy_f32_rows` currently allocates `raw: Vec<u8>` of size `n*dim*4` and then `out: Vec<f32>` of size `n*dim`, which keeps two full copies of the kept payload in memory at once. This contradicts the function comment (“no intermediate full `Vec<u8>` copy”) and keeps peak RAM at ~2× kept data.

### Issue Context
The function already validates `<f4`, C-order, and computes `(n_full, dim)` from the header. The goal is to avoid the intermediate `raw` buffer while preserving correctness.

### Fix Focus Areas
- benchmarks/beir-bench/src/main.rs[187-244]

### Suggested fix approach
- Allocate `out: Vec<f32>` first.
- Read the payload directly into `out`’s backing memory by viewing it as `&mut [u8]` (requires a small `unsafe` block), e.g. `std::slice::from_raw_parts_mut(out.as_mut_ptr() as *mut u8, out.len()*4)`.
- If `cfg!(target_endian = "big")`, byte-swap in-place (can be done in parallel with rayon); otherwise no per-float parse is needed.
- Update the function doc/comment to match the actual memory behavior if you choose to keep `raw` for simplicity.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Informational

2. Inconsistent NPY version checks ✓ Resolved 🐞 Bug ☼ Reliability
Description
npy_row_count treats any version with major != 1 as having a 4-byte header-length field and may
succeed on v3+ files, while load_npy_f32_rows explicitly rejects non-(v1|v2). Since main() calls
npy_row_count before load_npy_f32_rows, unsupported files can pass the first check and then
panic later with a less direct error path.
Code

benchmarks/beir-bench/src/main.rs[R169-174]

+    let header_len = if pre[6] == 1 {
+        u16::from_le_bytes([pre[8], pre[9]]) as usize
+    } else {
+        f.read_exact(&mut pre[10..12]).unwrap();
+        u32::from_le_bytes([pre[8], pre[9], pre[10], pre[11]]) as usize
+    };
Relevance

⭐⭐⭐ High

Team historically accepts defensive consistency guards (e.g., underflow guard) in beir-bench; likely
align npy version checks.

PR-#237

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
npy_row_count branches solely on pre[6] == 1 and otherwise assumes 4-byte header length, while
load_npy_f32_rows explicitly asserts only v1/v2 are supported; main() uses npy_row_count to
compute n_docs immediately before calling the loader.

benchmarks/beir-bench/src/main.rs[162-174]
benchmarks/beir-bench/src/main.rs[192-209]
benchmarks/beir-bench/src/main.rs[752-760]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`npy_row_count` and `load_npy_f32_rows` validate NPY versions differently: the loader asserts `major == 1 || major == 2`, but `npy_row_count` does not. This creates inconsistent behavior where an unsupported NPY version can be partially accepted and only fail later.

### Issue Context
`main()` now uses `npy_row_count()` to compute `n_docs` prior to loading the corpus.

### Fix Focus Areas
- benchmarks/beir-bench/src/main.rs[162-185]
- benchmarks/beir-bench/src/main.rs[192-209]
- benchmarks/beir-bench/src/main.rs[752-760]

### Suggested fix approach
- Add the same `major == 1 || major == 2` assertion to `npy_row_count`, or
- Factor header parsing/version validation into a shared helper used by both functions so they cannot drift.
- (Optional) Improve `expect(...)` messages in `npy_row_count` to include `{path}` for easier debugging.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Qodo Logo

@Fieldnote-Echo

Copy link
Copy Markdown
Member

/codereview

@Fieldnote-Echo Nelson Spence (Fieldnote-Echo) merged commit 35d8acc into Project-Navi:main Jun 19, 2026
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants