feat(beir-bench): --ef-search flag for HNSW (was hardcoded const) by toadkicker · Pull Request #265 · Project-Navi/ordvec

toadkicker · 2026-06-19T23:00:36Z

What

HNSW ef_search was a compile-time const (128), so the harness only ever measured HNSW at one fixed recall/latency operating point. This exposes it as a --ef-search CLI flag (default unchanged at 128).

Why

To compare ordvec vs HNSW fairly, you need HNSW's own recall-vs-latency Pareto, not a single point. --ef-search lets you sweep it alongside ordvec's --candidates sweep. On fiqa (57k, single-thread): ef=16 → p50 0.32ms / nDCG@10 0.456; ef=256 → 2.43ms / 0.470 — the knob traces HNSW's frontier as intended, and the two methods land co-Pareto (ordvec ahead at the fast-cheap end).

Scope

Single file, +13/−4. No behavior change at the default. Removes the now-unused HNSW_EF_SEARCH const; threads cfg.ef_search into both the serial and batched search calls; reports it in the build log.

🤖 Generated with Claude Code

HNSW ef_search was a compile-time const (128), so the harness ran HNSW at one fixed recall/latency operating point. Expose it as --ef-search so HNSW's own recall-vs-latency Pareto can be swept and compared fairly against ordvec's candidate-count sweep. Default unchanged (128). No other behavior change. Verified: ef=16 -> p50 0.32ms / nDCG 0.456; ef=256 -> 2.43ms / 0.470 on fiqa (the knob traces HNSW's frontier as intended). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Todd Baur <todd@baursoftware.com>

qodo-code-review · 2026-06-19T23:00:43Z

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fc4c26f539

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "Codex (@codex) review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".

Two issues raised on the --ef-search flag: 1. ef-sweep emitted all points under the fixed 'hnsw' slug with no ef field, so topk/summary/timing rows collided (full-corpus overwrite; timing dedup by method/n_docs) and the recall/latency frontier couldn't be reconstructed. Fix: slug is now 'hnsw_ef{N}' so every operating point is recorded distinctly. 2. hnsw_rs clamps ef=max(ef,knbn) internally, so --ef-search below --top-k would be silently bumped, flattening the low end of a sweep. Fix: clamp ef_search up to top_k explicitly with a warning, so the recorded ef matches what actually ran. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Todd Baur <todd@baursoftware.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cc7753a757

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "Codex (@codex) review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".

github-actions

Taste Rating: 🟢 Good taste — Clean, focused change that exactly solves the stated problem.

The ef-sweep validation (lines 147-159) correctly prevents silent clamping by hnsw_rs, with a clear warning. Threading cfg.ef_search into both search paths (serial and parallel) is correct. The slug format hnsw_ef{value} ensures distinct records per operating point.

One note for maintainers: benchmarks/beir/beir_plot.py hardcodes "hnsw" in METHOD_STYLE and will only plot records matching that exact slug. When sweeping --ef-search, results will be recorded as hnsw_ef128, hnsw_ef256, etc., which the current plot script won't recognize. This is out-of-scope for this single-file PR but worth tracking for the sweep workflow.

Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/Project-Navi/ordvec/actions/runs/27853997793

chatgpt-codex-connector · 2026-06-20T00:22:53Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d00313be6b

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "Codex (@codex) review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "Codex (@codex) address that feedback".

chatgpt-codex-connector · 2026-06-20T00:34:50Z

-    eprintln!("  building HNSW M={HNSW_M} ef_c={HNSW_EF_CONSTRUCTION} ({n_docs} docs) ...");
+    // ef in the slug so an ef-sweep does not overwrite topk/summary/timing rows
+    // (each operating point on the recall/latency frontier is recorded distinctly).
+    let slug = format!("hnsw_ef{}", cfg.ef_search);


Remove stale default HNSW artifacts

When rerunning the standard Makefile targets in a results directory that already contains the old default hnsw.topk.jsonl/hnsw.summary.json, this new default slug writes hnsw_ef128.* instead of overwriting those files. beir_eval.discover_runs() globs every *.topk.jsonl, and bench-beir-quality/bench-beir-scaling only remove timing.jsonl, so the regenerated report includes a stale hnsw row alongside the new hnsw_ef128 row. Please remove/migrate the legacy HNSW artifacts when emitting the ef-qualified slug, or keep the default output name and record ef separately.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-20T00:34:50Z

+    slugs = {r["method"] for r in records if "method" in r}
+    return sorted(slugs, key=_method_order_key)


Filter probe methods out of default plots

Because this now returns every method slug present in an append-only timing.jsonl, plot_scaling and plot_bars no longer restrict the README figures to METHOD_STYLE/BENCH_METHODS. If the file contains a prior probe run such as sign-rq2-threaded (which the Makefile explicitly excludes from committed figures until reviewed), the plotter will render that stale/probe method with the gray fallback instead of only adding ef-qualified HNSW variants. Please filter to the supported/default families while normalizing hnsw_ef* to the HNSW family.

Useful? React with 👍 / 👎.

Todd Baur (toadkicker) requested review from Nelson Spence (Fieldnote-Echo) and Navi Bot (project-navi-bot) as code owners June 19, 2026 23:00

chatgpt-codex-connector Bot reviewed Jun 19, 2026

View reviewed changes

Comment thread benchmarks/beir-bench/src/main.rs

Comment thread benchmarks/beir-bench/src/main.rs

chatgpt-codex-connector Bot reviewed Jun 19, 2026

View reviewed changes

Comment thread benchmarks/beir-bench/src/main.rs

Nelson Spence (Fieldnote-Echo) added the review-this Trigger OpenHands PR review label Jun 20, 2026

github-actions Bot reviewed Jun 20, 2026

View reviewed changes

Fix BEIR HNSW ef sweep consumers

c06270e

Merge branch 'main' into feat/beir-bench-ef-search-flag

d00313b

project-navi-bot approved these changes Jun 20, 2026

View reviewed changes

Fieldnote-Echo approved these changes Jun 20, 2026

View reviewed changes

Nelson Spence (Fieldnote-Echo) merged commit ae7cf65 into Project-Navi:main Jun 20, 2026
26 checks passed

chatgpt-codex-connector Bot reviewed Jun 20, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(beir-bench): --ef-search flag for HNSW (was hardcoded const)#265

feat(beir-bench): --ef-search flag for HNSW (was hardcoded const)#265
Nelson Spence (Fieldnote-Echo) merged 4 commits into
Project-Navi:mainfrom
toadkicker:feat/beir-bench-ef-search-flag

toadkicker commented Jun 19, 2026

Uh oh!

qodo-code-review Bot commented Jun 19, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

github-actions Bot left a comment

Uh oh!

chatgpt-codex-connector Bot commented Jun 20, 2026

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 20, 2026

Uh oh!

chatgpt-codex-connector Bot Jun 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		slugs = {r["method"] for r in records if "method" in r}
		return sorted(slugs, key=_method_order_key)

Uh oh!

Conversation

toadkicker commented Jun 19, 2026

What

Why

Scope

Uh oh!

qodo-code-review Bot commented Jun 19, 2026

Qodo reviews are paused for this user.

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot commented Jun 20, 2026

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 20, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 20, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants