Skip to content

feat(beir-bench): --ef-search flag for HNSW (was hardcoded const)#265

Merged
Nelson Spence (Fieldnote-Echo) merged 4 commits into
Project-Navi:mainfrom
toadkicker:feat/beir-bench-ef-search-flag
Jun 20, 2026
Merged

feat(beir-bench): --ef-search flag for HNSW (was hardcoded const)#265
Nelson Spence (Fieldnote-Echo) merged 4 commits into
Project-Navi:mainfrom
toadkicker:feat/beir-bench-ef-search-flag

Conversation

@toadkicker

Copy link
Copy Markdown
Collaborator

What

HNSW ef_search was a compile-time const (128), so the harness only ever measured HNSW at one fixed recall/latency operating point. This exposes it as a --ef-search CLI flag (default unchanged at 128).

Why

To compare ordvec vs HNSW fairly, you need HNSW's own recall-vs-latency Pareto, not a single point. --ef-search lets you sweep it alongside ordvec's --candidates sweep. On fiqa (57k, single-thread): ef=16 → p50 0.32ms / nDCG@10 0.456; ef=256 → 2.43ms / 0.470 — the knob traces HNSW's frontier as intended, and the two methods land co-Pareto (ordvec ahead at the fast-cheap end).

Scope

Single file, +13/−4. No behavior change at the default. Removes the now-unused HNSW_EF_SEARCH const; threads cfg.ef_search into both the serial and batched search calls; reports it in the build log.

🤖 Generated with Claude Code

HNSW ef_search was a compile-time const (128), so the harness ran HNSW at one
fixed recall/latency operating point. Expose it as --ef-search so HNSW's own
recall-vs-latency Pareto can be swept and compared fairly against ordvec's
candidate-count sweep. Default unchanged (128). No other behavior change.

Verified: ef=16 -> p50 0.32ms / nDCG 0.456; ef=256 -> 2.43ms / 0.470 on fiqa
(the knob traces HNSW's frontier as intended).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Todd Baur <todd@baursoftware.com>
@qodo-code-review

Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fc4c26f539

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "Codex (@codex) review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".

Comment thread benchmarks/beir-bench/src/main.rs
Comment thread benchmarks/beir-bench/src/main.rs
Two issues raised on the --ef-search flag:
1. ef-sweep emitted all points under the fixed 'hnsw' slug with no ef field, so
   topk/summary/timing rows collided (full-corpus overwrite; timing dedup by
   method/n_docs) and the recall/latency frontier couldn't be reconstructed.
   Fix: slug is now 'hnsw_ef{N}' so every operating point is recorded distinctly.
2. hnsw_rs clamps ef=max(ef,knbn) internally, so --ef-search below --top-k would be
   silently bumped, flattening the low end of a sweep. Fix: clamp ef_search up to
   top_k explicitly with a warning, so the recorded ef matches what actually ran.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Todd Baur <todd@baursoftware.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cc7753a757

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "Codex (@codex) review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".

Comment thread benchmarks/beir-bench/src/main.rs
@Fieldnote-Echo Nelson Spence (Fieldnote-Echo) added the review-this Trigger OpenHands PR review label Jun 20, 2026

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taste Rating: 🟢 Good taste — Clean, focused change that exactly solves the stated problem.

The ef-sweep validation (lines 147-159) correctly prevents silent clamping by hnsw_rs, with a clear warning. Threading cfg.ef_search into both search paths (serial and parallel) is correct. The slug format hnsw_ef{value} ensures distinct records per operating point.

One note for maintainers: benchmarks/beir/beir_plot.py hardcodes "hnsw" in METHOD_STYLE and will only plot records matching that exact slug. When sweeping --ef-search, results will be recorded as hnsw_ef128, hnsw_ef256, etc., which the current plot script won't recognize. This is out-of-scope for this single-file PR but worth tracking for the sweep workflow.


Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/Project-Navi/ordvec/actions/runs/27853997793

@chatgpt-codex-connector

Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@Fieldnote-Echo Nelson Spence (Fieldnote-Echo) merged commit ae7cf65 into Project-Navi:main Jun 20, 2026
26 checks passed

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d00313be6b

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "Codex (@codex) review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "Codex (@codex) address that feedback".

eprintln!(" building HNSW M={HNSW_M} ef_c={HNSW_EF_CONSTRUCTION} ({n_docs} docs) ...");
// ef in the slug so an ef-sweep does not overwrite topk/summary/timing rows
// (each operating point on the recall/latency frontier is recorded distinctly).
let slug = format!("hnsw_ef{}", cfg.ef_search);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Remove stale default HNSW artifacts

When rerunning the standard Makefile targets in a results directory that already contains the old default hnsw.topk.jsonl/hnsw.summary.json, this new default slug writes hnsw_ef128.* instead of overwriting those files. beir_eval.discover_runs() globs every *.topk.jsonl, and bench-beir-quality/bench-beir-scaling only remove timing.jsonl, so the regenerated report includes a stale hnsw row alongside the new hnsw_ef128 row. Please remove/migrate the legacy HNSW artifacts when emitting the ef-qualified slug, or keep the default output name and record ef separately.

Useful? React with 👍 / 👎.

Comment on lines +93 to +94
slugs = {r["method"] for r in records if "method" in r}
return sorted(slugs, key=_method_order_key)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Filter probe methods out of default plots

Because this now returns every method slug present in an append-only timing.jsonl, plot_scaling and plot_bars no longer restrict the README figures to METHOD_STYLE/BENCH_METHODS. If the file contains a prior probe run such as sign-rq2-threaded (which the Makefile explicitly excludes from committed figures until reviewed), the plotter will render that stale/probe method with the gray fallback instead of only adding ef-qualified HNSW variants. Please filter to the supported/default families while normalizing hnsw_ef* to the HNSW family.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

review-this Trigger OpenHands PR review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants