Centralize shared utilities across benchmark backends #2040

Open
jnke2016 wants to merge 21 commits into rapidsai:main from jnke2016:consolidate-shared-utils

Conversation

@jnke2016 (Contributor) commented Apr 25, 2026

Summary

This PR centralizes duplicated logic identified during review of the OpenSearch and Elasticsearch backend PRs. Both backends independently reimplemented common operations (vector file loading, YAML parsing, dataset lookup, DatasetConfig construction, parameter expansion, recall computation) that should be shared across all backends. This PR moves that shared logic into the framework so that backend authors focus only on backend-specific concerns. It also routes the CLI through the new pluggable benchmark API and restores a fully CLI-based workflow.

Motivation

With the pluggable backend architecture, each new Python-native backend needs to load binary vector files, parse dataset YAML configs, look up datasets by name, expand parameter combinations, and compute recall. Without centralized utilities, each backend reimplements these independently, leading to:

  • Format drift (e.g., OpenSearch supports 5 binary formats while Elasticsearch supports only 2)
  • Bug fixes and general changes that must be applied to every backend individually instead of once in a shared place
  • Unnecessary boilerplate for new backend authors to write

Changes

1. Shared vector loading utility (backends/utils.py)

Added load_vectors() and dtype_from_filename() (originating from generate_groundtruth/utils.py) for reading binary vector files. All five formats (.fbin, .f16bin, .u8bin, .i8bin, .ibin) are supported, along with subset_size handling and input validation (unsupported extensions, truncated headers or payloads). The C++ backend is unaffected since it passes file paths to the subprocess.
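
For illustration, here is a minimal sketch of the binary layout such a loader reads, assuming the conventional big-ann-benchmarks extension-to-dtype mapping; names and exact signatures of the shared utility may differ:

import os

import numpy as np

# Conventional big-ann-benchmarks extension-to-dtype mapping (assumed here).
_DTYPE_FOR_EXT = {
    ".fbin": np.float32,
    ".f16bin": np.float16,
    ".u8bin": np.uint8,
    ".i8bin": np.int8,
    ".ibin": np.int32,
}

def load_vectors_sketch(path, subset_size=None):
    # An 8-byte header (uint32 n_rows, uint32 n_cols) precedes a row-major payload.
    ext = os.path.splitext(path)[1]
    if ext not in _DTYPE_FOR_EXT:
        raise ValueError(f"Unsupported vector file extension {ext!r}: {path}")
    dtype = _DTYPE_FOR_EXT[ext]
    with open(path, "rb") as f:
        header = f.read(8)
        if len(header) != 8:
            raise ValueError(f"Truncated header (got {len(header)} bytes): {path}")
        n_rows, n_cols = (int(x) for x in np.frombuffer(header, dtype=np.uint32))
        if subset_size is not None:
            n_rows = min(n_rows, subset_size)
        payload = f.read(n_rows * n_cols * np.dtype(dtype).itemsize)
    return np.frombuffer(payload, dtype=dtype).reshape(n_rows, n_cols)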

2. Promoted config-loading methods to base ConfigLoader class

Moved load_yaml_file, get_dataset_configuration, and gather_algorithm_configs from CppGBenchConfigLoader to the base ConfigLoader class. Both the OpenSearch and Elasticsearch config loaders had been reimplementing these; they are now inherited automatically. Also added a warning for invalid algorithm_configuration paths (previously silently ignored) and switched gather_algorithm_configs from exclusion-based filtering to .yaml/.yml inclusion.
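
As a rough sketch of the inclusion-based discovery (the real gather_algorithm_configs() signature and override handling may differ), the filter amounts to:

import os
import warnings

def gather_algorithm_configs_sketch(algos_dir, algorithm_configuration=None):
    # Keep only YAML algorithm configs instead of excluding known non-config files.
    paths = [
        os.path.join(algos_dir, f)
        for f in os.listdir(algos_dir)
        if f.endswith((".yaml", ".yml"))
    ]
    if algorithm_configuration:
        if os.path.isdir(algorithm_configuration):
            paths += [
                os.path.join(algorithm_configuration, f)
                for f in os.listdir(algorithm_configuration)
                if f.endswith((".yaml", ".yml"))
            ]
        elif os.path.isfile(algorithm_configuration):
            paths.append(algorithm_configuration)
        else:
            # Warn instead of silently ignoring an invalid override path.
            warnings.warn(
                f"Invalid algorithm_configuration path: {algorithm_configuration}"
            )
    return paths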

3. Dataset refactored from dataclass to regular class with transparent vector loading

The Dataset class now transparently loads vectors from file paths on first access via properties. Python-native backends just access dataset.base_vectors and get a numpy array regardless of whether vectors were provided directly or loaded from disk. The C++ backend continues to use dataset.base_file only, never triggering any loading.
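
A minimal sketch of the lazy-loading property pattern this describes (field names follow the PR description; the stand-in loader below is illustrative, not the shared utility):

import numpy as np

def _load_from_disk(path):
    # Stand-in for the shared load_vectors() utility described above.
    return np.fromfile(path, dtype=np.float32).reshape(-1, 1)

class LazyDatasetSketch:
    def __init__(self, base_file=None, base_vectors=None):
        self.base_file = base_file
        self._base_vectors = base_vectors

    @property
    def base_vectors(self):
        # Load from disk on first access only when vectors were not passed in;
        # a backend that never touches this property never triggers a load.
        if self._base_vectors is None and self.base_file is not None:
            self._base_vectors = _load_from_disk(self.base_file)
        return self._base_vectors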

4. ConfigLoader template method

ConfigLoader.load() is now a concrete method that handles the shared steps (loading the dataset YAML, finding the dataset by name, constructing DatasetConfig with path resolution via build_dataset_config()), then delegates to the abstract _build_benchmark_configs() for backend-specific logic. Backend authors implement _build_benchmark_configs() and receive the DatasetConfig already built, so they cannot skip or reimplement the shared steps. The orchestrator is unchanged.
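
A hypothetical Python-native loader then only fills in the hook; the signature below matches the diff in this PR, but the class name and body are illustrative:

from cuvs_bench.orchestrator.config_loaders import ConfigLoader

class MySearchConfigLoader(ConfigLoader):  # hypothetical backend loader
    def _build_benchmark_configs(
        self, dataset_config, dataset_conf, dataset, dataset_path, **kwargs
    ):
        # The shared steps (dataset YAML parsing, dataset lookup, DatasetConfig
        # construction with resolved paths) have already run in ConfigLoader.load();
        # only the backend-specific translation of algorithm YAMLs into
        # BenchmarkConfig objects remains, using inherited helpers such as
        # load_yaml_file() and gather_algorithm_configs().
        benchmark_configs = []
        # ... build BenchmarkConfig objects and append them here ...
        return benchmark_configs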

5. Centralized parameter expansion (backends/utils.py)

Added expand_param_grid() for expanding YAML parameter specs (dict-of-lists to list-of-dicts via Cartesian product). All backends use it transparently. The C++ backend uses it in prepare_indexes() (replacing inline itertools.product) and then applies its own constraint validation on the expanded combinations. Python-native backends use it directly without filtering.
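
A minimal sketch of that expansion, assuming expand_param_grid() behaves like a plain Cartesian product over the value lists (the real utility may handle more edge cases):

import itertools

def expand_param_grid_sketch(param_spec):
    # dict-of-lists -> list-of-dicts via Cartesian product; scalars are wrapped.
    keys = list(param_spec)
    value_lists = [v if isinstance(v, list) else [v] for v in param_spec.values()]
    return [dict(zip(keys, combo)) for combo in itertools.product(*value_lists)]

# expand_param_grid_sketch({"nlist": [1024, 4096], "ratio": 1})
# -> [{"nlist": 1024, "ratio": 1}, {"nlist": 4096, "ratio": 1}]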

6. Recall computation moved to orchestrator

Added compute_recall() to backends/utils.py. The orchestrator now computes recall transparently after backend.search() returns, for any backend that returns actual neighbor arrays. The C++ backend computes recall in the subprocess and returns empty neighbors, so this is naturally skipped for it. Backends no longer need to compute recall themselves.
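
A minimal sketch of that average recall@k computation (the shared compute_recall() may include additional input validation):

import numpy as np

def compute_recall_sketch(neighbors, groundtruth, k):
    # Average recall@k: fraction of the top-k ground-truth ids that appear
    # in each query's returned neighbor list, averaged over all queries.
    n_queries = neighbors.shape[0]
    gt_k = min(k, groundtruth.shape[1])
    if n_queries == 0 or gt_k == 0:
        return 0.0
    n_correct = sum(
        len(set(neighbors[i, :k].tolist()) & set(groundtruth[i, :gt_k].tolist()))
        for i in range(n_queries)
    )
    return n_correct / (n_queries * gt_k)

# Example: compute_recall_sketch(np.array([[1, 2]]), np.array([[2, 9]]), k=2) -> 0.5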

7. CLI routed through orchestrator

Updated run/__main__.py to call BenchmarkOrchestrator directly instead of the legacy run.py. Added new CLI flags: --mode (sweep/tune), --constraints (tune-mode targets), --n-trials (Optuna trials), and --backend-config (YAML config for non-C++ backends). The deprecation warning has been removed. The legacy run.py and runners.py are kept for now as deprecated modules.

8. Restored CLI-based docs

Updated docs/source/cuvs_bench/index.rst to use CLI commands for step 2 (build and search) instead of Python BenchmarkOrchestrator code blocks. All four steps (prepare, build/search, export, plot) are now fully CLI-based.

9. New and updated tests

  • tests/test_utils.py (45 tests): covers dtype_from_filename, load_vectors, Dataset transparent loading, base ConfigLoader methods, expand_param_grid, and compute_recall
  • tests/test_cli.py (5 new tests): covers --mode, --backend-config, --n-trials, --constraints flags and invalid backend config handling

Remaining work

  • Consider consolidating pre-existing I/O duplicates across generate_groundtruth/utils.py, get_dataset/fbin_to_f16bin.py, and get_dataset/hdf5_to_fbin.py into a top-level cuvs_bench/io.py module
  • Streaming write capability (incremental batches without knowing total size upfront)
  • Remove run.py and runners.py once deprecated period is over

…nfiguration, gather_algorithm_configs) from CppGBenchConfigLoader to base ConfigLoader class.
@jnke2016 jnke2016 requested a review from a team as a code owner April 25, 2026 18:04
@jnke2016 jnke2016 marked this pull request as draft April 25, 2026 18:04
copy-pr-bot commented Apr 25, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

coderabbitai commented Apr 25, 2026

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • New CLI options for sweep/tune modes, constraints, trial count, and YAML backend config; automatic recall computation added to search results.
    • Backend YAML support and improved C++ executable discovery via optional executable directory.
  • Refactor

    • Dataset object now lazily loads vectors/ground-truth to reduce memory use.
    • Config loader refactored to share common loading helpers and a template hook.
  • Documentation

    • Examples updated to use the CLI workflow.
  • Tests

    • Extensive new unit and CLI tests for utilities, loading, recall, and registry behaviors.

Walkthrough

Adds shared backend utilities (dtype inference, binary vector loader, param-grid expansion, recall computation); converts Dataset to lazy file-backed loading; refactors config loading into a template-method for C++ loader; wires recall computation into the orchestrator; updates CLI for sweep/tune/backends and adjusts tests/docs accordingly.

Changes

Backend utilities, Dataset lazy-loading, and tests

Layer / File(s) Summary
Data Shape / helpers
python/cuvs_bench/cuvs_bench/backends/utils.py
Adds dtype_from_filename, load_vectors, expand_param_grid, and compute_recall for dtype inference, big-ann-bench binary parsing (with subset support and validation), Cartesian-product expansion, and average recall@k computation.
Core model
python/cuvs_bench/cuvs_bench/backends/base.py
Replaces Dataset dataclass with a class that accepts optional in-memory arrays or file paths and lazily loads base_vectors, query_vectors, and groundtruth_neighbors on first access; preserves dims, n_base, n_queries as properties.
Tests
python/cuvs_bench/cuvs_bench/tests/test_utils.py
Adds comprehensive unit tests for dtype mapping, load_vectors formats/error cases, Dataset lazy-loading semantics and caching, expand_param_grid, and compute_recall.

Config-loader template refactor and C++ loader updates

Layer / File(s) Summary
Shared YAML helpers
python/cuvs_bench/cuvs_bench/orchestrator/config_loaders.py
Adds ConfigLoader.load() template method and shared helpers: load_yaml_file, get_dataset_configuration, build_dataset_config, and gather_algorithm_configs to centralize YAML/dataset/algorithm discovery.
Core C++ loader
python/cuvs_bench/cuvs_bench/orchestrator/config_loaders.py
Refactors CppGBenchConfigLoader to implement _build_benchmark_configs(...) hook and to use inherited helpers for loading dataset and algorithm YAMLs and expanding parameter grids.
Executable discovery / wiring
python/cuvs_bench/cuvs_bench/orchestrator/config_loaders.py
Updates executable discovery API to accept an optional executable_dir (propagated through _prepare_benchmark_configs, find_executable, and get_build_path) and prefers that directory when locating build artifacts.
Tests / docs
python/cuvs_bench/cuvs_bench/tests/test_utils.py, docs/source/cuvs_bench/index.rst
Adds tests for YAML loading and dataset lookup; updates docs examples to use the CLI python -m cuvs_bench.run ... --build --search.

Orchestrator/CLI behavior and registry test updates

Layer / File(s) Summary
Orchestrator recall wiring
python/cuvs_bench/cuvs_bench/orchestrator/orchestrator.py
Imports compute_recall and assigns SearchResult.recall after backend search(...) in both sweep and tune flows when search returns neighbors and ground-truth is available; removes module-level run_benchmark wrapper.
CLI / entrypoint
python/cuvs_bench/cuvs_bench/run/__main__.py
Adds CLI options --mode, --constraints, --n-trials, and --backend-config; when not exporting data, parses optional backend YAML, instantiates BenchmarkOrchestrator, and calls orchestrator.run_benchmark(...) with explicit tuning parameters instead of the deprecated wrapper.
Registry tests / backend fixtures
python/cuvs_bench/cuvs_bench/tests/test_registry.py, python/cuvs_bench/cuvs_bench/tests/test_cli.py
Updates test dummy backends to use an algo property and new build/search signatures that accept indexes and dry_run; updates registry calls to pass config={"name": ...} and adds CLI parsing tests for the new flags.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name | Status | Explanation
Title check | ✅ Passed | The title clearly and concisely summarizes the main objective of the PR: centralizing duplicated utilities across benchmark backends.
Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, providing clear motivation, implementation details, and context for all major changes across multiple files.
Docstring Coverage | ✅ Passed | Docstring coverage is 91.76%, which is sufficient. The required threshold is 80.00%.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.



@coderabbitai left a review comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@python/cuvs_bench/cuvs_bench/backends/utils.py`:
- Around line 57-58: The code currently assigns dtype = _DTYPE_FOR_EXT.get(ext,
np.float32) which silently defaults unknown file extensions to float32; change
this to fail fast by looking up ext in _DTYPE_FOR_EXT and raising a clear
ValueError (including ext and path) when not found instead of defaulting. Update
the logic around ext and dtype (the ext variable and _DTYPE_FOR_EXT lookup) so
unknown suffixes raise an exception with a descriptive message to prevent
corrupted benchmark inputs.
- Around line 60-66: The code that reads binary matrices (reading n_rows/n_cols
and payload) must validate header and payload lengths and ensure subset_size is
non-negative and within bounds: check that f.read(4) calls returned 4 bytes
before converting to uint32 and raise a ValueError if not; validate subset_size
>= 0 and if provided clamp it to n_rows but error if it's > n_rows; compute
expected_bytes = n_rows * n_cols * np.dtype(dtype).itemsize and verify f.read
returns exactly that many bytes (or at least enough for the requested subset)
before calling np.frombuffer, raising a clear ValueError if sizes mismatch;
update the logic around n_rows, n_cols, subset_size, dtype, np.frombuffer and
f.read to use these checks so malformed files produce actionable errors rather
than cryptic crashes.

In `@python/cuvs_bench/cuvs_bench/orchestrator/config_loaders.py`:
- Around line 234-243: The code silently ignores invalid algorithm_configuration
values when they are neither files nor directories; update the block handling
algorithm_configuration (the variable algorithm_configuration that populates
algos_conf_fs) to detect the invalid path and raise a clear exception (e.g.,
raise ValueError) or log-and-raise with the provided path and its type so
callers know the override is invalid instead of silently falling back to
defaults; implement this in the same conditional after the
os.path.isdir/os.path.isfile checks where algos_conf_fs is modified.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4d1822e3-7adb-4f35-a054-0962f282eec1

📥 Commits

Reviewing files that changed from the base of the PR and between f2bffb6 and 6e91f0e.

📒 Files selected for processing (2)
  • python/cuvs_bench/cuvs_bench/backends/utils.py
  • python/cuvs_bench/cuvs_bench/orchestrator/config_loaders.py

Comment thread python/cuvs_bench/cuvs_bench/backends/utils.py Outdated
Comment thread python/cuvs_bench/cuvs_bench/backends/utils.py Outdated
Comment on lines +234 to +243
        if algorithm_configuration:
            if os.path.isdir(algorithm_configuration):
                algos_conf_fs += [
                    os.path.join(algorithm_configuration, f)
                    for f in os.listdir(algorithm_configuration)
                    if ".json" not in f
                ]
            elif os.path.isfile(algorithm_configuration):
                algos_conf_fs.append(algorithm_configuration)
        return algos_conf_fs

⚠️ Potential issue | 🟠 Major

Do not silently ignore invalid algorithm_configuration paths.

Line 234 accepts an override input, but when it is neither a file nor a directory, the code currently falls back silently to defaults.

Proposed fix
         if algorithm_configuration:
             if os.path.isdir(algorithm_configuration):
                 algos_conf_fs += [
                     os.path.join(algorithm_configuration, f)
                     for f in os.listdir(algorithm_configuration)
                     if ".json" not in f
                 ]
             elif os.path.isfile(algorithm_configuration):
                 algos_conf_fs.append(algorithm_configuration)
+            else:
+                raise FileNotFoundError(
+                    f"algorithm_configuration path does not exist: "
+                    f"{algorithm_configuration}"
+                )
         return algos_conf_fs

jnke2016 added 13 commits April 25, 2026 18:17
…idate header/payload size, warn on invalid algorithm config paths.
…stry.py to match current BenchmarkBackend abstract interface
…ansparent vector loading, and base ConfigLoader inherited methods
…taset lookup, DatasetConfig construction) and delegate backend-specific logic to abstract _build_benchmark_configs().
@jrbourbeau (Member) left a comment

Thanks @jnke2016 for moving these to a shared place. Overall LGTM -- just left a couple minor (nonblocking) comments

        # it was passed in or loaded from disk.
        self._base_vectors = base_vectors if base_vectors is not None else np.empty((0, 0))
        self._query_vectors = query_vectors if query_vectors is not None else np.empty((0, 0))
        self._groundtruth_neighbors = groundtruth_neighbors
@jrbourbeau (Member):

Why is _groundtruth_neighbors treated differently here than _base_vectors and _query_vectors?

@jnke2016 (Contributor, Author):

base_vectors and query_vectors default to None in the constructor but are converted to np.empty((0, 0)) internally so that the dims, n_base, and n_queries properties can safely call .shape and .size without None guards. groundtruth_neighbors stays as None because it's optional (not every dataset has ground truth), and those properties don't depend on it.

@jnke2016 (Contributor, Author):

However, I can refactor both to match groundtruth_neighbors by defaulting to None instead and introducing a None guard in the dims, n_base, and n_queries properties. This would make all three fields consistent.
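
For illustration, a sketch of that consistent variant could look like this (names follow the discussion above; purely illustrative):

class DatasetNoneGuardSketch:
    def __init__(self, base_vectors=None, query_vectors=None, groundtruth_neighbors=None):
        self._base_vectors = base_vectors
        self._query_vectors = query_vectors
        self._groundtruth_neighbors = groundtruth_neighbors

    @property
    def n_base(self):
        # None guard: an unloaded dataset reports zero rows instead of raising.
        return 0 if self._base_vectors is None else self._base_vectors.shape[0]

    @property
    def dims(self):
        return 0 if self._base_vectors is None else self._base_vectors.shape[1]

    @property
    def n_queries(self):
        return 0 if self._query_vectors is None else self._query_vectors.shape[0]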

Comment on lines +341 to +344
if ".json" not in f
and "constraint" not in f
and ".py" not in f
and "__pycache__" not in f
@jrbourbeau (Member):

Are these supposed to be YAML files? If so, checking for that extension (.yaml + .yml) might be better. Maybe the situation is more complex, though.

@jnke2016 (Contributor, Author) commented May 1, 2026:

Yes, those are supposed to be YAML files. The exclusion filter was part of the legacy benchmark scripts (run.py) in gather_algorithm_configs(), which lists all files in the config/algos/ directory and filters out non-config files (__init__.py, __pycache__/, constraints/, .json files) to keep only the .yaml algorithm configuration files. It was ported as-is to the base ConfigLoader in the new API. Your suggestion to check for .yaml/.yml inclusion instead is better since it's more explicit and won't accidentally pick up new non-YAML file types that may be added to the directory in the future. I'll update it.

@jrbourbeau (Member) left a comment

@jnke2016 just checking in here. I see this is marked as a draft -- are there other things you'd still like to include in this PR?

@aamijar added the non-breaking (Introduces a non-breaking change), benchmarking, and improvement (Improves an existing functionality) labels May 1, 2026
@jnke2016 (Contributor, Author) commented May 1, 2026

@jrbourbeau Yes, there is one remaining change (minor): making the parameter expansion transparent so that the orchestrator handles it and no backend needs to implement it. This should be pushed later today.

@jnke2016 jnke2016 marked this pull request as ready for review May 4, 2026 18:00
@jnke2016 jnke2016 requested a review from a team as a code owner May 4, 2026 18:00
@coderabbitai left a review comment

Actionable comments posted: 8

🧹 Nitpick comments (1)
python/cuvs_bench/cuvs_bench/tests/test_registry.py (1)

32-33: ⚡ Quick win

Fail fast when indexes is empty in dummy backend methods.

Both test backends currently return successful results even when indexes is empty, which can hide regressions in caller logic. Raise a clear ValueError in build() and search() instead.

Proposed patch
 class DummyBackend(BenchmarkBackend):
@@
     def build(self, dataset, indexes, force=False, dry_run=False):
-        first = indexes[0] if indexes else None
+        if not indexes:
+            raise ValueError("Expected at least one IndexConfig in 'indexes', got 0")
+        first = indexes[0]
         return BuildResult(
-            index_path=first.file if first else "",
+            index_path=first.file,
@@
-            build_params=first.build_param if first else {},
+            build_params=first.build_param,
             success=True,
         )
@@
         distances = np.random.rand(n_queries, k)
-        first = indexes[0] if indexes else None
+        if not indexes:
+            raise ValueError("Expected at least one IndexConfig in 'indexes', got 0")
+        first = indexes[0]
@@
-            search_params=first.search_params if first else [],
+            search_params=first.search_params,
             success=True,
         )
@@
 class AnotherDummyBackend(BenchmarkBackend):
@@
     def build(self, dataset, indexes, force=False, dry_run=False):
-        first = indexes[0] if indexes else None
+        if not indexes:
+            raise ValueError("Expected at least one IndexConfig in 'indexes', got 0")
+        first = indexes[0]
         return BuildResult(
-            index_path=first.file if first else "",
+            index_path=first.file,
@@
-            build_params=first.build_param if first else {},
+            build_params=first.build_param,
             success=True,
         )
@@
         distances = np.random.rand(n_queries, k)
-        first = indexes[0] if indexes else None
+        if not indexes:
+            raise ValueError("Expected at least one IndexConfig in 'indexes', got 0")
+        first = indexes[0]
@@
-            search_params=first.search_params if first else [],
+            search_params=first.search_params,
             success=True,
         )

As per coding guidelines, "Ensure missing validation does not cause crashes on invalid input through proper size/type checks."

Also applies to: 57-57, 79-80, 103-103

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@python/cuvs_bench/cuvs_bench/backends/base.py`:
- Around line 91-120: The change made base_vectors, query_vectors, and
groundtruth_neighbors read-only, breaking callers that assign to these
attributes; add property setters for base_vectors, query_vectors, and
groundtruth_neighbors that accept a numpy array (or None where appropriate) and
assign to the underlying backing fields (_base_vectors, _query_vectors,
_groundtruth_neighbors) so the lazy-loading logic in the getters stays intact
while preserving assignability; ensure setters validate/convert input types
minimally (e.g., accept np.ndarray or None) and keep existing lazy-load behavior
in the getter methods (base_vectors, query_vectors, groundtruth_neighbors).

In `@python/cuvs_bench/cuvs_bench/backends/utils.py`:
- Around line 149-180: The compute_recall function assumes neighbors and
groundtruth are 2-D arrays with the same number of rows but does not validate
that; add upfront checks in compute_recall to (1) verify neighbors and
groundtruth are numpy ndarrays, (2) verify both have ndim == 2, and (3) verify
groundtruth.shape[0] == n_queries (where n_queries = neighbors.shape[0]); if any
check fails raise a clear ValueError describing the mismatch (mentioning
neighbors.shape and groundtruth.shape) so the benchmark loop fails fast with an
explanatory error rather than raising during indexing.

In `@python/cuvs_bench/cuvs_bench/orchestrator/__init__.py`:
- Line 6: Add a deprecated shim for the old run_benchmark export: import
warnings and import the original run_benchmark from .orchestrator under a
private name (e.g., _run_benchmark), then define a public run_benchmark function
that emits a DeprecationWarning (with a clear message to use
BenchmarkOrchestrator) and forwards all args/kwargs to _run_benchmark; keep
BenchmarkOrchestrator re-export unchanged so both BenchmarkOrchestrator and the
deprecated run_benchmark are available for one release.

In `@python/cuvs_bench/cuvs_bench/orchestrator/config_loaders.py`:
- Around line 142-198: The new _build_benchmark_configs() is currently abstract
which breaks existing ConfigLoader subclasses that only override load(); remove
the `@abstractmethod` decorator and provide a non-abstract default implementation
named _build_benchmark_configs(self, dataset_config, dataset_conf, dataset,
dataset_path, **kwargs) that emits a DeprecationWarning (or uses
warnings.warn(..., DeprecationWarning)) indicating subclasses should implement
the hook and returns an empty list (or reasonable default) so existing
subclasses continue to work until they are migrated; keep the method signature
exactly as in the diff so subclasses that do override it will still be used.

In `@python/cuvs_bench/cuvs_bench/orchestrator/orchestrator.py`:
- Around line 258-268: The recall computation runs even for failed searches and
can raise if a backend returned partial/malformed neighbors; adjust the logic to
check search_result.success before computing or assigning search_result.recall.
Specifically, in the block that inspects search_result.neighbors and
bench_dataset.groundtruth_neighbors, first verify search_result.success is
truthy, then ensure neighbors is non-empty and gt is not None before calling
compute_recall; apply the same change to the analogous block that currently
recomputes recall later in the file (the other
search_result/neighbors/compute_recall block). Ensure you only set
search_result.recall when success is true and compute_recall completes.

In `@python/cuvs_bench/cuvs_bench/run/__main__.py`:
- Around line 264-268: The code currently assumes yaml.safe_load(backend_config)
returns a mapping and silently falls back to "cpp_gbench"; instead validate the
parsed cfg before reading "backend": after yaml.safe_load(f) ensure cfg is a
dict (isinstance(cfg, dict)) and explicitly check that "backend" is present and
of expected type (str); if the file is not a mapping or the "backend" key is
missing/invalid, raise a clear ValueError that states the file must contain a
mapping with a "backend" key and include the actual type/value found; then set
backend_type = cfg.pop("backend") and backend_kwargs = cfg only after validation
(do not silently default to "cpp_gbench").
- Around line 271-278: The call to orchestrator.run_benchmark uses ternaries
(build if build else True, search if search else True) which override explicit
False flags and break build-only/search-only modes; change the arguments to pass
the build and search variables directly (build=build, search=search) so
run_benchmark receives the actual flags passed by the CLI (ensure any earlier
CLI parsing sets these to proper booleans or None as intended) and update any
related defaults where run_benchmark is defined if needed.

In `@python/cuvs_bench/cuvs_bench/tests/test_registry.py`:
- Around line 342-349: The tests use a hardcoded Unix-specific path
"/tmp/test_index" in the IndexConfig instances (see
IndexConfig(name="test_index", ... file="/tmp/test_index")), which triggers Ruff
S108; replace those hardcoded strings with the pytest tmp_path fixture by
constructing the path via tmp_path / "test_index" and passing it to
IndexConfig.file (cast to str if IndexConfig.file expects a string). Update both
occurrences (the block around IndexConfig at the first diff and the similar
block at the later occurrence) to use tmp_path so the tests are portable and
secure.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e85be508-b1ee-4430-9066-bdd9bd4e6376

📥 Commits

Reviewing files that changed from the base of the PR and between 6e91f0e and 592afc3.

📒 Files selected for processing (10)
  • docs/source/cuvs_bench/index.rst
  • python/cuvs_bench/cuvs_bench/backends/base.py
  • python/cuvs_bench/cuvs_bench/backends/utils.py
  • python/cuvs_bench/cuvs_bench/orchestrator/__init__.py
  • python/cuvs_bench/cuvs_bench/orchestrator/config_loaders.py
  • python/cuvs_bench/cuvs_bench/orchestrator/orchestrator.py
  • python/cuvs_bench/cuvs_bench/run/__main__.py
  • python/cuvs_bench/cuvs_bench/tests/test_cli.py
  • python/cuvs_bench/cuvs_bench/tests/test_registry.py
  • python/cuvs_bench/cuvs_bench/tests/test_utils.py

Comment thread python/cuvs_bench/cuvs_bench/backends/base.py
Comment thread python/cuvs_bench/cuvs_bench/backends/utils.py
#

- from .orchestrator import BenchmarkOrchestrator, run_benchmark
+ from .orchestrator import BenchmarkOrchestrator

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep a deprecated run_benchmark export for one release.

Dropping this re-export breaks existing from cuvs_bench.orchestrator import run_benchmark imports immediately. Please leave a compatibility shim in place and warn before removing it.

As per coding guidelines, Prevent API breaking changes to public Python interfaces, including removing or renaming public methods/attributes without proper deprecation (minimum one release cycle).


Comment on lines +142 to +198
    def load(
        self,
        dataset: str,
        dataset_path: str,
        **kwargs,
    ) -> Tuple[DatasetConfig, List[BenchmarkConfig]]:
        """
        Load and prepare benchmark configurations.

        Handles shared config-loading steps (dataset YAML, dataset lookup,
        DatasetConfig construction) then delegates to the backend-specific
        _build_benchmark_configs() for producing BenchmarkConfig objects.

        Parameters
        ----------
        dataset : str
            Dataset name
        dataset_path : str
            Path to dataset directory
        **kwargs
            Backend-specific arguments passed through to
            _build_benchmark_configs()

        Returns
        -------
        Tuple[DatasetConfig, List[BenchmarkConfig]]
            Dataset configuration and list of benchmark configurations to run
        """
        # Shared step 1: Load dataset YAML
        ds_yaml_path = kwargs.get("dataset_configuration") or os.path.join(
            self.config_path, "datasets", "datasets.yaml"
        )
        ds_conf_all = self.load_yaml_file(ds_yaml_path)

        # Shared step 2: Find dataset by name
        ds_conf = self.get_dataset_configuration(dataset, ds_conf_all)

        # Shared step 3: Construct DatasetConfig with resolved paths
        dataset_config = self.build_dataset_config(
            ds_conf, dataset_path, kwargs.get("subset_size")
        )

        # Backend-specific step: produce BenchmarkConfig list
        benchmark_configs = self._build_benchmark_configs(
            dataset_config, ds_conf, dataset, dataset_path, **kwargs
        )
        return dataset_config, benchmark_configs

    @abstractmethod
    def _build_benchmark_configs(
        self,
        dataset_config: DatasetConfig,
        dataset_conf: dict,
        dataset: str,
        dataset_path: str,
        **kwargs,
    ) -> List[BenchmarkConfig]:
@coderabbitai commented May 4, 2026:

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't make _build_benchmark_configs() a required override immediately.

This turns the refactor into a breaking change for existing out-of-tree ConfigLoader subclasses that only override load(): they can no longer be instantiated after this release. Keep the new hook non-abstract for at least one deprecation cycle.

Proposed compatibility-preserving fix
-    `@abstractmethod`
     def _build_benchmark_configs(
         self,
         dataset_config: DatasetConfig,
         dataset_conf: dict,
         dataset: str,
@@
         Returns
         -------
         List[BenchmarkConfig]
             List of benchmark configurations to run
         """
-        pass
+        raise NotImplementedError

As per coding guidelines, Prevent API breaking changes to public Python interfaces, including removing or renaming public methods/attributes without proper deprecation (minimum one release cycle).


@jnke2016 (Contributor, Author):

Nothing in the codebase imports it. The CLI and BenchmarkOrchestrator class cover all use cases.

@coderabbitai:

@jnke2016, understood — since ConfigLoader is an internal class with no external consumers, and all entry points go through the CLI or BenchmarkOrchestrator, the breaking-change concern doesn't apply here. Thanks for the clarification!

( ˶ᵔ ᵕ ᵔ˶ ) 🐇

Comment thread python/cuvs_bench/cuvs_bench/orchestrator/orchestrator.py
Comment thread python/cuvs_bench/cuvs_bench/run/__main__.py
Comment thread python/cuvs_bench/cuvs_bench/run/__main__.py Outdated
Comment thread python/cuvs_bench/cuvs_bench/tests/test_registry.py
@coderabbitai left a review comment

Actionable comments posted: 2


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 76943c95-d590-4322-98b6-b719cd0e0842

📥 Commits

Reviewing files that changed from the base of the PR and between 592afc3 and b31ae14.

📒 Files selected for processing (5)
  • python/cuvs_bench/cuvs_bench/backends/base.py
  • python/cuvs_bench/cuvs_bench/backends/utils.py
  • python/cuvs_bench/cuvs_bench/orchestrator/orchestrator.py
  • python/cuvs_bench/cuvs_bench/run/__main__.py
  • python/cuvs_bench/cuvs_bench/tests/test_registry.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • python/cuvs_bench/cuvs_bench/backends/base.py
  • python/cuvs_bench/cuvs_bench/run/__main__.py
  • python/cuvs_bench/cuvs_bench/orchestrator/orchestrator.py

Comment on lines +96 to +110
    if subset_size is not None and subset_size < 1:
        raise ValueError(
            f"subset_size must be a positive integer, got {subset_size}"
        )
    with open(path, "rb") as f:
        header = f.read(8)
        if len(header) < 8:
            raise ValueError(
                f"File too small to contain a valid header (expected 8 bytes, "
                f"got {len(header)}): {path}"
            )
        n_rows = int(np.frombuffer(header[:4], dtype=np.uint32)[0])
        n_cols = int(np.frombuffer(header[4:], dtype=np.uint32)[0])
        if subset_size is not None:
            n_rows = min(n_rows, subset_size)

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate subset_size type explicitly before arithmetic/comparison.

Lines 96-110 only enforce a lower bound. Non-integer values (e.g., "1000" or 1.5) can fail later with non-actionable errors instead of a clear config validation error.

Proposed fix
 def load_vectors(path: str, subset_size: Optional[int] = None) -> np.ndarray:
@@
-    if subset_size is not None and subset_size < 1:
-        raise ValueError(
-            f"subset_size must be a positive integer, got {subset_size}"
-        )
+    if subset_size is not None:
+        if not isinstance(subset_size, (int, np.integer)):
+            raise ValueError(
+                f"subset_size must be an integer, got {type(subset_size).__name__}"
+            )
+        if subset_size < 1:
+            raise ValueError(
+                f"subset_size must be a positive integer, got {subset_size}"
+            )

As per coding guidelines, "Ensure missing validation does not cause crashes on invalid input through proper size/type checks."


Comment on lines +185 to +192
    gt_k = min(k, groundtruth.shape[1])
    if gt_k == 0 or n_queries == 0:
        return 0.0
    n_correct = sum(
        len(set(neighbors[i, :k].tolist()) & set(groundtruth[i, :gt_k].tolist()))
        for i in range(n_queries)
    )
    return n_correct / (n_queries * gt_k)

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject negative k in compute_recall to avoid invalid metrics.

Lines 185-192 compute gt_k = min(k, groundtruth.shape[1]) without validating k. A negative k can yield a negative denominator and invalid recall output.

Proposed fix
 def compute_recall(
     neighbors: np.ndarray, groundtruth: np.ndarray, k: int
 ) -> float:
@@
+    if not isinstance(k, (int, np.integer)):
+        raise ValueError(f"k must be an integer, got {type(k).__name__}")
+    if k < 0:
+        raise ValueError(f"k must be >= 0, got {k}")
+
     n_queries = neighbors.shape[0]
@@
     gt_k = min(k, groundtruth.shape[1])

As per coding guidelines, "Provide clear and actionable error messages that include expected vs actual values where helpful."



Labels

benchmarking, improvement (Improves an existing functionality), non-breaking (Introduces a non-breaking change)

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

3 participants