Merged
7 changes: 4 additions & 3 deletions .github/workflows/fuzz.yml
@@ -1,6 +1,6 @@
name: fuzz

# Bounded cargo-fuzz smoke. The eight targets in fuzz/ are normally exercised
# Bounded cargo-fuzz smoke. The nine targets in fuzz/ are normally exercised
# in manual campaigns; this adds CI cadence so a regression that reintroduces a
# loader panic / OOM, breaks the write->load round-trip, or destabilises the
# FastScan or two-stage retrieval kernels surface in CI rather than only at
@@ -10,7 +10,7 @@ name: fuzz
# * pull_request / push(main): a SHORT smoke (60s/target) over the
# highest-value targets — fast enough to run on every change.
# * schedule (weekly) / workflow_dispatch: a LONGER sweep (300s/target)
# across ALL eight targets.
# across ALL nine targets.
#
# This runs UNATTENDED on a cron schedule, so every third-party action is
# SHA-pinned, cargo-fuzz is installed with its bundled lockfile on a pinned
@@ -74,7 +74,7 @@ jobs:
TARGET: ${{ matrix.target }}
run: cargo "+${FUZZ_NIGHTLY}" fuzz run "$TARGET" -- -max_total_time=60 -rss_limit_mb=4096

# Weekly full sweep over all eight targets at a larger time budget.
# Weekly full sweep over all nine targets at a larger time budget.
weekly:
name: fuzz weekly (300s)
if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
@@ -88,6 +88,7 @@ jobs:
- load_rankquant
- load_bitmap
- load_sign_bitmap
- load_fastscan
- roundtrip_rankquant
- search_rankquant
- fastscan_b2
15 changes: 13 additions & 2 deletions CHANGELOG.md
@@ -31,8 +31,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
and a corpus-size scaling sweep on public BEIR datasets, with the corpus
embedded by Harrier-Q8 (GGUF `Q8_0` via `llama-cpp-python`, CUDA). The README
now leads with the resulting scaling curve, latency bars, and nDCG@10 table;
every figure is regenerated by the harness (nothing hand-entered). Replaces the
previous private-arXiv real-embedding numbers in the README.
every figure is regenerated by the harness and the README tables transcribe
its summary outputs. Replaces the previous private-arXiv real-embedding
numbers in the README.
- **`RankQuantFastscan` is now a stable, public API** (previously re-exported
`#[doc(hidden)]`), with `.ovfs` / `OVFS` persistence via
`RankQuantFastscan::{write,load}` and a ninth `load_fastscan` cargo-fuzz
target. Metadata-probe support (`probe_index_metadata`) for `.ovfs` is
deferred to 0.8.0 (#233, #232).

### Performance

@@ -58,6 +64,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Changed

- **On-disk format magics renamed to `OV*`** (`OVR1` / `OVRQ` / `OVBM` /
`OVSB`). The loaders still accept the legacy `TV*` magics, so every
previously-written `.tvr` / `.tvrq` / `.tvbm` / `.tvsb` file continues to load
unchanged; only the file extensions and magic bytes written by `write()`
change (#230).
- **Release-hardened the caller-owned serial two-stage primitives** (no API
change; added in 0.5.0). The trust model is now explicit and tested:
- Rejection-path regression tests for the full CSR/query/buffer validation set
42 changes: 25 additions & 17 deletions README.md
@@ -12,8 +12,9 @@

Training-free ordinal & sign quantization for vector retrieval.

`ordvec` is a small, dependency-light Rust crate for compressed
nearest-neighbour search over high-dimensional embeddings.
`ordvec` is a small, pure-Rust crate for compressed nearest-neighbour search
that quantizes the **ordinal (rank) and sign structure** of an embedding —
no codebook, no learned rotation, no graph to build.
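
A minimal sketch of what a per-vector rank and sign transform can look like, using only the standard library. The function names, the tie-breaking rule, and the bucket formula are illustrative assumptions, not ordvec's actual encoding or API.

```rust
/// Rank transform: replace each coordinate by its rank (0 = smallest).
/// Ties are broken by index so the result is deterministic (an assumption
/// made for this sketch).
fn rank_transform(v: &[f32]) -> Vec<u32> {
    let mut order: Vec<usize> = (0..v.len()).collect();
    // total_cmp is a total order over f32, so the sort cannot panic on NaN.
    order.sort_by(|&a, &b| v[a].total_cmp(&v[b]).then(a.cmp(&b)));
    let mut ranks = vec![0u32; v.len()];
    for (rank, &idx) in order.iter().enumerate() {
        ranks[idx] = rank as u32;
    }
    ranks
}

/// Coarsen ranks into 2^bits equal-width buckets (hypothetical formula:
/// bucket = rank * 2^bits / dim).
fn quantize_ranks(ranks: &[u32], bits: u32) -> Vec<u8> {
    let dim = ranks.len() as u64;
    ranks
        .iter()
        .map(|&r| (((r as u64) << bits) / dim) as u8)
        .collect()
}

/// Sign transform: one bit per coordinate (set when strictly positive),
/// packed little-end-first into 64-bit words.
fn sign_bits(v: &[f32]) -> Vec<u64> {
    let mut words = vec![0u64; (v.len() + 63) / 64];
    for (i, &x) in v.iter().enumerate() {
        if x > 0.0 {
            words[i / 64] |= 1u64 << (i % 64);
        }
    }
    words
}
```

Nothing here depends on the corpus: each vector is encoded from its own coordinates, which is the property the "no codebook, no learned rotation" claim rests on.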

## Benchmark at a glance

@@ -56,10 +57,12 @@ keeps a near-flat per-query cost as the corpus grows, while exact brute-force

## What's different

Compressed-retrieval libraries usually either **fit a codebook to your
data** (product / scalar quantization) or **wrap vectors in a graph**
(HNSW). ordvec does neither — it quantizes the *ordinal* structure of each
vector on its own:
Compressed-retrieval paths almost all carry a **fit step**: product
quantization fits a k-means codebook, OPQ adds a learned rotation,
scalar / binary quantizers calibrate to the data distribution, graph indexes
(HNSW) build a navigable graph, and Matryoshka needs a model trained with its
loss. ordvec fits **none** of them — it quantizes the *ordinal and sign*
structure of each vector on its own:

- **Training-free, data-oblivious.** No codebook, no learned rotation, no
fit step. Encoding is a per-vector rank (or sign) transform — index the
@@ -78,7 +81,10 @@ vector on its own:
when `dim % 256 == 0` — not a broad retrieval mode.)
- **Two-stage retrieval, built in.** A cheap bitmap / sign-popcount
prefilter feeds an exact rerank — the coarse→fine pipeline ships as
library primitives.
library primitives. The coarse-scan→exact-rerank pattern, and the
`RankQuantFastscan` block-32 4-bit LUT path, follow the FAISS FastScan and
binary-quantization-plus-rescore lineage; ordvec ships them
batteries-included and dependency-free, not as new techniques.
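
The coarse-to-fine pattern in the last bullet can be sketched as follows. This is a standard-library illustration under stated assumptions: `pack_signs`, `sign_agreement`, and `two_stage_search` are hypothetical names, the rerank uses a plain `f32` dot product, and none of it is ordvec's API or kernel code.

```rust
/// Pack the sign of each coordinate into 64-bit words (bit set = positive).
fn pack_signs(v: &[f32]) -> Vec<u64> {
    let mut words = vec![0u64; (v.len() + 63) / 64];
    for (i, &x) in v.iter().enumerate() {
        if x > 0.0 {
            words[i / 64] |= 1u64 << (i % 64);
        }
    }
    words
}

/// Stage-1 score: number of coordinates whose signs agree (higher = closer).
/// Padding bits are zero on both sides, so they never count as differing.
fn sign_agreement(a: &[u64], b: &[u64], dim: usize) -> u32 {
    let differing: u32 = a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum();
    dim as u32 - differing
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Two-stage search: a popcount prefilter keeps `n_candidates` rows, then an
/// exact dot-product rerank returns the top `k` as (row id, score).
fn two_stage_search(
    docs: &[Vec<f32>],
    query: &[f32],
    n_candidates: usize,
    k: usize,
) -> Vec<(usize, f32)> {
    let q_bits = pack_signs(query);

    // Stage 1: cheap coarse scores over every row.
    let mut coarse: Vec<(usize, u32)> = docs
        .iter()
        .enumerate()
        .map(|(i, d)| (i, sign_agreement(&q_bits, &pack_signs(d), query.len())))
        .collect();
    // Descending agreement, ties broken by ascending row id.
    coarse.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    coarse.truncate(n_candidates);

    // Stage 2: exact scores for the surviving candidates only.
    let mut hits: Vec<(usize, f32)> = coarse
        .iter()
        .map(|&(i, _)| (i, dot(&docs[i], query)))
        .collect();
    hits.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    hits.truncate(k);
    hits
}
```

A real index would pack the sign codes once at ingest rather than per query; the sketch recomputes them only to stay short.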

ordvec is a compressed **flat-scan** substrate (optionally two-stage): small
codes scored by fast SIMD — AVX-512/AVX2 runtime-dispatched on x86_64, baseline
@@ -100,11 +106,13 @@ large-scale serving rather than competing with one.

Two further paths, for callers who need them:

- **`RankQuantFastscan`** *(`#[doc(hidden)]` — reachable as
`ordvec::RankQuantFastscan`, but the API is not yet stable)* — an optional
b=2 FastScan kernel (block-32 PQ-LUT) for absolute-minimum scan latency, at
2× the RankQuant b=2 footprint (`dim/2` bytes/doc). Surfaced here so
latency-critical callers know it exists.
- **`RankQuantFastscan`** — a stable, documented *but specialized* public
type: an optional b=2 FastScan kernel (block-32 nibble/PQ-LUT, AVX-512 → AVX2
→ scalar dispatch) for absolute-minimum stage-1 scan latency, at 2× the
RankQuant b=2 footprint (`dim/2` bytes/doc) and 8-bit LUT scoring noise. It
persists to `.ovfs` (magic `OVFS`). Reach for it only when scan latency at
b=2 is the binding constraint; the headline retrieval surface is still
`RankQuant` / `Bitmap` / two-stage.
- **`MultiBucketBitmap`** *(behind `--features experimental`)* — the
multi-bucket bilinear-overlap probe behind the research-side decomposition;
an algebraic scaffold, not the top-bucket theorem surface or a production
@@ -291,11 +299,11 @@ thread count, no Python/FFI in the hot path:

- **`flat`** — exact inner-product brute force (identical retrieval to FAISS
`IndexFlatIP`), a pure-Rust SIMD GEMM. *Baseline, not ground truth.*
- **`hnsw`** — pure-Rust HNSW (`hnsw_rs`, M=32, ef=128) — the portable
stand-in for the C++ hnswlib.
- **`hnsw`** — pure-Rust HNSW (`hnsw_rs`, M=32, ef_construction=200,
ef_search=128) — the portable stand-in for the C++ hnswlib.

Reproduce end-to-end (downloads the data, embeds, runs every method, renders the
figures) — nothing below is hand-entered:
figures, and emits the summary tables transcribed below):

```sh
make bench-beir-setup # Python deps + CUDA llama-cpp-python
@@ -426,8 +434,8 @@ clean-checkout kernel sanity check.

## Security: index-file trust

The on-disk formats (`.ovr` / `.ovrq` / `.ovbm` / `.ovsb`; legacy `.tvr` /
`.tvrq` / `.tvbm` / `.tvsb` files still load) carry **no built-in
The on-disk formats (`.ovr` / `.ovrq` / `.ovbm` / `.ovsb` / `.ovfs`; legacy
`.tvr` / `.tvrq` / `.tvbm` / `.tvsb` files still load) carry **no built-in
checksum, MAC, or signature — by design.** The loaders validate *structure*
(magic, version, bounds, exact-length payload) but not *origin*: a
structurally valid file can still be untrusted. If an index file crosses a
28 changes: 15 additions & 13 deletions THREAT_MODEL.md
@@ -1,6 +1,6 @@
# Threat Model — `ordvec`

> **Status:** v0.5.0 (pre-1.0), 2026-06-13. This is the maintained threat model
> **Status:** v0.5.0 (pre-1.0), 2026-06-15. This is the maintained threat model
> for the `ordvec` Rust crate, C ABI, Go wrapper, PyO3/maturin Python bindings,
> and the `ordvec-manifest` sidecar verifier. It is reviewed when the
> attack surface changes (new persistence formats, new `unsafe` kernels, new
@@ -66,7 +66,7 @@ absence of a second maintainer is itself a tracked supply-chain residual

| Layer | Components | Trust boundary |
|---|---|---|
| **Deserialization** | `rank_io.rs` — `.ovr` / `.ovrq` / `.ovbm` / `.ovsb` loaders (also accept the legacy `.tvr` / `.tvrq` / `.tvbm` / `.tvsb` magics) | Untrusted filesystem / network byte stream |
| **Deserialization** | `rank_io.rs` — `.ovr` / `.ovrq` / `.ovbm` / `.ovsb` / `.ovfs` loaders (`.ovfs`/`OVFS` is the FastScan format and has no legacy magic; the other four also accept the legacy `.tvr` / `.tvrq` / `.tvbm` / `.tvsb` magics) | Untrusted filesystem / network byte stream |
| **Manifest verification** | `ordvec-manifest` — JSON sidecar verifier | Manifest + index + optional row-map files before load |
| **Compute kernels** | `fastscan.rs`, `quant_kernels.rs`, `bitmap.rs`, `sign_bitmap.rs` | Trust established after format validation |
| **Index API** | `rank.rs`, `quant.rs`, `bitmap.rs`, `sign_bitmap.rs` | Caller-controlled query embeddings |
@@ -75,13 +75,14 @@ absence of a second maintainer is itself a tracked supply-chain residual
| **Python FFI** | `ordvec-python` (PyO3 / maturin) | Python ↔ Rust boundary; NumPy buffers |
| **CI / supply chain** | GitHub Actions workflows; `Cargo.lock`; crates.io + PyPI | GitHub OIDC, crates.io, PyPI trust chains |

The `fuzz/` directory holds **eight** cargo-fuzz targets: `load_rank`,
`load_rankquant`, `load_bitmap`, `load_sign_bitmap` (deserialization);
`roundtrip_rankquant` (write→load round-trip); `search_rankquant` (the
single-rate ingest + asymmetric-search compute path); `fastscan_b2` (the
FastScan b=2 block-32 kernel — the one `unsafe`-heavy scan path the others do
not reach); and `signbitmap_rankquant_twostage` (sign candidate generation
followed by RankQuant subset reranking).
The `fuzz/` directory holds **nine** cargo-fuzz targets: `load_rank`,
`load_rankquant`, `load_bitmap`, `load_sign_bitmap`, `load_fastscan`
(deserialization — the last drives the `.ovfs`/`OVFS` FastScan loader via
`RankQuantFastscan::load`); `roundtrip_rankquant` (write→load round-trip);
`search_rankquant` (the single-rate ingest + asymmetric-search compute path);
`fastscan_b2` (the FastScan b=2 block-32 kernel — the one `unsafe`-heavy scan
path the others do not reach); and `signbitmap_rankquant_twostage` (sign
candidate generation followed by RankQuant subset reranking).
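
The magic handling described in the component table (four formats with legacy aliases, `OVFS` with none) might be sketched like this. The `IndexFormat` enum and `format_from_magic` are hypothetical names, the pairing of each magic with an index type is inferred from the file extensions, and the legacy byte strings `TVR1` / `TVRQ` / `TVBM` / `TVSB` are assumed by analogy with the `OV*` magics; the source only says "legacy `TV*` magics".

```rust
/// Which on-disk index format a 4-byte magic identifies (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IndexFormat {
    Rank,
    RankQuant,
    Bitmap,
    SignBitmap,
    Fastscan,
}

/// Map a magic to its format. The four older formats accept both the current
/// `OV*` magic and an assumed legacy `TV*` spelling; FastScan has no legacy
/// magic, so a `TVFS` file is rejected rather than guessed at.
fn format_from_magic(magic: &[u8; 4]) -> Option<IndexFormat> {
    match magic {
        b"OVR1" | b"TVR1" => Some(IndexFormat::Rank),
        b"OVRQ" | b"TVRQ" => Some(IndexFormat::RankQuant),
        b"OVBM" | b"TVBM" => Some(IndexFormat::Bitmap),
        b"OVSB" | b"TVSB" => Some(IndexFormat::SignBitmap),
        b"OVFS" => Some(IndexFormat::Fastscan),
        _ => None,
    }
}
```

Returning `None` for anything unrecognised keeps the accept set explicit, which is the behaviour a `load_*` fuzz target can check cheaply on arbitrary bytes.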

### 1.2 Deployment contexts (for integrators)

@@ -125,14 +126,15 @@ followed by RankQuant subset reranking).
persistence API is the index types' `write()` / `load()`, making the
write→load round-trip a type-level guarantee.

The four loaders are covered by cargo-fuzz targets (the `load_*` targets).
The five loaders are covered by cargo-fuzz targets (the `load_*` targets,
including `load_fastscan` for the `.ovfs` FastScan format).

### 2.2 Index-file risk classes

**THREAT-DESER-001 (library-owned, P4): Malformed index file.**
The loader must reject corrupt/invalid files without panic, OOM, or
trailing-data acceptance. The current implementation satisfies this for all
four formats. *Residual:* `file.metadata()?.len()` is sampled at open time;
five formats. *Residual:* `file.metadata()?.len()` is sampled at open time;
on NFS/FUSE mounts with concurrent writers a TOCTOU window exists between
`metadata()` and the reads. On writable shared mounts the practical outcome is
a read error or `InvalidData`, not an exploit. *Likelihood:* Very Low.
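
As an illustration of rejecting a malformed file without panic, OOM, or trailing-data acceptance, the sketch below validates a header against the sampled file length before anything is allocated. The 20-byte layout (magic, version, `dim`, row count), the `MAX_DIM` bound, and the `dim % 4` rule are hypothetical choices for this example and are not the layout `rank_io.rs` uses.

```rust
use std::io;

const HEADER_LEN: u64 = 20; // hypothetical: magic(4) + version(4) + dim(4) + n_docs(8)
const MAX_DIM: u32 = 65_536; // hypothetical upper bound on dimensionality

#[derive(Debug)]
struct Header {
    dim: u32,
    n_docs: u64,
}

fn invalid(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg)
}

fn read_u32(b: &[u8], at: usize) -> u32 {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(&b[at..at + 4]);
    u32::from_le_bytes(buf)
}

fn read_u64(b: &[u8], at: usize) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&b[at..at + 8]);
    u64::from_le_bytes(buf)
}

/// Validate structure only: magic, version, bounds, and exact payload length.
/// `file_len` is the length sampled from file metadata at open time.
fn validate_header(header: &[u8; 20], file_len: u64) -> io::Result<Header> {
    if &header[0..4] != b"OVRQ" {
        return Err(invalid("bad magic"));
    }
    if read_u32(header, 4) != 1 {
        return Err(invalid("unsupported version"));
    }
    let dim = read_u32(header, 8);
    if dim == 0 || dim > MAX_DIM || dim % 4 != 0 {
        return Err(invalid("dim out of bounds"));
    }
    let n_docs = read_u64(header, 12);

    // b=2 codes: dim/4 bytes per row. Checked arithmetic means a hostile row
    // count becomes an error instead of wrapping into a small allocation.
    let payload = n_docs
        .checked_mul(u64::from(dim / 4))
        .ok_or_else(|| invalid("payload size overflows"))?;
    let expected = payload
        .checked_add(HEADER_LEN)
        .ok_or_else(|| invalid("file size overflows"))?;

    // Exact-length check: rejects truncated files and trailing data alike.
    if file_len != expected {
        return Err(invalid("length does not match header"));
    }
    Ok(Header { dim, n_docs })
}
```

Because the length comparison happens before any buffer is sized from `n_docs`, a header that claims more rows than the file holds fails fast. The TOCTOU residual described above is untouched by this check: the file can still change between the length sample and the reads.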
@@ -433,7 +435,7 @@ knowledge of quantization parameters and the document distribution.

## 8. Fuzzing coverage (THREAT-FUZZ)

Eight targets cover the four loaders, the write→load round-trip, the
Nine targets cover the five loaders, the write→load round-trip, the
single-rate compute path, the FastScan kernel, and the composed
SignBitmap→RankQuant retrieval path.

@@ -448,7 +450,7 @@ it exercises the AVX-512 kernel.
regression.** A `fuzz.yml` workflow now runs a bounded smoke on every pull
request and push to `main` (`-max_total_time=60` over `load_rank`,
`load_rankquant`, `fastscan_b2`, and `signbitmap_rankquant_twostage`) plus a
weekly full sweep (`-max_total_time=300` over all eight targets), so a
weekly full sweep (`-max_total_time=300` over all nine targets), so a
regression that
reintroduces a loader panic / OOM, breaks the write→load round-trip, or
destabilises the FastScan kernel or composed sign→RankQuant path surfaces in CI
46 changes: 23 additions & 23 deletions docs/RANK_MODES.md
@@ -27,8 +27,8 @@ That runs the head-to-head on a structured synthetic corpus (D=256,
N=30,000, 200 queries, 200 cluster prototypes, latent_dim=64; see
[Stress test](#stress-test-low-rank-clustered-synthetic) for the
exact construction). Results on real embedding corpora are
user-runnable via `--corpus-npy` / `--queries-npy`; the current
arXiv paper-harness result is summarized in the README and the
user-runnable via `--corpus-npy` / `--queries-npy`, and the reproducible
BEIR harness (`make benchmark-beir`) is summarized in the README; the
reproduction shape is described under
[External-corpus results](#external-corpus-results-user-runnable).

@@ -56,7 +56,7 @@ to the top-k scores at finalize so the displayed cosines stay exact.
|---|---|
| CPU | AMD Ryzen 9 9950X (Zen 5, 16C/32T, full 512-bit AVX-512 datapath) |
| OS | CachyOS Linux |
| Compiler | rustc 1.95.0 |
| Compiler | rustc 1.95.0 (bench machine toolchain; the crate MSRV is 1.89 — see [compatibility-policy.md](./compatibility-policy.md)) |
| Build | `cargo build --release` with `lto = true, codegen-units = 1, opt-level = 3` |
| Detected SIMD | sse4.2, avx2, fma, avx512f, avx512bw, avx512vl |
| Latency mode | single-thread per query (rayon parallelises *across* queries; per-query rows measure scan only) |
@@ -175,7 +175,7 @@ under [Stress test](#stress-test-low-rank-clustered-synthetic) — it
is a generated Gaussian low-rank fixture, useful for exercising the
rank-mode kernels and their size/latency tradeoffs. Treat its recall
spread as a stress-test result, not the lead retrieval-quality claim;
the current real-embedding arXiv benchmark in the README is the better
the reproducible BEIR benchmark in the README is the better
guide to retrieval-relevant ordinal behaviour.

Results are with the AVX-512 asymmetric scan enabled where applicable
@@ -253,8 +253,8 @@ but it is still a generated Gaussian fixture. It is useful for
self-contained kernel checks and for stressing the compression modes;
it should not be read as the strongest evidence for the retrieval task.
Real sentence/passage embeddings are anisotropic in task-specific ways,
and the current arXiv source-recovery benchmark is more favorable to
the rank transform than this small synthetic fixture.
and real-corpus benchmarks (the reproducible BEIR harness in the README)
are more favorable to the rank transform than this small synthetic fixture.

## What the head-to-head shows

@@ -383,14 +383,13 @@ minimal built-in reader — no Python dependency at bench time, and no
BLAS.

What to expect from real embeddings: dense sentence/passage encoders
often carry retrieval signal in their coordinate order. The current
paper-harness arXiv run (207,695 embeddings, 7,200 source-recovery
queries) has full ordinal rank-cosine within bootstrap noise of dense
exact search and slightly ahead of the tested FAISS HNSW configuration;
RankQuant b=2 asym matches that HNSW configuration within bootstrap
noise at 256 bytes/vector. Run the command above on your target
embeddings to get the number that matters for your deployment — the
arXiv artifact set is not shipped in this crate.
often carry retrieval signal in their coordinate order. The reproducible
in-repo BEIR harness (`make benchmark-beir`, summarized in the README) is
the sanctioned real-corpus measurement — on public BEIR data, ordvec's
ordinal modes (`RankQuant` b=2 / b=4) land within bootstrap noise of dense
exact search at 8–16× smaller vectors. Run the command above on your
target embeddings to get the number that matters for your deployment; no
external corpus is shipped in this crate.

## A null result reported up front

@@ -431,10 +430,11 @@ Candidate IDs are global row ordinals; duplicate candidates are scored as
separate entries and can produce duplicate hits, so callers that need
unique output rows should deduplicate candidate lists before reranking.
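
A caller-side deduplication pass before reranking could look like the sketch below. `dedup_candidates` is a hypothetical helper, and `u32` row IDs are an assumption; the crate's actual candidate ID type may differ.

```rust
use std::collections::HashSet;

/// Drop repeated candidate IDs, keeping the first occurrence of each so the
/// order produced by the prefilter is preserved.
fn dedup_candidates(candidates: &[u32]) -> Vec<u32> {
    let mut seen = HashSet::with_capacity(candidates.len());
    candidates
        .iter()
        .copied()
        // HashSet::insert returns false when the ID was already present.
        .filter(|id| seen.insert(*id))
        .collect()
}
```

Keeping first-seen order, rather than sorting and calling `Vec::dedup`, avoids disturbing any ranking the candidate generator already imposed.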

`RankQuantFastscan` (re-exported `#[doc(hidden)]`) is an optional
single-pass b=2 fast path; it supports `add`/`search` but not
`swap_remove`/`write`/`load` (see its module docs in
`src/fastscan.rs`). `MultiBucketBitmap` underwrites the
`RankQuantFastscan` is a stable, public (but specialized) single-pass b=2
fast path; it supports `add`/`search`/`write`/`load` (`.ovfs` persistence,
magic `OVFS`) but not `swap_remove` (see its module docs in
`src/fastscan.rs`). Metadata-probe support via `probe_index_metadata` is
deferred to 0.8.0 (#232). `MultiBucketBitmap` underwrites the
bilinear bucket-overlap decomposition and is reachable only behind the
`experimental` feature.

@@ -516,11 +516,11 @@ multi-seed stability is your call.
because there is no rotation matmul and no codebook fit — the
per-vector cost is the `argsort`.
4. **Recall is corpus-dependent.** The generated Gaussian fixture is a
stress test, not the lead quality claim. On the current real arXiv
embedding task, full ordinal rank-cosine is within bootstrap noise
of dense exact search, and RankQuant b=2 asym matches the tested
FAISS HNSW configuration within bootstrap noise. Run the
external-corpus bench on your data — see above.
stress test, not the lead quality claim. On public BEIR data (the
reproducible `make benchmark-beir` harness in the README), ordvec's
ordinal modes land within bootstrap noise of dense exact search at a
fraction of the storage. Run the external-corpus bench on your data —
see above.
5. **The audit-by-removal rationale.** RankQuant removes training,
rotation, codebooks, and per-document norms from the pipeline. That
retrieval still works after the removal is the interesting result:
10 changes: 5 additions & 5 deletions docs/c-api.md
@@ -189,8 +189,8 @@ double free are undefined behavior.

## V1 Exclusions

ABI v1 intentionally excludes `Rank`, `SignBitmap`, external IDs, ID maps,
builders, mutating index APIs, logging callbacks, custom allocators, async
search, batched search, richer measured timing breakdowns, and release
packaging. Those can be added in later ABI versions without changing the v1
struct-size rule.
ABI v1 intentionally excludes `Rank`, `SignBitmap`, `RankQuantFastscan`
(the `.ovfs` FastScan path), external IDs, ID maps, builders, mutating index
APIs, logging callbacks, custom allocators, async search, batched search,
richer measured timing breakdowns, and release packaging. Those can be added in
later ABI versions without changing the v1 struct-size rule.
5 changes: 3 additions & 2 deletions docs/compatibility-policy.md
@@ -61,9 +61,10 @@ documented minor release removes them.
The `experimental` feature is a default-off research surface. Today it exposes
`MultiBucketBitmap`; it is not patch-stable before 1.0.

`#[doc(hidden)]` exports such as `RankQuantFastscan` and
`RankQuantFastscan` is a stable, public (but specialized) type, covered by the
normal pre-1.0 compatibility policy above. `#[doc(hidden)]` exports such as
`search_asymmetric_byte_lut` are reachable for internal benchmarks and parity
tests, but they are not part of the stable default API.
tests, but are not part of the stable default API.

New feature flags must declare their stability class before merging:

2 changes: 1 addition & 1 deletion docs/determinism.md
@@ -67,7 +67,7 @@ the public hit order still follows the global ordering rule above.

## FastScan

`RankQuantFastscan` is a hidden, optional b=2 pre-ranker. It is deterministic
`RankQuantFastscan` is a public, specialized, optional b=2 pre-ranker. It is deterministic
for a fixed index, query, and backend dispatch, and its scalar and AVX-512
FastScan kernels operate on the same quantized LUT inputs. It is not
score-equivalent to exact `RankQuant::search_asymmetric`: the global 8-bit LUT