Project-Navi · Navi Bot (project-navi-bot) · Jun 15, 2026
@@ -1,6 +1,6 @@
 name: fuzz
 
-# Bounded cargo-fuzz smoke. The eight targets in fuzz/ are normally exercised
+# Bounded cargo-fuzz smoke. The nine targets in fuzz/ are normally exercised
 # in manual campaigns; this adds CI cadence so a regression that reintroduces a
 # loader panic / OOM, breaks the write->load round-trip, or destabilises the
 # FastScan or two-stage retrieval kernels surface in CI rather than only at
@@ -10,7 +10,7 @@ name: fuzz
 #   * pull_request / push(main): a SHORT smoke (60s/target) over the
 #     highest-value targets — fast enough to run on every change.
 #   * schedule (weekly) / workflow_dispatch: a LONGER sweep (300s/target)
-#     across ALL eight targets.
+#     across ALL nine targets.
 #
 # This runs UNATTENDED on a cron schedule, so every third-party action is
 # SHA-pinned, cargo-fuzz is installed with its bundled lockfile on a pinned
@@ -74,7 +74,7 @@ jobs:
           TARGET: ${{ matrix.target }}
         run: cargo "+${FUZZ_NIGHTLY}" fuzz run "$TARGET" -- -max_total_time=60 -rss_limit_mb=4096
 
-  # Weekly full sweep over all eight targets at a larger time budget.
+  # Weekly full sweep over all nine targets at a larger time budget.
   weekly:
     name: fuzz weekly (300s)
     if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
@@ -88,6 +88,7 @@ jobs:
           - load_rankquant
           - load_bitmap
           - load_sign_bitmap
+          - load_fastscan
           - roundtrip_rankquant
           - search_rankquant
           - fastscan_b2

@@ -31,8 +31,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   and a corpus-size scaling sweep on public BEIR datasets, with the corpus
   embedded by Harrier-Q8 (GGUF `Q8_0` via `llama-cpp-python`, CUDA). The README
   now leads with the resulting scaling curve, latency bars, and nDCG@10 table;
-  every figure is regenerated by the harness (nothing hand-entered). Replaces the
-  previous private-arXiv real-embedding numbers in the README.
+  every figure is regenerated by the harness and the README tables transcribe
+  its summary outputs. Replaces the previous private-arXiv real-embedding
+  numbers in the README.
+- **`RankQuantFastscan` is now a stable, public API** (previously re-exported
+  `#[doc(hidden)]`), with `.ovfs` / `OVFS` persistence via
+  `RankQuantFastscan::{write,load}` and a ninth `load_fastscan` cargo-fuzz
+  target. Metadata-probe support (`probe_index_metadata`) for `.ovfs` is
+  deferred to 0.8.0 (#233, #232).
 
 ### Performance
 
@@ -58,6 +64,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Changed
 
+- **On-disk format magics renamed to `OV*`** (`OVR1` / `OVRQ` / `OVBM` /
+  `OVSB`). The loaders still accept the legacy `TV*` magics, so every
+  previously-written `.tvr` / `.tvrq` / `.tvbm` / `.tvsb` file continues to load
+  unchanged; only the file extensions and magic bytes written by `write()`
+  change (#230).
 - **Release-hardened the caller-owned serial two-stage primitives** (no API
   change; added in 0.5.0). The trust model is now explicit and tested:
   - Rejection-path regression tests for the full CSR/query/buffer validation set

@@ -12,8 +12,9 @@
 
 Training-free ordinal & sign quantization for vector retrieval.
 
-`ordvec` is a small, dependency-light Rust crate for compressed
-nearest-neighbour search over high-dimensional embeddings.
+`ordvec` is a small, pure-Rust crate for compressed nearest-neighbour search
+that quantizes the **ordinal (rank) and sign structure** of an embedding —
+no codebook, no learned rotation, no graph to build.
 
 ## Benchmark at a glance
 
@@ -56,10 +57,12 @@ keeps a near-flat per-query cost as the corpus grows, while exact brute-force
 
 ## What's different
 
-Compressed-retrieval libraries usually either **fit a codebook to your
-data** (product / scalar quantization) or **wrap vectors in a graph**
-(HNSW). ordvec does neither — it quantizes the *ordinal* structure of each
-vector on its own:
+Compressed-retrieval paths almost all carry a **fit step**: product
+quantization fits a k-means codebook, OPQ adds a learned rotation,
+scalar / binary quantizers calibrate to the data distribution, graph indexes
+(HNSW) build a navigable graph, and Matryoshka needs a model trained with its
+loss. ordvec fits **none** of them — it quantizes the *ordinal and sign*
+structure of each vector on its own:
 
 - **Training-free, data-oblivious.** No codebook, no learned rotation, no
   fit step. Encoding is a per-vector rank (or sign) transform — index the
@@ -78,7 +81,10 @@ vector on its own:
   when `dim % 256 == 0` — not a broad retrieval mode.)
 - **Two-stage retrieval, built in.** A cheap bitmap / sign-popcount
   prefilter feeds an exact rerank — the coarse→fine pipeline ships as
-  library primitives.
+  library primitives. The coarse-scan→exact-rerank pattern, and the
+  `RankQuantFastscan` block-32 4-bit LUT path, follow the FAISS FastScan and
+  binary-quantization-plus-rescore lineage; ordvec ships them
+  batteries-included and dependency-free, not as new techniques.
 
 ordvec is a compressed **flat-scan** substrate (optionally two-stage): small
 codes scored by fast SIMD — AVX-512/AVX2 runtime-dispatched on x86_64, baseline
@@ -100,11 +106,13 @@ large-scale serving rather than competing with one.
 
 Two further paths, for callers who need them:
 
-- **`RankQuantFastscan`** *(`#[doc(hidden)]` — reachable as
-  `ordvec::RankQuantFastscan`, but the API is not yet stable)* — an optional
-  b=2 FastScan kernel (block-32 PQ-LUT) for absolute-minimum scan latency, at
-  2× the RankQuant b=2 footprint (`dim/2` bytes/doc). Surfaced here so
-  latency-critical callers know it exists.
+- **`RankQuantFastscan`** — a stable, documented *but specialized* public
+  type: an optional b=2 FastScan kernel (block-32 nibble/PQ-LUT, AVX-512 → AVX2
+  → scalar dispatch) for absolute-minimum stage-1 scan latency, at 2× the
+  RankQuant b=2 footprint (`dim/2` bytes/doc) and 8-bit LUT scoring noise. It
+  persists to `.ovfs` (magic `OVFS`). Reach for it only when scan latency at
+  b=2 is the binding constraint; the headline retrieval surface is still
+  `RankQuant` / `Bitmap` / two-stage.
 - **`MultiBucketBitmap`** *(behind `--features experimental`)* — the
   multi-bucket bilinear-overlap probe behind the research-side decomposition;
   an algebraic scaffold, not the top-bucket theorem surface or a production
@@ -291,11 +299,11 @@ thread count, no Python/FFI in the hot path:
 
 - **`flat`** — exact inner-product brute force (identical retrieval to FAISS
   `IndexFlatIP`), a pure-Rust SIMD GEMM. *Baseline, not ground truth.*
-- **`hnsw`** — pure-Rust HNSW (`hnsw_rs`, M=32, ef=128) — the portable
-  stand-in for the C++ hnswlib.
+- **`hnsw`** — pure-Rust HNSW (`hnsw_rs`, M=32, ef_construction=200,
+  ef_search=128) — the portable stand-in for the C++ hnswlib.
 
 Reproduce end-to-end (downloads the data, embeds, runs every method, renders the
-figures) — nothing below is hand-entered:
+figures, and emits the summary tables transcribed below):
 
 ```sh
 make bench-beir-setup      # Python deps + CUDA llama-cpp-python
@@ -426,8 +434,8 @@ clean-checkout kernel sanity check.
 
 ## Security: index-file trust
 
-The on-disk formats (`.ovr` / `.ovrq` / `.ovbm` / `.ovsb`; legacy `.tvr` /
-`.tvrq` / `.tvbm` / `.tvsb` files still load) carry **no built-in
+The on-disk formats (`.ovr` / `.ovrq` / `.ovbm` / `.ovsb` / `.ovfs`; legacy
+`.tvr` / `.tvrq` / `.tvbm` / `.tvsb` files still load) carry **no built-in
 checksum, MAC, or signature — by design.** The loaders validate *structure*
 (magic, version, bounds, exact-length payload) but not *origin*: a
 structurally valid file can still be untrusted. If an index file crosses a

@@ -1,6 +1,6 @@
 # Threat Model — `ordvec`
 
-> **Status:** v0.5.0 (pre-1.0), 2026-06-13. This is the maintained threat model
+> **Status:** v0.5.0 (pre-1.0), 2026-06-15. This is the maintained threat model
 > for the `ordvec` Rust crate, C ABI, Go wrapper, PyO3/maturin Python bindings,
 > and the `ordvec-manifest` sidecar verifier. It is reviewed when the
 > attack surface changes (new persistence formats, new `unsafe` kernels, new
@@ -66,7 +66,7 @@ absence of a second maintainer is itself a tracked supply-chain residual
 
 | Layer | Components | Trust boundary |
 |---|---|---|
-| **Deserialization** | `rank_io.rs` — `.ovr` / `.ovrq` / `.ovbm` / `.ovsb` loaders (also accept the legacy `.tvr` / `.tvrq` / `.tvbm` / `.tvsb` magics) | Untrusted filesystem / network byte stream |
+| **Deserialization** | `rank_io.rs` — `.ovr` / `.ovrq` / `.ovbm` / `.ovsb` / `.ovfs` loaders (`.ovfs`/`OVFS` is the FastScan format and has no legacy magic; the other four also accept the legacy `.tvr` / `.tvrq` / `.tvbm` / `.tvsb` magics) | Untrusted filesystem / network byte stream |
 | **Manifest verification** | `ordvec-manifest` — JSON sidecar verifier | Manifest + index + optional row-map files before load |
 | **Compute kernels** | `fastscan.rs`, `quant_kernels.rs`, `bitmap.rs`, `sign_bitmap.rs` | Trust established after format validation |
 | **Index API** | `rank.rs`, `quant.rs`, `bitmap.rs`, `sign_bitmap.rs` | Caller-controlled query embeddings |
@@ -75,13 +75,14 @@ absence of a second maintainer is itself a tracked supply-chain residual
 | **Python FFI** | `ordvec-python` (PyO3 / maturin) | Python ↔ Rust boundary; NumPy buffers |
 | **CI / supply chain** | GitHub Actions workflows; `Cargo.lock`; crates.io + PyPI | GitHub OIDC, crates.io, PyPI trust chains |
 
-The `fuzz/` directory holds **eight** cargo-fuzz targets: `load_rank`,
-`load_rankquant`, `load_bitmap`, `load_sign_bitmap` (deserialization);
-`roundtrip_rankquant` (write→load round-trip); `search_rankquant` (the
-single-rate ingest + asymmetric-search compute path); `fastscan_b2` (the
-FastScan b=2 block-32 kernel — the one `unsafe`-heavy scan path the others do
-not reach); and `signbitmap_rankquant_twostage` (sign candidate generation
-followed by RankQuant subset reranking).
+The `fuzz/` directory holds **nine** cargo-fuzz targets: `load_rank`,
+`load_rankquant`, `load_bitmap`, `load_sign_bitmap`, `load_fastscan`
+(deserialization — the last drives the `.ovfs`/`OVFS` FastScan loader via
+`RankQuantFastscan::load`); `roundtrip_rankquant` (write→load round-trip);
+`search_rankquant` (the single-rate ingest + asymmetric-search compute path);
+`fastscan_b2` (the FastScan b=2 block-32 kernel — the one `unsafe`-heavy scan
+path the others do not reach); and `signbitmap_rankquant_twostage` (sign
+candidate generation followed by RankQuant subset reranking).
 
 ### 1.2 Deployment contexts (for integrators)
 
@@ -125,14 +126,15 @@ followed by RankQuant subset reranking).
   persistence API is the index types' `write()` / `load()`, making the
   write→load round-trip a type-level guarantee.
 
-The four loaders are covered by cargo-fuzz targets (the `load_*` targets).
+The five loaders are covered by cargo-fuzz targets (the `load_*` targets,
+including `load_fastscan` for the `.ovfs` FastScan format).
 
 ### 2.2 Index-file risk classes
 
 **THREAT-DESER-001 (library-owned, P4): Malformed index file.**
 The loader must reject corrupt/invalid files without panic, OOM, or
 trailing-data acceptance. The current implementation satisfies this for all
-four formats. *Residual:* `file.metadata()?.len()` is sampled at open time;
+five formats. *Residual:* `file.metadata()?.len()` is sampled at open time;
 on NFS/FUSE mounts with concurrent writers a TOCTOU window exists between
 `metadata()` and the reads. On writable shared mounts the practical outcome is
 a read error or `InvalidData`, not an exploit. *Likelihood:* Very Low.
@@ -433,7 +435,7 @@ knowledge of quantization parameters and the document distribution.
 
 ## 8. Fuzzing coverage (THREAT-FUZZ)
 
-Eight targets cover the four loaders, the write→load round-trip, the
+Nine targets cover the five loaders, the write→load round-trip, the
 single-rate compute path, the FastScan kernel, and the composed
 SignBitmap→RankQuant retrieval path.
 
@@ -448,7 +450,7 @@ it exercises the AVX-512 kernel.
 regression.** A `fuzz.yml` workflow now runs a bounded smoke on every pull
 request and push to `main` (`-max_total_time=60` over `load_rank`,
 `load_rankquant`, `fastscan_b2`, and `signbitmap_rankquant_twostage`) plus a
-weekly full sweep (`-max_total_time=300` over all eight targets), so a
+weekly full sweep (`-max_total_time=300` over all nine targets), so a
 regression that
 reintroduces a loader panic / OOM, breaks the write→load round-trip, or
 destabilises the FastScan kernel or composed sign→RankQuant path surfaces in CI

@@ -27,8 +27,8 @@ That runs the head-to-head on a structured synthetic corpus (D=256,
 N=30,000, 200 queries, 200 cluster prototypes, latent_dim=64; see
 [Stress test](#stress-test-low-rank-clustered-synthetic) for the
 exact construction). Results on real embedding corpora are
-user-runnable via `--corpus-npy` / `--queries-npy`; the current
-arXiv paper-harness result is summarized in the README and the
+user-runnable via `--corpus-npy` / `--queries-npy`, and the reproducible
+BEIR harness (`make benchmark-beir`) is summarized in the README; the
 reproduction shape is described under
 [External-corpus results](#external-corpus-results-user-runnable).
 
@@ -56,7 +56,7 @@ to the top-k scores at finalize so the displayed cosines stay exact.
 |---|---|
 | CPU | AMD Ryzen 9 9950X (Zen 5, 16C/32T, full 512-bit AVX-512 datapath) |
 | OS | CachyOS Linux |
-| Compiler | rustc 1.95.0 |
+| Compiler | rustc 1.95.0 (bench machine toolchain; the crate MSRV is 1.89 — see [compatibility-policy.md](./compatibility-policy.md)) |
 | Build | `cargo build --release` with `lto = true, codegen-units = 1, opt-level = 3` |
 | Detected SIMD | sse4.2, avx2, fma, avx512f, avx512bw, avx512vl |
 | Latency mode | single-thread per query (rayon parallelises *across* queries; per-query rows measure scan only) |
@@ -175,7 +175,7 @@ under [Stress test](#stress-test-low-rank-clustered-synthetic) — it
 is a generated Gaussian low-rank fixture, useful for exercising the
 rank-mode kernels and their size/latency tradeoffs. Treat its recall
 spread as a stress-test result, not the lead retrieval-quality claim;
-the current real-embedding arXiv benchmark in the README is the better
+the reproducible BEIR benchmark in the README is the better
 guide to retrieval-relevant ordinal behaviour.
 
 Results are with the AVX-512 asymmetric scan enabled where applicable
@@ -253,8 +253,8 @@ but it is still a generated Gaussian fixture. It is useful for
 self-contained kernel checks and for stressing the compression modes;
 it should not be read as the strongest evidence for the retrieval task.
 Real sentence/passage embeddings are anisotropic in task-specific ways,
-and the current arXiv source-recovery benchmark is more favorable to
-the rank transform than this small synthetic fixture.
+and real-corpus benchmarks (the reproducible BEIR harness in the README)
+are more favorable to the rank transform than this small synthetic fixture.
 
 ## What the head-to-head shows
 
@@ -383,14 +383,13 @@ minimal built-in reader — no Python dependency at bench time, and no
 BLAS.
 
 What to expect from real embeddings: dense sentence/passage encoders
-often carry retrieval signal in their coordinate order. The current
-paper-harness arXiv run (207,695 embeddings, 7,200 source-recovery
-queries) has full ordinal rank-cosine within bootstrap noise of dense
-exact search and slightly ahead of the tested FAISS HNSW configuration;
-RankQuant b=2 asym matches that HNSW configuration within bootstrap
-noise at 256 bytes/vector. Run the command above on your target
-embeddings to get the number that matters for your deployment — the
-arXiv artifact set is not shipped in this crate.
+often carry retrieval signal in their coordinate order. The reproducible
+in-repo BEIR harness (`make benchmark-beir`, summarized in the README) is
+the sanctioned real-corpus measurement — on public BEIR data, ordvec's
+ordinal modes (`RankQuant` b=2 / b=4) land within bootstrap noise of dense
+exact search at 8–16× smaller vectors. Run the command above on your
+target embeddings to get the number that matters for your deployment; no
+external corpus is shipped in this crate.
 
 ## A null result reported up front
 
@@ -431,10 +430,11 @@ Candidate IDs are global row ordinals; duplicate candidates are scored as
 separate entries and can produce duplicate hits, so callers that need
 unique output rows should deduplicate candidate lists before reranking.
 
-`RankQuantFastscan` (re-exported `#[doc(hidden)]`) is an optional
-single-pass b=2 fast path; it supports `add`/`search` but not
-`swap_remove`/`write`/`load` (see its module docs in
-`src/fastscan.rs`). `MultiBucketBitmap` underwrites the
+`RankQuantFastscan` is a stable, public (but specialized) single-pass b=2
+fast path; it supports `add`/`search`/`write`/`load` (`.ovfs` persistence,
+magic `OVFS`) but not `swap_remove` (see its module docs in
+`src/fastscan.rs`). Metadata-probe support via `probe_index_metadata` is
+deferred to 0.8.0 (#232). `MultiBucketBitmap` underwrites the
 bilinear bucket-overlap decomposition and is reachable only behind the
 `experimental` feature.
 
@@ -516,11 +516,11 @@ multi-seed stability is your call.
    because there is no rotation matmul and no codebook fit — the
    per-vector cost is the `argsort`.
 4. **Recall is corpus-dependent.** The generated Gaussian fixture is a
-   stress test, not the lead quality claim. On the current real arXiv
-   embedding task, full ordinal rank-cosine is within bootstrap noise
-   of dense exact search, and RankQuant b=2 asym matches the tested
-   FAISS HNSW configuration within bootstrap noise. Run the
-   external-corpus bench on your data — see above.
+   stress test, not the lead quality claim. On public BEIR data (the
+   reproducible `make benchmark-beir` harness in the README), ordvec's
+   ordinal modes land within bootstrap noise of dense exact search at a
+   fraction of the storage. Run the external-corpus bench on your data —
+   see above.
 5. **The audit-by-removal rationale.** RankQuant removes training,
    rotation, codebooks, and per-document norms from the pipeline. That
    retrieval still works after the removal is the interesting result:

@@ -189,8 +189,8 @@ double free are undefined behavior.
 
 ## V1 Exclusions
 
-ABI v1 intentionally excludes `Rank`, `SignBitmap`, external IDs, ID maps,
-builders, mutating index APIs, logging callbacks, custom allocators, async
-search, batched search, richer measured timing breakdowns, and release
-packaging. Those can be added in later ABI versions without changing the v1
-struct-size rule.
+ABI v1 intentionally excludes `Rank`, `SignBitmap`, `RankQuantFastscan`
+(the `.ovfs` FastScan path), external IDs, ID maps, builders, mutating index
+APIs, logging callbacks, custom allocators, async search, batched search,
+richer measured timing breakdowns, and release packaging. Those can be added in
+later ABI versions without changing the v1 struct-size rule.
@@ -61,9 +61,10 @@ documented minor release removes them.
 The `experimental` feature is a default-off research surface. Today it exposes
 `MultiBucketBitmap`; it is not patch-stable before 1.0.
 
-`#[doc(hidden)]` exports such as `RankQuantFastscan` and
+`RankQuantFastscan` is a stable, public (but specialized) type, covered by the
+normal pre-1.0 compatibility policy above. `#[doc(hidden)]` exports such as
 `search_asymmetric_byte_lut` are reachable for internal benchmarks and parity
-tests, but they are not part of the stable default API.
+tests, but are not part of the stable default API.
 
 New feature flags must declare their stability class before merging:
 

@@ -67,7 +67,7 @@ the public hit order still follows the global ordering rule above.
 
 ## FastScan
 
-`RankQuantFastscan` is a hidden, optional b=2 pre-ranker. It is deterministic
+`RankQuantFastscan` is a public, specialized, optional b=2 pre-ranker. It is deterministic
 for a fixed index, query, and backend dispatch, and its scalar and AVX-512
 FastScan kernels operate on the same quantized LUT inputs. It is not
 score-equivalent to exact `RankQuant::search_asymmetric`: the global 8-bit LUT