diff --git a/.github/workflows/fuzz.yml b/.github/workflows/fuzz.yml index 8ec38196..e68a5dce 100644 --- a/.github/workflows/fuzz.yml +++ b/.github/workflows/fuzz.yml @@ -1,6 +1,6 @@ name: fuzz -# Bounded cargo-fuzz smoke. The eight targets in fuzz/ are normally exercised +# Bounded cargo-fuzz smoke. The nine targets in fuzz/ are normally exercised # in manual campaigns; this adds CI cadence so a regression that reintroduces a # loader panic / OOM, breaks the write->load round-trip, or destabilises the # FastScan or two-stage retrieval kernels surface in CI rather than only at @@ -10,7 +10,7 @@ name: fuzz # * pull_request / push(main): a SHORT smoke (60s/target) over the # highest-value targets — fast enough to run on every change. # * schedule (weekly) / workflow_dispatch: a LONGER sweep (300s/target) -# across ALL eight targets. +# across ALL nine targets. # # This runs UNATTENDED on a cron schedule, so every third-party action is # SHA-pinned, cargo-fuzz is installed with its bundled lockfile on a pinned @@ -74,7 +74,7 @@ jobs: TARGET: ${{ matrix.target }} run: cargo "+${FUZZ_NIGHTLY}" fuzz run "$TARGET" -- -max_total_time=60 -rss_limit_mb=4096 - # Weekly full sweep over all eight targets at a larger time budget. + # Weekly full sweep over all nine targets at a larger time budget. weekly: name: fuzz weekly (300s) if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch' @@ -88,6 +88,7 @@ jobs: - load_rankquant - load_bitmap - load_sign_bitmap + - load_fastscan - roundtrip_rankquant - search_rankquant - fastscan_b2 diff --git a/CHANGELOG.md b/CHANGELOG.md index a4b47b0f..69c32cc7 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -31,8 +31,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 and a corpus-size scaling sweep on public BEIR datasets, with the corpus embedded by Harrier-Q8 (GGUF `Q8_0` via `llama-cpp-python`, CUDA). The README now leads with the resulting scaling curve, latency bars, and nDCG@10 table; - every figure is regenerated by the harness (nothing hand-entered). Replaces the - previous private-arXiv real-embedding numbers in the README. + every figure is regenerated by the harness and the README tables transcribe + its summary outputs. Replaces the previous private-arXiv real-embedding + numbers in the README. +- **`RankQuantFastscan` is now a stable, public API** (previously re-exported + `#[doc(hidden)]`), with `.ovfs` / `OVFS` persistence via + `RankQuantFastscan::{write,load}` and a ninth `load_fastscan` cargo-fuzz + target. Metadata-probe support (`probe_index_metadata`) for `.ovfs` is + deferred to 0.8.0 (#233, #232). ### Performance @@ -58,6 +64,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Changed +- **On-disk format magics renamed to `OV*`** (`OVR1` / `OVRQ` / `OVBM` / + `OVSB`). The loaders still accept the legacy `TV*` magics, so every + previously-written `.tvr` / `.tvrq` / `.tvbm` / `.tvsb` file continues to load + unchanged; only the file extensions and magic bytes written by `write()` + change (#230). - **Release-hardened the caller-owned serial two-stage primitives** (no API change; added in 0.5.0). The trust model is now explicit and tested: - Rejection-path regression tests for the full CSR/query/buffer validation set diff --git a/README.md b/README.md index 42975ae3..2afed35c 100644 --- a/README.md +++ b/README.md @@ -12,8 +12,9 @@ Training-free ordinal & sign quantization for vector retrieval. -`ordvec` is a small, dependency-light Rust crate for compressed -nearest-neighbour search over high-dimensional embeddings. +`ordvec` is a small, pure-Rust crate for compressed nearest-neighbour search +that quantizes the **ordinal (rank) and sign structure** of an embedding — +no codebook, no learned rotation, no graph to build. ## Benchmark at a glance @@ -56,10 +57,12 @@ keeps a near-flat per-query cost as the corpus grows, while exact brute-force ## What's different -Compressed-retrieval libraries usually either **fit a codebook to your -data** (product / scalar quantization) or **wrap vectors in a graph** -(HNSW). ordvec does neither — it quantizes the *ordinal* structure of each -vector on its own: +Compressed-retrieval paths almost all carry a **fit step**: product +quantization fits a k-means codebook, OPQ adds a learned rotation, +scalar / binary quantizers calibrate to the data distribution, graph indexes +(HNSW) build a navigable graph, and Matryoshka needs a model trained with its +loss. ordvec fits **none** of them — it quantizes the *ordinal and sign* +structure of each vector on its own: - **Training-free, data-oblivious.** No codebook, no learned rotation, no fit step. Encoding is a per-vector rank (or sign) transform — index the @@ -78,7 +81,10 @@ vector on its own: when `dim % 256 == 0` — not a broad retrieval mode.) - **Two-stage retrieval, built in.** A cheap bitmap / sign-popcount prefilter feeds an exact rerank — the coarse→fine pipeline ships as - library primitives. + library primitives. The coarse-scan→exact-rerank pattern, and the + `RankQuantFastscan` block-32 4-bit LUT path, follow the FAISS FastScan and + binary-quantization-plus-rescore lineage; ordvec ships them + batteries-included and dependency-free, not as new techniques. ordvec is a compressed **flat-scan** substrate (optionally two-stage): small codes scored by fast SIMD — AVX-512/AVX2 runtime-dispatched on x86_64, baseline @@ -100,11 +106,13 @@ large-scale serving rather than competing with one. Two further paths, for callers who need them: -- **`RankQuantFastscan`** *(`#[doc(hidden)]` — reachable as - `ordvec::RankQuantFastscan`, but the API is not yet stable)* — an optional - b=2 FastScan kernel (block-32 PQ-LUT) for absolute-minimum scan latency, at - 2× the RankQuant b=2 footprint (`dim/2` bytes/doc). Surfaced here so - latency-critical callers know it exists. +- **`RankQuantFastscan`** — a stable, documented *but specialized* public + type: an optional b=2 FastScan kernel (block-32 nibble/PQ-LUT, AVX-512 → AVX2 + → scalar dispatch) for absolute-minimum stage-1 scan latency, at 2× the + RankQuant b=2 footprint (`dim/2` bytes/doc) and 8-bit LUT scoring noise. It + persists to `.ovfs` (magic `OVFS`). Reach for it only when scan latency at + b=2 is the binding constraint; the headline retrieval surface is still + `RankQuant` / `Bitmap` / two-stage. - **`MultiBucketBitmap`** *(behind `--features experimental`)* — the multi-bucket bilinear-overlap probe behind the research-side decomposition; an algebraic scaffold, not the top-bucket theorem surface or a production @@ -291,11 +299,11 @@ thread count, no Python/FFI in the hot path: - **`flat`** — exact inner-product brute force (identical retrieval to FAISS `IndexFlatIP`), a pure-Rust SIMD GEMM. *Baseline, not ground truth.* -- **`hnsw`** — pure-Rust HNSW (`hnsw_rs`, M=32, ef=128) — the portable - stand-in for the C++ hnswlib. +- **`hnsw`** — pure-Rust HNSW (`hnsw_rs`, M=32, ef_construction=200, + ef_search=128) — the portable stand-in for the C++ hnswlib. Reproduce end-to-end (downloads the data, embeds, runs every method, renders the -figures) — nothing below is hand-entered: +figures, and emits the summary tables transcribed below): ```sh make bench-beir-setup # Python deps + CUDA llama-cpp-python @@ -426,8 +434,8 @@ clean-checkout kernel sanity check. ## Security: index-file trust -The on-disk formats (`.ovr` / `.ovrq` / `.ovbm` / `.ovsb`; legacy `.tvr` / -`.tvrq` / `.tvbm` / `.tvsb` files still load) carry **no built-in +The on-disk formats (`.ovr` / `.ovrq` / `.ovbm` / `.ovsb` / `.ovfs`; legacy +`.tvr` / `.tvrq` / `.tvbm` / `.tvsb` files still load) carry **no built-in checksum, MAC, or signature — by design.** The loaders validate *structure* (magic, version, bounds, exact-length payload) but not *origin*: a structurally valid file can still be untrusted. If an index file crosses a diff --git a/THREAT_MODEL.md b/THREAT_MODEL.md index 3ca834d5..0c8a0230 100644 --- a/THREAT_MODEL.md +++ b/THREAT_MODEL.md @@ -1,6 +1,6 @@ # Threat Model — `ordvec` -> **Status:** v0.5.0 (pre-1.0), 2026-06-13. This is the maintained threat model +> **Status:** v0.5.0 (pre-1.0), 2026-06-15. This is the maintained threat model > for the `ordvec` Rust crate, C ABI, Go wrapper, PyO3/maturin Python bindings, > and the `ordvec-manifest` sidecar verifier. It is reviewed when the > attack surface changes (new persistence formats, new `unsafe` kernels, new @@ -66,7 +66,7 @@ absence of a second maintainer is itself a tracked supply-chain residual | Layer | Components | Trust boundary | |---|---|---| -| **Deserialization** | `rank_io.rs` — `.ovr` / `.ovrq` / `.ovbm` / `.ovsb` loaders (also accept the legacy `.tvr` / `.tvrq` / `.tvbm` / `.tvsb` magics) | Untrusted filesystem / network byte stream | +| **Deserialization** | `rank_io.rs` — `.ovr` / `.ovrq` / `.ovbm` / `.ovsb` / `.ovfs` loaders (`.ovfs`/`OVFS` is the FastScan format and has no legacy magic; the other four also accept the legacy `.tvr` / `.tvrq` / `.tvbm` / `.tvsb` magics) | Untrusted filesystem / network byte stream | | **Manifest verification** | `ordvec-manifest` — JSON sidecar verifier | Manifest + index + optional row-map files before load | | **Compute kernels** | `fastscan.rs`, `quant_kernels.rs`, `bitmap.rs`, `sign_bitmap.rs` | Trust established after format validation | | **Index API** | `rank.rs`, `quant.rs`, `bitmap.rs`, `sign_bitmap.rs` | Caller-controlled query embeddings | @@ -75,13 +75,14 @@ absence of a second maintainer is itself a tracked supply-chain residual | **Python FFI** | `ordvec-python` (PyO3 / maturin) | Python ↔ Rust boundary; NumPy buffers | | **CI / supply chain** | GitHub Actions workflows; `Cargo.lock`; crates.io + PyPI | GitHub OIDC, crates.io, PyPI trust chains | -The `fuzz/` directory holds **eight** cargo-fuzz targets: `load_rank`, -`load_rankquant`, `load_bitmap`, `load_sign_bitmap` (deserialization); -`roundtrip_rankquant` (write→load round-trip); `search_rankquant` (the -single-rate ingest + asymmetric-search compute path); `fastscan_b2` (the -FastScan b=2 block-32 kernel — the one `unsafe`-heavy scan path the others do -not reach); and `signbitmap_rankquant_twostage` (sign candidate generation -followed by RankQuant subset reranking). +The `fuzz/` directory holds **nine** cargo-fuzz targets: `load_rank`, +`load_rankquant`, `load_bitmap`, `load_sign_bitmap`, `load_fastscan` +(deserialization — the last drives the `.ovfs`/`OVFS` FastScan loader via +`RankQuantFastscan::load`); `roundtrip_rankquant` (write→load round-trip); +`search_rankquant` (the single-rate ingest + asymmetric-search compute path); +`fastscan_b2` (the FastScan b=2 block-32 kernel — the one `unsafe`-heavy scan +path the others do not reach); and `signbitmap_rankquant_twostage` (sign +candidate generation followed by RankQuant subset reranking). ### 1.2 Deployment contexts (for integrators) @@ -125,14 +126,15 @@ followed by RankQuant subset reranking). persistence API is the index types' `write()` / `load()`, making the write→load round-trip a type-level guarantee. -The four loaders are covered by cargo-fuzz targets (the `load_*` targets). +The five loaders are covered by cargo-fuzz targets (the `load_*` targets, +including `load_fastscan` for the `.ovfs` FastScan format). ### 2.2 Index-file risk classes **THREAT-DESER-001 (library-owned, P4): Malformed index file.** The loader must reject corrupt/invalid files without panic, OOM, or trailing-data acceptance. The current implementation satisfies this for all -four formats. *Residual:* `file.metadata()?.len()` is sampled at open time; +five formats. *Residual:* `file.metadata()?.len()` is sampled at open time; on NFS/FUSE mounts with concurrent writers a TOCTOU window exists between `metadata()` and the reads. On writable shared mounts the practical outcome is a read error or `InvalidData`, not an exploit. *Likelihood:* Very Low. @@ -433,7 +435,7 @@ knowledge of quantization parameters and the document distribution. ## 8. Fuzzing coverage (THREAT-FUZZ) -Eight targets cover the four loaders, the write→load round-trip, the +Nine targets cover the five loaders, the write→load round-trip, the single-rate compute path, the FastScan kernel, and the composed SignBitmap→RankQuant retrieval path. @@ -448,7 +450,7 @@ it exercises the AVX-512 kernel. regression.** A `fuzz.yml` workflow now runs a bounded smoke on every pull request and push to `main` (`-max_total_time=60` over `load_rank`, `load_rankquant`, `fastscan_b2`, and `signbitmap_rankquant_twostage`) plus a -weekly full sweep (`-max_total_time=300` over all eight targets), so a +weekly full sweep (`-max_total_time=300` over all nine targets), so a regression that reintroduces a loader panic / OOM, breaks the write→load round-trip, or destabilises the FastScan kernel or composed sign→RankQuant path surfaces in CI diff --git a/docs/RANK_MODES.md b/docs/RANK_MODES.md index f95b4ace..381ebee2 100644 --- a/docs/RANK_MODES.md +++ b/docs/RANK_MODES.md @@ -27,8 +27,8 @@ That runs the head-to-head on a structured synthetic corpus (D=256, N=30,000, 200 queries, 200 cluster prototypes, latent_dim=64; see [Stress test](#stress-test-low-rank-clustered-synthetic) for the exact construction). Results on real embedding corpora are -user-runnable via `--corpus-npy` / `--queries-npy`; the current -arXiv paper-harness result is summarized in the README and the +user-runnable via `--corpus-npy` / `--queries-npy`, and the reproducible +BEIR harness (`make benchmark-beir`) is summarized in the README; the reproduction shape is described under [External-corpus results](#external-corpus-results-user-runnable). @@ -56,7 +56,7 @@ to the top-k scores at finalize so the displayed cosines stay exact. |---|---| | CPU | AMD Ryzen 9 9950X (Zen 5, 16C/32T, full 512-bit AVX-512 datapath) | | OS | CachyOS Linux | -| Compiler | rustc 1.95.0 | +| Compiler | rustc 1.95.0 (bench machine toolchain; the crate MSRV is 1.89 — see [compatibility-policy.md](./compatibility-policy.md)) | | Build | `cargo build --release` with `lto = true, codegen-units = 1, opt-level = 3` | | Detected SIMD | sse4.2, avx2, fma, avx512f, avx512bw, avx512vl | | Latency mode | single-thread per query (rayon parallelises *across* queries; per-query rows measure scan only) | @@ -175,7 +175,7 @@ under [Stress test](#stress-test-low-rank-clustered-synthetic) — it is a generated Gaussian low-rank fixture, useful for exercising the rank-mode kernels and their size/latency tradeoffs. Treat its recall spread as a stress-test result, not the lead retrieval-quality claim; -the current real-embedding arXiv benchmark in the README is the better +the reproducible BEIR benchmark in the README is the better guide to retrieval-relevant ordinal behaviour. Results are with the AVX-512 asymmetric scan enabled where applicable @@ -253,8 +253,8 @@ but it is still a generated Gaussian fixture. It is useful for self-contained kernel checks and for stressing the compression modes; it should not be read as the strongest evidence for the retrieval task. Real sentence/passage embeddings are anisotropic in task-specific ways, -and the current arXiv source-recovery benchmark is more favorable to -the rank transform than this small synthetic fixture. +and real-corpus benchmarks (the reproducible BEIR harness in the README) +are more favorable to the rank transform than this small synthetic fixture. ## What the head-to-head shows @@ -383,14 +383,13 @@ minimal built-in reader — no Python dependency at bench time, and no BLAS. What to expect from real embeddings: dense sentence/passage encoders -often carry retrieval signal in their coordinate order. The current -paper-harness arXiv run (207,695 embeddings, 7,200 source-recovery -queries) has full ordinal rank-cosine within bootstrap noise of dense -exact search and slightly ahead of the tested FAISS HNSW configuration; -RankQuant b=2 asym matches that HNSW configuration within bootstrap -noise at 256 bytes/vector. Run the command above on your target -embeddings to get the number that matters for your deployment — the -arXiv artifact set is not shipped in this crate. +often carry retrieval signal in their coordinate order. The reproducible +in-repo BEIR harness (`make benchmark-beir`, summarized in the README) is +the sanctioned real-corpus measurement — on public BEIR data, ordvec's +ordinal modes (`RankQuant` b=2 / b=4) land within bootstrap noise of dense +exact search at 8–16× smaller vectors. Run the command above on your +target embeddings to get the number that matters for your deployment; no +external corpus is shipped in this crate. ## A null result reported up front @@ -431,10 +430,11 @@ Candidate IDs are global row ordinals; duplicate candidates are scored as separate entries and can produce duplicate hits, so callers that need unique output rows should deduplicate candidate lists before reranking. -`RankQuantFastscan` (re-exported `#[doc(hidden)]`) is an optional -single-pass b=2 fast path; it supports `add`/`search` but not -`swap_remove`/`write`/`load` (see its module docs in -`src/fastscan.rs`). `MultiBucketBitmap` underwrites the +`RankQuantFastscan` is a stable, public (but specialized) single-pass b=2 +fast path; it supports `add`/`search`/`write`/`load` (`.ovfs` persistence, +magic `OVFS`) but not `swap_remove` (see its module docs in +`src/fastscan.rs`). Metadata-probe support via `probe_index_metadata` is +deferred to 0.8.0 (#232). `MultiBucketBitmap` underwrites the bilinear bucket-overlap decomposition and is reachable only behind the `experimental` feature. @@ -516,11 +516,11 @@ multi-seed stability is your call. because there is no rotation matmul and no codebook fit — the per-vector cost is the `argsort`. 4. **Recall is corpus-dependent.** The generated Gaussian fixture is a - stress test, not the lead quality claim. On the current real arXiv - embedding task, full ordinal rank-cosine is within bootstrap noise - of dense exact search, and RankQuant b=2 asym matches the tested - FAISS HNSW configuration within bootstrap noise. Run the - external-corpus bench on your data — see above. + stress test, not the lead quality claim. On public BEIR data (the + reproducible `make benchmark-beir` harness in the README), ordvec's + ordinal modes land within bootstrap noise of dense exact search at a + fraction of the storage. Run the external-corpus bench on your data — + see above. 5. **The audit-by-removal rationale.** RankQuant removes training, rotation, codebooks, and per-document norms from the pipeline. That retrieval still works after the removal is the interesting result: diff --git a/docs/c-api.md b/docs/c-api.md index d936ae3b..a1c2e23b 100644 --- a/docs/c-api.md +++ b/docs/c-api.md @@ -189,8 +189,8 @@ double free are undefined behavior. ## V1 Exclusions -ABI v1 intentionally excludes `Rank`, `SignBitmap`, external IDs, ID maps, -builders, mutating index APIs, logging callbacks, custom allocators, async -search, batched search, richer measured timing breakdowns, and release -packaging. Those can be added in later ABI versions without changing the v1 -struct-size rule. +ABI v1 intentionally excludes `Rank`, `SignBitmap`, `RankQuantFastscan` +(the `.ovfs` FastScan path), external IDs, ID maps, builders, mutating index +APIs, logging callbacks, custom allocators, async search, batched search, +richer measured timing breakdowns, and release packaging. Those can be added in +later ABI versions without changing the v1 struct-size rule. diff --git a/docs/compatibility-policy.md b/docs/compatibility-policy.md index d96bfd65..7be7d561 100644 --- a/docs/compatibility-policy.md +++ b/docs/compatibility-policy.md @@ -61,9 +61,10 @@ documented minor release removes them. The `experimental` feature is a default-off research surface. Today it exposes `MultiBucketBitmap`; it is not patch-stable before 1.0. -`#[doc(hidden)]` exports such as `RankQuantFastscan` and +`RankQuantFastscan` is a stable, public (but specialized) type, covered by the +normal pre-1.0 compatibility policy above. `#[doc(hidden)]` exports such as `search_asymmetric_byte_lut` are reachable for internal benchmarks and parity -tests, but they are not part of the stable default API. +tests, but are not part of the stable default API. New feature flags must declare their stability class before merging: diff --git a/docs/determinism.md b/docs/determinism.md index 364d09ad..0896d99f 100644 --- a/docs/determinism.md +++ b/docs/determinism.md @@ -67,7 +67,7 @@ the public hit order still follows the global ordering rule above. ## FastScan -`RankQuantFastscan` is a hidden, optional b=2 pre-ranker. It is deterministic +`RankQuantFastscan` is a public, specialized, optional b=2 pre-ranker. It is deterministic for a fixed index, query, and backend dispatch, and its scalar and AVX-512 FastScan kernels operate on the same quantized LUT inputs. It is not score-equivalent to exact `RankQuant::search_asymmetric`: the global 8-bit LUT diff --git a/fuzz/run_full_fuzz.sh b/fuzz/run_full_fuzz.sh index 71f8ec7b..3c94d73b 100755 --- a/fuzz/run_full_fuzz.sh +++ b/fuzz/run_full_fuzz.sh @@ -9,12 +9,12 @@ # # Requires: a nightly toolchain and cargo-fuzz (`cargo install cargo-fuzz`). # -# HEAVY BY DEFAULT. The defaults are a long, many-core campaign (~3h x 8 -# targets ~= 24h total; FORKS = cores - 2; peak RAM ~= FORKS x RSS_LIMIT_MB) +# HEAVY BY DEFAULT. The defaults are a long, many-core campaign (~3h x 9 +# targets ~= 27h total; FORKS = cores - 2; peak RAM ~= FORKS x RSS_LIMIT_MB) # tuned for a big workstation. On a laptop or smaller box, DIAL IT DOWN with the # env knobs below so you don't peg every core or exhaust RAM. A quick run: # -# SECS_PER_TARGET=120 FORKS=2 ./fuzz/run_full_fuzz.sh # ~16 min on 2 cores +# SECS_PER_TARGET=120 FORKS=2 ./fuzz/run_full_fuzz.sh # ~18 min on 2 cores # # The script prints the estimated total time + RAM up front and, when run # interactively, waits 5s so you can Ctrl-C and re-run with smaller knobs. @@ -29,7 +29,7 @@ # SECS_PER_TARGET per-target wall-clock budget (default 10800 = 3h) # FORKS concurrent fork workers (default = nproc - 2) # RSS_LIMIT_MB per-process RSS cap (default 3072) -# TARGETS space-separated target list (default = all eight) +# TARGETS space-separated target list (default = all nine) # # Examples: # SECS_PER_TARGET=43200 ./fuzz/run_full_fuzz.sh # 12h per target @@ -53,7 +53,7 @@ SECS_PER_TARGET="${SECS_PER_TARGET:-10800}" NCPU="$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 1)" FORKS="${FORKS:-$(( NCPU > 2 ? NCPU - 2 : 1 ))}" RSS_LIMIT_MB="${RSS_LIMIT_MB:-3072}" -TARGETS="${TARGETS:-load_rank load_rankquant load_bitmap load_sign_bitmap roundtrip_rankquant search_rankquant fastscan_b2 signbitmap_rankquant_twostage}" +TARGETS="${TARGETS:-load_rank load_rankquant load_bitmap load_sign_bitmap load_fastscan roundtrip_rankquant search_rankquant fastscan_b2 signbitmap_rankquant_twostage}" read -ra _targets <<<"${TARGETS}" n_targets=${#_targets[@]} diff --git a/ordvec-python/python/ordvec/__init__.py b/ordvec-python/python/ordvec/__init__.py index 18d65556..df608def 100644 --- a/ordvec-python/python/ordvec/__init__.py +++ b/ordvec-python/python/ordvec/__init__.py @@ -15,7 +15,9 @@ probing and manifest-verification helpers remain available through the Rust crates and the ``ordvec-manifest`` CLI; the low-level ``rank_io`` read/write functions are reached through the classes' ``write()`` / ``load()`` methods -rather than exposed as standalone free functions. +rather than exposed as standalone free functions. The specialized +``RankQuantFastscan`` b=2 fast path (and its ``.ovfs`` persistence) is a +Rust-only type and is intentionally not exposed in this binding. ``Bitmap`` exposes the constant-weight top-bucket overlap statistic formalized in the companion ``ordvec-formalization`` Lean repo: under explicit finite diff --git a/src/fastscan.rs b/src/fastscan.rs index dd01b956..a1c3c0e8 100644 --- a/src/fastscan.rs +++ b/src/fastscan.rs @@ -15,12 +15,12 @@ //! a single-shot `add()` (the block layout's tail padding does not //! compose with incremental extend). //! -//! This module is intentionally *not* part of the headline API. The -//! [`RankQuantFastscan`] wrapper is re-exported `#[doc(hidden)]` -//! and the free [`search_asymmetric_fastscan_b2`] entry point is -//! `pub(crate)`: production callers should reach for +//! [`RankQuantFastscan`] is a stable, documented *but specialized* public +//! type — not the headline API. The free [`search_asymmetric_fastscan_b2`] +//! entry point stays `pub(crate)`: production callers should reach for //! [`RankQuant::search_asymmetric`](crate::RankQuant::search_asymmetric), -//! whose AVX-512 → AVX2 → scalar dispatch is the maintained surface. +//! whose AVX-512 → AVX2 → scalar dispatch is the maintained surface. Prefer +//! FastScan only when b=2 scan latency is the binding constraint. //! This latency path is not part of the constant-weight bitmap overlap //! calibration theorem. //! diff --git a/src/lib.rs b/src/lib.rs index 400ff02c..1b131390 100644 --- a/src/lib.rs +++ b/src/lib.rs @@ -33,7 +33,9 @@ //! b=2 storage and 8-bit LUT scoring noise. Reach for it only when scan latency //! is the binding constraint. //! -//! These four families are the retrieval surface. The `experimental` +//! These four families are the headline retrieval surface, with +//! [`RankQuantFastscan`] as the specialized b=2 latency companion above. The +//! `experimental` //! `MultiBucketBitmap` indexed contingency / projection API is a niche //! research/analysis substrate for the bilinear bucket-overlap decomposition — //! it is **not** a default single-score retrieval path and was never diff --git a/src/rank_io.rs b/src/rank_io.rs index 5da76ce2..27e6dc68 100644 --- a/src/rank_io.rs +++ b/src/rank_io.rs @@ -44,7 +44,9 @@ //! //! The supported persistence API is the index types' `write()` / `load()` //! methods: [`Rank`](crate::Rank) / [`RankQuant`](crate::RankQuant) / -//! [`Bitmap`](crate::Bitmap) / [`SignBitmap`](crate::SignBitmap). The +//! [`Bitmap`](crate::Bitmap) / [`SignBitmap`](crate::SignBitmap) / +//! [`RankQuantFastscan`](crate::RankQuantFastscan) (the last via the `.ovfs` +//! format). The //! `write_*` / `load_*` format helpers in this module are **crate-internal** //! (`pub(crate)`); only the `MAX_*` capacity constants are public. //! diff --git a/src/sign_bitmap.rs b/src/sign_bitmap.rs index e27d4a36..2004a514 100644 --- a/src/sign_bitmap.rs +++ b/src/sign_bitmap.rs @@ -7,10 +7,10 @@ //! This is the **SimHash family** primitive (Charikar 2002) applied to //! native embedding coords rather than random projections. For //! contrastively-trained embeddings (e.g. BGE or OpenAI ada), the -//! native coord axes already carry semantically-aligned -//! signal — making direct sign quantization competitive with, and -//! sometimes superior to, learned hash codes or rank-thresholded -//! bitmaps at the same byte budget. +//! native coord axes already carry semantically-aligned signal, so the +//! sign pattern alone preserves much of the angular structure that cosine +//! ranking depends on — which is what lets a `dim/8`-byte sign code serve +//! as a useful candidate-generation substrate. //! //! Score: `agreement(q, d) = dim - popcount(q ^ d)`. The kernel //! computes the per-doc Hamming distance via popcount(XOR); the