Project-Navi · Navi Bot (project-navi-bot) · May 26, 2026
@@ -0,0 +1,69 @@
+# Releasing `ordvec`
+
+> **Publish is held.** A real `cargo publish` / PyPI publish happens only
+> on the maintainer's explicit go. CI never publishes for real: the crate job
+> runs `cargo publish -p ordvec --dry-run --locked`, and the wheel crate
+> (`ordvec-python/`) is marked `publish = false`, so it never goes to crates.io
+> and ships to PyPI separately.
+
+`ordvec` (the Rust crate) and `ordvec` on PyPI (the PyO3 wheel built from
+`ordvec-python/`) are released by **manually dispatching** the release
+workflows. Nothing ships on a tag push or a merge.
+
+## Release pipeline controls
+
+Both `release-crate.yml` and `release-python.yml`:
+
+- are **`workflow_dispatch`-only** (no `push` / tag trigger);
+- run a **`require-ci-green`** gate confirming `ci.yml` (and, for the wheel,
+  `python.yml`) are green for the target commit on `main`;
+- publish via **OIDC trusted publishing** (no long-lived crates.io / PyPI
+  tokens in the repo);
+- emit **SLSA build provenance** (`actions/attest-build-provenance`) **before**
+  publishing — a failed attestation fails the release closed, so nothing ships
+  without provenance recorded first;
+- pin every third-party action by **commit SHA**, set
+  `persist-credentials: false`, and default to `permissions: contents: read`.
+
+`release-python.yml` additionally produces **PEP 740** attestations via the PyPI
+Trusted Publishing step.
+
+### Environment protection (configured in repo settings, not in code)
+
+- **Required reviewer** — each environment (`crates-io`, `pypi`) requires
+  maintainer (`Fieldnote-Echo`) approval before the publish job runs.
+- **Deployment branch** — each environment is restricted to **`main`**, the
+  only ref a release may be dispatched from. This makes "only `main` can
+  publish" a configuration invariant rather than a manual check at approval
+  time.
+
+> These two settings are the supply-chain backstop the workflow code cannot
+> express on its own (THREAT-SUPPLY-001 in [THREAT_MODEL.md](THREAT_MODEL.md)).
+
+### Recommended (open)
+
+- A **`v*` tag-protection ruleset** (block update + deletion) and a basic
+  `main` ruleset, so a release tag cannot be force-moved and `main` cannot be
+  force-pushed/deleted (THREAT-SUPPLY-002). Registries are already immutable
+  (crates.io is yank-only; PyPI burns a version on delete), so this closes the
+  remaining GitHub-side mutability surface.
+
+## Checklist
+
+1. Land everything on `main`; confirm the working tree and `Cargo.lock` are in
+   sync (`cargo build --locked`).
+2. Bump the version (crate `Cargo.toml`, and `ordvec-python` if the wheel
+   changed) and update `CHANGELOG.md`. Commit on `main`.
+3. Confirm CI is **green for that exact `main` SHA** (the dispatch ref must be
+   `main` — the environment will refuse any other branch).
+4. Get the maintainer's explicit go to publish.
+5. Dispatch `release-crate.yml` (crate) and/or `release-python.yml` (wheel)
+   from **`main`**.
+6. Approve the environment deployment when prompted (required reviewer).
+7. Verify the published artifact (crates.io / docs.rs / PyPI) and its
+   provenance, and — for a coordinated release — the Zenodo deposit.
+
+## Coordinated release note
+
+The crate publish, the PyPI wheel, and the paper's Zenodo deposit are
+coordinated (the paper consumes the bindings for a final cold-repro run). Do
+not ship one leg in isolation without the maintainer's go.
@@ -18,4 +18,12 @@ We aim to acknowledge reports within a few business days.
 `ordvec` parses serialized index files (`.tvr` / `.tvrq` / `.tvbm` /
 `.tvsb`); the loaders are fuzzed (`cargo +nightly fuzz`), so
 parsing-robustness reports against the deserialization paths are especially
-welcome.
+welcome. Reports are also welcome against the `unsafe` SIMD kernels (shape /
+bounds invariants), the Python FFI contract (buffer handling, GIL discipline),
+and the release pipeline.
+
+## Threat model
+
+See [`THREAT_MODEL.md`](THREAT_MODEL.md) for the full attack-surface analysis —
+existing defenses, known residual risks, and the library-owned vs
+deployment-owned split.
@@ -0,0 +1,25 @@
+# Codecov is a dashboard + README badge for this repo. The *enforced* coverage
+# gate is the cargo-llvm-cov `--fail-under-lines 78` floor in
+# .github/workflows/coverage.yml. The floor is set below the line-coverage
+# figure measured on the hosted coverage runner, which has no AVX-512: the
+# runtime SIMD dispatch there never reaches the AVX-512 kernels (they are
+# exercised by the separate `avx512` job under Intel SDE). See issue #68.
+coverage:
+  status:
+    project:
+      default:
+        target: 78%       # mirror the enforced cargo-llvm-cov floor
+        threshold: 1%
+    patch:
+      default:
+        # The AVX-512 kernels cannot be covered on the no-AVX-512 coverage
+        # runner, so patch coverage on any SIMD-kernel change is a false signal
+        # (touching a kernel re-indents lines the runner never executes — see
+        # #68). Keep patch advisory rather than blocking PRs on it; real
+        # coverage enforcement lives in the workflow floor above.
+        informational: true
+
+# The cargo-fuzz workspace is excluded from the crate build and is not part of
+# the tested surface measured by cargo-llvm-cov.
+ignore:
+  - "fuzz"
@@ -0,0 +1,55 @@
+# Index file provenance
+
+`ordvec` persists indexes as `.tvr` / `.tvrq` / `.tvbm` / `.tvsb` files and
+reloads them through `Rank::load`, `RankQuant::load`, `Bitmap::load`, and
+`SignBitmap::load`. This note states exactly **what the loaders guarantee and
+what they do not**, so you can decide whether an index file needs out-of-band
+verification before you load it.
+
+## What the loaders validate
+
+The loaders treat the byte stream as **untrusted input** and reject malformed
+files without panicking, aborting, or silently accepting garbage:
+
+- magic + version checks before any allocation;
+- fallible allocation (`try_reserve_exact`) — an attacker-controlled length
+  field returns `InvalidData`, never an OOM abort;
+- all payload sizes computed with `checked_mul`; overflow is an error;
+- a 128 GiB `MAX_PAYLOAD` cap plus `MAX_VECTORS` / `MAX_DIM` caps;
+- an exact file-length match (trailing bytes or short files are rejected);
+- per-row **structural** invariants: `Rank` rows must be a true permutation of
+  `[0, dim)`, `RankQuant` rows must satisfy constant composition, `Bitmap` rows
+  must have exactly `n_top` bits set.
+
+A file that survives all of this is **structurally well-formed**. The four
+loaders are exercised by `cargo fuzz` (the `load_*` targets).
+
+## What the loaders do NOT validate
+
+The loaders validate **structure, not origin or truth**:
+
+- They do **not** authenticate who produced the file or whether it was modified
+  in transit or at rest. There is no signature, MAC, or checksum in the format.
+- A **structurally valid but semantically poisoned** index — one whose ranks,
+  buckets, or bitmaps were crafted to bias retrieval — passes every check and
+  returns attacker-influenced results. This is a *provenance* problem, not a
+  parser problem (THREAT-DESER-002 / THREAT-POISON-\* in
+  [../THREAT_MODEL.md](../THREAT_MODEL.md)).
+
+## Guidance for deployments where index files cross a trust boundary
+
+If you load index files that were produced elsewhere, transferred over a
+network, or stored on shared/mutable infrastructure, verify them **before**
+loading using whatever your deployment already trusts:
+
+- a checksum manifest (e.g. SHA-256) recorded by the build that produced the
+  index, verified at load time;
+- your artifact store's integrity controls;
+- a signature / attestation layer (e.g. Sigstore, GitHub artifact attestations)
+  over the index files.
+
+`ordvec` deliberately ships **no** built-in signing/MAC layer today: without a
+concrete deployment requiring it, an in-format crypto layer would add key
+management with no clear owner. A sidecar verifier (e.g. an `ordvec verify`
+utility, or an external HMAC/BLAKE3 manifest) can be added later **without a
+file-format change** if a real deployment needs tamper-evidence.
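
The checks listed under "What the loaders validate" can be sketched in a few dozen lines of standard-library Rust. This is a minimal illustration only, not the `ordvec` loader: the `DEMO` magic, the header layout, the `u16` rank encoding, the `MAX_VECTORS` value, and the name `load_rank_rows` are all assumptions made for the example. The 128 GiB payload cap and the `u16::MAX` dimension bound are taken from this PR.

```rust
use std::io::{Error, ErrorKind, Result};

// Hypothetical format constants, for illustration only.
const MAGIC: &[u8; 4] = b"DEMO";
const VERSION: u8 = 1;
const HEADER_LEN: usize = 4 + 1 + 8 + 8; // magic, version, n, dim
const MAX_VECTORS: u64 = 1 << 32; // assumed value
const MAX_DIM: u64 = u16::MAX as u64;
const MAX_PAYLOAD: u64 = 128 << 30; // 128 GiB

fn invalid(msg: &str) -> Error {
    Error::new(ErrorKind::InvalidData, msg.to_string())
}

fn read_u64(b: &[u8]) -> u64 {
    let mut a = [0u8; 8];
    a.copy_from_slice(b);
    u64::from_le_bytes(a)
}

/// Parse `n` rows of `dim` little-endian `u16` ranks from an untrusted buffer.
fn load_rank_rows(bytes: &[u8]) -> Result<Vec<u16>> {
    // Magic and version are checked before any allocation.
    if bytes.len() < HEADER_LEN {
        return Err(invalid("short header"));
    }
    if &bytes[..4] != MAGIC || bytes[4] != VERSION {
        return Err(invalid("bad magic or version"));
    }
    let n = read_u64(&bytes[5..13]);
    let dim = read_u64(&bytes[13..21]);
    if n > MAX_VECTORS || dim == 0 || dim > MAX_DIM {
        return Err(invalid("n or dim out of range"));
    }
    // Payload size with checked arithmetic: overflow is an error.
    let payload = n
        .checked_mul(dim)
        .and_then(|cells| cells.checked_mul(2))
        .ok_or_else(|| invalid("payload size overflow"))?;
    if payload > MAX_PAYLOAD {
        return Err(invalid("payload exceeds cap"));
    }
    // Exact file-length match: trailing bytes and short files are rejected.
    if (bytes.len() - HEADER_LEN) as u64 != payload {
        return Err(invalid("file length mismatch"));
    }
    // Fallible allocation: a hostile length field cannot trigger an OOM abort.
    let mut ranks: Vec<u16> = Vec::new();
    ranks
        .try_reserve_exact((payload / 2) as usize)
        .map_err(|_| invalid("allocation failed"))?;
    ranks.extend(
        bytes[HEADER_LEN..]
            .chunks_exact(2)
            .map(|c| u16::from_le_bytes([c[0], c[1]])),
    );
    // Structural invariant: each row is a true permutation of [0, dim).
    let dim = dim as usize;
    let mut seen = vec![false; dim];
    for row in ranks.chunks_exact(dim) {
        seen.fill(false);
        for &r in row {
            let r = r as usize;
            if r >= dim || seen[r] {
                return Err(invalid("row is not a permutation"));
            }
            seen[r] = true;
        }
    }
    Ok(ranks)
}
```

A buffer that passes this function is structurally well-formed in the sense of the note above. It says nothing about who produced the bytes, which is why the checksum or attestation step has to happen outside the loader.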
@@ -64,3 +64,12 @@ path = "fuzz_targets/roundtrip_rankquant.rs"
 test = false
 doc = false
 bench = false
+
+# FastScan b=2 compute path (`RankQuantFastscan`): the one unsafe-heavy scan
+# kernel the `search_rankquant` target does not reach.
+[[bin]]
+name = "fastscan_b2"
+path = "fuzz_targets/fastscan_b2.rs"
+test = false
+doc = false
+bench = false
@@ -0,0 +1,52 @@
+//! libFuzzer target for the FastScan b=2 compute path (`RankQuantFastscan`):
+//! `add` (rank_transform -> bucket -> block-32 re-pack via `pack_fastscan_b2`)
+//! then `search` (`search_asymmetric_fastscan_b2` -> the scalar / AVX-512
+//! VPSHUFB-LUT kernel -> TopK). This is the one `unsafe`-heavy scan path the
+//! `search_rankquant` target does NOT reach: `RankQuant::search_asymmetric`
+//! dispatches the single-rate kernels, never the FastScan block-32 kernel.
+//!
+//! `dim` is fixed at 64 — `RankQuantFastscan::new` requires `dim % 4 == 0`
+//! (b=2 constant composition) and `dim <= u16::MAX`; 64 also gives a
+//! `dim / 2 = 32`-pair inner loop. The fuzzer shapes the doc count (crossing
+//! the 32-doc block boundary so tail-padding blocks are exercised), the
+//! embedding/query values, and `k` (including `k == 0`). Values map to finite
+//! f32: the public API rejects NaN / ±Inf by contract, so raw float bit
+//! patterns would only re-exercise that guard, not the kernel.
+//!
+//! On CI runners without AVX-512 this drives the scalar reference kernel
+//! (`scan_b2_fastscan_scalar`); under Intel SDE it drives the AVX-512 kernel.
+//!
+//! Contract: no panic, abort, or out-of-bounds access on any input.
+#![no_main]
+
+use libfuzzer_sys::fuzz_target;
+use ordvec::RankQuantFastscan;
+
+fuzz_target!(|data: &[u8]| {
+    if data.len() < 3 {
+        return;
+    }
+    // dim % 4 == 0 and dim <= u16::MAX (RankQuantFastscan::new contract).
+    const DIM: usize = 64;
+    // 1..=100 docs — crosses the 32-doc block boundary (1..=4 blocks) so the
+    // tail-padding path (`n % 32 != 0`) is exercised.
+    let n = (data[0] as usize % 100) + 1;
+    let k = data[1] as usize % (n + 1); // 0..=n
+
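+    // `data.len() >= 3` above guarantees `payload` is non-empty, so the
+    // modulo below cannot divide by zero.
+    let payload = &data[2..];
+    let total = (n + 1) * DIM; // n doc vectors plus one query vector
+    // Cycle the payload bytes into finite f32 values in [-128.0, 127.0].
+    let floats: Vec<f32> = (0..total)
+        .map(|i| f32::from(payload[i % payload.len()]) - 128.0)
+        .collect();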
+    let (vecs, query) = floats.split_at(n * DIM);
+
+    let mut idx = RankQuantFastscan::new(DIM);
+    idx.add(vecs);
+    let _ = idx.search(query, k);
+});
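
The input shaping in the target above can be checked on its own, without `libfuzzer_sys` or `ordvec`. The sketch below mirrors the byte-to-shape mapping from the diff. The names `shape_input` and `blocks_for` are hypothetical helpers introduced for illustration and are not part of the PR.

```rust
/// Fixed dimension, as in the fuzz target.
const DIM: usize = 64;
/// Docs per FastScan block, per the target's doc comment.
const BLOCK: usize = 32;

/// Mirror of the target's input shaping. Returns `(n, k, docs, query)`,
/// or `None` when the input is shorter than three bytes.
fn shape_input(data: &[u8]) -> Option<(usize, usize, Vec<f32>, Vec<f32>)> {
    if data.len() < 3 {
        return None;
    }
    let n = (data[0] as usize % 100) + 1; // 1..=100 docs
    let k = data[1] as usize % (n + 1); // 0..=n
    let payload = &data[2..]; // non-empty because data.len() >= 3
    let floats: Vec<f32> = (0..(n + 1) * DIM)
        .map(|i| f32::from(payload[i % payload.len()]) - 128.0)
        .collect();
    let (docs, query) = floats.split_at(n * DIM);
    Some((n, k, docs.to_vec(), query.to_vec()))
}

/// Number of 32-doc blocks needed for `n` docs. The last block is partly
/// padding whenever `n % 32 != 0`.
fn blocks_for(n: usize) -> usize {
    (n + BLOCK - 1) / BLOCK
}
```

Every byte maps to a finite value in `[-128.0, 127.0]`, so the NaN / ±Inf guard in the public API is never the code under test, and a first byte of 31 versus 32 moves the doc count across the block boundary.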