Project-Navi · Navi Bot (project-navi-bot) · Jun 20, 2026 · Jun 19, 2026
@@ -9,10 +9,33 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Security
 
+- Hardened the Python binding's GIL-released search, candidate, scoring, and
+  `add` paths: NumPy inputs are now copied into Rust-owned buffers before
+  `py.detach`, so safe Python code cannot race a detached Rust read by mutating
+  the same array from another thread. This intentionally trades zero-copy
+  detached reads for race-free copied inputs; large calls may temporarily require
+  an additional input-sized buffer.
 - Updated release governance to document and audit the two-approver
   `crates-io` / `pypi` GitHub Environment gates: `Fieldnote-Echo` and
   `toadkicker` are listed as required reviewers, self-review is blocked, and a
   30-minute wait timer applies before registry publish jobs can proceed.
+- Exposed the calibration-profile byte limit through the `ordvec-manifest`
+  Python bindings, including the default constant, `default_resource_limits()`,
+  and verifier/create keyword arguments.
+- Aligned `.ovfs` / `OVFS` security and provenance docs with the now-public
+  `RankQuantFastscan` persistence loader and fuzz target.
+- Updated formalization links and release invariants after the companion
+  `ordvec-formalization` repository moved under `Project-Navi`.
+
+### Fixed
+
+- Added a persisted-format registry that drives probe, manifest-coverage, and
+  C-ABI load decisions from one table; `.ovfs` stays explicitly known but
+  neither probeable nor manifest-covered, and the C ABI now reports it as an
+  unsupported format rather than a corrupt index.
+- Hid the `SubsetScratch::capacities_for_test` helper behind the non-default
+  `test-utils` feature and cleaned up stale release-doc comments around
+  FastScan and the b=8 bucket rustdoc.
 
 ## 0.5.0 - 2026-06-19
 

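Reviewer sketch (not part of the patch): the registry entry in the changelog hunk above can be pictured as one static table that every load decision consults. The names `FormatSpec`, `REGISTRY`, `lookup`, `decide_load`, and `LoadDecision` are hypothetical, the table is abbreviated (legacy `TV*` magics and `OVSB` are omitted), and the `probeable` values for every format other than `.ovfs` are assumptions. Only the magics and the "known but unsupported" treatment of `.ovfs` come from the diff.

```rust
/// Outcome of a load request: loadable, known but unsupported,
/// or unrecognized (reported as a corrupt index).
#[derive(Debug, PartialEq, Eq)]
pub enum LoadDecision {
    Load(&'static str),
    Unsupported(&'static str),
    Corrupt,
}

/// One row of the hypothetical registry table.
pub struct FormatSpec {
    pub magic: [u8; 4],
    pub name: &'static str,
    /// Assumed flag values, except for OVFS (stated in the changelog).
    pub probeable: bool,
    pub ffi_loadable: bool,
}

/// Single source of truth; abbreviated for illustration.
pub const REGISTRY: &[FormatSpec] = &[
    FormatSpec { magic: *b"OVRQ", name: "RankQuant", probeable: true, ffi_loadable: true },
    FormatSpec { magic: *b"OVBM", name: "Bitmap", probeable: true, ffi_loadable: true },
    FormatSpec { magic: *b"OVR1", name: "Rank", probeable: true, ffi_loadable: false },
    FormatSpec { magic: *b"OVFS", name: "RankQuantFastscan", probeable: false, ffi_loadable: false },
];

/// Finds the table row for a four-byte magic, if the format is known.
pub fn lookup(magic: &[u8; 4]) -> Option<&'static FormatSpec> {
    REGISTRY.iter().find(|spec| &spec.magic == magic)
}

/// Maps a magic to a load decision using only the table.
pub fn decide_load(magic: &[u8; 4]) -> LoadDecision {
    match lookup(magic) {
        None => LoadDecision::Corrupt,
        Some(spec) if spec.ffi_loadable => LoadDecision::Load(spec.name),
        Some(spec) => LoadDecision::Unsupported(spec.name),
    }
}
```

With this shape, adding a format means adding one row, and a known-but-unsupported magic can no longer fall through to the "corrupt" branch by accident.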
@@ -58,9 +58,9 @@ exclude = [
     "tests/release_signed_release_invariants.sh",
 ]
 
-# docs.rs build configuration: build with default features only, so the
-# experimental MultiBucketBitmap scaffold stays off the published docs.
-# (The `#[doc(hidden)]` FastScan path is hidden by its attribute either way.)
+# docs.rs build configuration: build with default features only. Stable default
+# APIs, including `RankQuantFastscan`, are documented; the experimental
+# MultiBucketBitmap scaffold stays off the published docs.
 [package.metadata.docs.rs]
 all-features = false
 

@@ -170,7 +170,7 @@ are machine-checked in Lean 4, both `sorry`-free on Lean's standard axiom base
   signal model makes an overlap-count threshold Bayes-optimal among
   deterministic admission rules, and the uniform constant-weight bitmap null
   assigns that same threshold event exactly the hypergeometric upper tail — in
-  [`ordvec-formalization`](https://github.com/Fieldnote-Echo/ordvec-formalization)
+  [`ordvec-formalization`](https://github.com/Project-Navi/ordvec-formalization)
   (theorem `exists_uniformBitmapOverlapTail_finiteBayesRisk_le_and_hypergeomTail`).
 
 This is an *in-model* result. It proves the rule shape and the idealized finite
@@ -276,9 +276,10 @@ The runtime dependency floor is `numpy>=2.2`.
 The consolidated cross-language ownership and lifetime contract is in
 [`docs/bindings-safety.md`](docs/bindings-safety.md).
 
-Python search, candidate-generation, and scoring methods release the GIL and
-read NumPy inputs in place. Callers must not mutate query, corpus, candidate,
-or scoring input arrays passed to those methods until the call returns.
+Python search, candidate-generation, scoring, and `add` methods release the GIL
+after copying NumPy inputs into Rust-owned buffers, so ordinary Python in-place
+array mutation in another thread cannot race the detached Rust scan. Large calls
+may temporarily require an additional input-sized buffer.
 
 The C ABI allows concurrent search and info calls on one loaded handle.
 `ordvec_index_free` must not race with any other call on the same handle.
@@ -310,10 +311,10 @@ candidate slices passed to `Search` until the call returns.
   [`docs/compatibility-policy.md`](docs/compatibility-policy.md) defines the
   stable, experimental, repo-local sidecar, persisted-format, examples/docs,
   MSRV, and release-note review surfaces.
-- **Formal proof spine:** [`ordvec-formalization`](https://github.com/Fieldnote-Echo/ordvec-formalization),
-  including its [`proof-spine`](https://github.com/Fieldnote-Echo/ordvec-formalization/blob/main/docs/proof-spine.md),
-  [`theorem-map`](https://github.com/Fieldnote-Echo/ordvec-formalization/blob/main/docs/theorem-map.md),
-  and [`reviewer brief`](https://github.com/Fieldnote-Echo/ordvec-formalization/blob/main/docs/reviewer-brief.md).
+- **Formal proof spine:** [`ordvec-formalization`](https://github.com/Project-Navi/ordvec-formalization),
+  including its [`proof-spine`](https://github.com/Project-Navi/ordvec-formalization/blob/main/docs/proof-spine.md),
+  [`theorem-map`](https://github.com/Project-Navi/ordvec-formalization/blob/main/docs/theorem-map.md),
+  and [`reviewer brief`](https://github.com/Project-Navi/ordvec-formalization/blob/main/docs/reviewer-brief.md).
 - **API docs:** <https://docs.rs/ordvec>, <https://docs.rs/ordvec-manifest>
 - **Paper (OrdVec / RankQuant):** _link TBD — see
   [Research collaboration](#research-collaboration)._

@@ -16,12 +16,12 @@ Use GitHub's private vulnerability reporting:
 We aim to acknowledge reports within a few business days.
 
 `ordvec` parses serialized index files (`.ovr` / `.ovrq` / `.ovbm` /
-`.ovsb`; the loaders also accept the legacy `.tvr` / `.tvrq` / `.tvbm` /
-`.tvsb` magics); the loaders are fuzzed (`cargo +nightly fuzz`), so
-parsing-robustness reports against the deserialization paths are especially
-welcome. Reports are also welcome against the `unsafe` SIMD kernels (shape /
-bounds invariants), the Python FFI contract (buffer handling, GIL discipline),
-and the release pipeline.
+`.ovsb` / `.ovfs`; `.ovfs` uses the `OVFS` FastScan magic, and the other loaders
+also accept the legacy `.tvr` / `.tvrq` / `.tvbm` / `.tvsb` magics); the loaders
+are fuzzed (`cargo +nightly fuzz`), so parsing-robustness reports against the
+deserialization paths are especially welcome. Reports are also welcome against
+the `unsafe` SIMD kernels (shape / bounds invariants), the Python FFI contract
+(buffer handling, GIL discipline), and the release pipeline.
 
 ## Threat model
 

@@ -49,14 +49,16 @@ See also: [`SECURITY.md`](SECURITY.md) (reporting), [`RELEASING.md`](RELEASING.m
 
 ## Maintenance budget
 
-`ordvec` is maintained by a single primary contributor. Mitigations are
-prioritized when they are (1) low-maintenance once merged, (2) enforceable by
-tests or CI, (3) local to the library boundary, and (4) unlikely to add
-operational burden downstream. Heavyweight controls (mandatory index signing,
-long-running fuzz farms, service-level admission control) are documented as
-**deployment guidance** until there is maintainer capacity to own them. The
-absence of a second maintainer is itself a tracked supply-chain residual
-(see THREAT-SUPPLY-001).
+`ordvec` has one project lead plus an additional maintainer / release
+approver. Mitigations are prioritized when they are (1) low-maintenance once
+merged, (2) enforceable by tests or CI, (3) local to the library boundary, and
+(4) unlikely to add operational burden downstream. Heavyweight controls
+(mandatory index signing, long-running fuzz farms, service-level admission
+control) are documented as **deployment guidance** unless the project has
+maintainer capacity to own them. Release publication requires a non-triggering
+approver through protected GitHub Environments; the residual release
+supply-chain risk is approver account compromise / collusion, not a
+single-owner project structure (see THREAT-SUPPLY-001).
 
 ---
 

@@ -25,10 +25,13 @@ files without panicking, aborting, or silently accepting garbage:
 - an exact file-length match (trailing bytes or short files are rejected);
 - per-row **structural** invariants: `Rank` rows must be a true permutation of
   `[0, dim)`, `RankQuant` rows must satisfy constant composition, `Bitmap` rows
-  must have exactly `n_top` bits set.
+  must have exactly `n_top` bits set, and direct `RankQuantFastscan` `.ovfs`
+  rows must use valid FastScan nibbles, satisfy b=2 constant composition, and
+  have zero block-tail padding.
 
-A file that survives all of this is **structurally well-formed**. The four
-loaders are exercised by `cargo fuzz` (the `load_*` targets).
+A file that survives all of this is **structurally well-formed**. The five
+loaders are exercised by `cargo fuzz` (the `load_*` targets, including
+`load_fastscan` for `.ovfs`).
 
 ## What the loaders do NOT validate
 

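Reviewer sketch (not part of the patch): the structural checks listed in the hunk above can be illustrated with three small predicates. The function names, the `u32` row element type, the `u64` bitmap word type, and the "header plus fixed-size rows" length formula are all assumptions made for illustration; the crate's real loaders and on-disk layouts may differ.

```rust
/// True when `row` is a permutation of `[0, dim)`, with `dim == row.len()`.
pub fn is_permutation(row: &[u32]) -> bool {
    let mut seen = vec![false; row.len()];
    for &value in row {
        let idx = value as usize;
        // Reject out-of-range values and repeated values.
        if idx >= row.len() || seen[idx] {
            return false;
        }
        seen[idx] = true;
    }
    true
}

/// True when exactly `n_top` bits are set across all bitmap words.
pub fn has_exact_popcount(words: &[u64], n_top: u32) -> bool {
    let set: u64 = words.iter().map(|w| u64::from(w.count_ones())).sum();
    set == u64::from(n_top)
}

/// True when the file is exactly header plus rows long, with
/// overflow-checked arithmetic so a hostile row count cannot wrap.
pub fn length_matches(header_len: u64, n_rows: u64, row_bytes: u64, file_len: u64) -> bool {
    n_rows
        .checked_mul(row_bytes)
        .and_then(|body| body.checked_add(header_len))
        == Some(file_len)
}
```

Each predicate returns `false` rather than panicking, which matches the stated goal of rejecting malformed files without aborting.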
@@ -130,7 +130,7 @@ unknown embedding distribution.
 **Checked finite model: symmetry, quotient sufficiency, threshold,
 calibration.** The proof chain now has a larger machine-checked middle
 than the implementation docs used to claim. In
-[`ordvec-formalization`](https://github.com/Fieldnote-Echo/ordvec-formalization),
+[`ordvec-formalization`](https://github.com/Project-Navi/ordvec-formalization),
 Lean proves that literal bitmap overlap is the canonical invariant
 under query-preserving coordinate relabelings; finite quotient
 sufficiency reduces the admission decision to ordered overlap

@@ -13,8 +13,9 @@ still own scheduling, path trust, input mutability, and deployment provenance.
   Mutation methods such as `add` require exclusive access.
 - Python search, candidate-generation, scoring, and `add` methods release the
   GIL while Rust performs the heavy work. PyO3 still enforces object borrow
-  rules, but caller-owned NumPy arrays are read in place while the GIL is
-  released.
+  rules, and the binding copies NumPy input arrays into Rust-owned buffers
+  before releasing the GIL. Large calls may temporarily require an additional
+  input-sized buffer.
 - The C ABI permits concurrent `ordvec_index_search`,
   `ordvec_index_probe`, and `ordvec_index_info` calls on one loaded handle.
   `ordvec_index_free` must not race with any other call on that handle.
@@ -23,11 +24,13 @@ still own scheduling, path trust, input mutability, and deployment provenance.
 
 ## Borrowed Inputs
 
-Caller-provided buffers are borrowed for the duration of the call and are not
-retained after the function returns.
+Caller-provided Rust slices, C buffers, and Go slices are borrowed for the
+duration of the call and are not retained after the function returns. Python
+NumPy inputs that cross a GIL-released call are copied before the GIL is
+released.
 
-- Do not mutate Rust slices, NumPy arrays, C buffers, or Go slices while a call
-  that received them is in progress.
+- Do not mutate Rust slices, C buffers, or Go slices while a call that received
+  them is in progress.
 - Query, corpus, candidate, output, hit, and stats buffers remain caller-owned
   unless a specific API says otherwise.
 - Candidate lists are entry lists, not sets. Duplicate candidate IDs are scored

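Reviewer sketch (not part of the patch): the copy-before-release rule for Python NumPy inputs can be illustrated in plain Rust, without PyO3. The helper `sum_detached` is hypothetical; it stands in for "copy the caller's buffer into a Rust-owned `Vec`, then do the heavy work somewhere the caller can no longer observe or disturb". The real binding's types and detach mechanism are not shown here.

```rust
use std::thread::{self, JoinHandle};

/// Copies `input` into an owned buffer, then sums it on another thread.
/// The worker only ever sees the snapshot, never the caller's memory.
pub fn sum_detached(input: &[f32]) -> JoinHandle<f32> {
    // This is the "additional input-sized buffer" the docs mention.
    let owned: Vec<f32> = input.to_vec();
    thread::spawn(move || owned.iter().sum::<f32>())
}
```

A caller can then mutate its own buffer while the work is in flight, for example `let h = sum_detached(&data); data[0] = 100.0;`, and `h.join()` still returns the sum of the values as they were at call time. The cost is one extra copy per call, which is the trade the changelog describes.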
@@ -34,7 +34,7 @@ tiered below by **what survived scrutiny**. Read the tiers, not every doc.
 |-----|-------|
 | [density_collapse_results.md](density_collapse_results.md) | **Mechanism.** RankQuant b=2 density collapse = Hamming-near codes the scorer can't separate. Among those lookalikes, true neighbours have lower intra-code Kendall-tau (gap ≈ 0.04, CI > 0). Real but small. |
 | [tau_rerank_bakeoff_results.md](tau_rerank_bakeoff_results.md) | **The verdict.** Does that tau signal beat b=4? NO — b=4 wins even at the tau ceiling; tau scores below b=2's own ordering. Signal is real-but-inert; just use b=4. Closes the line: research, not a feature. |
-| [crt_seam_oracle_results.md](crt_seam_oracle_results.md) | CRT vernier seam theorem — exhaustive finite proof: lcm spacing, one coincidence/period, capped density `∏min(2t+1,m_i)/m_i`. Lean 4 formalization lives in the companion repo: [ordvec-formalization#17](https://github.com/Fieldnote-Echo/ordvec-formalization/pull/17) (open PR, `sorry`-free). |
+| [crt_seam_oracle_results.md](crt_seam_oracle_results.md) | CRT vernier seam theorem — exhaustive finite proof: lcm spacing, one coincidence/period, capped density `∏min(2t+1,m_i)/m_i`. Lean 4 formalization lives in the companion repo: [ordvec-formalization#17](https://github.com/Project-Navi/ordvec-formalization/pull/17) (open PR, `sorry`-free). |
 | [shard_recall_results.md](shard_recall_results.md) | Controlled ablation (post RNG-desync fix): random phase offsets add nothing vs aligned grids across R random directions. |
 | [oblivious_directions_results.md](oblivious_directions_results.md) | **The directions arc (round 2).** Data-oblivious low-discrepancy directions (golden-angle / Sobol / Kronecker) do NOT beat iid-random for training-free routing — across 5 encoders (nomic, bge-m3, bge-large, snowflake-arctic-v2, harrier-oss) at real intrinsic dim 18–24. CLASS-DEAD, pre-registered, replicated (the one mid-ladder flicker failed to replicate). Centering removes the cone but fails at b=4 (penalty grows with capacity). One robust positive: data-aligned (PCA) directions lead at higher ID — the lever is data-alignment, which training-free forbids. Also **resolves the twonn_id PARTIAL**: real-corpus ID measured at ~18–24 across 5 encoders, and ID is a **corpus** property (repo≈13 vs fiqa≈24, same encoder), not an encoder constant. Probes: `uniformity_lemma.rs`, `overlap_decomp.rs`, `centering_recall.rs`, `subspace_directions.rs`, `partition_balance.rs`, `fib_*.rs`. |
 | [length_mixture_lake_results.md](length_mixture_lake_results.md) | **Path B — chunk-length-mixture lake (closes the synthetic-lake arc).** Same fiqa docs embedded at 4 chunk lengths {128,256,512,1100} unioned into a 230k-doc lake; b=4 raw R@10 vs FP32 cosine is **immune** (+0.002, CR@100=1.0). Bonus measurement of the "chunk length is a third geometry axis" claim: real but **small and co-axial** — R̄ spreads only 0.705→0.723 over an 8.6× length range, cone axes ≥0.986 aligned (not the distinct geometries the mixture framing imagined). With Phase B (multi-domain) this leaves every synthetic lake pathology — multi-cone, hub, multi-length — benign for "spend the bits, b=4." Probe: `make_length_lake.py` + `centering_recall.rs`. |

@@ -1,7 +1,7 @@
 # CRT seam oracle — corrected vernier theorem (exhaustive finite proof)
 
 > Lean 4 formalization of this theorem lives in the companion repo:
-> [ordvec-formalization#17](https://github.com/Fieldnote-Echo/ordvec-formalization/pull/17)
+> [ordvec-formalization#17](https://github.com/Project-Navi/ordvec-formalization/pull/17)
 > (open PR, `sorry`-free).
 
 `examples/crt_seam_oracle.rs` enumerates the full ring Z/M to verify the

@@ -234,6 +234,10 @@ void ordvec_index_free(ordvec_index_t *index);
  * and may be unsorted or duplicated. Duplicate candidates are scored as
  * separate entries and can produce duplicate hits; callers that need unique
  * output rows must deduplicate before calling.
+ * Full search is represented by `candidate_count == 0 && candidate_rows == NULL`.
+ * ABI v1 treats `candidate_count == 0 && candidate_rows != NULL` as
+ * `ORDVEC_STATUS_BAD_ARGUMENT`; callers should short-circuit explicit empty
+ * survivor sets before crossing the ABI.
  *
  * # Safety
  *

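Reviewer sketch (not part of the patch): the candidate-argument rule documented in the header hunk above amounts to a small decision table. `CandidateArgs`, `classify_candidates`, and `should_call_abi` are hypothetical names, the `u32` row type is an assumption, and treating a non-zero count with a null pointer as a bad argument is also an assumption, since the comment does not cover that case.

```rust
/// How a (pointer, count) pair is interpreted at the ABI boundary.
#[derive(Debug, PartialEq, Eq)]
pub enum CandidateArgs {
    FullSearch,
    Subset { count: usize },
    BadArgument,
}

/// Classifies the pair without dereferencing the pointer.
pub fn classify_candidates(candidate_rows: *const u32, candidate_count: usize) -> CandidateArgs {
    match (candidate_count, candidate_rows.is_null()) {
        (0, true) => CandidateArgs::FullSearch,
        // Documented: an explicit empty set is rejected in ABI v1.
        (0, false) => CandidateArgs::BadArgument,
        // Assumption: a non-zero count needs a non-null pointer.
        (_, true) => CandidateArgs::BadArgument,
        (n, false) => CandidateArgs::Subset { count: n },
    }
}

/// Caller-side guard: `None` means full search, `Some(&[])` means
/// "no survivors", which should short-circuit before the ABI call.
pub fn should_call_abi(survivors: Option<&[u32]>) -> bool {
    !matches!(survivors, Some(rows) if rows.is_empty())
}
```

The guard exists because an empty Rust slice or Go slice usually has a non-null data pointer, so passing it through unchanged would hit the `BAD_ARGUMENT` path rather than returning zero hits.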
@@ -10,7 +10,10 @@ use std::path::Path;
 use std::ptr;
 use std::time::Instant;
 
-use ordvec::{probe_index_metadata, Bitmap, IndexKind, IndexMetadata, IndexParams, RankQuant};
+use ordvec::{
+    probe_index_metadata, Bitmap, FfiLoadSupport, IndexKind, IndexMetadata, IndexParams,
+    PersistedFormat, RankQuant,
+};
 
 pub type ordvec_status_t = u32;
 pub type ordvec_index_kind_t = u32;
@@ -733,25 +736,29 @@ pub unsafe extern "C" fn ordvec_index_load(
             .map_err(|err| io_to_ffi(err, "stat index"))?
             .len();
 
-        // Accept both the current `OV*` magics and the legacy turbovec-era
-        // `TV*` magics (back-compat) — mirrors the loaders in `rank_io.rs`.
-        let index = match &magic {
-            b"OVRQ" | b"TVRQ" => LoadedIndex::RankQuant(
-                RankQuant::load(path).map_err(|err| io_to_ffi(err, "load RankQuant index"))?,
-            ),
-            b"OVBM" | b"TVBM" => LoadedIndex::Bitmap(
-                Bitmap::load(path).map_err(|err| io_to_ffi(err, "load Bitmap index"))?,
-            ),
-            b"OVR1" | b"OVSB" | b"TVR1" | b"TVSB" => {
-                return Err(FfiError::new(
-                    ORDVEC_STATUS_UNSUPPORTED_FORMAT,
-                    "ABI v1 supports only RankQuant and Bitmap indexes",
-                ))
+        let spec = ordvec::format::lookup_magic(&magic).ok_or_else(|| {
+            FfiError::new(
+                ORDVEC_STATUS_CORRUPT_INDEX,
+                "unrecognized ordvec index magic",
+            )
+        })?;
+        let index = match spec.ffi_load {
+            FfiLoadSupport::Supported => match spec.format {
+                PersistedFormat::RankQuant => LoadedIndex::RankQuant(
+                    RankQuant::load(path).map_err(|err| io_to_ffi(err, "load RankQuant index"))?,
+                ),
+                PersistedFormat::Bitmap => LoadedIndex::Bitmap(
+                    Bitmap::load(path).map_err(|err| io_to_ffi(err, "load Bitmap index"))?,
+                ),
+                _ => unreachable!("only RankQuant and Bitmap are FFI-loadable in ABI v1"),
+            },
+            FfiLoadSupport::Unsupported { reason } => {
+                return Err(FfiError::new(ORDVEC_STATUS_UNSUPPORTED_FORMAT, reason))
             }
             _ => {
                 return Err(FfiError::new(
-                    ORDVEC_STATUS_CORRUPT_INDEX,
-                    "unrecognized ordvec index magic",
+                    ORDVEC_STATUS_UNSUPPORTED_FORMAT,
+                    "ABI v1 does not support this persisted index format",
                 ))
             }
         };
@@ -894,6 +901,10 @@ pub unsafe extern "C" fn ordvec_index_free(index: *mut ordvec_index_t) {
 /// and may be unsorted or duplicated. Duplicate candidates are scored as
 /// separate entries and can produce duplicate hits; callers that need unique
 /// output rows must deduplicate before calling.
+/// Full search is represented by `candidate_count == 0 && candidate_rows == NULL`.
+/// ABI v1 treats `candidate_count == 0 && candidate_rows != NULL` as
+/// `ORDVEC_STATUS_BAD_ARGUMENT`; callers should short-circuit explicit empty
+/// survivor sets before crossing the ABI.
 ///
 /// # Safety
 ///
@@ -1500,14 +1511,25 @@ mod tests {
         sign.add(&[0.0f32; 64]);
         sign.write(&sign_path).unwrap();
 
+        let fastscan_path = temp_path("fastscan", "ovfs");
+        let mut fastscan = Vec::new();
+        fastscan.extend_from_slice(b"OVFS");
+        fastscan.push(1);
+        fastscan.extend_from_slice(&8u32.to_le_bytes());
+        fastscan.extend_from_slice(&0u32.to_le_bytes());
+        std::fs::File::create(&fastscan_path)
+            .unwrap()
+            .write_all(&fastscan)
+            .unwrap();
+
         let corrupt_path = temp_path("corrupt", "ovrq");
         std::fs::File::create(&corrupt_path)
             .unwrap()
             .write_all(b"OVRQ\x01")
             .unwrap();
 
         unsafe {
-            for path in [&rank_path, &sign_path] {
+            for path in [&rank_path, &sign_path, &fastscan_path] {
                 let cpath = CString::new(path.to_str().unwrap()).unwrap();
                 let mut out = ptr::null_mut();
                 assert_eq!(
@@ -1526,6 +1548,7 @@ mod tests {
         }
         std::fs::remove_file(rank_path).ok();
         std::fs::remove_file(sign_path).ok();
+        std::fs::remove_file(fastscan_path).ok();
         std::fs::remove_file(corrupt_path).ok();
     }
 }
@@ -11,6 +11,7 @@
     DEFAULT_MAX_AUXILIARY_ARTIFACT_BYTES,
     DEFAULT_MAX_AUXILIARY_ARTIFACTS,
     DEFAULT_MAX_CACHED_REPORT_BYTES,
+    DEFAULT_MAX_CALIBRATION_PROFILE_BYTES,
     DEFAULT_MAX_ENCODER_DISTORTION_PROFILE_BYTES,
     DEFAULT_MAX_MANIFEST_BYTES,
     DEFAULT_MAX_REPORT_ISSUES,
@@ -37,6 +38,7 @@
     "DEFAULT_MAX_ROW_IDENTITY_TRACKED_DB_ID_BYTES",
     "DEFAULT_MAX_AUXILIARY_ARTIFACTS",
     "DEFAULT_MAX_AUXILIARY_ARTIFACT_BYTES",
+    "DEFAULT_MAX_CALIBRATION_PROFILE_BYTES",
     "DEFAULT_MAX_ENCODER_DISTORTION_PROFILE_BYTES",
     "DEFAULT_MAX_REPORT_ISSUES",
     "DEFAULT_MAX_CACHED_REPORT_BYTES",