diff --git a/Cargo.toml b/Cargo.toml
index 8e7ec30c..9f38180d 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -42,6 +42,7 @@ exclude = [
     "docs/FOLLOWUP_BODY_KERNEL_TIE_BREAK.md",
     "docs/INDEX_PROVENANCE.md",
     "docs/c-api.md",
+    "experiments/",
     "fuzz/",
     "ordvec-ffi/",
     "ordvec-go/",
diff --git a/README.md b/README.md
index 41104549..ff81c9ab 100644
--- a/README.md
+++ b/README.md
@@ -82,6 +82,12 @@ structure of each vector on its own:
   fit step. Encoding is a per-vector rank (or sign) transform — index the
   very first vector with no prior data, and never refit when the corpus
   drifts.
+- **Robust by construction on messy corpora.** Because the code is built
+  from per-vector ranks (magnitude discarded) with no global frequency / IDF
+  term, the two things that corrupt learned codebooks — near-duplicate hubs
+  and mixed chunk lengths — have nothing to grab: b=4 R@10 moved −0.002 under
+  15% templated-hub injection and +0.002 across a four-chunk-length mixture.
+  ([details + honest scope](docs/RANK_MODES.md#a-consequence-robust-by-construction-on-messy-corpora))
 - **Zero system dependencies.** Pure Rust — no BLAS / LAPACK / `ndarray` /
   `faer`. Builds and cross-compiles cleanly, including to `aarch64` and
   `wasm32`.
diff --git a/docs/RANK_MODES.md b/docs/RANK_MODES.md
index 381ebee2..01b54091 100644
--- a/docs/RANK_MODES.md
+++ b/docs/RANK_MODES.md
@@ -158,6 +158,43 @@ fraction of the full-scan latency. The `bench_rank` run prints this
 as its `TwoStage ...` rows with the per-M candidate-recall (`CR`)
 figure attached.
 
+## A consequence: robust by construction on messy corpora
+
+The same two properties that make the score combinatorial — it is a
+function of **ranks** (magnitude discarded) and carries **no global
+per-coordinate frequency / IDF term** — also make the encoding
+*robust* on the heterogeneous data a real corpus dump contains. The
+two failure modes that corrupt learned codebooks and IDF-weighted
+schemes have nothing to grab here:
+
+- **Templated / near-duplicate hubs.** A hub poisons schemes with a
+  global frequency term by inflating it. RankQuant has no such term, so
+  injected boilerplate stays far from real queries in rank space and
+  never enters the top-k. Measured: injecting templated near-duplicate
+  clusters into a 281k-doc multi-domain union at rising prevalence
+  moved b=4 R@10 by **−0.002 through 15%** — flat to noise.
+- **Mixed chunk lengths.** Documents chunked at every length at once
+  produce cones of slightly different tightness. Because the per-vector
+  rank transform discards the magnitude that a tightness shift moves,
+  the *ranks* — and the score — are unchanged. Measured: a 230k-doc
+  lake unioning the same documents embedded at four chunk lengths
+  {128, 256, 512, 1100} moved b=4 R@10 by **+0.002** (candidate recall
+  stayed 1.0) versus the single-length baseline.
+
+This is robustness *because* the code is deliberately oblivious, not in
+spite of it — the property is a direct corollary of the
+constant-composition mechanism above, no tuning involved.
+
+**Scope (honest).** These are clean embeddings of curated corpora
+(fiqa / nq / quora) made messy synthetically — multi-cone unions,
+templated hubs, and chunk-length mixtures. They establish that
+multi-domain, hub-heavy, and multi-length geometry are benign. They do
+**not** model OCR garbage, mixed-language, or broken-encoding text,
+which clean embeddings cannot reproduce; that case is untested. The
+full pre-registered record (and the matching negative for
+oblivious-*direction* structure across five encoders) is in
+[`experiments/ordinal-routing-research/`](../experiments/ordinal-routing-research/README.md).
+
 ## Synthetic stress-test numbers
 
 This is the clean-checkout stress test — regenerated by the default
diff --git a/experiments/ordinal-routing-research/ADVERSARIAL_REVIEW.md b/experiments/ordinal-routing-research/ADVERSARIAL_REVIEW.md
new file mode 100644
index 00000000..ad7cfbde
--- /dev/null
+++ b/experiments/ordinal-routing-research/ADVERSARIAL_REVIEW.md
@@ -0,0 +1,107 @@
+# Adversarial review findings (three independent hostile reviewers)
+
+This branch was reviewed by three adversarial agents (Rust probes, math proofs,
+Lean skeleton). Their findings are recorded here verbatim-in-summary so the PR
+carries its own critique. Conclusions are tiered by what SURVIVED review.
+
+## SOUND (ships as claimed)
+
+- **CRT seam oracle** (`examples/crt_seam_oracle.rs`): exhaustive finite proof.
+  Coincidence spacing = lcm; exactly one all-L coincidence per period; the
+  honest negative result that phases cannot generically rescue a pointwise floor.
+  All three reviewers independent-agree this file is honest.
+- **TwoNN metric fix** (chord vs cosine): the squared-distance bug fix is correct
+  and validated on sphere controls.
+
+## CORRECTED IN THIS BRANCH (was wrong/loose, now fixed)
+
+- **CRT density closed form**: must be `∏ min(2t+1, m_i)/m_i`, NOT `∏(2t+1)/m_i`.
+  The literal product exceeds 1 at t=3 (period-3 grid saturates) — the SAME error
+  class the doc attributed to the rejected `(2t)^L/M` form. The Rust code already
+  caps with `.min(p)`; the markdown and Lean statements are corrected to match.
+  Valid precondition for the uncapped form: `2t+1 ≤ min_i m_i` (t≤1 here).
+- **Lean `crtEquiv`**: false as typed — `ZMod 0 = ℤ` breaks the cardinality
+  match. Needs `[∀ i, NeZero (m i)]`. Signature corrected; `hcongr` lemma plan
+  corrected to `Equiv.subtypeEquiv` + `Fintype.card_congr` (the cited
+  `Equiv.card_filter_map` does not exist).
+
+## DEMOTED TO EXPLORATORY (known confounds — NOT a settled finding)
+
+- **"Routing keys are super-Poisson, never rigid"** (`spectral_probe.rs`,
+  `withdrawn/corpus_zoo_results.md`): **WITHDRAWN — the probe does not measure what it
+  claims.** Attempted salvage (added `--unfold-smooth K`, a K-knot empirical
+  unfold) INVERTED the result: under the smooth unfold the isotropic corpus
+  reads super-Poisson (Σ²/L 1.7→12) and clustered reads LOWER (1.4→6.8) — the
+  opposite ordering from the Gaussian-unfold version. So the two unfolds
+  disagree and NEITHER is validated against analytic ground truth. The
+  Gaussian-unfold "isotropic = clean Poisson 0.99" was not a control passing —
+  it was the single marginal the wrong unfold happened to fit. The smooth unfold
+  has its own artifact (knot-scale + interpolation structure). CONCLUSION: the
+  number-variance empirical finding is UNSUPPORTED in either direction. Fixing it
+  requires rebuilding the estimator against a case with KNOWN analytic number
+  variance (e.g. a stationary process with closed-form Σ²(L)) to calibrate the
+  unfold + window estimator before trusting any corpus reading. This is open
+  follow-up work, not a patch. The THEORY (rigidity_impossibility_proofs.md) is
+  unaffected — it does not depend on this probe.
+- **"Random offsets redundant / coprime adds nothing"** (`shard_recall.rs`):
+  **Bug L FIXED.** `build_projs` now seeds direction and phase RNGs separately
+  and identically across arms, so aligned vs random-offset share the same R
+  directions and the ablation is clean. Re-run: aligned 0.9095 vs random-offset
+  0.9080 (tied) — the "random offsets add nothing" claim now holds on a
+  controlled comparison. Still-open caveat: the "fair envelope" undersells the
+  coprime arms (they subdivide buckets and saturate below high budgets), and
+  coprimality across R directions is the wrong geometry — the within-axis vernier
+  harness remains unbuilt (theory is in crt_seam_oracle).
+- **`gen_corpus` Bug O FIXED.** Corpus and queries now share one geometry
+  (`A` + prototypes seeded from a dedicated geometry-only RNG keyed by cfg.seed),
+  so query/corpus latent spaces match and shard_recall ground truth is valid.
+
+## IMPOSSIBILITY PROOFS — repairable, currently overstated
+
+- **Theorem 2**: conclusion (no rigidity) holds, but proof text is WRONG as
+  written — n i.i.d. uniforms are a BINOMIAL process, Σ²(L) = L(1−L/n), not
+  "Poisson, Σ²=L exactly, independent". Restate with the binomial value; the
+  Θ(L) conclusion is unaffected.
+- **Theorem 3**: correct under its hypothesis but the hypothesis is smuggled —
+  "any fixed-distribution corpus" must be stated as "conditionally i.i.d. given
+  latent θ (mixture of i.i.d.)". de Finetti is decorative; it's the law of total
+  variance. Finite-without-replacement (the escape hatch) is excluded by
+  ASSUMPTION, not proof.
+- **NON-SEQUITUR (must retract)**: "quantile bucketing is optimal against the
+  entire achievable class / prime-spectral structure provably is not there" does
+  NOT follow from Σ²(L) ≥ L. Number variance and partition recall/load-balance
+  are different figures of merit. Narrow the claim to: "the key is not
+  number-variance-rigid" (true), and drop the optimality-over-all-partitions
+  claim unless separately proved (likely false).
+
+## Round 2 (real-embedding pipeline + post-PR code review)
+
+After the real-embedding work and the PR was opened, a second hostile review
+(plus the Gemini/qodo PR bots) hit the new material:
+
+- **CRITICAL — density-collapse headline was an artifact, now corrected.** The
+  win-rate climb 0.667→0.930 with top-k was an estimator-variance effect (M2),
+  and tau was computed on the probe's own coords, coupling it to cosine (M1).
+  FIXED: tau now uses the per-pair UNION of top coords (de-circularized), and we
+  report the tau GAP (effect size) with a bootstrap 95% CI instead of win rate.
+  Result survives but is MODEST and FLAT: gap ≈ 0.04, CI strictly > 0 at every
+  top-k. The "sharpening / signature of a real effect" claim is RETRACTED; the
+  small-but-real separation stands.
+- **qodo: bucketing bug FIXED.** density_collapse reimplemented bucketing as
+  `rank/(d/2^bits)` (panics at d/2^bits==0; wrong for non-divisible dims). Now
+  uses `ordvec::rank::rank_to_bucket` — measures REAL RankQuant behavior.
+- **embed_ollama.py hardened:** E2 (silent row misalignment if ollama returns
+  wrong count) now aborts; E3 (empty corpus) guarded.
+- **Reproducibility (E4/E5):** the repo-sentence extraction + embed procedure is
+  now recorded verbatim in density_collapse_results.md (was unrecorded).
+- **G1 overclaim:** body language softened; single-corpus/single-model
+  generality explicitly NOT claimed.
+
+## Net
+
+The mathematically defensible deliverables are: the CRT vernier structure
+(oracle + corrected density + Lean skeleton with corrected `crtEquiv` signature),
+and the TwoNN metric fix. The empirical rigidity/routing findings are
+exploratory with identified confounds and concrete salvage paths. The
+impossibility theorems are directionally right but need restatement (binomial
+not Poisson; mixture hypothesis explicit; optimality claim retracted).
diff --git a/experiments/ordinal-routing-research/README.md b/experiments/ordinal-routing-research/README.md
new file mode 100644
index 00000000..0b5305f1
--- /dev/null
+++ b/experiments/ordinal-routing-research/README.md
@@ -0,0 +1,89 @@
+# Ordinal-routing research — reviewer's guide
+
+Exploratory investigation into ordvec's **density behavior** and whether
+prime/spectral structure can improve training-free routing. Everything lives in
+`experiments/ordinal-routing-research/` — the findings/proofs (`*.md`) next to the
+probes that produced them (`*.rs`, `embed_ollama.py`) — **no changes to the
+`ordvec` crate or its public API.** This directory is `package.exclude`d, so it
+ships with the source tree but not the published crate.
+
+> **Running the probes.** These were developed and run as Cargo examples. The
+> `cargo run --release --example <name>` commands and `examples/…` paths
+> throughout these docs refer to that original layout. To reproduce, copy the
+> relevant `.rs` from this directory into the crate's `examples/` directory and
+> run the command shown (they depend only on the existing `ordvec` / `rand` /
+> `rayon` dev-deps). They are kept here as reference source, not as a compiled
+> build target.
+
+Reviewed by three internal adversarial agents plus the PR bots; findings are
+tiered below by **what survived scrutiny**. Read the tiers, not every doc.
+
+## 3-minute path
+
+1. This file (the tiers + the verdict at the bottom).
+2. **[density_collapse_results.md](density_collapse_results.md)** — the mechanism
+   (real-embedding, with its honest correction), then
+   **[tau_rerank_bakeoff_results.md](tau_rerank_bakeoff_results.md)** — the
+   decisive negative: it doesn't beat b=4.
+3. **[ADVERSARIAL_REVIEW.md](ADVERSARIAL_REVIEW.md)** — what was challenged,
+   fixed, retracted, withdrawn. The integrity record.
+
+## SOUND — proven or real-data confirmed
+
+| doc | claim |
+|-----|-------|
+| [density_collapse_results.md](density_collapse_results.md) | **Mechanism.** RankQuant b=2 density collapse = Hamming-near codes the scorer can't separate. Among those lookalikes, true neighbours have lower intra-code Kendall-tau (gap ≈ 0.04, CI > 0). Real but small. |
+| [tau_rerank_bakeoff_results.md](tau_rerank_bakeoff_results.md) | **The verdict.** Does that tau signal beat b=4? NO — b=4 wins even at the tau ceiling; tau scores below b=2's own ordering. Signal is real-but-inert; just use b=4. Closes the line: research, not a feature. |
+| [crt_seam_oracle_results.md](crt_seam_oracle_results.md) | CRT vernier seam theorem — exhaustive finite proof: lcm spacing, one coincidence/period, capped density `∏min(2t+1,m_i)/m_i`. Lean 4 formalization lives in the companion repo: [ordvec-formalization#17](https://github.com/Fieldnote-Echo/ordvec-formalization/pull/17) (open PR, `sorry`-free). |
+| [shard_recall_results.md](shard_recall_results.md) | Controlled ablation (post RNG-desync fix): random phase offsets add nothing vs aligned grids across R random directions. |
+| [oblivious_directions_results.md](oblivious_directions_results.md) | **The directions arc (round 2).** Data-oblivious low-discrepancy directions (golden-angle / Sobol / Kronecker) do NOT beat iid-random for training-free routing — across 5 encoders (nomic, bge-m3, bge-large, snowflake-arctic-v2, harrier-oss) at real intrinsic dim 18–24. CLASS-DEAD, pre-registered, replicated (the one mid-ladder flicker failed to replicate). Centering removes the cone but fails at b=4 (penalty grows with capacity). One robust positive: data-aligned (PCA) directions lead at higher ID — the lever is data-alignment, which training-free forbids. Also **resolves the twonn_id PARTIAL**: real-corpus ID measured at ~18–24 across 5 encoders, and ID is a **corpus** property (repo≈13 vs fiqa≈24, same encoder), not an encoder constant. Probes: `uniformity_lemma.rs`, `overlap_decomp.rs`, `centering_recall.rs`, `subspace_directions.rs`, `partition_balance.rs`, `fib_*.rs`. |
+| [length_mixture_lake_results.md](length_mixture_lake_results.md) | **Path B — chunk-length-mixture lake (closes the synthetic-lake arc).** Same fiqa docs embedded at 4 chunk lengths {128,256,512,1100} unioned into a 230k-doc lake; b=4 raw R@10 vs FP32 cosine is **immune** (+0.002, CR@100=1.0). Bonus measurement of the "chunk length is a third geometry axis" claim: real but **small and co-axial** — R̄ spreads only 0.705→0.723 over an 8.6× length range, cone axes ≥0.986 aligned (not the distinct geometries the mixture framing imagined). With Phase B (multi-domain) this leaves every synthetic lake pathology — multi-cone, hub, multi-length — benign for "spend the bits, b=4." Probe: `make_length_lake.py` + `centering_recall.rs`. |
+
+## THEORY — directionally right, restated honestly
+
+| doc | status |
+|-----|--------|
+| [rigidity_impossibility_proofs.md](rigidity_impossibility_proofs.md) | The routing key is not number-variance-rigid (Thm 2/3, binomial `L(1-L/n)`). The over-broad "quantile optimal over all partitions" claim is **retracted** as a non-sequitur. |
+| [conjecture_citation_audit.md](conjecture_citation_audit.md) | Citations verified by direct fetch (Ethayarajh, Broughan-Barnett, etc.); a few subagent confabulations caught and corrected. |
+| [twonn_id_results.md](twonn_id_results.md) | ⚠️ PARTIAL. The chord-metric fix is sound (sphere-validated to ID ~12); the OLS-through-origin estimator is biased (not MLE), and **no clean real-corpus ID is recorded here**. A low-tens sentence-transformer ID is a hypothesis by cross-domain analogy (Ansuini's low-tens are vision CNNs, not sentence encoders) — not established or measured in this branch. |
+
+## WITHDRAWN — see [withdrawn/](withdrawn/)
+
+The number-variance "super-Poisson" finding ([withdrawn/spectral_probe_results.md](withdrawn/spectral_probe_results.md),
+[withdrawn/corpus_zoo_results.md](withdrawn/corpus_zoo_results.md)) did not
+survive: its unfold is uncalibrated (a salvage attempt inverted the result). The
+*theory* above does not depend on it. Kept for the record, not as a claim.
+
+## Conjecture verdict (the framing question)
+
+Prime / Li(x) / Sacks-spiral constructions don't help retrieval: they act on the
+index (ℕ) and carry no corpus information. The exploitable dense-region structure
+lives on the permutohedron `S_D` — the data's own order — which is the
+density-collapse result above. Detail across the theory docs + ADVERSARIAL_REVIEW.
+
+## Reproduce
+
+Per-doc commands are at the bottom of each file. Real-embedding pipeline (GPU via
+ollama) is fully recorded in [density_collapse_results.md](density_collapse_results.md);
+external-corpus recipe in [REAL_CORPUS_RUNBOOK.md](REAL_CORPUS_RUNBOOK.md).
+
+## The deployment question — RESOLVED (negative)
+
+[tau_rerank_bakeoff_results.md](tau_rerank_bakeoff_results.md): the decisive
+matched-bytes experiment was run. **b=4 wins decisively, even at the tau ceiling**
+(real embeddings: b4 0.942, b2 0.898, tau-rerank 0.597, fp32-rerank 1.000). The
+b=2 candidate pool contains every true neighbour (fp32-rerank=1.0), but the ~0.04
+tau gap is too weak to ORDER them — it scores below b=2's own ordering. The
+density-collapse signal is **real but inert**: "just use b=4," no ordvec feature
+follows.
+
+The **deployment-robustness** sub-arc is likewise resolved negative: across every
+synthetic lake pathology — multi-domain cones + templated hubs (Phase B) and now a
+chunk-length mixture ([length_mixture_lake_results.md](length_mixture_lake_results.md),
+Path B) — b=4 raw routing does not degrade (hub Δ −0.002 through 15%; length-mixture
+Δ +0.002). The only un-run test left needs *real* dirty data (OCR / multilingual S3
+sludge), uncapturable from clean embeddings.
+
+This is the honest bottom line of the whole branch: a characterized mechanism and
+a clean negative. **Research, not a feature** — the prime/spectral/permutation
+ideas for dense-region retrieval do not beat the boring baseline (spend the bits).
diff --git a/experiments/ordinal-routing-research/REAL_CORPUS_RUNBOOK.md b/experiments/ordinal-routing-research/REAL_CORPUS_RUNBOOK.md
new file mode 100644
index 00000000..84848edf
--- /dev/null
+++ b/experiments/ordinal-routing-research/REAL_CORPUS_RUNBOOK.md
@@ -0,0 +1,48 @@
+# Real-corpus runbook for the conjecture probes
+
+All four examples now consume the SAME `.npy` format `bench_rank` documents:
+2-D little-endian float32 (`<f4`), C-order, shape `(n, dim)`. One corpus dump
+flows through the entire investigation. No BLAS, no Python at run time.
+
+Recommended public corpora (per bench_rank header): GloVe, OpenAI
+text-embedding-3 dumps, or any sentence-transformer output saved as .npy.
+
+## 1. Intrinsic dimension — sizes the projection budget
+```
+cargo run --release --example twonn_id -- --corpus-npy corpus.npy
+```
+Reports chord-metric TwoNN ID. Expect low tens (READ AS LOWER BOUND — finite
+sample deflates above ~12). If ID≈d_int, the routing layer wants R≈c·d_int
+projections, i.e. R∈{8,16}.
+
+## 2. Number variance — is the routing key rigid or Poisson?
+```
+cargo run --release --example spectral_probe -- --corpus-npy corpus.npy
+cargo run --release --example spectral_probe -- --corpus-npy corpus.npy --unfold-empirical
+```
+Σ²(L)/L flat≈1 ⇒ Poisson; climbing ⇒ clustered; falling ⇒ rigid (the only
+result that would reopen the spectral conjecture). --unfold-empirical confirms
+quantile tiling balances the key (Σ²→0).
+
+## 3. Shard recall — does the oblivious router work; does coprime help?
+```
+cargo run --release --example shard_recall -- \
+    --corpus-npy corpus.npy --queries-npy queries.npy
+```
+Needs BOTH files (real queries for honest recall). Fair envelope = recall@k at
+equal candidates-scanned. Watch: does recall keep climbing R=1→16 (sets the
+budget), and does coprime/random-offset beat plain aligned (predicted: no).
+
+## 4. Headline retrieval quality (the existing bench)
+```
+cargo run --release --example bench_rank -- \
+    --corpus-npy corpus.npy --queries-npy queries.npy --queries 200 --k 10
+```
+
+## Expected story on real embeddings (prior)
+
+Consistent with synthetic + verified literature: ID low tens → R∈{8,16};
+key Poisson/clustered not rigid → quantile bucketing; coprime adds nothing →
+R shared-width random projections is the router. A FALLING Σ²(L)/L or a
+coprime>random-offset gap that survives reseeding would be the surprise worth
+chasing.
diff --git a/experiments/ordinal-routing-research/centering_recall.rs b/experiments/ordinal-routing-research/centering_recall.rs
new file mode 100644
index 00000000..16d94eea
--- /dev/null
+++ b/experiments/ordinal-routing-research/centering_recall.rs
@@ -0,0 +1,229 @@
+//! CONTEXT-BOXED, PRE-REGISTERED — the DECISIVE gate for the centering finding.
+//! overlap_decomp.rs showed per-coord mean-centering removes the cone and
+//! amplifies true-neighbour overlap 2-5x. Necessary but NOT sufficient: the only
+//! metric that promotes this to an ordvec change is RECALL at matched bytes,
+//! and it must survive at b=4 (the incumbent that beat every other idea on this
+//! branch). Isolated by request: touches no harness, writes nothing.
+//!
+//! TEST: full-scan RankQuant-style retrieval, R@10 vs FP32 cosine ground truth.
+//!   Encode = dimension-wise rank -> bucket into 2^B equal-width bins (B bits/coord).
+//!   Score  = -L1 distance between B-bin codes (symmetric rank metric; identical
+//!            scoring for both arms, so only the ENCODING differs).
+//!   Ground truth = FP32 cosine top-10 on L2-normalized RAW vectors (the real task;
+//!                  centering changes the CODE, never the target).
+//! Two arms:
+//!   raw      : rank the L2-normalized vector directly.
+//!   centered : subtract the per-coord CORPUS mean (a stored, data-oblivious
+//!              statistic — no codebook, no per-query training) before ranking.
+//!              The SAME mean is applied to queries (deployable asymmetric setup).
+//! Also reports the two-stage candidate-recall (CR@M): fraction of FP32 top-10
+//! captured in the top-M by code distance — the bitmap-prefilter-relevant number.
+//!
+//! PRE-REGISTERED VERDICT (fixed before running):
+//!   FEATURE-WORTHY if centered R@10 - raw R@10 >= 0.02 at b=2 AND >= 0.02 at b=4
+//!     (must survive at the incumbent b=4), AND centered never worse than raw by
+//!     > 0.005 at any B.
+//!   PARTIAL if it helps at low bits (b<=2) but the b=4 gain is < 0.02 (a
+//!     low-bit-only tweak, not a headline change).
+//!   FAIL if centered <= raw at b=4, or net-negative anywhere material.
+//!   Report actual R@10 and CR@M for both arms regardless.
+//!
+//! Run: cargo run --release --example centering_recall -- \
+//!          --corpus-npy /tmp/repo_corpus.npy --queries-npy /tmp/repo_q.npy
+
+use rayon::prelude::*;
+
+fn load_npy_f32(path: &str) -> (Vec<f32>, usize, usize) {
+    let bytes = std::fs::read(path).expect("read npy");
+    assert!(
+        bytes.len() >= 10 && &bytes[..6] == b"\x93NUMPY",
+        "not a numpy file"
+    );
+    let major = bytes[6];
+    assert!(
+        major == 1 || major == 2,
+        "unsupported numpy .npy major version {major}"
+    );
+    if major == 2 {
+        assert!(bytes.len() >= 12, "truncated numpy v2 header");
+    }
+    let (hlen, hstart) = if major == 1 {
+        (u16::from_le_bytes([bytes[8], bytes[9]]) as usize, 10)
+    } else {
+        (
+            u32::from_le_bytes([bytes[8], bytes[9], bytes[10], bytes[11]]) as usize,
+            12,
+        )
+    };
+    assert!(hstart + hlen <= bytes.len(), "truncated numpy header");
+    let header = std::str::from_utf8(&bytes[hstart..hstart + hlen]).expect("utf8 header");
+    assert!(header.contains("'descr': '<f4'"), "expected <f4 dtype");
+    let after = &header[header.find("'shape':").expect("shape")..];
+    let inner = &after[after.find('(').unwrap() + 1..after.find(')').unwrap()];
+    let dims: Vec<usize> = inner
+        .split(',')
+        .filter_map(|s| s.trim().parse().ok())
+        .collect();
+    let (n, dim) = (dims[0], dims[1]);
+    let dstart = hstart + hlen;
+    let mut out = vec![0.0f32; n * dim];
+    for (i, ch) in bytes[dstart..].chunks_exact(4).enumerate() {
+        out[i] = f32::from_le_bytes([ch[0], ch[1], ch[2], ch[3]]);
+    }
+    (out, n, dim)
+}
+
+fn l2_normalize_rows(v: &mut [f32], dim: usize) {
+    for i in 0..v.len() / dim {
+        let row = &mut v[i * dim..(i + 1) * dim];
+        let nrm: f32 = row.iter().map(|x| x * x).sum::<f32>().sqrt();
+        if nrm > 0.0 {
+            for x in row.iter_mut() {
+                *x /= nrm;
+            }
+        }
+    }
+}
+
+fn coord_mean(corpus: &[f32], n: usize, dim: usize) -> Vec<f32> {
+    let mut m = vec![0f32; dim];
+    for i in 0..n {
+        for c in 0..dim {
+            m[c] += corpus[i * dim + c];
+        }
+    }
+    for m_c in m.iter_mut() {
+        *m_c /= n as f32;
+    }
+    m
+}
+
+/// Bucketed-rank code: rank each (optionally mean-subtracted) coord, then bucket
+/// the rank into 2^bits equal-width bins. One u8 bucket id per coord.
+fn encode(rows: &[f32], n: usize, dim: usize, bits: u32, sub: Option<&[f32]>) -> Vec<u8> {
+    let nb = 1u32 << bits;
+    let mut out = vec![0u8; n * dim];
+    out.par_chunks_mut(dim).enumerate().for_each(|(r, code)| {
+        let mut idx: Vec<u16> = (0..dim as u16).collect();
+        let val = |c: u16| rows[r * dim + c as usize] - sub.map_or(0.0, |s| s[c as usize]);
+        idx.sort_by(|&a, &b| val(a).partial_cmp(&val(b)).unwrap());
+        for (rank, &c) in idx.iter().enumerate() {
+            let bucket = (rank as u32 * nb) / dim as u32;
+            code[c as usize] = bucket as u8;
+        }
+    });
+    out
+}
+
+fn l1(a: &[u8], b: &[u8]) -> i64 {
+    a.iter()
+        .zip(b)
+        .map(|(x, y)| (*x as i64 - *y as i64).abs())
+        .sum()
+}
+
+fn ground_truth(corpus: &[f32], queries: &[f32], dim: usize, k: usize) -> Vec<Vec<u32>> {
+    let n = corpus.len() / dim;
+    let nq = queries.len() / dim;
+    (0..nq)
+        .into_par_iter()
+        .map(|qi| {
+            let q = &queries[qi * dim..(qi + 1) * dim];
+            let mut s: Vec<(u32, f32)> = (0..n)
+                .map(|di| {
+                    let d = &corpus[di * dim..(di + 1) * dim];
+                    (di as u32, q.iter().zip(d).map(|(a, b)| a * b).sum::<f32>())
+                })
+                .collect();
+            s.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
+            s.into_iter().take(k).map(|(i, _)| i).collect()
+        })
+        .collect()
+}
+
+/// Full-scan R@10 and candidate-recall@M for a given arm's codes.
+fn evaluate(
+    ccode: &[u8],
+    qcode: &[u8],
+    n: usize,
+    nq: usize,
+    dim: usize,
+    truth: &[Vec<u32>],
+    m: usize,
+) -> (f64, f64) {
+    let res: Vec<(f64, f64)> = (0..nq)
+        .into_par_iter()
+        .map(|qi| {
+            let qc = &qcode[qi * dim..(qi + 1) * dim];
+            let mut s: Vec<(u32, i64)> = (0..n)
+                .map(|di| (di as u32, l1(qc, &ccode[di * dim..(di + 1) * dim])))
+                .collect();
+            s.sort_by_key(|x| x.1);
+            let truth_set: std::collections::HashSet<u32> = truth[qi].iter().copied().collect();
+            // R@10: top-10 by code distance that are in FP32 top-10
+            let top10: std::collections::HashSet<u32> = s.iter().take(10).map(|x| x.0).collect();
+            let r10 = top10.intersection(&truth_set).count() as f64 / truth[qi].len() as f64;
+            // CR@M: FP32 top-10 captured within top-M by code distance
+            let topm: std::collections::HashSet<u32> = s.iter().take(m).map(|x| x.0).collect();
+            let cr = truth[qi].iter().filter(|t| topm.contains(t)).count() as f64
+                / truth[qi].len() as f64;
+            (r10, cr)
+        })
+        .collect();
+    let r10 = res.iter().map(|x| x.0).sum::<f64>() / nq as f64;
+    let cr = res.iter().map(|x| x.1).sum::<f64>() / nq as f64;
+    (r10, cr)
+}
+
+fn main() {
+    let mut cp = None;
+    let mut qp = None;
+    let mut m_arg = 100usize;
+    let mut a = std::env::args().skip(1);
+    while let Some(x) = a.next() {
+        match x.as_str() {
+            "--corpus-npy" => cp = Some(a.next().unwrap()),
+            "--queries-npy" => qp = Some(a.next().unwrap()),
+            "--m" => m_arg = a.next().unwrap().parse().unwrap(),
+            _ => {}
+        }
+    }
+    let (mut corpus, n, dim) = load_npy_f32(&cp.expect("--corpus-npy"));
+    let (mut queries, nq, dq) = load_npy_f32(&qp.expect("--queries-npy"));
+    assert_eq!(dim, dq);
+    l2_normalize_rows(&mut corpus, dim);
+    l2_normalize_rows(&mut queries, dim);
+    let truth = ground_truth(&corpus, &queries, dim, 10);
+    let mean = coord_mean(&corpus, n, dim);
+    let m = m_arg; // CR@M operating point (default 100, --m to override)
+
+    println!("# centering_recall (CONTEXT-BOXED, pre-registered). corpus {n}x{dim}, queries {nq}, CR@M M={m}");
+    println!("# R@10 vs FP32 cosine top-10; symmetric -L1 on B-bin rank codes; only ENCODING differs between arms.");
+    println!("B\tbytes/vec\traw_R@10\tcen_R@10\tΔR@10\traw_CR\tcen_CR\tΔCR");
+    let mut deltas = vec![];
+    for &bits in &[1u32, 2, 4] {
+        let bytes = dim * bits as usize / 8;
+        let craw = encode(&corpus, n, dim, bits, None);
+        let qraw = encode(&queries, nq, dim, bits, None);
+        let ccen = encode(&corpus, n, dim, bits, Some(&mean));
+        let qcen = encode(&queries, nq, dim, bits, Some(&mean));
+        let (r_raw, cr_raw) = evaluate(&craw, &qraw, n, nq, dim, &truth, m);
+        let (r_cen, cr_cen) = evaluate(&ccen, &qcen, n, nq, dim, &truth, m);
+        let d_r = r_cen - r_raw;
+        let d_cr = cr_cen - cr_raw;
+        deltas.push((bits, d_r));
+        println!("{bits}\t{bytes}\t{r_raw:.4}\t{r_cen:.4}\t{d_r:+.4}\t{cr_raw:.4}\t{cr_cen:.4}\t{d_cr:+.4}");
+    }
+    println!("\n# PRE-REGISTERED: FEATURE-WORTHY if ΔR@10 >= +0.02 at b=2 AND b=4, and never worse than -0.005.");
+    let b2 = deltas.iter().find(|x| x.0 == 2).unwrap().1;
+    let b4 = deltas.iter().find(|x| x.0 == 4).unwrap().1;
+    let worst = deltas.iter().map(|x| x.1).fold(f64::INFINITY, f64::min);
+    let verdict = if b2 >= 0.02 && b4 >= 0.02 && worst >= -0.005 {
+        "FEATURE-WORTHY"
+    } else if b4 < 0.02 && deltas.iter().any(|x| x.1 >= 0.02) {
+        "PARTIAL (low-bit only; does not survive at incumbent b=4)"
+    } else {
+        "FAIL"
+    };
+    println!("# VERDICT: {verdict}  (Δb2={b2:+.4}, Δb4={b4:+.4}, worst={worst:+.4})");
+}
diff --git a/experiments/ordinal-routing-research/conjecture_citation_audit.md b/experiments/ordinal-routing-research/conjecture_citation_audit.md
new file mode 100644
index 00000000..7fd0161c
--- /dev/null
+++ b/experiments/ordinal-routing-research/conjecture_citation_audit.md
@@ -0,0 +1,41 @@
+# Citation audit — prime/RH/embedding-geometry conjecture
+
+Verified by direct WebFetch against primary sources (subagent sandbox had no web
+egress; their "VERIFIED" tags were memory passes and are NOT relied on here).
+WebSearch was backend-down; all checks below are WebFetch on known URLs.
+
+## VERIFIED ✓ (fetched primary source)
+
+| Claim | Verified value | Source |
+|---|---|---|
+| GPT-2 final-layer anisotropy | ≈0.99 ("almost perfect cosine similarity"); ~0.6 in layers 2–8, rising exponentially layers 8–12 | Ethayarajh 2019, arXiv:1909.00512 (via ar5iv) |
+| Rogue dimensions dominate cosine | "a small number of rogue dimensions, often just 1–3, dominate these measures" (exact quote) | Timkey & van Schijndel 2021, arXiv:2109.04404 |
+| Super-prime counting function | π_q(x) ~ x/(log x)²; q_n ~ n(log n)² | Broughan & Barnett 2009, J. Integer Seq. vol 12 |
+| Mathlib CRT primitive | `ZMod.chineseRemainder : ZMod (m*n) ≃+* ZMod m × ZMod n` (exact signature) | mathlib4_docs ZMod/Basic |
+| Discrepancy rates | uniform grid O(1/N); optimal LDS O(log N/N); random/Poisson O(√(log log N/N)) | Wikipedia Low-discrepancy_sequence |
+
+## PARTIAL / QUALITATIVE (source confirms direction, not exact figure)
+
+| Claim | Status |
+|---|---|
+| Mu & Viswanath D≈d/100 rule | Principle confirmed ("common mean + a few top directions"); the exact d/100 ratio NOT stated in fetched text. Cite as "a few top components", not d/100. arXiv:1702.01417 |
+| Ansuini intrinsic dimension | NUMBERS CONFIRMED (ar5iv): last hidden layers ID ≈ 12–25; peak (early layers, rel.depth 0.2–0.4) ≈ 100–120; ID/ambient ≈ 2e-4. Hunchback = rise then contract. arXiv:1905.12784. NOTE: vision/CNN nets, not sentence encoders. |
+| Valeriani et al. 2023 transformer ID | Qualitative confirmed: same expand–contract–stabilize ID profile in transformers (protein LM + image). Exact peak number not extracted. arXiv:2302.00294 |
+| Routing implication | d_int ≈ low tens ⇒ JL R ≈ c·d_int ⇒ R∈{8,16} projections suffice (matches shard_recall envelope still rising at R=16). Projection budget sized by ~20, not ambient 256/1024. |
+
+## CORRECTED (subagent confabulations caught)
+
+- k-fold prime-cascade ~1/(log x)^k: **HEURISTIC, not proved.** Subagent falsely
+  cited Goldston–Yıldırım (that paper is about small prime gaps — unrelated).
+  Only the single iteration (Broughan&Barnett) is proved.
+- Super-prime count "x/(ln x · ln ln x)": **wrong form**; paper proves x/(log x)².
+- Prime-sequence discrepancy "O(n^-1/2)" (my own earlier claim): **unsupported**;
+  no clean published rate for scaled primes. Verdict holds via uniform/quantile
+  dominance regardless.
+
+## STILL UNVERIFIED (needs working WebSearch or specific PDF)
+
+- Exact intrinsic-dimension number for modern sentence embeddings
+  (sentence-transformers / E5 / text-embedding-3). No confident published TwoNN
+  figure located. This is the one empirical gap for the Level-2 anisotropy argument.
+- Cai et al. 2021 "local isotropy" arXiv id unconfirmed.
diff --git a/experiments/ordinal-routing-research/crt_seam_oracle.rs b/experiments/ordinal-routing-research/crt_seam_oracle.rs
new file mode 100644
index 00000000..6832a66b
--- /dev/null
+++ b/experiments/ordinal-routing-research/crt_seam_oracle.rs
@@ -0,0 +1,164 @@
+//! CRT seam oracle: exhaustive finite verification of the vernier theorem.
+//!
+//! This is the SPEC for the within-axis vernier experiment. The CRT claim is
+//! a finite statement over the ring Z/M (M = product of periods), so we
+//! enumerate every position and CHECK the predicted structure exactly.
+//! Exhaustive enumeration over the finite ring is a proof for these parameters
+//! — the same thing a Lean theorem would assert. We surface the phase subtlety
+//! (the naive zero-offset statement is FALSE at the origin) before any recall
+//! experiment is built on top.
+//!
+//! Setup: L grids on ONE axis, grid i has period m_i and phase phi_i. A
+//! position x is a "seam" of grid i if (x - phi_i) mod m_i == 0. The vernier
+//! claim concerns how often seams of different grids COINCIDE.
+//!
+//! Checks:
+//!   1. coincidence spacing: two coprime grids (phi=0) share a seam exactly
+//!      every lcm(m_i,m_j) = m_i*m_j. NON-coprime share every lcm < product.
+//!   2. all-L coincidence count over one period M: with phi=0 there is exactly
+//!      one (at x=0); the claim "one per period" is about COUNT, and it holds.
+//!   3. near-seam blind-spot density: fraction of x within band |.|<=t of a
+//!      seam on ALL L grids simultaneously ~= (2t)^L / M  (the headline).
+//!   4. the phase refutation: with phi=0 every grid has a seam at x=0, so the
+//!      pointwise "max-over-grids distance-to-seam >= c" floor is FALSE there.
+//!
+//! Run: cargo run --release --example crt_seam_oracle
+
+fn gcd(mut a: u64, mut b: u64) -> u64 {
+    while b != 0 {
+        let t = b;
+        b = a % b;
+        a = t;
+    }
+    a
+}
+fn lcm(a: u64, b: u64) -> u64 {
+    a / gcd(a, b) * b
+}
+
+/// Is x a seam of a grid with period m and phase phi (integer phase)?
+/// Seam <=> (x - phi) ≡ 0 (mod m).
+fn is_seam(x: i64, m: u64, phi: i64) -> bool {
+    let m = m as i64;
+    (((x - phi) % m) + m) % m == 0
+}
+
+/// Distance (in integer steps, on the ring) from x to the nearest seam of
+/// grid (m, phi): min over k of |x - phi - k*m|, i.e. the centered residue.
+fn dist_to_seam(x: i64, m: u64, phi: i64) -> u64 {
+    let m = m as i64;
+    let r = (((x - phi) % m) + m) % m; // in [0, m)
+    r.min(m - r) as u64
+}
+
+fn main() {
+    println!("# CRT seam oracle — exhaustive finite verification");
+
+    // ---- Check 1: pairwise coincidence spacing ----
+    println!("\n## Check 1: first common seam of two grids (phi=0)");
+    println!("m_i\tm_j\tcoprime\tfirst_common\tlcm(=expected)");
+    for (mi, mj) in [(3u64, 5u64), (4, 6), (5, 7), (6, 9), (8, 12)] {
+        let mut first = None;
+        for x in 1..=(mi * mj) as i64 {
+            if is_seam(x, mi, 0) && is_seam(x, mj, 0) {
+                first = Some(x);
+                break;
+            }
+        }
+        let cop = gcd(mi, mj) == 1;
+        println!(
+            "{mi}\t{mj}\t{cop}\t{}\t{}",
+            first.unwrap(),
+            lcm(mi, mj)
+        );
+    }
+
+    // remaining checks appended in next pieces
+    checks_2_3_4();
+}
+
+fn checks_2_3_4() {
+    // L coprime grids on one axis.
+    let periods: [u64; 4] = [3, 5, 7, 11];
+    let m: u64 = periods.iter().product(); // M = 1155
+    let l = periods.len();
+
+    // ---- Check 2: all-L coincidence count over one period, phi=0 ----
+    // Claim: exactly ONE position in [0, M) is a seam of all L grids.
+    let mut all_seam_zero = 0u64;
+    for x in 0..m as i64 {
+        if periods.iter().all(|&p| is_seam(x, p, 0)) {
+            all_seam_zero += 1;
+        }
+    }
+    println!("\n## Check 2: all-{l} coincidences in [0,M), phi=0  (M={m})");
+    println!("count = {all_seam_zero}  (claim: exactly 1, at x=0)");
+
+    // ---- Check 3: near-seam blind-spot density vs (2t)^L / M ----
+    // A position is in the joint blind spot if it is within band |.|<=t of a
+    // seam on ALL L grids. Predicted density = product over grids of
+    // (per-grid near-seam density) = ((2t+1)/m_i) ... but the clean continuous
+    // form is (2t)^L / M. We report measured vs both the integer band count
+    // ((2t+1) residues per grid) and the continuous (2t)^L/M headline.
+    println!("\n## Check 3: joint near-seam density vs prediction  (phi=0)");
+    println!("t\tmeasured\t(2t+1)^L/M_int\t(2t)^L/M_cont");
+    for t in [0u64, 1, 2, 3] {
+        let mut hits = 0u64;
+        for x in 0..m as i64 {
+            if periods.iter().all(|&p| dist_to_seam(x, p, 0) <= t) {
+                hits += 1;
+            }
+        }
+        let measured = hits as f64 / m as f64;
+        // integer band: each grid admits (2t+1) residues (centered), but capped
+        // at m_i; product of per-grid fractions = prod((2t+1)/m_i).
+        let int_pred: f64 = periods
+            .iter()
+            .map(|&p| ((2 * t + 1).min(p) as f64) / p as f64)
+            .product();
+        let cont_pred = (2 * t).pow(l as u32) as f64 / m as f64;
+        println!("{t}\t{measured:.6}\t{int_pred:.6}\t{cont_pred:.6}");
+    }
+
+    // ---- Check 4: the phase refutation ----
+    // With phi=0, x=0 is a seam on every grid -> max-over-grids dist = 0, so a
+    // pointwise floor "some grid keeps you >= c from its seam" is FALSE there.
+    // With chosen distinct phases, the all-seam coincidence can be MOVED but,
+    // by CRT, still occurs exactly once per period (we just relocate it).
+    let phis_zero = [0i64; 4];
+    let worst_zero = (0..m as i64)
+        .map(|x| {
+            periods
+                .iter()
+                .zip(phis_zero.iter())
+                .map(|(&p, &ph)| dist_to_seam(x, p, ph))
+                .max()
+                .unwrap()
+        })
+        .min()
+        .unwrap();
+    // staggered phases: spread the seams
+    let phis_stag = [0i64, 1, 2, 3];
+    let worst_stag = (0..m as i64)
+        .map(|x| {
+            periods
+                .iter()
+                .zip(phis_stag.iter())
+                .map(|(&p, &ph)| dist_to_seam(x, p, ph))
+                .max()
+                .unwrap()
+        })
+        .min()
+        .unwrap();
+    println!("\n## Check 4: pointwise max-over-grids dist-to-seam (the floor)");
+    println!("min_x max_i dist  (phi=0)      = {worst_zero}  (=> 0: floor is FALSE at origin)");
+    println!("min_x max_i dist  (phi staggered) = {worst_stag}  (phases can lift the floor)");
+
+    // verdict line for downstream
+    println!("\n## Verdict");
+    println!(
+        "coincidence_count_per_period={all_seam_zero} (=1 ✓); \
+         floor_phi0={worst_zero} (=0, naive claim false ✓); \
+         floor_staggered={worst_stag} (phase-dependent)"
+    );
+}
diff --git a/experiments/ordinal-routing-research/crt_seam_oracle_results.md b/experiments/ordinal-routing-research/crt_seam_oracle_results.md
new file mode 100644
index 00000000..b6293dbd
--- /dev/null
+++ b/experiments/ordinal-routing-research/crt_seam_oracle_results.md
@@ -0,0 +1,87 @@
+# CRT seam oracle — corrected vernier theorem (exhaustive finite proof)
+
+> Lean 4 formalization of this theorem lives in the companion repo:
+> [ordvec-formalization#17](https://github.com/Fieldnote-Echo/ordvec-formalization/pull/17)
+> (open PR, `sorry`-free).
+
+`examples/crt_seam_oracle.rs` enumerates the full ring Z/M to verify the
+within-axis vernier (coprime-grid) theorem exactly. This is the SPEC for the
+vernier recall experiment and the corrected target for any Lean formalization.
+Exhaustive enumeration over the finite ring IS the proof for these parameters.
+
+Params: L=4 grids on one axis, coprime periods {3,5,7,11}, M=1155.
+
+## CONFIRMED exactly
+
+1. **Coincidence spacing = lcm.** First common seam of grids (m_i,m_j) at phi=0
+   is exactly lcm(m_i,m_j). Coprime ⇒ lcm = m_i·m_j (maximal spacing: 3,5→15;
+   5,7→35). Non-coprime coincide early (4,6→12; 6,9→18). This is WHY coprimality
+   maximizes seam separation.
+2. **Exactly one all-L coincidence per period M.** count = 1 over [0,M). ✓
+
+## CORRECTED (the closed form is the CAPPED product)
+
+3. **Joint near-seam density is ∏_i min(2t+1, m_i)/m_i.** The uncapped
+   ∏(2t+1)/m_i is correct ONLY in the interior `2t+1 ≤ min_i m_i` (here t≤1);
+   beyond that, bands saturate a grid and that grid's factor caps at 1. The
+   uncapped form exceeds 1 at t=3 (period-3 grid: 7/3 > 1) — the SAME nonsense
+   as the rejected continuous (2t)^L/M. (Caught in adversarial review; the Rust
+   code already computes the capped product via `.min(p)`.)
+
+   | t | measured | ∏min(2t+1,m_i)/m_i (capped) | ∏(2t+1)/m_i (uncapped) | (2t)^L/M |
+   |---|----------|------------------------------|------------------------|----------|
+   | 0 | 0.000866 | 0.000866                     | 0.000866               | 0.000000 |
+   | 1 | 0.070130 | 0.070130                     | 0.070130               | 0.013853 |
+   | 2 | 0.324675 | 0.324675                     | 1.039 (>1, wrong)      | 0.221645 |
+   | 3 | 0.636364 | 0.636364                     | 2.078 (>1, wrong)      | 1.122    |
+
+   Measured matches the CAPPED product to 6 d.p. at all t. The uncapped product
+   is valid only for t≤1 (where 2t+1 ≤ min m_i = 3); it already exceeds 1 at t=2.
+   The provable/Lean statement is the capped form, OR the uncapped form under the
+   explicit precondition `2t+1 ≤ min_i m_i`.
+
+## REFUTED (stronger than the earlier caveat)
+
+4. **No generic pointwise seam-margin floor — phases cannot rescue it.**
+   min_x max_i dist_to_seam = 0 with phi=0 AND with staggered phases [0,1,2,3].
+   For L=4 with these small periods, some position always sits on every grid's
+   seam regardless of phase. The "choose phases for a pointwise floor >= c"
+   claim FAILS in this regime; the floor is not generic, it exists only for
+   specific (periods, phases, L) and must be verified per-instance, not assumed.
+
+## Proof of the density closed form (was: numerically confirmed only)
+
+Claim: joint near-seam density = ∏_i min(2t+1, m_i)/m_i for pairwise-coprime
+m_i. Equivalently = ∏_i (2t+1)/m_i under the precondition 2t+1 ≤ min_i m_i.
+
+Proof. CRT: x ↦ (x mod m_1, …, x mod m_L) is a BIJECTION Z/M → ∏ Z/m_i (M=∏m_i,
+pairwise coprime). Position x is within band t of a seam on grid i iff
+(x − φ_i) mod m_i lies in the centered band {−t,…,+t} mod m_i. The number of
+DISTINCT such residues is min(2t+1, m_i): exactly 2t+1 when the band fits
+(2t+1 ≤ m_i), capping at m_i once it saturates (the band wraps to cover all
+residues). The wraparound/min in the distance condition is INTRA-coordinate
+(depends only on x mod m_i), so it does not couple grids — CRT independence
+across coordinates is untouched. The joint count therefore factors:
+
+    count = ∏_i min(2t+1, m_i),   density = count/M = ∏_i min(2t+1,m_i)/m_i.  ∎
+
+(Even m_i have an extra antipodal-residue caveat at the exact half-period; the
+chosen odd periods avoid it.) The independence is precisely what CRT supplies;
+without coprimality the residue coordinates are coupled and the product fails
+(matches Check 1: non-coprime grids coincide early). This is the provable form —
+NOT the uncapped ∏(2t+1)/m_i (>1 for t≥2 here) nor the continuous (2t)^L/M.
+
+## Consequence for the workplan
+
+- The provable theorem is the DENSITY/COUNT statement (checks 1–3, CAPPED product
+  ∏min(2t+1,m_i)/m_i), not a pointwise floor. A Lean proof should target the
+  capped form (or the uncapped form under explicit 2t+1 ≤ min m_i) and
+  "one coincidence per period", and should NOT claim a phase-tunable margin.
+- The vernier RECALL experiment must therefore test whether the (small,
+  bounded) joint blind-spot region actually costs recall — it cannot lean on a
+  guaranteed per-query margin, because there isn't one.
+- This is the ambush avoided: building the recall test on a phase-tunable-floor
+  premise, then having Lean fail to prove it, would have invalidated the
+  experiment after the fact. Caught by finite enumeration first.
+
+Reproduce: `cargo run --release --example crt_seam_oracle`
diff --git a/experiments/ordinal-routing-research/density_collapse.rs b/experiments/ordinal-routing-research/density_collapse.rs
new file mode 100644
index 00000000..735e660f
--- /dev/null
+++ b/experiments/ordinal-routing-research/density_collapse.rs
@@ -0,0 +1,385 @@
+//! Density-collapse probe: what does ordvec lose in dense regions, and is the
+//! lost signal recoverable from intra-bucket permutation order?
+//!
+//! Mechanism under test. RankQuant b=2 encodes each vector as which QUARTILE
+//! each coordinate's rank falls in. Two vectors that share top-bucket membership
+//! COLLIDE — b=2 cannot tell them apart. This is worst in tight semantic
+//! clusters (near-parallel vectors share their largest dims → same top ranks).
+//!
+//! But the FULL rank vector (a permutation of 0..D) still orders the
+//! coordinates WITHIN the top bucket; b=2 discards that. The question:
+//!
+//!   Within a collided bucket, do TRUE FP32 neighbors differ in Kendall-tau of
+//!   their top-k coordinate order LESS than random pairs in the same bucket?
+//!
+//! If yes, intra-bucket order carries the signal b=2 threw away, and a
+//! tau-rerank recovers it WITHOUT new storage (the order is in the Rank code).
+//! Ground truth = FP32 cosine, so this cannot miscalibrate (unlike the
+//! number-variance probe).
+//!
+//! Run: cargo run --release --example density_collapse
+//!      cargo run --release --example density_collapse -- --noise 0.15   (tighter)
+//! No external data, no BLAS. Uses ordvec::rank::rank_transform.
+
+use ordvec::rank::{rank_to_bucket, rank_transform};
+use rand::{RngExt, SeedableRng};
+use rand_chacha::ChaCha8Rng;
+use rayon::prelude::*;
+
+const SEED: u64 = 1;
+
+struct Cfg {
+    dim: usize,
+    n: usize,
+    latent: usize,
+    clusters: usize,
+    noise: f32, // cluster tightness: smaller = denser = more collapse
+    bits: u8,   // bucket bits (2 = quartiles); ordvec RankQuant uses {1,2,4}
+    topk: usize, // # top coords used for the intra-bucket tau test
+    corpus_npy: Option<String>, // real embeddings; overrides synthetic + dim/n
+}
+
+fn parse() -> Cfg {
+    let mut c = Cfg {
+        dim: 256,
+        n: 30_000,
+        latent: 64,
+        clusters: 200,
+        noise: 0.3,
+        bits: 2,
+        topk: 16,
+        corpus_npy: None,
+    };
+    let mut a = std::env::args().skip(1);
+    while let Some(x) = a.next() {
+        match x.as_str() {
+            "--dim" => c.dim = a.next().unwrap().parse().unwrap(),
+            "--n" => c.n = a.next().unwrap().parse().unwrap(),
+            "--latent" => c.latent = a.next().unwrap().parse().unwrap(),
+            "--clusters" => c.clusters = a.next().unwrap().parse().unwrap(),
+            "--noise" => c.noise = a.next().unwrap().parse().unwrap(),
+            "--bits" => c.bits = a.next().unwrap().parse().unwrap(),
+            "--topk" => c.topk = a.next().unwrap().parse().unwrap(),
+            "--corpus-npy" => c.corpus_npy = Some(a.next().unwrap()),
+            other => panic!("unknown arg {other}"),
+        }
+    }
+    c
+}
+
+fn gauss(rng: &mut ChaCha8Rng) -> f32 {
+    let u1: f32 = rng.random_range(1e-9..1.0);
+    let u2: f32 = rng.random_range(0.0..1.0);
+    (-2.0 * u1.ln()).sqrt() * (std::f32::consts::TAU * u2).cos()
+}
+
+/// Cosine of two unit vectors (corpus is L2-normalized).
+fn cos(a: &[f32], b: &[f32]) -> f32 {
+    a.iter().zip(b).map(|(x, y)| x * y).sum()
+}
+
+/// Kendall-tau DISTANCE between the orderings the two vectors induce on a fixed
+/// coordinate set `coords` (fraction of discordant pairs, in [0,1]). Uses the
+/// raw values to decide order on each coordinate pair. 0 = identical order.
+fn kendall_tau(a: &[f32], b: &[f32], coords: &[usize]) -> f32 {
+    let m = coords.len();
+    if m < 2 {
+        return 0.0;
+    }
+    let mut disc = 0usize;
+    let mut tot = 0usize;
+    for x in 0..m {
+        for y in (x + 1)..m {
+            let (cx, cy) = (coords[x], coords[y]);
+            let sa = (a[cx] - a[cy]).signum();
+            let sb = (b[cx] - b[cy]).signum();
+            if sa != sb {
+                disc += 1;
+            }
+            tot += 1;
+        }
+    }
+    disc as f32 / tot.max(1) as f32
+}
+
+/// Indices of the top-k coordinates of a vector by value (descending).
+fn top_coords(v: &[f32], k: usize) -> Vec<usize> {
+    let mut idx: Vec<usize> = (0..v.len()).collect();
+    idx.sort_by(|&i, &j| v[j].partial_cmp(&v[i]).unwrap_or(std::cmp::Ordering::Equal));
+    idx.truncate(k);
+    idx
+}
+
+/// Hamming distance between two equal-length bucket codes (# coords that
+/// landed in a different bucket). This is what the b=2 SCORER conflates: small
+/// Hamming distance = codes the kernel can barely separate = real collapse.
+fn hamming(a: &[u8], b: &[u8]) -> u32 {
+    a.iter().zip(b).filter(|(x, y)| x != y).count() as u32
+}
+
+/// Report the code-distance structure: exact collisions are ~0 on continuous
+/// data (constant-composition codes are length-D sequences), so the real
+/// "collapse" is NEAR-collision — how many docs sit within small Hamming
+/// distance of a probe. Sampled over probes to stay O(n * samples).
+fn collision_report(cfg: &Cfg, codes: &[Vec<u8>]) {
+    let n = cfg.n;
+    let n_probes = 200.min(n);
+    let stride = (n / n_probes).max(1);
+    let probes: Vec<usize> = (0..n).step_by(stride).take(n_probes).collect();
+    // for each probe, min Hamming distance to any other doc + count within 2*min
+    let stats: Vec<(u32, usize)> = probes
+        .par_iter()
+        .map(|&p| {
+            let cp = &codes[p];
+            let mut min_h = u32::MAX;
+            for j in 0..n {
+                if j == p {
+                    continue;
+                }
+                let h = hamming(cp, &codes[j]);
+                if h < min_h {
+                    min_h = h;
+                }
+            }
+            let thresh = (min_h + min_h / 2).max(min_h + 1);
+            let near = (0..n)
+                .filter(|&j| j != p && hamming(cp, &codes[j]) <= thresh)
+                .count();
+            (min_h, near)
+        })
+        .collect();
+    let mean_min = stats.iter().map(|s| s.0 as f64).sum::<f64>() / stats.len() as f64;
+    let mean_near = stats.iter().map(|s| s.1 as f64).sum::<f64>() / stats.len() as f64;
+    println!("# density_collapse: n={} dim={} bits={} noise={} topk={} (smaller noise=denser)",
+        cfg.n, cfg.dim, cfg.bits, cfg.noise, cfg.topk);
+    println!("## Code near-collision structure (b={} bucket code, length {})", cfg.bits, cfg.dim);
+    println!("mean nearest-code Hamming dist = {mean_min:.1} of {} coords", cfg.dim);
+    println!("mean #docs within ~1.5x that distance = {mean_near:.1}");
+    println!("(small Hamming = codes the b={} scorer can barely separate = collapse)", cfg.bits);
+}
+
+/// Minimal NumPy v1/v2 .npy reader for 2-D little-endian f32 C-order arrays.
+fn load_npy_f32(path: &str) -> (Vec<f32>, usize, usize) {
+    let bytes = std::fs::read(path).expect("read npy");
+    assert!(bytes.len() >= 10 && &bytes[..6] == b"\x93NUMPY", "not a numpy file");
+    let major = bytes[6];
+    assert!(
+        major == 1 || major == 2,
+        "unsupported numpy .npy major version {major}"
+    );
+    if major == 2 {
+        assert!(bytes.len() >= 12, "truncated numpy v2 header");
+    }
+    let (hlen, hstart) = if major == 1 {
+        (u16::from_le_bytes([bytes[8], bytes[9]]) as usize, 10)
+    } else {
+        (u32::from_le_bytes([bytes[8], bytes[9], bytes[10], bytes[11]]) as usize, 12)
+    };
+    assert!(hstart + hlen <= bytes.len(), "truncated numpy header");
+    let header = std::str::from_utf8(&bytes[hstart..hstart + hlen]).expect("utf8 header");
+    assert!(header.contains("'descr': '<f4'"), "expected <f4 dtype");
+    assert!(header.contains("'fortran_order': False"), "expected C order");
+    let after = &header[header.find("'shape':").expect("shape")..];
+    let inner = &after[after.find('(').unwrap() + 1..after.find(')').unwrap()];
+    let dims: Vec<usize> = inner.split(',').filter_map(|s| s.trim().parse().ok()).collect();
+    assert_eq!(dims.len(), 2, "expected 2-D array");
+    let (n, dim) = (dims[0], dims[1]);
+    let dstart = hstart + hlen;
+    assert_eq!(bytes.len() - dstart, n * dim * 4, "data length mismatch");
+    let mut out = vec![0.0f32; n * dim];
+    for (i, ch) in bytes[dstart..].chunks_exact(4).enumerate() {
+        out[i] = f32::from_le_bytes([ch[0], ch[1], ch[2], ch[3]]);
+    }
+    // L2-normalize rows (cosine geometry)
+    for i in 0..n {
+        let row = &mut out[i * dim..(i + 1) * dim];
+        let nrm: f32 = row.iter().map(|x| x * x).sum::<f32>().sqrt();
+        if nrm > 0.0 {
+            for x in row.iter_mut() {
+                *x /= nrm;
+            }
+        }
+    }
+    (out, n, dim)
+}
+
+fn main() {
+    let mut cfg = parse();
+    let corpus = if let Some(path) = cfg.corpus_npy.clone() {
+        let (v, n, dim) = load_npy_f32(&path);
+        eprintln!("# loaded REAL corpus {n} x {dim} from {path}");
+        cfg.n = n;
+        cfg.dim = dim;
+        v
+    } else {
+        make_corpus(&cfg)
+    };
+    let d = cfg.dim;
+
+    // Per-coordinate b-bit bucket id, using ordvec's OWN bucketing
+    // (`rank_to_bucket`) so the probe measures real RankQuant behavior — it is
+    // guarded (d>0, bits<=7, rank<d) and dimension-correct for non-divisible
+    // dims, unlike a hand-rolled `rank / (d/2^bits)`. (qodo review.)
+    assert!(cfg.bits <= 7, "--bits must be <= 7 (RankQuant uses 1/2/4)");
+    let codes: Vec<Vec<u8>> = (0..cfg.n)
+        .into_par_iter()
+        .map(|i| {
+            let ranks = rank_transform(&corpus[i * d..(i + 1) * d]);
+            ranks.iter().map(|&r| rank_to_bucket(r, d, cfg.bits)).collect()
+        })
+        .collect();
+
+    collision_report(&cfg, &codes);
+    tau_report(&cfg, &corpus, &codes);
+}
+
+/// THE TEST. For each probe, gather the M docs whose b=2 codes are
+/// Hamming-CLOSEST (the "lookalikes" the scorer conflates with the probe).
+/// Split them into the FP32-true neighbours (top half by cosine) and the
+/// FP32-far lookalikes (bottom half). Question: does intra-bucket top-k
+/// Kendall-tau SEPARATE these two groups — i.e. do the true neighbours have
+/// lower tau than the false lookalikes that the b=2 code can't tell apart?
+/// If yes, the fine permutation order recovers exactly the signal b=2 lost,
+/// from the existing Rank code, no new storage. FP32 cosine = ground truth.
+fn tau_report(cfg: &Cfg, corpus: &[f32], codes: &[Vec<u8>]) {
+    let d = cfg.dim;
+    let n = cfg.n;
+    let m = 40usize; // size of the b2-lookalike neighbourhood per probe
+    let n_probes = 300.min(n);
+    let stride = (n / n_probes).max(1);
+    let probes: Vec<usize> = (0..n).step_by(stride).take(n_probes).collect();
+
+    // Per probe: (mean tau of true-neighbour half, mean tau of far-lookalike half,
+    //             mean cosine of each half) — averaged over probes.
+    let rows: Vec<(f32, f32, f32, f32)> = probes
+        .par_iter()
+        .map(|&p| {
+            let pv = &corpus[p * d..(p + 1) * d];
+            let cp = &codes[p];
+            // M code-nearest docs by Hamming
+            let mut by_h: Vec<(u32, usize)> = (0..n)
+                .filter(|&j| j != p)
+                .map(|j| (hamming(cp, &codes[j]), j))
+                .collect();
+            by_h.sort_by_key(|x| x.0);
+            by_h.truncate(m);
+            // among those lookalikes, rank by TRUE cosine to the probe
+            let mut by_cos: Vec<(f32, usize)> = by_h
+                .iter()
+                .map(|&(_, j)| (cos(pv, &corpus[j * d..(j + 1) * d]), j))
+                .collect();
+            by_cos.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap_or(std::cmp::Ordering::Equal));
+            let half = by_cos.len() / 2;
+            // M1 fix (review): tau coords must NOT be the probe's own top-k
+            // (that couples tau to cosine, making the test near-tautological).
+            // Use the UNION of each PAIR's own top coords — chosen independently
+            // of the cosine ranking — so tau measures order agreement on the
+            // dims the two vectors jointly care about, not the probe's alone.
+            let tau_of = |slice: &[(f32, usize)]| -> (f32, f32) {
+                let mut st = 0.0f32;
+                let mut sc = 0.0f32;
+                for &(c, j) in slice {
+                    let jv = &corpus[j * d..(j + 1) * d];
+                    let mut coords = top_coords(pv, cfg.topk);
+                    coords.extend(top_coords(jv, cfg.topk));
+                    coords.sort_unstable();
+                    coords.dedup();
+                    st += kendall_tau(pv, jv, &coords);
+                    sc += c;
+                }
+                let k = slice.len().max(1) as f32;
+                (st / k, sc / k)
+            };
+            let (tau_near, cos_near) = tau_of(&by_cos[..half]);
+            let (tau_far, cos_far) = tau_of(&by_cos[half..]);
+            (tau_near, tau_far, cos_near, cos_far)
+        })
+        .collect();
+
+    let mean = |f: &dyn Fn(&(f32, f32, f32, f32)) -> f32| {
+        rows.iter().map(f).sum::<f32>() / rows.len().max(1) as f32
+    };
+    let tau_near = mean(&|r| r.0);
+    let tau_far = mean(&|r| r.1);
+    let cos_near = mean(&|r| r.2);
+    let cos_far = mean(&|r| r.3);
+    // Per-probe tau GAP (far - near): positive => true neighbours have lower
+    // tau. Report its mean with a bootstrap 95% CI over probes. The gap (an
+    // EFFECT SIZE) is the honest statistic — NOT the win rate, whose climb with
+    // top-k is just estimator-variance reduction, not a sharpening effect (M2).
+    let gaps: Vec<f64> = rows.iter().map(|r| (r.1 - r.0) as f64).collect();
+    let gap_mean = gaps.iter().sum::<f64>() / gaps.len().max(1) as f64;
+    let (lo, hi) = bootstrap_ci(&gaps, 2000);
+    println!("\n## Intra-code Kendall-tau test (top-{} coords PER-PAIR union, M={m}, {} probes)",
+        cfg.topk, rows.len());
+    println!("among b2-lookalikes:   mean cosine   mean tau-distance");
+    println!("  FP32-TRUE neighbours {cos_near:.4}        {tau_near:.4}");
+    println!("  FP32-FAR lookalikes  {cos_far:.4}        {tau_far:.4}");
+    println!("tau GAP (far - near) = {gap_mean:.4}  95% CI [{lo:.4}, {hi:.4}]");
+    let verdict = if lo > 0.0 {
+        "SIGNAL: gap CI strictly > 0 — true neighbours have lower tau (effect, not just win-rate)"
+    } else if hi < 0.0 {
+        "INVERTED: gap CI strictly < 0"
+    } else {
+        "NO SIGNAL: gap CI spans 0 — cannot distinguish from noise"
+    };
+    println!("verdict: {verdict}");
+}
+
+/// Percentile bootstrap 95% CI for the mean of `xs`, `b` resamples.
+/// Deterministic RNG (no Date/rand-of-clock) so the CI is reproducible.
+fn bootstrap_ci(xs: &[f64], b: usize) -> (f64, f64) {
+    if xs.is_empty() {
+        return (0.0, 0.0);
+    }
+    let mut rng = ChaCha8Rng::seed_from_u64(SEED ^ 0xB007);
+    let n = xs.len();
+    let mut means: Vec<f64> = (0..b)
+        .map(|_| {
+            let mut s = 0.0;
+            for _ in 0..n {
+                s += xs[rng.random_range(0..n)];
+            }
+            s / n as f64
+        })
+        .collect();
+    means.sort_by(|a, c| a.partial_cmp(c).unwrap_or(std::cmp::Ordering::Equal));
+    (means[(0.025 * b as f64) as usize], means[((0.975 * b as f64) as usize).min(b - 1)])
+}
+
+/// Low-rank clustered corpus; `noise` is the density dial.
+fn make_corpus(cfg: &Cfg) -> Vec<f32> {
+    let mut rng = ChaCha8Rng::seed_from_u64(SEED);
+    let (d, l) = (cfg.dim, cfg.latent);
+    let mut a = vec![0.0f32; d * l];
+    for x in a.iter_mut() {
+        *x = gauss(&mut rng);
+    }
+    let mut protos = vec![0.0f32; cfg.clusters * l];
+    for x in protos.iter_mut() {
+        *x = gauss(&mut rng);
+    }
+    let mut corpus = vec![0.0f32; cfg.n * d];
+    for i in 0..cfg.n {
+        let c = rng.random_range(0..cfg.clusters);
+        let mut z = vec![0.0f32; l];
+        for j in 0..l {
+            z[j] = protos[c * l + j] + cfg.noise * gauss(&mut rng);
+        }
+        let row = &mut corpus[i * d..(i + 1) * d];
+        for ii in 0..d {
+            let mut acc = 0.0f32;
+            for j in 0..l {
+                acc += a[ii * l + j] * z[j];
+            }
+            row[ii] = acc;
+        }
+        let nrm: f32 = row.iter().map(|v| v * v).sum::<f32>().sqrt();
+        if nrm > 0.0 {
+            for v in row.iter_mut() {
+                *v /= nrm;
+            }
+        }
+    }
+    corpus
+}
diff --git a/experiments/ordinal-routing-research/density_collapse_results.md b/experiments/ordinal-routing-research/density_collapse_results.md
new file mode 100644
index 00000000..e1801a26
--- /dev/null
+++ b/experiments/ordinal-routing-research/density_collapse_results.md
@@ -0,0 +1,142 @@
+# Density collapse in ordvec: mechanism and a recoverable signal
+
+What "density collapse" actually is in RankQuant, and whether the lost signal is
+recoverable. Source: `examples/density_collapse.rs`. Ground truth = FP32 cosine,
+so this probe cannot miscalibrate (unlike the withdrawn number-variance probe).
+
+## Mechanism (corrected mid-investigation)
+
+First model was wrong: I expected b=2 codes to COLLIDE (identical) in dense
+regions. They don't — the rank transform is a permutation of 0..D, so a b=2 code
+is a length-D sequence of bucket ids; two docs collide only if all D coordinates
+bucket identically (astronomically rare on continuous data; measured collision
+rate ≈ 0 even at noise=0.08).
+
+The real mechanism is NEAR-collision: docs whose b=2 codes are HAMMING-CLOSE
+(differ in few coordinate-buckets) are what the scorer cannot separate.
+- noise=0.30: mean nearest-code Hamming = 59.9 / 256; ~123 docs within ~1.5×.
+- noise=0.10: mean nearest-code Hamming = 18.4 / 256; ~58 docs within ~1.5×.
+Denser clusters → tighter Hamming neighbourhoods → more docs the b=2 kernel
+conflates. THAT is collapse: not identical codes, but codes too close for the
+2-bit resolution to rank correctly.
+
+## The recoverable signal (positive result)
+
+Within each probe's b=2 "lookalike" set (M=40 Hamming-nearest), split by TRUE
+FP32 cosine into the true-neighbour half vs the far-lookalike half. Compare the
+top-k (k=16) Kendall-tau distance of the probe's coordinate ORDER to each:
+
+| density | true-nbr cosine | true-nbr tau | far cosine | far tau | tau gap (far − near) | 95% CI |
+|---------|-----------------|--------------|------------|---------|----------------------|--------|
+| noise=0.30 | 0.9397 | 0.3345 | 0.9250 | 0.3656 | 0.0311 | [0.0291, 0.0331] |
+| noise=0.10 | 0.9926 | 0.1423 | 0.9905 | 0.1602 | 0.0179 | [0.0166, 0.0191] |
+
+**The FP32-true neighbours have systematically LOWER intra-code Kendall-tau than
+the b2-lookalikes the code conflates them with — gap ≈ 0.018–0.031, 95% CI
+strictly above 0 at both densities** (reported as the tau gap, not the retracted
+win-rate — see the M1/M2 correction below). The fine permutation order (which
+coordinate outranks which, WITHIN the top bucket) separates true neighbours from
+false lookalikes.
+
+## Why this matters
+
+That fine order is ALREADY in the full `Rank` code (u16 per coord) and is
+exactly what RankQuant b=2 discards. So the signal to break dense-region ties is
+recoverable WITHOUT new storage — as a rerank stage on Kendall-tau of the top-k
+coordinates of the b=2 survivors. This is the data-dependent, on-the-permutohedron
+version of "uncover structure in dense regions": the exploitable combinatorial
+structure is in S_D (the order the encoder induced), NOT on the integer line
+(primes / Sacks spiral, which act on the index and carry no corpus information —
+see the conjecture thread).
+
+## REAL EMBEDDINGS (nomic-embed-text via ollama on RTX 5080, 768-d)
+
+Corpus: 8665 real sentences extracted from this repo's markdown + Rust, embedded
+with `nomic-embed-text` (GPU-resident via ollama). Generator:
+`examples/embed_ollama.py`. Run: `density_collapse --corpus-npy repo_real.npy`.
+
+First real-data facts:
+- A low intrinsic dimension for nomic-embed-text is a working hypothesis, not a
+  measurement: deep vision-CNN IDs are low-tens (Ansuini), but there is no
+  published sentence-transformer figure, and our TwoNN probe (sphere-validated
+  only; biased OLS estimate) records no clean nomic ID here. Treat low-tens as a
+  cross-domain analogy that motivates the density test, not an established fact
+  (see `twonn_id_results.md`).
+- Real embeddings are FAR more entangled at b=2 than synthetic clusters: mean
+  nearest-code Hamming 314/768, and ~5083 of 8665 docs sit in each probe's
+  b2-lookalike shell (vs ~60–120 synthetic). Real geometry, much denser collapse.
+
+### CORRECTED after a second adversarial review (M1 + M2)
+
+The first writeup reported a win-rate climb 0.667→0.930 with top-k as "the
+signature of a real effect." TWO review findings invalidated that framing, and
+the test was rewritten:
+
+- **M2 (the win-rate climb was an artifact):** the per-probe tau GAP is flat
+  across top-k; only the win-rate estimator's variance shrinks as k grows,
+  mechanically pushing the win rate up. Win rate was the wrong statistic.
+- **M1 (circularity):** tau was computed on the PROBE'S OWN top-k coords, which
+  couples tau to cosine and makes the test near-tautological. Fixed to use the
+  per-PAIR UNION of top coords, chosen independently of the cosine ranking.
+
+Rewritten test reports the tau GAP (far − near) as an effect size, with a
+bootstrap 95% CI over probes (de-circularized coords):
+
+| top-k | tau gap (far − near) | 95% CI | verdict |
+|-------|----------------------|--------|---------|
+| 8  | 0.0420 | [0.0380, 0.0463] | signal |
+| 16 | 0.0417 | [0.0381, 0.0454] | signal |
+| 32 | 0.0440 | [0.0409, 0.0472] | signal |
+| 64 | 0.0453 | [0.0424, 0.0483] | signal |
+
+**Honest conclusion:** there IS a real effect — true neighbours have lower
+intra-code Kendall-tau than the b2-lookalikes the code conflates them with, gap
+≈ 0.04, CI strictly above 0 at every top-k. But it is MODEST and FLAT, not the
+dramatic "sharpening" originally claimed. The win-rate monotonicity is retracted.
+The lever (intra-code permutation order in the Rank code) shows a measurable but
+small separation on this corpus/model; whether ~0.04 tau converts to useful
+recall gain vs simply using b=4 is the unanswered deployment question.
+
+## Caveats / open
+
+- Real corpus here is repo-domain (md+rust), 8665 docs — narrow domain, modest
+  size. Confirm on a larger, broader corpus (MS MARCO / Wikipedia passages).
+- **RESOLVED (negative):** the tau-rerank-vs-b4 bake-off was run
+  (tau_rerank_bakeoff_results.md). b=4 wins decisively even at the tau ceiling
+  (real: b4 0.942 vs tau 0.597, and tau is worse than b2's own 0.898). The
+  ~0.04 tau gap is a real binary discriminator but too weak to ORDER candidates;
+  it does not convert to retrieval value. Conclusion: just use b=4. The signal
+  is real-but-inert — no ordvec feature follows.
+- Kendall-tau is computed on FP32 values restricted to the per-pair union of
+  top-k coords; a deployable version computes it on the stored ranks.
+- Single corpus, single model (nomic-embed-text), no cross-encoder check — do
+  NOT read "confirmed" generality from one narrow corpus. Repeat on ≥1 more
+  encoder before any strong claim.
+
+Reproduce (synthetic):
+```
+cargo run --release --example density_collapse                 # noise=0.30
+cargo run --release --example density_collapse -- --noise 0.10 # denser
+```
+
+Reproduce (real embeddings — full recorded procedure, was an E4 repro gap):
+```
+# 1. extract repo sentences (md + rust, 30..300 chars, >=4 words, deduped):
+python - <<'PY'
+import re, glob, os, tempfile
+texts=[]
+for f in glob.glob("**/*.md",recursive=True)+glob.glob("**/*.rs",recursive=True):
+    if "target" in f.split(os.sep) or ".git" in f.split(os.sep): continue
+    t=re.sub(r'```.*?```','',open(f,encoding='utf-8',errors='ignore').read(),flags=re.S)
+    for line in re.split(r'(?<=[.!?])\s+|\n',t):
+        s=line.strip(" #-*/>|`").strip()
+        if 30<=len(s)<=300 and re.search(r'[a-zA-Z]{4}',s) and s.count(' ')>=4: texts.append(s)
+seen=set(); out=[s for s in texts if not (s in seen or seen.add(s))]
+open(os.path.join(tempfile.gettempdir(),"repo_texts.txt"),"w",encoding="utf-8").write("\n".join(out))
+print(len(out),"sentences")
+PY
+# 2. embed via ollama nomic-embed-text (GPU) -> .npy:
+python examples/embed_ollama.py --texts "$TMP/repo_texts.txt" --out "$TMP/repo_real.npy" --n 9000 --batch 128
+# 3. run the probe:
+cargo run --release --example density_collapse -- --corpus-npy "$TMP/repo_real.npy" --topk 32
+```
diff --git a/experiments/ordinal-routing-research/embed_ollama.py b/experiments/ordinal-routing-research/embed_ollama.py
new file mode 100644
index 00000000..aa16f640
--- /dev/null
+++ b/experiments/ordinal-routing-research/embed_ollama.py
@@ -0,0 +1,114 @@
+#!/usr/bin/env python3
+"""Generate real sentence embeddings via ollama (GPU) -> .npy for the probes.
+
+ollama runs nomic-embed-text (768-d) resident on the GPU. We embed a corpus of
+short texts and write a 2-D little-endian float32 C-order .npy matching the
+format bench_rank / the conjecture probes read (--corpus-npy).
+
+Corpus source (in priority order):
+  1. --texts FILE   : one text per line
+  2. built-in       : a few thousand templated sentences across many topics,
+                      enough varied geometry for a first real-encoder pass
+                      (no external download, no datasets lib).
+
+Usage:
+  python examples/embed_ollama.py --out corpus_real.npy --n 5000
+  python examples/embed_ollama.py --texts mylines.txt --out corpus_real.npy
+"""
+import argparse, json, struct, sys, urllib.request
+
+OLLAMA = "http://localhost:11434/api/embed"
+MODEL = "nomic-embed-text"
+
+
+def embed_batch(texts):
+    req = urllib.request.Request(
+        OLLAMA,
+        data=json.dumps({"model": MODEL, "input": texts}).encode(),
+        headers={"Content-Type": "application/json"},
+    )
+    with urllib.request.urlopen(req, timeout=120) as r:
+        embs = json.load(r)["embeddings"]
+    # E2 fix: ollama must return exactly one vector per input, IN ORDER.
+    # A mismatch would silently misalign rows against the source texts.
+    if len(embs) != len(texts):
+        raise RuntimeError(
+            f"embedding count {len(embs)} != input count {len(texts)} "
+            "(row misalignment) — aborting rather than writing corrupt .npy"
+        )
+    return embs
+
+
+def build_builtin_corpus(n):
+    # Templated sentences spanning many semantic clusters -> real anisotropy,
+    # real cluster structure, without an external dataset. Deterministic.
+    subjects = ["the engineer", "a chef", "the astronomer", "my neighbor",
+                "the orchestra", "a startup", "the river", "an old library",
+                "the algorithm", "a mountain village", "the immune system",
+                "a jazz quartet", "the supply chain", "an ancient manuscript",
+                "the coral reef", "a chess grandmaster", "the power grid",
+                "a desert caravan", "the neural network", "a vineyard"]
+    verbs = ["studied", "rebuilt", "abandoned", "celebrated", "measured",
+             "optimized", "flooded", "catalogued", "trained", "harvested",
+             "defended", "improvised", "rerouted", "deciphered", "mapped"]
+    objects = ["under heavy rain", "for three decades", "with great precision",
+               "against all odds", "in the summer of 1998", "without any funding",
+               "across twelve countries", "before the deadline", "at dawn",
+               "using only open data", "despite the noise", "on a tight budget",
+               "in complete silence", "after the merger", "beyond the horizon"]
+    out = []
+    i = 0
+    while len(out) < n:
+        s = subjects[i % len(subjects)]
+        v = verbs[(i // len(subjects)) % len(verbs)]
+        o = objects[(i // (len(subjects) * len(verbs))) % len(objects)]
+        out.append(f"{s} {v} {o}.")
+        i += 1
+    return out[:n]
+
+
+def write_npy(path, vecs):
+    if not vecs:
+        raise ValueError("no vectors to write (empty corpus?)")
+    n = len(vecs)
+    dim = len(vecs[0])
+    header = ("{'descr': '<f4', 'fortran_order': False, "
+              f"'shape': ({n}, {dim}), }}")
+    hb = header.encode()
+    pad = (64 - (10 + len(hb) + 1) % 64) % 64
+    hb = hb + b" " * pad + b"\n"
+    with open(path, "wb") as f:
+        f.write(b"\x93NUMPY" + bytes([1, 0]))
+        f.write(struct.pack("<H", len(hb)))
+        f.write(hb)
+        for v in vecs:
+            f.write(struct.pack(f"<{dim}f", *v))
+    print(f"# wrote {path}: {n} x {dim} f32")
+
+
+def main():
+    ap = argparse.ArgumentParser()
+    ap.add_argument("--out", required=True)
+    ap.add_argument("--texts")
+    ap.add_argument("--n", type=int, default=5000)
+    ap.add_argument("--batch", type=int, default=64)
+    args = ap.parse_args()
+
+    if args.texts:
+        with open(args.texts, encoding="utf-8") as fh:
+            texts = [ln.strip() for ln in fh if ln.strip()][: args.n]
+    else:
+        texts = build_builtin_corpus(args.n)
+    print(f"# embedding {len(texts)} texts via ollama/{MODEL} (GPU)")
+
+    vecs = []
+    for i in range(0, len(texts), args.batch):
+        batch = texts[i : i + args.batch]
+        vecs.extend(embed_batch(batch))
+        if (i // args.batch) % 10 == 0:
+            print(f"  {len(vecs)}/{len(texts)}", file=sys.stderr)
+    write_npy(args.out, vecs)
+
+
+if __name__ == "__main__":
+    main()
diff --git a/experiments/ordinal-routing-research/fib_address_gate.rs b/experiments/ordinal-routing-research/fib_address_gate.rs
new file mode 100644
index 00000000..99ae72da
--- /dev/null
+++ b/experiments/ordinal-routing-research/fib_address_gate.rs
@@ -0,0 +1,285 @@
+//! CONTEXT-BOXED, PRE-REGISTERED PROBE — isolated by request.
+//!
+//! This is a standalone curiosity gate. It does NOT touch the routing harness,
+//! does NOT write to benchmarks/, and its result MUST NOT be used to reframe any
+//! other experiment on this branch. It answers ONE question and stops.
+//!
+//! QUESTION (the user's hypothesis, sharpened):
+//!   Is there a closed-form golden-angle ("sine spiral") ADDRESS on the data such
+//!   that address-distance preserves cosine locality? If yes, retrieval could be
+//!   COMPUTED (invert the address) instead of INDEXED (search). If no, the spiral
+//!   address is inert — the same wall the ambient-probe experiment hit.
+//!
+//! METHOD (ceiling test, like the tau bake-off):
+//!   Tile dim into d/3 triples. For a unit vector, each triple -> point on S^2;
+//!   its ADDRESS on that triple is the nearest golden-spiral index (argmin over R
+//!   spiral points — the EXACT nearest, an upper bound on what a closed-form
+//!   inverse could achieve). Address(v) = the tuple of per-triple indices.
+//!   Address-distance = sum of |Δindex| across triples.
+//!   Primary metric: Spearman rho(address-distance, FP32 cosine-distance) over a
+//!   random sample of corpus pairs.
+//!
+//! CONTROLS (golden must beat BOTH to mean anything):
+//!   - random-pts : same machinery, spiral replaced by R random S^2 points
+//!                  (isolates "does the GOLDEN structure help" vs "any quantizer").
+//!   - sign       : ordvec's existing lever — sign bits, Hamming distance vs cosine
+//!                  (isolates "does the spiral beat what ordvec already has").
+//!
+//! PRE-REGISTERED VERDICT (fixed BEFORE running — do not move):
+//!   PASS         : golden rho >= 0.50  AND  golden beats both controls by >= 0.05.
+//!                  => locality-preserving address; computed-retrieval worth pursuing.
+//!   INCONCLUSIVE : 0.30 <= golden rho < 0.50, or golden within 0.05 of a control.
+//!   FAIL         : golden rho < 0.30, or golden <= either control.
+//!                  => spiral address is inert; stop, same wall as ambient probes.
+//!
+//! Run: cargo run --release --example fib_address_gate -- \
+//!          --corpus-npy /tmp/repo_corpus.npy
+//! Output is a single table; nothing is committed.
+
+// Research probe: the header keeps aligned multi-line doc lists for readability.
+#![allow(clippy::doc_overindented_list_items)]
+
+use rand::{RngExt, SeedableRng};
+use rand_chacha::ChaCha8Rng;
+use rayon::prelude::*;
+
+const SEED: u64 = 1;
+
+fn load_npy_f32(path: &str) -> (Vec<f32>, usize, usize) {
+    let bytes = std::fs::read(path).expect("read npy");
+    assert!(
+        bytes.len() >= 10 && &bytes[..6] == b"\x93NUMPY",
+        "not a numpy file"
+    );
+    let major = bytes[6];
+    assert!(
+        major == 1 || major == 2,
+        "unsupported numpy .npy major version {major}"
+    );
+    if major == 2 {
+        assert!(bytes.len() >= 12, "truncated numpy v2 header");
+    }
+    let (hlen, hstart) = if major == 1 {
+        (u16::from_le_bytes([bytes[8], bytes[9]]) as usize, 10)
+    } else {
+        (
+            u32::from_le_bytes([bytes[8], bytes[9], bytes[10], bytes[11]]) as usize,
+            12,
+        )
+    };
+    assert!(hstart + hlen <= bytes.len(), "truncated numpy header");
+    let header = std::str::from_utf8(&bytes[hstart..hstart + hlen]).expect("utf8 header");
+    assert!(header.contains("'descr': '<f4'"), "expected <f4 dtype");
+    let after = &header[header.find("'shape':").expect("shape")..];
+    let inner = &after[after.find('(').unwrap() + 1..after.find(')').unwrap()];
+    let dims: Vec<usize> = inner
+        .split(',')
+        .filter_map(|s| s.trim().parse().ok())
+        .collect();
+    let (n, dim) = (dims[0], dims[1]);
+    let dstart = hstart + hlen;
+    let mut out = vec![0.0f32; n * dim];
+    for (i, ch) in bytes[dstart..].chunks_exact(4).enumerate() {
+        out[i] = f32::from_le_bytes([ch[0], ch[1], ch[2], ch[3]]);
+    }
+    (out, n, dim)
+}
+
+fn l2_normalize_rows(v: &mut [f32], dim: usize) {
+    for i in 0..v.len() / dim {
+        let row = &mut v[i * dim..(i + 1) * dim];
+        let nrm: f32 = row.iter().map(|x| x * x).sum::<f32>().sqrt();
+        if nrm > 0.0 {
+            for x in row.iter_mut() {
+                *x /= nrm;
+            }
+        }
+    }
+}
+
+fn gauss(rng: &mut ChaCha8Rng) -> f32 {
+    let u1: f32 = rng.random_range(1e-9..1.0);
+    let u2: f32 = rng.random_range(0.0..1.0);
+    (-2.0 * u1.ln()).sqrt() * (std::f32::consts::TAU * u2).cos()
+}
+
+/// Golden-angle spiral point i of R on S^2 (the "sine spiral" address grid).
+fn spiral_point(i: usize, r: usize) -> [f32; 3] {
+    let golden = std::f32::consts::PI * (3.0 - 5.0f32.sqrt());
+    let z = if r == 1 {
+        0.0
+    } else {
+        1.0 - 2.0 * (i as f32 + 0.5) / r as f32
+    };
+    let rad = (1.0 - z * z).max(0.0).sqrt();
+    let theta = i as f32 * golden;
+    [rad * theta.cos(), rad * theta.sin(), z]
+}
+
+/// R random points on S^2 (the structure-free control grid), seeded deterministically.
+fn random_points(r: usize) -> Vec<[f32; 3]> {
+    let mut rng = ChaCha8Rng::seed_from_u64(SEED ^ 0x5A5A_5A5A);
+    (0..r)
+        .map(|_| {
+            let mut p = [gauss(&mut rng), gauss(&mut rng), gauss(&mut rng)];
+            let n = (p[0] * p[0] + p[1] * p[1] + p[2] * p[2]).sqrt();
+            for x in p.iter_mut() {
+                *x /= n;
+            }
+            p
+        })
+        .collect()
+}
+
+/// Per-triple address: tuple of nearest-grid-point indices (argmin = exact nearest,
+/// the CEILING for any closed-form inverse). grid is the R points on S^2.
+fn address(v: &[f32], dim: usize, grid: &[[f32; 3]]) -> Vec<u16> {
+    let n_tri = dim / 3;
+    let mut addr = Vec::with_capacity(n_tri);
+    for t in 0..n_tri {
+        let s = &v[t * 3..t * 3 + 3];
+        let nrm = (s[0] * s[0] + s[1] * s[1] + s[2] * s[2]).sqrt().max(1e-12);
+        let p = [s[0] / nrm, s[1] / nrm, s[2] / nrm];
+        // nearest grid point by max dot (points are unit) == min distance
+        let mut best = 0usize;
+        let mut best_dot = f32::MIN;
+        for (gi, g) in grid.iter().enumerate() {
+            let d = p[0] * g[0] + p[1] * g[1] + p[2] * g[2];
+            if d > best_dot {
+                best_dot = d;
+                best = gi;
+            }
+        }
+        addr.push(best as u16);
+    }
+    addr
+}
+
+fn addr_dist(a: &[u16], b: &[u16]) -> f32 {
+    a.iter()
+        .zip(b)
+        .map(|(x, y)| (*x as i32 - *y as i32).unsigned_abs() as f32)
+        .sum()
+}
+
+fn sign_bits(v: &[f32]) -> Vec<u64> {
+    let mut bits = vec![0u64; v.len().div_ceil(64)];
+    for (i, &x) in v.iter().enumerate() {
+        if x >= 0.0 {
+            bits[i / 64] |= 1u64 << (i % 64);
+        }
+    }
+    bits
+}
+
+fn hamming(a: &[u64], b: &[u64]) -> f32 {
+    a.iter()
+        .zip(b)
+        .map(|(x, y)| (x ^ y).count_ones())
+        .sum::<u32>() as f32
+}
+
+/// Spearman rho = Pearson on ranks. Ties broken by stable index (adequate for a gate).
+fn spearman(xs: &[f32], ys: &[f32]) -> f32 {
+    let rank = |v: &[f32]| -> Vec<f32> {
+        let mut idx: Vec<usize> = (0..v.len()).collect();
+        idx.sort_by(|&a, &b| v[a].partial_cmp(&v[b]).unwrap());
+        let mut r = vec![0.0f32; v.len()];
+        for (rank, &i) in idx.iter().enumerate() {
+            r[i] = rank as f32;
+        }
+        r
+    };
+    let rx = rank(xs);
+    let ry = rank(ys);
+    let n = rx.len() as f32;
+    let mx = rx.iter().sum::<f32>() / n;
+    let my = ry.iter().sum::<f32>() / n;
+    let mut cov = 0.0;
+    let mut vx = 0.0;
+    let mut vy = 0.0;
+    for i in 0..rx.len() {
+        let dx = rx[i] - mx;
+        let dy = ry[i] - my;
+        cov += dx * dy;
+        vx += dx * dx;
+        vy += dy * dy;
+    }
+    cov / (vx.sqrt() * vy.sqrt()).max(1e-12)
+}
+
+fn main() {
+    let mut corpus_path = None;
+    let mut a = std::env::args().skip(1);
+    while let Some(x) = a.next() {
+        if x == "--corpus-npy" {
+            corpus_path = Some(a.next().unwrap());
+        }
+    }
+    let path = corpus_path.expect("--corpus-npy required (real embeddings)");
+    let (mut corpus, n, dim) = load_npy_f32(&path);
+    l2_normalize_rows(&mut corpus, dim);
+    let n_pairs = 200_000usize;
+    let mut rng = ChaCha8Rng::seed_from_u64(SEED);
+    let pairs: Vec<(usize, usize)> = (0..n_pairs)
+        .map(|_| {
+            let i = rng.random_range(0..n);
+            let mut j = rng.random_range(0..n);
+            if j == i {
+                j = (j + 1) % n;
+            }
+            (i, j)
+        })
+        .collect();
+
+    let row = |i: usize| &corpus[i * dim..(i + 1) * dim];
+    let cos_dist: Vec<f32> = pairs
+        .par_iter()
+        .map(|&(i, j)| {
+            let d: f32 = row(i).iter().zip(row(j)).map(|(a, b)| a * b).sum();
+            1.0 - d
+        })
+        .collect();
+
+    println!(
+        "# fib_address_gate (CONTEXT-BOXED, pre-registered). corpus {n}x{dim}, pairs {n_pairs}"
+    );
+    println!("# Spearman rho vs FP32 cosine-distance (higher = better locality preservation)");
+    println!("grid\tR\trho");
+
+    // sign baseline (no R)
+    let signs: Vec<Vec<u64>> = (0..n).into_par_iter().map(|i| sign_bits(row(i))).collect();
+    let sign_d: Vec<f32> = pairs
+        .par_iter()
+        .map(|&(i, j)| hamming(&signs[i], &signs[j]))
+        .collect();
+    let rho_sign = spearman(&sign_d, &cos_dist);
+    println!("sign\t-\t{rho_sign:.4}");
+
+    let mut best_golden = f32::MIN;
+    for &r in &[8usize, 32, 128] {
+        for (name, grid) in [
+            (
+                "golden",
+                (0..r).map(|i| spiral_point(i, r)).collect::<Vec<_>>(),
+            ),
+            ("random-pts", random_points(r)),
+        ] {
+            let addrs: Vec<Vec<u16>> = (0..n)
+                .into_par_iter()
+                .map(|i| address(row(i), dim, &grid))
+                .collect();
+            let ad: Vec<f32> = pairs
+                .par_iter()
+                .map(|&(i, j)| addr_dist(&addrs[i], &addrs[j]))
+                .collect();
+            let rho = spearman(&ad, &cos_dist);
+            if name == "golden" && rho > best_golden {
+                best_golden = rho;
+            }
+            println!("{name}\t{r}\t{rho:.4}");
+        }
+    }
+    println!("\n# best golden rho = {best_golden:.4}");
+    println!("# PRE-REGISTERED: PASS if >=0.50 and beats controls by >=0.05; FAIL if <0.30 or <= a control.");
+}
diff --git a/experiments/ordinal-routing-research/fib_directions.rs b/experiments/ordinal-routing-research/fib_directions.rs
new file mode 100644
index 00000000..d2b77ccb
--- /dev/null
+++ b/experiments/ordinal-routing-research/fib_directions.rs
@@ -0,0 +1,581 @@
+//! Direction-PLACEMENT bake-off for the training-free routing layer.
+//!
+//! Open thread from the routing research (PR #211): `shard_recall.rs` always
+//! draws iid-Gaussian projection directions — low-discrepancy *placement* of the
+//! directions themselves was never an arm. This rig tests one hypothesis and is
+//! built to KILL it cheaply if it's false:
+//!
+//!   Shared-index triple-tiled golden-angle directions (a 2-sphere spiral per
+//!   coord-triple, all triples coupled by ONE Fibonacci index so they cohere on
+//!   the permutahedron) need FEWER candidates-scanned for matched recall@k than
+//!   iid-Gaussian — strongest at small R and at Fibonacci R, decaying as dim
+//!   grows (concentration of measure).
+//!
+//! Two arms, EVERYTHING else held identical to shard_recall's "aligned" calib
+//! (single shared base_width, phase=0). Direction construction is the ONLY var.
+//!   gaussian   — iid Gaussian normalized (the control)
+//!   fib-spiral — shared-index triple-tiled golden-angle, per-triple rotated
+//!
+//! Honesty checks baked in (printed): at R=1 the arms MUST coincide (one
+//! direction cannot carry low-discrepancy structure) — divergence => harness
+//! bug, not a result. SEED=1 => identical tables across runs.
+//!
+//! Run (synthetic dim=256): cargo run --release --example fib_directions
+//! Run (real dim=768):      cargo run --release --example fib_directions -- \
+//!                              --corpus-npy repo_real.npy --queries-npy repo_q.npy
+//! No external data on the synthetic path, no BLAS.
+
+use rand::{RngExt, SeedableRng};
+use rand_chacha::ChaCha8Rng;
+use rayon::prelude::*;
+
+/// Base seed; overridable with --seed for the reseed-stability honesty check.
+/// A real effect must survive reseeding (changes gaussian directions AND the
+/// fib/project per-triple rotations); a seed-specific gap is an artifact.
+static SEED: std::sync::atomic::AtomicU64 = std::sync::atomic::AtomicU64::new(1);
+fn seed() -> u64 {
+    SEED.load(std::sync::atomic::Ordering::Relaxed)
+}
+
+struct Cfg {
+    dim: usize,
+    n: usize,
+    n_queries: usize,
+    k: usize,
+    latent_dim: usize,
+    n_clusters: usize,
+    cone: f32,
+    r_values: Vec<usize>,
+    no_triple_rotation: bool,
+    corpus_npy: Option<String>,
+    queries_npy: Option<String>,
+}
+
+fn parse() -> Cfg {
+    let mut c = Cfg {
+        dim: 256,
+        n: 50_000,
+        n_queries: 200,
+        k: 10,
+        latent_dim: 64,
+        n_clusters: 200,
+        // Cone anisotropy: real transformer embeddings are NOT gaussian/centered —
+        // they sit in a narrow cone (a dominant shared mean direction). `cone` is
+        // the weight of a fixed shared axis added to every vector before
+        // normalization; 0.0 = centered (old behaviour), larger = tighter cone.
+        // Default tuned so synthetic mean-resultant-length is in the ballpark of
+        // nomic (printed at startup so it can be matched to real --corpus-npy data).
+        cone: 3.0,
+        // Fibonacci R values (1,2,3,5,8,13,21) interleaved with non-Fib (4,16)
+        // so a Fibonacci-specific bump is legible against its neighbours.
+        r_values: vec![1, 2, 3, 4, 5, 8, 13, 16, 21],
+        no_triple_rotation: false,
+        corpus_npy: None,
+        queries_npy: None,
+    };
+    let mut a = std::env::args().skip(1);
+    while let Some(x) = a.next() {
+        match x.as_str() {
+            "--dim" => c.dim = a.next().unwrap().parse().unwrap(),
+            "--n" => c.n = a.next().unwrap().parse().unwrap(),
+            "--queries" => c.n_queries = a.next().unwrap().parse().unwrap(),
+            "--k" => c.k = a.next().unwrap().parse().unwrap(),
+            "--latent" => c.latent_dim = a.next().unwrap().parse().unwrap(),
+            "--clusters" => c.n_clusters = a.next().unwrap().parse().unwrap(),
+            "--cone" => c.cone = a.next().unwrap().parse().unwrap(),
+            "--seed" => SEED.store(
+                a.next().unwrap().parse().unwrap(),
+                std::sync::atomic::Ordering::Relaxed,
+            ),
+            "--no-triple-rotation" => c.no_triple_rotation = true,
+            "--corpus-npy" => c.corpus_npy = Some(a.next().unwrap()),
+            "--queries-npy" => c.queries_npy = Some(a.next().unwrap()),
+            other => panic!("unknown arg {other}"),
+        }
+    }
+    c
+}
+
+/// Minimal NumPy v1/v2 .npy reader for 2-D little-endian f32 C-order arrays
+/// (same contract as shard_recall / bench_rank).
+fn load_npy_f32(path: &str) -> (Vec<f32>, usize, usize) {
+    let bytes = std::fs::read(path).expect("read npy");
+    assert!(
+        bytes.len() >= 10 && &bytes[..6] == b"\x93NUMPY",
+        "not a numpy file"
+    );
+    let major = bytes[6];
+    assert!(
+        major == 1 || major == 2,
+        "unsupported numpy .npy major version {major}"
+    );
+    if major == 2 {
+        assert!(bytes.len() >= 12, "truncated numpy v2 header");
+    }
+    let (hlen, hstart) = if major == 1 {
+        (u16::from_le_bytes([bytes[8], bytes[9]]) as usize, 10)
+    } else {
+        (
+            u32::from_le_bytes([bytes[8], bytes[9], bytes[10], bytes[11]]) as usize,
+            12,
+        )
+    };
+    assert!(hstart + hlen <= bytes.len(), "truncated numpy header");
+    let header = std::str::from_utf8(&bytes[hstart..hstart + hlen]).expect("utf8 header");
+    assert!(header.contains("'descr': '<f4'"), "expected <f4 dtype");
+    assert!(
+        header.contains("'fortran_order': False"),
+        "expected C order"
+    );
+    let after = &header[header.find("'shape':").expect("shape")..];
+    let inner = &after[after.find('(').unwrap() + 1..after.find(')').unwrap()];
+    let dims: Vec<usize> = inner
+        .split(',')
+        .filter_map(|s| s.trim().parse().ok())
+        .collect();
+    assert_eq!(dims.len(), 2, "expected 2-D array");
+    let (n, dim) = (dims[0], dims[1]);
+    let dstart = hstart + hlen;
+    assert_eq!(bytes.len() - dstart, n * dim * 4, "data length mismatch");
+    let mut out = vec![0.0f32; n * dim];
+    for (i, ch) in bytes[dstart..].chunks_exact(4).enumerate() {
+        out[i] = f32::from_le_bytes([ch[0], ch[1], ch[2], ch[3]]);
+    }
+    (out, n, dim)
+}
+
+fn l2_normalize_rows(v: &mut [f32], dim: usize) {
+    let n = v.len() / dim;
+    for i in 0..n {
+        let row = &mut v[i * dim..(i + 1) * dim];
+        let nrm: f32 = row.iter().map(|x| x * x).sum::<f32>().sqrt();
+        if nrm > 0.0 {
+            for x in row.iter_mut() {
+                *x /= nrm;
+            }
+        }
+    }
+}
+
+fn gauss(rng: &mut ChaCha8Rng) -> f32 {
+    let u1: f32 = rng.random_range(1e-9..1.0);
+    let u2: f32 = rng.random_range(0.0..1.0);
+    (-2.0 * u1.ln()).sqrt() * (std::f32::consts::TAU * u2).cos()
+}
+
+/// Low-rank clustered corpus + matched queries (same construction as shard_recall).
+fn make_corpus(cfg: &mut Cfg) -> (Vec<f32>, Vec<f32>) {
+    if let (Some(cp), Some(qp)) = (cfg.corpus_npy.clone(), cfg.queries_npy.clone()) {
+        let (mut corpus, n, d) = load_npy_f32(&cp);
+        let (mut queries, nq, dq) = load_npy_f32(&qp);
+        assert_eq!(d, dq, "corpus/query dim mismatch");
+        l2_normalize_rows(&mut corpus, d);
+        l2_normalize_rows(&mut queries, dq);
+        cfg.dim = d;
+        cfg.n = n;
+        cfg.n_queries = nq;
+        eprintln!("# loaded corpus {n}x{d}, queries {nq}x{dq}");
+        return (corpus, queries);
+    }
+    let mut rng = ChaCha8Rng::seed_from_u64(seed());
+    let d = cfg.dim;
+    let l = cfg.latent_dim;
+    let mut a = vec![0.0f32; d * l];
+    for x in a.iter_mut() {
+        *x = gauss(&mut rng);
+    }
+    let mut protos = vec![0.0f32; cfg.n_clusters * l];
+    for x in protos.iter_mut() {
+        *x = gauss(&mut rng);
+    }
+    // Shared cone axis: one fixed unit direction added (weight `cone`) to every
+    // vector before normalization, making the corpus anisotropic like real
+    // embeddings (a dominant mean direction) rather than centered/gaussian.
+    let mut cone_axis = vec![0.0f32; d];
+    for x in cone_axis.iter_mut() {
+        *x = gauss(&mut rng);
+    }
+    {
+        let nrm: f32 = cone_axis.iter().map(|v| v * v).sum::<f32>().sqrt();
+        for x in cone_axis.iter_mut() {
+            *x /= nrm;
+        }
+    }
+    let cone = cfg.cone;
+    let make = |proto: &[f32], noise: f32, rng: &mut ChaCha8Rng| -> Vec<f32> {
+        let mut z = vec![0.0f32; l];
+        for j in 0..l {
+            z[j] = proto[j] + noise * gauss(rng);
+        }
+        let mut out = vec![0.0f32; d];
+        for i in 0..d {
+            let mut acc = 0.0f32;
+            for j in 0..l {
+                acc += a[i * l + j] * z[j];
+            }
+            out[i] = acc + cone * cone_axis[i];
+        }
+        let nrm: f32 = out.iter().map(|v| v * v).sum::<f32>().sqrt();
+        if nrm > 0.0 {
+            for v in out.iter_mut() {
+                *v /= nrm;
+            }
+        }
+        out
+    };
+    let mut corpus = vec![0.0f32; cfg.n * d];
+    for i in 0..cfg.n {
+        let c = rng.random_range(0..cfg.n_clusters);
+        let e = make(&protos[c * l..(c + 1) * l], 0.3, &mut rng);
+        corpus[i * d..(i + 1) * d].copy_from_slice(&e);
+    }
+    let mut queries = vec![0.0f32; cfg.n_queries * d];
+    for i in 0..cfg.n_queries {
+        let c = rng.random_range(0..cfg.n_clusters);
+        let e = make(&protos[c * l..(c + 1) * l], 0.1, &mut rng);
+        queries[i * d..(i + 1) * d].copy_from_slice(&e);
+    }
+    (corpus, queries)
+}
+
+/// FP32 brute-force cosine top-k ground truth (corpus is L2-normalized already).
+fn ground_truth(corpus: &[f32], queries: &[f32], dim: usize, k: usize) -> Vec<Vec<usize>> {
+    let n = corpus.len() / dim;
+    let nq = queries.len() / dim;
+    (0..nq)
+        .into_par_iter()
+        .map(|qi| {
+            let q = &queries[qi * dim..(qi + 1) * dim];
+            let mut scored: Vec<(usize, f32)> = (0..n)
+                .map(|di| {
+                    let doc = &corpus[di * dim..(di + 1) * dim];
+                    let dot: f32 = q.iter().zip(doc).map(|(a, b)| a * b).sum();
+                    (di, dot)
+                })
+                .collect();
+            scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
+            scored.into_iter().take(k).map(|(i, _)| i).collect()
+        })
+        .collect()
+}
+
+/// Mean resultant length R̄ = ‖ (1/n) Σ x_i ‖ over L2-normalized rows.
+/// 0 ⇒ isotropic (directions cancel); →1 ⇒ tight cone (one shared direction).
+/// Printed for BOTH synthetic and real corpora so the synthetic anisotropy can
+/// be matched to nomic's — the whole point of the user's anisotropy note.
+fn mean_resultant_length(corpus: &[f32], dim: usize) -> f32 {
+    let n = corpus.len() / dim;
+    let mut mean = vec![0.0f64; dim];
+    for di in 0..n {
+        let row = &corpus[di * dim..(di + 1) * dim];
+        let nrm: f32 = row.iter().map(|x| x * x).sum::<f32>().sqrt();
+        if nrm > 0.0 {
+            for (m, &x) in mean.iter_mut().zip(row) {
+                *m += (x / nrm) as f64;
+            }
+        }
+    }
+    for m in mean.iter_mut() {
+        *m /= n as f64;
+    }
+    (mean.iter().map(|m| m * m).sum::<f64>().sqrt()) as f32
+}
+
+/// The dominant cone axis of a corpus = its normalized mean direction. Returned
+/// as a unit vector; used by the project-then-spiral arm to deflate the shared
+/// component before placing directions (anisotropic data lives off this axis).
+fn dominant_axis(corpus: &[f32], dim: usize) -> Vec<f32> {
+    let n = corpus.len() / dim;
+    let mut mean = vec![0.0f32; dim];
+    for di in 0..n {
+        let row = &corpus[di * dim..(di + 1) * dim];
+        let nrm: f32 = row.iter().map(|x| x * x).sum::<f32>().sqrt();
+        if nrm > 0.0 {
+            for (m, &x) in mean.iter_mut().zip(row) {
+                *m += x / nrm;
+            }
+        }
+    }
+    let nrm: f32 = mean.iter().map(|x| x * x).sum::<f32>().sqrt();
+    if nrm > 0.0 {
+        for m in mean.iter_mut() {
+            *m /= nrm;
+        }
+    }
+    mean
+}
+
+#[derive(Clone, Copy, PartialEq)]
+enum DirArm {
+    Gaussian,
+    FibSpiral,
+    ProjectSpiral,
+}
+
+impl DirArm {
+    fn name(self) -> &'static str {
+        match self {
+            DirArm::Gaussian => "gaussian",
+            DirArm::FibSpiral => "fib-spiral",
+            DirArm::ProjectSpiral => "project-spiral",
+        }
+    }
+}
+
+/// A single golden-angle spiral point on S^2 for index `i` of `r` total points.
+/// z evenly stepped in [-1,1], azimuth advanced by the golden angle π(3-√5).
+fn spiral_point(i: usize, r: usize) -> [f32; 3] {
+    let golden = std::f32::consts::PI * (3.0 - 5.0f32.sqrt());
+    let z = if r == 1 {
+        0.0
+    } else {
+        1.0 - 2.0 * (i as f32 + 0.5) / r as f32
+    };
+    let rad = (1.0 - z * z).max(0.0).sqrt();
+    let theta = i as f32 * golden;
+    [rad * theta.cos(), rad * theta.sin(), z]
+}
+
+/// Deterministic 3x3 rotation for triple `t`: built by Gram–Schmidt on three
+/// Gaussian columns from a triple-indexed RNG. Data-oblivious and reproducible.
+/// Identity when `no_rotation` (the pure shared-index variant).
+fn triple_rotation(t: usize, no_rotation: bool) -> [[f32; 3]; 3] {
+    if no_rotation {
+        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]];
+    }
+    let mut rng = ChaCha8Rng::seed_from_u64(seed() ^ 0xA071_u64.wrapping_mul(t as u64 + 1));
+    let col = |rng: &mut ChaCha8Rng| [gauss(rng), gauss(rng), gauss(rng)];
+    let dot = |a: &[f32; 3], b: &[f32; 3]| a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
+    let norm = |a: &mut [f32; 3]| {
+        let n = dot(a, a).sqrt();
+        if n > 0.0 {
+            for x in a.iter_mut() {
+                *x /= n;
+            }
+        }
+    };
+    let mut e0 = col(&mut rng);
+    norm(&mut e0);
+    let mut v1 = col(&mut rng);
+    let p1 = dot(&v1, &e0);
+    for i in 0..3 {
+        v1[i] -= p1 * e0[i];
+    }
+    norm(&mut v1);
+    // e2 = e0 × e1 (right-handed, guaranteed orthonormal)
+    let e2 = [
+        e0[1] * v1[2] - e0[2] * v1[1],
+        e0[2] * v1[0] - e0[0] * v1[2],
+        e0[0] * v1[1] - e0[1] * v1[0],
+    ];
+    [e0, v1, e2]
+}
+
+/// Build R unit directions of dimension `dim` for one arm.
+/// - Gaussian: iid normal columns, normalized (the control).
+/// - FibSpiral: tile `dim` into d/3 triples; direction i = concat over triples
+///   of (rotation_t · spiral_point(i, R)); ONE shared index i couples triples.
+/// - ProjectSpiral: same as FibSpiral but each direction is then deflated of its
+///   component along `cone_axis` (the data's dominant direction) and renormalized,
+///   so directions live OFF the shared cone where anisotropic data actually splits.
+fn build_dirs(arm: DirArm, r: usize, dim: usize, no_rot: bool, cone_axis: &[f32]) -> Vec<Vec<f32>> {
+    match arm {
+        DirArm::Gaussian => {
+            let mut rng = ChaCha8Rng::seed_from_u64(seed() ^ 0xD132_0000 ^ (r as u64));
+            (0..r)
+                .map(|_| {
+                    let mut v = vec![0.0f32; dim];
+                    for x in v.iter_mut() {
+                        *x = gauss(&mut rng);
+                    }
+                    let nrm: f32 = v.iter().map(|t| t * t).sum::<f32>().sqrt();
+                    for x in v.iter_mut() {
+                        *x /= nrm;
+                    }
+                    v
+                })
+                .collect()
+        }
+        DirArm::FibSpiral | DirArm::ProjectSpiral => {
+            let n_tri = dim.div_ceil(3); // pad the tail triple if dim % 3 != 0
+            let rots: Vec<[[f32; 3]; 3]> = (0..n_tri).map(|t| triple_rotation(t, no_rot)).collect();
+            (0..r)
+                .map(|i| {
+                    let p = spiral_point(i, r); // shared index i across all triples
+                    let mut v = vec![0.0f32; dim];
+                    for (t, rot) in rots.iter().enumerate() {
+                        // rotated triple vector
+                        let rv = [
+                            rot[0][0] * p[0] + rot[1][0] * p[1] + rot[2][0] * p[2],
+                            rot[0][1] * p[0] + rot[1][1] * p[1] + rot[2][1] * p[2],
+                            rot[0][2] * p[0] + rot[1][2] * p[1] + rot[2][2] * p[2],
+                        ];
+                        for (c, &rvc) in rv.iter().enumerate() {
+                            let idx = t * 3 + c;
+                            if idx < dim {
+                                v[idx] = rvc;
+                            }
+                        }
+                    }
+                    if arm == DirArm::ProjectSpiral {
+                        // deflate the dominant cone component, then renormalize
+                        let proj: f32 = v.iter().zip(cone_axis).map(|(a, b)| a * b).sum();
+                        for (x, &c) in v.iter_mut().zip(cone_axis) {
+                            *x -= proj * c;
+                        }
+                    }
+                    let nrm: f32 = v.iter().map(|t| t * t).sum::<f32>().sqrt();
+                    if nrm > 0.0 {
+                        for x in v.iter_mut() {
+                            *x /= nrm;
+                        }
+                    }
+                    v
+                })
+                .collect()
+        }
+    }
+}
+
+/// One projection grid: unit direction + grid width (phase=0 for all arms here).
+struct Proj {
+    dir: Vec<f32>,
+    width: f32,
+}
+
+impl Proj {
+    fn bucket(&self, v: &[f32]) -> i64 {
+        let dot: f32 = self.dir.iter().zip(v).map(|(a, b)| a * b).sum();
+        (dot / self.width).floor() as i64
+    }
+}
+
+type BucketIndex = Vec<std::collections::HashMap<i64, Vec<u32>>>;
+
+fn build_index(projs: &[Proj], corpus: &[f32], dim: usize) -> BucketIndex {
+    let n = corpus.len() / dim;
+    projs
+        .par_iter()
+        .map(|p| {
+            let mut m: std::collections::HashMap<i64, Vec<u32>> = std::collections::HashMap::new();
+            for di in 0..n {
+                let b = p.bucket(&corpus[di * dim..(di + 1) * dim]);
+                m.entry(b).or_default().push(di as u32);
+            }
+            m
+        })
+        .collect()
+}
+
+fn probe_recall(
+    projs: &[Proj],
+    index: &BucketIndex,
+    q: &[f32],
+    rad: i64,
+    truth: &[usize],
+) -> (usize, f32) {
+    use std::collections::HashSet;
+    let mut cand: HashSet<u32> = HashSet::new();
+    for (pi, p) in projs.iter().enumerate() {
+        let b = p.bucket(q);
+        for bb in (b - rad)..=(b + rad) {
+            if let Some(ids) = index[pi].get(&bb) {
+                cand.extend(ids.iter().copied());
+            }
+        }
+    }
+    let hits = truth
+        .iter()
+        .filter(|&&i| cand.contains(&(i as u32)))
+        .count();
+    let recall = hits as f32 / truth.len().max(1) as f32;
+    (cand.len(), recall)
+}
+
+fn run(cfg: &Cfg, corpus: &[f32], queries: &[f32], truth: &[Vec<usize>]) {
+    // base_width: single unit grid → ~sqrt(n) buckets (shard_recall calibration).
+    let spread = 6.0 / (cfg.dim as f32).sqrt();
+    let base_width = spread / (cfg.n as f32).sqrt();
+    let cone_axis = dominant_axis(corpus, cfg.dim);
+    let arms = [DirArm::Gaussian, DirArm::FibSpiral, DirArm::ProjectSpiral];
+
+    let mut pts: Vec<(&'static str, f64, f64)> = Vec::new();
+    println!("# recall vs candidates-scanned (mean over queries), per arm per R");
+    println!("arm\tR\trad\tcand_scanned\trecall@k");
+    for &r in &cfg.r_values {
+        for &arm in &arms {
+            let dirs = build_dirs(arm, r, cfg.dim, cfg.no_triple_rotation, &cone_axis);
+            let projs: Vec<Proj> = dirs
+                .into_iter()
+                .map(|dir| Proj {
+                    dir,
+                    width: base_width,
+                })
+                .collect();
+            let index = build_index(&projs, corpus, cfg.dim);
+            for rad in [0i64, 1, 2, 4] {
+                let stats: Vec<(usize, f32)> = (0..cfg.n_queries)
+                    .into_par_iter()
+                    .map(|qi| {
+                        let q = &queries[qi * cfg.dim..(qi + 1) * cfg.dim];
+                        probe_recall(&projs, &index, q, rad, &truth[qi])
+                    })
+                    .collect();
+                let mean_cand = stats.iter().map(|s| s.0 as f64).sum::<f64>() / stats.len() as f64;
+                let mean_rec = stats.iter().map(|s| s.1 as f64).sum::<f64>() / stats.len() as f64;
+                println!(
+                    "{}\t{r}\t{rad}\t{:.0}\t{:.4}",
+                    arm.name(),
+                    mean_cand,
+                    mean_rec
+                );
+                pts.push((arm.name(), mean_cand, mean_rec));
+            }
+        }
+    }
+
+    println!("\n# FAIR ENVELOPE: max recall@k at candidates-scanned <= budget");
+    let budgets = [1000.0, 2000.0, 4000.0, 8000.0, 16000.0];
+    print!("budget");
+    for a in &arms {
+        print!("\t{}", a.name());
+    }
+    println!();
+    for &bud in &budgets {
+        print!("{bud:.0}");
+        for a in &arms {
+            let best = pts
+                .iter()
+                .filter(|(name, c, _)| *name == a.name() && *c <= bud)
+                .map(|(_, _, r)| *r)
+                .fold(0.0f64, f64::max);
+            print!("\t{best:.4}");
+        }
+        println!();
+    }
+}
+
+fn main() {
+    let mut cfg = parse();
+    let (corpus, queries) = make_corpus(&mut cfg);
+    let truth = ground_truth(&corpus, &queries, cfg.dim, cfg.k);
+    let src = if cfg.corpus_npy.is_some() {
+        "npy"
+    } else {
+        "synthetic-cone"
+    };
+    let rbar_c = mean_resultant_length(&corpus, cfg.dim);
+    let rbar_q = mean_resultant_length(&queries, cfg.dim);
+    println!(
+        "# fib_directions: n={} q={} dim={} k={} corpus={src} cone={}",
+        cfg.n, cfg.n_queries, cfg.dim, cfg.k, cfg.cone
+    );
+    println!(
+        "# ANISOTROPY mean-resultant-length: corpus={rbar_c:.4} queries={rbar_q:.4} \
+         (0=isotropic, 1=single cone) — match synthetic to real before trusting synthetic"
+    );
+    if cfg.no_triple_rotation {
+        println!("# fib/project arms: NO per-triple rotation (pure shared-index variant)");
+    }
+    run(&cfg, &corpus, &queries, &truth);
+}
diff --git a/experiments/ordinal-routing-research/fib_reduction.rs b/experiments/ordinal-routing-research/fib_reduction.rs
new file mode 100644
index 00000000..0b09ba43
--- /dev/null
+++ b/experiments/ordinal-routing-research/fib_reduction.rs
@@ -0,0 +1,403 @@
+//! CONTEXT-BOXED, PRE-REGISTERED — three independent probes of the hypothesis
+//! "golden/Fibonacci is a REDUCTION RATE of the permutation space" (NOT an
+//! address). Isolated by request: touches no routing harness, writes nothing,
+//! and each experiment's verdict is fixed BELOW, before any data is seen. No
+//! result here may reframe another experiment on this branch.
+//!
+//! Permutahedron P_{n-1} has n! vertices; a rank code is one vertex. The
+//! hypothesis: at scale, the RETRIEVAL-relevant structure is a sub-factorial
+//! reduction of that space, and golden ratio is its rate.
+//!
+//! ===================== EXP1 — FAN-OUT RATE =====================
+//! Math fact (not measured, it's a theorem): #{σ : |σ(i)-i| <= 1} = F_{n+1},
+//! ratio -> φ. The MEASURABLE crux: do the encoder's FP32 true neighbours
+//! actually LIVE in the low-displacement (local) permutation neighbourhood of
+//! the query's rank code, and does that neighbourhood grow at rate φ in DATA?
+//!   metric A (locality)  = mean footrule(query, true-nbr) / mean footrule(random pair)
+//!   metric B (φ fan-out) = geometric growth ratio of |docs within footrule r| over r
+//! PRE-REGISTERED:
+//!   locality PASS if A <= 0.75 (true nbrs meaningfully closer in rank space)
+//!   fan-out  GOLDEN if B in [1.50, 1.75]; else "data-determined (intrinsic dim), not φ"
+//!
+//! ================= EXP2 — COARSENING SCHEDULE =================
+//! Bucket ranks into B bins; retrieve top-K by L1 on the B-bin code; recall@10
+//! vs bits/coord = log2(B). Compare bit-allocation schedules:
+//!   fib {2,3,5,8,13,21}  pow2 {2,4,8,16,32}  linear {2,3,4,5,6,7}
+//! PRE-REGISTERED:
+//!   Fibonacci PASS if at matched bits it beats BOTH pow2 and linear by >= 0.03
+//!     recall somewhere AND is never worse than either by > 0.03; else
+//!     "schedule-independent (recall is one smooth curve in bits; φ no advantage)"
+//!
+//! ==================== EXP3 — MERGE RATE ======================
+//! As ranks are progressively coarsened (B descending, Fibonacci-spaced), at
+//! what RATE do permutations merge? Two levels: code-identity (do full codes
+//! collide?) and neighbourhood (|docs within a fixed footrule ball|).
+//! PRE-REGISTERED:
+//!   GOLDEN if successive neighbourhood-merge ratios converge to [1.50, 1.75];
+//!   else report the actual law (expected: combinatorial ~1/B, not φ).
+//!
+//! Run: cargo run --release --example fib_reduction -- \
+//!          --corpus-npy /tmp/repo_corpus.npy --queries-npy /tmp/repo_q.npy
+//!      (optional first positional: exp1 | exp2 | exp3; default runs all three)
+
+use rand::{RngExt, SeedableRng};
+use rand_chacha::ChaCha8Rng;
+use rayon::prelude::*;
+
+const SEED: u64 = 1;
+
+fn load_npy_f32(path: &str) -> (Vec<f32>, usize, usize) {
+    let bytes = std::fs::read(path).expect("read npy");
+    assert!(
+        bytes.len() >= 10 && &bytes[..6] == b"\x93NUMPY",
+        "not a numpy file"
+    );
+    let major = bytes[6];
+    assert!(
+        major == 1 || major == 2,
+        "unsupported numpy .npy major version {major}"
+    );
+    if major == 2 {
+        assert!(bytes.len() >= 12, "truncated numpy v2 header");
+    }
+    let (hlen, hstart) = if major == 1 {
+        (u16::from_le_bytes([bytes[8], bytes[9]]) as usize, 10)
+    } else {
+        (
+            u32::from_le_bytes([bytes[8], bytes[9], bytes[10], bytes[11]]) as usize,
+            12,
+        )
+    };
+    assert!(hstart + hlen <= bytes.len(), "truncated numpy header");
+    let header = std::str::from_utf8(&bytes[hstart..hstart + hlen]).expect("utf8 header");
+    assert!(header.contains("'descr': '<f4'"), "expected <f4 dtype");
+    let after = &header[header.find("'shape':").expect("shape")..];
+    let inner = &after[after.find('(').unwrap() + 1..after.find(')').unwrap()];
+    let dims: Vec<usize> = inner
+        .split(',')
+        .filter_map(|s| s.trim().parse().ok())
+        .collect();
+    let (n, dim) = (dims[0], dims[1]);
+    let dstart = hstart + hlen;
+    let mut out = vec![0.0f32; n * dim];
+    for (i, ch) in bytes[dstart..].chunks_exact(4).enumerate() {
+        out[i] = f32::from_le_bytes([ch[0], ch[1], ch[2], ch[3]]);
+    }
+    (out, n, dim)
+}
+
+fn l2_normalize_rows(v: &mut [f32], dim: usize) {
+    for i in 0..v.len() / dim {
+        let row = &mut v[i * dim..(i + 1) * dim];
+        let nrm: f32 = row.iter().map(|x| x * x).sum::<f32>().sqrt();
+        if nrm > 0.0 {
+            for x in row.iter_mut() {
+                *x /= nrm;
+            }
+        }
+    }
+}
+
+/// Rank-encode: rank[i] = ordinal position (0..dim) of coordinate i within the row.
+fn rank_encode(rows: &[f32], n: usize, dim: usize) -> Vec<u16> {
+    let mut out = vec![0u16; n * dim];
+    out.par_chunks_mut(dim).enumerate().for_each(|(r, code)| {
+        let row = &rows[r * dim..(r + 1) * dim];
+        let mut idx: Vec<u16> = (0..dim as u16).collect();
+        idx.sort_by(|&a, &b| row[a as usize].partial_cmp(&row[b as usize]).unwrap());
+        for (rank, &i) in idx.iter().enumerate() {
+            code[i as usize] = rank as u16;
+        }
+    });
+    out
+}
+
+fn footrule(a: &[u16], b: &[u16]) -> f64 {
+    a.iter()
+        .zip(b)
+        .map(|(x, y)| (*x as i32 - *y as i32).unsigned_abs() as f64)
+        .sum()
+}
+
+/// FP32 cosine top-k ground truth (rows already L2-normalized).
+fn ground_truth(corpus: &[f32], queries: &[f32], dim: usize, k: usize) -> Vec<Vec<u32>> {
+    let n = corpus.len() / dim;
+    let nq = queries.len() / dim;
+    (0..nq)
+        .into_par_iter()
+        .map(|qi| {
+            let q = &queries[qi * dim..(qi + 1) * dim];
+            let mut s: Vec<(u32, f32)> = (0..n)
+                .map(|di| {
+                    let d = &corpus[di * dim..(di + 1) * dim];
+                    (di as u32, q.iter().zip(d).map(|(a, b)| a * b).sum::<f32>())
+                })
+                .collect();
+            s.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
+            s.into_iter().take(k).map(|(i, _)| i).collect()
+        })
+        .collect()
+}
+
+struct Data {
+    corpus: Vec<f32>,
+    queries: Vec<f32>,
+    dim: usize,
+    n: usize,
+    nq: usize,
+    crank: Vec<u16>,
+    qrank: Vec<u16>,
+    truth: Vec<Vec<u32>>,
+}
+
+fn exp1_fanout(d: &Data) {
+    println!("\n## EXP1 — FAN-OUT RATE");
+    let crow = |i: usize| &d.crank[i * d.dim..(i + 1) * d.dim];
+    let qrow = |i: usize| &d.qrank[i * d.dim..(i + 1) * d.dim];
+
+    // metric A: locality of true neighbours vs random pairs, in rank (footrule) space
+    let true_med: f64 = (0..d.nq)
+        .into_par_iter()
+        .map(|qi| {
+            let mut fs: Vec<f64> = d.truth[qi]
+                .iter()
+                .map(|&di| footrule(qrow(qi), crow(di as usize)))
+                .collect();
+            fs.sort_by(|a, b| a.partial_cmp(b).unwrap());
+            fs[fs.len() / 2]
+        })
+        .sum::<f64>()
+        / d.nq as f64;
+    let mut rng = ChaCha8Rng::seed_from_u64(SEED);
+    let rand_pairs: Vec<(usize, usize)> = (0..20_000)
+        .map(|_| (rng.random_range(0..d.n), rng.random_range(0..d.n)))
+        .collect();
+    let rand_med = {
+        let mut fs: Vec<f64> = rand_pairs
+            .par_iter()
+            .map(|&(i, j)| footrule(crow(i), crow(j)))
+            .collect();
+        fs.sort_by(|a, b| a.partial_cmp(b).unwrap());
+        fs[fs.len() / 2]
+    };
+    let a_ratio = true_med / rand_med;
+    println!("  metric A locality: true-nbr median footrule {true_med:.0} / random {rand_med:.0} = {a_ratio:.4}");
+    println!(
+        "    -> {}",
+        if a_ratio <= 0.75 {
+            "PASS (true neighbours closer in rank space)"
+        } else {
+            "FAIL (rank space does not localize neighbours)"
+        }
+    );
+
+    // metric B: neighbourhood growth ratio over footrule shells, on a query sample
+    let sample: Vec<usize> = (0..d.nq).step_by(4).collect();
+    let radii: Vec<f64> = (1..=8).map(|s| rand_med * (s as f64) / 8.0).collect();
+    let mut counts = vec![0f64; radii.len()];
+    for &qi in &sample {
+        let dists: Vec<f64> = (0..d.n)
+            .into_par_iter()
+            .map(|di| footrule(qrow(qi), crow(di)))
+            .collect();
+        for (ri, &r) in radii.iter().enumerate() {
+            counts[ri] += dists.iter().filter(|&&x| x <= r).count() as f64;
+        }
+    }
+    let mut ratios = vec![];
+    for w in counts.windows(2) {
+        if w[0] > 0.0 {
+            ratios.push(w[1] / w[0]);
+        }
+    }
+    let gmean = ratios.iter().map(|r| r.ln()).sum::<f64>() / ratios.len() as f64;
+    let g = gmean.exp();
+    println!("  metric B fan-out growth ratio (per equal radius step) = {g:.3}");
+    println!(
+        "    -> {}",
+        if (1.50..=1.75).contains(&g) {
+            "GOLDEN (≈φ)"
+        } else {
+            "data-determined (set by intrinsic dim), NOT φ"
+        }
+    );
+}
+
+fn exp2_schedule(d: &Data) {
+    println!("\n## EXP2 — COARSENING SCHEDULE");
+    let crow = |i: usize| &d.crank[i * d.dim..(i + 1) * d.dim];
+    let qrow = |i: usize| &d.qrank[i * d.dim..(i + 1) * d.dim];
+    let cand_k = 100usize;
+    let bucket = |rank: u16, b: u32| -> u16 { ((rank as u32 * b) / d.dim as u32) as u16 };
+    // recall@10 (truth top-10 captured within top-cand_k by L1 on B-bin code)
+    let recall_for_b = |b: u32| -> f64 {
+        let ccode: Vec<u16> = d.crank.iter().map(|&r| bucket(r, b)).collect();
+        (0..d.nq)
+            .into_par_iter()
+            .map(|qi| {
+                let qc: Vec<u16> = qrow(qi).iter().map(|&r| bucket(r, b)).collect();
+                let mut s: Vec<(u32, i64)> = (0..d.n)
+                    .map(|di| {
+                        let dc = &ccode[di * d.dim..(di + 1) * d.dim];
+                        let dist: i64 = qc
+                            .iter()
+                            .zip(dc)
+                            .map(|(x, y)| (*x as i64 - *y as i64).abs())
+                            .sum();
+                        (di as u32, dist)
+                    })
+                    .collect();
+                s.sort_by_key(|x| x.1);
+                let top: std::collections::HashSet<u32> =
+                    s.into_iter().take(cand_k).map(|x| x.0).collect();
+                let hit = d.truth[qi].iter().filter(|t| top.contains(t)).count();
+                hit as f64 / d.truth[qi].len() as f64
+            })
+            .sum::<f64>()
+            / d.nq as f64
+    };
+    let all_b = [2u32, 3, 4, 5, 6, 7, 8, 13, 16, 21, 32];
+    let mut rec = std::collections::HashMap::new();
+    for &b in &all_b {
+        rec.insert(b, recall_for_b(b));
+    }
+    let show = |name: &str, bs: &[u32], rec: &std::collections::HashMap<u32, f64>| {
+        println!("  {name}:");
+        for &b in bs {
+            println!(
+                "    B={b:>2} bits={:.2}  recall@10={:.4}",
+                (b as f64).log2(),
+                rec[&b]
+            );
+        }
+    };
+    let _ = crow; // crow unused here; codes built from crank directly
+    show("fib   ", &[2, 3, 5, 8, 13, 21], &rec);
+    show("pow2  ", &[2, 4, 8, 16, 32], &rec);
+    show("linear", &[2, 3, 4, 5, 6, 7], &rec);
+    println!("  -> recall is a function of bits=log2(B); schedules only SAMPLE that curve.");
+    println!(
+        "     Fibonacci PASS only if its points beat pow2 AND linear at matched bits by >=0.03."
+    );
+}
+
+fn exp3_merge(d: &Data) {
+    println!("\n## EXP3 — MERGE RATE");
+    let crow = |i: usize| &d.crank[i * d.dim..(i + 1) * d.dim];
+    let bucket = |rank: u16, b: u32| -> u16 { ((rank as u32 * b) / d.dim as u32) as u16 };
+    // (a) code-identity: do full B-bin codes collide as B shrinks?
+    println!("  (a) distinct full codes among {} docs:", d.n);
+    let fib_levels = [21u32, 13, 8, 5, 3, 2];
+    let mut distinct = vec![];
+    for &b in &fib_levels {
+        let set: std::collections::HashSet<Vec<u16>> = (0..d.n)
+            .map(|i| crow(i).iter().map(|&r| bucket(r, b)).collect::<Vec<u16>>())
+            .collect();
+        distinct.push(set.len());
+        println!(
+            "    B={b:>2}: {} distinct ({:.1}% of n)",
+            set.len(),
+            100.0 * set.len() as f64 / d.n as f64
+        );
+    }
+    // (b) neighbourhood merge: mean docs within a fixed footrule ball, across Fibonacci radii
+    let crow2 = |i: usize| &d.crank[i * d.dim..(i + 1) * d.dim];
+    let mut rng = ChaCha8Rng::seed_from_u64(SEED ^ 0x3333);
+    let probes: Vec<usize> = (0..50).map(|_| rng.random_range(0..d.n)).collect();
+    // a coarse footrule scale: median random footrule
+    let scale = {
+        let mut fs: Vec<f64> = (0..5000)
+            .map(|_| {
+                footrule(
+                    crow2(rng.random_range(0..d.n)),
+                    crow2(rng.random_range(0..d.n)),
+                )
+            })
+            .collect();
+        fs.sort_by(|a, b| a.partial_cmp(b).unwrap());
+        fs[fs.len() / 2]
+    };
+    let fib_r = [1f64, 2.0, 3.0, 5.0, 8.0, 13.0]; // Fibonacci-spaced shell radii (× scale/13)
+    let mut counts = vec![0f64; fib_r.len()];
+    for &p in &probes {
+        let dists: Vec<f64> = (0..d.n)
+            .into_par_iter()
+            .map(|di| footrule(crow2(p), crow2(di)))
+            .collect();
+        for (ri, &fr) in fib_r.iter().enumerate() {
+            let r = scale * fr / 13.0;
+            counts[ri] += dists.iter().filter(|&&x| x <= r).count() as f64;
+        }
+    }
+    println!("  (b) neighbourhood size at Fibonacci-spaced radii:");
+    for (ri, &fr) in fib_r.iter().enumerate() {
+        println!(
+            "    r={fr:>4} (×scale/13): mean |ball| = {:.1}",
+            counts[ri] / probes.len() as f64
+        );
+    }
+    let mut ratios = vec![];
+    for w in counts.windows(2) {
+        if w[0] > 0.0 {
+            ratios.push(w[1] / w[0]);
+        }
+    }
+    if !ratios.is_empty() {
+        let g = (ratios.iter().map(|r| r.ln()).sum::<f64>() / ratios.len() as f64).exp();
+        println!(
+            "  merge growth ratio across shells = {g:.3} -> {}",
+            if (1.50..=1.75).contains(&g) {
+                "GOLDEN (≈φ)"
+            } else {
+                "NOT φ (set by data geometry / shell spacing)"
+            }
+        );
+    }
+}
+
+fn main() {
+    let mut which = None;
+    let mut corpus_path = None;
+    let mut queries_path = None;
+    let mut a = std::env::args().skip(1);
+    while let Some(x) = a.next() {
+        match x.as_str() {
+            "--corpus-npy" => corpus_path = Some(a.next().unwrap()),
+            "--queries-npy" => queries_path = Some(a.next().unwrap()),
+            "exp1" | "exp2" | "exp3" => which = Some(x),
+            _ => {}
+        }
+    }
+    let cp = corpus_path.expect("--corpus-npy required");
+    let qp = queries_path.expect("--queries-npy required");
+    let (mut corpus, n, dim) = load_npy_f32(&cp);
+    let (mut queries, nq, dq) = load_npy_f32(&qp);
+    assert_eq!(dim, dq);
+    l2_normalize_rows(&mut corpus, dim);
+    l2_normalize_rows(&mut queries, dim);
+    let crank = rank_encode(&corpus, n, dim);
+    let qrank = rank_encode(&queries, nq, dim);
+    let truth = ground_truth(&corpus, &queries, dim, 10);
+    let d = Data {
+        corpus,
+        queries,
+        dim,
+        n,
+        nq,
+        crank,
+        qrank,
+        truth,
+    };
+    println!("# fib_reduction (CONTEXT-BOXED, pre-registered). corpus {n}x{dim}, queries {nq}");
+    let _ = (&d.corpus, &d.queries); // kept for parity; rank-space metrics use crank/qrank
+    match which.as_deref() {
+        Some("exp1") => exp1_fanout(&d),
+        Some("exp2") => exp2_schedule(&d),
+        Some("exp3") => exp3_merge(&d),
+        _ => {
+            exp1_fanout(&d);
+            exp2_schedule(&d);
+            exp3_merge(&d);
+        }
+    }
+}
diff --git a/experiments/ordinal-routing-research/gen_corpus.rs b/experiments/ordinal-routing-research/gen_corpus.rs
new file mode 100644
index 00000000..e12ab09d
--- /dev/null
+++ b/experiments/ordinal-routing-research/gen_corpus.rs
@@ -0,0 +1,279 @@
+//! Corpus zoo generator — writes diverse embedding geometries as .npy so the
+//! conjecture probes (spectral_probe, shard_recall, twonn_id, bench_rank) can
+//! be verified across distributions, not just the one synthetic clustered
+//! Gaussian. Decouples generation from measurement and dogfoods the .npy path.
+//!
+//! Kinds (each stresses a different failure mode / mimics a real pathology):
+//!   clustered   — low-rank Gaussian clusters (the existing baseline)
+//!   isotropic   — N(0,I) on the sphere (Poisson NULL control)
+//!   anisotropic — power-law singular spectrum sigma_k ~ k^-alpha (real
+//!                 embedding decay; Gao et al. narrow-cone)
+//!   rogue       — a few dimensions with huge variance (Timkey rogue dims)
+//!   manifold    — nonlinear low-D manifold (sinusoidal) in high-D: the one
+//!                 place angular RIGIDITY could plausibly hide
+//!   lattice     — jittered 1-D lattice projected up: POSITIVE rigidity
+//!                 control. If the number-variance probe never reports
+//!                 sub-Poisson, it can't see rigidity; this proves it can.
+//!
+//! Run:  cargo run --release --example gen_corpus -- --kind anisotropic \
+//!          --n 50000 --dim 256 --out corpus_aniso.npy
+//! Also emits a matched query file with --queries-out (smaller-noise draws).
+
+use rand::{RngExt, SeedableRng};
+use rand_chacha::ChaCha8Rng;
+
+struct Cfg {
+    kind: String,
+    n: usize,
+    n_queries: usize,
+    dim: usize,
+    latent: usize,
+    clusters: usize,
+    alpha: f32,
+    rogue: usize,
+    out: String,
+    queries_out: Option<String>,
+    seed: u64,
+}
+
+fn parse() -> Cfg {
+    let mut c = Cfg {
+        kind: "clustered".into(),
+        n: 50_000,
+        n_queries: 200,
+        dim: 256,
+        latent: 64,
+        clusters: 200,
+        alpha: 1.0,
+        rogue: 3,
+        out: "corpus.npy".into(),
+        queries_out: None,
+        seed: 1,
+    };
+    let mut a = std::env::args().skip(1);
+    while let Some(x) = a.next() {
+        match x.as_str() {
+            "--kind" => c.kind = a.next().unwrap(),
+            "--n" => c.n = a.next().unwrap().parse().unwrap(),
+            "--queries" => c.n_queries = a.next().unwrap().parse().unwrap(),
+            "--dim" => c.dim = a.next().unwrap().parse().unwrap(),
+            "--latent" => c.latent = a.next().unwrap().parse().unwrap(),
+            "--clusters" => c.clusters = a.next().unwrap().parse().unwrap(),
+            "--alpha" => c.alpha = a.next().unwrap().parse().unwrap(),
+            "--rogue" => c.rogue = a.next().unwrap().parse().unwrap(),
+            "--out" => c.out = a.next().unwrap(),
+            "--queries-out" => c.queries_out = Some(a.next().unwrap()),
+            "--seed" => c.seed = a.next().unwrap().parse().unwrap(),
+            other => panic!("unknown arg {other}"),
+        }
+    }
+    c
+}
+
+/// Inverse standard-normal CDF (Acklam's rational approximation, |err|<1e-9).
+fn inv_norm_cdf(p: f64) -> f64 {
+    let p = p.clamp(1e-12, 1.0 - 1e-12);
+    let a = [-3.969683028665376e+01, 2.209460984245205e+02, -2.759285104469687e+02,
+             1.383577518672690e+02, -3.066479806614716e+01, 2.506628277459239e+00];
+    let b = [-5.447609879822406e+01, 1.615858368580409e+02, -1.556989798598866e+02,
+             6.680131188771972e+01, -1.328068155288572e+01];
+    let c = [-7.784894002430293e-03, -3.223964580411365e-01, -2.400758277161838e+00,
+             -2.549732539343734e+00, 4.374664141464968e+00, 2.938163982698783e+00];
+    let d = [7.784695709041462e-03, 3.224671290700398e-01, 2.445134137142996e+00,
+             3.754408661907416e+00];
+    let plow = 0.02425;
+    if p < plow {
+        let q = (-2.0 * p.ln()).sqrt();
+        (((((c[0] * q + c[1]) * q + c[2]) * q + c[3]) * q + c[4]) * q + c[5])
+            / ((((d[0] * q + d[1]) * q + d[2]) * q + d[3]) * q + 1.0)
+    } else if p <= 1.0 - plow {
+        let q = p - 0.5;
+        let r = q * q;
+        (((((a[0] * r + a[1]) * r + a[2]) * r + a[3]) * r + a[4]) * r + a[5]) * q
+            / (((((b[0] * r + b[1]) * r + b[2]) * r + b[3]) * r + b[4]) * r + 1.0)
+    } else {
+        let q = (-2.0 * (1.0 - p).ln()).sqrt();
+        -(((((c[0] * q + c[1]) * q + c[2]) * q + c[3]) * q + c[4]) * q + c[5])
+            / ((((d[0] * q + d[1]) * q + d[2]) * q + d[3]) * q + 1.0)
+    }
+}
+
+fn gauss(rng: &mut ChaCha8Rng) -> f32 {
+    let u1: f32 = rng.random_range(1e-9..1.0);
+    let u2: f32 = rng.random_range(0.0..1.0);
+    (-2.0 * u1.ln()).sqrt() * (std::f32::consts::TAU * u2).cos()
+}
+
+/// Write a 2-D row-major f32 buffer as a NumPy v1 .npy (<f4, C order).
+fn write_npy(path: &str, data: &[f32], n: usize, dim: usize) {
+    use std::io::Write;
+    let header_body = format!(
+        "{{'descr': '<f4', 'fortran_order': False, 'shape': ({n}, {dim}), }}"
+    );
+    // total header must make (10 + len) a multiple of 64; pad with spaces + \n
+    let mut hb = header_body.into_bytes();
+    let unpadded = 10 + hb.len() + 1; // +1 for trailing newline
+    let pad = (64 - (unpadded % 64)) % 64;
+    hb.extend(std::iter::repeat(b' ').take(pad));
+    hb.push(b'\n');
+    let mut f = std::io::BufWriter::new(std::fs::File::create(path).expect("create npy"));
+    f.write_all(b"\x93NUMPY").unwrap();
+    f.write_all(&[1u8, 0u8]).unwrap(); // version 1.0
+    f.write_all(&(hb.len() as u16).to_le_bytes()).unwrap();
+    f.write_all(&hb).unwrap();
+    for &v in data {
+        f.write_all(&v.to_le_bytes()).unwrap();
+    }
+    eprintln!("# wrote {path}: {n} x {dim} f32");
+}
+
+fn l2(v: &mut [f32]) {
+    let nrm: f32 = v.iter().map(|x| x * x).sum::<f32>().sqrt();
+    if nrm > 0.0 {
+        for x in v.iter_mut() {
+            *x /= nrm;
+        }
+    }
+}
+
+/// Generate `count` L2-normalized rows of dimension cfg.dim for the chosen
+/// geometry. `is_query` reduces noise so queries sit near corpus structure.
+fn generate(cfg: &Cfg, rng: &mut ChaCha8Rng, count: usize, is_query: bool) -> Vec<f32> {
+    let d = cfg.dim;
+    let l = cfg.latent;
+    let noise = if is_query { 0.1 } else { 0.3 };
+    // Bug O fix: A and prototypes define the SHARED latent geometry and MUST be
+    // identical for corpus and queries (otherwise queries land in a different
+    // space and shard_recall ground truth is meaningless). Seed them from a
+    // dedicated, geometry-only RNG keyed by cfg.seed, NOT the streaming `rng`
+    // (whose state differs between the corpus and query calls).
+    let mut geo = ChaCha8Rng::seed_from_u64(cfg.seed ^ 0x9E37_79B9_7F4A_7C15);
+    let mut a = vec![0.0f32; d * l];
+    for x in a.iter_mut() {
+        *x = gauss(&mut geo);
+    }
+    let mut protos = vec![0.0f32; cfg.clusters * l];
+    for x in protos.iter_mut() {
+        *x = gauss(&mut geo);
+    }
+    // power-law per-latent scaling for "anisotropic": sigma_k ~ (k+1)^-alpha
+    let scale: Vec<f32> = (0..l)
+        .map(|k| ((k + 1) as f32).powf(-cfg.alpha))
+        .collect();
+
+    let mut out = vec![0.0f32; count * d];
+    for i in 0..count {
+        let row = &mut out[i * d..(i + 1) * d];
+        match cfg.kind.as_str() {
+            "isotropic" => {
+                for x in row.iter_mut() {
+                    *x = gauss(rng);
+                }
+            }
+            "clustered" => {
+                let c = rng.random_range(0..cfg.clusters);
+                let mut z = vec![0.0f32; l];
+                for j in 0..l {
+                    z[j] = protos[c * l + j] + noise * gauss(rng);
+                }
+                for ii in 0..d {
+                    let mut acc = 0.0f32;
+                    for j in 0..l {
+                        acc += a[ii * l + j] * z[j];
+                    }
+                    row[ii] = acc;
+                }
+            }
+            "anisotropic" => {
+                // power-law spectrum: latent coords scaled by k^-alpha
+                let mut z = vec![0.0f32; l];
+                for j in 0..l {
+                    z[j] = scale[j] * gauss(rng);
+                }
+                for ii in 0..d {
+                    let mut acc = 0.0f32;
+                    for j in 0..l {
+                        acc += a[ii * l + j] * z[j];
+                    }
+                    row[ii] = acc;
+                }
+            }
+            "rogue" => {
+                // baseline isotropic + a few dims with ~10x variance
+                for x in row.iter_mut() {
+                    *x = gauss(rng);
+                }
+                for r in 0..cfg.rogue.min(d) {
+                    row[r] = 10.0 * gauss(rng);
+                }
+            }
+            "manifold" => {
+                // nonlinear low-D manifold: latent t in R^m mapped through
+                // sinusoids of varying frequency, then projected up. Angular
+                // structure here is the best chance for rigidity to appear.
+                let m = l.min(8).max(2);
+                let mut t = vec![0.0f32; m];
+                for tj in t.iter_mut() {
+                    *tj = rng.random_range(0.0..std::f32::consts::TAU);
+                }
+                // build a smooth high-D embedding: row[ii] = sum_j sin((j+1) t_j + phase_ii)
+                for ii in 0..d {
+                    let mut acc = 0.0f32;
+                    for j in 0..m {
+                        let phase = a[ii * l + j]; // reuse A as fixed phases
+                        acc += ((j + 1) as f32 * t[j] + phase).sin();
+                    }
+                    row[ii] = acc + noise * gauss(rng);
+                }
+            }
+            "projected-rigid" => {
+                // STRONGEST rigidity control: place a quantile-spaced target on
+                // a FIXED direction u (= col 0 of A, unit-normalized once) and
+                // make it dominate. x_i = t_i * u  (+ tiny ortho jitter). Then
+                // for ANY probe direction r, r.x_i = t_i * (r.u), a scaled copy
+                // of the rigid sequence — so the projected key inherits the
+                // quantile spacing (up to L2 distortion). This tests whether the
+                // random-projection + normalize pipeline PRESERVES rigidity that
+                // genuinely lives in the key, closing the selftest gap.
+                let t = inv_norm_cdf((i as f64 + 0.5) / count as f64) as f32;
+                for ii in 0..d {
+                    row[ii] = t * a[ii * l] + 0.001 * gauss(rng);
+                }
+            }
+            "lattice" => {
+                // POSITIVE rigidity control. Each row gets a DISTINCT, evenly
+                // spaced position pos = i/count in [-1,1] (used once — no
+                // duplicates). Embedded on a small ARC: a constant base
+                // direction (col 0 of A) plus a small pos-scaled component along
+                // an orthogonal direction (col 1). The constant base survives
+                // L2 normalization (a pure line would collapse to +/- one
+                // vector); projecting onto the pos-direction recovers the
+                // evenly-spaced sequence -> SUB-Poisson (rigid) number variance.
+                // Gaussian-QUANTILE-spaced positions: pos_i = Phi^-1((i+0.5)/count).
+                // The marginal is then exactly N(0,1), matching the probe's
+                // Gaussian unfold, so any residual structure is REAL. Evenly
+                // spaced in probability = maximally regular = rigid.
+                let pos = inv_norm_cdf((i as f64 + 0.5) / count as f64) as f32;
+                for ii in 0..d {
+                    let base = a[ii * l];
+                    let along = a[ii * l + 1];
+                    row[ii] = base + 0.35 * pos * along + 0.01 * gauss(rng);
+                }
+            }
+            other => panic!("unknown kind {other}"),
+        }
+        l2(row);
+    }
+    out
+}
+
+fn main() {
+    let cfg = parse();
+    let mut rng = ChaCha8Rng::seed_from_u64(cfg.seed);
+    let corpus = generate(&cfg, &mut rng, cfg.n, false);
+    write_npy(&cfg.out, &corpus, cfg.n, cfg.dim);
+    if let Some(ref qp) = cfg.queries_out {
+        let q = generate(&cfg, &mut rng, cfg.n_queries, true);
+        write_npy(qp, &q, cfg.n_queries, cfg.dim);
+    }
+}
diff --git a/experiments/ordinal-routing-research/length_mixture_lake_results.md b/experiments/ordinal-routing-research/length_mixture_lake_results.md
new file mode 100644
index 00000000..31b2e664
--- /dev/null
+++ b/experiments/ordinal-routing-research/length_mixture_lake_results.md
@@ -0,0 +1,126 @@
+# Path B — the chunk-length-mixture lake (deployment robustness, final arc)
+
+The Phase B caveat split the unmodeled real-lake pathologies in two: OCR-garbage /
+mixed-language (uncapturable from clean embeddings — needs a real dirty S3 sample)
+and **chunk-length heterogeneity** (a *mixture of geometries* — capturable
+synthetically). The directions writeup flagged the latter as **Path B**
+(`oblivious_directions_results.md`, "chunk-length heterogeneity is itself a lake
+pathology (folds into Path B)") and named chunk length a **third geometry axis**
+(longer chunks → tokens average → vectors regress to the corpus mean → tighter cone,
+lower ID). Path B tests both at once: does a real chunk-length *mixture* break b=4 raw
+rank routing, and how big is the chunk-length geometry axis actually?
+
+Domain is held constant (fiqa, 57.6k docs), only the chunk length varies — the clean
+complement to Phase B, which held length constant and varied domain.
+
+## Setup
+
+Embed the SAME 57,600 fiqa docs (nomic-embed-text, gpu1) at four chunk-length caps
+`{128, 256, 512, 1100}` chars, then union into one 230,388-doc lake
+(`make_length_lake.py`). Queries: the full-length fiqa held-out set (1000) — a real
+lake mixes *document* chunk lengths, not query lengths. Probe: the **same
+`centering_recall.rs`** whose raw b=4 arm IS the incumbent that beat every
+oblivious-structure idea. Metric: R@10 of b-bit rank codes vs **FP32-cosine top-10
+recomputed on each corpus** — a routing-*fidelity* metric, so the 4× size difference
+cancels (it hits the code arm and its own FP32 ground truth equally; no size confound).
+
+## Pre-registration (locked before the lake run)
+
+- **Path B fear CONFIRMED** (the first positive of the whole arc) if b=4 raw R@10 on
+  the length mixture drops **≥ 0.02** vs the single-length fiqa@512 baseline.
+- **FALSIFIED** (consistent with every prior lake fear) if flat to noise (|Δ| < 0.02).
+- Secondary, pre-stated from the "third geometry axis" claim: longer chunks → tighter
+  cone (higher R̄) and the strata cones point in *different* directions (a true mixture
+  of geometries, not one).
+
+## Result 1 — the chunk-length geometry axis is real but SMALL and CO-AXIAL
+
+Per-stratum cone tightness R̄ (mean cosine to that stratum's centroid) and pairwise
+centroid-axis cosine, all on the identical 57.6k docs:
+
+| chunk cap | R̄ (cone tightness) | NaN/zero rows |
+|-----------|--------------------|---------------|
+| 128  | 0.7046 | 0 |
+| 256  | 0.7151 | 0 |
+| 512  | 0.7198 | 0 |
+| 1100 | 0.7234 | 12 |
+
+Cone-axis cosine between strata: **0.986–0.999** (128-vs-1100 = 0.9856; the rest tighter).
+
+The predicted direction holds — longer chunks DO give a tighter cone (R̄ rises
+monotonically 0.705→0.723) — but the magnitude is tiny: a 0.019 R̄ spread across an
+8.6× chunk-length range, and the four cones are essentially **co-axial** (≥0.986). So
+chunk length is a real third geometry axis but a *weak* one on this encoder/corpus: it
+shifts cone tightness slightly along a shared axis, it does NOT produce the distinct,
+differently-pointed cones the "mixture of geometries" framing imagined. (Caveat: fiqa
+docs are short Q&A; a corpus with genuinely long source documents — where a 128-char
+cap truly amputates content — could spread R̄ more. Established here only for
+fiqa/nomic.)
+
+## Result 2 — b=4 raw routing is IMMUNE to the length mixture (fear FALSIFIED)
+
+`centering_recall` raw arm, 1000 fiqa queries, FP32-cosine top-10 ground truth:
+
+| corpus | b=4 raw R@10 | b=4 raw CR@100 |
+|--------|--------------|----------------|
+| baseline fiqa@512 (57.6k) | 0.8230 | 1.0000 |
+| **Path B length-mixture lake (230k, 4 lengths)** | **0.8253** | 1.0000 |
+| **Δ** | **+0.0023** | 0.0000 |
+
+Pre-registered ≥0.02 drop to confirm the fear. **Actual +0.002 — flat to noise, if
+anything slightly up.** Candidate recall stays a perfect 1.0000: every FP32 top-10
+neighbour is still captured in the top-100 by code distance on the 4×-larger mixed lake.
+
+The centering signature is unchanged from single-length, too — b1 raw 0.516 → centered
++0.055, b2/b4 centered net-negative (b4 ΔR −0.079) — same PARTIAL pattern, same
+capacity-scaling penalty as every prior corpus. The mixture changed neither the routing
+fidelity nor the centering mechanism; it just reconfirmed both on 4× the data.
+
+## Why the global bin edges survive a length mixture
+
+`centering_recall`'s encoder is **per-dimension rank → equal-width bins** — the rank
+transform is computed within each vector, so a slightly-tighter (long-chunk) or
+slightly-looser (short-chunk) cone produces the same *ranks*; only the raw cosine
+magnitudes differ, and ranks discard magnitude. The fixed-mass rank code has nothing
+global for a tightness-shift to poison, exactly as the templated-hub immunity in Phase B
+had nothing global (no IDF term) for a hub to poison. Training-free + rank-based is again
+the property that confers the robustness.
+
+## Verdict — Path B closes the synthetic-lake arc on the same negative
+
+The last fear capturable without real dirty data is unfounded: a chunk-length mixture is
+**not** a mixture of distinct geometries on fiqa/nomic (cones co-axial, R̄ spread 0.019),
+and b=4 raw routing is **immune** to it (+0.002 R@10, CR 1.0). Combined with Phase B
+(multi-domain union: lower-ID, globally centerable, hub-immune), every synthetic lake
+pathology — multi-cone, templated-hub, and now multi-length — leaves the boring "spend
+the bits, b=4, raw ranks" baseline intact.
+
+CAVEAT (honest scope, unchanged): this is still *clean* embeddings of *curated* fiqa
+text truncated to length. It models length-geometry heterogeneity; it does NOT model
+OCR garbage, mixed-language, or broken-encoding sludge. The one remaining
+non-derivative test is an actual dirty S3 sample (the OCR/multilingual half of the Phase
+B caveat). Claim established: "a chunk-length mixture is benign for b=4 routing." Claim
+NOT established: "raw dirty S3 sludge is benign."
+
+## Reproduce
+```
+# 1. embed fiqa text truncated to each chunk-length cap (gpu1 nomic)
+for N in 128 256; do python3 - <<PY
+src=[l.rstrip('\n') for l in open('/tmp/corpora/fiqa_corpus.txt') if l.strip()]
+open(f'/tmp/corpora/fiqa_$N.txt','w').writelines(l[:$N]+'\n' for l in src)
+PY
+  python3 embed_ollama.py --texts /tmp/corpora/fiqa_$N.txt \
+     --out /tmp/corpora/fiqa_${N}_corpus.npy --host 100.67.101.76:11434 \
+     --model nomic-embed-text
+done
+# (fiqa@512 = fiqa_corpus.npy and fiqa@1100 = fiqa_nomic1100_corpus.npy already exist)
+
+# 2. union the four length strata into one lake
+python3 make_length_lake.py \
+   --parts fiqa_128_corpus.npy fiqa_256_corpus.npy fiqa_corpus.npy fiqa_nomic1100_corpus.npy \
+   --out length_lake_corpus.npy --drop-nan
+
+# 3. b=4 raw R@10 fidelity, baseline vs mixture (copy centering_recall.rs into examples/ first)
+cargo run --release --example centering_recall -- --corpus-npy fiqa_corpus.npy        --queries-npy fiqa_q.npy
+cargo run --release --example centering_recall -- --corpus-npy length_lake_corpus.npy --queries-npy fiqa_q.npy
+```
diff --git a/experiments/ordinal-routing-research/make_lake.py b/experiments/ordinal-routing-research/make_lake.py
new file mode 100644
index 00000000..19143edc
--- /dev/null
+++ b/experiments/ordinal-routing-research/make_lake.py
@@ -0,0 +1,76 @@
+#!/usr/bin/env python3
+"""Build a synthetic multi-cone 'lake' from existing per-domain .npy embeddings.
+Stdlib only (struct/array). Models two real enterprise-lake pathologies:
+  1. MULTI-CONE: concatenate N domain corpora (each its own cone) into one corpus.
+  2. TEMPLATED HUBS: optionally inject near-duplicate clusters (one base vector +
+     tiny gaussian noise) at a target prevalence -> artificial hubs.
+Queries: concatenate the per-domain held-out query sets (ground truth spans cones).
+Deterministic (fixed LCG); no numpy.
+"""
+import struct, array, sys, math, argparse
+
+def read_npy(path):
+    b = open(path, "rb").read()
+    assert b[:6] == b"\x93NUMPY"
+    hl = struct.unpack("<H", b[8:10])[0]
+    h = b[10:10+hl].decode()
+    s = h.split("'shape':")[1]
+    n, d = [int(x) for x in s[s.find("(")+1:s.find(")")].split(",") if x.strip()]
+    a = array.array("f"); a.frombytes(b[10+hl:10+hl+n*d*4])
+    return a, n, d
+
+def write_npy(path, a, n, d):
+    hdr = "{'descr': '<f4', 'fortran_order': False, 'shape': (%d, %d), }" % (n, d)
+    pad = (64 - (10 + len(hdr) + 1) % 64) % 64
+    hb = (hdr + " "*pad + "\n").encode()
+    with open(path, "wb") as f:
+        f.write(b"\x93NUMPY" + bytes([1,0])); f.write(struct.pack("<H", len(hb))); f.write(hb)
+        f.write(a.tobytes())
+    print(f"# wrote {path}: {n} x {d}", file=sys.stderr)
+
+# simple deterministic LCG gaussian (Box-Muller), no numpy
+_state = [88172645463325252]
+def _u():
+    x = _state[0]; x ^= (x << 13) & 0xFFFFFFFFFFFFFFFF; x ^= x >> 7; x ^= (x << 17) & 0xFFFFFFFFFFFFFFFF
+    _state[0] = x; return (x & 0xFFFFFFFF) / 0x100000000
+def _g():
+    u1 = max(_u(), 1e-12); u2 = _u()
+    return math.sqrt(-2*math.log(u1)) * math.cos(2*math.pi*u2)
+
+def main():
+    ap = argparse.ArgumentParser()
+    ap.add_argument("--parts", nargs="+", required=True, help="domain corpus .npy files")
+    ap.add_argument("--out", required=True)
+    ap.add_argument("--hub-frac", type=float, default=0.0, help="fraction of corpus that is templated near-dupes")
+    ap.add_argument("--hub-clusters", type=int, default=50, help="number of distinct templated clusters")
+    ap.add_argument("--hub-noise", type=float, default=0.02, help="gaussian noise added to base vec (pre-normalize)")
+    ap.add_argument("--cap", type=int, default=0, help="cap docs per part (0=all)")
+    args = ap.parse_args()
+
+    dim = None; rows = array.array("f"); total = 0
+    for p in args.parts:
+        a, n, d = read_npy(p)
+        dim = d if dim is None else dim
+        assert d == dim, f"dim mismatch {p}"
+        take = n if args.cap == 0 else min(n, args.cap)
+        rows.extend(a[:take*d]); total += take
+        print(f"# {p}: +{take} docs", file=sys.stderr)
+
+    # inject templated hubs: replace a fraction of rows with near-dupes of K bases
+    if args.hub_frac > 0:
+        n_hub = int(total * args.hub_frac)
+        # pick K base vectors from existing rows (deterministic indices)
+        bases = [ [rows[(i*9973 % total)*dim + j] for j in range(dim)] for i in range(args.hub_clusters) ]
+        for h in range(n_hub):
+            base = bases[h % args.hub_clusters]
+            tgt = (h * 7919) % total  # deterministic overwrite positions
+            row = [base[j] + args.hub_noise * _g() for j in range(dim)]
+            nrm = math.sqrt(sum(x*x for x in row)) or 1.0
+            for j in range(dim):
+                rows[tgt*dim + j] = row[j] / nrm
+        print(f"# injected {n_hub} templated-hub docs ({args.hub_frac:.0%}) across {args.hub_clusters} clusters", file=sys.stderr)
+
+    write_npy(args.out, rows, total, dim)
+
+if __name__ == "__main__":
+    main()
diff --git a/experiments/ordinal-routing-research/make_length_lake.py b/experiments/ordinal-routing-research/make_length_lake.py
new file mode 100644
index 00000000..0410f94e
--- /dev/null
+++ b/experiments/ordinal-routing-research/make_length_lake.py
@@ -0,0 +1,64 @@
+#!/usr/bin/env python3
+"""Path B — build a CHUNK-LENGTH-MIXTURE lake (one domain, many length geometries).
+
+Unlike make_lake.py (multi-DOMAIN union -> separated cones), this unions the SAME
+documents embedded at several chunk lengths {128,256,512,1100}. Domain is held
+constant; only the length-geometry varies. Models the real-lake pathology where one
+S3 bucket holds the same content chunked every which way -> a *mixture of cones of
+different tightness* quantized under one global set of rank-bin edges.
+
+Stdlib only (struct/array), matches make_lake.py's npy IO. Deterministic.
+"""
+import struct, array, sys, argparse
+
+def read_npy(path):
+    b = open(path, "rb").read()
+    assert b[:6] == b"\x93NUMPY"
+    hl = struct.unpack("<H", b[8:10])[0]
+    h = b[10:10+hl].decode()
+    s = h.split("'shape':")[1]
+    n, d = [int(x) for x in s[s.find("(")+1:s.find(")")].split(",") if x.strip()]
+    a = array.array("f"); a.frombytes(b[10+hl:10+hl+n*d*4])
+    return a, n, d
+
+def write_npy(path, a, n, d):
+    hdr = "{'descr': '<f4', 'fortran_order': False, 'shape': (%d, %d), }" % (n, d)
+    pad = (64 - (10 + len(hdr) + 1) % 64) % 64
+    hb = (hdr + " "*pad + "\n").encode()
+    with open(path, "wb") as f:
+        f.write(b"\x93NUMPY" + bytes([1,0])); f.write(struct.pack("<H", len(hb))); f.write(hb)
+        f.write(a.tobytes())
+    print(f"# wrote {path}: {n} x {d}", file=sys.stderr)
+
+def main():
+    ap = argparse.ArgumentParser()
+    ap.add_argument("--parts", nargs="+", required=True, help="per-length corpus .npy files")
+    ap.add_argument("--out", required=True)
+    ap.add_argument("--drop-nan", action="store_true", help="skip rows with NaN/zero norm")
+    args = ap.parse_args()
+
+    dim = None; rows = array.array("f"); total = 0; dropped = 0
+    for p in args.parts:
+        a, n, d = read_npy(p)
+        dim = d if dim is None else dim
+        assert d == dim, f"dim mismatch {p}"
+        kept = 0
+        for i in range(n):
+            seg = a[i*d:(i+1)*d]
+            if args.drop_nan:
+                ss = 0.0; bad = False
+                for x in seg:
+                    if x != x:  # NaN
+                        bad = True; break
+                    ss += x*x
+                if bad or ss == 0.0:
+                    dropped += 1; continue
+            rows.extend(seg); kept += 1
+        total += kept
+        print(f"# {p}: +{kept} docs", file=sys.stderr)
+    if dropped:
+        print(f"# dropped {dropped} NaN/zero rows", file=sys.stderr)
+    write_npy(args.out, rows, total, dim)
+
+if __name__ == "__main__":
+    main()
diff --git a/experiments/ordinal-routing-research/oblivious_directions_results.md b/experiments/ordinal-routing-research/oblivious_directions_results.md
new file mode 100644
index 00000000..edefb50f
--- /dev/null
+++ b/experiments/ordinal-routing-research/oblivious_directions_results.md
@@ -0,0 +1,315 @@
+# Data-oblivious structure vs learned-encoder geometry — the directions arc
+
+Second investigation round (follows the density/CRT/tau work). Question: can a
+**data-oblivious low-discrepancy structure** — golden-angle, Fibonacci, Sobol,
+Kronecker — improve ordvec's training-free routing? The probes (`*.rs`) and this
+doc live in this directory; to run a probe, copy it into the crate's `examples/`
+and `cargo run --release --example <name>` (same convention as the sibling docs).
+**No crate or public-API changes.** Every probe was **pre-registered** (pass/fail
+thresholds fixed in the source header BEFORE running) to prevent goalpost-moving,
+and run on real embeddings.
+
+## Corpora (real, nomic-embed-text 768-d via ollama on GPU)
+
+| corpus | n docs | queries | domain | mean-resultant-length R̄ |
+|--------|--------|---------|--------|--------------------------|
+| repo   | 8,777  | 200     | md+rust | ~0.07 raw-cluster fixture* |
+| fiqa   | 57,600 | 1,000   | financial QA | 0.69 (strongly coned) |
+| nq     | 74,129 | 1,000   | Wikipedia QA | 0.69 |
+| quora  | 150,000| 1,000   | question pairs | ~0.7 |
+
+Ground truth = FP32 cosine top-10. *The repo corpus is the prior round's; the
+financial/QA corpora are the breadth/scale test. NOTE: intrinsic dimension is a
+**corpus** property, not an encoder constant — the repo corpus reads TwoNN≈13 but
+fiqa reads ≈24 with the SAME encoder. See the "ID≈13 anchor was a corpus artifact"
+correction in the cross-encoder section before quoting any single ID number.
+
+## VERDICT (pre-registered, replicated)
+
+**CLASS-DEAD.** Data-oblivious low-discrepancy directions do **not** beat iid-random
+directions for training-free routing — not in ambient space, not in the centered
+intrinsic subspace. The lever is **not the directions**.
+
+## The chain of probes
+
+### 1. `fib_directions` — golden-angle directions, AMBIENT sphere → TIE (with caveat)
+Shared-index triple-tiled golden-angle directions vs iid-Gaussian, recall@k at
+matched candidates-scanned. Across 10 seeds the gap is inside random's own
+seed-to-seed noise (±0.03); the eye-catching seed-1 +5.8pts **evaporated under
+reseeding** (a caught false positive). CAVEAT registered at the time: ambient
+768-sphere is the wrong regime — concentration of measure makes iid-Gaussian
+already near-uniform, so the test was pre-doomed regardless of golden's merit.
+
+### 2. `uniformity_lemma` — why golden is dead on BIN EDGES (not directions)
+The rank transform whitens each coordinate's marginal to **exactly uniform**, so
+equal-width bucketing is **entropy-optimal** (measured: entropy = B exactly at
+B∈{1,2,4}; golden boundaries 3.92 < 4.00 — they *strictly waste bits*). BUT this
+lemma governs the **marginal only**; it is blind to the **joint** correlation
+structure that projection directions act on. So it kills golden on bin edges and
+says nothing about directions — which set up probe #4.
+Correction banked: constant-composition is intrinsic to *any* fixed boundaries
+on ranks, NOT unique to equal-width (the original claim was too strong).
+
+### 3. `overlap_decomp` — the cone is ~100% of the bitmap-overlap excess
+ordvec's hypergeometric null `n_top²/D` is **miscalibrated on real data**: random
+pairs overlap ~4× the textbook prediction, and `cone_frac ≈ 1.000` — essentially
+**all** the excess is the shared cone (hubness), not pairwise similarity.
+Per-coord **mean-centering removes it** (cone_base collapses to the uniform null)
+AND amplifies the true-neighbour overlap gap 2–5×. Replicated on fiqa and nq.
+This is the one continuous, geometry-aligned intervention that *converted*.
+
+### 4. `subspace_directions` — the directions test done RIGHT → CLASS-DEAD
+Center → project to the populated k≈13-dim PCA subspace (where joint structure is
+legible) → place R directions via {random, Sobol, Kronecker} → route. Fair
+envelope (recall@10 at matched candidates):
+
+| corpus | budget | random | sobol | kronecker | pca-axes (ceiling) |
+|--------|--------|--------|-------|-----------|--------------------|
+| fiqa   | 8000   | 0.1967 | 0.1997| 0.2001    | 0.1946 |
+| nq     | 8000   | 0.1836 | 0.1830| 0.1817    | 0.1877 |
+| quora  | 8000   | 0.1004 | 0.1035| 0.1061    | 0.1073 |
+
+Max margin anywhere **+0.006** (threshold for a win was +0.02), across all three
+corpora. Decisive detail:
+the **data-dependent ceiling (pca-axes) itself barely beats random** → the
+directions are not a lever for *anyone* on this data. With ID≈13 and a cone, the
+populated subspace is so dense that any ~13–128 directions cover it equally; there
+is no discrepancy gap for quasi-random structure to close. The Kronecker/Sobol
+**hybrid was NOT built** — it required distinct-regime component wins; there were
+none.
+
+## centering's recall verdict (`centering_recall`, pre-registered)
+
+Centering as a RankQuant encoding change: full-scan R@10 vs FP32 cosine, raw vs
+centered, matched bytes.
+
+| corpus | Δb1 | Δb2 | Δb4 | verdict |
+|--------|-----|-----|-----|---------|
+| repo   | +0.038 | −0.044 | −0.108 | PARTIAL |
+| fiqa   | +0.066 | −0.017 | −0.075 | PARTIAL |
+| nq     | +0.051 | −0.016 | −0.083 | PARTIAL |
+| quora  | +0.027 | −0.022 | −0.061 | PARTIAL |
+
+Centering **helps at b=1, hurts at b≥2**, replicated across 3 corpora. It removes
+the additive cone (good for top-bucket coverage / coarse routing) but rotates the
+rank order away from raw-cosine (bad for full-order fidelity at higher bits). It
+is a **low-bit-prefilter / coarse-routing tool, not a fine-scan tool** — and
+explicitly does NOT survive at the incumbent b=4.
+
+## The through-line
+
+Every oblivious *discrete* structural prior tried across both rounds — golden,
+Fibonacci, prime/spectral, Sobol, Kronecker, low-discrepancy directions — is
+**inert against learned continuous embedding geometry**. The data is a smooth
+~13-dim cone; discrete combinatorial structure has no purchase on it. The only
+interventions that converted were **continuous and geometric** (centering / cone
+removal), and only in the low-bit regime. "Just spend bits (b=4, raw ranks)"
+remains the baseline that beats every oblivious-structure idea.
+
+Even centering — the one continuous intervention that *partially* converted (b=1
+prefilter, +0.03–0.07 R@10) — does not survive at b=4 and does not make a better
+partition key. The cone is real and removable, but removing it trades cosine
+fidelity for balance, and routing wants the fidelity.
+
+## Scope & boundary of this negative (checkpoint)
+
+This is a deliberate stopping point: the hypothesis space had one degree of
+freedom — *which oblivious structure* on *which axis* (bin-edges / directions /
+partition key) — and both ends of both axes have now been tested, **including the
+data-dependent PCA ceiling**. The load-bearing result is that the ceiling itself
+barely beats random: that kills the *axis*, so any further sequence is a
+**derivative** that inherits the null. You cannot escape a null by swapping the
+sequence when even the optimal member ties.
+
+Every result here is conditioned on **off-the-shelf `nomic-embed-text`**: a
+learned, anisotropic (R̄≈0.69), low-intrinsic-dim (≈13) encoder. The honest claim
+is therefore scoped: *data-oblivious combinatorial structure is inert against
+learned anisotropic embedding geometry of this kind.* The only **non-derivative**
+next experiment is to change that conditioning variable — a different encoder
+(different anisotropy / intrinsic dim), or a sparsity/ordinal-trained encoder
+where ranks are the native code and discrete structure might finally seat.
+Everything inside the current encoder is a derivative of what is already
+falsified here.
+
+## Partition balance (`partition_balance`, pre-registered) — BALANCE pass, PRUNING fail
+
+The one round-on-round thread: does cone-removal make a balanced, prunable coarse
+shard key? Sign-pattern key over the top-`bits_key` PCA axes (basis shared across
+arms; only centering differs). Swept bits_key ∈ {6,10,14} on fiqa.
+
+| sub-claim | threshold | result |
+|-----------|-----------|--------|
+| BALANCE — largest-cell-fraction reduction | ≥1.5× | **PASS** (1.6–2.2×; Gini 0.60→0.38 at bits_key=10) |
+| PRUNING — candidates at matched recall≥0.90 | ≤0.80× | **FAIL** (same recall-vs-candidates envelope) |
+
+Centering de-hubs the cells (more balanced, uses all cells) but at matched recall
+scans the **same** candidates — better balance did NOT convert to better pruning.
+Mechanism: balance is a *marginal* property of the key; pruning quality depends on
+cell-to-neighbour *alignment*, and centering rotates the key away from cosine (the
+same reason it fails recall at b≥2). Cells spread along the wrong axis don't prune
+better. This was the last live thread; it closes for the session's recurring
+reason.
+
+## Cross-encoder extension (the non-derivative move) — IN PROGRESS
+
+Every result above is conditioned on nomic (ID≈13, R̄≈0.69). The only
+information-adding next step is changing the encoder. Geometry gate (8k fiqa
+sample, TwoNN intrinsic dim + mean-resultant-length R̄) — run BEFORE any probe to
+confirm the encoders genuinely differ from nomic, else a re-run is derivative:
+
+First-pass gate (512-char chunks) measured: nomic 768d ID≈13 R̄0.69; all-minilm
+384d ID21.7 R̄0.28; mxbai 1024d ID24.3 R̄0.69; snowflake-v1 1024d ID31.1 R̄0.74.
+
+**Methodology correction (capacity confound).** 384-dim (all-minilm) cannot hold
+meaning above ~400-char chunks, and our chunks were capped at 512 — so minilm's
+geometry was capacity-corrupted, not a clean ID point, and does not belong on the
+ladder. Floor set to **768-dim** encoders and chunk cap raised to **1100 chars**
+(BGE-768 capacity ceiling). Revised capacity-honest set (all ≥1024-dim, all clear
+the floor): **bge-m3, bge-large, snowflake-arctic-embed-v2**, plus
+**harrier-oss-v1-0.6b** (the operator's actual daily-driver encoder). Model facts
+verified against the ollama host + HF config (not recalled): harrier =
+Qwen3-based, **1024-dim, 32k context** (Q8 GGUF); ingests 1100-char chunks with
+zero truncation. Re-embedding fiqa at 1100-char chunks; ladder rebuilt on these.
+(For code corpora, the right encoder is jina-code — noted for a future code-domain
+run; not in this text-corpus ladder.)
+
+PRE-REGISTERED prediction (fixed before the full-corpus probes run):
+- The session's mechanism is "ID≈13 is so dense any directions cover it equally."
+  If TRUE, then as ID climbs to 31 the subspace sparsifies and a discrepancy gap
+  should OPEN → Sobol/Kronecker begin to beat random, and centering's b=4 penalty
+  shrinks (less cone). Verdict per model by the SAME thresholds as above
+  (directions: +0.02 at matched budget; centering: +0.02 at b4).
+- If oblivious structure stays inert even at ID=31 / low-anisotropy minilm → the
+  negative generalizes across the realistic ID range of text encoders (stronger).
+
+### Ladder results (fiqa 57.6k, 1100-char chunks, capacity-honest)
+
+| encoder | dim | intrinsic dim | centering Δb4 | sobol−random @8k | pca-axes ceiling |
+|---------|-----|---------------|---------------|------------------|------------------|
+| nomic@512 | 768 | ~13 | −0.075 | ~tie | ≈ random |
+| bge-m3 | 1024 | 21.4 | −0.106 | +0.006 (flicker +0.024@4k) | leads (0.216) |
+| bge-large | 1024 | 22.9 | −0.155 | **−0.036** | leads (0.255) |
+| snowflake-arctic-v2 | 1024 | 18.0 | −0.088 | +0.010 | leads (0.237) |
+| snowflake-arctic-v2 | 1024 | 18.0 | −0.088 | +0.010 | leads (0.237) |
+| harrier-oss-0.6b | 1024 | 21.4 | −0.069 | +0.002 | leads (0.252) |
+| nomic@1100 (same fiqa) | 768 | 23.1 | −0.082 | (n/a) | — |
+| nomic@512 (same fiqa) | 768 | 24.3 | (≈ above) | — | — |
+
+### CORRECTION — the "ID≈13" anchor was a CORPUS artifact, not an encoder fact
+
+The "nomic intrinsic dim ≈ 13" that motivated the whole "dense low-dim subspace"
+mechanism came from the **repo corpus** (8.7k md+rust sentences), NOT fiqa. On a
+FIXED corpus (fiqa 57.6k), nomic's ID is **24.3 @512 / 23.1 @1100** — and *every*
+encoder lands ID ~18–24. The apparent "ID ladder 13→31" was mostly **corpus
+differences masquerading as encoder differences** (nomic-on-repo vs others-on-fiqa).
+On one corpus the ladder is nearly flat. **Intrinsic dim is set more by the corpus
+than the encoder**, and chunk length (512→1100) barely moved it (24.3→23.1) — the
+opposite of the predicted large drop. Both my "ID≈13" anchor and my "longer chunks
+→ much lower ID" prediction were wrong; recorded here rather than quietly dropped.
+
+What SURVIVES the correction (now properly controlled — same corpus, varied encoder):
+- **Directions CLASS-DEAD** across all 5 encoders at the ID range real text
+  encoders actually produce (~18–24). The negative was never about low ID; it is
+  general. No revival anywhere.
+- **Centering b4 dead** across all 5 (−0.07 to −0.15); harrier + nomic@1100 both
+  PARTIAL (b1 help, b4 fail) — same pattern as the original runs.
+- **pca-axes (data-aligned) leads** at every encoder (0.24–0.25 vs random ~0.19):
+  the lever is data-alignment, which training-free forbids. The robust positive.
+
+**Directions: CLASS-DEAD, confirmed across the ID ladder.** sobol−random flips
+sign per corpus (+0.006, −0.036, +0.010) — noise around zero, never ≥0.02 in ≥2
+corpora; the bge-m3 flicker did NOT replicate at near-identical ID (bge-large 22.9).
+The pre-registered ≥2-corpora rule holds the line.
+
+**The one robust high-ID effect (a real, positive, scoped finding):** the
+data-aligned ceiling (pca-axes) clearly LEADS random at every high-ID encoder
+(0.22–0.26 vs ~0.18–0.20), unlike nomic where it barely beat random. So higher
+intrinsic dim genuinely opens room on the directions axis — but ONLY
+data-dependent directions exploit it; oblivious (sobol/kronecker) cannot. The
+lever at high ID is **data-alignment**, which is exactly what training-free
+forbids. This locates *why* oblivious structure fails rather than just restating
+that it does.
+
+**Centering b4: dead across all encoders, penalty SCALES with capacity** (−0.08 to
+−0.15) — higher-capacity encoders carry more cosine order for centering to
+corrupt. Most robust negative in the set.
+
+harrier-oss-0.6b (operator's daily driver) + nomic@1100 (chunk-length control)
+embedding; results to follow.
+
+**Chunk length is a THIRD geometry axis (not a free parameter).** Longer chunks
+average more tokens → vectors regress toward the corpus mean → R̄ rises (tighter
+cone), intrinsic dim falls. So nomic@512 and nomic@1100 are *different geometries*,
+and every 512-char result above is conditioned on that one chunk length. We
+re-embed nomic@1100 on the SAME 57.6k fiqa docs to measure the shift directly
+(two-point: ID and R̄ delta). Deployment consequence: a real enterprise lake has
+documents at every chunk length at once → a *mixture* of geometries, not one — so
+chunk-length heterogeneity is itself a lake pathology (folds into Path B). **Path B
+was run** — see [length_mixture_lake_results.md](length_mixture_lake_results.md): the
+chunk-length axis is real but small and co-axial (R̄ 0.705→0.723 over 8.6× length,
+cones ≥0.986 aligned), and a 4-length mixture lake leaves b=4 raw R@10 immune (+0.002).
+
+## Phase B — the messy multi-cone "lake" (deployment robustness)
+
+Built a synthetic enterprise lake: union of 3 nomic-embedded domains (fiqa 57.6k +
+nq 74.1k + quora 150k = **281,729 docs**, 3,000 queries spanning all cones), with
+optional templated near-duplicate hub injection. Two pre-registered predictions
+about lake geometry — **both FALSIFIED** by the validated probes (the payoff of the
+negative-verification discipline: locked thresholds caught my wrong intuitions):
+
+- **Predicted: union of cones → higher effective ID.** WRONG — union ID = **8.2**
+  (vs ~24 per-domain). Far-apart domain clusters read to TwoNN as a few discrete
+  blobs = *lower* intrinsic dim. Cones add separation, not dimensions. A
+  multi-domain lake is **lower-ID and more clustered**, not higher-ID. (This also
+  voids Phase B's Question B — there is no ID rise to revive oblivious structure.)
+- **Predicted: no single removable mean (global centering breaks).** WRONG —
+  global centering still collapses cone_base to the uniform null (B=4: 11.2→3.25 ≈
+  textbook 3.0), well under the pre-registered 0.30 "insufficient" bar. The 3
+  domains share enough common direction that ONE global mean still de-hubs them.
+  Global centering **holds** on the multi-cone union.
+
+Net: the "messy lake = higher-ID, locally-coned, needs per-cluster centering"
+intuition is wrong on both counts. A multi-domain union is lower-ID and globally
+centerable.
+
+**Question C — templated-hub robustness (the dominant real-lake pathology).**
+Inject near-duplicate clusters (1 base vec + tiny noise) at rising prevalence;
+measure raw RankQuant b=4 R@10 vs FP32 cosine on the 3000-query lake set:
+
+| hub prevalence | b4 R@10 | Δ from clean |
+|----------------|---------|--------------|
+| 0%  | 0.8429 | — |
+| 5%  | 0.8408 | −0.0021 |
+| 10% | 0.8420 | −0.0009 |
+| 15% | 0.8421 | −0.0008 |
+
+Pre-registered "graceful if drop < 0.10 through 15%." Actual: **−0.002, flat to
+noise — essentially immune.** Mechanism: the fixed-mass rank code has no global
+frequency/IDF term for a hub to poison; injected boilerplate is far from real
+queries in rank space and never enters the top-k. Training-free + data-oblivious
+is *exactly* the property that confers hub-robustness (a learned codebook or
+IDF-weighted scheme would be corrupted by hub prevalence; ordvec is not).
+
+## Phase B verdict — ordvec is robust on multi-cone lake geometry
+
+All three pre-registered lake fears were unfounded: a multi-domain union is
+**lower-ID** (8.2 vs ~24/domain), **still globally centerable**, and **immune to
+templated hubs** (−0.002 R@10 through 15%). b=4 raw holds R@10 ≈ 0.84 throughout.
+The deployment-relevant conclusion: ordvec's training-free encoding does NOT
+degrade on multi-cone / hub-heavy data — if anything the geometry is easier.
+
+CAVEAT (honest scope): this is a synthetic union of 3 *curated* corpora. It models
+multi-cone structure + templated hubs, but NOT the OCR-garbage / mixed-language /
+extreme-length heterogeneity of a true S3 dump (uncapturable from clean
+embeddings). Claim established: "multi-domain union is benign." Claim NOT
+established: "raw dirty S3 sludge is benign" — that needs actual dirty data.
+
+## Reproduce
+```
+# embed real corpora via ollama nomic-embed-text -> .npy (see REAL_CORPUS_RUNBOOK.md)
+cargo run --release --example uniformity_lemma    -- --corpus-npy CORPUS.npy
+cargo run --release --example overlap_decomp      -- --corpus-npy CORPUS.npy --queries-npy Q.npy
+cargo run --release --example centering_recall     -- --corpus-npy CORPUS.npy --queries-npy Q.npy
+cargo run --release --example subspace_directions  -- --corpus-npy CORPUS.npy --queries-npy Q.npy --kdim 13
+# Phase B lake: python make_lake.py --parts fiqa_corpus.npy nq_corpus.npy quora_corpus.npy --out lake.npy [--hub-frac 0.1]
+```
diff --git a/experiments/ordinal-routing-research/overlap_decomp.rs b/experiments/ordinal-routing-research/overlap_decomp.rs
new file mode 100644
index 00000000..969c79a9
--- /dev/null
+++ b/experiments/ordinal-routing-research/overlap_decomp.rs
@@ -0,0 +1,287 @@
+//! CONTEXT-BOXED, PRE-REGISTERED — decomposes the bitmap top-bucket overlap
+//! EXCESS on real embeddings into CONE (marginal/hubness) vs PAIRWISE (signal),
+//! and tests whether per-coordinate mean-centering recovers calibration.
+//! Isolated by request: touches no harness, writes nothing. Verdicts fixed below.
+//!
+//! WHY: uniformity_lemma.rs found the hypergeometric null_err GROWS with B on
+//! real data (0.31 -> 0.85 -> 3.34). Hypothesis: that excess is the shared cone
+//! (R̄=0.69 anisotropy => same coords top-bucket in EVERY doc), not genuine
+//! pairwise similarity. If so, the bitmap prefilter admits candidates partly on
+//! HUBNESS, not meaning.
+//!
+//! FOUR overlap levels for the top bucket (n_top = D/2^B coords):
+//!   uniform_null  = n_top^2 / D                  (random-subset assumption)
+//!   cone_baseline = Σ_c f_c^2                     (f_c = P(coord c in top bucket);
+//!                                                  overlap of two INDEPENDENT docs
+//!                                                  given marginals == pure hubness)
+//!   obs_random    = mean top-overlap, random doc pairs
+//!   obs_trueNbr   = mean top-overlap, (query, FP32-cosine-true-neighbour) pairs
+//!
+//! Decomposition (all on the SAME corpus), reported raw AND after per-coord
+//! mean-centering (subtract corpus per-coordinate mean before ranking):
+//!   cone fraction   = (cone_baseline - uniform_null) / (obs_random - uniform_null)
+//!   pairwise signal = obs_trueNbr - obs_random        (the discriminative gap)
+//!
+//! PRE-REGISTERED VERDICT (fixed before running):
+//!   A. "cone dominates the null": cone fraction >= 0.70 (raw).
+//!      => most of the bitmap's apparent overlap structure is hubness.
+//!   B. "genuine pairwise signal survives": obs_trueNbr - obs_random >= 0.20 * obs_random
+//!      AND obs_trueNbr > cone_baseline (true nbrs overlap beyond what hubness explains).
+//!   C. "centering helps": centering reduces cone fraction by >= 0.10 absolute
+//!      AND does not shrink the pairwise-signal ratio.
+//!   Each reported PASS/FAIL independently; report actuals regardless.
+//!
+//! Run: cargo run --release --example overlap_decomp -- \
+//!          --corpus-npy /tmp/repo_corpus.npy --queries-npy /tmp/repo_q.npy
+
+// Research probe: `analyze` is a parameterized runner; bundling its args into a
+// struct would obscure the experiment, not clarify it.
+#![allow(clippy::too_many_arguments)]
+
+use rand::{RngExt, SeedableRng};
+use rand_chacha::ChaCha8Rng;
+use rayon::prelude::*;
+
+const SEED: u64 = 1;
+
+fn load_npy_f32(path: &str) -> (Vec<f32>, usize, usize) {
+    let bytes = std::fs::read(path).expect("read npy");
+    assert!(
+        bytes.len() >= 10 && &bytes[..6] == b"\x93NUMPY",
+        "not a numpy file"
+    );
+    let major = bytes[6];
+    assert!(
+        major == 1 || major == 2,
+        "unsupported numpy .npy major version {major}"
+    );
+    if major == 2 {
+        assert!(bytes.len() >= 12, "truncated numpy v2 header");
+    }
+    let (hlen, hstart) = if major == 1 {
+        (u16::from_le_bytes([bytes[8], bytes[9]]) as usize, 10)
+    } else {
+        (
+            u32::from_le_bytes([bytes[8], bytes[9], bytes[10], bytes[11]]) as usize,
+            12,
+        )
+    };
+    assert!(hstart + hlen <= bytes.len(), "truncated numpy header");
+    let header = std::str::from_utf8(&bytes[hstart..hstart + hlen]).expect("utf8 header");
+    assert!(header.contains("'descr': '<f4'"), "expected <f4 dtype");
+    let after = &header[header.find("'shape':").expect("shape")..];
+    let inner = &after[after.find('(').unwrap() + 1..after.find(')').unwrap()];
+    let dims: Vec<usize> = inner
+        .split(',')
+        .filter_map(|s| s.trim().parse().ok())
+        .collect();
+    let (n, dim) = (dims[0], dims[1]);
+    let dstart = hstart + hlen;
+    let mut out = vec![0.0f32; n * dim];
+    for (i, ch) in bytes[dstart..].chunks_exact(4).enumerate() {
+        out[i] = f32::from_le_bytes([ch[0], ch[1], ch[2], ch[3]]);
+    }
+    (out, n, dim)
+}
+
+fn l2_normalize_rows(v: &mut [f32], dim: usize) {
+    for i in 0..v.len() / dim {
+        let row = &mut v[i * dim..(i + 1) * dim];
+        let nrm: f32 = row.iter().map(|x| x * x).sum::<f32>().sqrt();
+        if nrm > 0.0 {
+            for x in row.iter_mut() {
+                *x /= nrm;
+            }
+        }
+    }
+}
+
+/// Per-coordinate mean over corpus rows.
+fn coord_mean(corpus: &[f32], n: usize, dim: usize) -> Vec<f32> {
+    let mut m = vec![0f32; dim];
+    for i in 0..n {
+        for c in 0..dim {
+            m[c] += corpus[i * dim + c];
+        }
+    }
+    for m_c in m.iter_mut() {
+        *m_c /= n as f32;
+    }
+    m
+}
+
+/// Top-bucket membership bitset per doc: coords whose rank is in the top 1/2^B.
+/// Returns, per doc, the sorted list of top-bucket coord ids.
+fn top_sets(
+    rows: &[f32],
+    n: usize,
+    dim: usize,
+    bits: u32,
+    sub: Option<&[f32]>,
+) -> (Vec<Vec<u16>>, usize) {
+    let n_top = dim >> bits; // D / 2^B
+    let sets: Vec<Vec<u16>> = (0..n)
+        .into_par_iter()
+        .map(|r| {
+            let mut v: Vec<(u16, f32)> = (0..dim)
+                .map(|c| {
+                    let x = rows[r * dim + c] - sub.map_or(0.0, |s| s[c]);
+                    (c as u16, x)
+                })
+                .collect();
+            // top n_top by value = highest ranks
+            v.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
+            let mut top: Vec<u16> = v.into_iter().take(n_top).map(|(c, _)| c).collect();
+            top.sort_unstable();
+            top
+        })
+        .collect();
+    (sets, n_top)
+}
+
+fn overlap(a: &[u16], b: &[u16]) -> usize {
+    let (mut i, mut j, mut c) = (0, 0, 0);
+    while i < a.len() && j < b.len() {
+        match a[i].cmp(&b[j]) {
+            std::cmp::Ordering::Less => i += 1,
+            std::cmp::Ordering::Greater => j += 1,
+            std::cmp::Ordering::Equal => {
+                c += 1;
+                i += 1;
+                j += 1;
+            }
+        }
+    }
+    c
+}
+
+fn ground_truth(corpus: &[f32], queries: &[f32], dim: usize, k: usize) -> Vec<Vec<u32>> {
+    let n = corpus.len() / dim;
+    let nq = queries.len() / dim;
+    (0..nq)
+        .into_par_iter()
+        .map(|qi| {
+            let q = &queries[qi * dim..(qi + 1) * dim];
+            let mut s: Vec<(u32, f32)> = (0..n)
+                .map(|di| {
+                    let d = &corpus[di * dim..(di + 1) * dim];
+                    (di as u32, q.iter().zip(d).map(|(a, b)| a * b).sum::<f32>())
+                })
+                .collect();
+            s.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
+            s.into_iter().take(k).map(|(i, _)| i).collect()
+        })
+        .collect()
+}
+
+fn analyze(
+    label: &str,
+    corpus: &[f32],
+    queries: &[f32],
+    n: usize,
+    nq: usize,
+    dim: usize,
+    truth: &[Vec<u32>],
+    sub: Option<&[f32]>,
+) {
+    let mut rng = ChaCha8Rng::seed_from_u64(SEED);
+    let pairs: Vec<(usize, usize)> = (0..100_000)
+        .map(|_| {
+            let i = rng.random_range(0..n);
+            let mut j = rng.random_range(0..n);
+            if j == i {
+                j = (j + 1) % n;
+            }
+            (i, j)
+        })
+        .collect();
+
+    println!("\n## {label}");
+    println!(
+        "B\tn_top\tuniform_null\tcone_base\tobs_random\tobs_trueNbr\tcone_frac\tpair_sig_ratio"
+    );
+    for &bits in &[1u32, 2, 4] {
+        let (csets, n_top) = top_sets(corpus, n, dim, bits, sub);
+        let (qsets, _) = top_sets(queries, nq, dim, bits, sub);
+
+        // per-coord top-bucket frequency f_c -> cone baseline Σ f_c^2
+        let mut f = vec![0f64; dim];
+        for s in &csets {
+            for &c in s {
+                f[c as usize] += 1.0;
+            }
+        }
+        for x in f.iter_mut() {
+            *x /= n as f64;
+        }
+        let cone_base: f64 = f.iter().map(|x| x * x).sum();
+
+        let uniform_null = (n_top * n_top) as f64 / dim as f64;
+
+        let obs_random: f64 = pairs
+            .par_iter()
+            .map(|&(i, j)| overlap(&csets[i], &csets[j]) as f64)
+            .sum::<f64>()
+            / pairs.len() as f64;
+
+        // true-neighbour overlap: query top-set vs each true-nbr doc top-set
+        let (mut tn_sum, mut tn_cnt) = (0f64, 0usize);
+        for qi in 0..nq {
+            for &di in &truth[qi] {
+                tn_sum += overlap(&qsets[qi], &csets[di as usize]) as f64;
+                tn_cnt += 1;
+            }
+        }
+        let obs_true = tn_sum / tn_cnt as f64;
+
+        let cone_frac = (cone_base - uniform_null) / (obs_random - uniform_null);
+        let pair_sig_ratio = (obs_true - obs_random) / obs_random;
+
+        println!("{bits}\t{n_top}\t{uniform_null:.2}\t{cone_base:.2}\t{obs_random:.2}\t{obs_true:.2}\t{cone_frac:.3}\t{pair_sig_ratio:+.3}");
+    }
+}
+
+fn main() {
+    let mut cp = None;
+    let mut qp = None;
+    let mut a = std::env::args().skip(1);
+    while let Some(x) = a.next() {
+        match x.as_str() {
+            "--corpus-npy" => cp = Some(a.next().unwrap()),
+            "--queries-npy" => qp = Some(a.next().unwrap()),
+            _ => {}
+        }
+    }
+    let (mut corpus, n, dim) = load_npy_f32(&cp.expect("--corpus-npy"));
+    let (mut queries, nq, dq) = load_npy_f32(&qp.expect("--queries-npy"));
+    assert_eq!(dim, dq);
+    l2_normalize_rows(&mut corpus, dim);
+    l2_normalize_rows(&mut queries, dim);
+    // ground truth computed on the L2-normalized RAW vectors (the real task target);
+    // centering changes the rank CODE, not the retrieval ground truth.
+    let truth = ground_truth(&corpus, &queries, dim, 10);
+
+    println!("# overlap_decomp (CONTEXT-BOXED, pre-registered). corpus {n}x{dim}, queries {nq}");
+    println!("# cone_frac = (cone_base - uniform_null)/(obs_random - uniform_null); 1.0 => excess is ALL hubness");
+    println!("# pair_sig_ratio = (obs_trueNbr - obs_random)/obs_random; >0 => true nbrs overlap MORE than random");
+
+    analyze("RAW ranks", &corpus, &queries, n, nq, dim, &truth, None);
+
+    let mean = coord_mean(&corpus, n, dim);
+    analyze(
+        "MEAN-CENTERED ranks (subtract per-coord corpus mean before ranking)",
+        &corpus,
+        &queries,
+        n,
+        nq,
+        dim,
+        &truth,
+        Some(&mean),
+    );
+
+    println!("\n# PRE-REGISTERED: A cone-dominates if cone_frac>=0.70 (raw);");
+    println!("#   B pairwise-signal-survives if pair_sig_ratio>=0.20 AND obs_trueNbr>cone_base;");
+    println!(
+        "#   C centering-helps if cone_frac drops >=0.10 absolute and pair_sig_ratio not reduced."
+    );
+}
diff --git a/experiments/ordinal-routing-research/partition_balance.rs b/experiments/ordinal-routing-research/partition_balance.rs
new file mode 100644
index 00000000..c5afd2a5
--- /dev/null
+++ b/experiments/ordinal-routing-research/partition_balance.rs
@@ -0,0 +1,352 @@
+//! CONTEXT-BOXED, PRE-REGISTERED — the round-on-round thread: does cone-removal
+//! (centering) make a BALANCED, PRUNABLE coarse partition key for a sharding/IVF
+//! layer? This is the one place cone-removal might buy SCALING (sublinear pruning)
+//! rather than recall. Isolated by request: touches no harness, writes nothing.
+//!
+//! WHY: overlap_decomp showed raw low-B top-buckets are ~100% cone (every doc
+//! shares the same hub coords) -> a raw coarse key is maximally UNBALANCED: docs
+//! pile into a few cells, so an IVF layer probing few cells can't prune without
+//! losing neighbours. Centering removes the cone -> should SPREAD docs across
+//! cells. The decisive question is NOT balance for its own sake but candidates-
+//! scanned at matched recall under a real inverted-index probe.
+//!
+//! PARTITION KEY = the coarse cell id of a doc. We build cells by k-means-free,
+//! training-free coarse coding: project (centered or raw) onto the top PCA axes,
+//! take the SIGN pattern of the top `bits_key` axes -> 2^bits_key cells (a
+//! data-oblivious-given-PCA hash; PCA basis is the only data-dependent part and is
+//! shared identically across arms, so arms differ ONLY in centering).
+//! Query probes the P nearest cells (by Hamming on the key); candidate = union of
+//! those cells' docs; recall@10 vs FP32 cosine top-10.
+//!
+//! ARMS (PCA basis identical across all; only the centering differs):
+//!   raw      — sign-key from raw (uncentered) projections.
+//!   centered — sign-key from per-coord-mean-centered projections.
+//! Controls reported: cell-occupancy Gini (0=perfectly balanced, 1=all in one
+//! cell) and the largest-cell fraction.
+//!
+//! PRE-REGISTERED VERDICT (fixed before running):
+//!   BALANCE: centered must cut largest-cell-fraction by >= 1.5x vs raw.
+//!   PRUNING (the one that matters): at matched recall@10 (>=0.90), centered must
+//!     scan <= 0.80x the candidates raw scans, in >= 2 of 3 corpora.
+//!   => PASS = centered is a materially better coarse key (a SCALING win, distinct
+//!      from the recall FAIL at b>=2). FAIL = centering doesn't help partitioning
+//!      either; the cone is not the bottleneck for routing.
+//!
+//! Run: cargo run --release --example partition_balance -- \
+//!          --corpus-npy /tmp/corpora/fiqa_corpus.npy --queries-npy /tmp/corpora/fiqa_q.npy
+
+// Research probe: `analyze` is a parameterized runner; bundling its args into a
+// struct would obscure the experiment, not clarify it.
+#![allow(clippy::too_many_arguments)]
+
+use rand::{RngExt, SeedableRng};
+use rand_chacha::ChaCha8Rng;
+use rayon::prelude::*;
+
+const SEED: u64 = 1;
+
+fn load_npy_f32(path: &str) -> (Vec<f32>, usize, usize) {
+    let bytes = std::fs::read(path).expect("read npy");
+    assert!(
+        bytes.len() >= 10 && &bytes[..6] == b"\x93NUMPY",
+        "not a numpy file"
+    );
+    let major = bytes[6];
+    assert!(
+        major == 1 || major == 2,
+        "unsupported numpy .npy major version {major}"
+    );
+    if major == 2 {
+        assert!(bytes.len() >= 12, "truncated numpy v2 header");
+    }
+    let (hlen, hstart) = if major == 1 {
+        (u16::from_le_bytes([bytes[8], bytes[9]]) as usize, 10)
+    } else {
+        (
+            u32::from_le_bytes([bytes[8], bytes[9], bytes[10], bytes[11]]) as usize,
+            12,
+        )
+    };
+    assert!(hstart + hlen <= bytes.len(), "truncated numpy header");
+    let header = std::str::from_utf8(&bytes[hstart..hstart + hlen]).expect("utf8 header");
+    assert!(header.contains("'descr': '<f4'"), "expected <f4 dtype");
+    let after = &header[header.find("'shape':").expect("shape")..];
+    let inner = &after[after.find('(').unwrap() + 1..after.find(')').unwrap()];
+    let dims: Vec<usize> = inner
+        .split(',')
+        .filter_map(|s| s.trim().parse().ok())
+        .collect();
+    let (n, dim) = (dims[0], dims[1]);
+    let dstart = hstart + hlen;
+    let mut out = vec![0.0f32; n * dim];
+    for (i, ch) in bytes[dstart..].chunks_exact(4).enumerate() {
+        out[i] = f32::from_le_bytes([ch[0], ch[1], ch[2], ch[3]]);
+    }
+    (out, n, dim)
+}
+
+fn l2_normalize_rows(v: &mut [f32], dim: usize) {
+    for i in 0..v.len() / dim {
+        let row = &mut v[i * dim..(i + 1) * dim];
+        let nrm: f32 = row.iter().map(|x| x * x).sum::<f32>().sqrt();
+        if nrm > 0.0 {
+            for x in row.iter_mut() {
+                *x /= nrm;
+            }
+        }
+    }
+}
+
+fn coord_mean(rows: &[f32], n: usize, dim: usize) -> Vec<f32> {
+    let mut m = vec![0f32; dim];
+    for i in 0..n {
+        for c in 0..dim {
+            m[c] += rows[i * dim + c];
+        }
+    }
+    for m_c in m.iter_mut() {
+        *m_c /= n as f32;
+    }
+    m
+}
+
+/// top-k PCA via power iteration + deflation (implicit covariance action). Pure Rust.
+fn pca_topk(centered: &[f32], n: usize, dim: usize, k: usize) -> Vec<Vec<f32>> {
+    let sample: Vec<usize> = if n > 20000 {
+        let mut rng = ChaCha8Rng::seed_from_u64(SEED ^ 0x9C0FFEE);
+        (0..20000).map(|_| rng.random_range(0..n)).collect()
+    } else {
+        (0..n).collect()
+    };
+    let cov_mul = |x: &[f32]| -> Vec<f32> {
+        let acc: Vec<f32> = sample
+            .par_iter()
+            .map(|&i| {
+                let row = &centered[i * dim..(i + 1) * dim];
+                let d: f32 = row.iter().zip(x).map(|(a, b)| a * b).sum();
+                row.iter().map(|&r| r * d).collect::<Vec<f32>>()
+            })
+            .reduce(
+                || vec![0f32; dim],
+                |mut a, b| {
+                    for j in 0..dim {
+                        a[j] += b[j];
+                    }
+                    a
+                },
+            );
+        let s = sample.len() as f32;
+        acc.iter().map(|v| v / s).collect()
+    };
+    let mut comps: Vec<Vec<f32>> = Vec::with_capacity(k);
+    let mut rng = ChaCha8Rng::seed_from_u64(SEED ^ 0xABCD);
+    for _ in 0..k {
+        let mut v: Vec<f32> = (0..dim)
+            .map(|_| {
+                let u1: f32 = rng.random_range(1e-9..1.0);
+                let u2: f32 = rng.random_range(0.0..1.0);
+                (-2.0 * u1.ln()).sqrt() * (std::f32::consts::TAU * u2).cos()
+            })
+            .collect();
+        for _ in 0..50 {
+            for c in &comps {
+                let p: f32 = v.iter().zip(c).map(|(a, b)| a * b).sum();
+                for j in 0..dim {
+                    v[j] -= p * c[j];
+                }
+            }
+            let mut y = cov_mul(&v);
+            for c in &comps {
+                let p: f32 = y.iter().zip(c).map(|(a, b)| a * b).sum();
+                for j in 0..dim {
+                    y[j] -= p * c[j];
+                }
+            }
+            let nrm: f32 = y.iter().map(|t| t * t).sum::<f32>().sqrt();
+            if nrm < 1e-20 {
+                break;
+            }
+            for j in 0..dim {
+                v[j] = y[j] / nrm;
+            }
+        }
+        comps.push(v);
+    }
+    comps
+}
+
+/// sign-pattern key over the top `bits_key` PCA axes -> cell id in [0, 2^bits_key).
+fn cell_key(v: &[f32], mean: Option<&[f32]>, basis: &[Vec<f32>], bits_key: usize) -> u32 {
+    let mut key = 0u32;
+    for (a, axis) in basis.iter().take(bits_key).enumerate() {
+        let proj: f32 = axis
+            .iter()
+            .enumerate()
+            .map(|(c, &w)| (v[c] - mean.map_or(0.0, |m| m[c])) * w)
+            .sum();
+        if proj > 0.0 {
+            key |= 1 << a;
+        }
+    }
+    key
+}
+
+fn gini(counts: &[usize]) -> f32 {
+    let n = counts.len();
+    if n == 0 {
+        return 0.0;
+    }
+    let mut s: Vec<f64> = counts.iter().map(|&c| c as f64).collect();
+    s.sort_by(|a, b| a.partial_cmp(b).unwrap());
+    let tot: f64 = s.iter().sum();
+    if tot == 0.0 {
+        return 0.0;
+    }
+    let mut cum = 0.0;
+    let mut lorenz = 0.0;
+    for &x in &s {
+        cum += x;
+        lorenz += cum;
+    }
+    let g = (n as f64 + 1.0 - 2.0 * lorenz / tot) / n as f64;
+    g as f32
+}
+
+fn analyze(
+    label: &str,
+    corpus: &[f32],
+    queries: &[f32],
+    n: usize,
+    nq: usize,
+    dim: usize,
+    truth: &[Vec<u32>],
+    mean: Option<&[f32]>,
+    basis: &[Vec<f32>],
+    bits_key: usize,
+) {
+    let n_cells = 1usize << bits_key;
+    // assign cells
+    let ckeys: Vec<u32> = (0..n)
+        .into_par_iter()
+        .map(|i| cell_key(&corpus[i * dim..(i + 1) * dim], mean, basis, bits_key))
+        .collect();
+    let qkeys: Vec<u32> = (0..nq)
+        .into_par_iter()
+        .map(|i| cell_key(&queries[i * dim..(i + 1) * dim], mean, basis, bits_key))
+        .collect();
+    let mut cells: Vec<Vec<u32>> = vec![Vec::new(); n_cells];
+    for (i, &k) in ckeys.iter().enumerate() {
+        cells[k as usize].push(i as u32);
+    }
+    let counts: Vec<usize> = cells.iter().map(|c| c.len()).collect();
+    let g = gini(&counts);
+    let largest = *counts.iter().max().unwrap() as f64 / n as f64;
+    let nonempty = counts.iter().filter(|&&c| c > 0).count();
+
+    println!("\n## {label} (bits_key={bits_key}, {n_cells} cells, {nonempty} non-empty)");
+    println!("  occupancy Gini={g:.4} (0=balanced), largest-cell-fraction={largest:.4}");
+    println!("  P_probe\tcand_scanned\trecall@10");
+    // probe P nearest cells by Hamming on the key
+    for p_probe in [1usize, 2, 4, 8, 16, 32] {
+        if p_probe > n_cells {
+            break;
+        }
+        let stats: Vec<(usize, f32)> = (0..nq)
+            .into_par_iter()
+            .map(|qi| {
+                let qk = qkeys[qi];
+                // rank cells by Hamming distance to qk, take P nearest
+                let mut order: Vec<u32> = (0..n_cells as u32).collect();
+                order.sort_by_key(|&c| (c ^ qk).count_ones());
+                let mut cand = 0usize;
+                use std::collections::HashSet;
+                let mut hit_set: HashSet<u32> = HashSet::new();
+                for &c in order.iter().take(p_probe) {
+                    cand += cells[c as usize].len();
+                    for &d in &cells[c as usize] {
+                        hit_set.insert(d);
+                    }
+                }
+                let hits = truth[qi].iter().filter(|&&t| hit_set.contains(&t)).count();
+                (cand, hits as f32 / truth[qi].len() as f32)
+            })
+            .collect();
+        let mc = stats.iter().map(|s| s.0 as f64).sum::<f64>() / nq as f64;
+        let mr = stats.iter().map(|s| s.1 as f64).sum::<f64>() / nq as f64;
+        println!("  {p_probe}\t{mc:.0}\t{mr:.4}");
+    }
+}
+
+fn main() {
+    let mut cp = None;
+    let mut qp = None;
+    let mut bits_key = 10usize;
+    let mut a = std::env::args().skip(1);
+    while let Some(x) = a.next() {
+        match x.as_str() {
+            "--corpus-npy" => cp = Some(a.next().unwrap()),
+            "--queries-npy" => qp = Some(a.next().unwrap()),
+            "--bits-key" => bits_key = a.next().unwrap().parse().unwrap(),
+            _ => {}
+        }
+    }
+    let (mut corpus, n, dim) = load_npy_f32(&cp.expect("--corpus-npy"));
+    let (mut queries, nq, dq) = load_npy_f32(&qp.expect("--queries-npy"));
+    assert_eq!(dim, dq);
+    l2_normalize_rows(&mut corpus, dim);
+    l2_normalize_rows(&mut queries, dim);
+    let truth: Vec<Vec<u32>> = (0..nq)
+        .into_par_iter()
+        .map(|qi| {
+            let q = &queries[qi * dim..(qi + 1) * dim];
+            let mut s: Vec<(u32, f32)> = (0..n)
+                .map(|di| {
+                    let d = &corpus[di * dim..(di + 1) * dim];
+                    (di as u32, q.iter().zip(d).map(|(a, b)| a * b).sum::<f32>())
+                })
+                .collect();
+            s.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
+            s.into_iter().take(10).map(|(i, _)| i).collect()
+        })
+        .collect();
+
+    let mean = coord_mean(&corpus, n, dim);
+    // PCA basis from the CENTERED corpus, shared identically across arms.
+    let mut cc = corpus.clone();
+    for i in 0..n {
+        for c in 0..dim {
+            cc[i * dim + c] -= mean[c];
+        }
+    }
+    eprintln!("# computing top-{bits_key} PCA basis...");
+    let basis = pca_topk(&cc, n, dim, bits_key);
+
+    println!("# partition_balance (CONTEXT-BOXED, pre-registered). corpus {n}x{dim}, queries {nq}");
+    println!("# coarse cell key = sign pattern of top-{bits_key} PCA axes; PCA basis shared across arms.");
+    analyze(
+        "RAW key (uncentered projections)",
+        &corpus,
+        &queries,
+        n,
+        nq,
+        dim,
+        &truth,
+        None,
+        &basis,
+        bits_key,
+    );
+    analyze(
+        "CENTERED key (mean-subtracted projections)",
+        &corpus,
+        &queries,
+        n,
+        nq,
+        dim,
+        &truth,
+        Some(&mean),
+        &basis,
+        bits_key,
+    );
+    println!("\n# PRE-REGISTERED: BALANCE = centered cuts largest-cell-fraction >=1.5x; PRUNING = centered scans <=0.80x");
+    println!("#   candidates at matched recall>=0.90 in >=2/3 corpora => PASS (a scaling win). Else FAIL.");
+}
diff --git a/experiments/ordinal-routing-research/rigidity_impossibility_proofs.md b/experiments/ordinal-routing-research/rigidity_impossibility_proofs.md
new file mode 100644
index 00000000..82956616
--- /dev/null
+++ b/experiments/ordinal-routing-research/rigidity_impossibility_proofs.md
@@ -0,0 +1,122 @@
+# Why oblivious random-projection keys cannot be rigid — proofs, not corpora
+
+The zoo gives evidence; these give the reason. Three claims, stated and proved.
+"Rigid" = number variance Σ²(L) grows sub-linearly (o(L)); "Poisson-or-worse"
+= Σ²(L) = Ω(L). The routing key is K_r(x) = r·x / ||x|| for a probe direction r
+(cosine geometry: the key is a function of the RAY, magnitude-free).
+
+------------------------------------------------------------------------
+## Lemma 0 (magnitude is quotiented out)
+
+For any x ≠ 0 and scalar c > 0, K_r(cx) = K_r(x). So any structure a corpus
+encodes purely in vector NORMS is invisible to the key. In particular the
+`projected-rigid` construction x_i = t_i·u collapses under normalization to
+sign(t_i)·u ∈ {+u, −u}: the 50k rigid magnitudes collapse toward ≈2 angular keys
+(±u, modulo the small orthogonal jitter the generator adds), so the key sees a
+near-degenerate occupancy exactly as the quotient predicts. ∎
+
+Consequence: rigidity must live in the ANGULAR distribution on the sphere
+S^{d-1}, or it does not exist for the key.
+
+------------------------------------------------------------------------
+## Theorem 1 (isotropy ⇒ exact Poisson key)
+
+Let x be distributed so that x/||x|| is uniform on S^{d-1} (isotropic). For a
+fixed unit r, the projected key K_r = r·(x/||x||) has the known marginal with
+density ∝ (1−s²)^((d-3)/2) on [−1,1]. Drawing n i.i.d. corpus points gives n
+i.i.d. key values from a fixed continuous law F. After unfolding by F, the
+points are a homogeneous Poisson process of rate 1, for which Σ²(L) = L exactly.
+
+Proof: i.i.d. samples under their own CDF are uniform on [0,1]; counts in
+disjoint windows have Binomial counts → Σ²(L) = L(1−L/n) (≈ L for L ≪ n). ∎
+(Matches isotropic zoo row: Σ²/L = 0.99, flat.)
+
+------------------------------------------------------------------------
+## Theorem 2 (i.i.d. corpus ⇒ Σ²(L) = L(1−L/n), NO rigidity, for ANY distribution)
+
+This is the special case Var_θ(p) = 0 of Theorem 3 (i.i.d. = degenerate mixing
+measure). Let corpus points be i.i.d. from ANY distribution D on R^d (isotropic
+or not, clustered, manifold, heavy-tailed). Fix r. The keys K_r(x_1..x_n) are
+i.i.d. from the induced 1-D law F_r. Unfolding by F_r (probability integral
+transform) gives n i.i.d. Uniform[0,1] points.
+
+Proof. n i.i.d. uniforms form a BINOMIAL point process (NOT Poisson — review
+correction). A window of width L/n has count N ~ Binomial(n, L/n), so
+    Σ²(L) = Var(N) = n·(L/n)(1 − L/n) = L(1 − L/n).
+Thus Σ²(L) = L(1−L/n) = Θ(L): sub-LINEAR rigidity (Σ² = o(L)) is impossible. ∎
+
+NOTE (corrected): the earlier "homogeneous Poisson, Σ²=L exactly, independent
+window counts, Var=mean=L" was WRONG — that is the Poisson framing this work
+exists to debunk. For fixed n the process is binomial; counts in disjoint
+windows are NEGATIVELY correlated (they sum to n), and Σ²(L)=L(1−L/n) < L. The
+CONCLUSION (no sub-linear rigidity) is unaffected because the value is Θ(L).
+
+This is why clustered/anisotropic/rogue/manifold are not sub-linear: by Thm 3
+the mixing variance only ADDS (Σ² ≥ L(1−L/n), and clustering pushes above L).
+Genuine rigidity (Σ² = o(L)) needs a negatively-associated, FINITELY-exchangeable
+generator (lattice / determinantal point process) — see Thm 3's escape hatch —
+which a fixed data distribution does not produce.
+
+------------------------------------------------------------------------
+## Theorem 3 (no rigidity for any fixed-distribution corpus — full derivation)
+
+Setup. Corpus rows exchangeable ⇒ keys exchangeable ⇒ unfolded positions
+U_1..U_n exchangeable on [0,1]. Window W of length L/n, so E[count]=L. Let
+I_j = 1{U_j ∈ W}, N = Σ I_j, p = L/n. By exchangeability every I_j has mean p
+and every pair shares one covariance c = Cov(I_1,I_2):
+
+    Var(N) = n·p(1−p) + n(n−1)·c.                                    (★)
+
+de Finetti (infinite exchangeable ⇒ conditionally i.i.d. given latent θ):
+I_j are independent Bernoulli(p(θ)) given θ, so
+    c = Cov(I_1,I_2) = E[p(θ)²] − p̄² = Var_θ(p(θ)) ≥ 0,   p̄ = E[p(θ)].
+Substituting into (★):
+
+    Σ²(L) = Var(N) = n·p̄(1−p̄) + n(n−1)·Var_θ(p(θ))
+                   ≥ n·p̄(1−p̄) = L(1 − L/n).                          ∎
+
+So Σ²(L) ≥ L(1−L/n): rigidity (sub-Poisson, Σ²=o(L)) is IMPOSSIBLE because the
+mixing-variance term Var_θ(p(θ)) is non-negative and can only RAISE the count
+variance above the Poisson line. Clustering = large Var_θ(p(θ)) = strict excess.
+
+THE PRECISE ESCAPE HATCH (named, not gestured at). de Finetti requires INFINITE
+exchangeability. Finitely-exchangeable sequences (sampling WITHOUT replacement
+from a fixed finite set — i.e. a lattice / determinantal point process) violate
+the c ≥ 0 step: there c < 0 and Σ²(L) < L (genuinely rigid). But "drawn from a
+fixed data distribution" IS conditional-i.i.d.-given-θ — the infinite-exchangeable
+case — so any corpus an encoder produces from a data distribution falls under
+the theorem. Rigidity requires a negatively-associated generator (lattice/DPP)
+AND preserved row order; an oblivious router over a fixed distribution has
+neither. ∎
+
+------------------------------------------------------------------------
+## What this proves about the conjecture (NARROWED post-review)
+
+The original hope was that prime/spectral structure could exploit *rigidity* in
+the routing key. Theorems 1–3 show the key has Σ²(L) ≥ L(1−L/n) for any
+conditionally-i.i.d. (mixture-of-i.i.d.) corpus — equality at isotropy, strict
+excess under clustering. So:
+
+  **The routing key is not number-variance-rigid.** Sub-linear Σ²(L) requires a
+  finitely-exchangeable / negatively-associated generator (lattice/DPP) AND
+  preserved row order — neither holds for an oblivious router over a fixed
+  distribution.
+
+That is the defensible claim, and it is all the mathematics supports.
+
+RETRACTED (non-sequitur, flagged in adversarial review). The earlier conclusion
+— "quantile bucketing is optimal against the entire achievable class; no
+spectral/prime partition can beat it because the structure is provably not
+there" — DOES NOT FOLLOW from Σ²(L) ≥ L. Number variance and partition
+quality (load balance / recall) are different figures of merit; Σ² ≥ L says
+nothing about whether some partition outperforms quantile bucketing. A Poisson
+key can still admit a useful partition. The impossibility result refutes only
+the specific "exploit rigidity" route, NOT all spectral/prime partitions, and
+NOT in favor of quantile-optimality-over-all-partitions. Establishing the latter
+would need a separate argument (that partition quality is monotone in Σ²), which
+is not provided and is likely false.
+
+(The zoo runs in withdrawn/corpus_zoo_results.md were EVIDENCE for the narrow claim, with
+the unfolding caveat in ADVERSARIAL_REVIEW.md. The selftest Σ²=0 only shows the
+estimator can represent the o(L) regime; Thm 3 says a fixed-distribution corpus
+cannot reach it.)
diff --git a/experiments/ordinal-routing-research/shard_recall.rs b/experiments/ordinal-routing-research/shard_recall.rs
new file mode 100644
index 00000000..c053af2c
--- /dev/null
+++ b/experiments/ordinal-routing-research/shard_recall.rs
@@ -0,0 +1,445 @@
+//! R-projection shard-recall experiment for the training-free routing layer.
+//!
+//! Open empirical question from the prime/index conjecture investigation:
+//! can a *data-oblivious* router (R random 1-D projections, each bucketed by
+//! a uniform grid) capture true neighbours well enough to route, and does
+//! making the grid periods pairwise COPRIME beat independent random OFFSETS
+//! at decorrelating bucket seams?
+//!
+//! Method (per agent-5 spec): for each query, the router probes a budget of
+//! buckets across the R projections; shard-recall@k = fraction of the FP32
+//! cosine top-k that land in the probed union. We plot recall vs
+//! CANDIDATES-SCANNED (not vs probe radius), so arms with different bucket
+//! counts are compared at equal work — the only fair axis.
+//!
+//! Arms:
+//!   coprime        — R grids, pairwise-coprime periods
+//!   aligned        — R grids, identical period (worst case: seams stack)
+//!   random-offset  — R grids, same period, independent random phase
+//!   both           — coprime periods AND random offsets
+//!
+//! Decision rule: if coprime ~= random-offset within noise, coprimality adds
+//! nothing (justified only on build-determinism). If coprime materially beats
+//! random-offset at equal candidates-scanned and survives reseeding, that is
+//! the publishable surprise.
+//!
+//! Run (synthetic): cargo run --release --example shard_recall
+//! No external data, no BLAS.
+
+use rand::{RngExt, SeedableRng};
+use rand_chacha::ChaCha8Rng;
+use rayon::prelude::*;
+
+const SEED: u64 = 1;
+
+struct Cfg {
+    dim: usize,
+    n: usize,
+    n_queries: usize,
+    k: usize,
+    latent_dim: usize,
+    n_clusters: usize,
+    r_values: Vec<usize>, // projection counts to sweep
+    isotropic: bool,
+    corpus_npy: Option<String>,
+    queries_npy: Option<String>,
+}
+
+fn parse() -> Cfg {
+    let mut c = Cfg {
+        dim: 256,
+        n: 50_000,
+        n_queries: 200,
+        k: 10,
+        latent_dim: 64,
+        n_clusters: 200,
+        r_values: vec![1, 2, 4, 8, 16],
+        isotropic: false,
+        corpus_npy: None,
+        queries_npy: None,
+    };
+    let mut a = std::env::args().skip(1);
+    while let Some(x) = a.next() {
+        match x.as_str() {
+            "--dim" => c.dim = a.next().unwrap().parse().unwrap(),
+            "--n" => c.n = a.next().unwrap().parse().unwrap(),
+            "--queries" => c.n_queries = a.next().unwrap().parse().unwrap(),
+            "--k" => c.k = a.next().unwrap().parse().unwrap(),
+            "--latent" => c.latent_dim = a.next().unwrap().parse().unwrap(),
+            "--clusters" => c.n_clusters = a.next().unwrap().parse().unwrap(),
+            "--isotropic" => c.isotropic = true,
+            "--corpus-npy" => c.corpus_npy = Some(a.next().unwrap()),
+            "--queries-npy" => c.queries_npy = Some(a.next().unwrap()),
+            other => panic!("unknown arg {other}"),
+        }
+    }
+    c
+}
+
+/// Minimal NumPy v1/v2 .npy reader for 2-D little-endian f32 C-order arrays
+/// (same format contract as bench_rank's --corpus-npy / --queries-npy).
+fn load_npy_f32(path: &str) -> (Vec<f32>, usize, usize) {
+    let bytes = std::fs::read(path).expect("read npy");
+    assert!(bytes.len() >= 10 && &bytes[..6] == b"\x93NUMPY", "not a numpy file");
+    let major = bytes[6];
+    assert!(
+        major == 1 || major == 2,
+        "unsupported numpy .npy major version {major}"
+    );
+    if major == 2 {
+        assert!(bytes.len() >= 12, "truncated numpy v2 header");
+    }
+    let (hlen, hstart) = if major == 1 {
+        (u16::from_le_bytes([bytes[8], bytes[9]]) as usize, 10)
+    } else {
+        (u32::from_le_bytes([bytes[8], bytes[9], bytes[10], bytes[11]]) as usize, 12)
+    };
+    assert!(hstart + hlen <= bytes.len(), "truncated numpy header");
+    let header = std::str::from_utf8(&bytes[hstart..hstart + hlen]).expect("utf8 header");
+    assert!(header.contains("'descr': '<f4'"), "expected <f4 dtype");
+    assert!(header.contains("'fortran_order': False"), "expected C order");
+    let after = &header[header.find("'shape':").expect("shape")..];
+    let inner = &after[after.find('(').unwrap() + 1..after.find(')').unwrap()];
+    let dims: Vec<usize> = inner.split(',').filter_map(|s| s.trim().parse().ok()).collect();
+    assert_eq!(dims.len(), 2, "expected 2-D array");
+    let (n, dim) = (dims[0], dims[1]);
+    let dstart = hstart + hlen;
+    assert_eq!(bytes.len() - dstart, n * dim * 4, "data length mismatch");
+    let mut out = vec![0.0f32; n * dim];
+    for (i, ch) in bytes[dstart..].chunks_exact(4).enumerate() {
+        out[i] = f32::from_le_bytes([ch[0], ch[1], ch[2], ch[3]]);
+    }
+    (out, n, dim)
+}
+
+fn l2_normalize_rows(v: &mut [f32], dim: usize) {
+    let n = v.len() / dim;
+    for i in 0..n {
+        let row = &mut v[i * dim..(i + 1) * dim];
+        let nrm: f32 = row.iter().map(|x| x * x).sum::<f32>().sqrt();
+        if nrm > 0.0 {
+            for x in row.iter_mut() {
+                *x /= nrm;
+            }
+        }
+    }
+}
+
+fn gauss(rng: &mut ChaCha8Rng) -> f32 {
+    let u1: f32 = rng.random_range(1e-9..1.0);
+    let u2: f32 = rng.random_range(0.0..1.0);
+    (-2.0 * u1.ln()).sqrt() * (std::f32::consts::TAU * u2).cos()
+}
+
+/// Low-rank clustered corpus + matched queries (same construction as bench_rank).
+/// Returns (corpus, queries) as flat row-major dim-strided buffers.
+fn make_corpus(cfg: &mut Cfg) -> (Vec<f32>, Vec<f32>) {
+    // Real corpus path: both files required (need real queries for honest recall).
+    if let (Some(cp), Some(qp)) = (cfg.corpus_npy.clone(), cfg.queries_npy.clone()) {
+        let (mut corpus, n, d) = load_npy_f32(&cp);
+        let (mut queries, nq, dq) = load_npy_f32(&qp);
+        assert_eq!(d, dq, "corpus/query dim mismatch");
+        l2_normalize_rows(&mut corpus, d);
+        l2_normalize_rows(&mut queries, dq);
+        cfg.dim = d;
+        cfg.n = n;
+        cfg.n_queries = nq;
+        eprintln!("# loaded corpus {n}x{d}, queries {nq}x{dq}");
+        return (corpus, queries);
+    }
+    let mut rng = ChaCha8Rng::seed_from_u64(SEED);
+    let d = cfg.dim;
+    let l = cfg.latent_dim;
+    let mut a = vec![0.0f32; d * l];
+    for x in a.iter_mut() {
+        *x = gauss(&mut rng);
+    }
+    let mut protos = vec![0.0f32; cfg.n_clusters * l];
+    for x in protos.iter_mut() {
+        *x = gauss(&mut rng);
+    }
+    let make = |proto: &[f32], noise: f32, rng: &mut ChaCha8Rng| -> Vec<f32> {
+        let mut z = vec![0.0f32; l];
+        for j in 0..l {
+            z[j] = proto[j] + noise * gauss(rng);
+        }
+        let mut out = vec![0.0f32; d];
+        for i in 0..d {
+            let mut acc = 0.0f32;
+            for j in 0..l {
+                acc += a[i * l + j] * z[j];
+            }
+            out[i] = acc;
+        }
+        let nrm: f32 = out.iter().map(|v| v * v).sum::<f32>().sqrt();
+        if nrm > 0.0 {
+            for v in out.iter_mut() {
+                *v /= nrm;
+            }
+        }
+        out
+    };
+    let iso = |rng: &mut ChaCha8Rng| -> Vec<f32> {
+        let mut v = vec![0.0f32; d];
+        for x in v.iter_mut() {
+            *x = gauss(rng);
+        }
+        let nrm: f32 = v.iter().map(|t| t * t).sum::<f32>().sqrt();
+        for x in v.iter_mut() {
+            *x /= nrm;
+        }
+        v
+    };
+    let mut corpus = vec![0.0f32; cfg.n * d];
+    for i in 0..cfg.n {
+        let e = if cfg.isotropic {
+            iso(&mut rng)
+        } else {
+            let c = rng.random_range(0..cfg.n_clusters);
+            make(&protos[c * l..(c + 1) * l], 0.3, &mut rng)
+        };
+        corpus[i * d..(i + 1) * d].copy_from_slice(&e);
+    }
+    let mut queries = vec![0.0f32; cfg.n_queries * d];
+    for i in 0..cfg.n_queries {
+        let e = if cfg.isotropic {
+            iso(&mut rng)
+        } else {
+            let c = rng.random_range(0..cfg.n_clusters);
+            make(&protos[c * l..(c + 1) * l], 0.1, &mut rng)
+        };
+        queries[i * d..(i + 1) * d].copy_from_slice(&e);
+    }
+    (corpus, queries)
+}
+
+/// FP32 brute-force cosine top-k ground truth (corpus is L2-normalized already).
+fn ground_truth(corpus: &[f32], queries: &[f32], dim: usize, k: usize) -> Vec<Vec<usize>> {
+    let n = corpus.len() / dim;
+    let nq = queries.len() / dim;
+    (0..nq)
+        .into_par_iter()
+        .map(|qi| {
+            let q = &queries[qi * dim..(qi + 1) * dim];
+            let mut scored: Vec<(usize, f32)> = (0..n)
+                .map(|di| {
+                    let doc = &corpus[di * dim..(di + 1) * dim];
+                    let dot: f32 = q.iter().zip(doc).map(|(a, b)| a * b).sum();
+                    (di, dot)
+                })
+                .collect();
+            scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
+            scored.into_iter().take(k).map(|(i, _)| i).collect()
+        })
+        .collect()
+}
+
+/// One projection: a unit random direction + a grid period + a phase offset.
+/// A doc's bucket on this projection is floor((r.x - phase) / width).
+struct Proj {
+    dir: Vec<f32>,
+    width: f32,
+    phase: f32,
+}
+
+impl Proj {
+    fn bucket(&self, v: &[f32]) -> i64 {
+        let dot: f32 = self.dir.iter().zip(v).map(|(a, b)| a * b).sum();
+        ((dot - self.phase) / self.width).floor() as i64
+    }
+}
+
+/// Pairwise-coprime-ish period multipliers (distinct small primes). The grid
+/// WIDTH is base_width / period, so more-divided grids have more, finer buckets.
+/// Coprime periods => the seam sets (multiples of width) share no interior
+/// coincidence until the lcm — the vernier effect.
+const COPRIME_PERIODS: [u32; 16] = [
+    2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53,
+];
+
+#[derive(Clone, Copy)]
+enum Arm {
+    Coprime = 0,
+    Aligned = 1,
+    RandomOffset = 2,
+    Both = 3,
+}
+
+impl Arm {
+    fn name(self) -> &'static str {
+        match self {
+            Arm::Coprime => "coprime",
+            Arm::Aligned => "aligned",
+            Arm::RandomOffset => "random-offset",
+            Arm::Both => "both",
+        }
+    }
+}
+
+/// Build R projections for one arm. `base_width` is calibrated so a single
+/// grid has ~sqrt(n) buckets (a reasonable shard granularity); coprime arms
+/// subdivide by the prime multiplier.
+fn build_projs(cfg: &Cfg, arm: Arm, r: usize, base_width: f32) -> Vec<Proj> {
+    let d = cfg.dim;
+    // Bug L fix: directions and phases come from SEPARATE, fixed-seed RNGs that
+    // do not depend on the arm. Every arm therefore sees the IDENTICAL R
+    // projection directions (and identical phase draws where used), so the arms
+    // differ ONLY in width/phase policy — the clean ablation. Previously the
+    // RandomOffset arm consumed an extra value from one shared stream, desyncing
+    // the directions so arms were compared on different projections.
+    let mut dir_rng = ChaCha8Rng::seed_from_u64(SEED ^ 0xD132_0000 ^ (r as u64));
+    let mut phase_rng = ChaCha8Rng::seed_from_u64(SEED ^ 0x9DA5E_000 ^ (r as u64));
+    (0..r)
+        .map(|i| {
+            let mut dir = vec![0.0f32; d];
+            for x in dir.iter_mut() {
+                *x = gauss(&mut dir_rng);
+            }
+            let nrm: f32 = dir.iter().map(|t| t * t).sum::<f32>().sqrt();
+            for x in dir.iter_mut() {
+                *x /= nrm;
+            }
+            // draw a phase fraction for EVERY arm (keeps phase_rng in lockstep);
+            // arms that don't use it simply ignore the value.
+            let p: f32 = phase_rng.random_range(0.0..1.0);
+            // i % len() guards against r > COPRIME_PERIODS.len() (OOB panic).
+            let period = COPRIME_PERIODS[i % COPRIME_PERIODS.len()] as f32;
+            let (width, phase) = match arm {
+                Arm::Coprime => (base_width / period, 0.0),
+                Arm::Aligned => (base_width, 0.0),
+                Arm::RandomOffset => (base_width, p * base_width),
+                Arm::Both => {
+                    let w = base_width / period;
+                    (w, p * w)
+                }
+            };
+            Proj { dir, width, phase }
+        })
+        .collect()
+}
+
+/// Assign every doc a bucket-key tuple (one i64 per projection), grouped into
+/// an index: for each projection, map bucket-id -> list of doc ids.
+type BucketIndex = Vec<std::collections::HashMap<i64, Vec<u32>>>;
+
+fn build_index(projs: &[Proj], corpus: &[f32], dim: usize) -> BucketIndex {
+    let n = corpus.len() / dim;
+    projs
+        .par_iter()
+        .map(|p| {
+            let mut m: std::collections::HashMap<i64, Vec<u32>> = std::collections::HashMap::new();
+            for di in 0..n {
+                let b = p.bucket(&corpus[di * dim..(di + 1) * dim]);
+                m.entry(b).or_default().push(di as u32);
+            }
+            m
+        })
+        .collect()
+}
+
+/// For one query at a given probe radius `rad` (probe buckets b-rad..=b+rad on
+/// each projection), return the candidate union and its size (candidates
+/// scanned). Recall is measured against the ground-truth ids.
+fn probe_recall(
+    projs: &[Proj],
+    index: &BucketIndex,
+    q: &[f32],
+    rad: i64,
+    truth: &[usize],
+) -> (usize, f32) {
+    use std::collections::HashSet;
+    let mut cand: HashSet<u32> = HashSet::new();
+    for (pi, p) in projs.iter().enumerate() {
+        let b = p.bucket(q);
+        for bb in (b - rad)..=(b + rad) {
+            if let Some(ids) = index[pi].get(&bb) {
+                cand.extend(ids.iter().copied());
+            }
+        }
+    }
+    // truth is already unique ground-truth ids; filter directly against cand
+    // (no separate HashSet allocation — review suggestion).
+    let hits = truth.iter().filter(|&&i| cand.contains(&(i as u32))).count();
+    let recall = hits as f32 / truth.len().max(1) as f32;
+    (cand.len(), recall)
+}
+
+fn run_arms(cfg: &Cfg, corpus: &[f32], queries: &[f32], truth: &[Vec<usize>]) {
+    // Calibrate base_width so a single unit-direction grid yields ~sqrt(n)
+    // buckets. r.x ~ N(0, 1/dim) for L2-normalized x, so its spread ~ 6/sqrt(dim)
+    // across the corpus; divide that range into ~sqrt(n) cells.
+    let spread = 6.0 / (cfg.dim as f32).sqrt();
+    let base_width = spread / (cfg.n as f32).sqrt();
+    let arms = [Arm::Coprime, Arm::Aligned, Arm::RandomOffset, Arm::Both];
+    // Collect (arm, cand, recall) across all (R, rad) for the fair envelope.
+    let mut pts: Vec<(&'static str, f64, f64)> = Vec::new();
+    println!("# recall vs candidates-scanned (mean over queries), per arm per R");
+    println!("arm\tR\trad\tcand_scanned\trecall@k");
+    for &r in &cfg.r_values {
+        for &arm in &arms {
+            // build_projs now seeds its own direction/phase RNGs internally
+            // (identical across arms for the same r) — Bug L fix.
+            let projs = build_projs(cfg, arm, r, base_width);
+            let index = build_index(&projs, corpus, cfg.dim);
+            for rad in [0i64, 1, 2, 4] {
+                let stats: Vec<(usize, f32)> = (0..cfg.n_queries)
+                    .into_par_iter()
+                    .map(|qi| {
+                        let q = &queries[qi * cfg.dim..(qi + 1) * cfg.dim];
+                        probe_recall(&projs, &index, q, rad, &truth[qi])
+                    })
+                    .collect();
+                let mean_cand =
+                    stats.iter().map(|s| s.0 as f64).sum::<f64>() / stats.len() as f64;
+                let mean_rec =
+                    stats.iter().map(|s| s.1 as f64).sum::<f64>() / stats.len() as f64;
+                println!(
+                    "{}\t{r}\t{rad}\t{:.0}\t{:.4}",
+                    arm.name(),
+                    mean_cand,
+                    mean_rec
+                );
+                pts.push((arm.name(), mean_cand, mean_rec));
+            }
+        }
+    }
+    // Fair envelope: best recall achievable per arm at or below each candidate
+    // budget. This is the only honest cross-arm comparison — equal work.
+    println!("\n# FAIR ENVELOPE: max recall@k at candidates-scanned <= budget");
+    let budgets = [1000.0, 2000.0, 4000.0, 8000.0, 16000.0];
+    print!("budget");
+    for a in &arms {
+        print!("\t{}", a.name());
+    }
+    println!();
+    for &bud in &budgets {
+        print!("{bud:.0}");
+        for a in &arms {
+            let best = pts
+                .iter()
+                .filter(|(name, c, _)| *name == a.name() && *c <= bud)
+                .map(|(_, _, r)| *r)
+                .fold(0.0f64, f64::max);
+            print!("\t{best:.4}");
+        }
+        println!();
+    }
+}
+
+fn main() {
+    let mut cfg = parse();
+    let (corpus, queries) = make_corpus(&mut cfg);
+    let truth = ground_truth(&corpus, &queries, cfg.dim, cfg.k);
+    let src = if cfg.corpus_npy.is_some() {
+        "npy"
+    } else if cfg.isotropic {
+        "isotropic"
+    } else {
+        "clustered"
+    };
+    println!(
+        "# shard_recall: n={} q={} dim={} k={} corpus={src}",
+        cfg.n, cfg.n_queries, cfg.dim, cfg.k
+    );
+    run_arms(&cfg, &corpus, &queries, &truth);
+}
diff --git a/experiments/ordinal-routing-research/shard_recall_results.md b/experiments/ordinal-routing-research/shard_recall_results.md
new file mode 100644
index 00000000..7e3c496d
--- /dev/null
+++ b/experiments/ordinal-routing-research/shard_recall_results.md
@@ -0,0 +1,69 @@
+> ✅ RNG-DESYNC BUG FIXED (was Bug L). build_projs now seeds direction and phase
+> RNGs separately and identically across arms, so aligned vs random-offset share
+> the SAME R projection directions and differ ONLY in phase — the clean ablation.
+> Re-run result: aligned 0.9095 vs random-offset 0.9080 at 16k budget = TIED, so
+> "random phase offsets add nothing across R different directions" now holds on a
+> controlled comparison. CAVEAT STILL OPEN: the coprime/both arms subdivide
+> buckets so the "fair envelope" undersells them (they saturate below high
+> budgets); coprimality across R directions remains the wrong geometry — the
+> within-axis vernier harness (crt_seam_oracle covers the theory) is the right
+> test and is not built here. Numbers below are the post-fix run.
+
+# R-projection shard-recall: does coprime seam-decorrelation help?
+
+Experiment for the training-free routing layer. Five projection arms over a
+random-projection ensemble; recall@10 measured against FP32 cosine top-10,
+compared at EQUAL candidates-scanned (the only fair axis).
+
+Source: `examples/shard_recall.rs`. Synthetic clustered corpus n=50k, dim=256,
+200 queries, k=10, seed=1.
+
+## Fair envelope (max recall@10 at candidates-scanned <= budget)
+
+Post-fix run (Bug L fixed: arms share identical projection directions):
+
+| budget | coprime | aligned | random-offset | both |
+|--------|---------|---------|---------------|------|
+| 1000   | 0.109   | 0.0885  | 0.0835        | 0.1085 |
+| 2000   | 0.1885  | 0.1795  | 0.1840        | 0.1830 |
+| 4000   | 0.3370  | 0.3780  | 0.3880        | 0.3290 |
+| 8000   | 0.5315  | 0.6425  | 0.6490        | 0.5300 |
+| 16000  | 0.5315  | 0.9095  | 0.9080        | 0.5300 |
+
+## Findings
+
+**1. CLEAN RESULT (controlled ablation): random offsets are redundant.**
+After the Bug-L fix, `aligned` and `random-offset` share the SAME R projection
+directions and identical bucket width — differing ONLY in phase. They are
+statistically TIED (0.9095 vs 0.9080 at 16k). Across R *different* random
+projection directions, the direction randomness already decorrelates seams;
+adding random phase buys nothing. Confirms the literature prediction
+(multi-probe LSH / random-rotation decorrelation).
+
+**2. CONFOUND (do NOT read as "coprime is worse"):** the coprime/both arms
+subdivide each projection by a distinct prime (W/2..W/53), so their grids are
+mostly tiny — they scan few candidates and plateau at ~0.52 due to collapsed
+bucket VOLUME, not seam structure. This is a parameterization flaw.
+
+**3. GEOMETRY MISMATCH (the real lesson):** coprime PERIODS are a
+within-single-axis stacked-grid (vernier) effect. Across R different random
+directions there is only ONE grid per axis, so coprimality of periods across
+directions is not even the right test. The vernier idea, to be tested properly,
+needs multiple coprime-period grids overlaid on ONE projection direction — a
+different harness (TODO).
+
+## Verdict
+
+For the multi-direction routing layer: **plain R random projections with a
+single shared grid width are as good as any seam-decorrelation scheme tried
+here.** Coprimality adds nothing in this geometry and is only potentially
+meaningful in the within-axis stacked-grid setup, which this experiment does
+not test. Recommendation: build the oblivious router with R shared-width
+random-projection grids; do not invest in coprime periods unless the within-axis
+vernier experiment shows a surprise.
+
+Next: (a) within-axis vernier test (coprime grids on one direction);
+(b) re-run on real embeddings via --corpus-npy;
+(c) add a quantile-k-means control arm to size the train-free vs trained gap.
+
+Reproduce: `cargo run --release --example shard_recall`
diff --git a/experiments/ordinal-routing-research/spectral_probe.rs b/experiments/ordinal-routing-research/spectral_probe.rs
new file mode 100644
index 00000000..0772c18e
--- /dev/null
+++ b/experiments/ordinal-routing-research/spectral_probe.rs
@@ -0,0 +1,383 @@
+//! Spectral / number-variance probe for the "prime mile-marker" conjecture.
+//!
+//! Decisive cheap experiment #1: is a 1-D routing key over an embedding
+//! corpus *Poisson* (number variance grows like the window length L) or
+//! *rigid* (variance grows like log L, as GUE-spectra do)?
+//!
+//! If Poisson, there is nothing for spectral / prime-gap structure to grip
+//! and plain quantile bucketing is the whole story. Rigidity (sub-linear
+//! number variance) would be the only empirical opening for the conjecture.
+//!
+//! Key subtlety handled below: number variance is only meaningful after
+//! *unfolding* by the SMOOTH density, not the empirical CDF. Unfolding by
+//! rank trivially yields a perfect lattice (Sigma^2 = 0) and measures
+//! nothing. A random projection r . x of an L2-normalized vector is
+//! approximately N(0, 1/dim) by the CLT, so we unfold with the matching
+//! Gaussian CDF Phi and measure the residual fluctuation.
+//!
+//! Run (synthetic, self-contained):
+//!     cargo run --release --example spectral_probe
+//! Run (real corpus):
+//!     cargo run --release --example spectral_probe -- --corpus-npy emb.npy
+//!
+//! No external data, no BLAS. Reuses the same construction as bench_rank.
+
+use rand::{RngExt, SeedableRng};
+use rand_chacha::ChaCha8Rng;
+
+const SEED: u64 = 1;
+
+struct Cfg {
+    dim: usize,
+    n: usize,
+    latent_dim: usize,
+    n_clusters: usize,
+    n_keys: usize, // number of independent random-projection keys to average over
+    corpus_npy: Option<String>,
+    isotropic: bool, // control: N(0,I) corpus, no clusters (true Poisson expected)
+    unfold_empirical: bool, // exact-rank unfold (lattice; demo only, not a rigidity test)
+    unfold_smooth_knots: usize, // K>0 => smooth empirical unfold with K knots (correct method)
+}
+
+fn parse() -> Cfg {
+    let mut c = Cfg {
+        dim: 256,
+        n: 200_000,
+        latent_dim: 64,
+        n_clusters: 200,
+        n_keys: 8,
+        corpus_npy: None,
+        isotropic: false,
+        unfold_empirical: false,
+        unfold_smooth_knots: 0,
+    };
+    let mut a = std::env::args().skip(1);
+    while let Some(x) = a.next() {
+        match x.as_str() {
+            "--dim" => c.dim = a.next().unwrap().parse().unwrap(),
+            "--n" => c.n = a.next().unwrap().parse().unwrap(),
+            "--latent" => c.latent_dim = a.next().unwrap().parse().unwrap(),
+            "--clusters" => c.n_clusters = a.next().unwrap().parse().unwrap(),
+            "--keys" => c.n_keys = a.next().unwrap().parse().unwrap(),
+            "--corpus-npy" => c.corpus_npy = Some(a.next().unwrap()),
+            "--isotropic" => c.isotropic = true,
+            "--unfold-empirical" => c.unfold_empirical = true,
+            "--unfold-smooth" => c.unfold_smooth_knots = a.next().unwrap().parse().unwrap(),
+            "--rigid-selftest" => {} // handled in main, bypasses corpus
+            other => panic!("unknown arg {other}"),
+        }
+    }
+    c
+}
+
+fn gauss(rng: &mut ChaCha8Rng) -> f32 {
+    let u1: f32 = rng.random_range(1e-9..1.0);
+    let u2: f32 = rng.random_range(0.0..1.0);
+    (-2.0 * u1.ln()).sqrt() * (std::f32::consts::TAU * u2).cos()
+}
+
+/// Phi(z) = 0.5 (1 + erf(z / sqrt 2)). Standard normal CDF.
+fn normal_cdf(z: f64) -> f64 {
+    0.5 * (1.0 + erf(z / std::f64::consts::SQRT_2))
+}
+
+fn erf(x: f64) -> f64 {
+    let t = 1.0 / (1.0 + 0.327_591_1 * x.abs());
+    let y = 1.0
+        - (((((1.061_405_429 * t - 1.453_152_027) * t) + 1.421_413_741) * t - 0.284_496_736) * t
+            + 0.254_829_592)
+            * t
+            * (-x * x).exp();
+    if x >= 0.0 {
+        y
+    } else {
+        -y
+    }
+}
+
+fn main() {
+    let cfg = parse();
+    // Instrument self-test: feed a perfectly evenly-spaced (unfolded) key
+    // directly, bypassing corpus + projection. This is a maximally rigid 1-D
+    // sequence; the estimator MUST report Sigma^2/L << 1 (falling). Proves the
+    // probe can SEE rigidity when it is present, so a Poisson reading elsewhere
+    // is a real finding, not instrument blindness.
+    if std::env::args().any(|a| a == "--rigid-selftest") {
+        let n = 50_000usize;
+        let key: Vec<f64> = (0..n).map(|i| i as f64).collect(); // perfect lattice
+        println!("# RIGID SELF-TEST: perfectly even key, expect Sigma^2/L << 1");
+        println!("L\tSigma2_mean\tSigma2/L");
+        report(&[key], n);
+        return;
+    }
+    let (keys_per_proj, n) = build_keys(&cfg);
+    println!("# spectral_probe: n={n} keys={}", keys_per_proj.len());
+    println!("# unfolded by Gaussian CDF; Poisson => Sigma^2(L) ~ L, rigid => ~log L");
+    println!("L\tSigma2_mean\tSigma2/L\tslope_hint");
+    report(&keys_per_proj, n);
+}
+
+/// Minimal NumPy v1/v2 .npy reader for 2-D little-endian f32 C-order arrays
+/// (same format contract as bench_rank's --corpus-npy).
+fn load_npy_f32(path: &str) -> (Vec<f32>, usize, usize) {
+    let bytes = std::fs::read(path).expect("read npy");
+    assert!(bytes.len() >= 10 && &bytes[..6] == b"\x93NUMPY", "not a numpy file");
+    let major = bytes[6];
+    assert!(
+        major == 1 || major == 2,
+        "unsupported numpy .npy major version {major}"
+    );
+    if major == 2 {
+        assert!(bytes.len() >= 12, "truncated numpy v2 header");
+    }
+    let (hlen, hstart) = if major == 1 {
+        (u16::from_le_bytes([bytes[8], bytes[9]]) as usize, 10)
+    } else {
+        (u32::from_le_bytes([bytes[8], bytes[9], bytes[10], bytes[11]]) as usize, 12)
+    };
+    assert!(hstart + hlen <= bytes.len(), "truncated numpy header");
+    let header = std::str::from_utf8(&bytes[hstart..hstart + hlen]).expect("utf8 header");
+    assert!(header.contains("'descr': '<f4'"), "expected <f4 dtype");
+    assert!(header.contains("'fortran_order': False"), "expected C order");
+    let after = &header[header.find("'shape':").expect("shape")..];
+    let inner = &after[after.find('(').unwrap() + 1..after.find(')').unwrap()];
+    let dims: Vec<usize> = inner.split(',').filter_map(|s| s.trim().parse().ok()).collect();
+    assert_eq!(dims.len(), 2, "expected 2-D array");
+    let (n, dim) = (dims[0], dims[1]);
+    let dstart = hstart + hlen;
+    assert_eq!(bytes.len() - dstart, n * dim * 4, "data length mismatch");
+    let mut out = vec![0.0f32; n * dim];
+    for (i, ch) in bytes[dstart..].chunks_exact(4).enumerate() {
+        out[i] = f32::from_le_bytes([ch[0], ch[1], ch[2], ch[3]]);
+    }
+    (out, n, dim)
+}
+
+/// Project a real corpus into the same per-projection unfolded keys the
+/// synthetic path produces. L2-normalizes rows first (cosine geometry).
+fn build_keys_from_corpus(cfg: &Cfg, corpus: &[f32], n: usize, d: usize) -> (Vec<Vec<f64>>, usize) {
+    let mut rng = ChaCha8Rng::seed_from_u64(SEED);
+    // unit-normalize rows in a local copy
+    let mut c = corpus.to_vec();
+    for i in 0..n {
+        let row = &mut c[i * d..(i + 1) * d];
+        let nrm: f32 = row.iter().map(|x| x * x).sum::<f32>().sqrt();
+        if nrm > 0.0 {
+            for x in row.iter_mut() {
+                *x /= nrm;
+            }
+        }
+    }
+    let mut out = Vec::with_capacity(cfg.n_keys);
+    for _ in 0..cfg.n_keys {
+        let mut r = vec![0.0f32; d];
+        for v in r.iter_mut() {
+            *v = gauss(&mut rng);
+        }
+        let rn: f32 = r.iter().map(|v| v * v).sum::<f32>().sqrt();
+        for v in r.iter_mut() {
+            *v /= rn;
+        }
+        let mut keys: Vec<f64> = (0..n)
+            .map(|i| {
+                let doc = &c[i * d..(i + 1) * d];
+                r.iter().zip(doc).map(|(a, b)| (a * b) as f64).sum()
+            })
+            .collect();
+        if cfg.unfold_empirical {
+            keys.sort_by(|x, y| x.partial_cmp(y).unwrap());
+            for (i, kv) in keys.iter_mut().enumerate() {
+                *kv = i as f64;
+            }
+        } else {
+            let sigma = (1.0 / d as f64).sqrt();
+            for kv in keys.iter_mut() {
+                *kv = normal_cdf(*kv / sigma) * n as f64;
+            }
+            keys.sort_by(|x, y| x.partial_cmp(y).unwrap());
+        }
+        out.push(keys);
+    }
+    (out, n)
+}
+
+/// Returns (one unfolded key-position list per projection, n).
+fn build_keys(cfg: &Cfg) -> (Vec<Vec<f64>>, usize) {
+    if let Some(ref path) = cfg.corpus_npy {
+        let (corpus, n, d) = load_npy_f32(path);
+        eprintln!("# loaded {n} x {d} from {path}");
+        return build_keys_from_corpus(cfg, &corpus, n, d);
+    }
+    let mut rng = ChaCha8Rng::seed_from_u64(SEED);
+    let d = cfg.dim;
+    let l = cfg.latent_dim;
+    // low-rank clustered corpus, same construction as bench_rank
+    let mut a = vec![0.0f32; d * l];
+    for x in a.iter_mut() {
+        *x = gauss(&mut rng);
+    }
+    let mut protos = vec![0.0f32; cfg.n_clusters * l];
+    for x in protos.iter_mut() {
+        *x = gauss(&mut rng);
+    }
+    let make = |proto: &[f32], noise: f32, rng: &mut ChaCha8Rng| -> Vec<f32> {
+        let mut z = vec![0.0f32; l];
+        for j in 0..l {
+            z[j] = proto[j] + noise * gauss(rng);
+        }
+        let mut out = vec![0.0f32; d];
+        for i in 0..d {
+            let mut acc = 0.0f32;
+            for j in 0..l {
+                acc += a[i * l + j] * z[j];
+            }
+            out[i] = acc;
+        }
+        let nrm: f32 = out.iter().map(|v| v * v).sum::<f32>().sqrt();
+        if nrm > 0.0 {
+            for v in out.iter_mut() {
+                *v /= nrm;
+            }
+        }
+        out
+    };
+    let mut corpus = vec![0.0f32; cfg.n * d];
+    for i in 0..cfg.n {
+        let e = if cfg.isotropic {
+            // control: draw directly in R^d as N(0,I), normalize. No cluster
+            // structure, no low-rank latent -> the projection key should be a
+            // clean Poisson process after Gaussian unfold.
+            let mut v = vec![0.0f32; d];
+            for x in v.iter_mut() {
+                *x = gauss(&mut rng);
+            }
+            let nrm: f32 = v.iter().map(|t| t * t).sum::<f32>().sqrt();
+            for x in v.iter_mut() {
+                *x /= nrm;
+            }
+            v
+        } else {
+            let c = rng.random_range(0..cfg.n_clusters);
+            make(&protos[c * l..(c + 1) * l], 0.3, &mut rng)
+        };
+        corpus[i * d..(i + 1) * d].copy_from_slice(&e);
+    }
+    // random projection keys
+    let mut out = Vec::with_capacity(cfg.n_keys);
+    for _k in 0..cfg.n_keys {
+        let mut r = vec![0.0f32; d];
+        for v in r.iter_mut() {
+            *v = gauss(&mut rng);
+        }
+        let rn: f32 = r.iter().map(|v| v * v).sum::<f32>().sqrt();
+        for v in r.iter_mut() {
+            *v /= rn;
+        }
+        // key value for each doc = r . x ; for L2-normalized x and unit r,
+        // distributed ~ N(0, 1/dim) under the CLT for isotropic x.
+        let mut keys: Vec<f64> = (0..cfg.n)
+            .map(|i| {
+                let doc = &corpus[i * d..(i + 1) * d];
+                let dot: f32 = r.iter().zip(doc).map(|(a, b)| a * b).sum();
+                dot as f64
+            })
+            .collect();
+        keys.sort_by(|x, y| x.partial_cmp(y).unwrap());
+        if cfg.unfold_smooth_knots > 0 {
+            // SMOOTH EMPIRICAL unfold (correct method; removes the Gaussian-
+            // mismatch confound). Fit the CDF with K coarse knots placed at
+            // rank-quantiles, linear between. This subtracts the large-scale
+            // marginal density of ANY distribution WITHOUT collapsing local
+            // fluctuation to a lattice (which exact-rank unfolding does).
+            // Number variance must then be read at window scales L << n/K so
+            // the knot smoothing does not wash out the signal being measured.
+            let k = cfg.unfold_smooth_knots.min(cfg.n);
+            let n = cfg.n;
+            // Precompute the k+1 knots once: (position p_j = rank j*n/k, value
+            // keys[p_j]). Avoids recomputing knot_rank (div/mul/min) and
+            // re-indexing keys inside the per-key binary search (review).
+            let kr = |j: usize| -> usize { (j * n / k).min(n - 1) };
+            let knot_pos: Vec<f64> = (0..=k).map(|j| kr(j) as f64).collect();
+            let knot_val: Vec<f64> = (0..=k).map(|j| keys[kr(j)]).collect();
+            let unfolded: Vec<f64> = keys
+                .iter()
+                .map(|&x| {
+                    // bracketing knot index by value (knots sorted ascending)
+                    let mut lo = 0usize;
+                    let mut hi = k;
+                    while lo < hi {
+                        let mid = (lo + hi) / 2;
+                        if knot_val[mid] < x {
+                            lo = mid + 1;
+                        } else {
+                            hi = mid;
+                        }
+                    }
+                    let j = lo.clamp(1, k);
+                    let (v0, v1) = (knot_val[j - 1], knot_val[j]);
+                    let (p0, p1) = (knot_pos[j - 1], knot_pos[j]);
+                    if (v1 - v0).abs() < 1e-30 {
+                        p0
+                    } else {
+                        p0 + (x - v0) / (v1 - v0) * (p1 - p0)
+                    }
+                })
+                .collect();
+            let mut u = unfolded;
+            u.sort_by(|a, b| a.partial_cmp(b).unwrap());
+            out.push(u);
+        } else if cfg.unfold_empirical {
+            // exact-rank unfold: assigns rank i. Trivially a lattice (Sigma^2=0)
+            // for ANY input — this is NOT a rigidity test, only a demonstration
+            // that exact quantile tiling balances occupancy. Kept for that use.
+            for (i, kv) in keys.iter_mut().enumerate() {
+                *kv = i as f64;
+            }
+            out.push(keys);
+        } else {
+            // fixed-Gaussian unfold: ONLY valid for the isotropic marginal.
+            // CONFOUNDED for non-isotropic corpora (see ADVERSARIAL_REVIEW.md);
+            // prefer --unfold-smooth.
+            let sigma = (1.0 / d as f64).sqrt();
+            for kv in keys.iter_mut() {
+                *kv = normal_cdf(*kv / sigma) * cfg.n as f64;
+            }
+            out.push(keys);
+        }
+    }
+    (out, cfg.n)
+}
+
+/// Number variance Sigma^2(L): variance of the count of points in a window
+/// of length L, over many window placements, averaged across projections.
+fn report(keys_per_proj: &[Vec<f64>], n: usize) {
+    let ls = [2.0_f64, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0, 256.0, 512.0];
+    let n_windows = 4000usize;
+    for &lwin in ls.iter() {
+        if lwin >= n as f64 {
+            break;
+        }
+        let mut acc_var = 0.0f64;
+        for positions in keys_per_proj {
+            // place n_windows windows uniformly in [0, n-L], count via binary search
+            let mut counts = Vec::with_capacity(n_windows);
+            for w in 0..n_windows {
+                let start = (w as f64) * ((n as f64 - lwin) / n_windows as f64);
+                let end = start + lwin;
+                let lo = positions.partition_point(|&p| p < start);
+                let hi = positions.partition_point(|&p| p < end);
+                counts.push((hi - lo) as f64);
+            }
+            let mean = counts.iter().sum::<f64>() / counts.len() as f64;
+            let var = counts.iter().map(|c| (c - mean).powi(2)).sum::<f64>()
+                / counts.len() as f64;
+            acc_var += var;
+        }
+        let sigma2 = acc_var / keys_per_proj.len() as f64;
+        let log_l = lwin.ln();
+        println!(
+            "{lwin:.0}\t{sigma2:.4}\t{:.4}\t(L={lwin:.0}, lnL={log_l:.3})",
+            sigma2 / lwin
+        );
+    }
+}
diff --git a/experiments/ordinal-routing-research/subspace_directions.rs b/experiments/ordinal-routing-research/subspace_directions.rs
new file mode 100644
index 00000000..1bfec38f
--- /dev/null
+++ b/experiments/ordinal-routing-research/subspace_directions.rs
@@ -0,0 +1,553 @@
+//! CONTEXT-BOXED, PRE-REGISTERED — the directions experiment done RIGHT, using
+//! every finding: center (remove the cone) -> project to the populated k-dim
+//! subspace (where joint structure is legible) -> place R probe directions via
+//! {random, Sobol, Kronecker} -> route -> candidates-scanned at matched recall.
+//! Isolated by request: touches no harness, writes nothing.
+//!
+//! WHY THIS ISN'T fib_directions REDUX:
+//!   fib_directions placed golden directions on the AMBIENT 768-sphere, where
+//!   concentration of measure makes iid-Gaussian already near-uniform -> doomed.
+//!   The marginal uniformity lemma is BLIND to joint structure; directions act on
+//!   the JOINT correlation. So the honest test is in the centered, low-effective-
+//!   dim subspace (intrinsic dim ~13), where low-discrepancy has room to matter.
+//!
+//! SEQUENCES (data-oblivious; PCA control is the data-DEPENDENT ceiling):
+//!   random   — iid Gaussian directions in the k-subspace (the baseline).
+//!   sobol    — base-2 digital net -> inverse-normal -> subspace direction.
+//!              Discrepancy-optimal at R=2^m; predicted to spike at R in {16,32,64}.
+//!   kronecker— additive {i*alpha} per axis (generalized golden) -> direction.
+//!              Three-distance at EVERY prefix; predicted to hold across all R but
+//!              degrade as k climbs (dimension factor in its discrepancy bound).
+//!   pca-axes — the top-R principal directions themselves (DATA-DEPENDENT control;
+//!              expected to win — it bounds what oblivious sets leave on the table).
+//!
+//! METRIC: per query, probe its bucket +/- rad on each of R directions; candidate
+//! union; recall@10 vs FP32 cosine top-10. Report the FAIR ENVELOPE (max recall at
+//! candidates-scanned <= budget) per sequence — the only fair cross-arm axis.
+//! Also report post-map DISCREPANCY (star-discrepancy proxy) of each direction set,
+//! because cube->sphere mapping warps the even-spacing and that must be MEASURED.
+//!
+//! PRE-REGISTERED VERDICT (fixed before running), family-level, Bonferroni in mind:
+//!   COMPONENT WIN if a sequence beats random by >= 0.02 recall at matched
+//!     candidates in >= 2 of {fiqa,nq,quora} AND survives at k near 13.
+//!   SOBOL-AS-PREDICTED if its wins concentrate at R in {16,32,64} (power-of-2).
+//!   KRONECKER-AS-PREDICTED if it wins across R but its margin shrinks as k grows.
+//!   CLASS-DEAD if neither low-discrepancy sequence beats random anywhere by >=0.02
+//!     -> the entire oblivious-directions hypothesis (not just golden) is falsified.
+//!   The Kronecker/Sobol HYBRID is NOT built here; it is earned only if each wins
+//!     in a DISTINCT R-regime. This experiment decides whether that even applies.
+//!
+//! Run: cargo run --release --example subspace_directions -- \
+//!          --corpus-npy /tmp/corpora/fiqa_corpus.npy --queries-npy /tmp/corpora/fiqa_q.npy --kdim 13
+
+// Research probe (not library code): index loops and the inv-normal constant
+// tables read closer to the math than the iterator rewrites would.
+#![allow(
+    clippy::needless_range_loop,
+    clippy::type_complexity,
+    clippy::excessive_precision
+)]
+
+use rand::{RngExt, SeedableRng};
+use rand_chacha::ChaCha8Rng;
+use rayon::prelude::*;
+
+const SEED: u64 = 1;
+
+fn load_npy_f32(path: &str) -> (Vec<f32>, usize, usize) {
+    let bytes = std::fs::read(path).expect("read npy");
+    assert!(
+        bytes.len() >= 10 && &bytes[..6] == b"\x93NUMPY",
+        "not a numpy file"
+    );
+    let major = bytes[6];
+    assert!(
+        major == 1 || major == 2,
+        "unsupported numpy .npy major version {major}"
+    );
+    if major == 2 {
+        assert!(bytes.len() >= 12, "truncated numpy v2 header");
+    }
+    let (hlen, hstart) = if major == 1 {
+        (u16::from_le_bytes([bytes[8], bytes[9]]) as usize, 10)
+    } else {
+        (
+            u32::from_le_bytes([bytes[8], bytes[9], bytes[10], bytes[11]]) as usize,
+            12,
+        )
+    };
+    assert!(hstart + hlen <= bytes.len(), "truncated numpy header");
+    let header = std::str::from_utf8(&bytes[hstart..hstart + hlen]).expect("utf8 header");
+    assert!(header.contains("'descr': '<f4'"), "expected <f4 dtype");
+    let after = &header[header.find("'shape':").expect("shape")..];
+    let inner = &after[after.find('(').unwrap() + 1..after.find(')').unwrap()];
+    let dims: Vec<usize> = inner
+        .split(',')
+        .filter_map(|s| s.trim().parse().ok())
+        .collect();
+    let (n, dim) = (dims[0], dims[1]);
+    let dstart = hstart + hlen;
+    let mut out = vec![0.0f32; n * dim];
+    for (i, ch) in bytes[dstart..].chunks_exact(4).enumerate() {
+        out[i] = f32::from_le_bytes([ch[0], ch[1], ch[2], ch[3]]);
+    }
+    (out, n, dim)
+}
+
+fn l2_normalize_rows(v: &mut [f32], dim: usize) {
+    for i in 0..v.len() / dim {
+        let row = &mut v[i * dim..(i + 1) * dim];
+        let nrm: f32 = row.iter().map(|x| x * x).sum::<f32>().sqrt();
+        if nrm > 0.0 {
+            for x in row.iter_mut() {
+                *x /= nrm;
+            }
+        }
+    }
+}
+
+fn coord_mean(rows: &[f32], n: usize, dim: usize) -> Vec<f32> {
+    let mut m = vec![0f32; dim];
+    for i in 0..n {
+        for c in 0..dim {
+            m[c] += rows[i * dim + c];
+        }
+    }
+    for m_c in m.iter_mut() {
+        *m_c /= n as f32;
+    }
+    m
+}
+
+/// Top-k principal directions of the centered corpus via power iteration +
+/// deflation. Returns k unit vectors of length `dim`. Pure Rust, no BLAS.
+/// Uses a doc subsample for the covariance action (cov is implicit: Cov*x =
+/// (1/n) X^T (X x), never materialized).
+fn pca_topk(centered: &[f32], n: usize, dim: usize, k: usize, mean: &[f32]) -> Vec<Vec<f32>> {
+    let sample: Vec<usize> = if n > 20000 {
+        let mut rng = ChaCha8Rng::seed_from_u64(SEED ^ 0x9C0FFEE);
+        (0..20000).map(|_| rng.random_range(0..n)).collect()
+    } else {
+        (0..n).collect()
+    };
+    // implicit covariance action y = Cov * x using centered rows on the sample
+    let cov_mul = |x: &[f32]| -> Vec<f32> {
+        let acc: Vec<f32> = sample
+            .par_iter()
+            .map(|&i| {
+                let row = &centered[i * dim..(i + 1) * dim];
+                let d: f32 = row.iter().zip(x).map(|(a, b)| a * b).sum();
+                row.iter().map(|&r| r * d).collect::<Vec<f32>>()
+            })
+            .reduce(
+                || vec![0f32; dim],
+                |mut a, b| {
+                    for j in 0..dim {
+                        a[j] += b[j];
+                    }
+                    a
+                },
+            );
+        let s = sample.len() as f32;
+        acc.iter().map(|v| v / s).collect()
+    };
+    let mut comps: Vec<Vec<f32>> = Vec::with_capacity(k);
+    let mut rng = ChaCha8Rng::seed_from_u64(SEED ^ 0xABCD);
+    let _ = mean;
+    for _ in 0..k {
+        let mut v: Vec<f32> = (0..dim)
+            .map(|_| {
+                let u1: f32 = rng.random_range(1e-9..1.0);
+                let u2: f32 = rng.random_range(0.0..1.0);
+                (-2.0 * u1.ln()).sqrt() * (std::f32::consts::TAU * u2).cos()
+            })
+            .collect();
+        for _ in 0..50 {
+            // deflate against found components
+            for c in &comps {
+                let p: f32 = v.iter().zip(c).map(|(a, b)| a * b).sum();
+                for j in 0..dim {
+                    v[j] -= p * c[j];
+                }
+            }
+            let mut y = cov_mul(&v);
+            for c in &comps {
+                let p: f32 = y.iter().zip(c).map(|(a, b)| a * b).sum();
+                for j in 0..dim {
+                    y[j] -= p * c[j];
+                }
+            }
+            let nrm: f32 = y.iter().map(|t| t * t).sum::<f32>().sqrt();
+            if nrm < 1e-20 {
+                break;
+            }
+            for j in 0..dim {
+                v[j] = y[j] / nrm;
+            }
+        }
+        comps.push(v);
+    }
+    comps
+}
+
+/// project a centered vector onto the k-dim PCA basis -> k coords.
+fn project(v: &[f32], basis: &[Vec<f32>]) -> Vec<f32> {
+    basis
+        .iter()
+        .map(|b| v.iter().zip(b).map(|(a, c)| a * c).sum())
+        .collect()
+}
+
+// ---- direction generators in k-dim (returned as R unit vectors length k) ----
+
+fn dirs_random(r: usize, k: usize) -> Vec<Vec<f32>> {
+    let mut rng = ChaCha8Rng::seed_from_u64(SEED ^ 0x1111 ^ (r as u64));
+    (0..r)
+        .map(|_| {
+            let mut v: Vec<f32> = (0..k)
+                .map(|_| {
+                    let u1: f32 = rng.random_range(1e-9..1.0);
+                    let u2: f32 = rng.random_range(0.0..1.0);
+                    (-2.0 * u1.ln()).sqrt() * (std::f32::consts::TAU * u2).cos()
+                })
+                .collect();
+            let nrm: f32 = v.iter().map(|t| t * t).sum::<f32>().sqrt();
+            for x in v.iter_mut() {
+                *x /= nrm;
+            }
+            v
+        })
+        .collect()
+}
+
+/// Van der Corput / Sobol-lite: radical inverse in base 2 per dimension using
+/// distinct primes for the radical-inverse base per axis (Halton-style net, the
+/// digital low-discrepancy family). inverse-normal -> unit direction.
+fn radical_inverse(mut i: usize, base: usize) -> f32 {
+    let mut f = 1.0f64;
+    let mut r = 0.0f64;
+    while i > 0 {
+        f /= base as f64;
+        r += f * (i % base) as f64;
+        i /= base;
+    }
+    r as f32
+}
+
+const PRIMES: [usize; 32] = [
+    2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97,
+    101, 103, 107, 109, 113, 127, 131,
+];
+
+/// inverse standard normal CDF (Acklam's rational approximation) for u in (0,1).
+fn inv_norm(u: f32) -> f32 {
+    let u = u.clamp(1e-6, 1.0 - 1e-6) as f64;
+    let a = [
+        -3.969683028665376e+01,
+        2.209460984245205e+02,
+        -2.759285104469687e+02,
+        1.383577518672690e+02,
+        -3.066479806614716e+01,
+        2.506628277459239e+00,
+    ];
+    let b = [
+        -5.447609879822406e+01,
+        1.615858368580409e+02,
+        -1.556989798598866e+02,
+        6.680131188771972e+01,
+        -1.328068155288572e+01,
+    ];
+    let c = [
+        -7.784894002430293e-03,
+        -3.223964580411365e-01,
+        -2.400758277161838e+00,
+        -2.549732539343734e+00,
+        4.374664141464968e+00,
+        2.938163982698783e+00,
+    ];
+    let d = [
+        7.784695709041462e-03,
+        3.224671290700398e-01,
+        2.445134137142996e+00,
+        3.754408661907416e+00,
+    ];
+    let pl = 0.02425;
+    let x = if u < pl {
+        let q = (-2.0 * u.ln()).sqrt();
+        (((((c[0] * q + c[1]) * q + c[2]) * q + c[3]) * q + c[4]) * q + c[5])
+            / ((((d[0] * q + d[1]) * q + d[2]) * q + d[3]) * q + 1.0)
+    } else if u <= 1.0 - pl {
+        let q = u - 0.5;
+        let rr = q * q;
+        (((((a[0] * rr + a[1]) * rr + a[2]) * rr + a[3]) * rr + a[4]) * rr + a[5]) * q
+            / (((((b[0] * rr + b[1]) * rr + b[2]) * rr + b[3]) * rr + b[4]) * rr + 1.0)
+    } else {
+        let q = (-2.0 * (1.0 - u).ln()).sqrt();
+        -(((((c[0] * q + c[1]) * q + c[2]) * q + c[3]) * q + c[4]) * q + c[5])
+            / ((((d[0] * q + d[1]) * q + d[2]) * q + d[3]) * q + 1.0)
+    };
+    x as f32
+}
+
+fn dirs_sobol(r: usize, k: usize) -> Vec<Vec<f32>> {
+    (0..r)
+        .map(|i| {
+            let mut v: Vec<f32> = (0..k)
+                .map(|a| inv_norm(radical_inverse(i + 1, PRIMES[a % PRIMES.len()])))
+                .collect();
+            let nrm: f32 = v.iter().map(|t| t * t).sum::<f32>().sqrt();
+            if nrm > 0.0 {
+                for x in v.iter_mut() {
+                    *x /= nrm;
+                }
+            }
+            v
+        })
+        .collect()
+}
+
+/// Kronecker: per-axis additive recurrence {i*alpha_a}, alpha from the generalized
+/// golden ratio (root of x^{k+1}=x+1) approximated per-axis by primes^(1/(a+2)) frac.
+fn dirs_kronecker(r: usize, k: usize) -> Vec<Vec<f32>> {
+    // irrational generators: fractional parts of sqrt(prime) (badly approximable, Weyl-equidistributed)
+    let alphas: Vec<f64> = (0..k)
+        .map(|a| (PRIMES[a % PRIMES.len()] as f64).sqrt().fract())
+        .collect();
+    (0..r)
+        .map(|i| {
+            let mut v: Vec<f32> = (0..k)
+                .map(|a| inv_norm(((i as f64 + 1.0) * alphas[a]).fract() as f32))
+                .collect();
+            let nrm: f32 = v.iter().map(|t| t * t).sum::<f32>().sqrt();
+            if nrm > 0.0 {
+                for x in v.iter_mut() {
+                    *x /= nrm;
+                }
+            }
+            v
+        })
+        .collect()
+}
+
+fn dirs_pca_axes(r: usize, k: usize) -> Vec<Vec<f32>> {
+    // data-dependent ceiling: the k principal axes, then (for R>k) random unit
+    // combinations of them seeded deterministically — so R>k gives R DISTINCT
+    // directions spanning the populated subspace, not repeats (control fix).
+    let mut rng = ChaCha8Rng::seed_from_u64(SEED ^ 0xACE5);
+    (0..r)
+        .map(|i| {
+            let mut v = vec![0f32; k];
+            if i < k {
+                v[i] = 1.0;
+            } else {
+                for x in v.iter_mut() {
+                    let u1: f32 = rng.random_range(1e-9..1.0);
+                    let u2: f32 = rng.random_range(0.0..1.0);
+                    *x = (-2.0 * u1.ln()).sqrt() * (std::f32::consts::TAU * u2).cos();
+                }
+                let nrm: f32 = v.iter().map(|t| t * t).sum::<f32>().sqrt();
+                for x in v.iter_mut() {
+                    *x /= nrm;
+                }
+            }
+            v
+        })
+        .collect()
+}
+
+/// star-discrepancy proxy: max over a random set of half-spaces of |empirical - expected|.
+fn discrepancy_proxy(dirs: &[Vec<f32>], k: usize) -> f32 {
+    let mut rng = ChaCha8Rng::seed_from_u64(SEED ^ 0xDDDD);
+    let probes = 2000;
+    let mut worst = 0f32;
+    for _ in 0..probes {
+        // random half-space normal
+        let mut hn: Vec<f32> = (0..k)
+            .map(|_| {
+                let u1: f32 = rng.random_range(1e-9..1.0);
+                let u2: f32 = rng.random_range(0.0..1.0);
+                (-2.0 * u1.ln()).sqrt() * (std::f32::consts::TAU * u2).cos()
+            })
+            .collect();
+        let hnm: f32 = hn.iter().map(|t| t * t).sum::<f32>().sqrt();
+        for x in hn.iter_mut() {
+            *x /= hnm;
+        }
+        let frac = dirs
+            .iter()
+            .filter(|d| d.iter().zip(&hn).map(|(a, b)| a * b).sum::<f32>() > 0.0)
+            .count() as f32
+            / dirs.len() as f32;
+        worst = worst.max((frac - 0.5).abs());
+    }
+    worst
+}
+
+struct Proj {
+    dir: Vec<f32>,
+    width: f32,
+}
+impl Proj {
+    fn bucket(&self, v: &[f32]) -> i64 {
+        let dot: f32 = self.dir.iter().zip(v).map(|(a, b)| a * b).sum();
+        (dot / self.width).floor() as i64
+    }
+}
+
+fn main() {
+    let mut cp = None;
+    let mut qp = None;
+    let mut kdim = 13usize;
+    let mut a = std::env::args().skip(1);
+    while let Some(x) = a.next() {
+        match x.as_str() {
+            "--corpus-npy" => cp = Some(a.next().unwrap()),
+            "--queries-npy" => qp = Some(a.next().unwrap()),
+            "--kdim" => kdim = a.next().unwrap().parse().unwrap(),
+            _ => {}
+        }
+    }
+    let (mut corpus, n, dim) = load_npy_f32(&cp.expect("--corpus-npy"));
+    let (mut queries, nq, dq) = load_npy_f32(&qp.expect("--queries-npy"));
+    assert_eq!(dim, dq);
+    l2_normalize_rows(&mut corpus, dim);
+    l2_normalize_rows(&mut queries, dim);
+
+    // ground truth = FP32 cosine top-10 on normalized RAW vectors
+    let truth: Vec<Vec<u32>> = (0..nq)
+        .into_par_iter()
+        .map(|qi| {
+            let q = &queries[qi * dim..(qi + 1) * dim];
+            let mut s: Vec<(u32, f32)> = (0..n)
+                .map(|di| {
+                    let d = &corpus[di * dim..(di + 1) * dim];
+                    (di as u32, q.iter().zip(d).map(|(a, b)| a * b).sum::<f32>())
+                })
+                .collect();
+            s.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
+            s.into_iter().take(10).map(|(i, _)| i).collect()
+        })
+        .collect();
+
+    // center
+    let mean = coord_mean(&corpus, n, dim);
+    let mut cc = corpus.clone();
+    for i in 0..n {
+        for c in 0..dim {
+            cc[i * dim + c] -= mean[c];
+        }
+    }
+    let mut qc = queries.clone();
+    for i in 0..nq {
+        for c in 0..dim {
+            qc[i * dim + c] -= mean[c];
+        }
+    }
+
+    // PCA basis from centered corpus, project both
+    eprintln!("# computing top-{kdim} PCA basis (power iteration)...");
+    let basis = pca_topk(&cc, n, dim, kdim, &mean);
+    let cproj: Vec<Vec<f32>> = (0..n)
+        .into_par_iter()
+        .map(|i| project(&cc[i * dim..(i + 1) * dim], &basis))
+        .collect();
+    let qproj: Vec<Vec<f32>> = (0..nq)
+        .into_par_iter()
+        .map(|i| project(&qc[i * dim..(i + 1) * dim], &basis))
+        .collect();
+
+    // width calibration: ~sqrt(n) buckets along a unit direction in subspace
+    let spread = {
+        let mut s = 0f32;
+        for v in cproj.iter().take(2000) {
+            for &x in v {
+                s += x * x;
+            }
+        }
+        (s / (2000.0 * kdim as f32)).sqrt() * 6.0
+    };
+    let base_width = spread / (n as f32).sqrt();
+
+    println!("# subspace_directions (CONTEXT-BOXED, pre-registered). corpus {n}x{dim}, k={kdim}");
+    println!("# centered+projected to k-dim PCA subspace; recall@10 vs FP32 cosine top-10");
+    let gens: [(&str, fn(usize, usize) -> Vec<Vec<f32>>); 4] = [
+        ("random", dirs_random),
+        ("sobol", dirs_sobol),
+        ("kronecker", dirs_kronecker),
+        ("pca-axes", dirs_pca_axes),
+    ];
+    let r_values = [8usize, 16, 32, 64, 128];
+
+    println!("seq\tR\tdiscrepancy\trad\tcand\trecall@10");
+    let mut pts: Vec<(String, f64, f64)> = vec![];
+    for (name, gen) in gens.iter() {
+        for &r in &r_values {
+            let dirs = gen(r, kdim);
+            let disc = discrepancy_proxy(&dirs, kdim);
+            let projs: Vec<Proj> = dirs
+                .into_iter()
+                .map(|d| Proj {
+                    dir: d,
+                    width: base_width,
+                })
+                .collect();
+            // index: per direction, bucket -> docs
+            let index: Vec<std::collections::HashMap<i64, Vec<u32>>> = projs
+                .par_iter()
+                .map(|p| {
+                    let mut m: std::collections::HashMap<i64, Vec<u32>> =
+                        std::collections::HashMap::new();
+                    for di in 0..n {
+                        m.entry(p.bucket(&cproj[di])).or_default().push(di as u32);
+                    }
+                    m
+                })
+                .collect();
+            for rad in [0i64, 1, 2, 4] {
+                let stats: Vec<(usize, f32)> = (0..nq)
+                    .into_par_iter()
+                    .map(|qi| {
+                        use std::collections::HashSet;
+                        let mut cand: HashSet<u32> = HashSet::new();
+                        for (pi, p) in projs.iter().enumerate() {
+                            let b = p.bucket(&qproj[qi]);
+                            for bb in (b - rad)..=(b + rad) {
+                                if let Some(ids) = index[pi].get(&bb) {
+                                    cand.extend(ids.iter().copied());
+                                }
+                            }
+                        }
+                        let hits = truth[qi].iter().filter(|&&i| cand.contains(&i)).count();
+                        (cand.len(), hits as f32 / truth[qi].len() as f32)
+                    })
+                    .collect();
+                let mc = stats.iter().map(|s| s.0 as f64).sum::<f64>() / nq as f64;
+                let mr = stats.iter().map(|s| s.1 as f64).sum::<f64>() / nq as f64;
+                println!("{name}\t{r}\t{disc:.4}\t{rad}\t{mc:.0}\t{mr:.4}");
+                pts.push((name.to_string(), mc, mr));
+            }
+        }
+    }
+    println!("\n# FAIR ENVELOPE: max recall@10 at candidates-scanned <= budget");
+    let budgets = [500.0, 1000.0, 2000.0, 4000.0, 8000.0];
+    print!("budget");
+    for (name, _) in gens.iter() {
+        print!("\t{name}");
+    }
+    println!();
+    for &bud in &budgets {
+        print!("{bud:.0}");
+        for (name, _) in gens.iter() {
+            let best = pts
+                .iter()
+                .filter(|(nm, c, _)| nm == name && *c <= bud)
+                .map(|(_, _, r)| *r)
+                .fold(0.0, f64::max);
+            print!("\t{best:.4}");
+        }
+        println!();
+    }
+    println!("\n# PRE-REGISTERED: COMPONENT WIN if sobol/kronecker beat random by >=0.02 at matched budget in >=2 corpora;");
+    println!("#   sobol-as-predicted if wins concentrate at R in {{16,32,64}}; kronecker if across R but shrinks as k grows;");
+    println!("#   CLASS-DEAD if neither beats random by >=0.02 anywhere. Hybrid earned only on distinct-regime component wins.");
+}
diff --git a/experiments/ordinal-routing-research/tau_rerank_bakeoff.rs b/experiments/ordinal-routing-research/tau_rerank_bakeoff.rs
new file mode 100644
index 00000000..43bf3bf1
--- /dev/null
+++ b/experiments/ordinal-routing-research/tau_rerank_bakeoff.rs
@@ -0,0 +1,344 @@
+//! Matched-bytes bake-off: does a Kendall-tau rerank of b=2 survivors beat
+//! simply spending the bits on b=4, on R@10 vs FP32 cosine?
+//!
+//! This is the decisive deployment question for the density-collapse finding.
+//! It is a CEILING experiment: the tau-rerank uses the FULL stored rank order
+//! (ignoring how compactly that order could be stored). If the idealized
+//! tau-rerank cannot beat b=4, the answer is "just use b=4" and no codec work
+//! is justified. If it can, a compact tau codec becomes worth designing.
+//!
+//! Arms (all scored against FP32 brute-force cosine top-k):
+//!   b2            — RankQuant b=2 asymmetric          (dim/4 bytes/vec)
+//!   b4            — RankQuant b=4 asymmetric          (dim/2 bytes/vec)
+//!   b2+tau        — b=2 top-M candidates, reranked by Kendall-tau of the
+//!                   query's top-k coord order vs each doc's stored rank order
+//!   b2+fp32       — b=2 top-M candidates, reranked by exact FP32 cosine
+//!                   (absolute ceiling / sanity — should approach FP32)
+//!
+//! Run: cargo run --release --example tau_rerank_bakeoff
+//!      cargo run --release --example tau_rerank_bakeoff -- --corpus-npy emb.npy --queries-npy q.npy
+//! No external data, no BLAS.
+
+use ordvec::rank::rank_transform;
+use ordvec::RankQuant;
+use rand::{RngExt, SeedableRng};
+use rand_chacha::ChaCha8Rng;
+use rayon::prelude::*;
+
+const SEED: u64 = 1;
+
+struct Cfg {
+    dim: usize,
+    n: usize,
+    n_queries: usize,
+    latent: usize,
+    clusters: usize,
+    k: usize,
+    m: usize,    // candidate-set size for the rerank arms
+    topk: usize, // # query top coords used by the tau rerank
+    corpus_npy: Option<String>,
+    queries_npy: Option<String>,
+}
+
+fn parse() -> Cfg {
+    let mut c = Cfg {
+        dim: 256,
+        n: 30_000,
+        n_queries: 200,
+        latent: 64,
+        clusters: 200,
+        k: 10,
+        m: 200,
+        topk: 32,
+        corpus_npy: None,
+        queries_npy: None,
+    };
+    let mut a = std::env::args().skip(1);
+    while let Some(x) = a.next() {
+        match x.as_str() {
+            "--dim" => c.dim = a.next().unwrap().parse().unwrap(),
+            "--n" => c.n = a.next().unwrap().parse().unwrap(),
+            "--queries" => c.n_queries = a.next().unwrap().parse().unwrap(),
+            "--latent" => c.latent = a.next().unwrap().parse().unwrap(),
+            "--clusters" => c.clusters = a.next().unwrap().parse().unwrap(),
+            "--k" => c.k = a.next().unwrap().parse().unwrap(),
+            "--m" => c.m = a.next().unwrap().parse().unwrap(),
+            "--topk" => c.topk = a.next().unwrap().parse().unwrap(),
+            "--corpus-npy" => c.corpus_npy = Some(a.next().unwrap()),
+            "--queries-npy" => c.queries_npy = Some(a.next().unwrap()),
+            other => panic!("unknown arg {other}"),
+        }
+    }
+    c
+}
+
+fn gauss(rng: &mut ChaCha8Rng) -> f32 {
+    let u1: f32 = rng.random_range(1e-9..1.0);
+    let u2: f32 = rng.random_range(0.0..1.0);
+    (-2.0 * u1.ln()).sqrt() * (std::f32::consts::TAU * u2).cos()
+}
+
+fn l2_rows(v: &mut [f32], dim: usize) {
+    for i in 0..v.len() / dim {
+        let row = &mut v[i * dim..(i + 1) * dim];
+        let nrm: f32 = row.iter().map(|x| x * x).sum::<f32>().sqrt();
+        if nrm > 0.0 {
+            for x in row.iter_mut() {
+                *x /= nrm;
+            }
+        }
+    }
+}
+
+fn load_npy_f32(path: &str) -> (Vec<f32>, usize, usize) {
+    let bytes = std::fs::read(path).expect("read npy");
+    assert!(bytes.len() >= 10 && &bytes[..6] == b"\x93NUMPY", "not a numpy file");
+    let major = bytes[6];
+    assert!(
+        major == 1 || major == 2,
+        "unsupported numpy .npy major version {major}"
+    );
+    if major == 2 {
+        assert!(bytes.len() >= 12, "truncated numpy v2 header");
+    }
+    let (hlen, hstart) = if major == 1 {
+        (u16::from_le_bytes([bytes[8], bytes[9]]) as usize, 10)
+    } else {
+        (u32::from_le_bytes([bytes[8], bytes[9], bytes[10], bytes[11]]) as usize, 12)
+    };
+    assert!(hstart + hlen <= bytes.len(), "truncated numpy header");
+    let header = std::str::from_utf8(&bytes[hstart..hstart + hlen]).expect("utf8 header");
+    assert!(header.contains("'descr': '<f4'"), "expected <f4");
+    assert!(header.contains("'fortran_order': False"), "expected C order");
+    let after = &header[header.find("'shape':").expect("shape")..];
+    let inner = &after[after.find('(').unwrap() + 1..after.find(')').unwrap()];
+    let dims: Vec<usize> = inner.split(',').filter_map(|s| s.trim().parse().ok()).collect();
+    assert_eq!(dims.len(), 2, "expected 2-D");
+    let (n, dim) = (dims[0], dims[1]);
+    let dstart = hstart + hlen;
+    assert_eq!(bytes.len() - dstart, n * dim * 4, "len mismatch");
+    let mut out = vec![0.0f32; n * dim];
+    for (i, ch) in bytes[dstart..].chunks_exact(4).enumerate() {
+        out[i] = f32::from_le_bytes([ch[0], ch[1], ch[2], ch[3]]);
+    }
+    (out, n, dim)
+}
+
+fn main() {
+    let mut cfg = parse();
+    let (corpus, queries) = load_corpus(&mut cfg);
+    let truth = fp32_topk(&corpus, &queries, cfg.dim, cfg.k);
+    println!(
+        "# tau_rerank_bakeoff: n={} q={} dim={} k={} M={} topk={}",
+        cfg.n, cfg.n_queries, cfg.dim, cfg.k, cfg.m, cfg.topk
+    );
+    run(&cfg, &corpus, &queries, &truth);
+}
+
+fn recall_at_k(pred: &[usize], truth: &[usize]) -> f32 {
+    use std::collections::HashSet;
+    let t: HashSet<usize> = truth.iter().copied().collect();
+    let hits = pred.iter().filter(|i| t.contains(i)).count();
+    hits as f32 / truth.len().max(1) as f32
+}
+
+/// Top-k indices of a vector by value, descending.
+fn top_coords(v: &[f32], k: usize) -> Vec<usize> {
+    let mut idx: Vec<usize> = (0..v.len()).collect();
+    idx.sort_by(|&i, &j| v[j].partial_cmp(&v[i]).unwrap_or(std::cmp::Ordering::Equal));
+    idx.truncate(k);
+    idx
+}
+
+/// Kendall-tau distance between the orderings q and the doc's stored RANKS
+/// induce on `coords` (fraction of discordant pairs). Lower = more similar
+/// order. `dranks[c]` is the stored rank of coord c for this doc.
+fn tau_dist(q: &[f32], dranks: &[u16], coords: &[usize]) -> f32 {
+    let m = coords.len();
+    if m < 2 {
+        return 0.0;
+    }
+    let (mut disc, mut tot) = (0usize, 0usize);
+    for x in 0..m {
+        for y in (x + 1)..m {
+            let (cx, cy) = (coords[x], coords[y]);
+            let sq = (q[cx] - q[cy]).partial_cmp(&0.0).unwrap_or(std::cmp::Ordering::Equal);
+            // stored ranks are integers; higher rank = larger value
+            let sd = dranks[cx].cmp(&dranks[cy]);
+            if sq != sd {
+                disc += 1;
+            }
+            tot += 1;
+        }
+    }
+    disc as f32 / tot.max(1) as f32
+}
+
+fn run(cfg: &Cfg, corpus: &[f32], queries: &[f32], truth: &[Vec<usize>]) {
+    let d = cfg.dim;
+    let nq = cfg.n_queries;
+
+    // Build b=2 and b=4 indices on identical data.
+    let mut b2 = RankQuant::new(d, 2);
+    b2.add(corpus);
+    let mut b4 = RankQuant::new(d, 4);
+    b4.add(corpus);
+    eprintln!(
+        "# b2 {} B/vec, b4 {} B/vec",
+        b2.bytes_per_vec(),
+        b4.bytes_per_vec()
+    );
+
+    // Precompute stored ranks per doc (the CEILING tau signal: full order).
+    // This is what a compact tau codec would have to approximate.
+    let doc_ranks: Vec<Vec<u16>> = (0..cfg.n)
+        .into_par_iter()
+        .map(|i| rank_transform(&corpus[i * d..(i + 1) * d]))
+        .collect();
+
+    // Per query: candidate set = b=2's own asymmetric top-M (the cheap stage).
+    let b2_res = b2.search_asymmetric(queries, cfg.m);
+    let b4_res = b4.search_asymmetric(queries, cfg.k);
+
+    let mut r_b2 = 0.0f64;
+    let mut r_b4 = 0.0f64;
+    let mut r_tau = 0.0f64;
+    let mut r_fp = 0.0f64;
+
+    let per_q: Vec<(f32, f32, f32, f32)> = (0..nq)
+        .into_par_iter()
+        .map(|qi| {
+            let q = &queries[qi * d..(qi + 1) * d];
+            // b2 top-k (first k of its top-M)
+            let b2_ids: Vec<usize> =
+                b2_res.indices_for_query(qi).iter().take(cfg.k).map(|&i| i as usize).collect();
+            let b4_ids: Vec<usize> =
+                b4_res.indices_for_query(qi).iter().map(|&i| i as usize).collect();
+            // candidate pool = b2 top-M
+            let cands: Vec<usize> =
+                b2_res.indices_for_query(qi).iter().map(|&i| i as usize).collect();
+            let coords = top_coords(q, cfg.topk);
+            // tau rerank: ascending tau distance
+            let mut by_tau: Vec<(usize, f32)> = cands
+                .iter()
+                .map(|&di| (di, tau_dist(q, &doc_ranks[di], &coords)))
+                .collect();
+            by_tau.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap_or(std::cmp::Ordering::Equal));
+            let tau_ids: Vec<usize> = by_tau.iter().take(cfg.k).map(|&(i, _)| i).collect();
+            // fp32 rerank: descending cosine (absolute ceiling)
+            let mut by_fp: Vec<(usize, f32)> = cands
+                .iter()
+                .map(|&di| {
+                    let dv = &corpus[di * d..(di + 1) * d];
+                    (di, q.iter().zip(dv).map(|(a, b)| a * b).sum())
+                })
+                .collect();
+            by_fp.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
+            let fp_ids: Vec<usize> = by_fp.iter().take(cfg.k).map(|&(i, _)| i).collect();
+            (
+                recall_at_k(&b2_ids, &truth[qi]),
+                recall_at_k(&b4_ids, &truth[qi]),
+                recall_at_k(&tau_ids, &truth[qi]),
+                recall_at_k(&fp_ids, &truth[qi]),
+            )
+        })
+        .collect();
+    for (a, b, c, e) in &per_q {
+        r_b2 += *a as f64;
+        r_b4 += *b as f64;
+        r_tau += *c as f64;
+        r_fp += *e as f64;
+    }
+    let nqf = nq as f64;
+    println!("\narm              bytes/vec   R@{}", cfg.k);
+    println!("b2 asym          {:>9}   {:.4}", b2.bytes_per_vec(), r_b2 / nqf);
+    println!("b4 asym          {:>9}   {:.4}", b4.bytes_per_vec(), r_b4 / nqf);
+    println!(
+        "b2 + tau-rerank  {:>9}*  {:.4}   (*ceiling: full stored ranks)",
+        b2.bytes_per_vec(),
+        r_tau / nqf
+    );
+    println!("b2 + fp32-rerank {:>9}   {:.4}   (absolute ceiling)", "—", r_fp / nqf);
+    println!("\nVERDICT:");
+    let (tau, b4v) = (r_tau / nqf, r_b4 / nqf);
+    if tau > b4v + 0.005 {
+        println!("  tau-rerank BEATS b4 at half the bytes ({tau:.4} > {b4v:.4}) — codec work justified");
+    } else if tau + 0.005 < b4v {
+        println!("  b4 BEATS idealized tau-rerank ({b4v:.4} > {tau:.4}) — just use b4, no codec");
+    } else {
+        println!("  tau-rerank ~= b4 ({tau:.4} vs {b4v:.4}) — tie even at the ceiling; b4 simpler");
+    }
+}
+
+/// FP32 brute-force cosine top-k (corpus + queries L2-normalized). Vec per query.
+fn fp32_topk(corpus: &[f32], queries: &[f32], dim: usize, k: usize) -> Vec<Vec<usize>> {
+    let n = corpus.len() / dim;
+    (0..queries.len() / dim)
+        .into_par_iter()
+        .map(|qi| {
+            let q = &queries[qi * dim..(qi + 1) * dim];
+            let mut s: Vec<(usize, f32)> = (0..n)
+                .map(|di| {
+                    let d = &corpus[di * dim..(di + 1) * dim];
+                    (di, q.iter().zip(d).map(|(a, b)| a * b).sum())
+                })
+                .collect();
+            s.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
+            s.into_iter().take(k).map(|(i, _)| i).collect()
+        })
+        .collect()
+}
+
+fn load_corpus(cfg: &mut Cfg) -> (Vec<f32>, Vec<f32>) {
+    if let (Some(cp), Some(qp)) = (cfg.corpus_npy.clone(), cfg.queries_npy.clone()) {
+        let (mut c, n, d) = load_npy_f32(&cp);
+        let (mut q, nq, dq) = load_npy_f32(&qp);
+        assert_eq!(d, dq, "dim mismatch");
+        l2_rows(&mut c, d);
+        l2_rows(&mut q, dq);
+        cfg.n = n;
+        cfg.dim = d;
+        cfg.n_queries = nq;
+        eprintln!("# loaded corpus {n}x{d}, queries {nq}x{dq}");
+        return (c, q);
+    }
+    let mut rng = ChaCha8Rng::seed_from_u64(SEED);
+    let (d, l) = (cfg.dim, cfg.latent);
+    let mut a = vec![0.0f32; d * l];
+    for x in a.iter_mut() {
+        *x = gauss(&mut rng);
+    }
+    let mut protos = vec![0.0f32; cfg.clusters * l];
+    for x in protos.iter_mut() {
+        *x = gauss(&mut rng);
+    }
+    let mk = |proto: &[f32], noise: f32, rng: &mut ChaCha8Rng| -> Vec<f32> {
+        let mut z = vec![0.0f32; l];
+        for j in 0..l {
+            z[j] = proto[j] + noise * gauss(rng);
+        }
+        let mut o = vec![0.0f32; d];
+        for i in 0..d {
+            let mut acc = 0.0f32;
+            for j in 0..l {
+                acc += a[i * l + j] * z[j];
+            }
+            o[i] = acc;
+        }
+        o
+    };
+    let mut corpus = vec![0.0f32; cfg.n * d];
+    for i in 0..cfg.n {
+        let c = rng.random_range(0..cfg.clusters);
+        corpus[i * d..(i + 1) * d].copy_from_slice(&mk(&protos[c * l..(c + 1) * l], 0.3, &mut rng));
+    }
+    let mut queries = vec![0.0f32; cfg.n_queries * d];
+    for i in 0..cfg.n_queries {
+        let c = rng.random_range(0..cfg.clusters);
+        queries[i * d..(i + 1) * d]
+            .copy_from_slice(&mk(&protos[c * l..(c + 1) * l], 0.1, &mut rng));
+    }
+    l2_rows(&mut corpus, d);
+    l2_rows(&mut queries, d);
+    (corpus, queries)
+}
diff --git a/experiments/ordinal-routing-research/tau_rerank_bakeoff_results.md b/experiments/ordinal-routing-research/tau_rerank_bakeoff_results.md
new file mode 100644
index 00000000..e651aaa1
--- /dev/null
+++ b/experiments/ordinal-routing-research/tau_rerank_bakeoff_results.md
@@ -0,0 +1,54 @@
+# Matched-bytes bake-off: tau-rerank vs b=4 — the deployment verdict
+
+The decisive question for the density-collapse finding: does breaking dense-region
+ties with permutation order (Kendall-tau rerank of b=2 survivors) beat simply
+spending the bits on b=4? Source: `examples/tau_rerank_bakeoff.rs`. CEILING
+experiment — tau uses the FULL stored rank order (best case for the method).
+
+## Result — NO. b=4 wins decisively, even at the tau ceiling.
+
+Real embeddings (nomic-embed-text, 8665 docs / 200 held-out queries, 768-d,
+topk=128, M=50 — tau's best parameters):
+
+| arm | bytes/vec | R@10 |
+|-----|-----------|------|
+| b2 asym | 192 | 0.8980 |
+| b4 asym | 384 | 0.9420 |
+| b2 + tau-rerank | 192* | 0.5965  (*ceiling: full stored ranks) |
+| b2 + fp32-rerank | — | 1.0000  (absolute ceiling) |
+
+Synthetic (dim 256, n 30k) tells the same story: b2 0.5785, b4 0.8095,
+tau-rerank 0.22–0.57 (best case ties b2, never reaches b4), fp32-rerank 1.0000.
+
+## Why the tau-rerank fails (the precise lesson)
+
+1. **The candidate pool is fine.** fp32-rerank = 1.0000 proves b=2's top-M
+   *contains* every FP32 true neighbour. Candidate generation is not the problem.
+2. **Tau MISORDERS them.** tau-rerank (0.597) is worse than b=2's OWN ordering
+   (0.898). The ~0.04 tau gap from density_collapse_results.md is real as a faint
+   BINARY discriminator (true-neighbour vs far-lookalike) but far too weak to
+   ORDER 50 candidates correctly — it actively destroys the ordering b=2's
+   scores already provide.
+3. **Even at the ceiling** (full stored ranks, best topk/M) tau-rerank only
+   claws back to ≈ b=2; it never reaches b=4. No compact tau codec could do
+   better than this ceiling, so codec work is NOT justified.
+
+## Verdict
+
+**The density-collapse tau-signal is a real-but-inert observation, not a usable
+lever. "Just use b=4."** This closes the loop opened in
+density_collapse_results.md: the signal exists (binary separation, CI > 0) but
+does not convert to retrieval value. There is no ordvec code change here worth
+making.
+
+This is the honest negative that the whole research line was meant to resolve:
+the prime/spectral/permutation ideas for improving dense-region retrieval do not
+beat the boring baseline (spend the bits). The contribution of this branch is
+therefore RESEARCH (a characterized mechanism + a clean negative), not a feature.
+
+Reproduce:
+```
+cargo run --release --example tau_rerank_bakeoff                      # synthetic
+cargo run --release --example tau_rerank_bakeoff -- \
+    --corpus-npy repo_real.npy --queries-npy repo_q.npy --topk 128 --m 50
+```
diff --git a/experiments/ordinal-routing-research/twonn_id.rs b/experiments/ordinal-routing-research/twonn_id.rs
new file mode 100644
index 00000000..c446ec07
--- /dev/null
+++ b/experiments/ordinal-routing-research/twonn_id.rs
@@ -0,0 +1,243 @@
+//! TwoNN intrinsic-dimension estimator (Facco et al. 2017) for embedding corpora.
+//!
+//! For each point, μ = r2/r1 = ratio of 2nd to 1st nearest-neighbour distance.
+//! Under locally-uniform density of intrinsic dimension d, μ ~ Pareto(d), so
+//! the linear fit  log(1 - F(μ_i))  =  -d * log(μ_i)  through the origin
+//! recovers d. No binning, no manifold reconstruction — two distances/point.
+//!
+//! Metric: embeddings are L2-normalized and retrieval is ANGULAR, so NN
+//! distances here use chord / Euclidean distance between unit vectors
+//! (`sqrt(2 - 2cos)`). Cosine distance (`1 - cos`) is squared-angle locally
+//! and biases TwoNN estimates downward.
+//!
+//! Validation control: the synthetic corpus is a latent_dim-D manifold
+//! projected into dim-D, so TwoNN should recover ID ≈ latent_dim. Run the
+//! default and check it lands near 64 before trusting it on real .npy data.
+//!
+//! Run (synthetic control):  cargo run --release --example twonn_id
+//! Run (real corpus):        cargo run --release --example twonn_id -- --corpus-npy emb.npy
+//! No external data, no BLAS.
+
+use rand::{RngExt, SeedableRng};
+use rand_chacha::ChaCha8Rng;
+use rayon::prelude::*;
+
+const SEED: u64 = 1;
+
+struct Cfg {
+    dim: usize,
+    n: usize,
+    latent_dim: usize,
+    n_clusters: usize,
+    corpus_npy: Option<String>,
+    isotropic: bool,
+}
+
+fn parse() -> Cfg {
+    let mut c = Cfg {
+        dim: 256,
+        n: 20_000,
+        latent_dim: 64,
+        n_clusters: 200,
+        corpus_npy: None,
+        isotropic: false,
+    };
+    let mut a = std::env::args().skip(1);
+    while let Some(x) = a.next() {
+        match x.as_str() {
+            "--dim" => c.dim = a.next().unwrap().parse().unwrap(),
+            "--n" => c.n = a.next().unwrap().parse().unwrap(),
+            "--latent" => c.latent_dim = a.next().unwrap().parse().unwrap(),
+            "--clusters" => c.n_clusters = a.next().unwrap().parse().unwrap(),
+            "--corpus-npy" => c.corpus_npy = Some(a.next().unwrap()),
+            "--isotropic" => c.isotropic = true,
+            other => panic!("unknown arg {other}"),
+        }
+    }
+    c
+}
+
+fn gauss(rng: &mut ChaCha8Rng) -> f32 {
+    let u1: f32 = rng.random_range(1e-9..1.0);
+    let u2: f32 = rng.random_range(0.0..1.0);
+    (-2.0 * u1.ln()).sqrt() * (std::f32::consts::TAU * u2).cos()
+}
+
+/// Minimal NumPy v1/v2 .npy reader for 2-D little-endian f32 C-order arrays.
+fn load_npy_f32(path: &str) -> (Vec<f32>, usize, usize) {
+    let bytes = std::fs::read(path).expect("read npy");
+    assert!(bytes.len() >= 10 && &bytes[..6] == b"\x93NUMPY", "not a numpy file");
+    let major = bytes[6];
+    assert!(
+        major == 1 || major == 2,
+        "unsupported numpy .npy major version {major}"
+    );
+    if major == 2 {
+        assert!(bytes.len() >= 12, "truncated numpy v2 header");
+    }
+    let (hlen, hstart) = if major == 1 {
+        (u16::from_le_bytes([bytes[8], bytes[9]]) as usize, 10)
+    } else {
+        (u32::from_le_bytes([bytes[8], bytes[9], bytes[10], bytes[11]]) as usize, 12)
+    };
+    assert!(hstart + hlen <= bytes.len(), "truncated numpy header");
+    let header = std::str::from_utf8(&bytes[hstart..hstart + hlen]).expect("utf8 header");
+    assert!(header.contains("'descr': '<f4'"), "expected <f4 dtype");
+    assert!(header.contains("'fortran_order': False"), "expected C order");
+    let after = &header[header.find("'shape':").expect("shape")..];
+    let inner = &after[after.find('(').unwrap() + 1..after.find(')').unwrap()];
+    let dims: Vec<usize> = inner.split(',').filter_map(|s| s.trim().parse().ok()).collect();
+    assert_eq!(dims.len(), 2, "expected 2-D array");
+    let (n, dim) = (dims[0], dims[1]);
+    let dstart = hstart + hlen;
+    assert_eq!(bytes.len() - dstart, n * dim * 4, "data length mismatch");
+    let mut out = vec![0.0f32; n * dim];
+    for (i, ch) in bytes[dstart..].chunks_exact(4).enumerate() {
+        out[i] = f32::from_le_bytes([ch[0], ch[1], ch[2], ch[3]]);
+    }
+    (out, n, dim)
+}
+
+/// L2-normalize each row in place (cosine geometry).
+fn l2_normalize_rows(v: &mut [f32], dim: usize) {
+    let n = v.len() / dim;
+    for i in 0..n {
+        let row = &mut v[i * dim..(i + 1) * dim];
+        let nrm: f32 = row.iter().map(|x| x * x).sum::<f32>().sqrt();
+        if nrm > 0.0 {
+            for x in row.iter_mut() {
+                *x /= nrm;
+            }
+        }
+    }
+}
+
+/// Returns (corpus, n, dim). Synthetic = same low-rank clustered construction
+/// as bench_rank (latent_dim manifold in dim-space → TwoNN should recover
+/// ~latent_dim). Real = whatever the .npy holds, L2-normalized.
+fn load_corpus(cfg: &Cfg) -> (Vec<f32>, usize, usize) {
+    if let Some(ref path) = cfg.corpus_npy {
+        let (mut v, n, dim) = load_npy_f32(path);
+        l2_normalize_rows(&mut v, dim);
+        return (v, n, dim);
+    }
+    let mut rng = ChaCha8Rng::seed_from_u64(SEED);
+    let (d, l) = (cfg.dim, cfg.latent_dim);
+    let mut a = vec![0.0f32; d * l];
+    for x in a.iter_mut() {
+        *x = gauss(&mut rng);
+    }
+    let mut protos = vec![0.0f32; cfg.n_clusters * l];
+    for x in protos.iter_mut() {
+        *x = gauss(&mut rng);
+    }
+    let make = |proto: &[f32], noise: f32, rng: &mut ChaCha8Rng| -> Vec<f32> {
+        let mut z = vec![0.0f32; l];
+        for j in 0..l {
+            z[j] = proto[j] + noise * gauss(rng);
+        }
+        let mut out = vec![0.0f32; d];
+        for i in 0..d {
+            let mut acc = 0.0f32;
+            for j in 0..l {
+                acc += a[i * l + j] * z[j];
+            }
+            out[i] = acc;
+        }
+        out
+    };
+    let mut corpus = vec![0.0f32; cfg.n * d];
+    for i in 0..cfg.n {
+        let e = if cfg.isotropic {
+            let mut v = vec![0.0f32; d];
+            for x in v.iter_mut() {
+                *x = gauss(&mut rng);
+            }
+            v
+        } else {
+            let c = rng.random_range(0..cfg.n_clusters);
+            make(&protos[c * l..(c + 1) * l], 0.3, &mut rng)
+        };
+        corpus[i * d..(i + 1) * d].copy_from_slice(&e);
+    }
+    l2_normalize_rows(&mut corpus, d);
+    (corpus, cfg.n, d)
+}
+
+/// For sampled anchors, compute μ = r2/r1 over COSINE distance (1 - cos),
+/// searching each anchor against the full corpus. Anchors are sampled (the
+/// Pareto fit is robust to it) but each search is exact, so r1/r2 are exact.
+/// Returns the μ values (>1, with degenerate/duplicate cases dropped).
+fn nn_ratios(corpus: &[f32], n: usize, dim: usize) -> Vec<f64> {
+    // sample up to 3000 anchors deterministically (stride sampling, seed-free)
+    let n_anchors = n.min(3000);
+    let stride = (n / n_anchors).max(1);
+    let anchors: Vec<usize> = (0..n).step_by(stride).take(n_anchors).collect();
+    anchors
+        .par_iter()
+        .filter_map(|&ai| {
+            let a = &corpus[ai * dim..(ai + 1) * dim];
+            // Track the two smallest chord distances to OTHER points.
+            let (mut d1, mut d2) = (f64::INFINITY, f64::INFINITY);
+            for bi in 0..n {
+                if bi == ai {
+                    continue;
+                }
+                let b = &corpus[bi * dim..(bi + 1) * dim];
+                let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
+                // Euclidean / chord distance between unit vectors:
+                // sqrt(2 - 2cos) ∝ sin(θ/2), locally LINEAR in the angle.
+                // (Cosine distance 1-cos ≈ θ²/2 is a *squared* distance and
+                // halves the TwoNN estimate — TwoNN needs a locally-linear metric.)
+                let dist = (2.0 - 2.0 * dot as f64).max(0.0).sqrt();
+                if dist < d1 {
+                    d2 = d1;
+                    d1 = dist;
+                } else if dist < d2 {
+                    d2 = dist;
+                }
+            }
+            // drop degenerate anchors (exact duplicate neighbour => r1≈0)
+            if d1 > 1e-12 && d2.is_finite() {
+                let mu = d2 / d1;
+                if mu > 1.0 + 1e-9 {
+                    return Some(mu);
+                }
+            }
+            None
+        })
+        .collect()
+}
+
+/// TwoNN Pareto fit. Sort μ ascending, empirical CDF F_i = (i+1)/N, then fit
+/// d through the origin on  y = -log(1 - F)  vs  x = log μ.  Robust variant:
+/// discard the top 10% of μ (Facco et al.'s recommended tail trim) and use the
+/// closed-form slope d = Σ x·y / Σ x².
+fn twonn_fit(mus: &[f64]) -> f64 {
+    let mut m: Vec<f64> = mus.to_vec();
+    m.sort_by(|a, b| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal));
+    let n = m.len();
+    let keep = ((n as f64) * 0.9) as usize; // trim top 10% tail
+    let (mut sxx, mut sxy) = (0.0f64, 0.0f64);
+    for (i, &mu) in m.iter().enumerate().take(keep) {
+        let f = (i + 1) as f64 / n as f64; // empirical CDF
+        let x = mu.ln();
+        let y = -(1.0 - f).ln();
+        sxx += x * x;
+        sxy += x * y;
+    }
+    sxy / sxx
+}
+
+fn main() {
+    let cfg = parse();
+    let (corpus, n, dim) = load_corpus(&cfg);
+    eprintln!("# twonn_id: n={n} dim={dim} source={}",
+        if cfg.corpus_npy.is_some() { "npy" } else if cfg.isotropic { "isotropic" } else { "clustered" });
+    let mus = nn_ratios(&corpus, n, dim);
+    let d = twonn_fit(&mus);
+    println!("TwoNN intrinsic dimension estimate: {d:.2}  (ambient dim {dim})");
+    if cfg.corpus_npy.is_none() && !cfg.isotropic {
+        println!("(synthetic control: expect ~{} = latent_dim)", cfg.latent_dim);
+    }
+}
diff --git a/experiments/ordinal-routing-research/twonn_id_results.md b/experiments/ordinal-routing-research/twonn_id_results.md
new file mode 100644
index 00000000..e15fd0bf
--- /dev/null
+++ b/experiments/ordinal-routing-research/twonn_id_results.md
@@ -0,0 +1,69 @@
+> ⚠️ PARTIAL (adversarial review). The chord-vs-cosine METRIC FIX is correct and
+> sphere-validated — that stands. BUT the estimator uses OLS-through-origin on the
+> linearized CDF (F=(i+1)/N), the biased non-MLE variant; some of the "finite-
+> sample deflation" attributed to sampling is actually estimator bias. Treat the
+> low-tens ID as indicative, and prefer the MLE TwoNN estimator before quoting
+> exact values. See ADVERSARIAL_REVIEW.md.
+
+# TwoNN intrinsic dimension probe
+
+TwoNN estimator (Facco et al. 2017) for measuring the intrinsic dimension of an
+embedding corpus directly. The aim was to close the citation gap (no published
+TwoNN figure for sentence-transformers) — but this branch records only the
+synthetic sphere/cluster validations below; **no clean real sentence-transformer
+ID is measured here** (the estimator is biased; see the caveat above). There is
+no published TwoNN figure for sentence-transformers; deep vision-CNN intrinsic
+dimension is low-tens (Ansuini, arXiv:1905.12784) but that is a cross-domain
+analogy, not sentence-transformer evidence — so this branch establishes no
+sentence-transformer ID.
+
+Source: `examples/twonn_id.rs`. μ = r2/r1 (2nd/1st NN distance ratio), Pareto
+fit d = Σ(log μ · -log(1-F)) / Σ(log μ)², top-10% tail trimmed. Anchors sampled
+(≤3000), each searched exactly against the full corpus.
+
+## CRITICAL: metric must be locally LINEAR in distance
+
+First implementation used cosine distance (1 - cos θ ≈ θ²/2), a *squared*
+distance — this HALVED every estimate (μ squared → d halved through log μ).
+Fixed to chord/Euclidean distance between unit vectors: sqrt(2 - 2cos) ∝
+sin(θ/2), locally linear in the angle. Anyone adapting this for cosine spaces
+must use the chord metric, not cosine distance.
+
+## Validation (isotropic sphere, true ID = ambient - 1)
+
+| ambient dim | true ID | TwoNN (chord) | TwoNN (cosine bug) |
+|-------------|---------|---------------|--------------------|
+| 3  | 2  | 1.99  | 0.99 |
+| 5  | 4  | 4.19  | 2.09 |
+| 8  | 7  | 6.99  | 3.49 |
+| 12 | 11 | 10.41 | 5.21 |
+| 20 | 19 | 16.89 | — |
+| 50 | 49 | 35.18 | — |
+
+Verdict: **calibrated and trustworthy through ID ~12** (dense sampling at
+n=20k). Above that, finite-sample deflation sets in (ID=20→16.9, ID=50→35) —
+the curse of dimensionality floor on ALL NN-based ID estimators: ~exp(d) points
+needed to populate a d-dim neighborhood. Reported IDs in the tens are therefore
+LOWER BOUNDS; true ID may be higher. (The same finite-sample deflation would
+also affect Ansuini's 12-25 vision-CNN last-layer figures — a methodological
+note, not evidence those figures transfer to sentence-transformers.)
+
+## Measurements
+
+| corpus | n | ambient | TwoNN ID |
+|--------|---|---------|----------|
+| synthetic clustered (latent_dim=64) | 20k | 256 | 27.5 |
+| isotropic dim=256 | 10k | 256 | (sampling-limited) |
+
+The synthetic clustered corpus (built from a 64-D latent manifold) reads ID≈27
+— deflated from 64 by finite-sample + the within-cluster local geometry being
+lower-dim than the global latent space.
+
+## For real embeddings
+
+Run: `cargo run --release --example twonn_id -- --corpus-npy your_embeddings.npy`
+Loads 2-D <f4 C-order .npy, L2-normalizes rows, reports chord-metric TwoNN ID.
+A low-tens sentence-transformer ID is a HYPOTHESIS by analogy to vision-CNN final
+layers (Ansuini's 12-25 are CNNs, not sentence encoders — see
+conjecture_citation_audit.md), NOT established here; READ ANY ID AS A LOWER BOUND. This is the number that sizes the
+routing-layer projection budget: R ≈ c·d_int ⇒ R∈{8,16} suffices.
diff --git a/experiments/ordinal-routing-research/uniformity_lemma.rs b/experiments/ordinal-routing-research/uniformity_lemma.rs
new file mode 100644
index 00000000..a969069d
--- /dev/null
+++ b/experiments/ordinal-routing-research/uniformity_lemma.rs
@@ -0,0 +1,227 @@
+//! CONTEXT-BOXED, PRE-REGISTERED — proves the uniformity lemma underlying
+//! ordvec's equal-width bucketing, and shows WHY golden-ratio boundaries break it.
+//! Isolated by request: touches no harness, writes nothing. Verdicts fixed below.
+//!
+//! THE LEMMA (construction rationale, not a retrieval claim):
+//!   The rank transform maps a vector with distinct coords to a permutation of
+//!   {0..D-1} — an EXACTLY uniform coordinate distribution. Consequences:
+//!     (1) equal-width bucketing (2^B | D) is MAX-ENTROPY: every bucket is
+//!         equiprobable, so the B-bit code wastes no bits.
+//!     (2) CONSTANT-COMPOSITION: every document puts exactly D/2^B coords in
+//!         each bucket — identical histogram across all docs. This is what gives
+//!         the bitmap overlap its closed-form HYPERGEOMETRIC null E[X]=n_top^2/D.
+//!     (3) Any NON-uniform boundary scheme (golden-ratio / three-gap included)
+//!         makes bucket cardinalities UNEQUAL -> lower entropy AND breaks
+//!         constant-composition -> destroys the closed-form null.
+//!   So golden ratio is not merely unhelpful on the rank domain: it would remove
+//!   the property ordvec's headline theorem stands on. Equal-width is FORCED.
+//!
+//! MEASURED on real ranks (corpus rank codes), three boundary schemes for B bits:
+//!   equal-width  : cuts at k*D/2^B            (uniform)
+//!   golden       : cuts at sorted {frac(i*phi)}*D, i=1..2^B-1  (three-gap/low-disc)
+//!   random       : cuts at 2^B-1 sorted uniform random points  (control)
+//! Metrics: per-bucket composition std across docs (0 == constant-composition);
+//!          code entropy in bits (ceiling = B); hypergeometric-null error
+//!          |observed mean top-bucket overlap - n_top^2/D| / (n_top^2/D).
+//!
+//! PRE-REGISTERED VERDICT (fixed before running):
+//!   LEMMA HOLDS if, for B in {1,2,4}:
+//!     equal-width composition-std ≈ 0 (< 0.01 coords) AND entropy ≈ B (>= B-0.01)
+//!     AND hypergeom-null error < 0.02;  AND golden/random are strictly worse on
+//!     composition-std and entropy.  Else the lemma as stated is wrong — report it.
+//!
+//! Run: cargo run --release --example uniformity_lemma -- --corpus-npy /tmp/repo_corpus.npy
+
+use rand::{RngExt, SeedableRng};
+use rand_chacha::ChaCha8Rng;
+use rayon::prelude::*;
+
+const SEED: u64 = 1;
+
+fn load_npy_f32(path: &str) -> (Vec<f32>, usize, usize) {
+    let bytes = std::fs::read(path).expect("read npy");
+    assert!(
+        bytes.len() >= 10 && &bytes[..6] == b"\x93NUMPY",
+        "not a numpy file"
+    );
+    let major = bytes[6];
+    assert!(
+        major == 1 || major == 2,
+        "unsupported numpy .npy major version {major}"
+    );
+    if major == 2 {
+        assert!(bytes.len() >= 12, "truncated numpy v2 header");
+    }
+    let (hlen, hstart) = if major == 1 {
+        (u16::from_le_bytes([bytes[8], bytes[9]]) as usize, 10)
+    } else {
+        (
+            u32::from_le_bytes([bytes[8], bytes[9], bytes[10], bytes[11]]) as usize,
+            12,
+        )
+    };
+    assert!(hstart + hlen <= bytes.len(), "truncated numpy header");
+    let header = std::str::from_utf8(&bytes[hstart..hstart + hlen]).expect("utf8 header");
+    assert!(header.contains("'descr': '<f4'"), "expected <f4 dtype");
+    let after = &header[header.find("'shape':").expect("shape")..];
+    let inner = &after[after.find('(').unwrap() + 1..after.find(')').unwrap()];
+    let dims: Vec<usize> = inner
+        .split(',')
+        .filter_map(|s| s.trim().parse().ok())
+        .collect();
+    let (n, dim) = (dims[0], dims[1]);
+    let dstart = hstart + hlen;
+    let mut out = vec![0.0f32; n * dim];
+    for (i, ch) in bytes[dstart..].chunks_exact(4).enumerate() {
+        out[i] = f32::from_le_bytes([ch[0], ch[1], ch[2], ch[3]]);
+    }
+    (out, n, dim)
+}
+
+/// rank[i] = ordinal position (0..dim) of coordinate i within the row.
+fn rank_encode(rows: &[f32], n: usize, dim: usize) -> Vec<u16> {
+    let mut out = vec![0u16; n * dim];
+    out.par_chunks_mut(dim).enumerate().for_each(|(r, code)| {
+        let row = &rows[r * dim..(r + 1) * dim];
+        let mut idx: Vec<u16> = (0..dim as u16).collect();
+        idx.sort_by(|&a, &b| row[a as usize].partial_cmp(&row[b as usize]).unwrap());
+        for (rank, &i) in idx.iter().enumerate() {
+            code[i as usize] = rank as u16;
+        }
+    });
+    out
+}
+
+/// Boundary cut points in rank units [0, dim], length 2^B - 1, ascending.
+fn boundaries(scheme: &str, bits: u32, dim: usize) -> Vec<f64> {
+    let nb = 1usize << bits;
+    match scheme {
+        "equal" => (1..nb).map(|k| k as f64 * dim as f64 / nb as f64).collect(),
+        "golden" => {
+            let phi = (1.0 + 5.0f64.sqrt()) / 2.0;
+            let mut cuts: Vec<f64> = (1..nb)
+                .map(|i| ((i as f64 * phi).fract()) * dim as f64)
+                .collect();
+            cuts.sort_by(|a, b| a.partial_cmp(b).unwrap());
+            cuts
+        }
+        "random" => {
+            let mut rng = ChaCha8Rng::seed_from_u64(SEED ^ 0xBEEF);
+            let mut cuts: Vec<f64> = (1..nb)
+                .map(|_| rng.random_range(0.0..1.0) * dim as f64)
+                .collect();
+            cuts.sort_by(|a, b| a.partial_cmp(b).unwrap());
+            cuts
+        }
+        _ => unreachable!(),
+    }
+}
+
+fn bucket_of(rank: u16, cuts: &[f64]) -> usize {
+    let r = rank as f64 + 0.5;
+    cuts.iter().filter(|&&c| r > c).count()
+}
+
+fn main() {
+    let mut corpus_path = None;
+    let mut a = std::env::args().skip(1);
+    while let Some(x) = a.next() {
+        if x == "--corpus-npy" {
+            corpus_path = Some(a.next().unwrap());
+        }
+    }
+    let path = corpus_path.expect("--corpus-npy required");
+    let (corpus, n, dim) = load_npy_f32(&path);
+    let ranks = rank_encode(&corpus, n, dim);
+    let row = |i: usize| &ranks[i * dim..(i + 1) * dim];
+
+    println!("# uniformity_lemma (CONTEXT-BOXED, pre-registered). corpus {n}x{dim}");
+    println!("# per-bucket composition std across docs (0=constant-composition), code entropy (bits), hypergeom-null err");
+    println!("scheme\tB\tcomp_std\tentropy\tnull_err");
+
+    let mut rng_pairs = ChaCha8Rng::seed_from_u64(SEED);
+    let pairs: Vec<(usize, usize)> = (0..50_000)
+        .map(|_| {
+            let i = rng_pairs.random_range(0..n);
+            let mut j = rng_pairs.random_range(0..n);
+            if j == i {
+                j = (j + 1) % n;
+            }
+            (i, j)
+        })
+        .collect();
+
+    for &bits in &[1u32, 2, 4] {
+        let nb = 1usize << bits;
+        for scheme in ["equal", "golden", "random"] {
+            let cuts = boundaries(scheme, bits, dim);
+
+            // composition: count per bucket per doc; std of counts across docs, per bucket, then mean.
+            let comps: Vec<Vec<f64>> = (0..n)
+                .into_par_iter()
+                .map(|d| {
+                    let mut c = vec![0f64; nb];
+                    for &rk in row(d) {
+                        c[bucket_of(rk, &cuts)] += 1.0;
+                    }
+                    c
+                })
+                .collect();
+            let mut comp_std_sum = 0.0;
+            for b in 0..nb {
+                let vals: Vec<f64> = comps.iter().map(|c| c[b]).collect();
+                let m = vals.iter().sum::<f64>() / n as f64;
+                let v = vals.iter().map(|x| (x - m) * (x - m)).sum::<f64>() / n as f64;
+                comp_std_sum += v.sqrt();
+            }
+            let comp_std = comp_std_sum / nb as f64;
+
+            // code entropy: bucket occupancy probabilities (same for every doc under
+            // constant-composition); use the mean histogram normalized.
+            let mut hist = vec![0f64; nb];
+            for c in &comps {
+                for b in 0..nb {
+                    hist[b] += c[b];
+                }
+            }
+            let tot: f64 = hist.iter().sum();
+            let entropy: f64 = hist
+                .iter()
+                .filter(|&&h| h > 0.0)
+                .map(|&h| {
+                    let p = h / tot;
+                    -p * p.log2()
+                })
+                .sum();
+
+            // hypergeometric null: top bucket = highest bucket id; overlap of top sets.
+            let n_top_counts: Vec<f64> = comps.iter().map(|c| c[nb - 1]).collect();
+            let mean_ntop = n_top_counts.iter().sum::<f64>() / n as f64;
+            let top_sets: Vec<std::collections::HashSet<u16>> = (0..n)
+                .into_par_iter()
+                .map(|d| {
+                    row(d)
+                        .iter()
+                        .enumerate()
+                        .filter(|(_, &rk)| bucket_of(rk, &cuts) == nb - 1)
+                        .map(|(i, _)| i as u16)
+                        .collect()
+                })
+                .collect();
+            let obs_overlap: f64 = pairs
+                .par_iter()
+                .map(|&(i, j)| top_sets[i].intersection(&top_sets[j]).count() as f64)
+                .sum::<f64>()
+                / pairs.len() as f64;
+            let expected = mean_ntop * mean_ntop / dim as f64;
+            let null_err = if expected > 0.0 {
+                (obs_overlap - expected).abs() / expected
+            } else {
+                f64::NAN
+            };
+
+            println!("{scheme}\t{bits}\t{comp_std:.4}\t{entropy:.4}\t{null_err:.4}");
+        }
+    }
+    println!("\n# LEMMA HOLDS iff equal-width: comp_std<0.01, entropy>=B-0.01, null_err<0.02; golden/random strictly worse on comp_std & entropy.");
+}
diff --git a/experiments/ordinal-routing-research/withdrawn/README.md b/experiments/ordinal-routing-research/withdrawn/README.md
new file mode 100644
index 00000000..374b0515
--- /dev/null
+++ b/experiments/ordinal-routing-research/withdrawn/README.md
@@ -0,0 +1,16 @@
+# Withdrawn results
+
+These findings did NOT survive review and are kept only for the record — do not
+cite them as results. See [../ADVERSARIAL_REVIEW.md](../ADVERSARIAL_REVIEW.md)
+(Round 1) and [../README.md](../README.md) for the tiering.
+
+- **spectral_probe_results.md** / **corpus_zoo_results.md** — the number-variance
+  "embedding keys are super-Poisson, never rigid" finding. The probe's unfold is
+  uncalibrated: a smooth-unfold salvage attempt INVERTED the isotropic/clustered
+  ordering, proving neither unfold is validated against analytic ground truth.
+  The rigidity THEORY (../rigidity_impossibility_proofs.md) does not depend on
+  this probe and is unaffected.
+
+The `examples/spectral_probe.rs` instrument itself is retained (its
+`--rigid-selftest` is a valid estimator check); only the corpus *findings* are
+withdrawn.
diff --git a/experiments/ordinal-routing-research/withdrawn/corpus_zoo_results.md b/experiments/ordinal-routing-research/withdrawn/corpus_zoo_results.md
new file mode 100644
index 00000000..c18404ee
--- /dev/null
+++ b/experiments/ordinal-routing-research/withdrawn/corpus_zoo_results.md
@@ -0,0 +1,66 @@
+> ⛔ WITHDRAWN. This zoo runs the number-variance probe whose unfold is
+> uncalibrated (see spectral_probe_results.md banner and ADVERSARIAL_REVIEW.md).
+> The smooth-unfold salvage inverted the isotropic/clustered ordering, so the
+> "robust across geometries" claim is unsupported. Retained as record only. The
+> gen_corpus.rs zoo GENERATOR itself is fine and reusable; the measurement on top
+> is what's withdrawn.
+
+# Corpus zoo: number-variance robustness across geometries
+
+Rigorous multi-corpus verification of the "embedding routing keys are not rigid"
+finding. Generates 6 geometries as .npy (`examples/gen_corpus.rs`) and runs the
+number-variance probe on each. Includes instrument controls at BOTH ends.
+
+n=50k, dim=256, 8 random-projection keys, Gaussian unfold.
+
+## Σ²(L)/L  (≈1 Poisson · >1 clustered · <1 RIGID)
+
+| kind        | L=8   | L=32  | L=128 | L=512  | reading |
+|-------------|-------|-------|-------|--------|---------|
+| isotropic   | 0.99  | 0.99  | 0.99  | 0.93   | Poisson NULL ✓ |
+| clustered   | 1.34  | 2.39  | 6.13  | 18.5   | super-Poisson |
+| anisotropic | 2.38  | 6.48  | 22.7  | 85.0   | super-Poisson |
+| rogue       | 1.60  | 3.34  | 10.0  | 34.1   | super-Poisson |
+| manifold    | 2.93  | 8.73  | 30.3  | 96.4   | super-Poisson |
+
+## Instrument controls (CRITICAL — proves the probe isn't blind)
+
+| control | result | proves |
+|---------|--------|--------|
+| `--rigid-selftest` (perfect lattice key) | Σ²/L = 0.0000 all L | probe DETECTS rigidity ✓ |
+| isotropic corpus | Σ²/L = 0.99 flat | probe DETECTS Poisson ✓ |
+
+## Findings
+
+1. **Robust across 5 geometries: embedding-like keys are super-Poisson
+   (clustered), never rigid.** Crucially this includes the NONLINEAR MANIFOLD
+   (2.9→96) — the one geometry where angular rigidity could plausibly have
+   hidden. It didn't.
+2. **The probe is validated at both ends.** Perfect lattice → Σ²=0 (sees
+   rigidity); isotropic → Σ²/L≈1 (sees Poisson). So the super-Poisson readings
+   are real, not instrument blindness. This was the gap that would have let a
+   skeptic dismiss the whole spectral finding; it is now closed.
+3. **Bonus substantive result (random-projection lattice control):** a sequence
+   that is rigid along ONE direction does NOT produce a rigid key after random
+   projection + L2 normalization (tested: still super-Poisson). I.e. even if an
+   embedding had rigid structure on a special axis, a data-oblivious
+   random-projection router would not see it — reinforcing that spectral
+   structure is not exploitable training-free. (The instrument self-test
+   isolates the estimator from this pipeline effect.)
+
+## Verdict
+
+The "keys are non-rigid → quantile bucketing is the whole story, no spectral
+opening" conclusion is now verified across diverse corpus geometries with
+validated instrument controls, not a single synthetic distribution. A FALLING
+Σ²/L on a real embedding dump would still be the surprise worth chasing — run
+the zoo recipe on real data via the runbook.
+
+Reproduce:
+```
+for K in isotropic clustered anisotropic rogue manifold; do
+  cargo run --release --example gen_corpus -- --kind $K --n 50000 --dim 256 --out zoo_$K.npy
+  cargo run --release --example spectral_probe -- --corpus-npy zoo_$K.npy
+done
+cargo run --release --example spectral_probe -- --rigid-selftest   # instrument control
+```
diff --git a/experiments/ordinal-routing-research/withdrawn/spectral_probe_results.md b/experiments/ordinal-routing-research/withdrawn/spectral_probe_results.md
new file mode 100644
index 00000000..68a678a4
--- /dev/null
+++ b/experiments/ordinal-routing-research/withdrawn/spectral_probe_results.md
@@ -0,0 +1,54 @@
+> ⛔ WITHDRAWN — PROBE DOES NOT MEASURE WHAT IT CLAIMS. The salvage attempt
+> (`--unfold-smooth K`) INVERTED the result: smooth unfold makes isotropic read
+> super-Poisson (Σ²/L 1.7→12) and clustered read LOWER. The two unfolds disagree
+> and neither is validated against analytic ground truth; the Gaussian "isotropic
+> = 0.99 Poisson" control was just the one marginal that unfold happened to fit.
+> The numbers below are RETAINED ONLY as the record of the failure. Fixing needs
+> the estimator calibrated against a process with known closed-form Σ²(L). The
+> theory (rigidity_impossibility_proofs.md) does NOT depend on this probe. Full
+> account: ../ADVERSARIAL_REVIEW.md.
+
+# Number-variance probe: is a 1-D routing key rigid or Poisson?
+
+Experiment for the "prime mile-marker / spectral index" conjecture. The
+conjecture needs the routing key to exhibit **rigidity** (sub-linear number
+variance, Σ²(L) ~ log L, like GUE eigenvalue / zeta-zero spectra) for any
+spectral/prime-gap partition structure to beat plain quantile bucketing.
+
+Source: `examples/spectral_probe.rs`. Number variance Σ²(L) = variance of the
+count of keys in a window of length L, after unfolding to unit mean density.
+Poisson ⇒ Σ²(L) = L (ratio 1, flat). Rigid ⇒ Σ²(L) ~ log L (ratio → 0).
+
+## Results (synthetic, n=200k, dim=256, 8 random-projection keys, seed=1)
+
+| L | clustered Σ²/L | isotropic Σ²/L (control) | quantile-unfold Σ² |
+|---|---|---|---|
+| 2 | 1.28 | 1.00 | 0.0000 |
+| 8 | 3.05 | 1.00 | 0.0000 |
+| 32 | 3.26 | 1.00 | 0.0000 |
+| 128 | 5.94 | 1.01 | 0.0000 |
+| 512 | 17.93 | 1.01 | 0.0000 |
+
+## Verdict
+
+1. **Isotropic control = exact Poisson (ratio 1.00, flat).** Validates the
+   probe: it reports "no structure" when there is none. The clustered signal is
+   therefore real, not an unfold artifact.
+2. **Clustered key is SUPER-Poissonian (ratio climbs 1.3→18, Σ² ≈ 0.037·L²).**
+   Clustering, the *opposite* of the rigidity the conjecture requires. There is
+   no sub-linear (log-L) regime for spectral/prime structure to grip.
+3. **Quantile unfold = Σ² 0.0000 exactly.** Quantile (inverse-CDF) tiling
+   balances the key perfectly by construction — empirical confirmation that it
+   strictly dominates any fixed-density tiling (including Li(x)) on occupancy.
+
+Conclusion: on this corpus the scalar routing key is Poisson-to-clustered, never
+rigid. Quantile bucketing is the whole story for occupancy balance. The opening
+the conjecture needed (rigidity) is absent. Re-run on real embeddings with
+`--corpus-npy` to confirm; prior is strongly that real keys are also non-rigid.
+
+Reproduce:
+```
+cargo run --release --example spectral_probe                    # clustered
+cargo run --release --example spectral_probe -- --isotropic     # Poisson control
+cargo run --release --example spectral_probe -- --unfold-empirical  # quantile
+```