Merged
17 commits
7729d4c
Add ordinal-routing research: probes, impossibility proofs, CRT verni…
toadkicker Jun 13, 2026
c03e4bd
Close adversarial-review gaps: fix correctness bugs, withdraw unsuppo…
toadkicker Jun 13, 2026
ebe4000
Characterize ordvec density collapse: mechanism + recoverable permuta…
toadkicker Jun 13, 2026
571a20b
Confirm density-collapse tau-signal on REAL embeddings (nomic-embed-t…
toadkicker Jun 13, 2026
6e7e306
Address Gemini code review: panic-safety, perf, Lean typecheck fix
toadkicker Jun 13, 2026
a0ec265
Second review round: de-circularize density test, fix bucketing, hard…
toadkicker Jun 13, 2026
21e5a9a
Docs: add benchmarks/README front door, move withdrawn results to wit…
toadkicker Jun 13, 2026
41a9007
Run the decisive bake-off: tau-rerank does NOT beat b=4 — clean negative
toadkicker Jun 13, 2026
8b22198
Move Lean formalization to ordvec-formalization; reframe as research …
toadkicker Jun 13, 2026
0f364ba
Maintainer review pass: relocate research to experiments/, fix reprod…
Fieldnote-Echo Jun 16, 2026
4b0fe44
docs: reconcile TwoNN evidence + Lean-PR status across the writeup
Fieldnote-Echo Jun 16, 2026
4d81c73
Research round 2: oblivious-directions arc — pre-registered negative …
toadkicker Jun 16, 2026
0db7515
Phase B: messy multi-cone "lake" robustness — all three lake fears un…
toadkicker Jun 16, 2026
e406bfb
Path B: chunk-length-mixture lake — fear falsified, lake arc closed
toadkicker Jun 16, 2026
1cbf2b4
docs: graduate the messy-corpus robustness finding into product docs
toadkicker Jun 17, 2026
9515198
Polish ordinal-routing research hygiene
Fieldnote-Echo Jun 19, 2026
1b4d291
Merge branch 'main' into feature/ordinal-routing-research
project-navi-bot Jun 19, 2026
1 change: 1 addition & 0 deletions Cargo.toml
@@ -42,6 +42,7 @@ exclude = [
"docs/FOLLOWUP_BODY_KERNEL_TIE_BREAK.md",
"docs/INDEX_PROVENANCE.md",
"docs/c-api.md",
"experiments/",
"fuzz/",
"ordvec-ffi/",
"ordvec-go/",
6 changes: 6 additions & 0 deletions README.md
@@ -82,6 +82,12 @@ structure of each vector on its own:
fit step. Encoding is a per-vector rank (or sign) transform — index the
very first vector with no prior data, and never refit when the corpus
drifts.
- **Robust by construction on messy corpora.** Because the code is built
from per-vector ranks (magnitude discarded) with no global frequency / IDF
term, the two things that corrupt learned codebooks — near-duplicate hubs
and mixed chunk lengths — have nothing to grab: b=4 R@10 moved −0.002 under
15% templated-hub injection and +0.002 across a four-chunk-length mixture.
([details + honest scope](docs/RANK_MODES.md#a-consequence-robust-by-construction-on-messy-corpora))
- **Zero system dependencies.** Pure Rust — no BLAS / LAPACK / `ndarray` /
`faer`. Builds and cross-compiles cleanly, including to `aarch64` and
`wasm32`.
37 changes: 37 additions & 0 deletions docs/RANK_MODES.md
@@ -158,6 +158,43 @@ fraction of the full-scan latency. The `bench_rank` run prints this
as its `TwoStage ...` rows with the per-M candidate-recall (`CR`)
figure attached.

## A consequence: robust by construction on messy corpora

The same two properties that make the score combinatorial — it is a
function of **ranks** (magnitude discarded) and carries **no global
per-coordinate frequency / IDF term** — also make the encoding
*robust* on the heterogeneous data a real corpus dump contains. The
two failure modes that corrupt learned codebooks and IDF-weighted
schemes have nothing to grab here:

- **Templated / near-duplicate hubs.** A hub poisons schemes with a
global frequency term by inflating it. RankQuant has no such term, so
injected boilerplate stays far from real queries in rank space and
never enters the top-k. Measured: injecting templated near-duplicate
clusters into a 281k-doc multi-domain union at rising prevalence
moved b=4 R@10 by **−0.002 through 15%** — flat to noise.
- **Mixed chunk lengths.** Documents chunked at every length at once
produce cones of slightly different tightness. Because the per-vector
rank transform discards the magnitude that a tightness shift moves,
the *ranks* — and the score — are unchanged. Measured: a 230k-doc
lake unioning the same documents embedded at four chunk lengths
{128, 256, 512, 1100} moved b=4 R@10 by **+0.002** (candidate recall
stayed 1.0) versus the single-length baseline.

This is robustness *because* the code is deliberately oblivious, not in
spite of it — the property is a direct corollary of the
constant-composition mechanism above, no tuning involved.
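
The invariance claimed here can be checked on a toy vector. The sketch below is illustrative Python, not the `ordvec` implementation: `rank_transform` is a hypothetical helper, and the example only shows that ranks survive a positive rescaling and a strictly increasing coordinate-wise map. Whether a real chunk-length shift behaves like such a map is the measured claim above, not something this sketch proves.

```python
def rank_transform(v):
    """Rank of each coordinate (0 = smallest); ties broken by index.

    Hypothetical helper for illustration, not the ordvec encoder.
    """
    order = sorted(range(len(v)), key=lambda i: (v[i], i))
    ranks = [0] * len(v)
    for r, i in enumerate(order):
        ranks[i] = r
    return ranks


v = [0.31, -1.20, 0.05, 2.40, -0.77]
scaled = [3.7 * x for x in v]      # magnitude change only
squashed = [x ** 3 for x in v]     # strictly increasing map (odd power)

base = rank_transform(v)
# Both transforms change every value but leave the ordering untouched.
assert rank_transform(scaled) == base
assert rank_transform(squashed) == base
print(base)
```

Any code built only from these ranks sees the three vectors as identical, which is the sense in which magnitude is "discarded".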

**Scope (honest).** These are clean embeddings of curated corpora
(fiqa / nq / quora) made messy synthetically — multi-cone unions,
templated hubs, and chunk-length mixtures. They establish that
multi-domain, hub-heavy, and multi-length geometry are benign. They do
**not** model OCR garbage, mixed-language, or broken-encoding text,
which clean embeddings cannot reproduce; that case is untested. The
full pre-registered record (and the matching negative for
oblivious-*direction* structure across five encoders) is in
[`experiments/ordinal-routing-research/`](../experiments/ordinal-routing-research/README.md).

## Synthetic stress-test numbers

This is the clean-checkout stress test — regenerated by the default
107 changes: 107 additions & 0 deletions experiments/ordinal-routing-research/ADVERSARIAL_REVIEW.md
@@ -0,0 +1,107 @@
# Adversarial review findings (three independent hostile reviewers)

This branch was reviewed by three adversarial agents (Rust probes, math proofs,
Lean skeleton). Their findings are recorded here in summary so the PR
carries its own critique. Conclusions are tiered by what SURVIVED review.

## SOUND (ships as claimed)

- **CRT seam oracle** (`examples/crt_seam_oracle.rs`): exhaustive finite proof.
Coincidence spacing = lcm; exactly one all-L coincidence per period; the
honest negative result that phases cannot generically rescue a pointwise floor.
  All three reviewers independently agree this file is honest.
- **TwoNN metric fix** (chord vs cosine): the squared-distance bug fix is correct
and validated on sphere controls.

## CORRECTED IN THIS BRANCH (was wrong/loose, now fixed)

- **CRT density closed form**: must be `∏ min(2t+1, m_i)/m_i`, NOT `∏(2t+1)/m_i`.
The literal product exceeds 1 at t=3 (period-3 grid saturates) — the SAME error
class the doc attributed to the rejected `(2t)^L/M` form. The Rust code already
caps with `.min(p)`; the markdown and Lean statements are corrected to match.
Valid precondition for the uncapped form: `2t+1 ≤ min_i m_i` (t≤1 here).
- **Lean `crtEquiv`**: false as typed — `ZMod 0 = ℤ` breaks the cardinality
match. Needs `[∀ i, NeZero (m i)]`. Signature corrected; `hcongr` lemma plan
corrected to `Equiv.subtypeEquiv` + `Fintype.card_congr` (the cited
`Equiv.card_filter_map` does not exist).
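
The capped-versus-uncapped distinction can be confirmed by brute force. The sketch below is a minimal Python check under stated assumptions: "density" is read as the fraction of integer positions in one period lying within circular distance `t` of residue 0 for every modulus at once, and the moduli `(3, 7, 11)` are hypothetical stand-ins, not the values used in `crt_seam_oracle.rs`.

```python
from math import prod


def near_zero(x, m, t):
    """True if x mod m is within circular distance t of residue 0."""
    r = x % m
    return min(r, m - r) <= t


def hits_by_enumeration(moduli, t):
    """Count positions in one period that are near 0 for every modulus."""
    period = prod(moduli)  # equals the lcm when moduli are pairwise coprime
    hits = sum(all(near_zero(x, m, t) for m in moduli) for x in range(period))
    return hits, period


def capped_numerator(moduli, t):
    return prod(min(2 * t + 1, m) for m in moduli)


def uncapped_numerator(moduli, t):
    return prod(2 * t + 1 for _ in moduli)


moduli = [3, 7, 11]  # hypothetical pairwise-coprime moduli
for t in range(6):
    hits, period = hits_by_enumeration(moduli, t)
    # The capped product matches the exhaustive count at every t.
    assert hits == capped_numerator(moduli, t)
    over = uncapped_numerator(moduli, t) > period
    print(f"t={t} hits={hits} period={period} uncapped_exceeds_period={over}")
```

With these particular moduli the uncapped product first exceeds the period at `t = 3`, while the capped form stays a valid fraction for every `t`.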

## DEMOTED TO EXPLORATORY (known confounds — NOT a settled finding)

- **"Routing keys are super-Poisson, never rigid"** (`spectral_probe.rs`,
`withdrawn/corpus_zoo_results.md`): **WITHDRAWN — the probe does not measure what it
claims.** Attempted salvage (added `--unfold-smooth K`, a K-knot empirical
unfold) INVERTED the result: under the smooth unfold the isotropic corpus
reads super-Poisson (Σ²/L 1.7→12) and clustered reads LOWER (1.4→6.8) — the
opposite ordering from the Gaussian-unfold version. So the two unfolds
disagree and NEITHER is validated against analytic ground truth. The
Gaussian-unfold "isotropic = clean Poisson 0.99" was not a control passing —
it was the single marginal the wrong unfold happened to fit. The smooth unfold
has its own artifact (knot-scale + interpolation structure). CONCLUSION: the
number-variance empirical finding is UNSUPPORTED in either direction. Fixing it
requires rebuilding the estimator against a case with KNOWN analytic number
variance (e.g. a stationary process with closed-form Σ²(L)) to calibrate the
unfold + window estimator before trusting any corpus reading. This is open
follow-up work, not a patch. The THEORY (rigidity_impossibility_proofs.md) is
unaffected — it does not depend on this probe.
- **"Random offsets redundant / coprime adds nothing"** (`shard_recall.rs`):
**Bug L FIXED.** `build_projs` now seeds direction and phase RNGs separately
and identically across arms, so aligned vs random-offset share the same R
directions and the ablation is clean. Re-run: aligned 0.9095 vs random-offset
0.9080 (tied) — the "random offsets add nothing" claim now holds on a
controlled comparison. Still-open caveat: the "fair envelope" undersells the
coprime arms (they subdivide buckets and saturate below high budgets), and
coprimality across R directions is the wrong geometry — the within-axis vernier
harness remains unbuilt (theory is in crt_seam_oracle).
- **`gen_corpus` Bug O FIXED.** Corpus and queries now share one geometry
(`A` + prototypes seeded from a dedicated geometry-only RNG keyed by cfg.seed),
so query/corpus latent spaces match and shard_recall ground truth is valid.

## IMPOSSIBILITY PROOFS — repairable, currently overstated

- **Theorem 2**: conclusion (no rigidity) holds, but proof text is WRONG as
written — n i.i.d. uniforms are a BINOMIAL process, Σ²(L) = L(1−L/n), not
"Poisson, Σ²=L exactly, independent". Restate with the binomial value; the
Θ(L) conclusion is unaffected.
- **Theorem 3**: correct under its hypothesis but the hypothesis is smuggled —
"any fixed-distribution corpus" must be stated as "conditionally i.i.d. given
latent θ (mixture of i.i.d.)". de Finetti is decorative; it's the law of total
variance. Finite-without-replacement (the escape hatch) is excluded by
ASSUMPTION, not proof.
- **NON-SEQUITUR (must retract)**: "quantile bucketing is optimal against the
entire achievable class / prime-spectral structure provably is not there" does
NOT follow from Σ²(L) ≥ L. Number variance and partition recall/load-balance
are different figures of merit. Narrow the claim to: "the key is not
number-variance-rigid" (true), and drop the optimality-over-all-partitions
claim unless separately proved (likely false).
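
The binomial restatement for Theorem 2 can be sanity-checked numerically. The sketch below assumes the simplest setting consistent with the text: `n` i.i.d. uniform points on `[0, n)` (unit mean density) and a fixed window `[0, L)`, so the count in the window is Binomial(n, L/n). The function names and the values `n = 50`, `L = 10` are illustrative choices, not taken from the proofs document.

```python
import random
from math import comb


def exact_count_variance(n, L):
    """Variance of Binomial(n, L/n), computed directly from the pmf."""
    p = L / n
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * w for k, w in enumerate(pmf))
    second = sum(k * k * w for k, w in enumerate(pmf))
    return second - mean * mean


def simulated_count_variance(n, L, trials, seed=0):
    """Sample variance of the window count over repeated draws of n uniforms."""
    rng = random.Random(seed)
    counts = [
        sum(1 for _ in range(n) if rng.random() * n < L) for _ in range(trials)
    ]
    mean = sum(counts) / trials
    return sum((c - mean) ** 2 for c in counts) / (trials - 1)


n, L = 50, 10
print("binomial closed form L(1 - L/n):", L * (1 - L / n))
print("exact from pmf:                 ", exact_count_variance(n, L))
print("simulated:                      ", simulated_count_variance(n, L, 5000))
print("Poisson value (Sigma^2 = L):    ", L)
```

The gap between `L(1 - L/n)` and `L` vanishes as `L/n` goes to 0, which is why the Θ(L) conclusion is unaffected by the correction.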

## Round 2 (real-embedding pipeline + post-PR code review)

After the real-embedding work landed and the PR was opened, a second hostile
review (plus the Gemini/qodo PR bots) hit the new material:

- **CRITICAL — density-collapse headline was an artifact, now corrected.** The
win-rate climb 0.667→0.930 with top-k was an estimator-variance effect (M2),
and tau was computed on the probe's own coords, coupling it to cosine (M1).
FIXED: tau now uses the per-pair UNION of top coords (de-circularized), and we
report the tau GAP (effect size) with a bootstrap 95% CI instead of win rate.
Result survives but is MODEST and FLAT: gap ≈ 0.04, CI strictly > 0 at every
top-k. The "sharpening / signature of a real effect" claim is RETRACTED; the
small-but-real separation stands.
- **qodo: bucketing bug FIXED.** density_collapse reimplemented bucketing as
`rank/(d/2^bits)` (panics at d/2^bits==0; wrong for non-divisible dims). Now
uses `ordvec::rank::rank_to_bucket` — measures REAL RankQuant behavior.
- **embed_ollama.py hardened:** E2 (silent row misalignment if ollama returns
wrong count) now aborts; E3 (empty corpus) guarded.
- **Reproducibility (E4/E5):** the repo-sentence extraction + embed procedure is
now recorded verbatim in density_collapse_results.md (was unrecorded).
- **G1 overclaim:** body language softened; single-corpus/single-model
generality explicitly NOT claimed.
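
The bucketing failure flagged by qodo is easy to reproduce in isolation. The sketch below is a Python illustration of the two failure modes of `rank/(d/2^bits)` only. `naive_bucket` mirrors the formula quoted above; `proportional_bucket` is a hypothetical safe alternative written for contrast and is an assumption, NOT the formula used by `ordvec::rank::rank_to_bucket`, whose source is not shown in this PR text.

```python
def naive_bucket(rank, d, bits):
    """The quoted formula: integer width first, then divide."""
    width = d // (1 << bits)  # becomes 0 whenever d < 2**bits
    return rank // width      # raises ZeroDivisionError in that case


def proportional_bucket(rank, d, bits):
    """Hypothetical alternative: scale first, divide once.

    For 0 <= rank < d the result always lies in [0, 2**bits - 1].
    """
    return (rank << bits) // d


d, bits = 10, 2  # 10 is not divisible by 4, so the naive width truncates
print("naive:       ", [naive_bucket(r, d, bits) for r in range(d)])
print("proportional:", [proportional_bucket(r, d, bits) for r in range(d)])

try:
    naive_bucket(0, 3, 2)  # d < 2**bits, so the width is zero
except ZeroDivisionError:
    print("naive formula fails when d // 2**bits == 0")
```

In the non-divisible case the naive version emits a bucket index equal to `2**bits`, one past the valid range, which is why reusing the crate's own function measures real behavior rather than a reimplementation.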

## Net

The mathematically defensible deliverables are: the CRT vernier structure
(oracle + corrected density + Lean skeleton with corrected `crtEquiv` signature),
and the TwoNN metric fix. The empirical rigidity/routing findings are
exploratory with identified confounds and concrete salvage paths. The
impossibility theorems are directionally right but need restatement (binomial
not Poisson; mixture hypothesis explicit; optimality claim retracted).
89 changes: 89 additions & 0 deletions experiments/ordinal-routing-research/README.md
@@ -0,0 +1,89 @@
# Ordinal-routing research — reviewer's guide

Exploratory investigation into ordvec's **density behavior** and whether
prime/spectral structure can improve training-free routing. Everything lives in
`experiments/ordinal-routing-research/` — the findings/proofs (`*.md`) next to the
probes that produced them (`*.rs`, `embed_ollama.py`) — **no changes to the
`ordvec` crate or its public API.** This directory is `package.exclude`d, so it
ships with the source tree but not the published crate.

> **Running the probes.** These were developed and run as Cargo examples. The
> `cargo run --release --example <name>` commands and `examples/…` paths
> throughout these docs refer to that original layout. To reproduce, copy the
> relevant `.rs` from this directory into the crate's `examples/` directory and
> run the command shown (they depend only on the existing `ordvec` / `rand` /
> `rayon` dev-deps). They are kept here as reference source, not as a compiled
> build target.

Reviewed by three internal adversarial agents plus the PR bots; findings are
tiered below by **what survived scrutiny**. Read the tiers, not every doc.

## 3-minute path

1. This file (the tiers + the verdict at the bottom).
2. **[density_collapse_results.md](density_collapse_results.md)** — the mechanism
(real-embedding, with its honest correction), then
**[tau_rerank_bakeoff_results.md](tau_rerank_bakeoff_results.md)** — the
decisive negative: it doesn't beat b=4.
3. **[ADVERSARIAL_REVIEW.md](ADVERSARIAL_REVIEW.md)** — what was challenged,
fixed, retracted, withdrawn. The integrity record.

## SOUND — proven or real-data confirmed

| doc | claim |
|-----|-------|
| [density_collapse_results.md](density_collapse_results.md) | **Mechanism.** RankQuant b=2 density collapse = Hamming-near codes the scorer can't separate. Among those lookalikes, true neighbours have lower intra-code Kendall-tau (gap ≈ 0.04, CI > 0). Real but small. |
| [tau_rerank_bakeoff_results.md](tau_rerank_bakeoff_results.md) | **The verdict.** Does that tau signal beat b=4? NO — b=4 wins even at the tau ceiling; tau scores below b=2's own ordering. Signal is real-but-inert; just use b=4. Closes the line: research, not a feature. |
| [crt_seam_oracle_results.md](crt_seam_oracle_results.md) | CRT vernier seam theorem — exhaustive finite proof: lcm spacing, one coincidence/period, capped density `∏min(2t+1,m_i)/m_i`. Lean 4 formalization lives in the companion repo: [ordvec-formalization#17](https://github.com/Fieldnote-Echo/ordvec-formalization/pull/17) (open PR, `sorry`-free). |
| [shard_recall_results.md](shard_recall_results.md) | Controlled ablation (post RNG-desync fix): random phase offsets add nothing vs aligned grids across R random directions. |
| [oblivious_directions_results.md](oblivious_directions_results.md) | **The directions arc (round 2).** Data-oblivious low-discrepancy directions (golden-angle / Sobol / Kronecker) do NOT beat iid-random for training-free routing — across 5 encoders (nomic, bge-m3, bge-large, snowflake-arctic-v2, harrier-oss) at real intrinsic dim 18–24. CLASS-DEAD, pre-registered, replicated (the one mid-ladder flicker failed to replicate). Centering removes the cone but fails at b=4 (penalty grows with capacity). One robust positive: data-aligned (PCA) directions lead at higher ID — the lever is data-alignment, which training-free forbids. Also **resolves the twonn_id PARTIAL**: real-corpus ID measured at ~18–24 across 5 encoders, and ID is a **corpus** property (repo≈13 vs fiqa≈24, same encoder), not an encoder constant. Probes: `uniformity_lemma.rs`, `overlap_decomp.rs`, `centering_recall.rs`, `subspace_directions.rs`, `partition_balance.rs`, `fib_*.rs`. |
| [length_mixture_lake_results.md](length_mixture_lake_results.md) | **Path B — chunk-length-mixture lake (closes the synthetic-lake arc).** Same fiqa docs embedded at 4 chunk lengths {128,256,512,1100} unioned into a 230k-doc lake; b=4 raw R@10 vs FP32 cosine is **immune** (+0.002, CR@100=1.0). Bonus measurement of the "chunk length is a third geometry axis" claim: real but **small and co-axial** — R̄ spreads only 0.705→0.723 over an 8.6× length range, cone axes ≥0.986 aligned (not the distinct geometries the mixture framing imagined). With Phase B (multi-domain) this leaves every synthetic lake pathology — multi-cone, hub, multi-length — benign for "spend the bits, b=4." Probe: `make_length_lake.py` + `centering_recall.rs`. |

## THEORY — directionally right, restated honestly

| doc | status |
|-----|--------|
| [rigidity_impossibility_proofs.md](rigidity_impossibility_proofs.md) | The routing key is not number-variance-rigid (Thm 2/3, binomial `L(1-L/n)`). The over-broad "quantile optimal over all partitions" claim is **retracted** as a non-sequitur. |
| [conjecture_citation_audit.md](conjecture_citation_audit.md) | Citations verified by direct fetch (Ethayarajh, Broughan-Barnett, etc.); a few subagent confabulations caught and corrected. |
| [twonn_id_results.md](twonn_id_results.md) | ⚠️ PARTIAL. The chord-metric fix is sound (sphere-validated to ID ~12); the OLS-through-origin estimator is biased (not MLE), and **no clean real-corpus ID is recorded here**. A low-tens sentence-transformer ID is a hypothesis by cross-domain analogy (Ansuini's low-tens are vision CNNs, not sentence encoders) — not established or measured in this branch. |

## WITHDRAWN — see [withdrawn/](withdrawn/)

The number-variance "super-Poisson" finding ([withdrawn/spectral_probe_results.md](withdrawn/spectral_probe_results.md),
[withdrawn/corpus_zoo_results.md](withdrawn/corpus_zoo_results.md)) did not
survive: its unfold is uncalibrated (a salvage attempt inverted the result). The
*theory* above does not depend on it. Kept for the record, not as a claim.

## Conjecture verdict (the framing question)

Prime / Li(x) / Sacks-spiral constructions don't help retrieval: they act on the
index (ℕ) and carry no corpus information. The exploitable dense-region structure
lives on the permutohedron `S_D` — the data's own order — which is the
density-collapse result above. Details are spread across the theory docs and
ADVERSARIAL_REVIEW.md.

## Reproduce

Per-doc commands are at the bottom of each file. Real-embedding pipeline (GPU via
ollama) is fully recorded in [density_collapse_results.md](density_collapse_results.md);
external-corpus recipe in [REAL_CORPUS_RUNBOOK.md](REAL_CORPUS_RUNBOOK.md).

## The deployment question — RESOLVED (negative)

[tau_rerank_bakeoff_results.md](tau_rerank_bakeoff_results.md): the decisive
matched-bytes experiment was run. **b=4 wins decisively, even at the tau ceiling**
(real embeddings: b4 0.942, b2 0.898, tau-rerank 0.597, fp32-rerank 1.000). The
b=2 candidate pool contains every true neighbour (fp32-rerank=1.0), but the ~0.04
tau gap is too weak to ORDER them — it scores below b=2's own ordering. The
density-collapse signal is **real but inert**: "just use b=4," no ordvec feature
follows.
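
For readers unfamiliar with how a "tau gap with a bootstrap CI" is formed, here is a minimal Python sketch. It is an assumption-laden illustration, not the probe's code: `kendall_tau` is plain tau-a without tie correction, the percentile bootstrap is one common choice of interval, and the two score lists are synthetic placeholders, not measurements from this branch.

```python
import random


def kendall_tau(a, b):
    """Kendall tau-a between two equal-length sequences (no tie correction)."""
    n = len(a)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (a[i] - a[j]) * (b[i] - b[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def bootstrap_gap_ci(group_a, group_b, n_boot=2000, seed=0):
    """Point estimate and 95% percentile-bootstrap CI for mean(a) - mean(b)."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_boot):
        ra = [rng.choice(group_a) for _ in group_a]
        rb = [rng.choice(group_b) for _ in group_b]
        gaps.append(sum(ra) / len(ra) - sum(rb) / len(rb))
    gaps.sort()
    point = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)
    return point, gaps[int(0.025 * n_boot)], gaps[int(0.975 * n_boot) - 1]


# Identical orderings give tau = 1, fully reversed orderings give tau = -1.
print(kendall_tau([1, 2, 3, 4], [10, 20, 30, 40]))
print(kendall_tau([1, 2, 3, 4], [40, 30, 20, 10]))

# Synthetic per-pair scores for two groups (placeholders only).
lookalikes = [0.52, 0.48, 0.55, 0.50, 0.47, 0.53]
neighbours = [0.45, 0.44, 0.49, 0.46, 0.43, 0.47]
print(bootstrap_gap_ci(lookalikes, neighbours))
```

An interval that excludes zero says the two groups differ on average. It does not say the per-pair signal is strong enough to reorder a candidate list, which is the distinction the bake-off draws.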

The **deployment-robustness** sub-arc is likewise resolved negative: across every
synthetic lake pathology — multi-domain cones + templated hubs (Phase B) and now a
chunk-length mixture ([length_mixture_lake_results.md](length_mixture_lake_results.md),
Path B) — b=4 raw routing does not degrade (hub Δ −0.002 through 15%; length-mixture
Δ +0.002). The only un-run test left needs *real* dirty data (OCR / multilingual S3
sludge), uncapturable from clean embeddings.

This is the honest bottom line of the whole branch: a characterized mechanism and
a clean negative. **Research, not a feature** — the prime/spectral/permutation
ideas for dense-region retrieval do not beat the boring baseline (spend the bits).
48 changes: 48 additions & 0 deletions experiments/ordinal-routing-research/REAL_CORPUS_RUNBOOK.md
@@ -0,0 +1,48 @@
# Real-corpus runbook for the conjecture probes

All four examples now consume the SAME `.npy` format that `bench_rank` documents:
2-D little-endian float32 (`<f4`), C-order, shape `(n, dim)`. One corpus dump
flows through the entire investigation. No BLAS, no Python at run time.
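
Before feeding a dump to the probes, it can help to confirm the file really has this layout. The sketch below is a standalone, stdlib-only Python check that parses the `.npy` header by hand; it is a convenience assumed for this runbook, not part of the repo. `read_npy_header`, `check_corpus_layout`, and `make_npy` are hypothetical helper names.

```python
import ast
import struct

MAGIC = b"\x93NUMPY"


def read_npy_header(blob):
    """Return (header dict, payload offset) for .npy format versions 1 to 3."""
    if blob[:6] != MAGIC:
        raise ValueError("missing .npy magic string")
    major = blob[6]
    if major == 1:
        (header_len,) = struct.unpack("<H", blob[8:10])
        start = 10
    elif major in (2, 3):
        (header_len,) = struct.unpack("<I", blob[8:12])
        start = 12
    else:
        raise ValueError(f"unsupported .npy version {major}")
    header = ast.literal_eval(blob[start:start + header_len].decode("latin1"))
    return header, start + header_len


def check_corpus_layout(blob):
    """Raise unless blob is 2-D little-endian float32 in C order; return (n, dim)."""
    header, offset = read_npy_header(blob)
    if header["descr"] != "<f4":
        raise ValueError(f"expected '<f4', got {header['descr']!r}")
    if header["fortran_order"]:
        raise ValueError("expected C order, got Fortran order")
    shape = header["shape"]
    if len(shape) != 2:
        raise ValueError(f"expected a 2-D array, got shape {shape}")
    n, dim = shape
    if len(blob) - offset != n * dim * 4:
        raise ValueError("payload size does not match shape")
    return n, dim


def make_npy(rows):
    """Build a tiny version-1.0 .npy blob in memory (for the demo only)."""
    n, dim = len(rows), len(rows[0])
    text = "{'descr': '<f4', 'fortran_order': False, 'shape': (%d, %d), }" % (n, dim)
    pad = 64 - (10 + len(text) + 1) % 64
    header = (text + " " * pad + "\n").encode("latin1")
    payload = b"".join(struct.pack("<%df" % dim, *row) for row in rows)
    return MAGIC + bytes([1, 0]) + struct.pack("<H", len(header)) + header + payload


print(check_corpus_layout(make_npy([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])))
```

For a real file, pass `open("corpus.npy", "rb").read()` to `check_corpus_layout`. For very large dumps, reading only the first few hundred bytes is enough to inspect the header, but the payload-size check then has to be replaced by a file-size comparison.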

Recommended public corpora (per bench_rank header): GloVe, OpenAI
text-embedding-3 dumps, or any sentence-transformer output saved as .npy.

## 1. Intrinsic dimension — sizes the projection budget
```
cargo run --release --example twonn_id -- --corpus-npy corpus.npy
```
Reports chord-metric TwoNN ID. Expect low tens (READ AS LOWER BOUND — finite
sample deflates above ~12). If ID≈d_int, the routing layer wants R≈c·d_int
projections, i.e. R∈{8,16}.
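
Why the metric matters for TwoNN can be seen in a few lines. For unit vectors, cosine distance `1 - cos` equals half the squared chord length, so feeding it to TwoNN squares every neighbour ratio and halves the estimate. The sketch below is a Python illustration under stated assumptions: it uses the closed-form maximum-likelihood estimator `N / Σ log(r2/r1)` rather than the OLS-through-origin fit the probe uses, it samples a synthetic unit sphere rather than a real corpus, and all function names are hypothetical.

```python
import math
import random


def unit_vectors(n, dim, seed=0):
    """n points uniform on the unit sphere in R^dim (intrinsic dim = dim - 1)."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        points.append([x / norm for x in v])
    return points


def chord(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    return 1.0 - sum(a * b for a, b in zip(u, v))


def twonn_mle(points, dist):
    """TwoNN maximum-likelihood ID: N / sum(log(r2 / r1)).

    Assumes no duplicate points, so the nearest distance r1 is positive.
    """
    log_mu_sum = 0.0
    for i, p in enumerate(points):
        r1, r2 = sorted(dist(p, q) for j, q in enumerate(points) if j != i)[:2]
        log_mu_sum += math.log(r2 / r1)
    return len(points) / log_mu_sum


pts = unit_vectors(200, 5)
d_chord = twonn_mle(pts, chord)
d_cos = twonn_mle(pts, cosine_distance)
print("chord-metric estimate: ", d_chord)
print("cosine-metric estimate:", d_cos)
print("ratio:                 ", d_chord / d_cos)
```

The ratio comes out at 2 up to floating-point error regardless of the data, because both metrics rank neighbours identically and differ only by the squaring.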

## 2. Number variance — is the routing key rigid or Poisson?
```
cargo run --release --example spectral_probe -- --corpus-npy corpus.npy
cargo run --release --example spectral_probe -- --corpus-npy corpus.npy --unfold-empirical
```
Σ²(L)/L flat≈1 ⇒ Poisson; climbing ⇒ clustered; falling ⇒ rigid (the only
result that would reopen the spectral conjecture). --unfold-empirical confirms
quantile tiling balances the key (Σ²→0).

## 3. Shard recall — does the oblivious router work; does coprime help?
```
cargo run --release --example shard_recall -- \
--corpus-npy corpus.npy --queries-npy queries.npy
```
Needs BOTH files (real queries for honest recall). Fair envelope = recall@k at
equal candidates-scanned. Watch: does recall keep climbing R=1→16 (sets the
budget), and does coprime/random-offset beat plain aligned (predicted: no).
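
The "fair envelope" comparison can be made concrete with two small helpers. This is a Python sketch of the idea as described here (recall@k compared at equal candidates-scanned), not the logic in `shard_recall.rs`. `recall_at_k` and `recall_at_budget` are hypothetical names, and the curves are made-up placeholders rather than measured results.

```python
def recall_at_k(truth_ids, retrieved_ids, k):
    """Fraction of the true top-k that appears in the retrieved top-k."""
    truth = set(truth_ids[:k])
    if not truth:
        raise ValueError("empty ground truth")
    return len(truth & set(retrieved_ids[:k])) / len(truth)


def recall_at_budget(curve, budget):
    """Best recall among operating points scanning at most `budget` candidates.

    `curve` is a list of (candidates_scanned, recall) pairs for one arm.
    """
    eligible = [recall for scanned, recall in curve if scanned <= budget]
    return max(eligible) if eligible else 0.0


print(recall_at_k([1, 2, 3, 4], [2, 9, 1, 7], k=4))

# Placeholder operating points for two arms (synthetic, for illustration).
aligned = [(100, 0.60), (400, 0.85), (1600, 0.95)]
offset = [(150, 0.66), (450, 0.86), (1500, 0.94)]
for budget in (200, 500, 2000):
    print(budget, recall_at_budget(aligned, budget), recall_at_budget(offset, budget))
```

Comparing at matched budgets rather than matched settings keeps an arm that simply scans more candidates from looking better for free.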

## 4. Headline retrieval quality (the existing bench)
```
cargo run --release --example bench_rank -- \
--corpus-npy corpus.npy --queries-npy queries.npy --queries 200 --k 10
```

## Expected story on real embeddings (prior)

Consistent with synthetic + verified literature: ID low tens → R∈{8,16};
key Poisson/clustered not rigid → quantile bucketing; coprime adds nothing →
R shared-width random projections is the router. A FALLING Σ²(L)/L or a
coprime>random-offset gap that survives reseeding would be the surprise worth
chasing.