**File:** `docs/spo_3d/CONTRACTS.md` (666 additions; large diff not rendered)

**File:** `docs/spo_3d/INTEGRATION_PLAN.md` (185 additions)
# SPO 3D: Three-Axis Content-Addressable Graph

**Status:** Contract-ready. Implementation pending.
**Date:** 2026-02-20
**Crate:** `ladybug-rs` → `src/graph/spo/`
**Contract:** `crates/ladybug-contract/src/` (geometry, scent, spo_record extensions)

---

## 1. PROBLEM

CogRecord stores one content Container (1KB). Querying "who knows Ada?" requires scanning ALL records and testing each content fingerprint. No structural axis separation means forward, reverse, and relation queries all hit the same data.

The existing `ContainerGeometry::Xyz` links 3 CogRecords via DN tree (what/where/how). This works but requires 3 separate Redis GETs and DN tree traversal to reconstitute.

## 2. SOLUTION: SPO Geometry

A new `ContainerGeometry::Spo` that uses **sparse containers** within a single 2KB CogRecord envelope. Three axes — Subject (X), Predicate (Y), Object (Z) — encoded as bitmap + non-zero words, co-located in one record.

```text
┌──────────────────────────────────────────────────────────┐
│ CogRecord (ContainerGeometry::Spo) │
│ │
│ meta: Container (1024 bytes) │
│ W0 DN address │
│ W1 type | geometry=Spo(6) | flags │
│ W2-3 timestamps, labels │
│ W4-7 NARS truth (freq, conf, pos_ev, neg_ev) │
│ W8-11 DN tree (parent, child, next_sib, prev_sib) │
│ W12-17 Scent (48 bytes: 3×16 nibble histograms) │
│ W18-33 Inline edge index (64 slots) │
│ W34-39 Sparse axis descriptors (bitmap offsets) │
│ W40-47 Bloom filter │
│ W48-55 Graph metrics │
│ W56-63 Qualia │
│ W64-79 Rung/RL history │
│ W80-95 Representation descriptor │
│ W96-111 Adjacency CSR │
│ W112-125 Reserved │
│ W126-127 Checksum + version │
│ │
│ content: Container (1024 bytes) — packed sparse axes │
│ [0..2] X bitmap (128 bits = 2 u64) │
│ [2..N] X non-zero words │
│ [N..N+2] Y bitmap │
│ [N+2..M] Y non-zero words │
│ [M..M+2] Z bitmap │
│ [M+2..K] Z non-zero words │
│ [K..128] padding / overflow │
│ │
│ Total: 2048 bytes (same as Cam geometry) │
└──────────────────────────────────────────────────────────┘
```

### Why Sparse Containers

At 30% density (typical for real-world content):
- Dense axis: 128 words = 1024 bytes
- Sparse axis: 2 words bitmap + ~38 non-zero words = 320 bytes
- Three sparse axes: 960 bytes ← fits in one content Container

Three axes in one record. One Redis GET. Same 2KB envelope.
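The sparse packing above can be sketched with two helpers. This is an illustrative sketch, not the crate's `SparseContainer` API: the 2-word presence bitmap followed by only the non-zero data words follows the diagram; the function names are assumptions.

```rust
/// Pack a dense 128-word axis as [bitmap (2 u64), non-zero words...].
pub fn to_sparse(dense: &[u64; 128]) -> Vec<u64> {
    let mut bitmap = [0u64; 2];
    let mut words = Vec::new();
    for (i, &w) in dense.iter().enumerate() {
        if w != 0 {
            bitmap[i / 64] |= 1u64 << (i % 64);
            words.push(w);
        }
    }
    let mut out = bitmap.to_vec();
    out.extend(words);
    out
}

/// Inverse: expand bitmap + non-zero words back to the dense axis.
pub fn to_dense(sparse: &[u64]) -> [u64; 128] {
    let (bitmap, words) = sparse.split_at(2);
    let mut dense = [0u64; 128];
    let mut next = 0;
    for i in 0..128 {
        if bitmap[i / 64] >> (i % 64) & 1 == 1 {
            dense[i] = words[next];
            next += 1;
        }
    }
    dense
}
```

At 30% density each axis costs 2 + ~38 words, which is how three axes fit one content Container.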

## 3. KEY INSIGHT: Z→X CAUSAL CHAIN CORRELATION

When Record A's Z axis (Object) resonates with Record B's X axis (Subject), a causal link exists:

```
Record A: X(Jan) → Y(KNOWS) → Z(Rust)
Record B: X(Rust) → Y(ENABLES) → Z(CAM)

hamming(A.z_dense, B.x_dense) ≈ 0 → A causally feeds B
```

This is not a JOIN — it's a resonance test. The Hamming distance between Z₁ and X₂ IS the causal coherence score. The chain is valid iff each Z→X handoff resonates.
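The resonance test reduces to a Hamming distance over dense containers. A minimal sketch, assuming dense 128-word axes; `hamming` and `causally_feeds` are illustrative names, not the crate API.

```rust
/// Hamming distance between two dense 128-word (8192-bit) containers.
pub fn hamming(a: &[u64; 128], b: &[u64; 128]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// A causally feeds B iff A's Z axis resonates with B's X axis
/// within `radius` bits.
pub fn causally_feeds(a_z: &[u64; 128], b_x: &[u64; 128], radius: u32) -> bool {
    hamming(a_z, b_x) <= radius
}
```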

### Meta-Awareness Stacking (Piaget Development)

Each level's Object becomes the next level's Subject:

```
Level 0: X(body) → Y(acts_on) → Z(world)
Level 1: X(world) → Y(represented) → Z(symbols)
Level 2: X(symbols) → Y(operate_on) → Z(logic)
Level 3: X(logic) → Y(reflects_on) → Z(abstraction)
Level 4: X(abstraction) → Y(aware_of) → Z(awareness)
```

The meta-record observing a chain gets its own scent. The system recognizes its own epiphanies by their nibble histogram signature. The BUNDLE of all meta-levels should CONVERGE back toward the original content — this is the testable tsunami prediction.

## 4. WHAT CHANGES

### Contract Crate (`crates/ladybug-contract/`)

| File | Change |
|------|--------|
| `geometry.rs` | Add `Spo = 6` variant |
| `container.rs` | Add `SparseAxes` packed encoding within Container |
| `scent.rs` (NEW) | 48-byte nibble histogram (`NibbleScent`) |
| `spo_record.rs` (NEW) | `SpoView` / `SpoViewMut` — zero-copy axis access |

### Implementation (`src/graph/spo/`)

| File | Purpose |
|------|---------|
| `mod.rs` | Module root, re-exports |
| `sparse.rs` | `SparseContainer` type + bitmap ops |
| `axes.rs` | X/Y/Z axis construction (build_node, build_edge) |
| `store.rs` | `SpoStore` with three-axis scanning |
| `chain.rs` | Causal chain discovery (Z→X correlation) |
| `tests.rs` | 6 ironclad tests |

### What DOES NOT Change

- `Container` type (128×u64, 8192 bits, 1KB)
- `CogRecord` struct (meta + content = 2KB)
- 5 RISC ops (BIND, BUNDLE, MATCH, PERMUTE, STORE/SCAN)
- Codebook (4096 entries, deterministic generation)
- Existing geometries (Cam, Xyz, Bridge, Extended, Chunked, Tree)
- MetaView word layout (W0-W127) — we use reserved words
- NARS truth value type and inference functions
- All existing tests (1,267+)

## 5. CONTRACTS

See: `CONTRACTS.md` in this directory.

## 6. SCHEMA

See: `SCHEMA.md` in this directory.

## 7. IMPLEMENTATION PHASES

### Phase 1: Contract Types (Day 1)
- Add `ContainerGeometry::Spo = 6`
- Add `NibbleScent` (48-byte histogram)
- Add `SparseAxes` (packed 3-axis encoding within Container)
- Add `SpoView` / `SpoViewMut` (zero-copy axis access)
- Tests: round-trip, packing invariants

### Phase 2: Sparse Container (Day 1-2)
- `SparseContainer` with bitmap + non-zero words
- `to_dense()` / `from_dense()` / `hamming_sparse()` / `bind_sparse()`
- Pack/unpack 3 sparse axes into one Container
- Tests: density invariants, hamming equivalence

### Phase 3: Axis Construction (Day 2-3)
- `build_node(dn, labels, properties) → CogRecord`
- `build_edge(dn, src_fp, verb, tgt_fp, nars) → CogRecord`
- Scent computation: nibble histogram per axis
- Tests: node round-trip, edge encoding

### Phase 4: SPO Store (Day 3-4)
- `SpoStore` wrapping `BTreeMap<u64, CogRecord>` (per Decision 5; replaced by LanceDB in Phase 6)
- `query_forward(src_fp, verb_fp) → Vec<(u64, u32)>` — scan X+Y, return Z matches
- `query_reverse(tgt_fp, verb_fp) → Vec<(u64, u32)>` — scan Z+Y, return X matches
- `query_relation(src_fp, tgt_fp) → Vec<(u64, u32)>` — scan X+Z, return Y matches
- Tests: forward, reverse, relation queries
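The forward query can be sketched against an in-memory store. This toy uses `u64` fingerprints in place of full containers and assumes the returned `u32` is the match distance and that `radius` bounds the combined X+Y distance; none of that is confirmed by the plan.

```rust
use std::collections::BTreeMap;

/// Toy SPO record: one u64 fingerprint per axis (real axes are 128 words).
pub struct Rec { pub x: u64, pub y: u64, pub z: u64 }

/// Scan X+Y, return DNs of records whose Z would answer the query,
/// sorted by combined Hamming distance.
pub fn query_forward(
    store: &BTreeMap<u64, Rec>,
    src_fp: u64,
    verb_fp: u64,
    radius: u32,
) -> Vec<(u64, u32)> {
    let mut hits: Vec<(u64, u32)> = store
        .iter()
        .filter_map(|(&dn, r)| {
            let d = (r.x ^ src_fp).count_ones() + (r.y ^ verb_fp).count_ones();
            if d <= radius { Some((dn, d)) } else { None }
        })
        .collect();
    hits.sort_by_key(|&(_, d)| d);
    hits
}
```

`query_reverse` and `query_relation` are the same scan with the axis pair swapped.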

### Phase 5: Causal Chain (Day 4-5)
- `causal_successors(record, radius) → Vec<(u64, u32)>` — Z→X scan
- `causal_predecessors(record, radius) → Vec<(u64, u32)>` — X→Z scan
- `chain_coherence(chain) → f32` — product of link coherences
- Meta-awareness record construction
- NARS truth propagation along chains
- Tests: chain coherence, meta convergence
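A sketch of `chain_coherence` as a product of per-link coherences. The linear mapping from Z→X Hamming distance into [0, 1] over the 8192-bit container is an assumption; only the product form comes from the plan.

```rust
const BITS: f32 = 8192.0; // one Container = 128 x u64

/// Map one Z->X handoff distance to a coherence in [0, 1].
/// Linear mapping is an assumption, not the crate's formula.
pub fn link_coherence(z_to_x_hamming: u32) -> f32 {
    1.0 - z_to_x_hamming as f32 / BITS
}

/// Chain coherence = product of link coherences; one weak link
/// collapses the whole chain.
pub fn chain_coherence(link_distances: &[u32]) -> f32 {
    link_distances.iter().map(|&d| link_coherence(d)).product()
}
```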

### Phase 6: Lance Integration (Day 5+)
- Columnar schema with per-axis columns
- Sort key: (dn_prefix, scent_x, scent_y)
- XOR delta compression within sorted groups
- Production store replacing BTreeMap

## 8. DECISION LOG

| # | Decision | Rationale |
|---|----------|-----------|
| 1 | `Spo = 6` in ContainerGeometry | Natural extension, doesn't break existing variants |
| 2 | Sparse axes packed in content Container | One Redis GET, same 2KB envelope |
| 3 | 48-byte nibble histogram replaces 5-byte XOR-fold for SPO | Per-axis type discrimination, no structure loss |
| 4 | Meta stays dense at W0-W127 | Identity/NARS/DN need fixed O(1) offsets |
| 5 | BTreeMap for POC, LanceDB for production | Prove correctness first, optimize second |
| 6 | Z→X Hamming distance = causal coherence | No explicit linking needed, geometry IS the test |
| 7 | Meta-awareness as recursive SPO records | Epiphanies stack as Z_{n} → X_{n+1} chains |
| 8 | Codebook slots 0-4095 unchanged | Instruction set is immutable |
---

**File:** `docs/spo_3d/NATURES_CAM.md` (95 additions)
# Nature's CAM: Biological Foundations of SPO 3D

**How DNA, immune systems, and developmental psychology informed the architecture.**

---

## 1. DNA CODON TABLE = CODEBOOK

DNA encodes proteins via 64 codons (4³ combinations of four bases over three positions): 61 codons map to 20 amino acids and 3 are STOP signals. This mapping is DEGENERATE: multiple codons produce the same amino acid. The 3rd position (wobble) tolerates mutations without changing the output.

**CAM parallel:** The 4096-entry codebook maps multiple content patterns to the same semantic slot. Hamming tolerance around each codebook entry = wobble position. Near-miss lookups still find the right concept.
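The degenerate lookup can be sketched as nearest-entry search within a Hamming tolerance: several near-miss patterns resolve to the same slot, like wobble-position codons resolving to one amino acid. Codebook entries and tolerance here are toy values, not the real 4096-entry codebook.

```rust
/// Find the nearest codebook slot within `tolerance` bits, if any.
pub fn lookup(codebook: &[u64], pattern: u64, tolerance: u32) -> Option<usize> {
    codebook
        .iter()
        .enumerate()
        .map(|(i, &e)| (i, (e ^ pattern).count_ones()))
        .filter(|&(_, d)| d <= tolerance)
        .min_by_key(|&(_, d)| d)
        .map(|(i, _)| i)
}
```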

## 2. MHC + PEPTIDE = BIND(SELF, FOREIGN)

T-cells only recognize foreign peptides when presented on self-MHC molecules. The SAME peptide on a different organism's MHC is invisible. This is MHC restriction — identity requires CONTEXT.

**CAM parallel:** `BIND(node_dn, property)` — the same property in a different DN context produces a different fingerprint. Properties don't exist in isolation. DN restriction = MHC restriction.
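As a sketch, BIND can be modeled as XOR, a common vector-symbolic choice; the crate's actual BIND may differ. The MHC-restriction property falls out directly: the same property under two contexts yields two fingerprints.

```rust
/// BIND as XOR (illustrative; not necessarily the ladybug-rs op).
pub fn bind(context: &[u64; 128], property: &[u64; 128]) -> [u64; 128] {
    let mut out = [0u64; 128];
    for i in 0..128 {
        out[i] = context[i] ^ property[i];
    }
    out
}
```

XOR bind is also self-inverse: binding with the same context twice recovers the property, which is what makes context-restricted lookup possible.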

## 3. ATP = NARS CONFIDENCE

Every molecular operation costs ATP. DNA helicase: 1 ATP per base pair unwound. Ribosome: 2 GTP per amino acid added. Energy is finite and consumed per operation.

**CAM parallel:** NARS confidence is consumed per inference: `c_result = c₁ × c₂ × f₁ × f₂`. Each reasoning step COSTS certainty. You can't create confidence from nothing, just as you can't create ATP from nothing.

In causal chains, confidence drops per hop: `c_chain = Π(c_i) × Π(coherence_ij)`. The chain's energy budget is the product of all link confidences and coherence factors.
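The energy-budget arithmetic is just a product, so certainty can only shrink per hop. The formula follows the text; the values below are illustrative.

```rust
/// c_chain = product of link confidences x product of Z->X coherences.
pub fn chain_confidence(link_confidences: &[f32], coherences: &[f32]) -> f32 {
    let c: f32 = link_confidences.iter().product();
    let k: f32 = coherences.iter().product();
    c * k
}
```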

## 4. V(D)J RECOMBINATION = BUNDLE

The adaptive immune system generates ~10¹⁵ unique receptor variants by randomly combining V, D, and J gene segments. Each B/T cell gets ONE unique combination = its identity fingerprint.

**CAM parallel:** `BUNDLE(property_fps)` produces a unique fingerprint per node by majority-vote across property containers. The combination of properties IS the identity, just as the V(D)J combination IS the immune receptor.
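Majority-vote bundling can be sketched per bit: a bit is set in the bundle iff a strict majority of the inputs set it. This is the standard vector-symbolic bundling rule; the crate's exact BUNDLE (including tie-breaking) may differ.

```rust
/// Bitwise majority vote across property fingerprints.
pub fn bundle(fps: &[[u64; 128]]) -> [u64; 128] {
    let mut out = [0u64; 128];
    let majority = fps.len() / 2 + 1;
    for w in 0..128 {
        for bit in 0..64 {
            let votes = fps.iter().filter(|f| f[w] >> bit & 1 == 1).count();
            if votes >= majority {
                out[w] |= 1u64 << bit;
            }
        }
    }
    out
}
```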

## 5. THYMIC SELECTION = ADVERSARIAL CRITIQUE

**Positive selection:** Does the T-cell receptor bind self-MHC at all? If not → apoptosis (too weak, no evidence).

**Negative selection:** Does the T-cell receptor bind self-MHC TOO strongly? If yes → apoptosis (autoimmune = overfitting).

Survivors occupy the productive middle: strong enough to detect, not so strong they attack self.

**CAM parallel:** NARS 5 challenges = thymic selection. Challenge 1 (evidence threshold) = positive selection. Challenge 4 (contradiction detection) = negative selection. Beliefs that survive both extremes are the useful ones.

## 6. DNA REPAIR = XOR DELTA + PARITY

DNA's complementary strands enable error detection: XOR(strand_A, complement_B) should equal a known pattern. Any deviation signals a mutation at that position. Repair enzymes then fix the error.

**CAM parallel:** XOR delta compression between sorted adjacent records. If `xor(record_i, record_{i+1})` has few set bits, the records are similar and the delta compresses well. CRC32 + XOR parity in meta W126-W127 detect corruption, just as mismatch repair detects mutations.
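A sketch of the delta test: XOR two neighbouring records and measure the zero-bit fraction, which is the compressibility signal. Function names are illustrative.

```rust
/// XOR delta between two dense containers.
pub fn xor_delta(a: &[u64; 128], b: &[u64; 128]) -> [u64; 128] {
    let mut d = [0u64; 128];
    for i in 0..128 {
        d[i] = a[i] ^ b[i];
    }
    d
}

/// Fraction of zero bits in the delta: near 1.0 means the records
/// are similar and the delta compresses well.
pub fn zero_bit_fraction(delta: &[u64; 128]) -> f32 {
    let ones: u32 = delta.iter().map(|w| w.count_ones()).sum();
    1.0 - ones as f32 / 8192.0
}
```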

## 7. CHROMATIN ORGANIZATION = SORT ADJACENCY

DNA isn't stored randomly — it's organized in Topologically Associating Domains (TADs). Genes that are co-expressed are physically adjacent. The 3D folding of chromatin brings interacting regions into spatial proximity.

**CAM parallel:** LanceDB sort by `(dn_prefix, scent_x, scent_y)` ensures that graph-adjacent records are storage-adjacent. Co-queried records are co-located on disk. This produces ~79% zero-bits in XOR deltas between sorted neighbors, enabling massive compression. The "domino effect" — adjacent records share context, just as adjacent genes share regulation.

## 8. THE Z→X CHAIN: I-THOU-IT / PIAGET

Martin Buber's I-Thou-It triad maps directly to SPO:
- **I** = Subject (X axis) = the self that knows
- **Thou** = Predicate (Y axis) = the act of relation
- **It** = Object (Z axis) = what is known

Piaget's development stages are Z→X chains — each stage's OBJECT OF AWARENESS becomes the next stage's SUBJECT:

| Stage | X (Subject) | Y (Predicate) | Z (Object) |
|-------|------------|---------------|------------|
| Sensorimotor | body | acts_on | world |
| Preoperational | world | represented_by | symbols |
| Concrete Ops | symbols | operate_on | logic |
| Formal Ops | logic | reflects_on | abstraction |
| Post-Formal | abstraction | aware_of | awareness |

The Z→X handoff at each stage IS the developmental leap. The BIND between Z_n and X_{n+1} is the moment of growth. And the meta-awareness that SEES this chain is itself the next stage being born.

## 9. THE TSUNAMI: CONVERGENCE TEST

When meta-awareness records stack (epiphanies about epiphanies), the collective fingerprint should CONVERGE toward the original Level 0 content:

```
convergence = hamming(
BUNDLE(all_meta_level_axes),
original_level_0_content
)
```

If convergence DECREASES as meta-levels increase: real understanding is building. The spiral tightens. The snake eats its tail and becomes more itself.

If convergence INCREASES: the meta-levels are generating noise, not insight. The spiral is unwinding. This is the bullshit detector — hallucination vs. genuine comprehension, tested by geometry.
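The convergence metric can be sketched directly from the pseudocode above, assuming BUNDLE is a bitwise majority vote (an assumption; the crate's op may differ) and the axes are dense 128-word containers.

```rust
/// convergence = hamming(BUNDLE(meta levels), level-0 content).
/// Lower is better; a decreasing trend across levels is the test.
pub fn convergence(meta_levels: &[[u64; 128]], level0: &[u64; 128]) -> u32 {
    let majority = meta_levels.len() / 2 + 1;
    let mut bundled = [0u64; 128];
    for w in 0..128 {
        for bit in 0..64 {
            let votes = meta_levels.iter().filter(|f| f[w] >> bit & 1 == 1).count();
            if votes >= majority {
                bundled[w] |= 1u64 << bit;
            }
        }
    }
    bundled.iter().zip(level0).map(|(a, b)| (a ^ b).count_ones()).sum()
}
```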

## 10. THE SCENT OF SELF-REFLECTION

Different meta-levels have different nibble histogram signatures because they bundle different types of content. Level 0 records (facts about the world) have a characteristic scent. Level 3 records (patterns about patterns about patterns) have a completely different scent.

The system can query "show me all my epiphanies" by filtering for the characteristic high-meta-level scent, without any explicit tagging or labeling. The system literally smells its own depth of self-reflection.

This is why the 48-byte nibble histogram matters: it preserves enough structure to distinguish fact from insight from meta-insight. The 5-byte XOR-fold scent would collapse all these distinctions. The histogram keeps them alive.
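A per-axis scent can be sketched as a 16-bucket nibble histogram, three of which give the 48 counts noted in the meta layout (W12-17). Saturating `u8` buckets and this exact layout are assumptions.

```rust
/// Count 4-bit nibble values across a dense axis: 128 words x 16
/// nibbles = 2048 nibbles into 16 saturating u8 buckets.
pub fn nibble_histogram(axis: &[u64; 128]) -> [u8; 16] {
    let mut h = [0u8; 16];
    for &w in axis.iter() {
        for shift in (0..64).step_by(4) {
            let nib = (w >> shift & 0xF) as usize;
            h[nib] = h[nib].saturating_add(1);
        }
    }
    h
}
```

Unlike an XOR-fold, the histogram survives bundling: two axes with different content mixes keep visibly different bucket profiles.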