A CPU-native, identity-bearing neuron substrate. No matmul on the critical path. No backprop. No GPU.
Each neuron is a 64-byte cache-line struct in a numpy array. Edges are CSR-laid-out synapses. Spreading activation + Hebbian + reward-modulated plasticity drives learning. The graph self-organizes from reward — same primitives proven across six distinct domains: RL games, sentence-retrieval LM, image classification, audio, video, and open-vocabulary text generation.
| Task | Result | Model size | Verification |
|---|---|---|---|
| TTT vs minimax (substrate value head) | 100% draws | 26 KB | full minimax search |
| TTT next-state world model | 95% accuracy | 26 KB | held-out trajectories |
| LM 20-sent corpus, real inference | 87% (11/20 perfect) | 478 KB | substrate-predicted sentence-id |
| LM sentence-id prediction from prompt | 95% (19/20) | — | — |
| MNIST full set (60K/10K, 10 epochs, 60s) | 88.3% | 501 KB | 100% match: spread() ≡ fast path |
| Audio 4-tone classification (synthetic) | 100% | ~10 KB | scale-invariant: any duration |
| Video 4-motion classification (synthetic) | 100% | ~30 KB | scale-invariant: any T, H, W |
| LLM TinyStories 30K, #A+#C combo | PPL 164 | 64 MB | 94% drop vs baseline (1684) — see PPL progression below |
| Training speed (4-core CPU, no GPU) | 580K pairs/s | — | numba JIT + prange parallel, 20× over Python |

| Approach (5K stories, V=2.5K) | PPL | Drop vs baseline |
|---|---|---|
| Baseline matmul | 1684 | — |
| #B substrate spread (top_k=20) | 1171 | 30% |
| #A pure unigram (α=0) | 282 | 83% |
| #A + #C (negsample K=3, σ=0.1) | 153 | 91% |
| #A + #C + #F kNN-LM (K=50, α=0.3) | 122 | 93% (best) |


| Stories | V | Baseline | #A+#C | +#F kNN | PPL/V (relative) |
|---|---|---|---|---|---|
| 5K | 2500 | 1684 | 153 | 122 | 4.9% |
| 30K | 4000 | 2721 | 170 | 104 | 2.6% |
| 100K | 6000 | — | 204 | 117 | 1.9% (best relative) |

At 30K stories, the full architecture (#A unigram backoff + #C neg sampling + #F kNN-LM) hits PPL 104 — into bigram-baseline territory (Jurafsky SLP3: ~30-100 for trigrams on similar-sized corpora). Substrate is now competitive with classical n-gram LMs on this corpus, running on CPU with no backprop and no GPU.
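
The #F kNN-LM step is named here but not spelled out. A minimal sketch, assuming it follows the standard kNN-LM recipe: retrieve the K nearest stored contexts, turn their distances into a distribution over the tokens that followed them, and mix that with the base distribution using weight α. The function name `knn_lm_mix`, the squared-L2 distance, the datastore layout, and putting α on the kNN side are all assumptions for illustration, not this repository's API.

```python
import numpy as np

def knn_lm_mix(p_model, query, keys, next_tokens, k=50, alpha=0.3):
    """Interpolate a base next-token distribution with a kNN distribution.

    p_model:     (V,) base-model probabilities
    query:       (D,) context vector for the current position
    keys:        (N, D) stored context vectors (hypothetical datastore)
    next_tokens: (N,) token id that followed each stored context
    """
    k = min(k, len(keys))
    d2 = np.sum((keys - query) ** 2, axis=1)    # squared L2 distance to every key
    nearest = np.argpartition(d2, k - 1)[:k]    # indices of the k closest keys
    logits = -d2[nearest]
    w = np.exp(logits - logits.max())           # softmax over negative distances
    w /= w.sum()
    p_knn = np.zeros_like(p_model)
    np.add.at(p_knn, next_tokens[nearest], w)   # accumulate weight per target token
    return (1.0 - alpha) * p_model + alpha * p_knn

# Toy usage with random data (assumed shapes only).
rng = np.random.default_rng(0)
V, D, N = 10, 4, 200
p_model = np.full(V, 1.0 / V)
keys = rng.normal(size=(N, D))
next_tokens = rng.integers(0, V, size=N)
p = knn_lm_mix(p_model, keys[0], keys, next_tokens, k=5)
```

Because both inputs are proper distributions, the mix stays normalized for any α in [0, 1].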
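Absolute PPL for #A+#C grows with vocabulary size (more cells in W to learn): 153 → 170 → 204. The relative measure PPL/V (the fraction of a uniform model's perplexity, which equals V) keeps dropping monotonically for the full stack: 4.9% → 2.6% → 1.9%. The substrate improves with scale.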
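
The relative measure is plain arithmetic: a model that guesses uniformly over V words has perplexity exactly V, so PPL/V says how far below uniform guessing the model sits. The sketch below recomputes it from the +#F kNN column of the scaling table; the helper name `relative_ppl` is made up for illustration.

```python
# (stories, vocab size V, PPL of the full #A + #C + #F stack), from the scaling table
runs = [("5K", 2500, 122), ("30K", 4000, 104), ("100K", 6000, 117)]

def relative_ppl(ppl, vocab_size):
    """PPL as a fraction of the uniform model's perplexity (which equals V)."""
    return ppl / vocab_size

ratios = [relative_ppl(ppl, v) for _, v, ppl in runs]
for (name, v, ppl), r in zip(runs, ratios):
    print(f"{name:>5}  V={v}  PPL={ppl}  PPL/V={r:.2%}")
```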
git clone https://github.com/tejasphatak/neuron-graph.git
cd neuron-graph
# 500 stories × 5 epochs, ~10 seconds end-to-end
PYTHONPATH=. python3 examples/tinystories_demo.py

Trains the substrate-LLM, prints PPL, runs cloze benchmark, generates samples. All on CPU. No GPU. No backprop.

| Approach | Result |
|---|---|
| #A + #B (matmul + spread) | No improvement over #A alone |
| #A + #C + #E (combined) | No compound; #C alone wins |
| #2 BPE tokenizer (5K, V=2500) | Worse: 1.92 bpc vs 1.71 word |
| #1 sentence-id binding (TinyStories) | Oracle helps, prompt-pred fails (overlapping prefixes) |

- #A unigram backoff: log-mix the W-context softmax with Laplace-smoothed unigram log-probs. Fixes the "near-uniform softmax for unseen contexts" problem.
- #B substrate-native spread(): build a sparse Brain from W's top-K edges and predict via spread() instead of matmul. 25% PPL drop at top_k=20 alone, but #A is the bigger lever.
- #C negative sampling: word2vec-style. For each (ctx, target) pair, weaken 3 random non-target edges. This forces W to be discriminative against random negatives. Combined with #A: best result.
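
A minimal sketch of #A and #C on a dense V×V weight matrix. Several details are assumptions: the weighted sum in log space as the form of the "log-mix", the name and default of `alpha` (chosen so that α=0 is pure unigram, matching the table), the fixed step size `lr`, and strengthening the target edge alongside the weakening the text describes. None of these function names come from the repository.

```python
import numpy as np

def log_softmax(z):
    z = z - z.max()
    return z - np.log(np.exp(z).sum())

def unigram_log_probs(counts, smoothing=1.0):
    """Laplace-smoothed unigram log-probabilities from raw token counts."""
    c = counts + smoothing
    return np.log(c / c.sum())

def backoff_log_probs(W, ctx, log_unigram, alpha=0.5):
    """#A sketch: mix context log-probs with unigram log-probs in log space.

    alpha=0 ignores W entirely (pure unigram); alpha=1 ignores the unigram.
    """
    mixed = alpha * log_softmax(W[ctx]) + (1.0 - alpha) * log_unigram
    return log_softmax(mixed)  # renormalize so probabilities sum to 1

def negative_sampling_step(W, ctx, target, rng, k=3, lr=0.1):
    """#C sketch: weaken k random non-target edges out of ctx.

    Strengthening ctx->target in the same step is an assumption.
    """
    V = W.shape[1]
    candidates = np.delete(np.arange(V), target)       # every token except the target
    negatives = rng.choice(candidates, size=k, replace=False)
    W[ctx, target] += lr
    W[ctx, negatives] -= lr
    return negatives

# Toy usage: 6-word vocabulary, one update, then a prediction for the same context.
rng = np.random.default_rng(0)
V = 6
W = np.zeros((V, V))
log_uni = unigram_log_probs(np.array([5, 3, 1, 1, 0, 0], dtype=float))
negs = negative_sampling_step(W, ctx=2, target=4, rng=rng)
lp = backoff_log_probs(W, ctx=2, log_unigram=log_uni, alpha=0.5)
```

For a context row that W has never updated, the context softmax is uniform and the mix reduces to the unigram distribution, which is the failure mode #A is meant to cover.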
Six distinct domains, same substrate primitives, no architectural change.
"once upon a time" → "once upon a time, there was a little girl named lily.
she loved dance on the stage in front of her yard"
"the little girl" → "the little girl was playing in the toy box and wanted
to see a shiny rock or. the two sisters were sisters"
"tom and" → "tom and mia was restless. he wanted to peek with a
bow in the park. she was always telling her"
The samples show proper nouns (lily, mia, tom), subject-verb-object structure, and short-range narrative coherence, generated from substrate edges with no matmul on the critical path and no backprop.
Each neuron = 64-byte cache-line struct, each synapse = 16 bytes. The MNIST model (501 KB) is:
- roughly the same size as a JPEG photograph
- 2,000× smaller than Gemma 4 (smallest on-device LLM, ~1 GB)
- 2,000,000× smaller than GPT-4 (~1 TB rumored)
The MNIST classifier reaches 88% accuracy in half a megabyte. The TTT player reaches 100% draws vs perfect-play minimax in 26 KB.
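
The size arithmetic can be sketched with NumPy structured dtypes. Only the two totals (64 bytes per neuron, 16 bytes per synapse) come from this README; every field name below is a hypothetical layout chosen to hit those totals, not the actual contents of `neuron.py`.

```python
import numpy as np

# Hypothetical field layout: only the 64-byte total comes from the README.
NEURON_DTYPE = np.dtype([
    ("activation", np.float32),
    ("threshold",  np.float32),
    ("bias",       np.float32),
    ("decay",      np.float32),
    ("edge_start", np.uint32),        # offset of the first outgoing synapse (CSR row start)
    ("edge_count", np.uint32),
    ("group_id",   np.uint32),
    ("flags",      np.uint32),
    ("reserved",   np.uint8, (32,)),  # padding up to 64 bytes
])

# Hypothetical 16-byte synapse record.
SYNAPSE_DTYPE = np.dtype([
    ("target",   np.uint32),
    ("weight",   np.float32),
    ("momentum", np.float32),
    ("variance", np.float32),
])

def model_bytes(n_neurons, n_synapses):
    """Storage for the neuron array plus the synapse array."""
    return n_neurons * NEURON_DTYPE.itemsize + n_synapses * SYNAPSE_DTYPE.itemsize

neurons = np.zeros(1000, dtype=NEURON_DTYPE)
print(NEURON_DTYPE.itemsize, SYNAPSE_DTYPE.itemsize, model_bytes(1000, 10_000))
```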
- LM: starting from only POS class membership + grammar shape (no co_occurs taught), RL grows the routing graph from reward alone: 28% cold-start → 89% with curriculum + sentence-id binding. Substrate-native retrieval baked into the graph: given a prompt, the substrate identifies the source sentence (95% accuracy) and routes generation accordingly.
- MNIST: substrate's general `spread()` is overkill for feed-forward topology; added a fast dense-matmul path that uses the same edge weights. Verified `spread()` and `fast_predict` produce identical predictions (200/200 match). Substrate IS the model; the dense view is just a faster layout.
- Optimizers: Adam-style per-edge momentum + adaptive LR applied to substrate Hebbian/perceptron deltas. Not "Adam over backprop": same convergence tricks, no gradients.
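
A minimal sketch of what "Adam-style smoothing without gradients" could look like: the usual first and second moment estimates, fed with a locally computed delta (here reward × pre × post) instead of a backpropagated gradient. The class name `EdgeAdam`, its hyperparameters, and the per-edge array layout are assumptions, not the repository's optimizer.

```python
import numpy as np

class EdgeAdam:
    """Adam-style smoothing of locally computed edge deltas (no gradients)."""

    def __init__(self, n_edges, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = np.zeros(n_edges)   # running mean of deltas (momentum)
        self.v = np.zeros(n_edges)   # running mean of squared deltas
        self.t = 0

    def step(self, weights, delta):
        """Move `weights` in the direction of `delta`, the wanted change per edge."""
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * delta
        self.v = self.beta2 * self.v + (1 - self.beta2) * delta ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)   # bias correction
        v_hat = self.v / (1 - self.beta2 ** self.t)
        weights += self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
        return weights

# Reward-modulated Hebbian delta for three edges: only edge 0 has pre and post both active.
pre = np.array([1.0, 0.0, 1.0])    # presynaptic activity per edge
post = np.array([1.0, 1.0, 0.0])   # postsynaptic activity per edge
reward = 1.0
weights = np.zeros(3)
opt = EdgeAdam(n_edges=3)
opt.step(weights, reward * pre * post)
```

The delta is added rather than subtracted because it already points in the desired direction; there is no loss being descended.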
git clone https://github.com/tejasphatak/neuron-graph.git
cd neuron-graph
# Run all tests (~123 tests, ~6 sec)
python3 -m pytest -q
# Smallest LM generation test (qualitative teach + spread)
PYTHONPATH=. python3 brain/tasks/lm/tiny.py
# 20-sentence RL scaling experiment
PYTHONPATH=. python3 brain/tasks/lm/scaling_experiment.py
# Full MNIST (60K/10K, ~60 sec on commodity CPU)
PYTHONPATH=. TRAIN_N=60000 TEST_N=10000 EPOCHS=10 OPT=adam \
    python3 brain/tasks/mnist/experiment.py

brain/                      substrate primitives (modality-agnostic)
  neuron.py                 64-byte cache-line struct
  store.py                  Brain: nodes, synapses, aliases, relations
  spread.py                 activation cycle (the "thinking" primitive)
                            goal injection, working memory, group-aware sparsity
  learn.py                  Hebbian co-activation update
  modulator.py              global plasticity scalar (dopamine analog)
  replay.py                 episode buffer + consolidate
  trace.py                  per-event log (every spread/update inspectable)
  working_memory.py         sustained activation with positional decay
brain/tasks/ttt/            RL games (proven domain)
  game, world_model, planner, value_head, curriculum
brain/tasks/lm/             language modeling
  tiny.py                   qualitative teach + 3 generators
  rl.py                     teach_minimal, train_rl, train_rl_curriculum
                            predict_sentence_id, btsp_credit
  scaling_experiment.py
brain/tasks/mnist/          vision / classification
  encoder.py                scale-invariant ImageEncoder (any image size)
  mnist.py                  build_mnist_brain, train_step, predict, evaluate
  fast.py                   dense forward+backward (Adam supported)
                            verify_substrate_learning (substrate ≡ fast path)
brain/tasks/audio/          audio classification
  encoder.py                AudioEncoder: 1D signal → spectrogram → fixed grid
                            any duration / sample rate → fixed substrate input
brain/tasks/video/          video classification
  encoder.py                VideoEncoder: T×H×W → uniform-sample frames → grid
                            any T / H / W → fixed substrate input
- No matmul on the critical path. Spread is sparse graph traversal. (Fast paths for feed-forward topologies use matmul as a layout optimization; substrate stays the source of truth.)
- No backprop. Local Hebbian + reward-modulated plasticity, perceptron rule, Adam-style smoothing.
- Identity-bearing neurons. Each neuron is a concept, position, sentence-id, pixel. Not a tensor slot. Carries semantic meaning.
- Inspectable. Every emission has a traceable spreading path.
- Modality-agnostic substrate. Tasks bring encoders + reward; substrate is generic.
- Verifiable. Whenever there's a fast path that bypasses substrate primitives, `verify_*` functions confirm the substrate produces identical outputs.
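
The first and last principles can be illustrated together: one hop of sparse spreading over CSR-laid-out edges, checked against a dense matmul built from the same weights. This is a simplified sketch under assumptions (single hop, no decay, no goal injection or working memory, non-negative activations); `spread_once` and `dense_view` are illustrative names, not the repository's `spread()` or `verify_*` functions.

```python
import numpy as np

def spread_once(indptr, indices, weights, activation, threshold=0.0):
    """Push activation from each active neuron along its outgoing CSR edges."""
    out = np.zeros_like(activation)
    for src in np.flatnonzero(activation > threshold):   # visit active neurons only
        lo, hi = indptr[src], indptr[src + 1]
        np.add.at(out, indices[lo:hi], activation[src] * weights[lo:hi])
    return out

def dense_view(indptr, indices, weights, n):
    """Dense n x n matrix built from the same edge weights (a faster layout)."""
    W = np.zeros((n, n))
    for src in range(n):
        lo, hi = indptr[src], indptr[src + 1]
        np.add.at(W[src], indices[lo:hi], weights[lo:hi])
    return W

# Toy graph: 0 -> 1 (0.5), 0 -> 2 (0.25), 1 -> 3 (1.0), 2 -> 3 (2.0)
indptr = np.array([0, 2, 3, 4, 4])
indices = np.array([1, 2, 3, 3])
weights = np.array([0.5, 0.25, 1.0, 2.0])

a0 = np.array([1.0, 0.0, 0.0, 0.0])                # only neuron 0 is active
a1 = spread_once(indptr, indices, weights, a0)     # first hop
a2 = spread_once(indptr, indices, weights, a1)     # second hop
W = dense_view(indptr, indices, weights, n=4)

# Verification in the spirit of the principle above: sparse and dense paths must agree.
assert np.allclose(a1, a0 @ W)
assert np.allclose(a2, a1 @ W)
```

The sparse path touches only the edges of active neurons, while the dense path multiplies the full matrix; the equality check is what lets the dense view be treated as a layout rather than a second model.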
- ✅ Modality polymorphism across 5 distinct domains:
  - RL games (TTT): 100% draws vs minimax
  - sequence (LM): 87% real-inference at 20-sent scale
  - vision (MNIST): 88.3% on full set
  - audio (4-tone classification): 100% on synthetic
  - video (motion patterns): 100% on synthetic
- ✅ RL self-correction grows the routing graph from reward (LM: 28% → 89%)
- ✅ Substrate's edges genuinely encode the learning (MNIST: 100% spread/fast match)
- ✅ Adam-style optimizer applied to substrate edge deltas (no gradients)
- ✅ Curriculum + replay + sentence-id binding compound at scale
- ✅ CPU-only on commodity hardware, ~60 sec for full MNIST
- Scaling LM beyond 20 sentences (sentence-id binding may or may not keep scaling)
- MNIST beyond 88% — would need richer encoding (multi-bin levels actually used, receptive-field patches, or substrate-learned hierarchy)
- Open-ended LM generation without teacher-forced POS sequence
- Phase C of the original plan: mmap + multi-core spread for billion-neuron substrate (designed but unimplemented)
- BTSP-inspired credit propagation (Magee 2017) — biologically grounded bidirectional reward propagation. Didn't fit LM's per-step reward (variance bleeds backward and weakens correct edges). Kept as opt-in for tasks with single-plateau reward (RL games, navigation).
- Group sparsity in spread() — modality-agnostic mechanism. Tested on LM, regressed accuracy because pruning during spread cuts credit-assignment signal. Mechanism kept; useful for other tasks.
- TTT self-play — diverges in zero-sum games without MCTS or population methods. Documented as known limit.
- `brain/tasks/ttt/PROBE_RESULTS.md`: full TTT empirical findings
- Commit log: every commit documents what was tested and learned, including negative results
MIT — research code. Use it, fork it, build on it.