TriMemory -- Memory Architecture for LLM Agents

Three-path memory layer for LLM agents: a KV window for recent tokens, a retrieval index for archived chunks, and a TRN recurrent state for compressed long-range patterns. The TRN state is 8 KB per agent in the benchmark configuration below. Throughput stays flat out to a 10,000-token history, where a Transformer KV cache needs 156 MB (fp32) and runs about 15x slower.

KV window -- recent tokens, exact attention
Retrieval index -- archived chunks, cosine search over hidden states
TRN state -- compressed patterns and periodicity, constant size (8--96 KB)

A learned 3-way softmax gate mixes all three paths per token.

TRN vs Transformer+KV (CPU, d=256, L=8, K=128)

History	TRN (tps)	TF+KV (tps)	TRN State	KV Cache (fp32)
1,000	240	73.8	8 KB	15.6 MB
5,000	244	35.9	8 KB	78.1 MB
10,000	231	15.5	8 KB	156.3 MB

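TRN state is O(K) in the oscillator count K, not O(T) in the history length T. TRN throughput stays flat at roughly 231--244 tokens per second (tps), while Transformer+KV falls from 73.8 to 15.5 tps as history grows from 1,000 to 10,000 tokens.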
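The KV Cache column can be reproduced from the benchmark shape. This is a minimal sketch, assuming a dense cache that stores one key vector and one value vector of width d per layer per token in fp32, and assuming "MB" in the table means 2^20 bytes. The helper name kv_cache_bytes is illustrative and is not part of the trimemory package.

```python
def kv_cache_bytes(history_len: int, d_model: int, n_layers: int,
                   bytes_per_elem: int = 4) -> int:
    """Bytes for a dense KV cache: keys + values, every layer, every token."""
    return 2 * n_layers * history_len * d_model * bytes_per_elem


MIB = 2 ** 20

# Benchmark shape from the table heading: d=256, L=8, fp32 (4 bytes per element).
for history_len in (1_000, 5_000, 10_000):
    size_mib = kv_cache_bytes(history_len, d_model=256, n_layers=8) / MIB
    print(f"{history_len:>6} tokens -> {size_mib:.3f} MiB")
```

Under these assumptions the three rows come out at 15.625, 78.125 and 156.25 MiB, which agrees with the table after rounding. The cache grows linearly with history length, while the TRN column does not depend on it.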

Quick Demo

git clone https://github.com/amabito/tri-memory.git
cd tri-memory
pip install -e ".[dev]"
pytest  # 277 tests

TriMemoryEngine -- training loop

import torch
from trimemory import TRNConfig
from trimemory.tri_memory import TriMemoryEngine

cfg = TRNConfig(
    vocab_size=8192, d_model=128, n_oscillators=64,
    n_layers=4, d_ff=512, max_seq_len=1024,
)
model = TriMemoryEngine(
    cfg,
    window_size=64,           # KV window: last 64 tokens
    chunk_size=32,            # eviction granularity
    max_retrieval_chunks=256, # retrieval index capacity
    enable_trn=True,
    enable_retrieval=True,
)

ids = torch.randint(0, cfg.vocab_size, (1, 512))
out = model(ids, labels=ids)
print(f"loss: {out['loss']:.4f}")

mem = model.memory_summary()
print(f"TRN state: {mem['trn_state_bytes']} bytes (constant)")
print(f"KV window: {mem['kv_window_bytes']} bytes (bounded)")

AgentMemory -- stateful per-token streaming

from trimemory import TRNConfig
from trimemory.agent_memory import AgentMemory

mem = AgentMemory(TRNConfig.toy(), device="cpu")

# Feed tokens. State is O(K) -- no KV cache.
mem.add_tokens([1, 2, 3, 4, 5])

state = mem.get_state()
print(f"TRN state: {mem.state_size_bytes()} bytes, "
      f"position: {state['position']}")

# Save and restore across agent turns.
mem.save("turn1.pt")
mem.load("turn1.pt")

Standalone TRN

from trimemory import TRNConfig, TRNModel
import torch

cfg = TRNConfig.trn_100m()
model = TRNModel(cfg)

prompt = torch.randint(0, cfg.vocab_size, (1, 16))
tokens = model.generate(prompt, max_new_tokens=128)
# O(1) memory per step. No KV cache.
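The constant-memory comment above follows from the recurrent form: each step overwrites a fixed-size state instead of appending to a cache. The sketch below is a generic damped-oscillator linear recurrence in pure Python, not the TemporalResonanceLayer implementation. The update rule, the decay value, and every name in it are assumptions chosen for illustration.

```python
import cmath


def step(state: list[complex], x: float, freqs: list[float],
         decay: float = 0.99) -> list[complex]:
    """One linear-recurrence step: rotate and damp each oscillator, then add the input."""
    return [decay * cmath.exp(1j * w) * s + x for s, w in zip(state, freqs)]


K = 4                                      # number of oscillators (hypothetical)
freqs = [0.1 * (k + 1) for k in range(K)]  # fixed angular frequencies (hypothetical)
state = [0j] * K                           # K complex numbers, independent of history length

# Feed a long synthetic input stream with period 7.
for t in range(10_000):
    state = step(state, x=float(t % 7 == 0), freqs=freqs)

print(len(state))  # the state still holds K entries after 10,000 inputs
```

In this toy form, abs(s) and cmath.phase(s) of each entry play the role of an amplitude and a phase. The point is only that memory stays at K entries however many tokens have been consumed.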

How it differs from RAG

Aspect	Standard RAG	TriMemory
Retrieval basis	Semantic similarity	Authority chain + semantic similarity
Handles amendment override	No (both chunks retrieved equally)	Yes (structured authority resolution)
Memory per agent at 10k context	O(context) -- 156 MB KV cache (fp32)	O(1) -- 8 KB TRN state
Throughput at 10k context	15.5 tps (TF+KV, CPU)	231 tps (TRN, CPU)
Content-addressed retrieval	Yes	No -- TRN recall is 0.0 (honest limitation)
Status	Production	Alpha (toy-scale models, N=10 benchmark)

Authority resolution is off by default. It requires use_compact_memory_packet=True and document metadata.

Architecture

Input
 |-- KV window (last W tokens, exact attention)
 |-- Retrieval index (archived chunks, cosine search)
 |-- TRN state (compressed patterns, constant size)
 |
 v
3-way gate: [g_kv, g_trn, g_ret] = softmax(W_gate * x)
  out = g_kv * kv_out + g_trn * trn_out + g_ret * ret_out
 |
 v
FFN -> logits
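The gate step in the diagram can be sketched for a single token as follows. This is a pure-Python illustration, assuming one scalar gate weight per path; the function names and the example logits are hypothetical and do not come from the trimemory code.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a short list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_paths(gate_logits, kv_out, trn_out, ret_out):
    """Blend three path outputs with softmax gate weights (one token)."""
    g_kv, g_trn, g_ret = softmax(gate_logits)
    return [g_kv * a + g_trn * b + g_ret * c
            for a, b, c in zip(kv_out, trn_out, ret_out)]


# Hypothetical logits standing in for W_gate * x at one token position.
gate_logits = [2.0, 0.5, -1.0]
weights = softmax(gate_logits)
mixed = mix_paths(gate_logits, [1.0, 0.0], [0.0, 1.0], [0.5, 0.5])
print(weights, mixed)
```

Because the weights come from a softmax they are positive and sum to 1, so the mixed output is a convex combination of the three path outputs.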

Path	What it stores	Size	Access
KV window	Recent W tokens	O(W) per layer	Exact attention
Retrieval	Archived chunks with hidden states	Fixed capacity (default 256 chunks)	Cosine similarity search
TRN state	Compressed history (amplitude, phase, frequency)	O(K) per layer, constant	Linear recurrence
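The retrieval row's cosine similarity search can be illustrated with a small top-k lookup. This is a sketch under assumptions: the index layout (chunk id mapped to one vector per chunk), the function names, and the example vectors are all hypothetical, and the real RetrievalIndex may pool or store hidden states differently.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, index, k=2):
    """Return the k (chunk_id, score) pairs most similar to the query."""
    scored = [(chunk_id, cosine(query, vec)) for chunk_id, vec in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Hypothetical index: chunk id -> one summary vector for that chunk.
index = {
    "chunk-0": [1.0, 0.0, 0.0],
    "chunk-1": [0.7, 0.7, 0.0],
    "chunk-2": [0.0, 0.0, 1.0],
}
hits = top_k([1.0, 0.1, 0.0], index, k=2)
print(hits)
```

With these vectors the query is closest to chunk-0, then chunk-1, and chunk-2 is orthogonal to it.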

Tokens enter the KV window. Every C tokens (the chunk size), the oldest chunk is evicted and scored for saliency. High-saliency chunks go to the retrieval index; the rest are dropped from exact storage. The TRN state updates on every token regardless of eviction. The gate then weights each path's output per token.
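The eviction flow described above can be sketched with a deque. This is a toy model under stated assumptions: eviction fires once a full chunk has accumulated beyond the window, the archive drops its oldest chunk when full, and saliency is a made-up scoring function. The class name, threshold, and scoring rule are hypothetical, and the TRN update is not modeled.

```python
from collections import deque


class WindowWithArchive:
    """Toy KV window that evicts fixed-size chunks into a bounded archive."""

    def __init__(self, window_size, chunk_size, max_chunks, threshold, score_fn):
        self.window = deque()
        self.window_size = window_size
        self.chunk_size = chunk_size
        self.archive = deque(maxlen=max_chunks)  # oldest archived chunk drops first
        self.threshold = threshold
        self.score_fn = score_fn

    def add_token(self, token):
        self.window.append(token)
        # Evict once a full chunk has accumulated beyond the window.
        if len(self.window) >= self.window_size + self.chunk_size:
            chunk = [self.window.popleft() for _ in range(self.chunk_size)]
            # Only chunks that score at or above the threshold are archived.
            if self.score_fn(chunk) >= self.threshold:
                self.archive.append(chunk)


def rare_fraction(chunk):
    """Hypothetical saliency: fraction of 'rare' tokens (ids >= 100) in the chunk."""
    return sum(t >= 100 for t in chunk) / len(chunk)


mem = WindowWithArchive(window_size=8, chunk_size=4, max_chunks=2,
                        threshold=0.5, score_fn=rare_fraction)
for tok in [1, 2, 3, 4, 100, 101, 102, 103] + list(range(5, 21)):
    mem.add_token(tok)

print(list(mem.window), list(mem.archive))
```

After the loop the window holds the last 8 tokens (13 to 20) and the archive holds only the chunk of rare tokens; the low-saliency chunks were evicted without being archived.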

Benchmark Results

V5 A/B/C/D (Seeds 1--10, 3000 steps)

Config	Composite	Strength	Caveat
A (KV only)	0.263	Baseline	No long-range memory
B (KV+TRN)	0.457	Pattern detection 0.678	2/10 seeds stuck on pattern (D recovers)
C (KV+Ret)	0.369	Old fact recall 0.433	No pattern capability
D (Full)	0.676	Pattern 0.805, Old fact 0.719	Toy scale only (d=128, L=4)

D >= max(A,B,C) in 10/10 seeds (mean delta +0.165). H1--H4 all PASS. See docs/FINAL_VERDICT.md.

Multi-Agent Scaling (trn_100m, T=1000)

Agents	TRN Total	KV Total (fp32)	Ratio
10	0.16 MB	312 MB	2,000x
100	1.56 MB	3,125 MB	2,000x
1,000	15.6 MB	31,250 MB	2,000x

These ratios are config-specific. Storing the KV cache in bf16 instead of fp32 halves the ratio. See docs/PUBLIC_CLAIMS.md.
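The scaling table is linear in the agent count, so it can be rebuilt from per-agent footprints. The sketch below infers those footprints from the first table row (totals divided by 10 agents, about 16 KiB of TRN state and 31.25 MiB of fp32 KV cache); the inferred values, the constant names, and the helper are assumptions for illustration, not figures taken from the code.

```python
KIB = 2 ** 10
MIB = 2 ** 20

# Per-agent footprints inferred from the first table row (totals / 10 agents).
TRN_BYTES_PER_AGENT = 16 * KIB          # about 0.16 MB / 10
KV_BYTES_PER_AGENT_FP32 = 31.25 * MIB   # about 312 MB / 10


def totals(n_agents: int, kv_bytes_per_elem: int = 4):
    """Return (TRN total MiB, KV total MiB, KV/TRN ratio) for n_agents."""
    kv_per_agent = KV_BYTES_PER_AGENT_FP32 * kv_bytes_per_elem / 4
    trn_total = n_agents * TRN_BYTES_PER_AGENT
    kv_total = n_agents * kv_per_agent
    return trn_total / MIB, kv_total / MIB, kv_total / trn_total


for n in (10, 100, 1_000):
    fp32 = totals(n)
    bf16_ratio = totals(n, kv_bytes_per_elem=2)[2]
    print(n, fp32, bf16_ratio)
```

With these inferred footprints the ratio is 2,000x at every agent count for fp32 and 1,000x for a 2-byte KV cache, consistent with the note that bf16 halves it.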

PolicyBench (N=10)

Document-authority QA over a Japanese corporate IT security policy: 10 samples covering 9 evaluation types, including authority resolution, amendment override, scope-dependent values, and transition states. Evaluation on an English-language corpus is planned.

data/policybench/policy_v1.jsonl

Each sample has multi-document context (base policy + amendments + circulars), a query, gold answer, authority chain, and failure class annotation.
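As a rough picture of that sample shape, the sketch below models one record with dataclasses. Every field name, the ordering of the authority chain (most authoritative first), and the example record itself are invented for illustration; they are not read from policy_v1.jsonl and the real schema may differ.

```python
from dataclasses import dataclass


@dataclass
class PolicyDoc:
    doc_id: str
    kind: str   # "base", "amendment" or "circular" (hypothetical labels)
    text: str


@dataclass
class PolicySample:
    documents: list[PolicyDoc]
    query: str
    gold_answer: str
    authority_chain: list[str]  # doc ids, most authoritative first (assumed ordering)
    failure_class: str


def authoritative_doc(sample: PolicySample) -> PolicyDoc:
    """Return the document at the head of the authority chain."""
    by_id = {doc.doc_id: doc for doc in sample.documents}
    return by_id[sample.authority_chain[0]]


# Invented example record, not taken from the benchmark data.
sample = PolicySample(
    documents=[
        PolicyDoc("base-1", "base", "Passwords rotate every 90 days."),
        PolicyDoc("amend-1", "amendment", "Passwords rotate every 180 days."),
    ],
    query="How often must passwords rotate?",
    gold_answer="Every 180 days.",
    authority_chain=["amend-1", "base-1"],
    failure_class="amendment_override",
)
print(authoritative_doc(sample).doc_id)
```

In an amendment-override case like this one, a purely similarity-based retriever sees two near-identical chunks, while the authority chain identifies which one governs the answer.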

VERONICA Integration

TriMemory handles memory architecture. Runtime containment (budget enforcement, circuit breaking, policy governance) is handled by VERONICA-core. Together: governed knowledge execution for LLM agents.

Known Limitations

TRN cannot perform content-addressed retrieval. Selective copy accuracy is 8.8% vs 96.2% for the Transformer, and Needle-in-Haystack recall is 0.0. This is a structural property of linear recurrence.
All experiments use toy-scale models (1--100M parameters). Scaling behavior at 1B+ is unknown.
B config (KV+TRN) shows seed-dependent pattern failure (2/10 stuck), though D recovers.
PolicyBench is N=10. Validation on larger corpora is future work.
CompactMemoryPacket (authority resolution, conflict detection) is off by default. It requires use_compact_memory_packet=True and document metadata with authority chains.
Retrieval index is not persistent across process restarts.
Gate telemetry is diagnostic only -- gate ratios vary by input distribution.
Alpha status. No production deployment.

See docs/TRN_LIMITATIONS.md and docs/PUBLIC_CLAIMS.md.

Repository Structure

src/trimemory/
    tri_memory.py    TriMemoryEngine (KV + TRN + Retrieval)
    retrieval.py     RetrievalIndex (bag/hidden/hybrid search)
    model.py         TRNModel (standalone)
    resonance.py     TemporalResonanceLayer (oscillator recurrence)
    baseline.py      TransformerModel (A/B comparison)
    saliency.py      SaliencyArchiver (chunk scoring)
    router.py        Retrieval router / gate
    config.py        TRNConfig (toy, trn_100m, trn_400m, trn_1b)
    agent_memory.py  Streaming agent inference wrapper

scripts/
    eval_go_no_go.py                   Go/No-Go gate evaluation
    run_trimemory_streaming_eval.py    Streaming evaluation with telemetry
    bench_phase7_gpu.py                GPU benchmark (TRN vs TF+KV)

data/policybench/
    policy_v1.jsonl                    PolicyBench N=10

tests/     277 unit tests
docs/      Architecture, limitations, public claims audit

License

Apache License 2.0. See LICENSE.
