ramene/memory-oracle

memory-oracle

An accretive, evidence-bound memory substrate for AI agents. Plain markdown + dated amendment records + a deterministic merge — no vectors, no fine-tuning, no model in the retrieval loop.

What this is

memory-oracle is the reference implementation of Evidence-Bound Retrieval (EBR) — a substrate that binds every retrieval to the most recent operator-authored evidence by construction. Not by similarity. Not by a trained critic. Not by reinforcement signal.

It is offered as a concrete realization of the episodic memory layer in CoALA (Sumers, Yao, Narasimhan, Griffiths — TMLR 2024) — Princeton's four-memory taxonomy for language agents (working / semantic / procedural / episodic).

Papers

Companion CTA post: From Forgetting to Amending.

The problem EBR solves

Agents quote stale memory files because nothing tells them the file is wrong.

A patient is on warfarin per a 2008 chart note. The 2024 cardiology consult switched her to apixaban — vitamin K does not reverse apixaban. Both notes are in the chart. The patient presents to the ER with active bleeding and the team asks the AI-augmented EHR for the reversal protocol. Vector RAG ranks the 2008 note higher because the older note is longer and the lexical overlap with the query is stronger. The team orders FFP and vitamin K. Neither works.

This is the Bad Write-Back failure mode. Vector embeddings don't help — they retrieve the same stale file with high cosine similarity. Re-writing the canonical note destroys provenance. The right primitive is not a better retriever; it is a different file layout.

The mechanism — amendment records

When the cardiologist makes the change, they write one JSON line into a sidecar beside the canonical note:

~/.claude/projects/<project>/memory/
├── medication_anticoagulant.md                     # canonical, never edited
└── medication_anticoagulant.md.amendments.jsonl    # corrections, append-only

Each sidecar line records one dated, operator-authored correction:

{
  "amended_at":           "2026-03-14T11:02:00Z",
  "amended_by":           "Dr. Reyes, MD",
  "superseded_assertion": "Patient is on warfarin 5 mg/day.",
  "corrected_assertion":  "Patient transitioned to apixaban 5 mg BID on 2026-03-12.",
  "live_evidence":        "EHR/encounter/E-71412/note-2.txt#L42",
  "operator_confirmed":   true
}
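The README shows two slightly different record shapes. The example above uses a string for live_evidence and true for operator_confirmed, while the Usage heredoc later writes an array and a timestamp. The following plain-JavaScript sketch reads a sidecar line by line and checks each record. It is not part of memory-oracle: parseAmendments is a hypothetical name, treating amended_at, superseded_assertion, corrected_assertion and live_evidence as required is an assumption, and it deliberately accepts both shapes the README shows.

```javascript
// Hypothetical validator for an amendments sidecar (one JSON object per line).
// Which fields are required is an assumption drawn from the README examples.
const REQUIRED_TEXT_FIELDS = ["amended_at", "superseded_assertion", "corrected_assertion"];

function parseAmendments(jsonlText) {
  const records = [];
  const errors = [];
  jsonlText.split("\n").forEach((line, i) => {
    const where = `line ${i + 1}`;
    if (line.trim() === "") return; // tolerate blank lines, e.g. a trailing newline
    let rec;
    try {
      rec = JSON.parse(line);
    } catch {
      errors.push(`${where}: not valid JSON`);
      return;
    }
    if (!rec || typeof rec !== "object" || Array.isArray(rec)) {
      errors.push(`${where}: not a JSON object`);
      return;
    }
    const missing = REQUIRED_TEXT_FIELDS.find((f) => typeof rec[f] !== "string" || rec[f] === "");
    if (missing) {
      errors.push(`${where}: missing or empty "${missing}"`);
      return;
    }
    if (Number.isNaN(Date.parse(rec.amended_at))) {
      errors.push(`${where}: amended_at is not a parseable timestamp`);
      return;
    }
    // live_evidence: a string (record example) or a non-empty array of strings (Usage heredoc)
    const ev = rec.live_evidence;
    const evidenceOk =
      (typeof ev === "string" && ev !== "") ||
      (Array.isArray(ev) && ev.length > 0 && ev.every((x) => typeof x === "string"));
    if (!evidenceOk) {
      errors.push(`${where}: live_evidence must be a string or an array of strings`);
      return;
    }
    // operator_confirmed: true (record example) or an ISO timestamp (Usage heredoc)
    const oc = rec.operator_confirmed;
    const confirmed = oc === true || (typeof oc === "string" && !Number.isNaN(Date.parse(oc)));
    if (!confirmed) {
      errors.push(`${where}: not operator-confirmed`);
      return;
    }
    records.push(rec);
  });
  return { records, errors };
}

const sidecarText = [
  JSON.stringify({
    amended_at: "2026-03-14T11:02:00Z",
    amended_by: "Dr. Reyes, MD",
    superseded_assertion: "Patient is on warfarin 5 mg/day.",
    corrected_assertion: "Patient transitioned to apixaban 5 mg BID on 2026-03-12.",
    live_evidence: "EHR/encounter/E-71412/note-2.txt#L42",
    operator_confirmed: true,
  }),
  '{"amended_at": "2026-03-15T09:00:00Z"', // truncated line: invalid JSON
].join("\n");

const { records, errors } = parseAmendments(sidecarText);
console.log(records.length, errors); // 1 [ 'line 2: not valid JSON' ]
```

Rejecting a malformed line instead of silently skipping it keeps a bad write visible, which matters when the sidecar is the only record of a correction.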

When the retrieval CLI reads the file, it merges the amendments into the output before the canonical body. Any sequential reader — human or LLM — encounters the correction first. The canonical text is preserved verbatim, so an auditor in 2030 can see exactly what was once believed and exactly when it was corrected.

## ⚠ Amendment Notice (1 record)

### Amendment 1 — 2026-03-14T11:02:00Z
Corrected assertion: Patient transitioned to apixaban 5 mg BID on 2026-03-12.
Live evidence: EHR/encounter/E-71412/note-2.txt#L42
Amended by: Dr. Reyes, MD
---
[canonical file content — preserved verbatim, read with the corrections above in mind]

The precedence is structural, not statistical (Theorem 1 of the position paper): the merge routine prepends amendments by construction. No critic forward pass, no similarity tiebreak, no reinforcement loop.
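A minimal sketch of the merge step in plain JavaScript. It illustrates the precedence invariant described above and is not the code in memory-merge.mjs: mergeWithAmendments is a hypothetical name, amendments are kept in sidecar (append) order, and a colon after the amendment number stands in for the dash in the sample notice. All that matters for precedence is position. Every amendment line is emitted before the canonical body, and the body is appended unchanged.

```javascript
// Hypothetical amendment-first merge, loosely following the sample notice.
function mergeWithAmendments(canonicalText, amendments) {
  if (amendments.length === 0) return canonicalText; // no sidecar records: body unchanged
  const noun = amendments.length === 1 ? "record" : "records";
  const out = [`## ⚠ Amendment Notice (${amendments.length} ${noun})`, ""];
  amendments.forEach((a, i) => {
    // live_evidence may be a single string or an array of strings
    const evidence = Array.isArray(a.live_evidence) ? a.live_evidence.join(", ") : a.live_evidence;
    out.push(`### Amendment ${i + 1}: ${a.amended_at}`);
    out.push(`Corrected assertion: ${a.corrected_assertion}`);
    if (evidence) out.push(`Live evidence: ${evidence}`);
    if (a.amended_by) out.push(`Amended by: ${a.amended_by}`);
  });
  out.push("---");
  // Precedence is positional: every amendment line precedes the first byte
  // of the canonical body, which is appended verbatim.
  return out.join("\n") + "\n" + canonicalText;
}

const canonical = "Patient is on warfarin 5 mg/day.\n";
const merged = mergeWithAmendments(canonical, [
  {
    amended_at: "2026-03-14T11:02:00Z",
    amended_by: "Dr. Reyes, MD",
    corrected_assertion: "Patient transitioned to apixaban 5 mg BID on 2026-03-12.",
    live_evidence: "EHR/encounter/E-71412/note-2.txt#L42",
  },
]);
console.log(merged);
```

No scoring happens here. A sequential reader cannot reach "warfarin" without first passing "apixaban", whatever a retriever thinks of either string.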

Architecture

canonical Markdown file  +  amendment sidecar (JSONL)
                        ↓
                memory-merge.mjs   (precedence invariant: amendments prepended)
                        ↓
              SQLite FTS5 index    ($MEMORY_INDEX_DB)
                        ↓
       memory-search (BM25, Tier 1)  |  memory-cite (forensic, Tier 2)
                        ↓
                Claude Code agent context
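The merge runs before indexing. As a result, a keyword query that matches only the stale canonical wording still returns the merged document, with the correction at the top. The sketch below shows this with a tiny in-memory Okapi BM25 ranker in JavaScript. It stands in for SQLite FTS5 and is not how memory-search.mjs works. The values k1 = 1.2 and b = 0.75 are the ones FTS5's built-in bm25() uses. The tokenizer and the "+1" inside the IDF logarithm, which keeps IDF positive on very small corpora, are simplifications of this sketch. The scores here are higher-is-better, whereas FTS5's bm25() returns lower values for better matches.

```javascript
// Illustrative BM25 over already-merged documents (no SQLite involved).
function tokenize(text) {
  // lowercase, then split on anything that is not an ASCII letter or digit
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

function bm25Search(docs, query, k1 = 1.2, b = 0.75) {
  if (docs.length === 0) return [];
  const tokenized = docs.map((d) => tokenize(d.text));
  const N = docs.length;
  const avgdl = tokenized.reduce((sum, toks) => sum + toks.length, 0) / N;
  const terms = [...new Set(tokenize(query))];
  // document frequency: how many documents contain each query term
  const df = new Map(terms.map((t) => [t, tokenized.filter((toks) => toks.includes(t)).length]));

  return docs
    .map((doc, i) => {
      const toks = tokenized[i];
      let score = 0;
      for (const t of terms) {
        const f = toks.filter((x) => x === t).length; // term frequency in this document
        if (f === 0) continue;
        const n = df.get(t);
        const idf = Math.log((N - n + 0.5) / (n + 0.5) + 1);
        score += (idf * f * (k1 + 1)) / (f + k1 * (1 - b + (b * toks.length) / avgdl));
      }
      return { id: doc.id, score };
    })
    .filter((hit) => hit.score > 0)
    .sort((x, y) => y.score - x.score);
}

// The indexed text is the merged output, so the notice sits above the stale claim.
const docs = [
  {
    id: "medication_anticoagulant.md",
    text:
      "## ⚠ Amendment Notice (1 record)\n" +
      "Corrected assertion: Patient transitioned to apixaban 5 mg BID on 2026-03-12.\n" +
      "---\n" +
      "Patient is on warfarin 5 mg/day.\n",
  },
  { id: "allergies.md", text: "No known drug allergies.\n" },
];

const hits = bm25Search(docs, "warfarin dose");
console.log(hits.map((h) => h.id)); // [ 'medication_anticoagulant.md' ]
```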

Three retrieval tiers:

| Tier | Tool | Purpose | Typical latency |
|---|---|---|---|
| 1 (BM25 keyword) | `memory-search.mjs <query>` | "What is the current value of X?" | ~250 ms |
| 1.5 (Structural) | SQL over the `surface_map` table | "Which files in project P were amended this week?" | ~40 ms |
| 2 (Forensic) | `memory-cite <session-id>#L<line>` | Recover a file's full amendment timeline | ~10 s |

Plus a SessionStart hook that auto-primes every new Claude Code session with amendment-aware context before the first prompt is sent.
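Tier 1.5 answers questions from structure rather than text. The README does not show the surface_map schema, so the following JavaScript sketch answers the table's example question directly from amendment records held in memory. The entry shape, the name filesAmendedSince, and reading "this week" as the trailing seven days are all assumptions.

```javascript
// Hypothetical in-memory stand-in for the Tier 1.5 structural query.
const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

function filesAmendedSince(entries, project, now = Date.now(), windowMs = WEEK_MS) {
  const cutoff = now - windowMs;
  return entries
    .filter((e) => e.project === project)
    // a file qualifies if any of its amendments falls inside the window
    .filter((e) => e.amendments.some((a) => Date.parse(a.amended_at) >= cutoff))
    .map((e) => e.file)
    .sort();
}

const entries = [
  { project: "clinic", file: "medication_anticoagulant.md", amendments: [{ amended_at: "2026-03-14T11:02:00Z" }] },
  { project: "clinic", file: "allergies.md", amendments: [{ amended_at: "2026-01-02T09:00:00Z" }] },
  { project: "trading", file: "risk_limits.md", amendments: [{ amended_at: "2026-03-13T08:00:00Z" }] },
];
const now = Date.parse("2026-03-16T00:00:00Z");
console.log(filesAmendedSince(entries, "clinic", now)); // [ 'medication_anticoagulant.md' ]
```

No text matching is involved, only timestamps and project membership. That is why this kind of question can be answered faster than a ranked keyword search.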

Empirical results

From the papers + the companion notebooks:

  • Synthetic vault stress test, N=1,000 queries (clinical + trading): EBR returns the post-amendment assertion on 100.0% of queries; vector-RAG on 10%; a control LLM with no retrieval on 0%. Required-litmus gap: 0.9.
  • Real-corpus probe — the author's own production substrate, 239 documents across 21 projects over 108 days: 6/6 known cross-session corrections retrievable in BM25 search; median latency 257 ms.
  • Latency (Go binary, cold start): 21.68 ms median, 51 ms p95. 6.0× speedup vs. the Node CLI cold path.
  • Capture freshness: 366 ms median between an operator writing an amendment and the index returning it.
  • Index hygiene under contention: 30/30 concurrent amendment writes indexed; 0 data loss; 7/30 events incurred transient SQLite-busy retries the substrate handled internally.

Full numbers + figures in paper/lncs/main.tex §5–§8.

Notebooks (Colab Free, anonymous-clickable)

| Notebook | Paper section | Colab |
|---|---|---|
| clinical-case-study.ipynb | §5 Clinical Case Study | Open in Colab |
| trading-case-study.ipynb | §6 Cross-Domain Generalization | Open in Colab |
| empirical-evaluation.ipynb | §8 Empirical Evaluation | Open in Colab |

Install

git clone https://github.com/ramene/memory-oracle
cd memory-oracle
./install.sh

This:

  • Copies bin/* to ~/.bin/ (memory-search.mjs, memory-index-build.mjs, memory-merge.mjs, memory-cite.mjs, memory-structural-index.mjs)
  • Installs the SessionStart hook to ~/.bin/claude-hook-session-start.sh
  • (macOS) loads the launchd plist for the fs-watcher that incrementally re-indexes after every memory-file write
  • (Linux) emits the systemd unit + activation command at install time
  • Builds the initial FTS5 index from ~/.claude/projects/*/memory/

Idempotent. Re-running upgrades in place. Configurable: set MEMORY_INDEX_DB and CLAUDE_PROJECTS_ROOT in your shell rc to override defaults.
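The last install step walks ~/.claude/projects/*/memory/. Below is a Node (CommonJS) sketch of that discovery pass, which pairs each canonical .md file with its .amendments.jsonl sidecar when one exists. discoverMemoryFiles and projectsRoot are hypothetical helpers, not the installer's code. Falling back to ~/.claude/projects when CLAUDE_PROJECTS_ROOT is unset is inferred from the path above. MEMORY_INDEX_DB is left out because its default is not documented here.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Assumed default, taken from the install path described above.
function projectsRoot(env = process.env) {
  return env.CLAUDE_PROJECTS_ROOT || path.join(os.homedir(), ".claude", "projects");
}

// Walk <root>/<project>/memory/*.md and pair each file with its sidecar, if any.
function discoverMemoryFiles(root) {
  const found = [];
  if (!fs.existsSync(root)) return found;
  for (const project of fs.readdirSync(root, { withFileTypes: true })) {
    if (!project.isDirectory()) continue;
    const memDir = path.join(root, project.name, "memory");
    if (!fs.existsSync(memDir)) continue;
    for (const name of fs.readdirSync(memDir)) {
      if (!name.endsWith(".md")) continue; // sidecars end in .jsonl, so they are skipped here
      const canonical = path.join(memDir, name);
      const sidecar = canonical + ".amendments.jsonl";
      found.push({ project: project.name, canonical, sidecar: fs.existsSync(sidecar) ? sidecar : null });
    }
  }
  return found.sort((a, b) => a.canonical.localeCompare(b.canonical));
}

// Usage against a throwaway directory instead of the real ~/.claude tree.
const tmp = fs.mkdtempSync(path.join(os.tmpdir(), "memory-oracle-"));
const mem = path.join(tmp, "clinic", "memory");
fs.mkdirSync(mem, { recursive: true });
fs.writeFileSync(path.join(mem, "medication_anticoagulant.md"), "Patient is on warfarin 5 mg/day.\n");
fs.writeFileSync(path.join(mem, "medication_anticoagulant.md.amendments.jsonl"), "");
fs.writeFileSync(path.join(mem, "allergies.md"), "No known drug allergies.\n");

const files = discoverMemoryFiles(projectsRoot({ CLAUDE_PROJECTS_ROOT: tmp }));
console.log(files.map((f) => [path.basename(f.canonical), f.sidecar !== null]));
```

Re-running the walk over an unchanged tree yields the same list. This is the property an idempotent installer needs from its discovery step.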

Usage

# Query — amendment-merged, budget-capped, BM25-ranked
memory-search "deploy process safety rules" --k=8

# Verify an amendment citation against the raw transcript (Tier 2 forensic)
memory-cite session-id-here#L94616 --context 3

# Write a new amendment when you observe a file asserting something stale
cat <<EOF >> ~/.claude/projects/<project>/memory/<file>.md.amendments.jsonl
{"amended_at":"$(date -u +%FT%TZ)","superseded_assertion":"<the stale claim>","corrected_assertion":"<the new truth>","live_evidence":["/path/to/verify"],"operator_confirmed":"$(date -u +%FT%TZ)","retention_policy":"indefinite"}
EOF
# The fs-watcher picks it up and rebuilds the index in ~1 second.

The canonical file is never deleted or edited. Amendments are additive. Audit-friendly, fully reversible (delete one JSONL line to undo).
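The heredoc above produces invalid JSON if an assertion contains a double quote, and the shell expands any $ or backtick inside it. The Node (CommonJS) sketch below sidesteps both by serializing with JSON.stringify. It also implements the one-line undo described above. appendAmendment and undoLastAmendment are hypothetical helpers, not memory-oracle CLIs. new Date().toISOString() includes milliseconds, which Date.parse still accepts.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

function appendAmendment(sidecarPath, fields) {
  // Stamp amended_at unless the caller supplies one.
  const record = { amended_at: new Date().toISOString(), ...fields };
  // One record per line; appendFileSync creates the sidecar if it is missing.
  fs.appendFileSync(sidecarPath, JSON.stringify(record) + "\n");
  return record;
}

// Undo = drop the last non-blank line and return the record it held (or null).
function undoLastAmendment(sidecarPath) {
  const lines = fs.readFileSync(sidecarPath, "utf8").split("\n").filter((l) => l.trim() !== "");
  const removed = lines.pop(); // undefined when the sidecar is empty
  fs.writeFileSync(sidecarPath, lines.length > 0 ? lines.join("\n") + "\n" : "");
  return removed === undefined ? null : JSON.parse(removed);
}

// Usage: a throwaway sidecar in a temp directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "ebr-"));
const sidecar = path.join(dir, "medication_anticoagulant.md.amendments.jsonl");
appendAmendment(sidecar, {
  superseded_assertion: "Patient is on warfarin 5 mg/day.",
  corrected_assertion: 'Patient transitioned to apixaban 5 mg BID (see "note-2").',
  live_evidence: ["EHR/encounter/E-71412/note-2.txt#L42"],
  operator_confirmed: true,
});
const written = JSON.parse(fs.readFileSync(sidecar, "utf8"));
console.log(written.corrected_assertion); // Patient transitioned to apixaban 5 mg BID (see "note-2").
const undone = undoLastAmendment(sidecar);
```

The canonical .md file is never opened, so both operations leave it untouched, as the paragraph above requires.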

Why this isn't another RAG library

| Aspect | Vector RAG (pgvector, LangMem, MemGPT, …) | memory-oracle / EBR |
|---|---|---|
| Storage | Embeddings of chunks in a vector DB | Plain Markdown + JSONL sidecars on disk |
| Retrieval primitive | Cosine similarity over learned embeddings | BM25 keyword + structural precedence merge |
| Correction mechanism | Re-embed the updated chunk (provenance lost) | Append an amendment record (provenance preserved) |
| Compute requirements | GPU for embedding, network for queries | None (SQLite CLI) |
| Fooled by stale files? | Yes: the old embedding still ranks high | No: amendments are prepended by construction |
| Self-extending | No: you must re-embed | Yes: the fs-watcher absorbs new files in ~1 s |
| Lines of code | ~10K + dependencies | ~1.5K, zero deps beyond sqlite3 + node |
| Audit trail | Lost on re-embed | Full: canonical file + dated amendment chain |

Full comparison + the contrast with learned distillation approaches (CRAG, Self-RAG, FLARE) lives in docs/COMPARISON.md and the position paper's Related Work.

Composition with the CoALA framework

EBR is the substrate for the episodic layer of CoALA. It does not replace the other three layers; it composes with them:

| CoALA layer | What EBR does to it |
|---|---|
| Semantic (durable knowledge, CLAUDE.md, project docs) | Amendments attach to canonical semantic files; the file is never edited, and the correction wins at retrieval time. |
| Procedural (skills, SKILL.md) | A skill can itself be amended; procedural corrections do not require modifying the canonical instructions. |
| Working (the context window) | EBR delivers amendment-merged retrievals into working memory the same way RAG does, but the prepended amendments make the most recent operator-authored correction visible to the agent first. |
| Episodic (CoALA's "hardest layer") | EBR is the substrate. The deletion/obsolescence problem (CoALA's open question) is resolved structurally: nothing is deleted, amendments are accreted, and precedence is enforced by the merge routine. |

Repository layout

memory-oracle/
├── bin/                          # Node CLIs: memory-search, memory-merge, memory-index-build, memory-cite
├── hooks/                        # Claude Code SessionStart + PreToolUse hooks (portable)
├── runtime/                      # launchd plist (macOS) + systemd unit (Linux) for the fs-watcher
├── skills/memory-search/         # Installable Skill (SKILL.md)
├── notebooks/memory-oracle/      # Colab-runnable clinical + trading + empirical-evaluation notebooks
├── paper/
│   ├── coala-extension/          # ⭐ Position paper (CoALA episodic-memory substrate)
│   ├── lncs/                     #    Clinical-AI manuscript (Springer LNCS)
│   ├── blog/                     #    Companion CTA posts
│   ├── figures/                  #    Paper figures (PNG)
│   └── EVIDENCE-OF-PLATFORM.md   #    The substrate's self-evidence — why it works in production
├── docs/                         # Current docs (comparison, privacy, trust model)
│   └── genesis/                  # Originating-incident archive (ADR, contract spec, failure-mode triage)
├── packages/go-cli/              # Standalone Go binary of memory-search (single static executable)
├── tests/                        # Litmus scripts proving the precedence invariant holds
├── install.sh                    # Idempotent installer
└── LICENSE                       # MIT

Status

| Surface | State |
|---|---|
| Substrate code (bin/, hooks/, runtime/) | Stable, in daily use |
| Position paper (CoALA extension) | Drafted, workshop-submission-ready |
| Clinical-AI manuscript (LNCS) | Drafted, submission-ready |
| Notebooks (clinical, trading, empirical) | Colab-runnable, paper-quality measurements |
| Reference implementation | MIT-licensed, public |
| Independent reproduction | Open call: fork the repo, run the notebooks on your own corpus, and open an issue with your numbers |

License

MIT.

Origin

memory-oracle began as a one-session fix for a single observed failure: an AI agent confidently quoting a memory file two weeks after the world had moved on. Over eleven days the substrate evolved into a Springer-LNCS clinical-AI manuscript, a CoALA position paper, three Colab-runnable notebooks (two case studies and an empirical evaluation), a real-corpus probe of the author's own production memory bank (6/6 retrievable cross-session corrections), and an MIT-licensed reference implementation.

The originating incident-triage, the retrieval-stack ADR, and the failure-mode taxonomy live in docs/genesis/ as a preserved archive of how the substrate was reasoned into existence. They contain pre-scrub operator-specific naming and are not part of the substrate's public-facing surface — see the docs/genesis/README.md callout.

Direct intellectual ancestor: Nate Jones's "The New RAG War Is Not About Vectors", the systems framing that named why vector retrieval was the wrong primitive for AI-agent memory. The substrate is the architecture that answers his framing.

About

Evidence-Bound Retrieval (EBR) — accretive memory for AI agents. Amendment records, structural precedence, BM25. A substrate for CoALA's episodic memory layer.
