Token-efficient external memory format for AI agents, using .aicontext files.
- Two-level access — Index files (lightweight overview) loaded first; detail files loaded on demand
- Hierarchical structure — Each folder has its own `index.aicontext`; don't flatten to root
- Token-based abbreviations — Only multi-token terms (3+ tokens) are worth abbreviating; dictionary stored in `dict.aicontext`
- Merge updates — Keep one entry per topic file; update with the latest date instead of appending
- Concise writing — Bullets, not paragraphs; remove filler
```
memory/
  dict.aicontext       # Abbreviation dictionary (optional, load first)
  index.aicontext      # Root index: lists categories
  decisions/
    index.aicontext    # Lists decision topic files
    auth.aicontext     # One topic per file
    db.aicontext
  architecture/
    index.aicontext
  tasks/
    index.aicontext
    [topic files]
  bugs/
    index.aicontext
    [topic files]
```
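The two-level access pattern can be sketched as a tiny loader, assuming the layout above (`read_index` and `load_topic` are illustrative helper names, not part of the format):

```python
from pathlib import Path
import tempfile

def read_index(root: Path) -> list[str]:
    """Load only the lightweight index: one line per child entry."""
    return (root / "index.aicontext").read_text().splitlines()

def load_topic(root: Path, category: str, topic: str) -> str:
    """Load a detail file on demand; the index alone covers the overview."""
    return (root / category / f"{topic}.aicontext").read_text()

# Tiny demo tree (illustrative content, not real project data)
root = Path(tempfile.mkdtemp()) / "memory"
(root / "decisions").mkdir(parents=True)
(root / "index.aicontext").write_text("dir: decisions/ | Architecture decisions")
(root / "decisions" / "auth.aicontext").write_text("2026-04-10: JWT strategy")

overview = read_index(root)                      # cheap: always loaded
detail = load_topic(root, "decisions", "auth")   # loaded only when needed
```

A session normally stops at `read_index`; the detail read happens only when the agent actually needs that topic.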
```
2026-04-10: JWT + Redis caching strategy
- Stateless session management via JWT tokens
- Redis for token revocation
- Scales horizontally
- Trade-off: complexity vs simplicity
```

Rules:
- One topic per file (e.g., auth.aicontext contains only auth strategy, not mixed decisions)
- YAML format with timestamp (YYYY-MM-DD)
- Bullet points only (concise)
- When topic revisited: update existing entry with new date; don't append duplicate entries
- Keep only the latest state; history not needed
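The merge-update rule amounts to overwriting rather than appending. A minimal sketch, assuming a `save_topic` helper (an illustrative name, not part of the format):

```python
from pathlib import Path
import tempfile

def save_topic(path: Path, date: str, title: str, bullets: list[str]) -> None:
    """Write the topic as ONE entry: a dated heading plus bullets.
    Overwriting (not appending) enforces the keep-only-latest-state rule."""
    lines = [f"{date}: {title}"] + [f"- {b}" for b in bullets]
    path.write_text("\n".join(lines) + "\n")

auth = Path(tempfile.mkdtemp()) / "auth.aicontext"
save_topic(auth, "2026-03-01", "Cookie sessions", ["Server-side session store"])
# Topic revisited: same file, new date, entry replaced instead of appended.
save_topic(auth, "2026-04-10", "JWT + Redis caching strategy",
           ["Stateless session management via JWT tokens",
            "Redis for token revocation"])
content = auth.read_text()
```

After the second call the file holds only the 2026-04-10 entry; the cookie-session history is intentionally gone.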
```
# decisions/ - Architecture and design decisions
dir: architecture/ | System architecture decisions
dir: infra/ | Infrastructure and deployment decisions
file: auth.aicontext | JWT + Redis authentication strategy
file: db.aicontext | PostgreSQL selection rationale
```

Rules:
- Lists immediate children only (files and subdirectories)
- Each entry: path + one-line description
- Update when files added/removed
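Keeping an index in sync with its folder is mechanical enough to script. A sketch, assuming a hypothetical `rebuild_index` helper and hand-maintained descriptions:

```python
from pathlib import Path
import tempfile

def rebuild_index(folder: Path, descriptions: dict[str, str]) -> str:
    """List immediate children only: 'dir:'/'file:' prefix, path, one-line note.
    Descriptions are curated by hand; unknown entries get a TODO placeholder."""
    lines = []
    for child in sorted(folder.iterdir()):
        if child.name == "index.aicontext":
            continue  # the index never lists itself
        prefix = f"dir: {child.name}/" if child.is_dir() else f"file: {child.name}"
        lines.append(f"{prefix} | {descriptions.get(child.name, 'TODO')}")
    return "\n".join(lines)

decisions = Path(tempfile.mkdtemp()) / "decisions"
(decisions / "architecture").mkdir(parents=True)
(decisions / "auth.aicontext").write_text("2026-04-10: JWT strategy\n")
index = rebuild_index(decisions, {
    "architecture": "System architecture decisions",
    "auth.aicontext": "JWT + Redis authentication strategy",
})
```

Because only immediate children are listed, nested folders stay out of the parent index; each subfolder regenerates its own.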
```
# Multi-token abbreviations (optional)
SMS: session management strategy
DTL: database transaction logging
```

Rules:
- Only include terms that:
  - Are 3+ tokens when written out
  - Appear 3+ times in context
  - Are domain-specific (reused across files)
- Load first when reading memory files
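The first two rules can be checked mechanically. A rough sketch using word count as a proxy for token count (`suggest_abbreviations` and the sample text are illustrative; the domain-specificity rule is left to human judgment):

```python
import re

def suggest_abbreviations(text: str, phrases: list[str]) -> list[str]:
    """Flag candidate phrases that are 3+ words (rough token proxy)
    and occur 3+ times in the memory text."""
    out = []
    for phrase in phrases:
        long_enough = len(phrase.split()) >= 3
        frequent = len(re.findall(re.escape(phrase), text, re.IGNORECASE)) >= 3
        if long_enough and frequent:
            out.append(phrase)
    return out

memory_text = (
    "Session management strategy uses JWT. The session management strategy "
    "needs Redis, so the session management strategy scales. Token cache is small."
)
candidates = suggest_abbreviations(
    memory_text, ["session management strategy", "token cache"]
)
```

Here only "session management strategy" qualifies: "token cache" is both too short and too rare.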
Note: At small-to-medium scale (~100 topics), abbreviations yield modest savings (~5%) with meaningful maintenance overhead. The primary compaction gains come from prose removal and two-level access, not abbreviations. Skip dict.aicontext unless your memory grows large enough that the savings justify the upkeep.
RAG optimizes for scale. Mnemo optimizes for density.
| | Mnemo | RAG |
|---|---|---|
| Best for | ~100 topics, deterministic recall | 1000+ documents, fuzzy search |
| Token cost | Low — hand-curated bullets | Medium — raw chunks passed as-is |
| Setup | Text editor only | Vector DB + embedding model |
| Tuning | None | Chunk size, overlap, retrieval K |
| Recall guarantee | ✅ Explicit file read | ❌ Search can miss |
| Offline | ✅ | ❌ API required |
RAG's hidden costs: vector DB ops, embedding API fees, chunk strategy tuning, retrieval quality evaluation.
Use Mnemo when you have a bounded set of decisions, rules, or context that must be reliably recalled — and you want zero infrastructure.
Use RAG when you have large unstructured corpora and fuzzy search is acceptable.
Scenario: 10 project topics (auth, DB schema, API design, etc.), equivalent to ~10 typical wiki pages.
| What you load | Tokens | vs. raw docs |
|---|---|---|
| Raw Markdown docs (all) | ~2,920 | 100% |
| Mnemo — all files (without dict) | ~929 | 31.8% |
| Mnemo — all files (with dict) | ~881 | 30.2% |
| Mnemo — index only | ~13 | <1% |
| Mnemo — index + 1 topic | ~194 | 6.6% |
The two main drivers of compaction:
- Removing prose: Keep only decision bullets, remove background paragraphs (~68% reduction)
- Two-level access: Index acts as a table of contents; load full topics only when needed
dict.aicontext (abbreviations) adds a further ~5% on top, but at small scale the maintenance overhead rarely justifies it.
See benchmark/scenario_10topics/ for full scenario data (raw/ and memory/ directories).
Claude reads referenced context, extracts key insights, determines category/topic, writes to appropriate file with timestamp, updates all affected indices.
Algorithm:
- Parse user input (e.g., "conversation history")
- Read referenced content
- Analyze: extract 3-5 key insights
- Determine: category (decisions/tasks/bugs) + topic (auth/db/etc)
- Check: does `memory/{category}/{topic}.aicontext` exist?
  - Yes → read, merge with new info, update date
  - No → create file
- Write entry (timestamp + bullets)
- Update `memory/{category}/index.aicontext`
- If new category created: update `memory/index.aicontext`
- Suggest dict entries if multi-token terms appear 3+ times
Example:
User: /saveMemory "conversation about JWT vs OAuth"
Claude's decision:
- Extract: "JWT chosen for stateless design, horizontal scaling"
- Category: decisions
- Topic: auth
- Writes to: memory/decisions/auth.aicontext
- Updates: memory/decisions/index.aicontext, memory/index.aicontext
- Copy `CLAUDE.md`, `memory-format.aicontext`, and `.claude/commands/saveMemory.md` to the new project
- Create `memory/index.aicontext` and `memory/{category}/index.aicontext`
- Call `/saveMemory` — Claude creates topic files automatically
- All operations use standard Read/Write/Edit tools → no external dependencies