A persistent memory layer for Claude Code (and any MCP-compatible agent)
that captures every user prompt and tool_call from your coding sessions,
exposes them for recall via an MCP server + FastAPI, and feeds them back
into a local fine-tune pipeline that produces project-specific tool-calling
LoRAs. Built locally, runs locally — Postgres + pgvector, FastAPI on
127.0.0.1:3377, GGUFs in LM Studio. The hooks are the recorder; the
mem_* tables are the journal; the fine-tune pipeline is what turns that
journal into a model that actually knows your codebases.

    Claude session ──► hooks (UserPromptSubmit, PreToolUse, PostToolUse, SessionStart/End)
                          │
                          ▼
                   FastAPI ingest (port 3377, Bearer-token auth)
                          │
                          ▼
                   Postgres (mem_user_prompts, mem_tool_calls, mem_sessions,
                             mem_projects, mem_observations, mem_lessons)
                          │
            ┌─────────────┴──────────────┐
            ▼                            ▼
    MCP recall surface           fine-tune pipeline
    (search, timeline,           (build_v3_dataset.py)
     get_observations,                   │
     save_memory, lessons)               ▼
                                 LoRA train → merge → GGUF
                                         │
                                         ▼
                                     LM Studio

The write path (hooks → FastAPI → queue → Postgres) is fire-and-forget
and never blocks Claude. The read path (MCP tools) uses Reciprocal Rank
Fusion across pgvector and Postgres FTS. The training path reads directly
from mem_tool_calls joined to mem_user_prompts and emits Qwen-format
chat datasets.
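Reciprocal Rank Fusion, as named above, can be sketched in a few lines. This is a minimal illustration of the general technique, not the code in mcp_server.py: the function name, the smoothing constant k=60 (a common default), and the example ids are all assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list, best first.

    Each id scores 1 / (k + rank) in every list it appears in;
    the scores are summed across lists.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical observation ids returned by the two retrievers.
vector_hits = [101, 205, 309]  # e.g. pgvector nearest neighbours
fts_hits = [205, 412, 101]     # e.g. Postgres full-text search
fused = reciprocal_rank_fusion([vector_hits, fts_hits])
```

An id that ranks well in both lists (205 here) ends up ahead of one that ranks first in only a single list, which is the property that makes the fusion useful without any score normalization.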
| Path | Purpose |
|---|---|
| app/ | FastAPI service (lifecycle, routes, middleware, queue worker, redaction, auth) |
| app/routes/ | REST endpoints by domain: health, observations, sessions, admin, lessons, prompts, tool-calls |
| mcp_server.py | Stdio MCP server. Own DB pool + embedding model, zero FastAPI dependency |
| hooks/ | Claude Code Node.js hooks (UserPromptSubmit, Pre/PostToolUse, SessionStart/End, ensure-services) |
| scripts/migrations/ | Versioned SQL migrations (001-initial-schema through 013-project-consolidation) |
| scripts/fine_tune/ | Training pipeline: dataset builders, validator, smoke tests, GGUF verify, the wizard |
| scripts/backfill/ | Backfill tool_calls + prompts from Claude JSONL session logs |
| scripts/ (root) | install_backup_schedule.sh, backup.sh, run_migrations.py, install-agent-memory-*.sh |
| models/ | Base/LoRA/merged/GGUF artifacts. Symlinked to Dropbox cold storage. Gitignored |
| data/ | Postgres backups + processed datasets (processed/qwen25_tools/v2/, processed/qwen3_tools/v3/). Gitignored |
| tests/ | pytest API integration tests + tests/fine_tune/ validator/dataset tests + real-world A/B harnesses |
| docs/ | fine_tune/ (V3_PLAN, V2_DATA_PIPELINE_PLAN, FAILURE_MODES, WIZARD, PIPELINE_RUNBOOK), training_runs/, backups.md, PRIMER.md |
| hooks/hooks.json | Reference hook registration; copy into ~/.claude/settings.json |
| install.js | Legacy Node installer (Docker + native PG, MCP register). Still works for fresh installs |

All tables are mem_*-prefixed to avoid collisions in a shared Postgres.
| Table | Purpose |
|---|---|
| mem_tool_calls | Every tool_call captured from Claude sessions: input, output preview, success, errors. The training fuel |
| mem_user_prompts | User prompts that drove the tool_calls. Linked from mem_tool_calls.prev_user_prompt_id (migration 012) |
| mem_sessions | Claude session identifiers + start/end time + final status |
| mem_projects | Project identity keyed on git root + remote + branch (post migration 013 consolidation) |
| mem_observations | Explicit memory observations (semantic notes the agent or user saved). 768-dim pgvector embeddings |
| mem_observation_queue | Async processing queue for the worker (FOR UPDATE SKIP LOCKED) |
| mem_lessons | Proactive rules triggered before risky tool calls (Edit/Write/Bash/NotebookEdit) |
| mem_schema_migrations | Migration history. One row per applied file in scripts/migrations/ |

Full schema lives in scripts/migrations/*.sql. The current head is
013-project-consolidation.sql. See docs/PRIMER.md for column-level
details and docs/fine_tune/V2_DATA_PIPELINE_PLAN.md for the
prompt↔tool_call linkage design introduced by migration 012.
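The FOR UPDATE SKIP LOCKED note on mem_observation_queue refers to a standard Postgres pattern for letting several workers pull from one table without claiming the same row. The sketch below shows the general shape only. The column names (id, status, payload) and the status values are hypothetical and are not taken from the migrations; the cursor is any DB-API style cursor.

```python
# Hypothetical columns: id, status, payload. The real schema lives in
# scripts/migrations/*.sql and may differ.
CLAIM_SQL = """
UPDATE mem_observation_queue
SET status = 'processing'
WHERE id = (
    SELECT id
    FROM mem_observation_queue
    WHERE status = 'pending'
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload
"""


def claim_next_job(cursor):
    """Claim one pending row, or return None when nothing is claimable.

    SKIP LOCKED makes a concurrent worker skip rows another transaction
    has already locked instead of waiting on them.
    """
    cursor.execute(CLAIM_SQL)
    return cursor.fetchone()
```

The caller is expected to commit after processing the row, so that the status change and the row lock are released together.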
Five Node.js hooks live in hooks/. They are designed to be fire-and-forget
(~30-40 ms p99) and exit 0 on every error path, so a misconfigured or
down agent-memory never blocks Claude.
| Hook | Event | Description |
|---|---|---|
| user-prompt-submit.js | UserPromptSubmit | POSTs prompt text + session + cwd to /api/prompts. Live capture of the prompt that drives the next tool calls (added by issue #30; before that, mem_user_prompts was empty between 2026-03-29 and 2026-05-13) |
| pre-tool-use.js | PreToolUse | Checks active lessons for Edit/Write/Bash/NotebookEdit. Injects warnings as a systemMessage |
| post-tool-use.js | PostToolUse | Fire-and-forget POST to /api/queue. If the server is down, spawns ensure-services.js |
| session-start.js | SessionStart | Blocks until services are healthy. Calls ensure-services.js if down. Installs daily backup schedule (idempotent) |
| session-end.js | Stop | PATCHes /api/sessions/{id} to mark the session completed |

Hook auth shares hooks/auth-header.js which reads AGENT_MEMORY_TOKEN
from the environment. Hooks also send X-Agent-Name: claude so the
trusted-agents bypass applies on localhost.
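The request shape described above (a Bearer token from AGENT_MEMORY_TOKEN, an X-Agent-Name header, and a fail-open POST) can be illustrated in Python. The real hooks are Node.js; this is only a sketch of the same pattern, and the function names, the timeout, and the payload fields are assumptions.

```python
import json
import os
import urllib.request

API_BASE = "http://127.0.0.1:3377"  # host and port as stated in this README


def build_headers(env=None):
    """Build request headers; the token is optional, as in the trusted-agent case."""
    env = os.environ if env is None else env
    headers = {"Content-Type": "application/json", "X-Agent-Name": "claude"}
    token = env.get("AGENT_MEMORY_TOKEN")
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def post_fire_and_forget(path, payload, base=API_BASE, timeout=0.5):
    """POST JSON and swallow every failure; True only on a 2xx response."""
    request = urllib.request.Request(
        base + path,
        data=json.dumps(payload).encode("utf-8"),
        headers=build_headers(),
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except Exception:
        # Fail open: a down or misconfigured server must never block the caller.
        return False
```

A caller would ignore the return value in the hot path, for example `post_fire_and_forget("/api/prompts", {"prompt": "..."})` with hypothetical payload fields, and exit successfully either way.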
To wire them into Claude Code, symlink each hooks/*.js file into
~/.claude/hooks/ and register the hook list in ~/.claude/settings.json.
The exact commands are in HANDOFF.md under "Setup on New Machine".
| Version | Base | Status | Notes |
|---|---|---|---|
| v1 | Qwen2.5-3B-Instruct | shipped, in production | Q4_K_M GGUF at models/gguf/qwen2.5-3b-toolcalls-q4km.gguf, loaded in LM Studio. Has a known empty-args loop bug on vague prompts — anti-loop guard mitigates |
| v2 | Qwen2.5-3B-Instruct | RETRACTED 2026-05-15 | Multi-turn regression in real-world A/B (0/10 useful, 90% re-emit). Eval gate measured the wrong symptom. See docs/training_runs/v2-real-world-test.md |
| v3 | Qwen3-4B | in progress | Local MPS training, ≤6 GB Q4_K_M rule, ≥125k effective context via YaRN. Plan doc currently lists Qwen3-8B as the target; the 4B is the smoke/iteration run |
Anchor docs:
- docs/fine_tune/V3_PLAN.md: current training plan with multi-turn fixes baked in
- docs/training_runs/v2-real-world-test.md: verbatim A/B transcripts that drove the retraction
- docs/fine_tune/FAILURE_MODES.md: 12 operational gotchas (resolve()-into-Dropbox, llama-cli hangs, YaRN config, anti-loop, etc.)
- docs/fine_tune/PIPELINE_RUNBOOK.md: phase-by-phase recipe
- docs/fine_tune/V2_DATA_PIPELINE_PLAN.md: how the v2 dataset shape was built (still the v3 data shape too)

The training script (models/lora/qwen2.5-3b-toolcalls-lora/run_train_lora.py)
is env-var-driven, not argparse — set MODEL_SLUG, DATASET_VERSION,
DATASET_TIER, RUN_TAG, EPOCHS.
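An env-var-driven script of this kind typically reads its settings once at startup. The sketch below shows one way to do that with the five variable names listed above. The TrainConfig class, its validation, and the example values are hypothetical and do not describe run_train_lora.py itself.

```python
import os
from dataclasses import dataclass

# Variable names come from this README; everything else is illustrative.
REQUIRED_VARS = ("MODEL_SLUG", "DATASET_VERSION", "DATASET_TIER", "RUN_TAG", "EPOCHS")


@dataclass(frozen=True)
class TrainConfig:
    model_slug: str
    dataset_version: str
    dataset_tier: str
    run_tag: str
    epochs: int

    @classmethod
    def from_env(cls, env=None):
        """Read all settings from a mapping (os.environ by default)."""
        env = os.environ if env is None else env
        missing = [name for name in REQUIRED_VARS if not env.get(name)]
        if missing:
            raise KeyError("missing env vars: " + ", ".join(missing))
        return cls(
            model_slug=env["MODEL_SLUG"],
            dataset_version=env["DATASET_VERSION"],
            dataset_tier=env["DATASET_TIER"],
            run_tag=env["RUN_TAG"],
            epochs=int(env["EPOCHS"]),
        )


# Hypothetical values, for illustration only.
example = TrainConfig.from_env({
    "MODEL_SLUG": "qwen3-4b",
    "DATASET_VERSION": "v3",
    "DATASET_TIER": "full",
    "RUN_TAG": "smoke",
    "EPOCHS": "1",
})
```

Failing fast on a missing variable is a deliberate choice in this sketch: a long training run is a poor place to discover an unset RUN_TAG.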
The steps below are the manual install. For a one-shot install,
node install.js still works (it sets up Docker or native Postgres, MCP
registration, hook symlinks, the daily backup, and .env).

    # 1. Clone
    git clone https://github.com/metazen11/agent-memory.git ~/_CODING/agentMemory
    cd ~/_CODING/agentMemory
    # 2. Python venv (project targets 3.12+; current dev runs 3.14)
    python3 -m venv .venv
    .venv/bin/pip install -r requirements.txt
    # 3. Postgres (native Homebrew recommended on macOS)
    brew install postgresql@16 pgvector
    brew services start postgresql@16
    createuser -s mz
    createdb -O mz agent_memory
    psql -d agent_memory -c "CREATE EXTENSION vector;"
    # 4. Configure .env (copy from .env.example, set POSTGRES_* + REQUIRE_AUTH=true)
    cp .env.example .env
    # 5. Start the API (migrations run on startup)
    .venv/bin/uvicorn app.main:app --port 3377 --host 127.0.0.1
    # 6. In another shell: generate tokens for trusted agents
    .venv/bin/python -m app.cli setup
    echo 'export AGENT_MEMORY_TOKEN="<claude-token-from-step-6>"' >> ~/.zshenv
    # 7. Symlink hooks into ~/.claude/hooks/ and register them in settings.json
    #    Full commands: see HANDOFF.md "Setup on New Machine"
    # 8. Install the daily Postgres backup schedule (idempotent)
    bash scripts/install_backup_schedule.sh
    bash scripts/install_backup_schedule.sh --check

Verify the install:

    curl http://localhost:3377/api/health
    .venv/bin/python -m app.cli list-tokens

scripts/fine_tune/wizard.py is a Textual TUI that sequences the v2/v3
ad-hoc playbook into one command. It runs phase by phase with gates so
a bad dataset or a failed smoke can't silently turn into a 36-hour
training run.

    .venv-finetune/bin/python scripts/fine_tune/wizard.py
    # Or, replay a saved config non-interactively
    .venv-finetune/bin/python scripts/fine_tune/wizard.py \
      --config train_config.yaml --no-tui

Stages: verify env (Dropbox quit, MPS available, disk free) → build
dataset (build_v2_dataset.py / build_v3_dataset.py) → audit gate
(token counts, tool histogram, drop-reason MANIFEST) → tiny smoke
(200 rows, 1 epoch, ~25-40 min) → tiny validator (≥3% parse rate)
→ full train (~3-4h MPS for 3B, ~36-40h for 8B) → full validator
(≥85% on merged HF + GGUF) → GGUF convert + LM Studio install →
chat-loop verification on llama-server.
Full reference: docs/fine_tune/WIZARD.md.
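The two validator gates in the stage list can be expressed as simple threshold checks. The sketch below uses the thresholds quoted above (≥3% for the tiny smoke, ≥85% on both merged HF and GGUF for the full run). The function names and the way parse rate is computed are assumptions, not the wizard's actual code.

```python
TINY_MIN_PARSE_RATE = 0.03  # tiny validator threshold from the stage list
FULL_MIN_PARSE_RATE = 0.85  # full validator threshold from the stage list


def parse_rate(parsed_ok, total):
    """Fraction of validator samples whose tool call parsed successfully."""
    if total <= 0:
        raise ValueError("total must be positive")
    return parsed_ok / total


def tiny_gate(parsed_ok, total):
    """Pass when the smoke run shows any real signal at all."""
    return parse_rate(parsed_ok, total) >= TINY_MIN_PARSE_RATE


def full_gate(hf_rate, gguf_rate):
    """Pass only when both the merged HF model and the GGUF clear the bar."""
    return min(hf_rate, gguf_rate) >= FULL_MIN_PARSE_RATE
```

Taking the minimum of the two rates in the full gate reflects the reading that a good merged model does not excuse a GGUF conversion that regressed.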
Auth and isolation are configured via .env and app/config.py. Defaults
err on the safe side; the production install (this machine) has all of
them enabled.
- Bearer token auth: REQUIRE_AUTH=true enables AuthMiddleware on every endpoint. Tokens are generated by python -m app.cli setup and scoped per-agent (anvil, claude, codex, gemini, python-httpx).
- Trusted-agent bypass: TRUSTED_AGENTS allows a known agent name via the X-Agent-Name header on localhost only. Hooks use this so the recorder never has to ship a token to ~/.claude/.
- Host bound to 127.0.0.1: no external interface ever.
- CORS locked to localhost origins.
- Rate limits: 100/min writes, 500/min reads (RateLimitMiddleware).
- Secret redaction: REDACT_SECRETS=true by default. Strips API keys, tokens, and password-shaped strings from tool_input before persistence. See app/redact.py.
- Audit logging: audit_log_level=writes_only, 30-day retention. See AuditMiddleware.
- PG trust-auth warning: startup logs CRITICAL if POSTGRES_PASSWORD is empty and ALLOW_TRUST_AUTH=true is not explicitly set.

There is no docs/security.md yet — read app/middleware.py, app/auth.py,
app/redact.py, and the security block in app/config.py for current
behavior.
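As an illustration of what secret redaction before persistence can look like, here is a small regex-based sketch. It is not the contents of app/redact.py: the patterns, the [REDACTED] placeholder, and the function name are assumptions chosen for the example, and a real redactor would need a broader pattern set.

```python
import re

# Illustrative patterns only.
_BEARER = re.compile(r"(?i)\b(bearer\s+)[A-Za-z0-9._~+/=-]{8,}")
_ASSIGNMENT = re.compile(
    r"(?i)\b(password|passwd|secret|token|api[_-]?key)(\s*[=:]\s*)(\S+)"
)


def redact(text):
    """Replace secret-shaped values with a placeholder, keeping the key visible."""
    text = _BEARER.sub(r"\1[REDACTED]", text)
    text = _ASSIGNMENT.sub(r"\1\2[REDACTED]", text)
    return text


samples = [
    "curl -H 'Authorization: Bearer abcdef123456' http://127.0.0.1:3377/api/health",
    "password=hunter2 psql -d agent_memory",
    "export API_KEY=abc123",
    "ls -la",
]
cleaned = [redact(s) for s in samples]
```

Keeping the key name and replacing only the value preserves enough of the command for later recall and training while dropping the sensitive part.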
launchd runs pg_dump at 03:14 local time and writes
data/backups/daily_YYYYMMDD_HHMMSS.sql.gz. Retention keeps the 3
most recent daily_*.sql.gz; manually-created snapshots
(pre_v2_backfill_*.sql.gz, etc.) are never auto-deleted.
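The retention rule above (keep the 3 newest daily_*.sql.gz files, never touch anything else) can be sketched as a pure function. This is an illustration only: the real logic lives in the backup scripts, and ordering by filename is an assumption that holds here because the YYYYMMDD_HHMMSS timestamp sorts chronologically.

```python
from fnmatch import fnmatch


def files_to_prune(filenames, keep=3):
    """Return the daily backups that fall outside the retention window.

    Only names matching daily_*.sql.gz are candidates, so manually
    created snapshots are never selected.
    """
    dailies = sorted(
        (name for name in filenames if fnmatch(name, "daily_*.sql.gz")),
        reverse=True,  # newest first, thanks to the sortable timestamp
    )
    return dailies[keep:]


# Hypothetical directory listing.
listing = [
    "daily_20260510_031400.sql.gz",
    "daily_20260511_031400.sql.gz",
    "daily_20260512_031400.sql.gz",
    "daily_20260513_031400.sql.gz",
    "daily_20260514_031400.sql.gz",
    "pre_v2_backfill_20260501_120000.sql.gz",
]
stale = files_to_prune(listing)
```

With this listing, only the two oldest daily files are selected and the pre_v2_backfill snapshot is left alone.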

    # Install (idempotent: safe to re-run; also called from the session-start hook)
    bash scripts/install_backup_schedule.sh
    # Verify
    bash scripts/install_backup_schedule.sh --check
    ls -lht data/backups/daily_*.sql.gz | head -3
    # Manual snapshot
    bash scripts/backup.sh

The plist installed is ~/Library/LaunchAgents/com.metazen.agent-memory-backup.plist.
Operator details: docs/backups.md.

- HANDOFF.md: current state (v2 retraction status, v3 plan refs, setup-on-new-machine, resume commands)
- AGENTS.md: file-map operating guide for agents working in this repo
- docs/fine_tune/V3_PLAN.md: current training plan with multi-turn fixes
- docs/fine_tune/FAILURE_MODES.md: 12 known failure modes + fixes (start here when something breaks)
- docs/fine_tune/WIZARD.md: wizard reference
- docs/fine_tune/PIPELINE_RUNBOOK.md: phase-by-phase training recipe
- docs/fine_tune/V2_DATA_PIPELINE_PLAN.md: full data-pipeline design (prompt↔tool_call linkage, project consolidation, build_v2_dataset)
- docs/training_runs/: per-run reports including the v2 real-world A/B
- docs/backups.md: daily backup operator reference
- docs/PRIMER.md: multi-agent integration guide (Cursor, Windsurf, Cline, Codex, Zed, custom)

No LICENSE file is committed. Treat the repo as private until one is added.
Maintainer: mz@wfca.com (see ~/.claude/CLAUDE.md).