Persona is a local-first, learnable memory layer for LLMs. It imports existing chat history, segments it into episodes, retrieves prior context through an OpenAI-compatible proxy, and evaluates whether learned memory policies produce context that is more representative of a person than frequency- or recency-weighted baselines.
This repository is a 12-week research proof of concept. It is deliberately evaluation-first: the probes, rubric, baselines, refusal behavior, latency budget, and phase gates live beside the implementation.
Does a multi-signal scorer over personal chat history, combined with mode-aware retrieval, produce retrieved context that users judge as more representative of themselves than cosine-, frequency-, or recency-weighted retrieval?
The planned scorer models surprise, affect, disclosure, and self-relevance.
The primary target is paired Wilcoxon p < 0.05 and Cohen's d > 0.5 on 30
pre-registered identity probes, with blind external-rater confirmation.
No Persona-versus-baseline performance result is claimed yet.
Week 3.5: foundation hardening before model training.
Implemented:
- ChatGPT export parsing with active-branch reconstruction
- Episode segmentation using conversation, time-gap, and lexical topic-shift signals
- SQLite episodic storage with provenance-ready metadata
- CLI inspection for statistics, search, episodes, captures, and proxy logs
- Non-streaming OpenAI-compatible proxy with retrieval, augmentation, forwarding, response capture, and local JSONL diagnostics
- Structured retrieval decisions with confidence, epistemic tier, provenance, and explicit refusal metadata
- Three baseline runners with a shared result contract
- Locked evaluation artifacts: 30 identity, 10 transactional, and 10 refusal probes
- Blind packet export, rating templates, score recording, and summary tooling
Not implemented yet:
- Frozen sentence-transformer encoder and vector index
- Multi-signal write-policy scorer
- Mode classifier and mode-aware retrieval
- Learned reranker
- Semantic and preference stores
- Offline consolidator, contradiction handling, decay, and gated fine-tuning
- Final statistical analysis and Persona-versus-baseline experiment
See Implementation Status for the module-level breakdown.
flowchart LR
A[Chat export] --> B[Ingest and segment]
B --> C[(Episodic SQLite)]
U[User or app] --> P[Local Persona proxy]
P --> R[Retrieve]
R --> D{Supported?}
D -- no --> F[Forward without memory]
D -- yes --> G[Deterministic augmentation]
G --> F
F --> L[Configured LLM upstream]
L --> W[Capture response]
W --> C
C --> E[Evaluation harness]
E --> X[Blind packets and aggregate results]
C -. planned .-> M[Scorer, mode classifier, reranker]
C -. planned .-> S[(Semantic, mode, preference stores)]
C -. planned .-> O[Offline consolidator]
The target design uses four user-owned stores and three small learnable heads over a frozen encoder. The current implementation is intentionally narrower: the episodic store, proxy loop, refusal-aware retrieval contract, and evaluation harness are operational; the learned components are the next phase.
Read Architecture for current and target boundaries.
Requirements:
- Python 3.11
- uv
git clone https://github.com/amazadfar/persona.git
cd persona
make install
make testThe current bootstrap command defaults to a private local snapshot path. Pass your own unzipped export directory explicitly:
uv run python scripts/bootstrap_from_export.py \
/path/to/chatgpt-export \
--db-path var/episodic.sqlite
uv run persona stats
uv run persona search "project planning"
uv run persona show <episode-id>data/, var/, SQLite files, and local logs are gitignored. A synthetic
public demo fixture is planned as the next publication milestone.
For strict local-first operation, point Persona at a local OpenAI-compatible server. Using a cloud endpoint sends the augmented prompt and selected memory previews to that provider.
export PERSONA_UPSTREAM_BASE_URL=http://127.0.0.1:11434/v1
export PERSONA_EPISODIC_DB_PATH=var/episodic.sqlite
make proxyThe current proxy implements non-streaming POST /v1/chat/completions.
make eval-baselines
make eval-packets
make eval-ratings-template
# Complete var/eval/ratings_template.yaml, then:
make eval-record
make eval-analyzePrivate results and retrieved previews stay under var/. Public reports must
contain aggregate or synthetic evidence only.
- Probe set: 30 identity, 10 transactional, 10 unsupported-detail refusal probes
- Baselines: lexical fallback, frequency-weighted, recency-weighted
- Scoring: representativeness, specificity, balance, refusal correctness, and irrelevance severity
- Protocol: randomized system labels, blind human scoring, external-rater subset
- Primary analysis: paired Wilcoxon signed-rank and Cohen's
d - Controls: locked probe/rubric versions, frozen evaluation corpus, ablations, rater-drift check, and latency gate
The current lexical baselines are scaffolding for the evaluation contract. The planned vector baseline and learned Persona pipeline arrive in Weeks 4-6.
Read Evaluation for the complete protocol and current limitations.
Persona stores personal history locally and does not include telemetry. Raw exports, local databases, proxy logs, evaluation packets, and model artifacts are excluded from git.
The configured LLM upstream is a trust boundary. A local upstream preserves the strict local-first path. A cloud upstream receives the augmented request, so it must be treated as an explicit privacy tradeoff.
Read Privacy and Data Handling before running Persona on personal history.
- Architecture
- Evaluation
- Privacy and Data Handling
- Implementation Status
- 12-Week PoC Plan
- Agent Contract
- Foundation hardening: refusal-aware contracts, provenance, locked eval artifacts
- Learnable components: write-policy scorer, mode classifier, reranker
- Evaluation: frozen corpus, blind scoring, external rater, ablations, statistics
- Release: synthetic demo, public result reports, technical paper, demo video
Phase gates in PLAN.md are authoritative. Scope is cut before evaluation rigor.
- Retrieval is currently lexical, not embedding-based.
- Refusal confidence is heuristic and not calibrated.
- The proxy does not support streaming.
- Model and consolidator modules are still stubs.
- The current evaluation is single-user and cannot establish population-level claims.
- Personal-memory evaluation is vulnerable to rater subjectivity and corpus-specific bias.
Apache-2.0. See LICENSE.