Persona

Persona is a local-first, learnable memory layer for LLMs. It imports existing chat history, segments it into episodes, retrieves prior context through an OpenAI-compatible proxy, and evaluates whether learned memory policies produce context that is more representative of a person than frequency- or recency-weighted baselines.

This repository is a 12-week research proof of concept. It is deliberately evaluation-first: the probes, rubric, baselines, refusal behavior, latency budget, and phase gates live beside the implementation.

Research Question

Does a multi-signal scorer over personal chat history, combined with mode-aware retrieval, produce retrieved context that users judge as more representative of themselves than cosine-, frequency-, or recency-weighted retrieval?

The planned scorer models surprise, affect, disclosure, and self-relevance. The primary target is paired Wilcoxon p < 0.05 and Cohen's d > 0.5 on 30 pre-registered identity probes, with blind external-rater confirmation.

No Persona-versus-baseline performance result is claimed yet.

Current State

Week 3.5: foundation hardening before model training.

Implemented:

ChatGPT export parsing with active-branch reconstruction
Episode segmentation using conversation, time-gap, and lexical topic-shift signals
SQLite episodic storage with provenance-ready metadata
CLI inspection for statistics, search, episodes, captures, and proxy logs
Non-streaming OpenAI-compatible proxy with retrieval, augmentation, forwarding, response capture, and local JSONL diagnostics
Structured retrieval decisions with confidence, epistemic tier, provenance, and explicit refusal metadata
Three baseline runners with a shared result contract
Locked evaluation artifacts: 30 identity, 10 transactional, and 10 refusal probes
Blind packet export, rating templates, score recording, and summary tooling

Not implemented yet:

Frozen sentence-transformer encoder and vector index
Multi-signal write-policy scorer
Mode classifier and mode-aware retrieval
Learned reranker
Semantic and preference stores
Offline consolidator, contradiction handling, decay, and gated fine-tuning
Final statistical analysis and Persona-versus-baseline experiment

See Implementation Status for the module-level breakdown.

Architecture

flowchart LR
    A[Chat export] --> B[Ingest and segment]
    B --> C[(Episodic SQLite)]

    U[User or app] --> P[Local Persona proxy]
    P --> R[Retrieve]
    R --> D{Supported?}
    D -- no --> F[Forward without memory]
    D -- yes --> G[Deterministic augmentation]
    G --> F
    F --> L[Configured LLM upstream]
    L --> W[Capture response]
    W --> C

    C --> E[Evaluation harness]
    E --> X[Blind packets and aggregate results]

    C -. planned .-> M[Scorer, mode classifier, reranker]
    C -. planned .-> S[(Semantic, mode, preference stores)]
    C -. planned .-> O[Offline consolidator]

The target design uses four user-owned stores and three small learnable heads over a frozen encoder. The current implementation is intentionally narrower: the episodic store, proxy loop, refusal-aware retrieval contract, and evaluation harness are operational; the learned components are the next phase.

Read Architecture for current and target boundaries.

Quickstart

Requirements:

Python 3.11
uv

git clone https://github.com/amazadfar/persona.git
cd persona
make install
make test

Import a ChatGPT export

The current bootstrap command defaults to a private local snapshot path. Pass your own unzipped export directory explicitly:

uv run python scripts/bootstrap_from_export.py \
  /path/to/chatgpt-export \
  --db-path var/episodic.sqlite

uv run persona stats
uv run persona search "project planning"
uv run persona show <episode-id>

data/, var/, SQLite files, and local logs are gitignored. A synthetic public demo fixture is planned as the next publication milestone.

Run the proxy

For strict local-first operation, point Persona at a local OpenAI-compatible server. Using a cloud endpoint sends the augmented prompt and selected memory previews to that provider.

export PERSONA_UPSTREAM_BASE_URL=http://127.0.0.1:11434/v1
export PERSONA_EPISODIC_DB_PATH=var/episodic.sqlite

make proxy

The current proxy implements non-streaming POST /v1/chat/completions.

Run the baseline evaluation

make eval-baselines
make eval-packets
make eval-ratings-template

# Complete var/eval/ratings_template.yaml, then:
make eval-record
make eval-analyze

Private results and retrieved previews stay under var/. Public reports must contain aggregate or synthetic evidence only.

Evaluation Design

Probe set: 30 identity, 10 transactional, 10 unsupported-detail refusal probes
Baselines: lexical fallback, frequency-weighted, recency-weighted
Scoring: representativeness, specificity, balance, refusal correctness, and irrelevance severity
Protocol: randomized system labels, blind human scoring, external-rater subset
Primary analysis: paired Wilcoxon signed-rank and Cohen's d
Controls: locked probe/rubric versions, frozen evaluation corpus, ablations, rater-drift check, and latency gate

The current lexical baselines are scaffolding for the evaluation contract. The planned vector baseline and learned Persona pipeline arrive in Weeks 4-6.

Read Evaluation for the complete protocol and current limitations.

Privacy Model

Persona stores personal history locally and does not include telemetry. Raw exports, local databases, proxy logs, evaluation packets, and model artifacts are excluded from git.

The configured LLM upstream is a trust boundary. A local upstream preserves the strict local-first path. A cloud upstream receives the augmented request, so it must be treated as an explicit privacy tradeoff.

Read Privacy and Data Handling before running Persona on personal history.

Project Documents

Roadmap

Foundation hardening: refusal-aware contracts, provenance, locked eval artifacts
Learnable components: write-policy scorer, mode classifier, reranker
Evaluation: frozen corpus, blind scoring, external rater, ablations, statistics
Release: synthetic demo, public result reports, technical paper, demo video

Phase gates in PLAN.md are authoritative. Scope is cut before evaluation rigor.

Limitations

Retrieval is currently lexical, not embedding-based.
Refusal confidence is heuristic and not calibrated.
The proxy does not support streaming.
Model and consolidator modules are still stubs.
The current evaluation is single-user and cannot establish population-level claims.
Personal-memory evaluation is vulnerable to rater subjectivity and corpus-specific bias.

License

Apache-2.0. See LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
.github/workflows		.github/workflows
configs		configs
docs		docs
persona		persona
scripts		scripts
tests		tests
.env.example		.env.example
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
Makefile		Makefile
PLAN.md		PLAN.md
README.md		README.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Persona

Research Question

Current State

Architecture

Quickstart

Import a ChatGPT export

Run the proxy

Run the baseline evaluation

Evaluation Design

Privacy Model

Project Documents

Roadmap

Limitations

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Persona

Research Question

Current State

Architecture

Quickstart

Import a ChatGPT export

Run the proxy

Run the baseline evaluation

Evaluation Design

Privacy Model

Project Documents

Roadmap

Limitations

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages