Skip to content

amazadfar/persona

Repository files navigation

Persona

CI Python 3.11 uv License Local first

Persona is a local-first, learnable memory layer for LLMs. It imports existing chat history, segments it into episodes, retrieves prior context through an OpenAI-compatible proxy, and evaluates whether learned memory policies produce context that is more representative of a person than frequency- or recency-weighted baselines.

This repository is a 12-week research proof of concept. It is deliberately evaluation-first: the probes, rubric, baselines, refusal behavior, latency budget, and phase gates live beside the implementation.

Research Question

Does a multi-signal scorer over personal chat history, combined with mode-aware retrieval, produce retrieved context that users judge as more representative of themselves than cosine-, frequency-, or recency-weighted retrieval?

The planned scorer models surprise, affect, disclosure, and self-relevance. The primary target is paired Wilcoxon p < 0.05 and Cohen's d > 0.5 on 30 pre-registered identity probes, with blind external-rater confirmation.

No Persona-versus-baseline performance result is claimed yet.

Current State

Week 3.5: foundation hardening before model training.

Implemented:

  • ChatGPT export parsing with active-branch reconstruction
  • Episode segmentation using conversation, time-gap, and lexical topic-shift signals
  • SQLite episodic storage with provenance-ready metadata
  • CLI inspection for statistics, search, episodes, captures, and proxy logs
  • Non-streaming OpenAI-compatible proxy with retrieval, augmentation, forwarding, response capture, and local JSONL diagnostics
  • Structured retrieval decisions with confidence, epistemic tier, provenance, and explicit refusal metadata
  • Three baseline runners with a shared result contract
  • Locked evaluation artifacts: 30 identity, 10 transactional, and 10 refusal probes
  • Blind packet export, rating templates, score recording, and summary tooling

Not implemented yet:

  • Frozen sentence-transformer encoder and vector index
  • Multi-signal write-policy scorer
  • Mode classifier and mode-aware retrieval
  • Learned reranker
  • Semantic and preference stores
  • Offline consolidator, contradiction handling, decay, and gated fine-tuning
  • Final statistical analysis and Persona-versus-baseline experiment

See Implementation Status for the module-level breakdown.

Architecture

flowchart LR
    A[Chat export] --> B[Ingest and segment]
    B --> C[(Episodic SQLite)]

    U[User or app] --> P[Local Persona proxy]
    P --> R[Retrieve]
    R --> D{Supported?}
    D -- no --> F[Forward without memory]
    D -- yes --> G[Deterministic augmentation]
    G --> F
    F --> L[Configured LLM upstream]
    L --> W[Capture response]
    W --> C

    C --> E[Evaluation harness]
    E --> X[Blind packets and aggregate results]

    C -. planned .-> M[Scorer, mode classifier, reranker]
    C -. planned .-> S[(Semantic, mode, preference stores)]
    C -. planned .-> O[Offline consolidator]
Loading

The target design uses four user-owned stores and three small learnable heads over a frozen encoder. The current implementation is intentionally narrower: the episodic store, proxy loop, refusal-aware retrieval contract, and evaluation harness are operational; the learned components are the next phase.

Read Architecture for current and target boundaries.

Quickstart

Requirements:

  • Python 3.11
  • uv
git clone https://github.com/amazadfar/persona.git
cd persona
make install
make test

Import a ChatGPT export

The current bootstrap command defaults to a private local snapshot path. Pass your own unzipped export directory explicitly:

uv run python scripts/bootstrap_from_export.py \
  /path/to/chatgpt-export \
  --db-path var/episodic.sqlite

uv run persona stats
uv run persona search "project planning"
uv run persona show <episode-id>

data/, var/, SQLite files, and local logs are gitignored. A synthetic public demo fixture is planned as the next publication milestone.

Run the proxy

For strict local-first operation, point Persona at a local OpenAI-compatible server. Using a cloud endpoint sends the augmented prompt and selected memory previews to that provider.

export PERSONA_UPSTREAM_BASE_URL=http://127.0.0.1:11434/v1
export PERSONA_EPISODIC_DB_PATH=var/episodic.sqlite

make proxy

The current proxy implements non-streaming POST /v1/chat/completions.

Run the baseline evaluation

make eval-baselines
make eval-packets
make eval-ratings-template

# Complete var/eval/ratings_template.yaml, then:
make eval-record
make eval-analyze

Private results and retrieved previews stay under var/. Public reports must contain aggregate or synthetic evidence only.

Evaluation Design

  • Probe set: 30 identity, 10 transactional, 10 unsupported-detail refusal probes
  • Baselines: lexical fallback, frequency-weighted, recency-weighted
  • Scoring: representativeness, specificity, balance, refusal correctness, and irrelevance severity
  • Protocol: randomized system labels, blind human scoring, external-rater subset
  • Primary analysis: paired Wilcoxon signed-rank and Cohen's d
  • Controls: locked probe/rubric versions, frozen evaluation corpus, ablations, rater-drift check, and latency gate

The current lexical baselines are scaffolding for the evaluation contract. The planned vector baseline and learned Persona pipeline arrive in Weeks 4-6.

Read Evaluation for the complete protocol and current limitations.

Privacy Model

Persona stores personal history locally and does not include telemetry. Raw exports, local databases, proxy logs, evaluation packets, and model artifacts are excluded from git.

The configured LLM upstream is a trust boundary. A local upstream preserves the strict local-first path. A cloud upstream receives the augmented request, so it must be treated as an explicit privacy tradeoff.

Read Privacy and Data Handling before running Persona on personal history.

Project Documents

Roadmap

  1. Foundation hardening: refusal-aware contracts, provenance, locked eval artifacts
  2. Learnable components: write-policy scorer, mode classifier, reranker
  3. Evaluation: frozen corpus, blind scoring, external rater, ablations, statistics
  4. Release: synthetic demo, public result reports, technical paper, demo video

Phase gates in PLAN.md are authoritative. Scope is cut before evaluation rigor.

Limitations

  • Retrieval is currently lexical, not embedding-based.
  • Refusal confidence is heuristic and not calibrated.
  • The proxy does not support streaming.
  • Model and consolidator modules are still stubs.
  • The current evaluation is single-user and cannot establish population-level claims.
  • Personal-memory evaluation is vulnerable to rater subjectivity and corpus-specific bias.

License

Apache-2.0. See LICENSE.

About

A local-first, learnable memory layer for LLMs that turns personal chat history into representative, mode-aware context.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors