An interactive explainer of Eldan & Russinovich (2023), Who's Harry Potter? Approximate Unlearning in LLMs: the paper that taught Llama-2-7B to forget Harry Potter without retraining from scratch. Built as a portfolio piece using marimo's reactive notebook framework; runs entirely in the browser via WebAssembly.
🔗 Live notebook: deploying to jacobbowie.com/unlearning. For now, clone and run it locally (see below), or build the in-browser WASM bundle yourself.
The 2023 paper showed that Llama-2-7B, which had cost roughly 184,000 GPU-hours to pretrain, could be made to forget Harry Potter in about one GPU-hour of fine-tuning, with the model's general-knowledge benchmark scores essentially unchanged. The mechanism is three small ideas: a deliberately reinforced, Harry Potter-aware (HP-aware) twin model, an arithmetic trick that subtracts that twin's predictions out of the baseline, and a dictionary of ~1,500 anchored substitutions that lets the same trick work even when the prompt itself is HP-flavored. The interaction between the three ideas is the substance of the paper.
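For orientation, the subtraction step and the anchor swap can be sketched in a few lines of plain Python. This is a minimal sketch, not the notebook's or the paper's code. The combination rule follows a reading of the paper's logit formula, v_generic = v_baseline - alpha * ReLU(v_reinforced - v_baseline); the function names, the four-token vocabulary, the logit values, and the two dictionary entries are all made up for illustration.

```python
import re

def generic_logits(baseline, reinforced, alpha=1.0):
    """Push down every token the reinforced twin boosts relative to the baseline.

    Assumed rule: v_generic = v_baseline - alpha * ReLU(v_reinforced - v_baseline).
    """
    return [b - alpha * max(r - b, 0.0) for b, r in zip(baseline, reinforced)]

# Hypothetical next-token logits for a prompt like "Harry's best friend is ..."
vocab = ["Ron", "Hermione", "Jack", "the"]
baseline = [4.0, 3.5, 1.0, 0.5]
reinforced = [6.0, 5.0, 0.8, 0.5]

for alpha in (0.0, 1.0, 2.0):
    logits = generic_logits(baseline, reinforced, alpha)
    top = vocab[logits.index(max(logits))]
    print(f"alpha={alpha}: top token = {top}, logits = {logits}")

# Illustrative stand-in for the ~1,500-entry anchor dictionary.
ANCHORS = {"Harry": "Jon", "Hogwarts": "Mystic Academy"}

def swap_anchors(text, anchors=ANCHORS):
    """Replace whole-word anchor terms with their generic counterparts."""
    keys = sorted(anchors, key=len, reverse=True)  # try longer anchors first
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, keys)) + r")\b")
    return pattern.sub(lambda m: anchors[m.group(1)], text)

print(swap_anchors("Harry went back to Hogwarts."))
```

At alpha = 0 the baseline is returned unchanged; as alpha grows, the tokens the twin favors are pushed down until a generic candidate takes over. The anchor swap is what produces a usable generic target when the prompt itself names Harry.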
This notebook walks through that mechanism interactively. The hero visualization frames the algorithm as edge surgery on the model's knowledge graph: HP-specific links die, the surrounding language graph survives, and a few "re-purposed" links grow so the model has something plausible to say when asked about Harry. Three sliders, a prompt explorer, and a live alpha knob let a reader feel each step of the algorithm rather than just reading about it. The notebook also includes a candid caveats section: follow-up work (Lynch et al. 2024; Liu et al. EMNLP 2024) has shown the unlearning is more brittle than the paper's original familiarity metric suggested, and newer methods (NPO, SimNPO, RMU) have largely eclipsed this technique. The paper remains the cleanest pedagogical entry into the problem, which is why this walkthrough sits on top of it.
notebook.py                   the polished walkthrough (single-file, self-contained)
claims/
  manuscript_claims.json      catalog of every citation-bearing claim, used during
                              the citation-base audit prior to public release
deploy/
  citations.md                verified citations + per-claim spot-checks against
                              the source paper and HF model card
  HANDOFF.md                  brief for the portfolio agent (case-study integration)
  build_wasm.sh               one-command WASM export via Docker
  case_study_unlearning.qmd   drop-in Quarto template for embedding the notebook
  og_metadata.html            Open Graph + Twitter card meta tags
  social_card_brief.md        spec for the 1200×630 social preview image
  description_snippets.md     portfolio / social / HN copy at three lengths
sketches/
  comparison.py               six visualization approaches in one file (dev scratch)
  d3_force_skeleton.py        d3-force HTML embed proof-of-concept
  SKETCHES_README.md          which viz won and why
scripts/
  precompute_hf.py            extend cached completions via the released checkpoint
                              (sketched; Databricks is the cleanest compute path)
  precompute_local.py         toy gpt2 version of the unlearning pipeline (CPU-OK)
pyproject.toml                project + pinned deps (uv-managed)
uv.lock                       locked, reproducible env (committed)
Dockerfile                    optional uv image + marimo edit on :2718
docker-compose.yml            one-command local dev (optional)
.dockerignore
LICENSE                       MIT (code)
LICENSE-CONTENT.md            CC BY 4.0 (notebook prose, narrative, visualizations)
CITATION.cff                  cite-this-repository metadata (renders on GitHub)
CLAUDE.md                     orientation for any Claude session in this repo
README.md                     this file
The notebook's data is inlined as Python literals (Figures 1, 3, and 5 from the paper). Single self-contained file: ships clean to molab and bundles small via WASM export.
uv run marimo edit notebook.py # author / preview
uv run marimo run notebook.py # read-only app view
uv provisions the environment from pyproject.toml + uv.lock on first run.
No manual install step, and no global Python is required.
docker compose up --build
Then open http://localhost:2718. Edit any .py file in the project from
the marimo file picker; changes persist to the host because the project dir
is bind-mounted. Stop with Ctrl+C, or docker compose down.
Lint pass:
uv run marimo check notebook.py
The canonical surface for this notebook is the standalone Netlify deploy
configured outside this repo (handled by the Netlify integration on push
to main). The build artifact lives at dist/unlearning/ and is
regenerated on every successful CI run.
For embedding into a larger portfolio site as a case-study page, see
deploy/HANDOFF.md.
To rebuild the bundle locally:
bash deploy/build_wasm.sh
# produces dist/unlearning/ (gitignored, regenerable)
# smoke test:
python -m http.server --directory dist/unlearning 8000
The graph-viz edge weights are currently illustrative; the rest of the
numbers (completions, token probabilities, familiarity scores, benchmark
scores) are taken verbatim from the paper. Both the notebook and
deploy/citations.md say so explicitly.
To replace illustrative weights with measured ones:
scripts/precompute_hf.py: skeleton script that should hit the released microsoft/Llama2-7b-WhoIsHarryPotter checkpoint over a curated cloze probe set, capture top-k softmax probabilities, and emit JSON that the notebook can load. The cleanest compute path is a one-time Databricks job using transformers.AutoModelForCausalLM with generate(..., output_scores=True). The released checkpoint is ungated under the Microsoft Research License Agreement; no Llama-2 gate dance required for this fine-tune specifically.
scripts/precompute_local.py: small gpt2-based toy pipeline for development and pedagogy; not used in the published notebook.
This is v0.2 work. The v0.1 notebook ships with illustrative weights and explicit disclosure of that fact.
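The post-processing half of that plan (turning one position's logits into top-k softmax probabilities and serializing them) can be prototyped without a GPU. A minimal sketch, assuming a plain list of logits and a matching token list. In the real script the logits would come from the checkpoint; the function name, the probe record, and the JSON field names here are hypothetical, not the notebook's actual schema.

```python
import json
import math

def top_k_probs(logits, tokens, k=3):
    """Softmax over the full vocabulary, then keep the k most likely tokens."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    ranked = sorted(zip(tokens, exps), key=lambda pair: pair[1], reverse=True)
    return [{"token": t, "prob": e / total} for t, e in ranked[:k]]

# Toy stand-in for one cloze probe; real logits would come from the model.
record = {
    "prompt": "Harry Potter studies",
    "top_k": top_k_probs([2.0, 1.0, 0.5, -1.0], ["magic", "law", "art", "the"], k=3),
}
print(json.dumps(record, indent=2))
```

Normalizing over the whole vocabulary before truncating keeps the stored probabilities comparable across probes; renormalizing over only the top k would overstate them.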
marimo reactive notebooks · Altair charts · networkx graph layout · Pyodide for browser execution.
Citation metadata in CITATION.cff. GitHub renders a
"Cite this repository" widget on the landing page.
If you use this notebook in your own work, please also cite the original research paper it explains:
Eldan, R., & Russinovich, M. (2023). Who's Harry Potter? Approximate Unlearning in LLMs. arXiv:2310.02238. https://arxiv.org/abs/2310.02238
Built by Jacob Bowie, researcher at UCONN. Reach me at jacob.bowie2@gmail.com.
A 2026 marimo demonstration, explaining Eldan & Russinovich (2023) and the unlearning work it set off.
