Redacta

Pseudonymise medical and clinical documents before they're processed by AI or shared. Redacta replaces patient identifiers with labelled tokens — [PATIENT_NAME_1], [NHS_NUMBER_1], [DATE_OF_BIRTH_1], … — while leaving the clinical meaning intact, and returns a redaction report alongside the cleaned text.

It started as an Agent Skill and is now one engine shipped across eight surfaces: an iOS app, an agent skill, an MCP server, two libraries, a CLI, and two whiteboard apps.

One engine, many surfaces

Surface	Folder	Get it
iOS app — iPhone (app, Share Extension, widget)	`ios-app/`	Build with Xcode — see `ios-app/README.md`
Agent skill (Claude Code / apps / API)	`SKILL.md`, `scripts/`	`openclaw skills install redacta` (ClawHub)
MCP server (Claude Desktop, Cursor, …)	`mcp-server/`	`npx -y redacta-mcp` (npm · MCP Registry · Anthropic MCP Directory)
TypeScript library	`npm-package/`	`npm i @pharmatools/redacta` (npm)
Python library	`python-package/`	`pip install redacta` (PyPI)
Command-line tool	`cli-package/`	`npx redacta-cli` (npm)
Miro app	`miro-app/`	getpatiently.ai → Redacta
FigJam plugin	`figjam-plugin/`	Figma Community

The detection logic lives in one place — the TypeScript engine (@pharmatools/redacta, in npm-package/), which the MCP server and both whiteboard apps consume, and which the iOS app runs on-device via JavaScriptCore. The Python package mirrors it for pip users; the agent skill adds LLM reasoning for free-text names on top of the deterministic patterns.

How it works

Two layers, followed by a final self-check:

Patterns (deterministic). A bundled script (scripts/redact_structured.py, Python standard library only, no network) matches fixed-format identifiers: NHS numbers (Modulus-11 validated), UK National Insurance numbers, dates of birth, UK postcodes, phone numbers, emails, and hospital/MRN numbers. US SSN and ZIP codes are also handled.
Reasoning (judgement). The skill then has the agent handle what patterns can't: patient names (told apart from the clinicians treating them), relatives and carers, postal addresses, and identifying ages.
Self-check. A final pass re-reads the output for any identifier that slipped through before the report is written.
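
To illustrate the Modulus-11 validation mentioned for NHS numbers, here is a minimal sketch of the standard check-digit calculation. It is not taken from scripts/redact_structured.py; the function name `is_valid_nhs_number` is hypothetical, and the bundled script may structure the check differently.

```python
import re


def is_valid_nhs_number(candidate: str) -> bool:
    """Return True if `candidate` passes the NHS Modulus-11 check.

    Hypothetical helper for illustration; not Redacta's actual API.
    """
    # Accept common spacing such as "943 476 5919" or "943-476-5919".
    digits = re.sub(r"[\s-]", "", candidate)
    if not re.fullmatch(r"[0-9]{10}", digits):
        return False

    # Weight the first nine digits by 10, 9, ..., 2 and sum the products.
    total = sum(int(d) * w for d, w in zip(digits[:9], range(10, 1, -1)))

    # The check digit is 11 minus the remainder; 11 maps to 0, 10 is invalid.
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    if check == 10:
        return False
    return check == int(digits[9])


print(is_valid_nhs_number("943 476 5919"))  # passes the checksum
print(is_valid_nhs_number("943 476 5918"))  # wrong check digit
```

Validating the checksum before redacting is what lets a pattern layer avoid tokenising arbitrary ten-digit numbers that merely look like NHS numbers.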

It also works in reverse. Re-identification (scripts/reinstate.py) takes the token map from an earlier redaction and restores the original values — so you can redact a document, run it through another AI tool, and put the real details back locally. Redact → process → re-identify is a complete round trip, and identifiers only ever exist on your machine.
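
The round trip can be sketched in a few lines. This is an illustration under stated assumptions, not the code in scripts/reinstate.py: the functions `redact_emails` and `reinstate`, the `[EMAIL_n]` token name, and the shape of the token map (a dict from token to original value) are all hypothetical.

```python
import re

# Simplified email pattern for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Matches labelled tokens such as [EMAIL_1] or [NHS_NUMBER_2].
TOKEN = re.compile(r"\[[A-Z_]+_\d+\]")


def redact_emails(text):
    """Replace each distinct email with a numbered token; return text and map."""
    token_map = {}  # token -> original value (assumed shape)
    seen = {}       # original value -> token, so repeats reuse one token

    def swap(match):
        value = match.group(0)
        if value not in seen:
            token = f"[EMAIL_{len(seen) + 1}]"
            seen[value] = token
            token_map[token] = value
        return seen[value]

    return EMAIL.sub(swap, text), token_map


def reinstate(text, token_map):
    """Restore original values; tokens missing from the map are left as-is."""
    return TOKEN.sub(lambda m: token_map.get(m.group(0), m.group(0)), text)


original = "Contact jane.doe@example.com; cc jane.doe@example.com."
redacted, token_map = redact_emails(original)
print(redacted)
print(reinstate(redacted, token_map) == original)
```

In this sketch the token map never needs to leave the local machine: only the redacted text is passed to the downstream tool, and the map is applied afterwards to whatever text comes back.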

Safe Harbor mode. Ask for HIPAA Safe Harbor (or "US de-identification") and Redacta applies a stricter pass: all dates (not just the date of birth), all specific ages, and the remaining HIPAA identifiers — fax, certificate/licence, device serial, VIN, and health-plan beneficiary numbers.
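
The difference between the default and Safe Harbor treatment of dates can be sketched as follows. This is a rough illustration, not Redacta's implementation: the `redact_dates` function, the `safe_harbor` flag, the `[DATE_n]` token name, and the label-based date-of-birth heuristic are all assumptions, and the sketch covers only DD/MM/YYYY-style dates, not ages or the other HIPAA identifiers.

```python
import re

# Numeric dates such as 03/04/1952 (illustrative; real notes use many formats).
DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")
# A date counts as a date of birth here if a DOB label immediately precedes it.
DOB_LABEL = re.compile(r"(?:DOB|date of birth)\W*$", re.IGNORECASE)


def redact_dates(text, safe_harbor=False):
    """Tokenise dates of birth always, and every other date in Safe Harbor mode."""
    counts = {"DATE_OF_BIRTH": 0, "DATE": 0}

    def swap(match):
        preceding = text[:match.start()]
        if DOB_LABEL.search(preceding):
            kind = "DATE_OF_BIRTH"
        elif safe_harbor:
            kind = "DATE"  # hypothetical token name for non-DOB dates
        else:
            return match.group(0)  # default mode keeps clinical dates
        counts[kind] += 1
        return f"[{kind}_{counts[kind]}]"

    return DATE.sub(swap, text)


note = "DOB: 03/04/1952. Admitted 12/01/2024."
print(redact_dates(note))
print(redact_dates(note, safe_harbor=True))
```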

Install

Claude Code

git clone https://github.com/nickjlamb/redacta ~/.claude/skills/redacta

Then invoke it with /redacta, or let it trigger automatically when you ask to redact or de-identify clinical text.

Claude apps / API

Zip the repository folder and upload it as a skill.

Contents

Path	What it is
`SKILL.md`	The skill — instructions plus metadata
`reference.md`	Pattern specs, the Modulus-11 algorithm, NI prefix rules, the date-of-birth vs clinical-date rule, token vocabulary, limitations
`scripts/redact_structured.py`	The deterministic pattern layer
`scripts/reinstate.py`	The re-identification layer (restore originals from a token map)
`scripts/test_redact_structured.py`	Tests for the pattern layer
`scripts/test_reinstate.py`	Tests for the re-identification layer
`evaluations.json`	Example evaluation scenarios

Run the tests:

python3 scripts/test_redact_structured.py
python3 scripts/test_reinstate.py

A note on limits

Redacta is a strong first line of defence, not a guarantee. It won't catch every possible identifier and isn't a substitute for formal data-protection processes. Always review the redaction report before sharing text.

License

MIT-0 (MIT No Attribution). Built by PharmaTools.AI.
