A general-purpose Claude Code plugin for synthetic data generation across any tabular domain.
Synthdata turns a YAML schema (or one of 12 built-in templates) into realistic synthetic datasets — with Faker-backed fields, statistical distributions, foreign-key integrity, behavioral profiles, and temporal event generation. Outputs xlsx, csv, json, sql, or parquet.
| Skill | What it does |
|---|---|
| synthdata-generate | Pick a template (HR, e-commerce, SaaS, healthcare, finance, security, IoT, CRM, logs, surveys, +blank) or design a custom schema via interview, then generate a synthetic dataset |
| synthdata-extract | Extract tabular data from Excel workbooks to JSON (auto-detects title rows and headers) |
| synthdata-extend | Add rows or new columns to an existing dataset while preserving FK integrity and profile distributions |
| synthdata-anonymize | Transform a real dataset into a synthetic equivalent — detects PII, replaces with Faker values, preserves shape and distributions |
| synthdata-compute | Derive aggregated, scored, or transformed tables from existing data — monthly rollups, composite scores, percentile ranks, segment summaries |
| synthdata-prompt-builder | Plan multi-step generation workflows — identify raw vs derived tables, match to templates, output a sequenced set of prompts |
| synthdata-tutorial | Guided interactive walkthrough of the synthdata skills |
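
As a rough illustration of the "monthly rollups" that synthdata-compute describes, the sketch below counts events per user per month using only the Python standard library. The rows, the column names (`user_id`, `ts`), and the `monthly_rollup` helper are hypothetical and are not taken from the plugin itself.

```python
from collections import Counter
from datetime import date

# Hypothetical event rows; real input would come from a generated dataset.
events = [
    {"user_id": "U0001", "ts": date(2025, 1, 3)},
    {"user_id": "U0001", "ts": date(2025, 1, 20)},
    {"user_id": "U0001", "ts": date(2025, 2, 2)},
    {"user_id": "U0002", "ts": date(2025, 1, 15)},
]


def monthly_rollup(rows):
    """Count rows per (user_id, YYYY-MM) bucket, sorted by user and month."""
    counts = Counter((r["user_id"], r["ts"].strftime("%Y-%m")) for r in rows)
    return [
        {"user_id": user, "month": month, "events": n}
        for (user, month), n in sorted(counts.items())
    ]


rollup = monthly_rollup(events)
for row in rollup:
    print(row)
```

The derived table has one row per user and month, which is the shape a later scoring or ranking step would typically consume.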

/plugin marketplace add rappdw/synthdata
/plugin install synthdata@synthdata-marketplace

In another marketplace's marketplace.json:

{
  "name": "synthdata",
  "source": {
    "source": "github",
    "repo": "rappdw/synthdata"
  }
}

claude --plugin-dir /path/to/synthdata

cp -r skills/* ~/.claude/skills/
# or use the installer:
./install.sh

./package.sh  # produces dist/synthdata-v0.2.0.plugin
# Cowork > Customize > Plugins > Upload custom plugin

pip install openpyxl faker numpy pandas pyyaml --break-system-packages

> Generate me a synthetic HR directory with 500 employees
> Create an e-commerce orders dataset
> Build a custom dataset for my app — I'll describe the tables
> Extract this spreadsheet to JSON
> Anonymize this customer export
> Compute monthly risk scores from my event data
> Help me plan what data I need for a fraud detection demo
12 domain starters ship with synthdata-generate. Pick one to get going fast, or start from blank-slate for a custom schema.
| Template | Entities |
|---|---|
| hr-directory | employees, departments |
| ecommerce-orders | customers, products, orders, order_items |
| saas-metrics | accounts, users, events, subscriptions |
| healthcare-patients | patients, providers, encounters, claims |
| financial-transactions | accounts, customers, transactions |
| security-events | users, devices, alerts, incidents |
| log-events | services, requests, errors |
| iot-sensors | devices, readings, events |
| crm-pipeline | contacts, companies, deals, activities |
| survey-responses | respondents, questions, responses |
| healthcare-hrm-security | users, threat events, phishing sims, training, DLP, abuse mailbox |
| blank-slate | minimal starter for custom schemas |

name: my-dataset
tables:
  - name: users
    rows: { quick: 50, medium: 1000, thorough: 5000 }
    columns:
      - { name: user_id, type: id, prefix: "U", width: 4 }
      - { name: name, type: faker, method: name }
      - { name: department, type: choice, values: [Sales, Eng, Ops], weights: [0.4, 0.4, 0.2] }
      - { name: salary, type: float, distribution: lognormal, mean: 75000, sigma: 0.4, min: 30000 }
    profiles:
      - { name: high_risk, weight: 0.05, overrides: { risk_multiplier: 3.0 } }
  - name: events
    foreign_key: { column: user_id, references: users.user_id, distribution: zipfian, alpha: 1.5 }
    rows_per_parent: { distribution: poisson, lam: 5 }
    columns:
      - { name: event_type, type: choice, values: [login, click, error] }
      - { name: ts, type: timestamp, start: "2025-01-01", end: "2025-12-31" }
writers: [xlsx, json]

See skills/synthdata-generate/references/schema-spec.md for the complete spec.
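
To make the schema fields more concrete, here is a minimal standard-library sketch of how rows shaped like the `users` and `events` tables could be sampled. It is not the plugin's implementation, and it rests on several assumptions: `mean` is treated as the lognormal scale (mu = ln(mean)), `min` is treated as a floor, `width: 4` is treated as zero-padded digits, and a fixed event count stands in for the Poisson `rows_per_parent` setting. The helper names `make_users` and `make_events` are hypothetical.

```python
import math
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible


def make_users(n):
    """Build n user rows mirroring some of the `users` columns."""
    users = []
    for i in range(1, n + 1):
        # Assumption: `mean: 75000` sets the lognormal scale, mu = ln(75000).
        salary = rng.lognormvariate(math.log(75000), 0.4)
        users.append({
            "user_id": f"U{i:04d}",  # assumption: prefix plus 4 zero-padded digits
            "department": rng.choices(
                ["Sales", "Eng", "Ops"], weights=[0.4, 0.4, 0.2]
            )[0],
            "salary": round(max(salary, 30000.0), 2),  # assumption: `min` is a floor
        })
    return users


def make_events(users, n_events, alpha=1.5):
    """Pick a parent user for each event with Zipf-like weights 1 / rank**alpha."""
    weights = [1.0 / (rank ** alpha) for rank in range(1, len(users) + 1)]
    parents = rng.choices(users, weights=weights, k=n_events)
    return [
        {
            "user_id": parent["user_id"],
            "event_type": rng.choice(["login", "click", "error"]),
        }
        for parent in parents
    ]


users = make_users(50)
events = make_events(users, 250)  # roughly 5 events per user on average

# Foreign-key integrity: every event must reference an existing user.
user_ids = {u["user_id"] for u in users}
assert all(e["user_id"] in user_ids for e in events)
```

Because each event's `user_id` is drawn from the generated users, referential integrity holds by construction, while the Zipf-like weights concentrate activity on a small number of users.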
MIT — see LICENSE.