Synthdata Plugin

A general-purpose Claude Code plugin for synthetic data generation across any tabular domain.

Synthdata turns a YAML schema (or one of 12 built-in templates) into realistic synthetic datasets — with Faker-backed fields, statistical distributions, foreign-key integrity, behavioral profiles, and temporal event generation. Outputs xlsx, csv, json, sql, or parquet.

Skills

Skill	What it does
synthdata-generate	Pick a template (HR, e-commerce, SaaS, healthcare, finance, security, IoT, CRM, logs, surveys, or blank) or design a custom schema via interview, then generate a synthetic dataset
synthdata-extract	Extract tabular data from Excel workbooks to JSON (auto-detects title rows and headers)
synthdata-extend	Add rows or new columns to an existing dataset while preserving FK integrity and profile distributions
synthdata-anonymize	Transform a real dataset into a synthetic equivalent — detects PII, replaces with Faker values, preserves shape and distributions
synthdata-compute	Derive aggregated, scored, or transformed tables from existing data — monthly rollups, composite scores, percentile ranks, segment summaries
synthdata-prompt-builder	Plan multi-step generation workflows — identify raw vs derived tables, match to templates, output a sequenced set of prompts
synthdata-tutorial	Guided interactive walkthrough of the synthdata skills
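
To make the synthdata-compute description more concrete, here is a minimal sketch of two of the derivations it names: a monthly rollup and a percentile rank. It uses only the Python standard library. The row layout, the column names (`user_id`, `ts`, `severity`), and the helper functions are hypothetical illustrations, not the skill's actual implementation.

```python
from collections import defaultdict
from datetime import date

# Hypothetical event rows; the column names are illustrative only.
events = [
    {"user_id": "U0001", "ts": date(2025, 1, 5), "severity": 3},
    {"user_id": "U0001", "ts": date(2025, 1, 20), "severity": 1},
    {"user_id": "U0002", "ts": date(2025, 1, 7), "severity": 5},
    {"user_id": "U0002", "ts": date(2025, 2, 2), "severity": 2},
    {"user_id": "U0003", "ts": date(2025, 2, 9), "severity": 4},
]


def monthly_rollup(rows):
    """Sum severity per (user_id, 'YYYY-MM') pair."""
    totals = defaultdict(int)
    for row in rows:
        month = row["ts"].strftime("%Y-%m")
        totals[(row["user_id"], month)] += row["severity"]
    return dict(totals)


def percentile_ranks(scores):
    """Map each key to the fraction of scores less than or equal to its own."""
    values = list(scores.values())
    n = len(values)
    return {key: sum(v <= s for v in values) / n for key, s in scores.items()}


rollup = monthly_rollup(events)
ranks = percentile_ranks(rollup)
```

Here `rollup` is a derived table keyed by user and month, and `ranks` scores each of those rows against the others. The skill itself may use pandas or a different ranking definition.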

Installation

Option 1: GitHub marketplace (recommended)

/plugin marketplace add rappdw/synthdata
/plugin install synthdata@synthdata-marketplace

Option 2: Reference from another marketplace

In another marketplace's marketplace.json:

{
  "name": "synthdata",
  "source": {
    "source": "github",
    "repo": "rappdw/synthdata"
  }
}

Option 3: Plugin directory

claude --plugin-dir /path/to/synthdata

Option 4: Manual skill copy

cp -r skills/* ~/.claude/skills/
# or use the installer:
./install.sh

Option 5: Cowork upload

./package.sh                              # produces dist/synthdata-v0.2.0.plugin
# Cowork > Customize > Plugins > Upload custom plugin

Prerequisites

pip install openpyxl faker numpy pandas pyyaml --break-system-packages
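
If you want to confirm the prerequisites are in place before running a skill, a small check along these lines may help. It is a sketch, not part of the plugin: the `missing_packages` helper is hypothetical, and the only non-obvious detail is that the `pyyaml` package is imported as `yaml`.

```python
from importlib.util import find_spec

# pip package name -> import name (pyyaml is imported as "yaml")
REQUIRED = {
    "openpyxl": "openpyxl",
    "faker": "faker",
    "numpy": "numpy",
    "pandas": "pandas",
    "pyyaml": "yaml",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", " ".join(missing))
    else:
        print("All prerequisites are importable.")
```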

Quick Start

> Generate me a synthetic HR directory with 500 employees
> Create an e-commerce orders dataset
> Build a custom dataset for my app — I'll describe the tables
> Extract this spreadsheet to JSON
> Anonymize this customer export
> Compute monthly risk scores from my event data
> Help me plan what data I need for a fraud detection demo

Templates

12 domain starters ship with synthdata-generate. Pick one to get going fast, or start from blank-slate for a custom schema.

Template	Entities
hr-directory	employees, departments
ecommerce-orders	customers, products, orders, order_items
saas-metrics	accounts, users, events, subscriptions
healthcare-patients	patients, providers, encounters, claims
financial-transactions	accounts, customers, transactions
security-events	users, devices, alerts, incidents
log-events	services, requests, errors
iot-sensors	devices, readings, events
crm-pipeline	contacts, companies, deals, activities
survey-responses	respondents, questions, responses
healthcare-hrm-security	users, threat events, phishing sims, training, DLP, abuse mailbox
blank-slate	minimal starter for custom schemas

Schema Format

name: my-dataset
tables:
  - name: users
    rows: { quick: 50, medium: 1000, thorough: 5000 }
    columns:
      - { name: user_id, type: id, prefix: "U", width: 4 }
      - { name: name, type: faker, method: name }
      - { name: department, type: choice, values: [Sales, Eng, Ops], weights: [0.4, 0.4, 0.2] }
      - { name: salary, type: float, distribution: lognormal, mean: 75000, sigma: 0.4, min: 30000 }
    profiles:
      - { name: high_risk, weight: 0.05, overrides: { risk_multiplier: 3.0 } }
  - name: events
    foreign_key: { column: user_id, references: users.user_id, distribution: zipfian, alpha: 1.5 }
    rows_per_parent: { distribution: poisson, lam: 5 }
    columns:
      - { name: event_type, type: choice, values: [login, click, error] }
      - { name: ts, type: timestamp, start: "2025-01-01", end: "2025-12-31" }
writers: [xlsx, json]

See skills/synthdata-generate/references/schema-spec.md for the complete spec.
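
As a rough illustration of what the example schema asks for, the following standard-library sketch generates a `users` table and a child `events` table. It is not the plugin's generator. It assumes that `mean` in the lognormal column is treated as the median (so mu = ln(75000)), that zipfian weights are proportional to 1 / rank^alpha, and that a fixed 250 events stands in for the Poisson `rows_per_parent` setting. All helper names are hypothetical; check schema-spec.md for the real semantics.

```python
import math
import random

rng = random.Random(42)  # fixed seed so runs are repeatable


def make_id(index, prefix="U", width=4):
    """Format a 1-based index as a zero-padded ID such as U0001."""
    return f"{prefix}{index:0{width}d}"


def lognormal_salary(median=75000, sigma=0.4, minimum=30000):
    """Draw a lognormal value and clamp it at the minimum.

    Assumption: the schema's `mean` is treated as the median of the
    distribution (mu = ln(median)); the real generator may differ.
    """
    return max(minimum, rng.lognormvariate(math.log(median), sigma))


def zipfian_weights(n, alpha=1.5):
    """Weight for rank r is proportional to 1 / r**alpha."""
    return [1 / (rank ** alpha) for rank in range(1, n + 1)]


# Parent table: 50 users, matching the schema's "quick" row count.
users = [
    {
        "user_id": make_id(i),
        "department": rng.choices(
            ["Sales", "Eng", "Ops"], weights=[0.4, 0.4, 0.2]
        )[0],
        "salary": round(lognormal_salary(), 2),
    }
    for i in range(1, 51)
]

# Child table: each event picks a parent user_id, so every foreign key
# is valid, and the zipfian weights concentrate events on a few users.
user_ids = [u["user_id"] for u in users]
fk_weights = zipfian_weights(len(user_ids))
events = [
    {
        "user_id": rng.choices(user_ids, weights=fk_weights)[0],
        "event_type": rng.choice(["login", "click", "error"]),
    }
    for _ in range(250)
]
```

Drawing each child row's key from the parent's ID list is what keeps foreign-key integrity intact by construction: no event can reference a user that does not exist.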

License

MIT — see LICENSE.
