
rappdw/synthdata


Synthdata Plugin

A general-purpose Claude Code plugin for synthetic data generation across any tabular domain.

Synthdata turns a YAML schema (or one of 12 built-in templates) into realistic synthetic datasets, with Faker-backed fields, statistical distributions, foreign-key integrity, behavioral profiles, and temporal event generation. It writes xlsx, csv, json, sql, or parquet output.
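The output step can be pictured with pandas, which is already a prerequisite. This is a hypothetical sketch, not the plugin's code. The function name `write_tables` and the file layout are assumptions: one file per table for csv, json, and parquet, a single `dataset.xlsx` with one sheet per table, and a single `dataset.sql`. Another assumption is that SQL is produced by loading the tables into an in-memory SQLite database and dumping it with `sqlite3.Connection.iterdump()`.

```python
from __future__ import annotations

import sqlite3
from pathlib import Path

import pandas as pd


def write_tables(tables: dict[str, pd.DataFrame], out_dir: str, formats: list[str]) -> list[Path]:
    """Write every table in each requested format (hypothetical layout); return paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for fmt in formats:
        if fmt == "csv":
            for name, df in tables.items():
                path = out / f"{name}.csv"
                df.to_csv(path, index=False)
                written.append(path)
        elif fmt == "json":
            for name, df in tables.items():
                path = out / f"{name}.json"
                df.to_json(path, orient="records", indent=2)  # list of row objects
                written.append(path)
        elif fmt == "xlsx":
            # One workbook with one sheet per table; pandas delegates to openpyxl.
            path = out / "dataset.xlsx"
            with pd.ExcelWriter(path, engine="openpyxl") as xw:
                for name, df in tables.items():
                    df.to_excel(xw, sheet_name=name, index=False)
            written.append(path)
        elif fmt == "sql":
            # Load into in-memory SQLite, then dump CREATE TABLE + INSERT statements.
            conn = sqlite3.connect(":memory:")
            for name, df in tables.items():
                df.to_sql(name, conn, index=False)
            path = out / "dataset.sql"
            path.write_text("\n".join(conn.iterdump()) + "\n")
            conn.close()
            written.append(path)
        elif fmt == "parquet":
            for name, df in tables.items():
                path = out / f"{name}.parquet"
                df.to_parquet(path, index=False)  # requires pyarrow or fastparquet
                written.append(path)
        else:
            raise ValueError(f"unknown writer: {fmt}")
    return written
```

Routing SQL through SQLite leaves quoting and escaping to the database instead of hand-built strings. The trade-off is that the dump uses SQLite's dialect. pandas' `to_parquet` needs pyarrow or fastparquet installed.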

Skills

| Skill | What it does |
|---|---|
| synthdata-generate | Pick a template (HR, e-commerce, SaaS, healthcare, healthcare HRM security, finance, security, IoT, CRM, logs, surveys, or blank slate) or design a custom schema through an interview, then generate a synthetic dataset |
| synthdata-extract | Extract tabular data from Excel workbooks to JSON (auto-detects title rows and headers) |
| synthdata-extend | Add rows or new columns to an existing dataset while preserving FK integrity and profile distributions |
| synthdata-anonymize | Transform a real dataset into a synthetic equivalent: detects PII, replaces it with Faker values, and preserves shape and distributions |
| synthdata-compute | Derive aggregated, scored, or transformed tables from existing data, such as monthly rollups, composite scores, percentile ranks, and segment summaries |
| synthdata-prompt-builder | Plan multi-step generation workflows: identify raw vs. derived tables, match them to templates, and output a sequenced set of prompts |
| synthdata-tutorial | Guided interactive walkthrough of the synthdata skills |
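Header auto-detection, as synthdata-extract describes it, can be approximated with a simple heuristic. This is a hypothetical sketch, not the skill's actual logic. It takes rows as tuples of cell values, the shape openpyxl's `Worksheet.iter_rows(values_only=True)` yields. It skips blank and sparse rows such as a one-cell sheet title and treats the first well-filled, text-only row as the header. The rows below the header become JSON-ready dicts. The names `find_header_row` and `rows_to_records`, and the `min_fill` threshold, are illustrative assumptions.

```python
from __future__ import annotations

from typing import Any, Sequence


def _filled(row: Sequence[Any]) -> list[Any]:
    """Non-empty cells of a row (None and "" count as empty)."""
    return [c for c in row if c is not None and c != ""]


def find_header_row(rows: list[Sequence[Any]], min_fill: float = 0.5) -> int:
    """Index of the first row that looks like a header (assumed heuristic).

    Blank or sparse rows, e.g. a one-cell sheet title, are skipped; the first row
    filling at least `min_fill` of the widest row with text-only cells wins.
    """
    width = max((len(r) for r in rows), default=0)
    for i, row in enumerate(rows):
        cells = _filled(row)
        if not cells or len(cells) < min_fill * width:
            continue
        if all(isinstance(c, str) for c in cells):
            return i
    raise ValueError("no header row found")


def rows_to_records(rows: list[Sequence[Any]]) -> list[dict[str, Any]]:
    """Turn the rows under the detected header into JSON-ready dicts."""
    h = find_header_row(rows)
    headers = [str(c).strip() if c is not None and c != "" else f"column_{j + 1}"
               for j, c in enumerate(rows[h])]
    records = []
    for row in rows[h + 1:]:
        if not _filled(row):
            continue  # drop blank spacer rows
        records.append({name: (row[j] if j < len(row) else None)
                        for j, name in enumerate(headers)})
    return records
```

With openpyxl, `rows = list(load_workbook(path, data_only=True).active.iter_rows(values_only=True))` could feed it, and `json.dumps(rows_to_records(rows), default=str)` would serialize the result, including date cells.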

Installation

Option 1: GitHub marketplace (recommended)

```
/plugin marketplace add rappdw/synthdata
/plugin install synthdata@synthdata-marketplace
```

Option 2: Reference from another marketplace

In another marketplace's marketplace.json, add this entry to its plugins list:

```json
{
  "name": "synthdata",
  "source": {
    "source": "github",
    "repo": "rappdw/synthdata"
  }
}
```

Option 3: Plugin directory

```bash
claude --plugin-dir /path/to/synthdata
```

Option 4: Manual skill copy

```bash
cp -r skills/* ~/.claude/skills/
# or use the installer:
./install.sh
```

Option 5: Cowork upload

```bash
./package.sh                              # produces dist/synthdata-v0.2.0.plugin
# Cowork > Customize > Plugins > Upload custom plugin
```

Prerequisites

```bash
pip install openpyxl faker numpy pandas pyyaml --break-system-packages
```

Quick Start

> Generate me a synthetic HR directory with 500 employees
> Create an e-commerce orders dataset
> Build a custom dataset for my app — I'll describe the tables
> Extract this spreadsheet to JSON
> Anonymize this customer export
> Compute monthly risk scores from my event data
> Help me plan what data I need for a fraud detection demo

Templates

12 templates ship with synthdata-generate: 11 domain starters plus blank-slate. Pick a domain template to get going fast, or start from blank-slate for a custom schema.

| Template | Entities |
|---|---|
| hr-directory | employees, departments |
| ecommerce-orders | customers, products, orders, order_items |
| saas-metrics | accounts, users, events, subscriptions |
| healthcare-patients | patients, providers, encounters, claims |
| financial-transactions | accounts, customers, transactions |
| security-events | users, devices, alerts, incidents |
| log-events | services, requests, errors |
| iot-sensors | devices, readings, events |
| crm-pipeline | contacts, companies, deals, activities |
| survey-responses | respondents, questions, responses |
| healthcare-hrm-security | users, threat events, phishing sims, training, DLP, abuse mailbox |
| blank-slate | minimal starter for custom schemas |

Schema Format

```yaml
name: my-dataset
tables:
  - name: users
    rows: { quick: 50, medium: 1000, thorough: 5000 }
    columns:
      - { name: user_id, type: id, prefix: "U", width: 4 }
      - { name: name, type: faker, method: name }
      - { name: department, type: choice, values: [Sales, Eng, Ops], weights: [0.4, 0.4, 0.2] }
      - { name: salary, type: float, distribution: lognormal, mean: 75000, sigma: 0.4, min: 30000 }
    profiles:
      - { name: high_risk, weight: 0.05, overrides: { risk_multiplier: 3.0 } }
  - name: events
    foreign_key: { column: user_id, references: users.user_id, distribution: zipfian, alpha: 1.5 }
    rows_per_parent: { distribution: poisson, lam: 5 }
    columns:
      - { name: event_type, type: choice, values: [login, click, error] }
      - { name: ts, type: timestamp, start: "2025-01-01", end: "2025-12-31" }
writers: [xlsx, json]
```
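One way to read the column and profile options in the users table, sketched with PyYAML and NumPy. Every semantic here is an assumption, not the spec:

- the `rows` keys are size tiers;
- `id` columns are zero-padded sequences;
- the lognormal `mean` is the distribution's arithmetic mean, so the underlying mu is ln(mean) - sigma^2/2;
- `min` is a hard clamp;
- a profile's `weight` is the per-row probability of that profile.

The Faker column is left out to keep the sketch dependency-light. `gen_column`, `assign_profiles`, `generate`, and the `_profile` key are hypothetical names. schema-spec.md is the authority on real behavior.

```python
from __future__ import annotations

import numpy as np
import yaml

# Trimmed copy of the example schema (Faker column and events table omitted).
SCHEMA = """
name: my-dataset
tables:
  - name: users
    rows: { quick: 50, medium: 1000, thorough: 5000 }
    columns:
      - { name: user_id, type: id, prefix: "U", width: 4 }
      - { name: department, type: choice, values: [Sales, Eng, Ops], weights: [0.4, 0.4, 0.2] }
      - { name: salary, type: float, distribution: lognormal, mean: 75000, sigma: 0.4, min: 30000 }
    profiles:
      - { name: high_risk, weight: 0.05, overrides: { risk_multiplier: 3.0 } }
"""


def gen_column(col: dict, n: int, rng: np.random.Generator) -> list:
    """Generate n values for one column spec (hypothetical semantics)."""
    kind = col["type"]
    if kind == "id":
        # prefix "U", width 4 -> U0001, U0002, ...
        return [f"{col['prefix']}{i:0{col['width']}d}" for i in range(1, n + 1)]
    if kind == "choice":
        # weights=None falls back to a uniform choice
        return rng.choice(col["values"], size=n, p=col.get("weights")).tolist()
    if kind == "float" and col.get("distribution") == "lognormal":
        sigma = col["sigma"]
        mu = np.log(col["mean"]) - sigma**2 / 2  # makes E[X] equal `mean`
        x = rng.lognormal(mean=mu, sigma=sigma, size=n)
        if "min" in col:
            x = np.maximum(x, col["min"])  # hard floor
        return x.round(2).tolist()
    raise NotImplementedError(f"column type not covered by this sketch: {kind}")


def assign_profiles(profiles: list[dict], n: int, rng: np.random.Generator) -> list:
    """Tag each row with a profile name, or None for the leftover probability."""
    names = [p["name"] for p in profiles] + [None]
    probs = [p["weight"] for p in profiles]
    probs.append(1.0 - sum(probs))
    picks = rng.choice(len(names), size=n, p=probs)
    return [names[i] for i in picks]


def generate(schema_text: str, tier: str = "quick", seed: int = 0) -> dict:
    """Parse the schema and build {table: {column: values}} for one size tier."""
    schema = yaml.safe_load(schema_text)
    rng = np.random.default_rng(seed)
    out = {}
    for table in schema["tables"]:
        n = table["rows"][tier]
        cols = {c["name"]: gen_column(c, n, rng) for c in table["columns"]}
        if table.get("profiles"):
            cols["_profile"] = assign_profiles(table["profiles"], n, rng)
        out[table["name"]] = cols
    return out
```

Clamping at `min` piles a little probability mass exactly at 30000. Rejecting and redrawing low values would avoid that spike. Either way, cutting off the lower tail leaves the realized mean slightly above 75000.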

See skills/synthdata-generate/references/schema-spec.md for the complete spec.
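The events table combines two knobs: a `zipfian` foreign key and a Poisson `rows_per_parent`. schema-spec.md defines how they interact. The sketch below assumes one plausible reading. Poisson draws fix how many child rows exist. Bounded Zipf weights (rank k gets weight proportional to 1/k^alpha) then decide which parent each row points at, so a few users produce most events.

The sketch uses bounded weights rather than NumPy's `Generator.zipf`. That sampler draws from an unbounded Zipf distribution (and requires a > 1), so it can return ranks beyond the number of users. FK integrity holds by construction, because every value is drawn from the existing parent ids. `zipf_weights` and `generate_child_fks` are hypothetical names.

```python
from __future__ import annotations

import numpy as np


def zipf_weights(n_parents: int, alpha: float) -> np.ndarray:
    """Bounded Zipf over parent ranks: weight of rank k is proportional to k**-alpha."""
    ranks = np.arange(1, n_parents + 1, dtype=float)
    w = ranks ** -alpha
    return w / w.sum()


def generate_child_fks(parent_ids: list[str], lam: float, alpha: float,
                       rng: np.random.Generator) -> list[str]:
    """Return one FK value per child row; every value is an existing parent id."""
    n = len(parent_ids)
    # Assumed reading of rows_per_parent: Poisson(lam) children per parent, summed
    # into a total, so the child table holds about lam * n rows.
    total = int(rng.poisson(lam, size=n).sum())
    # Assumed reading of the zipfian FK: children spread over parents by popularity
    # rank. A random permutation picks who is "rank 1", so the heaviest parent is
    # not always the first id.
    ranked = rng.permutation(n)
    picks = rng.choice(n, size=total, p=zipf_weights(n, alpha))
    return [parent_ids[ranked[k]] for k in picks]
```

Raising `alpha` concentrates events on fewer users. With `alpha` set to 0, every user is equally likely.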

License

MIT — see LICENSE.
