EveryInc · theondrejivan · Feb 26, 2026 · tmchow · Mar 13, 2026 · theondrejivan
diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json
@@ -11,8 +11,8 @@
   "plugins": [
     {
       "name": "compound-engineering",
-      "description": "AI-powered development tools that get smarter with every use. Make each unit of engineering work easier than the last. Includes 29 specialized agents, 22 commands, and 19 skills.",
-      "version": "2.35.2",
+      "description": "AI-powered development tools that get smarter with every use. Make each unit of engineering work easier than the last. Includes 31 specialized agents, 23 commands, and 25 skills.",
+      "version": "2.36.0",
       "author": {
         "name": "Kieran Klaassen",
         "url": "https://github.com/kieranklaassen",

diff --git a/docs/plans/2026-02-26-feat-data-engineering-plugin-expansion-plan.md b/docs/plans/2026-02-26-feat-data-engineering-plugin-expansion-plan.md
diff --git a/plugins/compound-engineering/.claude-plugin/plugin.json b/plugins/compound-engineering/.claude-plugin/plugin.json
@@ -1,7 +1,7 @@
 {
   "name": "compound-engineering",
-  "version": "2.35.2",
-  "description": "AI-powered development tools. 29 agents, 22 commands, 19 skills, 1 MCP server for code review, research, design, and workflow automation.",
+  "version": "2.36.0",
+  "description": "AI-powered development tools. 31 agents, 23 commands, 25 skills, 1 MCP server for code review, research, design, data engineering, and workflow automation.",
   "author": {
     "name": "Kieran Klaassen",
     "email": "kieran@every.to",
@@ -22,7 +22,13 @@
     "knowledge-management",
     "image-generation",
     "agent-browser",
-    "browser-automation"
+    "browser-automation",
+    "dbt",
+    "snowflake",
+    "databricks",
+    "duckdb",
+    "data-engineering",
+    "warehouse-architecture"
   ],
   "mcpServers": {
     "context7": {

diff --git a/plugins/compound-engineering/CHANGELOG.md b/plugins/compound-engineering/CHANGELOG.md
@@ -5,6 +5,31 @@ All notable changes to the compound-engineering plugin will be documented in thi
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [2.36.0] - 2026-02-26
+
+### Added
+
+- **`dbt` skill** — Comprehensive dbt project guidance with intake/routing pattern. Covers project structure, naming conventions, model patterns, testing tiers, Jinja macros, incremental strategies, and package management. 6 reference files.
+- **`snowflake` skill** — Snowflake SQL patterns, optimization, cost management, and Terraform IaC. Covers QUALIFY, FLATTEN, clustering keys, warehouse sizing, resource monitors, and key-pair authentication. 4 reference files.
+- **`duckdb` skill** — DuckDB local analytics patterns including file querying (Parquet/CSV/JSON), SQL extensions (ASOF JOIN, PIVOT/UNPIVOT, LIST/STRUCT), and integration with dbt/Python/Polars. 3 reference files.
+- **`databricks` skill** — Databricks and Spark patterns with Delta Lake, Unity Catalog, Spark optimization (AQE, broadcast joins), and Terraform IaC. 4 reference files.
+- **`warehouse-architecture` skill** — Background knowledge skill covering Kimball star schema, Data Vault 2.0, Medallion/Lakehouse, and SCD types 0-6. Includes pattern selection decision tree and hybrid patterns. 4 reference files. (`user-invocable: false`)
+- **`data-quality` skill** — Background knowledge skill for data validation frameworks: Pandera, Great Expectations, Soda Core, dbt contracts, ODCS, and anomaly detection. Includes tool decision matrix and advanced dbt testing patterns. 4 reference files. (`user-invocable: false`)
+- **`dbt-model-reviewer` agent** — Reviews dbt models for SQL quality, ref/source usage, materialization strategy, incremental patterns, and testing coverage. 8-section checklist with common bug patterns.
+- **`data-pipeline-reviewer` agent** — Reviews ETL/ELT pipelines for idempotency, error handling, backfill capability, credential safety, and observability. Covers Airflow, Dagster, and general orchestration patterns.
+- **`/data-scaffold` command** — Scaffold dbt models or dimensional data models. Two modes: `dbt` mode generates staging models with source YAML, and `model` mode generates star schema ERDs with fact/dimension tables.
+
+### Changed
+
+- **`performance-oracle` agent** — Added Warehouse SQL Optimization section covering Snowflake (clustering, pruning, EXPLAIN), DuckDB (memory, parallelism), Databricks (shuffle, broadcast, Liquid Clustering, AQE), and cross-dialect anti-patterns.
+- **`architecture-strategist` agent** — Added Data Warehouse Architecture section covering grain definition, dimensional modeling (conformed dimensions, star vs snowflake, fact types), SCD strategy, medallion layer boundaries, and referential integrity.
+
+### Summary
+
+- 31 agents, 23 commands, 25 skills, 1 MCP server
+
+---
+
 ## [2.35.2] - 2026-02-20
 
 ### Changed

diff --git a/plugins/compound-engineering/README.md b/plugins/compound-engineering/README.md
@@ -6,32 +6,34 @@ AI-powered development tools that get smarter with every use. Make each unit of
 
 | Component | Count |
 |-----------|-------|
-| Agents | 29 |
-| Commands | 22 |
-| Skills | 19 |
+| Agents | 31 |
+| Commands | 23 |
+| Skills | 25 |
 | MCP Servers | 1 |
 
 ## Agents
 
 Agents are organized into categories for easier discovery.
 
-### Review (15)
+### Review (17)
 
 | Agent | Description |
 |-------|-------------|
 | `agent-native-reviewer` | Verify features are agent-native (action + context parity) |
-| `architecture-strategist` | Analyze architectural decisions and compliance |
+| `architecture-strategist` | Analyze architectural decisions and compliance (+ data warehouse architecture) |
 | `code-simplicity-reviewer` | Final pass for simplicity and minimalism |
 | `data-integrity-guardian` | Database migrations and data integrity |
 | `data-migration-expert` | Validate ID mappings match production, check for swapped values |
+| `data-pipeline-reviewer` | Review ETL/ELT pipelines for reliability, idempotency, and credential safety |
+| `dbt-model-reviewer` | Review dbt models for SQL quality, ref/source usage, and testing coverage |
 | `deployment-verification-agent` | Create Go/No-Go deployment checklists for risky data changes |
 | `dhh-rails-reviewer` | Rails review from DHH's perspective |
 | `julik-frontend-races-reviewer` | Review JavaScript/Stimulus code for race conditions |
 | `kieran-rails-reviewer` | Rails code review with strict conventions |
 | `kieran-python-reviewer` | Python code review with strict conventions |
 | `kieran-typescript-reviewer` | TypeScript code review with strict conventions |
 | `pattern-recognition-specialist` | Analyze code for patterns and anti-patterns |
-| `performance-oracle` | Performance analysis and optimization |
+| `performance-oracle` | Performance analysis and optimization (+ warehouse SQL optimization) |
 | `schema-drift-detector` | Detect unrelated schema.rb changes in PRs |
 | `security-sentinel` | Security audits and vulnerability assessments |
 
@@ -104,6 +106,7 @@ Core workflow commands use `workflows:` prefix to avoid collisions with built-in
 | `/test-browser` | Run browser tests on PR-affected pages |
 | `/xcode-test` | Build and test iOS apps on simulator |
 | `/feature-video` | Record video walkthroughs and add to PR description |
+| `/data-scaffold` | Scaffold dbt models or dimensional data models from source descriptions |
 
 ## Skills
 
@@ -113,6 +116,17 @@ Core workflow commands use `workflows:` prefix to avoid collisions with built-in
 |-------|-------------|
 | `agent-native-architecture` | Build AI agents using prompt-native architecture |
 
+### Data Engineering
+
+| Skill | Description |
+|-------|-------------|
+| `dbt` | Comprehensive dbt project guidance: structure, models, testing, Jinja, incremental strategies, packages |
+| `snowflake` | Snowflake SQL patterns, optimization, cost management, and Terraform IaC |
+| `duckdb` | DuckDB local analytics: file querying, SQL extensions, Python/dbt integration |
+| `databricks` | Databricks/Spark patterns: Delta Lake, Unity Catalog, optimization, Terraform IaC |
+| `warehouse-architecture` | Kimball, Data Vault 2.0, Medallion, SCD types (background knowledge) |
+| `data-quality` | Data validation frameworks: Pandera, Great Expectations, Soda Core, dbt contracts (background knowledge) |
+
 ### Development Tools
 
 | Skill | Description |

diff --git a/plugins/compound-engineering/agents/review/architecture-strategist.md b/plugins/compound-engineering/agents/review/architecture-strategist.md
@@ -64,4 +64,46 @@ Be proactive in identifying architectural smells such as:
 - Inconsistent architectural patterns
 - Missing or inadequate architectural boundaries
 
+## Data Warehouse Architecture
+
+When reviewing data warehouse designs, dimensional models, or dbt project architecture, apply these additional checks:
+
+### Grain Definition
+- Every fact table must have a clearly defined grain (one row per what?)
+- Grain should be documented in model descriptions
+- Mixed grains in a single fact table are a critical anti-pattern
+
+### Dimensional Modeling
+- **Conformed dimensions** - Shared dimensions (dim_customers, dim_date) must be consistent across all fact tables
+- **Star schema vs snowflake schema** - Prefer star schema (denormalized dimensions) unless dimension tables exceed reasonable size
+- **Fact table types** - Verify correct type: transaction (events), periodic snapshot (balances), accumulating snapshot (workflows)
+- **Degenerate dimensions** - Order numbers, invoice IDs belong in the fact table, not a separate dimension
+- **Role-playing dimensions** - Same dimension joined multiple times (e.g., dim_date as order_date and ship_date)
+
+### Slowly Changing Dimensions
+- Verify appropriate SCD strategy for each dimension
+- SCD Type 1 (overwrite) for attributes where history is not needed
+- SCD Type 2 (add row) for attributes requiring full history
+- dbt snapshots configured with appropriate strategy (timestamp vs check)
+
+### Medallion / Lakehouse Architecture
+- **Bronze layer** - Raw ingestion only, no business logic, schema-on-read
+- **Silver layer** - Cleaned, deduplicated, typed, conformed
+- **Gold layer** - Business-facing aggregates, denormalized for consumption
+- No business logic in bronze; no raw data in gold
+- Layer boundaries align with dbt model layers (staging/intermediate/marts)
+
+### Referential Integrity
+- Foreign keys tested with dbt `relationships` test
+- Orphan records handled explicitly (inner join vs left join decision documented)
+- Bridge tables used for many-to-many relationships
+
+### Anti-Patterns to Flag
+- Mixed grains in a single fact table
+- Business logic in staging/bronze layer
+- Dimension tables without surrogate keys
+- Fact tables without date dimension foreign key
+- Over-normalized dimensions (snowflake schema without clear benefit)
+- One Big Table as primary model (acceptable only as downstream consumption layer)
+
 When you identify issues, provide concrete, actionable recommendations that maintain architectural integrity while being practical for implementation. Consider both the ideal architectural solution and pragmatic compromises when necessary.
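
The grain and referential integrity checks added in this hunk can be sketched as two small queries. This is a minimal illustration using an in-memory SQLite database, not the agent's actual method: the helper names `find_grain_violations` and `find_orphans`, and the `fct_orders` / `dim_customers` sample tables, are hypothetical. The orphan query is similar in spirit to dbt's `relationships` test, which also ignores NULL foreign keys.

```python
import sqlite3


def find_grain_violations(conn, table, grain_columns):
    """Return grain-key values that appear on more than one row."""
    cols = ", ".join(grain_columns)
    # Identifiers are interpolated, so they must come from trusted code.
    query = (
        f"SELECT {cols}, COUNT(*) FROM {table} "
        f"GROUP BY {cols} HAVING COUNT(*) > 1"
    )
    return conn.execute(query).fetchall()


def find_orphans(conn, fact, fk, dim, pk):
    """Return non-NULL fact foreign keys with no matching dimension row."""
    query = (
        f"SELECT DISTINCT f.{fk} FROM {fact} AS f "
        f"LEFT JOIN {dim} AS d ON f.{fk} = d.{pk} "
        f"WHERE f.{fk} IS NOT NULL AND d.{pk} IS NULL"
    )
    return [row[0] for row in conn.execute(query)]


# Hypothetical sample data: declared grain is one row per order line.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customers (customer_key INTEGER PRIMARY KEY);
    CREATE TABLE fct_orders (
        order_id INTEGER, line_number INTEGER, customer_key INTEGER
    );
    INSERT INTO dim_customers VALUES (1), (2);
    INSERT INTO fct_orders VALUES
        (100, 1, 1), (100, 2, 1), (101, 1, 2), (101, 1, 2), (102, 1, 9);
""")

duplicates = find_grain_violations(conn, "fct_orders", ["order_id", "line_number"])
orphans = find_orphans(conn, "fct_orders", "customer_key", "dim_customers", "customer_key")
```

Here order 101, line 1 appears twice, which breaks the declared grain, and customer key 9 has no dimension row. In a dbt project the same checks would more naturally be expressed as uniqueness and `relationships` tests in YAML.
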
diff --git a/plugins/compound-engineering/agents/review/data-pipeline-reviewer.md b/plugins/compound-engineering/agents/review/data-pipeline-reviewer.md
@@ -0,0 +1,183 @@
+---
+name: data-pipeline-reviewer
+description: "Reviews data pipeline code for reliability, idempotency, error handling, and credential safety. Use when building or modifying ETL/ELT pipelines."
+model: inherit
+---
+
+<examples>
+<example>
+Context: User writes an Airflow DAG for data ingestion.
+user: "Review this DAG that loads data from our API into Snowflake"
+assistant: "I'll use data-pipeline-reviewer to check idempotency, error handling, and credential safety"
+<commentary>Airflow DAG with data orchestration. Route to data-pipeline-reviewer.</commentary>
+</example>
+<example>
+Context: User builds a Dagster pipeline for data processing.
+user: "Review this Dagster asset that processes customer events"
+assistant: "I'll use data-pipeline-reviewer to verify reliability, backfill capability, and secret management"
+<commentary>Dagster data pipeline. Route to data-pipeline-reviewer for pipeline-specific review.</commentary>
+</example>
+<example>
+Context: User has a general Python code quality concern.
+user: "Review this Python utility function for processing strings"
+assistant: "I'll use kieran-python-reviewer for general Python code quality"
+<commentary>General Python code, not data pipeline. Route to kieran-python-reviewer, NOT data-pipeline-reviewer.</commentary>
+</example>
+</examples>
+
+You are a Data Pipeline Reviewer specializing in ETL/ELT pipeline reliability, data orchestration patterns, and production data safety. Your mission is to prevent data loss, ensure idempotency, and catch credential leaks before they reach production.
+
+## Core Review Goals
+
+For every data pipeline change, verify:
+
+1. **Idempotency** - Pipeline can re-run safely without creating duplicates
+2. **Error handling** - Retries, dead letter queues, graceful degradation
+3. **Backfill capability** - Can process historical date ranges
+4. **Credential safety** - No hardcoded secrets anywhere
+5. **Observability** - Structured logging, metrics, alerting hooks
+
+## Reviewer Checklist
+
+### 1. Idempotency
+
+- [ ] Pipeline can re-run without creating duplicate records
+- [ ] Uses MERGE/upsert or DELETE+INSERT pattern (not blind INSERT)
+- [ ] Intermediate state is cleaned up on failure and retry
+- [ ] File processing tracks completed files to prevent reprocessing
+- [ ] Database writes are wrapped in transactions where appropriate
+
+### 2. Error Handling
+
+- [ ] Retries configured with exponential backoff
+- [ ] Maximum retry count set (not infinite)
+- [ ] Dead letter queue or error table for failed records
+- [ ] Partial failures handled (don't lose 1M records because 1 failed)
+- [ ] Timeout configured with SLA awareness
+- [ ] Connection errors handled with retry (separate from data errors)
+
+### 3. Backfill Capability
+
+- [ ] Date range parameters accepted (start_date, end_date)
+- [ ] Can process historical data without affecting current pipeline
+- [ ] Backfill does not trigger downstream pipelines unintentionally
+- [ ] Partition-aware processing (process only affected date partitions)
+
+### 4. Data Validation
+
+- [ ] Input data validated before processing (schema, types, required fields)
+- [ ] Row counts logged before and after transformation
+- [ ] NULL rate checks on critical columns
+- [ ] Referential integrity validated at boundaries
+- [ ] Data type coercion handled explicitly (not silently)
+
+### 5. Credential Safety
+
+- [ ] No hardcoded credentials in code
+- [ ] No credentials in configuration files committed to git
+- [ ] Environment variables or secret managers used for all secrets
+- [ ] Connection strings do not contain embedded passwords
+- [ ] API keys not logged or included in error messages
+
+**Credential detection patterns to scan for:**
+
+```
+# dbt profiles.yml must live outside the project directory (e.g. ~/.dbt/)
+profiles.yml in project directory → CRITICAL
+
+# Inline credentials
+password: 'actual_password'           → CRITICAL
+token: 'sk-...'                       → CRITICAL
+AKIA[A-Z0-9]{16}                      → CRITICAL (AWS access key)
+://user:pass@host                     → CRITICAL (connection string)
+
+# Airflow connections with inline credentials
+Connection(password='...')            → CRITICAL
+
+# Spark inline credentials
+spark.conf.set("...access.key", "AKIA...")  → CRITICAL
+
+# Docker Compose inline secrets
+environment:
+  DB_PASSWORD: actual_password        → CRITICAL
+```
+
+### 6. Resource Management
+
+- [ ] Temporary tables/files cleaned up after pipeline completes
+- [ ] Database connections properly closed (context managers / try-finally)
+- [ ] Memory-efficient processing for large datasets (chunking, streaming)
+- [ ] Warehouse/cluster resources right-sized for workload
+- [ ] Auto-scaling configured where applicable
+
+### 7. Logging and Observability
+
+- [ ] Structured logging with consistent format
+- [ ] Key metrics emitted (rows processed, duration, error count)
+- [ ] Alerting hooks for pipeline failures
+- [ ] Execution metadata tracked (run_id, start_time, end_time, status)
+- [ ] Sensitive data not included in log output
+
+### 8. Orchestration Patterns
+
+- [ ] DAG dependencies reflect actual data dependencies
+- [ ] No implicit ordering (all dependencies explicit)
+- [ ] Sensors/triggers appropriate for the use case
+- [ ] Schedule aligned with upstream data availability
+- [ ] Concurrency limits set to prevent resource contention
+
+## Quick Reference Patterns
+
+```python
+# Idempotent write pattern (Python + SQL); assumes a SQLAlchemy `engine` is defined
+from sqlalchemy import text
+
+def load_data(df, table_name, date_partition):
+    """Delete-then-insert for idempotent loading; table_name must come from trusted code."""
+    with engine.begin() as conn:
+        # Table names cannot be bind parameters, so only the date value is bound.
+        conn.execute(text(f"DELETE FROM {table_name} WHERE date_partition = :date"), {"date": date_partition})
+        df.to_sql(table_name, conn, if_exists='append', index=False)
+
+# Retry with exponential backoff
+import requests
+from tenacity import retry, stop_after_attempt, wait_exponential
+
+@retry(
+    stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=60)
+)
+def fetch_api_data(endpoint, params):
+    response = requests.get(endpoint, params=params, timeout=30)
+    response.raise_for_status()
+    return response.json()
+
+# Airflow task with retries (assumes `task`, `timedelta`, and `logger` are in scope)
+@task(retries=2, retry_delay=timedelta(minutes=5))
+def extract_data(logical_date=None):  # `execution_date` is deprecated since Airflow 2.2
+    """Extract data for the given logical date."""
+    date_str = logical_date.strftime('%Y-%m-%d')
+    logger.info("Extracting data for date=%s", date_str)
+    # ... extraction logic
+```
+
+## Common Bugs to Catch
+
+1. **Missing idempotency** - INSERT without DELETE or MERGE creates duplicates on retry
+2. **Hardcoded dates** - Pipeline works today but fails tomorrow
+3. **Silent NULL coercion** - String 'null' treated as NULL or vice versa
+4. **Unbounded queries** - `SELECT * FROM large_table` without date filter
+5. **Credentials in logs** - Connection string with password logged on error
+6. **Missing transaction** - Partial write on failure leaves table in inconsistent state
+7. **Timezone confusion** - UTC vs local time in date filters
+8. **Infinite retry** - No max retry count causes stuck pipelines
+
+## Output Format
+
+For each issue found, cite:
+
+- **File:Line** - Exact location
+- **Issue** - What is wrong
+- **Severity** - Critical (data loss/credential risk) / Warning (reliability concern) / Info (best practice)
+- **Fix** - Specific code change needed
+
+Provide a summary: files reviewed, issues by severity, overall pipeline reliability assessment.
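
The credential detection patterns listed under checklist section 5 can be approximated with a small line scanner. This is a minimal sketch under stated assumptions: the rule names, the exact regular expressions, and the `scan_for_credentials` helper are hypothetical and are not part of the plugin. A real scanner would need broader rules and an allowlist for placeholders.

```python
import re

# Hypothetical rule table; the regexes loosely mirror the checklist patterns.
CREDENTIAL_PATTERNS = {
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    "aws_access_key": re.compile(r"AKIA[A-Z0-9]{16}"),
    # scheme://user:pass@host style connection strings.
    "connection_string_password": re.compile(r"://[^/\s:@]+:[^/\s@]+@"),
    # password: '...' or password = "..." with a quoted literal value.
    "inline_password": re.compile(
        r"password\s*[:=]\s*['\"][^'\"]+['\"]", re.IGNORECASE
    ),
}


def scan_for_credentials(text):
    """Return (line_number, rule_name) pairs for every suspicious line."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for rule_name, pattern in CREDENTIAL_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, rule_name))
    return findings


# Made-up sample input: three leaks and one safe environment lookup.
sample = "\n".join([
    "password: 'hunter2'",
    "url = 'postgresql://etl:s3cret@db.internal/prod'",
    "key = 'AKIAABCDEFGHIJKLMNOP'",
    'password = os.environ["DB_PASSWORD"]',
])
findings = scan_for_credentials(sample)
```

The first three sample lines are flagged and the environment-variable lookup on line 4 is not. Each finding maps directly onto the `File:Line` citation required by the output format, with severity Critical.
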