Stagecraft

Stagecraft is an orchestrator that runs your AI coding tool through a structured 18-stage pipeline. PM writes the brief. Principal designs. Specialists build their areas. Reviewers critique. QA tests. Each stage produces an artifact and a machine-readable gate. The next stage cannot start until the gate passes. The full run is on disk: auditable, resumable, not buried in a chat log.

Works across Claude Code, Codex CLI, Gemini CLI, and a generic no-host mode. One project, one config, one or more hosts. Different roles can run on different models — Claude for design, Codex for backend, Gemini for QA, Claude for review. The gate JSON is the seam.

devteam init --host claude-code        # one-time install in your project
devteam stage requirements --feature "Add SMS notification opt-in"
# or: devteam stage requirements --feature-file ./feature-brief.md
# (model writes brief + gate; hooks validate)
devteam next                           # → "▶️ run-stage design (stage-02)"
# … 16 more stages, then "🎉 pipeline-complete"

The CLI binary is devteam. Stagecraft is the project; devteam is what you type.

First 30 minutes

If you're evaluating Stagecraft, this is the fastest path to a working answer:

(5 min) Install the framework. git clone <this-repo> && cd stagecraft && npm install && npm link. Verify with devteam --help.
(2 min) Initialize a throwaway target project. mkdir /tmp/scratch && cd /tmp/scratch && devteam init --host claude-code. Then devteam doctor should be all green.
(3 min) Read EXAMPLE.md. One feature traced through all 18 stages with real CLI captures. Tells you what each stage actually does.
(15 min) Run one full pipeline yourself. The simplest path: run each stage with --headless and let the orchestrator drive Claude Code directly.
```
devteam stage requirements --feature "a one-paragraph feature you understand" --headless
devteam next   # tells you the next command to run
```
Walk forward through design, build, peer-review, qa, sign-off. For deploy, pick a deploy adapter in .devteam/config.yml (cloud-run, gizmos, kubernetes, terraform, docker-compose, or custom) — or write the gate by hand to skip the deploy step: echo '{"stage":"stage-08","status":"PASS","track":"full","timestamp":"'$(date -u +%FT%TZ)'","blockers":[],"warnings":[]}' > pipeline/gates/stage-08.json.
(5 min) Inspect the audit trail. ls pipeline/gates/ — every stage's outcome on disk. cat pipeline/brief.md, pipeline/design-spec.md, pipeline/code-review/by-*.md. The pipeline is reconstructable from these files alone.

If after 30 minutes you can see how this would help your team, run a 2-week pilot (adoption-guide.md has the script). If you can't, it may not be the right tool for your team.

Documentation map

Four reader paths. Every doc belongs to exactly one. (docs/README.md is the docs-only index.)

Evaluator — should we adopt?

Goal: decision-ready in 30 minutes.

Step	Doc	Purpose
1	EXAMPLE.md	One feature traced through all 18 stages — the fastest reality check
2	docs/comparative-analysis.md	Stagecraft vs BMAD, GitHub Spec Kit, Agent OS, Kiro — four-school taxonomy and three defensible claims
3	docs/adoption-guide.md	Pilot script, common objections, and success criteria for the 2-week trial

Evaluating further (long-form): docs/presentation-notes.md · docs/walkthroughs/soc2-evidence-collector.md · docs/walkthroughs/stage-04-split-host.md

Operator — I run pipelines daily

Step	Doc	Purpose
1	docs/user-guide.md	Daily-use reference: running stages, multi-host setups, headless mode
2	docs/tracks.md	Which of the six tracks to pick for a given change
3	docs/conventions.md	Pipeline markers operators read and write (`QUESTION:`, `BLOCKER:`, magic comments)
4	docs/runbooks/README.md	Troubleshooting index: symptom → runbook section
5	docs/cost.md	Cost tracking, pricing table, and budget workflow

Reference: docs/faq.md · docs/git-workflow.md · docs/ci.md · docs/memory.md · docs/observability.md · docs/reproducibility.md · docs/runbooks/escalation.md · docs/runbooks/fix-and-retry.md · docs/runbooks/open-followups.md · docs/runbooks/deploy-failure.md · docs/runbooks/autonomous-run.md · docs/runbook-template.md

Contributor — I change Stagecraft

Step	Doc	Purpose
1	CONTRIBUTING.md	Recipes for adding a host, role, stage, or skill
2	ARCHITECTURE.md	12 locked design decisions — read when you want to know why
3	docs/TESTING.md	Test structure and guidance
4	core/adapters/host-adapter.md	The host-adapter contract (~150 lines)
5	docs/adr/README.md	Architecture decision records

Reference: docs/concepts.md · docs/methodology.md · docs/FEATURES.md · docs/BACKLOG.md · docs/autonomous-execution-design.md · docs/spec-authoring.md · docs/migration-safety.md · docs/red-team.md · docs/verification-beyond-tests.md · docs/brief-template.md · docs/design-spec-template.md · docs/GAP-ANALYSIS.md

Model — never reads docs/

Nothing under docs/ is load-bearing for a pipeline run. Models read AGENTS.md · rules/ · roles/ · skills/ only.

Why "Stagecraft"?

The pipeline is a staged production. PM writes the brief. Principal directs the architecture. Specialist developers each build their part. Reviewers critique. QA tests. Deploy is the curtain. Each stage has its cast, its script (the rules under rules/), and its gate that decides whether the show moves on. Multi-model peer review is the panel of critics; the red-team is the adversary in the wings. The discipline of coordinating specialized roles in a sequenced production is stagecraft.

The vocabulary extends naturally: a run is one pipeline invocation, dress rehearsal is pre-review (Stage 4a), curtain call is the retrospective (Stage 9), and notes are peer-review comments. The metaphor is structural, not decorative.

What this gives you

A coordinated team of role-specific subagents running a structured software-development pipeline end-to-end:

18-stage gated pipeline — requirements (PM) → design (Principal) → build (specialist workstreams) → peer-review (Reviewer × 4) → QA → deploy → retrospective. Each stage writes a machine-readable gate; the next stage cannot start until the gate passes.
6 tracks — full, quick, nano, config-only, dep-update, hotfix. Pick by change size; devteam assess infers the right track from description keywords and file heuristics.
Per-workstream gate JSON — every stage writes a gate to pipeline/gates/. Validator enforces shape; orchestrator merges multi-role stage gates.
Multi-host routing — .devteam/config.yml picks which host runs which role. Claude for design, Codex for backend, Gemini for QA — the gate JSON is the stable seam.
Bounded autonomous driver — devteam run loops next → dispatch → merge until pipeline-complete; devteam stage <name> --headless drives a single stage non-interactively.

Full feature catalogue: docs/FEATURES.md.

Prerequisites

Platform: macOS, Linux, and native Windows are supported. CI exercises the core Windows portability surface on Node 22: CLI startup, initialization, diagnostics, quoted host commands, executable discovery, and timeout termination. WSL2 remains a supported option when a host CLI or project toolchain expects a POSIX shell.

Node.js ≥ 20
At least one of: Claude Code (claude --version works), Codex CLI (codex --version works), Gemini CLI (gemini --version works), or just a terminal (generic adapter — prompts rendered for manual use, no automation)
Git (recommended for version-controlling artifacts; the pipeline itself does not require it)

Quick start

# 1. Get the framework (one time, anywhere)
git clone <this-repo> && cd stagecraft && npm install && npm link

# 2. In your target project — install the host adapter surface
cd ~/projects/my-app
devteam init --host claude-code         # or: codex / gemini-cli / claude-code,codex

# 3. Verify
devteam doctor                           # should be all green

Then drive the pipeline. There are two ways to run a stage:

Path A — `--headless` (single terminal, start here)

The orchestrator drives the host CLI for you (claude --print, codex exec, gemini). One command per stage; model output streams to your terminal. Best for first runs, CI, and scripted use.

devteam stage requirements --feature "Add SMS notification opt-in" --headless
# [devteam] dispatching pm → claude-code (headless)
# (model output streams to your terminal as it works)
#   ✓ pm (claude-code): exit 0, 23000ms → pipeline/gates/stage-01.json

cat pipeline/brief.md                    # the artifact the model wrote
cat pipeline/gates/stage-01.json          # the gate JSON

devteam next                              # → "▶️ run-stage design (stage-02)"
devteam stage design --headless           # next stage
# … keep going until "🎉 pipeline-complete"

One terminal. One command per stage. The gate file appears when the model is done. devteam next tells you the next command.

Path B — Interactive in Claude Code (two windows)

Use this when you want to watch the subagent work, observe file edits, or intervene mid-stage. The slash command /devteam is installed by devteam init.

# Terminal 1 (your project dir):
devteam stage requirements --feature "Add SMS notification opt-in"
#   → prints a prompt with an onboarding preamble explaining what to do with it

In Terminal 2 (or a separate window), inside Claude Code at the same project root:

/devteam stage requirements --feature "Add SMS notification opt-in"

Claude Code recognizes the slash command, dispatches the pm subagent. You see Claude write pipeline/brief.md, then write pipeline/gates/stage-01.json. The Stop hook fires and validates:

[gate-validator] ✅ GATE PASS — stage-01/pm (claude-code)

Back in Terminal 1:

devteam next                              # → "▶️ run-stage design (stage-02)"

Repeat for each stage.

Which path should I pick?

	Path A: `--headless`	Path B: interactive in Claude Code
Terminal windows	1	2 (terminal + Claude Code)
You see the agent's work	streamed text in your terminal	full Claude Code UI
Best for	first runs, CI, scripted use, multi-host fanout	watching agents work, mid-stage intervention, debugging a stage
Auth	`claude --version` must work	Claude Code app/CLI logged in
Speed	usually faster (no UI overhead)	usually slower (UI overhead, human pauses)

Recommendation for first-timers: Path A. One terminal, one command per stage, results on disk. Switch to Path B when you want to observe a specific stage in detail.

For a complete walked-through example with the actual output you'll see at each step, read EXAMPLE.md.

What `devteam init` installs

For --host claude-code in a target project:

Path	Contents
`.devteam/config.yml`	Routing config — which host handles which role/stage
`.devteam/rules/*.md`	Pipeline, gate, escalation, retrospective rules (10 top-level docs + per-stage `stage-NN.md` files)
`.claude/agents/*.md`	Role subagents with Claude Code YAML frontmatter (12)
`.claude/skills/*/SKILL.md`	Task helpers — implement, review-rubric, security-checklist, etc. (13)
`.claude/commands/devteam.md`	Slash command wrapper
`.claude/settings.local.json`	Hooks: validator on `Stop`/`SubagentStop`; approval-derivation on `PostToolUse`; secret-scan on `PreToolUse`
`pipeline/gates/`	Empty workspace dir for gate files

For --host codex: similar but rendered into .codex/prompts/roles/, .codex/skills/, with no hooks or slash commands (codex doesn't have those primitives).

For multi-host (--host claude-code,codex): both surfaces installed side-by-side; the routing config decides who handles what at runtime.

CLI reference

Command	What it does
`devteam init --host <name>[,<name>...] [--profile dogfood]`	Install host adapter(s) into the current project; write `.devteam/config.yml`; write (or update) the managed `.gitignore` block of volatile Stagecraft files. Add `--profile dogfood` to also install the four dogfooding safeguards (supplemental gitignore block, pre-commit infrastructure guard, `.git/info/exclude` entry, and `profile: dogfood` config marker).
`devteam stage <name> [--feature "..." \| --feature-file <path>] [--headless]`	Render (and optionally execute) a stage prompt; `--feature-file` reads a UTF-8 feature brief from disk; `--headless` drives the host CLI automatically
`devteam next [--json] [--skip-advise]`	Report the next action the pipeline needs (run-stage, continue-stage, merge, fix-and-retry, resolve-escalation, pipeline-complete). Non-pass actions carry a `failure_class` (code-defect / state-corruption / external-blocked / judgment-gate / convergence-exhausted) telling you how to respond, and a `fix-and-retry` carries structured `clear_gates` (repo-relative gate paths to clear before re-running) the autonomous driver consumes directly; `--json` adds a `schema_version`. Emits a ⚠ advisory notice when unresolved follow-up items may block downstream stages.
`devteam run [--feature "..." \| --feature-file <path>] [--repair "<symptom>"] [--repair-at <file>:<line>] [--track <t>] [--until <s>] [--budget-usd X] [--timeout-ms N] [--retry-delay-ms N] [--auto-rule <classes>] [--allow-stage <s>] [--max-iterations N] [--resume] [--force] [--json] [--fail-on-advisory[=all]] [--auto-commit]`	Bounded autonomous driver (ADR-003 / H2). Loops `next → dispatch → merge` until `pipeline-complete`. Auto-fixes `code-defect` failures (clears the failing gate, propagates blockers to `context.md`, re-dispatches — bounded by `autonomy.max_retries`) and retries transient dispatch failures with backoff. With `--auto-rule <classes>` it also auto-resolves escalations whose Principal-ruling class is in the allowlist — never a cannot-decide, the consequence ceiling, or `convergence-exhausted`. Halts for a human otherwise (un-granted escalation, the ceiling — grant with `--allow-stage`, a budget cap, or a structural dispatch failure). `--feature-file <path>` is equivalent to `--feature` with the file's UTF-8 contents and is mutually exclusive with `--feature`. `--repair "<symptom>"` selects bug-fix intent mode (ADR-009): stage-01 produces a DIAGNOSIS (root cause, proposed fix, `affected_files`, regression criterion) instead of a feature brief; the diagnosis is a judgment gate requiring `--auto-rule diagnosis-approved` or a human ruling before build; stage-03b (executable-spec) is injected immediately before build — even on hotfix depth — where the agent writes a failing-first regression scenario and sets a tri-state `reproduced` field (`true` / `false` / `"unverifiable: <reason>"`), which the orchestrator stamp verifies red→green (not agent-asserted); the build runs in PATCH MODE scoped to the diagnosed files; a structural scope gate FAILs builds that write outside the diagnosed `affected_files` set. `--repair-at <file>:<line>` skips the LLM diagnosis and seeds the affected-files list directly from known defect locations (the reproduction stage is still included). Both repair flags are mutually exclusive with `--feature` and `--feature-file`. `--auto-commit` automatically commits pipeline artifacts after a clean halt (`ceiling`, `--until`, `budget`) using the same algorithm as `devteam commit` — no interactive prompt; commit failure emits a warning but does not change the halt exit code (Phase 12.3, ADR-010). Writes `pipeline/run.lock`, `run-state.json`, `run-log.jsonl`. Emits a `heartbeat` event at the start of each iteration; a fire-and-forget stall probe emits `stall-detected` when the workstream log and gates go quiet for 5 min (observe-only — ADR-007 Tier 1; dispatch is never killed). Exit: 0 complete/clean-stop, 1 needs-attention, 2 locked, 3 `--fail-on-advisory` advisory blockers remain. See docs/runbooks/autonomous-run.md.
`devteam commit [--all] [--dry-run] [--message <msg>] [--json]`	Stage exactly the right pipeline artifacts and commit (Phase 12.2, ADR-010). Reads `pipeline/run-state.json`, determines which stages have not yet been committed (idempotency cursor), stages gate files (PASS/WARN only) + named artifacts per stage, excludes volatile runtime files, generates a `"pipeline: stages NN–NN PASS"` commit message, and prompts `y/n/e` before committing. `--all` ignores the cursor; `--dry-run` prints without committing; calling it twice is a no-op.
`devteam compact [--dry-run] [--json]`	Strip all devteam-managed marker sections (`<!-- devteam:*:begin/end -->`) from `pipeline/context.md` in one shot. Sections (run-blockers, red-team-blockers, deploy-target, advise, etc.) are regenerated by devteam on the next run when still needed, so removal is always safe. Use to prune `context.md` after a long pipeline run or before switching to `isolation: bounded`. `--dry-run` lists what would be removed without modifying the file.
`devteam status [--json]`	Liveness report (ADR-007 Tier 1). Reads `run-state.json` + tail of `run-log.jsonl`; reports `status` / `current_stage` / `last_action` / `iterations` / `cost_usd` / `last_heartbeat_age_ms` / `last_event_age_ms` / `stall_detected`. Read-only; no `--watch`.
`devteam advise [--apply <decisions>] [--json]`	Show advisory panel for `noted_for_followup[]` items across all completed gates; classify risk (QA_BLOCKER, PEER_REVIEW_RISK, QA_NOISE); apply operator decisions to `pipeline/context.md`. Apply format: `--apply AC-11=A,AC-10=B:PROJ-123,AC-12=A`. Actions: scaffold, defer, amend, nothing, known-flaky, wontfix. See rules/advise.md.
`devteam restart <stage> [--cascade] [--keep-context] [--dry-run]`	Clear a stage's gate(s) so the pipeline can re-run it. With `--cascade`, also clears every stage that comes after. Use after FAIL/ESCALATE. See docs/runbooks/escalation.md.
`devteam ruling --topic "..." [--context <paths>] [--target-gate <gate>] [--headless]`	Dispatch the Principal subagent for an ad-hoc ruling outside the normal stage flow. Writes a `PRINCIPAL-RULING:` line to `pipeline/context.md`; no gate is touched.
`devteam merge <stage>`	Aggregate per-workstream gate files into a single merged stage gate (required after multi-role stages)
`devteam derive-approvals [<file>] [--cwd <dir>] [--json]`	Re-run the `approval-derivation` hook on `pipeline/code-review/by-*.md` and update the per-area `stage-05.<area>.json` gates. Use after hand-editing a review file outside an active Claude Code session — the hook only fires on agent saves; shell/editor saves bypass it. Follow with `devteam merge peer-review` to rebuild the merged gate. See docs/runbooks/fix-and-retry.md § Case 5.
`devteam validate`	Run the gate validator manually against the current gate directory
`devteam verify-chain [--track <t>] [--json]`	C6 tamper-evident gate chain. Verify that each stage gate's recorded `chain.prev_hash` still matches its predecessor's content; reports breaks (located) and unstamped gates. Exit 0 intact, 1 broken — CI-usable.
`devteam stamp-chain [--track <t>]`	(Re)stamp the gate chain on every stage gate in order. Use after a deliberate earlier-stage re-run (which legitimately invalidates the chain) or to stamp gates written interactively.
`devteam verify <stage-id> [--json]`	Orchestrator-stamped verification. For stage-04a (lint+tests) and stage-06 (tests + AC mapping), runs the configured commands and stamps the gate with what was actually observed. Flips status to FAIL if the orchestrator's truth disagrees with the model's claim.
`devteam summary`	One-screen view of all stages: pass/warn/fail/escalate/pending per stage and workstream
`devteam log [--follow] [--json]`	Chronological event timeline: every gate + every artifact write, mtime-sorted, with per-stage key fields. `--follow` polls at 1s for live tailing. Works in headless and user-driven modes.
`devteam stages`	List all stages and their stage IDs
`devteam hosts`	List available host adapters
`devteam doctor`	Check that the install is healthy: hooks wired, agent files present, host CLIs on PATH. When `profile: dogfood` is set, adds a "Dogfood mode" section with six checks (pre-commit guard, hook executable, gitignore dogfood block, `.git/info/exclude` deploy.md entry, no npm publish script, budget-usd reminder).
`devteam reproduce <stage-id> [--json]`	Read a past gate and report replay readiness — which reproducibility fields are recorded, what's missing, whether the current prompt's hash matches what the gate captured.
`devteam replay <stage-id> [--dry-run] [--json]`	Re-run a recorded stage against the current config; diff the result across status, blockers, cost, tokens, duration, reproducibility fields. New gate goes to `pipeline/gates/replay/`.
`devteam ci install [--ci github-actions] [--out <dir>] [--force]`	Drop a CI workflow template into the target project's `.github/workflows/`. The workflow validates gates and posts each as a GitHub check run on PRs. Does NOT run the LLM pipeline in CI by design.
`devteam memory <ingest\|query\|stats\|promote\|clear> [--org]`	Persistent project memory: ingest briefs/ADRs/lessons; query by similarity. `--org` targets the cross-project shared store at `~/.stagecraft/memory/`.
`devteam architecture lookup "<topic>"`	Friendlier wrapper around `memory query --org --kind adr`. Surfaces prior ADRs at design time so architecture decisions don't silently re-litigate.
`devteam ui`	Open the web dashboard (live gate view, cost, performance charts)
`devteam spec generate`	Scaffold `pipeline/spec.feature` from AC-N criteria in `pipeline/brief.md`
`devteam spec verify`	Check `pipeline/spec.feature` for drift against the brief
`devteam consistency analyze [--strict] [--json]`	Cross-artifact drift check across the full pipeline chain — brief → spec → `pr-*.md ## Verify` → red-team `must_address` → test-report → gate field reality. Generalizes `devteam spec verify` to every intermediate artifact + the gate-vs-reality dimension. Exits non-zero on any drift.
`devteam assess [--description "..."] [--json] [--apply] [--confirm] [files...]`	G6 / ADR-006 — rule-based track recommendation for a proposed change. Analyzes description keywords and file/content heuristics (security patterns, migration files, config-only paths) and returns the recommended track with rationale. Default writes `pipeline/track.json` (`source: "inferred"`) as the per-run provenance record read by `devteam run`. `--confirm` sets `source: "human"` (silences the unconfirmed-track guard). `--apply` writes `pipeline.custom_stages` to `.devteam/config.yml` (project-wide setting).
`devteam standards discover [--cwd <dir>] [--json] [--dry-run] [--force]`	B10 — scan the current project and produce `docs/project-conventions.md` with detected tech stack, module system, file layout, naming conventions, tooling, test configuration, and most-used import sources. `--dry-run` prints without writing; `--json` emits the structured discovery result. Add the output file to your AGENTS.md or readFirst lists to inject discovered conventions into agent prompts.

See devteam help for the up-to-date list with flags.

Auditing an existing codebase

The pipeline builds features. The audit workflow analyzes an existing codebase and produces a prioritized improvement roadmap under docs/audit/00–10, plus a status.json for resume. It is read-only by design: it writes findings, never source code. The implement skill consumes docs/audit/10-roadmap.md to pick the next change.

Inside Claude Code:

/audit              # full audit: Phases 0-3, ~30-60 min, 11 output files
/audit-quick        # Phases 0-1 only: ~5-15 min, 6 output files

The slash commands are claude-code-only; on other hosts the auditor role and audit skill ship with every devteam init and you dispatch them as a normal role-plus-skill prompt.

For auditing a feature built with Stagecraft (where the pipeline left a rich on-disk trail), the audit splits into four modes — code / process / consistency / threat — each answering a different question with a different tool. See docs/user-guide.md § Auditing a feature built with Stagecraft.

Full operational guidance: docs/user-guide.md § Auditing a codebase. Phase definitions: skills/audit/SKILL.md.

Architecture in one diagram

User in any AI tool (Claude Code, Codex, terminal)
         │
         │  /devteam slash command, or `devteam` CLI
         ▼
Host adapter (hosts/<host>/adapter.js)
  - declares capabilities (hooks, subagents, …)
  - renders stage prompt for this host
  - installs surface into the target project
         │
         ▼
Core (model-agnostic spine, never invokes a model)
  - stage definitions + tracks       (core/pipeline/stages.js)
  - gate schemas + validator         (core/gates/)
  - guards: stoplist, security-heuristic
  - routing + orchestrator           (core/router.js, core/orchestrator.js)
         │
         ▼
Stage prompt rendered for the LLM
         │
         │  consumed by the model inside the host
         ▼
Model produces:
  - the artifact (brief, design-spec, code, test report, …)
  - a gate JSON conforming to core/gates/schemas/stage-NN.schema.json
         │
         ▼
Core validates, advances, escalates, or halts.

Full design notes: ARCHITECTURE.md. Host-adapter contract: core/adapters/host-adapter.md.

Repository layout

stagecraft/
├── ARCHITECTURE.md             ← design model + locked decisions
├── README.md                   ← you are here
├── LICENSE                     ← MIT
├── package.json
├── bin/devteam                 ← CLI entrypoint
├── core/                       ← model-agnostic spine
│   ├── adapters/               ← host-adapter contract + shared helpers
│   ├── config.js               ← .devteam/config.yml loader + routing
│   ├── gates/                  ← validator + per-stage schemas
│   ├── guards/                 ← stoplist, security/migration heuristics
│   ├── hooks/                  ← approval-derivation, secret-scan
│   ├── memory/                 ← persistent project memory (embed/store/index)
│   ├── spec/                   ← Gherkin generator + drift verifier
│   ├── ui/                     ← dashboard web server
│   ├── orchestrator.js         ← runStage, mergeWorkstreamGates, next
│   ├── pipeline/stages.js      ← STAGES table + STAGES_BY_TRACK
│   └── router.js               ← per-(stage, role) host resolution
├── roles/                      ← single source of truth for role briefs (12)
├── rules/                      ← pipeline rules docs (10 top-level + 9 per-stage)
├── skills/                     ← task-oriented helpers (13)
├── templates/                  ← artifact templates (15)
├── hosts/                      ← per-host adapters
│   ├── claude-code/
│   ├── codex/
│   ├── gemini-cli/
│   └── generic/
└── docs/                       ← guides, walkthroughs, BACKLOG

Design decisions

12 locked decisions are documented in ARCHITECTURE.md. The big ones:

The core never spawns a model. It emits prompts and validates JSON.
Gate JSON is the stable seam. Identity fields: stage, workstream, orchestrator, host, status. Anything host-specific stays in the adapter.
Role briefs have one source. roles/*.md. Adapters render into host-expected paths at install time.
Per-workstream host selection. A single pipeline can route different roles to different hosts; the orchestrator merges across them via the gate seam.
Multi-host install is the default code path. Single-host is just a list of length 1.
Capability negotiation. Each adapter declares hooks, subagents, slashCommands, worktrees, headless, and where it enforces each core rule.

What's deferred

Programmatic invoke() testing requires claude / codex on PATH (the contract works with stubs).
See docs/BACKLOG.md for the prioritized list of next ideas — bucketed by reach, depth, quality, observability, DX, integrations, and innovation bets.

Why this exists

claude-dev-team and codex-dev-team ran the same pipeline with the same templates, schemas, and ~80% of scripts, but as separate forks that were slowly diverging. This project unifies them into a single core with thin per-host adapters. The stress-test that locked the design is in docs/walkthroughs/stage-04-split-host.md: one feature traced end-to-end through a mixed-host pipeline.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Stagecraft

Table of contents

First 30 minutes

Documentation map

Evaluator — should we adopt?

Operator — I run pipelines daily

Contributor — I change Stagecraft

Model — never reads docs/

Why "Stagecraft"?

What this gives you

Prerequisites

Quick start

Path A — `--headless` (single terminal, start here)

Path B — Interactive in Claude Code (two windows)

Which path should I pick?

What `devteam init` installs

CLI reference

Auditing an existing codebase

Architecture in one diagram

Repository layout

Design decisions

What's deferred

Why this exists

License

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 730 Commits
.github		.github
bin		bin
changelog.d		changelog.d
core		core
docs		docs
examples		examples
hosts		hosts
plans		plans
roles		roles
rules		rules
scripts		scripts
skills		skills
templates		templates
tests		tests
.editorconfig		.editorconfig
.gitignore		.gitignore
AGENTS.md		AGENTS.md
ARCHITECTURE.md		ARCHITECTURE.md
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
EXAMPLE.md		EXAMPLE.md
LICENSE		LICENSE
README.md		README.md
eslint.config.js		eslint.config.js
package-lock.json		package-lock.json
package.json		package.json

Folders and files

Latest commit

History

Repository files navigation

Stagecraft

Table of contents

First 30 minutes

Documentation map

Evaluator — should we adopt?

Operator — I run pipelines daily

Contributor — I change Stagecraft

Model — never reads docs/

Why "Stagecraft"?

What this gives you

Prerequisites

Quick start

Path A — --headless (single terminal, start here)

Path B — Interactive in Claude Code (two windows)

Which path should I pick?

What devteam init installs

CLI reference

Auditing an existing codebase

Architecture in one diagram

Repository layout

Design decisions

What's deferred

Why this exists

License

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Path A — `--headless` (single terminal, start here)

What `devteam init` installs

Packages