diff --git a/.agents/references/hunk-agent-guide.md b/.agents/references/hunk-agent-guide.md new file mode 100644 index 0000000..629db55 --- /dev/null +++ b/.agents/references/hunk-agent-guide.md @@ -0,0 +1,61 @@ +# Hunk Agent Guide + +How agents interact with a live Hunk session for code review. + +## Pre-requisites + +- Hunk must be running in a terminal (e.g. `git diff ... | hunk patch -`). +- The Hunk session daemon auto-registers on startup. + +## Inspect + +```bash +hunk session list +git diff upstream/main...HEAD --diff-filter=M | hunk patch - +``` + +## Inspect + +```bash +hunk session list # find live sessions +hunk session get --repo . # confirm session repo match +hunk session review --repo . --json # file/hunk structure +hunk session review --repo . --include-patch --json # include raw diff text +hunk session context --repo . # current focus +``` + +## Navigate + +```bash +hunk session navigate --repo . --file --hunk # 1-based hunk +hunk session navigate --repo . --file --new-line +hunk session navigate --repo . --next-comment +hunk session navigate --repo . --prev-comment +``` + +## Reload content + +```bash +hunk session reload --repo . -- diff --exclude-untracked +hunk session reload --repo . -- show HEAD~1 +``` + +Always pass `--` before the nested Hunk command. + +## Add comments + +Single note: +```bash +hunk session comment add --repo . --file --new-line --summary "text" [--focus] +``` + +Batch: +```bash +printf '%s\n' '{"comments":[{"filePath":"...","newLine":N,"summary":"..."}]}' \ + | hunk session comment apply --repo . --stdin [--focus] +``` + +## Common fixes + +- **"No active session matches repoRoot"** — pass session ID explicitly instead of `--repo .`. +- **"No active Hunk sessions"** — Hunk is not running; ask the user to open it first. diff --git a/.agents/references/hunk-user-guide.md b/.agents/references/hunk-user-guide.md new file mode 100644 index 0000000..2f36514 --- /dev/null +++ b/.agents/references/hunk-user-guide.md @@ -0,0 +1,57 @@ +# Hunk User Guide + +Quick reference for [Hunk](https://github.com/modem-dev/hunk) — the review-first terminal diff viewer used in this fork. + +## Open a diff (modified files only) + +```bash +git diff upstream/main...HEAD --diff-filter=M | hunk patch - +``` + +`--diff-filter=M` shows only **modified** tracked files, excluding new/untracked files. + +## Navigation + +| Key | Action | +|-----|--------| +| `↑ / ↓` | move line by line | +| `Space` or `f` | page down | +| `b` or `Shift+Space` | page up | +| `d / u` | half page down / up | +| `[ / ]` | previous / next hunk | +| `, / .` | previous / next file | +| `{ / }` | previous / next comment | +| `← / →` | scroll code horizontally (Shift = faster) | +| `Home / End` or `g / G` | jump to top / bottom | + +## View + +| Key | Action | +|-----|--------| +| `1 / 2 / 0` | split / stack / auto layout | +| `s` | toggle sidebar | +| `t` | toggle theme | +| `a` | toggle AI notes | +| `z` | toggle unchanged context | +| `l / w / m` | toggle line numbers / wrap / metadata | +| `e` | open file in `$EDITOR` | + +## Review + +| Key | Action | +|-----|--------| +| `/` | focus file filter | +| `c` | create review note | +| `Tab` | toggle files/filter focus | +| `F10` | open menus | +| `r` | reload (watch mode) | +| `q` | quit | + +## Mouse + +- Wheel — scroll vertically +- Shift+Wheel — scroll horizontally + +## In-app help + +Press `?` or `h` inside Hunk to open the full controls help modal. diff --git a/.claude/skills/fork-upstream-sync/SKILL.md b/.claude/skills/fork-upstream-sync/SKILL.md new file mode 100644 index 0000000..143bc74 --- /dev/null +++ b/.claude/skills/fork-upstream-sync/SKILL.md @@ -0,0 +1,63 @@ +--- +name: fork-upstream-sync +description: Sync this fork with new commits from unclebob/swarm-forge upstream. Identifies real conflicts (only files both sides modified since merge base), checks whether fork changes are superseded, migrates divergences to fork.bb, and opens a PR to gabadi/swarm-forge. Use when upstream has new commits, user says "sync upstream", "upstream has changes", or "merge upstream". +--- + +# Fork Upstream Sync + +## Core rule: only intersecting files are real conflicts + +After fetching upstream, find what each side changed since the merge base: + +```bash +MERGE_BASE=$(rtk git merge-base HEAD upstream/main) +rtk git diff "$MERGE_BASE"..HEAD --name-only # our side +rtk git diff "$MERGE_BASE"..upstream/main --name-only # upstream side +``` + +Files only upstream changed → trivial forward merge, don't mention them. +Files only we changed → no conflict. +**Files in both lists → real conflicts. Analyze each one.** + +## Classify each real conflict + +For every intersecting file, answer in order: + +1. **Superseded?** Read upstream's new version and grep for the intent of our change. If upstream solved it already (even differently), our change is moot — take theirs. +2. **Non-overlapping edits?** If our edit and upstream's are on different lines, take both — no decision needed. +3. **Migration needed?** If upstream rewrote/replaced the file and we have substantive logic in it, extract our logic to `fork.bb` (see below). + +## Migration pattern: fork.bb + +When upstream replaces a script file we've extended, move our logic into `swarmforge/scripts/fork.bb`. This file is loaded by `swarmforge.bb` via `(load-file ...)` and is 100% fork-owned — zero conflict surface on future syncs. + +- Extractable: self-contained functions (settings writers, skill installers, sparse-checkout setup, prompt bundle resolvers) +- Must stay in `swarmforge.bb`: config parsing for new fields, permission-mode flags, setup guards, load-file call itself +- Keep the upstream file edits to small, stable hook call sites only + +## Resolving constitution/articles conflicts + +These files get edited by both sides frequently. Always: +- Take upstream's structural/wording changes +- Preserve our fork-specific rule insertions (check `rtk git diff "$MERGE_BASE"..HEAD -- ` to see exactly what we added) +- Never remove an upstream rule unless there's an explicit ADR for it + +## Doing the merge + +```bash +rtk git checkout -b feat/upstream-sync +rtk git merge upstream/main +# Files to take entirely from upstream: +rtk git checkout --theirs && rtk git add +# Manual resolutions: edit file, then rtk git add +rtk git commit +rtk git push origin feat/upstream-sync +gh pr create -R gabadi/swarm-forge --title "..." --body "..." +``` + +**Never** open a PR against `unclebob/swarm-forge`. Always target `gabadi/swarm-forge`. +`gh` CLI defaults to upstream — always pass `-R gabadi/swarm-forge`. + +## After the PR + +If any migrated divergences aren't yet documented, add an ADR row or manifest entry. ADR house style: divergence + why only, no rejected-options section. diff --git a/.claude/skills/retro-triage/SKILL.md b/.claude/skills/retro-triage/SKILL.md new file mode 100644 index 0000000..3c5c5bb --- /dev/null +++ b/.claude/skills/retro-triage/SKILL.md @@ -0,0 +1,220 @@ +--- +name: retro-triage +description: Use when unprocessed session retros sit in ~/.claude/worklog/retros/ and a batch needs root-cause analysis across the sessions. Triggers on "triage the retros", "consolidate retros", "what did we learn this batch", "file issues from the last sessions", "any new pains from the swarm runs". +--- + +# retro-triage + +## Overview + +Turn a batch of session retros into a **validated root-cause diagnosis** with framed candidate actionables — written to a consolidation doc. The pipeline is **harvest → reconstruct → diagnose → validate → frame**. You do NOT bucket-and-file: bucketing per-signal scatters root causes and reproduces the retros' own framing. Diagnosis is the product; a human files issues from it. + +This is NOT `mattpocock-skills:triage`. That skill triages *incoming* issues from a human reporter through a grilling/labeling state machine. Here there is no reporter — the source is a pile of dead session transcripts, and the output is a diagnosis that must carry its own evidence. Do not hand off to that skill. + +**Core principle: the retro is a symptom report, not a diagnosis. Your job is the diagnosis.** A retro reliably tells you *what hurt* in one role's one session. Its `## What Didn't Work`/`## Actions`/`## What Worked` are the author's framing under a keyhole view — hints, never findings. **The root cause almost never lives inside a single retro.** It lives *across* retros (one upstream decision surfaces as different pains in five roles) and *below* their notice (routine work nobody thought to complain about). If your actionables look like the retros' proposed fixes with an "unverified" label, you have sorted, not diagnosed — and you are wrong. + +**Two failure modes this skill exists to prevent (both happened in production):** +- **Codifying a workaround as a win.** A slick conflict-resolution technique landed in `## What Worked` → got filed as a "pattern worth codifying." It was a workaround for self-inflicted merge conflicts caused by the squash-merge strategy. The real finding was the upstream cause, invisible because nobody's retro said "the strategy made me merge." +- **Inheriting the retro's fix.** A prior batch filed the retros' proposed "push before handoff" rule. The mechanism was wrong (the workflow already merged by hash; the role *had* pushed). Every issue was closed as mis-framed. + +## When to Use + +- Unprocessed retros sit in `~/.claude/worklog/retros/` (no `consolidated:` frontmatter) — the curator's `processed/` archive is also scanned, so curated retros stay visible to a later diagnosis +- Periodically after a swarmforge batch closes + +**Do NOT use:** +- For a single retro — too little signal. Read it directly. +- To re-process already-stamped retros — skip them. +- To triage incoming human-reported issues — that is `mattpocock-skills:triage`. + +## Inputs + +1. **Unprocessed retros** — files in `~/.claude/worklog/retros/*.md` lacking `consolidated:` in their first 5 lines AND whose name lacks `CONSOLIDATED`. Use this detector exactly (`grep -L` over the whole file gives false positives when "consolidated" appears in the body): + ```bash + for f in ~/.claude/worklog/retros/*.md ~/.claude/worklog/retros/processed/*.md; do + [ -e "$f" ] || continue # globs that match nothing expand literally + case "$(basename "$f")" in *CONSOLIDATED*) continue ;; esac + head -5 "$f" | grep -q '^consolidated:' || echo "$f" + done + ``` +2. **Prior consolidations** — any `*-CONSOLIDATED-actionables.md` in the same dir. +3. **Open issues** — `gh issue list --state open --limit 50 --json number,title,labels`. +4. **Project gotchas** — only the `## Gotchas` section of the repo's `AGENTS.md`. Read other repo files (role prompts, constitution, scripts) only when a signal explicitly references them for diagnosis. +5. **Closed issues (on-demand)** — only when a signal smells pre-decided. Dispatch a Haiku subagent with `gh issue list --state closed --search ""`; do not pull closed-issue text into main context. + +Context discipline for the bulk read is defined in Phase 1 (one subagent, never split). + +## Phase 1 — Harvest (raw symptoms only, no conclusions) + +Extract verbatim. These sections are **the author's framing — input to diagnosis, never output.** Do not let a section's label decide an actionable's fate. + +| Section | What it actually is | +|---|---| +| `## What Didn't Work` | Symptoms + the author's guessed cause. Keep the symptom verbatim; discard the guess until you re-derive it. | +| `## Actions` | The author's *proposed fix*. A hypothesis. Never file it as-is. | +| `## What Worked` | What the author was *proud of*. **Trap:** a thing done well may be a workaround for an upstream problem. Run the workaround test (Phase 3) before believing it's a pattern. | +| `## Tool Result Waste` | Efficiency symptoms. Usually a symptom of something, not a finding itself. | + +Each retro header carries **Session ID**, **Branch**, **Date**, and references **commit SHAs / PRs**. Capture the commits — they are the traceability anchor (see below). + +Do NOT skip token/cost tables: a cluster of "expensive session" lines is often the visible edge of an invisible-work root cause (Phase 3). + +**Context discipline:** delegate the bulk read to ONE subagent (not several split by file — root causes cross the split line, and a half-batch reader can't see them). It returns per-file verbatim signals; you do the cross-retro work in the main thread. + +## Phase 2 — Reconstruct the episode (independent of the retros) + +Before any classification, rebuild what the **system actually did** this batch, from durable artifacts, NOT from what the retros say happened: + +- `git log --oneline --graph` over the batch range; note the branch topology and **how branches landed** (squash? merge commit? rebase?). Squash-to-main + long-lived role branches *mechanically* generate divergence — a root cause no single retro will name. +- Which gates ran and which didn't (`verify` chain, mutation, reality-check, arch-check). Read the actual scripts/config when a symptom touches them. +- What landed where (PRs, the final commits on `main`). + +Write a 3–6 line factual reconstruction. This is the lens you cluster symptoms onto. + +## Phase 3 — Diagnose (derive cause across symptoms, then validate) + +Cluster the harvested symptoms onto the reconstruction and ask: **what one decision or missing mechanism explains this cluster?** Two mandatory probes, because the retros are blind to both: + +1. **Workaround-vs-win.** For every `## What Worked` item and every "we handled it well": *would this work have been necessary if something upstream were right?* If the heroics exist to cope with a self-inflicted problem, the finding is the upstream cause — the technique is evidence of cost, not a pattern to codify. +2. **Invisible / normalized work.** Scan for work every role did routinely and nobody flagged as pain — repeated merges, branch resets, re-runs, "expensive session" cost lines. Normalized cost is where the biggest root causes hide, precisely because no retro complains about it. + +Then **validate each candidate cause against the artifacts before it becomes an actionable.** Read the prompt/script/config the cause implicates. Kill the ones the evidence contradicts (the "push before handoff" fix died here: `workflow.prompt` already merged by hash). An unvalidated cause is not a finding — it is the retro's guess wearing a label. + +**The session transcript IS an artifact — and the retro can be wrong about its own session.** Every retro records a Session ID; that is the durable handle to ground truth. Before quoting any *figure, sequence, or "the user said X"* from a retro, confirm it in the actual transcript (resolve via the `entire` CLI by session id). Retros mis-state: a real batch retro reported "70.41%, 22 survivors" for one file when the number belonged to a *different* file and that session had run no mutation at all. **Delegate this check to ONE subagent** (give it all the session IDs + the claims; one reader so findings stay coherent — never split across agents). Quote what the transcript actually shows, not what the retro says it shows. + +## The root-cause record — the unit of output, and the evidence gate + +**This skill's own thesis is "prose rules get skipped; prefer a mechanical gate." Apply it to yourself.** "Validate the cause" is prose an agent can claim without doing (this skill was used to write `Validated:` with no artifact read, and to quote a mutation figure that belonged to a different file). The gate is: **every root cause is recorded in this exact shape, and the Evidence block must contain literal receipts. A receipt is a fact someone else could re-pull, not your summary of it.** No receipt → the verdict is `INSUFFICIENT` by default → it is NOT a finding, cannot become an actionable, cannot be filed. + +```markdown +### RC-N — +- **Symptoms it explains:** verbatim quotes + which retros/sessions (the cross-retro cluster) +- **Probes:** workaround-vs-win → ; invisible/normalized-work → +- **Evidence (receipts — each line re-pullable by someone else):** + - `file:line` quoted, OR `command` + its ACTUAL output, OR transcript quote `@` + - …one line per artifact checked; state what each proves or CONTRADICTS +- **Verdict:** SUPPORTED | WEAKENED | INSUFFICIENT — and it must follow from the receipts above, not from the retro's say-so +- **Disposition + framing:** → if SUPPORTED, the framed candidate (investigate/decide, target file, default `ready-for-human`) +``` + +Hard rules on the record: +- **A bare `Validated:` / "confirmed" with no receipt line is a forgery.** Treat it as INSUFFICIENT. +- **The retro's own numbers, sequences, and "the user said X" are claims, not receipts.** A receipt is the transcript quote (by session id), the git output, the `file:line`. If your only source is the retro, your verdict is at best INSUFFICIENT. +- **A WEAKENED/INSUFFICIENT cause can still be a real finding** — file it as needs-info with the gap stated. What you may NOT do is upgrade it to a prescribed fix. + +The buckets below are a **disposition tag on a finished record**, not bins you sort raw signals into. + +## Disposition tags (applied to a validated record — never to raw signals) + +| # | Tag | Test | +|---|---|---| +| 1 | **Failed-to-learn** | Pain recurs AND a prior CONSOLIDATED row or AGENTS.md `## Gotchas` row documented a **specific fix**. Recurs without a prior remediation → Bucket 3. **Always emit the header** (write "None this batch — verified against [N] prior fixes."). | +| 2 | **Dupe-of-existing-issue** | Matches an open issue. Output: link + a clarification comment (self-contained, see below). | +| 3 | **New-actionable, issue-shaped** | New pain worth tracking. File a self-contained, traceable issue (see Authoring contract). Default `ready-for-human`. | +| 4 | **New-actionable, spec-shaped** | Fix needs design before a ticket (new protocol, a system port). State the design question in the issue; still file as `ready-for-human`. | +| 5 | **Needs-info / decision** | Mechanism contested or unclear. `ready-for-human`. | +| 6 | **Pattern-worth-codifying** | A genuine technique worth a rule/template — ONLY after it passes the workaround-vs-win probe (Phase 3). If the "win" exists to cope with an upstream problem, it is NOT a pattern; route the upstream cause to Bucket 3/5 and cite the technique as evidence of cost. | +| 7 | **Already-learned / dropped** | Pain matches a documented rule, the session **followed it**, retro just confirms it worked. One-line note + source. | +| 8 | **Noise** | Not documented anywhere AND "structural to async swarm" / "generic unfixable friction" / "one-off not worth a rule". Explicit rationale per item. Tiebreak vs 7: written rule exists → 7; no rule and no mechanism → 8. | + +## Issue authoring contract (used only at the human-gated filing step) + +A filed issue is a SUPPORTED root-cause record, rewritten to be **self-contained** and **traceable**: a future agent must extract a valid learning from the body *alone*, without reloading any transcript or local file. The record's receipts become the issue's evidence; the record's verdict sets the issue's confidence. + +**Two hard rules, both learned from real failures:** + +1. **No reference to anything local.** Never cite a retro filename, a `~/.claude/worklog/...` path, a consolidation doc, or a session-transcript `.jsonl` path. Those live on one machine and die elsewhere. Cite only repo paths (`swarmforge/...`, `api/src/...`, `AGENTS.md`) and durable handles (commit SHAs, PR/issue numbers). +2. **Preserve the signal verbatim — do not paraphrase it away.** Your prose summary is not the evidence. Quote the exact `## What Didn't Work` / `## Actions` lines, error strings, commands, and user corrections. Paraphrase loses the debuggable signal. + +**Traceability anchor = the git commit (the `explain` skill pattern).** The durable link from an issue to its origin is the **commit SHA / PR** where the work or pain landed — it is on `origin`, shared and permanent. Provide the resolver line literally: +``` +entire checkpoint explain --commit +``` +The Session UUID is a *secondary, local-only* hint — label it as such; never make it the primary anchor. When a pain never landed as its own commit (halted session, local-only WIP), say so explicitly rather than inventing an anchor. + +**Body schema** (adapted from the `extracting-skill-learnings` skill): +```markdown +> *This was generated by AI during triage.* + +--- +date: YYYY-MM-DD +model: +harness: +source: multi-agent (swarmforge) session retrospectives, +--- + +Pain is stated as fact. The cause carries the record's verdict (SUPPORTED / WEAKENED / INSUFFICIENT) — never assert more confidence than the receipts earned. + +## Raw signals +(verbatim quotes, attributed to role + Session ID; redact secrets/PII as [redacted], keep branch names / SHAs / paths) + +## Defect +**What happened:** ... +**Cause (verdict: SUPPORTED|WEAKENED|INSUFFICIENT):** ... +**Evidence:** the record's receipts — `file:line`, command+output, transcript quote. (NOT the retro's summary.) +**Attribution:** skill workflow | model reasoning | harness enforcement + +## Actionables +| # | Actionable | Target | Confidence | Status | +|---|-----------|--------|-----------|--------| +| 1 | investigate/decide (prescribed fix ONLY if verdict=SUPPORTED on a mechanism you validated) | | from verdict | pending | + +## Traceability +- Landing commit(s) / PR(s): — resolve via `entire checkpoint explain --commit ` +- Session UUID(s) (local transcript only, not durable): () +``` + +## Judgment rules + +- **Pain = fact, cause = hypothesis.** Never promote a retro's guessed cause to asserted root cause. Default new actionables to `ready-for-human`; reserve a prescribed fix (and `ready-for-agent`) for a mechanism you independently verified, not merely a plausible one. +- **Don't over-fragment one cause into many fixes.** Several signals often trace to one gap seen from different roles. Collapse into one broad problem statement rather than N prescriptive tickets — especially since each per-signal "fix" is only a hypothesis. Tiebreak for keeping separate: the fixes would be genuinely different commits/PRs. +- **Prefer a lint/CI gate over a prompt/doc edit** when the rule is mechanically checkable — that is what makes rules stick (prose rules get skipped). Note this in Actionables. +- **Source attribution** belongs in YOUR scratch reasoning only (`Source: (primary), (secondary)`). It must NOT appear in issue bodies (authoring rule 1). +- **Conflict rule:** two retros propose different mechanisms for one pain → list both as options, `ready-for-human`. Do not silently pick. +- **Stale-status verification:** for every `[x] done` row in the most recent prior CONSOLIDATED, grep git log / target file for evidence. Flag absences. +- **Governing insight:** include a section only if one meta-pattern explains ≥3 pains. Frame it as a candidate explanation ("these pains *may* share…"), not a proven diagnosis. Forced coherence is fabrication. + +## Output: the consolidation doc (you do NOT file issues) + +Write the diagnosis to `~/.claude/worklog/retros/-CONSOLIDATED-actionables.md`: the Phase-2 reconstruction, the validated root causes, and the bucketed candidates. Filing GitHub issues is a **separate, human-gated step** — present the doc and ask before creating anything. (Both prior auto-filed batches were closed as mis-framed; the validation step is necessary but not yet proven sufficient.) + +When the human approves filing, follow the authoring contract per Bucket-3/Bucket-5 candidate: +- `gh issue create --label "," --body-file ` — category `enhancement`/`bug`; state defaults to `ready-for-human`. Reserve `ready-for-agent` for a mechanism you validated in Phase 3, never a plausible one. +- Bucket 2: post the clarification comment with `gh issue comment`; ensure category+state labels are present. +- Verify every touched issue ends with exactly one category label and one state label. +- Close any issue whose feature has demonstrably landed (cite the merged PR/commit). + +## Post-step: stamp source retros + +After the doc is written, prepend each source retro (every one included, even Bucket 7/8) with: +```yaml +--- +consolidated: YYYY-MM-DD +--- +``` +Without the stamp the detector re-processes them next run. For retros consolidated by a prior pre-existing doc, stamp with that doc's date, not today's. + +## Common mistakes + +- **Paraphrasing the signal instead of quoting it.** Your summary is not the evidence. Quote verbatim. +- **Anchoring on the session UUID or a retro filename.** Both are local-only. Anchor on the commit SHA; UUID is a secondary hint. +- **Stating the retro's guessed cause as fact.** Pain is fact; cause is hypothesis. Label it. +- **Bucketing raw signals instead of validated causes.** The buckets format Phase-3 output. Classifying raw symptoms straight into buckets is the sort-not-diagnose failure. +- **Codifying a workaround as a pattern.** Run the workaround-vs-win probe on every `## What Worked` item first. +- **Skipping the Phase-2 reconstruction.** Without rebuilding what the system did from git/config, cross-retro and invisible-work causes stay invisible. +- **Filing N tickets for one underlying gap.** Collapse to one broad problem statement. +- **Loading all retros into main context, or splitting the read across subagents.** One reader, verbatim; diagnose in the main thread. + +## Red flags — STOP + +| Thought | Reality | +|---|---| +| "I've bucketed the signals, now I'll file." | You sorted, you didn't diagnose. Do Phase 2–3 first; buckets format *validated causes*. | +| "This went in `## What Worked`, so it's a pattern to codify." | Maybe it's a workaround for an upstream problem. Run the workaround-vs-win probe. | +| "The retro's proposed fix sounds right, I'll file it." | That's the author's guess. Re-derive and validate against the artifact, or you ship a closed-as-mis-framed issue. | +| "Every retro mentions merge pain — five separate findings." | Probably ONE upstream cause (e.g. squash divergence) seen five ways. Reconstruct, then collapse. | +| "No retro complains about it, so it's fine." | Normalized/invisible work hides the biggest causes. Probe for it explicitly. | +| "The cause is obvious / ready-for-agent." | Unvalidated = the retro's guess with a label. Validate against the artifact; default `ready-for-human`. | +| "I'll write `Validated:` / `confirmed` here." | Not without a receipt on the next line. A verdict with no `file:line` / command-output / transcript quote is a forgery → INSUFFICIENT. | +| "The retro says 70.41%, I'll quote that." | The retro's number is a claim, not a receipt. Pull it from the transcript by session id first — it may belong to a different file. | +| "I'll summarize the pain in my own words." | Paraphrase loses the signal. Quote verbatim. | diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml new file mode 100644 index 0000000..a925e70 --- /dev/null +++ b/.github/workflows/ci.yml @@ -0,0 +1,27 @@ +name: CI + +on: + push: + branches: [main, six-pack] + pull_request: + +jobs: + test: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2 + + - name: Install babashka + run: | + curl -fsSLO https://github.com/babashka/babashka/releases/download/v1.12.218/babashka-1.12.218-linux-amd64-static.tar.gz + tar xzf babashka-1.12.218-linux-amd64-static.tar.gz + sudo mv bb /usr/local/bin/ + + - name: Install dependencies + run: sudo apt-get install -y tmux zsh + + - name: Run upstream tests + run: bb test + + - name: Run fork extension tests + run: bb fork-test diff --git a/.gitignore b/.gitignore index 02be7d6..6180b58 100644 --- a/.gitignore +++ b/.gitignore @@ -1,5 +1,6 @@ .DS_Store .env -.claude/ +.claude/* +!.claude/skills/ .swarmforge/ .worktrees/ diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000..f603f61 --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,5 @@ +# Agent Orientation + +- Read `CONTEXT.md` for fork terminology and conventions. +- Read `docs/adr/` for architecture decisions that govern this fork. +- For diff review with Hunk, see `.agents/references/hunk-user-guide.md`. diff --git a/CONTEXT.md b/CONTEXT.md new file mode 100644 index 0000000..c5fe15f --- /dev/null +++ b/CONTEXT.md @@ -0,0 +1,97 @@ +# SwarmForge Fork + +A permanent fork of `unclebob/swarm-forge` (rationale in `docs/adr/`). This glossary holds only terms whose fork-specific meaning is already settled; terms are added as decisions are made, not in advance. + +## Language + +**Idle gate**: +The rule that a role does nothing until it receives a handoff — no startup work, scanning, installing, or self-assigned tasks. The single line is "Wait for a handoff. Do not act without one." +_Avoid_: startup guard, wait condition + +**Ready notification** (presence signal): +The startup "I'm awake" message each role sends to the specifier. Informational only — it tells the operator the role launched. Stamped a distinct `presence` type and excluded from the _Delivery sequence_; in the fork's idle model readiness is implicit (a role at idle with an empty queue is ready). +_Avoid_: awake handoff, ready handoff + +**Delivery sequence**: +The steps that start a work handoff on a receiver: `/clear` → re-inject the role bundle → send the task message. Runs for work handoffs only, never for presence pings. Delivered immediately if the receiver is idle, or by its Stop hook when it next stops if busy. (Upstream instead types the message straight into the terminal with no clear.) +_Avoid_: inject, dispatch + +**Prompt bundle** (role bundle): +The single, structured context a role launches and re-launches with: its constitution and role prompt resolved into one deduplicated XML envelope, into which _promoted knowledge_ (`AGENTS.md` + the role's `.agents/` file) is also injected. It is the unit re-sent on every _delivery sequence_ after `/clear`, not just built once at launch. (Upstream concatenates the prompt files with a plain recursive read, with no dedup or structure.) +_Avoid_: context blob, prompt file, instruction file + +**setup-swarm** (the skill): +The one-time, stack-aware step that makes a project swarm-ready — installs the project's language quality tools, enables session tracking, grants the agents' permissions, pins skill versions, and emits the _swarm-ready marker_. Ships inside the swarm install. **It is the operator's first action on a project** (`/setup-swarm`), before the run path is ever invoked. The run path (`./swarm`) performs no **project provisioning** and never triggers the skill itself; it only **guards** — if the marker is absent it refuses and tells the operator to run `setup-swarm` first. (It still bootstraps the swarm's *own* runtime skills automatically — launcher infrastructure, distinct from project provisioning.) (Upstream instead installs tooling per-role at startup.) +_Avoid_: setup skill, preflight, bootstrap, onboarding + +**Swarm-ready marker**: +The file (`.swarmforge/setup-complete`) that `setup-swarm` writes to record that a project has been made swarm-ready. The run path guards on its presence; the operator deletes it to force a re-run. There is no `./swarm setup` subcommand. +_Avoid_: setup flag, ready file, lock + +**Integrator**: +The terminal role that lands finished work. From the QA-approved commit it opens a pull request, gates on CI, merges only on green, runs the post-merge verification, and notifies the specifier — one PR per feature. It never merges locally: CI is a hard precondition, so a project without CI is not swarm-ready (setup ensures CI; see ADR 0003). CI failures route to the owning role via [[back-routing]]. (Upstream has no integrator — the specifier merges ad hoc.) +_Avoid_: merger, releaser, deployer + +**UX Engineer** (six-pack only): +The role, immediately after the coder, that runs the built product and fixes visual/usability mismatches in rendering code (leaving a regression check behind) — an engineer that fixes, not a flag-only reviewer. Checks against the feature's _UX Intent_ and any optional design inputs the feature references. Skips (passes through) when the feature has no UX Intent. Routes back to the coder via [[back-routing]] when a fix needs a model-state change. Framework-agnostic; the visual-testing tool is named by the constitution. +_Avoid_: UX Reviewer, designer + +**UX Intent**: +The section the specifier authors inline in the feature file stating, in concrete observable terms, what a feature should look and feel like. Part of the swarm and the _UX Engineer_'s primary target. Distinct from optional project design inputs (DESIGN.md, EXPERIENCE.md, mockups) — those are not swarm-owned; the specifier merely references them from the feature file when they exist. +_Avoid_: design spec, UX requirements + +**Refuting QA**: +QA's posture in the fork: assume the build does not meet the spec and the acceptance tests are too weak to notice, until proven otherwise — attack the specified contract rather than run a checklist and confirm. Bounded by the spec (unspecified gaps route back to the specifier, they are not QA pass/fail). Includes _conversion fidelity_: a QA procedure converted into an executable script must encode the procedure's full intent, not a green version that asserts nothing (_test theater_). (Upstream QA confirms the spec is met and fixes what fails.) +_Avoid_: verification, acceptance check, confirm + +**Back-routing**: +Sending rework back to the stage whose decision it exposes as flawed, instead of resolving it where it was found. The trigger is any finding that an earlier stage's work must change — a bug, a refactor blocked by a bad earlier decision, or a design/spec revision. Applies only to _structural_ rework (re-opening an earlier stage's job: an ambiguous/missing spec, a weak/missing test, a design that can't hold the behavior); _local_ work the finder can resolve without re-opening an earlier decision stays with the finder. Two caps: a single finding bounces back at most once, and a feature tolerates at most three back-route cycles total (N=3, tracked by a routing count in the handoff) before the role stops and asks the user. (Upstream fixes everything in place.) +_Avoid_: rejection, escalation, bounce, defect back-routing + +**QA holdout**: +The end-to-end QA suite kept physically out of reach of every role that shapes the implementation, so it stays a blind test. The harness sparse-checks-out the suite's pinned path from each role worktree except the specifier's (which authors it) and QA's (which runs it) — present in the commit, absent from disk. Distinct from upstream's prompt-level "ignore it," which leaves the files in the coder's worktree. Covers only the end-to-end QA suite; the Gherkin acceptance tests stay visible because the coder builds and runs them. (Upstream walls it by instruction only.) +_Avoid_: hidden tests, secret suite, test isolation + +**Spec header**: +The structured block of comment sections the specifier fills in at the top of every feature file, above the Gherkin scenarios — the spec-authoring layer that states what the scenarios cannot: contract, constraints, sequencing, NFRs, side effects, scope (and, six-pack only, _UX Intent_). The scenarios are the contract by example; the spec header is the contract's surrounding intent. Every section is addressed; several default to `none` (a deliberate answer). Comments only, so the Gherkin parser ignores them. (Upstream feature files are pure Gherkin with no header.) +_Avoid_: preamble, comment block, feature description + +**Surface harness**: +The way the live-verification roles (QA always; the _UX Engineer_ on six-pack) drive the running system through its real production interface — a declared per-surface tool (tmux/PTY for a TUI, Playwright for web, an HTTP client for an API, event injection for a headless service) chosen from the constitution's surface tool table. Replaces upstream's mechanically-silent "through the user interface only," which let in-process function calls pass as interface verification. Every surface also carries a _baseline scenario_. The role identifies the surface from the codebase; nothing declares it in `project.prompt`. +_Avoid_: UI test, e2e harness, driver + +**Baseline scenario**: +The permanent idle/no-op scenario committed alongside a surface's flow scenarios, asserting the system is stable when nothing is happening — TUI: no input, identical consecutive captures, zero scrollback growth; web: idle load with no console errors; headless: a no-op event changes no state. It catches idle-state defects that flow scenarios never observe because flow scenarios only assert while the user is acting. +_Avoid_: smoke test, idle test, sanity check + +**Observation harness**: +The project `observation-harness/` directory holding the committed, re-runnable surface scenarios — the per-surface _baseline scenario_ plus one set per verified flow — that form the permanent regression record. Authored by the live-verification role (the _UX Engineer_ on six-pack) using the _surface harness_ tool, and re-executed by QA before final verification; a user-facing surface with no scenarios is a finding that routes back. (Upstream has no such artifact.) +_Avoid_: e2e folder, regression dir + +**Fidelity manifest**: +The constitution sub-file (`dependency-manifest.prompt`) declaring every dependency beyond the system itself by _dependency tier_, each as `name: tier N; implementation; gaps: `. A declared gap is binding: the specifier and QA refuse to write or accept any scenario that rests on it, so a known emulator limitation can never pass as covered behavior. Specifier-owned; defaults to `(none)`. +_Avoid_: mock list, dependency doc, services file + +**Dependency tier**: +The fidelity level at which a dependency is provided, declared in the _fidelity manifest_. Tier 1 — owned infrastructure run locally as the real engine (Postgres in Docker); tier 2 — stateful protocol-level emulation (vendor-official > third-party > swarm-built twin as last resort); tier 3 — external domain the swarm does not own, wire-level stubbed against a referenced contract. The system itself is always implicit, never a tier. +_Avoid_: mock level, fidelity grade + +**Curator**: +The terminal role, after the integrator, that turns a run's session retros into versioned repo knowledge via one self-merging PR, then releases the specifier for the next feature. Makes no code changes — writes only _promoted knowledge_. An empty run notifies the specifier immediately; the line never stalls on it. (Upstream has no such role; lessons live only in unread retros.) +_Avoid_: librarian, archivist, scribe + +**Promoted knowledge**: +The project-versioned knowledge contract the _curator_ writes and the launcher injects into role bundles: a root `AGENTS.md` (universal invariants + navigation) and `.agents/` (per-role files, references, skills, the enforcement-gate backlog, the _knowledge ledger_). Lives in the repo, not `~/.claude`, so a fresh clone carries every lesson. `AGENTS.md` and the role's file are injected into that role's bundle at launch; references load on demand by pointer. (Upstream bundles only the constitution and role prompt.) +_Avoid_: docs, memory, knowledge base + +**Knowledge ledger**: +`.agents/ledger.md` — the append-only audit the _curator_ writes, one never-pruned line per processed retro item (`date | session-id | role | failure-class | verdict`). Makes recurrence provable: an item rejected before and seen again has proven itself worth promoting. +_Avoid_: changelog, history, log + +**Session retro**: +The single per-role, per-session retrospective the `agent-retro` skill writes (automatically, as each role's last step before idle) to the shared retro pool. A symptom report from one role's one session under a keyhole view — its proposed fixes are hints, never findings. The shared input consumed independently by both the _curator_ and _retro-triage_; neither destroys what the other has not yet seen. +_Avoid_: retrospective, session log, postmortem + +**Retro-triage**: +The operator-invoked analysis (the `retro-triage` skill) that turns a *batch* of _session retros_ into a validated, cross-session **root-cause diagnosis** from which a human files issues. Distinct from the _curator_: the curator is autonomous and per-item and fixes "the swarm doesn't *know* X" by promoting agent-facing knowledge into the repo; retro-triage is manual and cross-batch and fixes "the swarm is *structurally doing* X wrong" by surfacing causes no single retro names (and that the per-item curator structurally cannot see) for pipeline/tooling/strategy changes a human must make. Diagnosis is the product, validated against transcripts and git artifacts — not the retros' own framing. +_Avoid_: triage, consolidation, retro processing diff --git a/README.md b/README.md index 3072f1f..3d27061 100644 --- a/README.md +++ b/README.md @@ -67,22 +67,38 @@ In the directory where you want to use SwarmForge, choose a runnable branch and ```sh BRANCH=four-pack -curl -L "https://github.com/unclebob/swarm-forge/archive/refs/heads/${BRANCH}.tar.gz" | tar -xz --strip-components=1 +curl -L "https://github.com/gabadi/swarm-forge/archive/refs/heads/${BRANCH}.tar.gz" | tar -xz --strip-components=1 ``` Use `BRANCH=two-pack` for the quick two-agent workflow, `BRANCH=four-pack` for the compact specification workflow, or `BRANCH=six-pack` for the full six-agent workflow. Do not use `main` for this command; `main` is documentary and stores the shared operational scripts, while the runnable branches provide the configurations and prompts intended for projects. -After copying a runnable branch, start the swarm from the target project: +After copying a runnable branch, run `./swarm` once to bootstrap the scripts and install skills: ```sh ./swarm ``` -The `./swarm` wrapper keeps the runnable branch small. On first use, if `swarmforge/scripts/` is missing, it downloads the `main` branch archive, copies the shared operational scripts from `swarmforge/scripts/`, stages shared constitution articles from `swarmforge/constitution/articles/`, and then launches `swarmforge/scripts/swarmforge.sh`. Later runs reuse the existing local scripts directory instead of overwriting it. +This will print `Error: project is not swarm-ready. Run /setup-swarm first.` — that is expected. It downloads the shared operational scripts and installs the SwarmForge skills into `.claude/skills/`, making `/setup-swarm` available in Claude Code. + +Then open Claude Code in the project directory and run the one-time setup skill: + +```sh +/setup-swarm +``` + +This installs language-appropriate quality tools, writes permission allow-rules, and scaffolds `.gitignore`. You only need to run it once per project. + +Then launch the swarm: + +```sh +./swarm +``` + +The `./swarm` wrapper keeps the runnable branch small. On later runs, if `swarmforge/scripts/` already exists, it skips the download and launches immediately. The windows should open automatically. -To stop the swarm, close the first window listed in `swarmforge/swarmforge.conf`. That cleanup window shuts down the tmux sessions and closes the remaining tracked windows. +To stop the swarm, run `./swarm stop` or close the first window listed in `swarmforge/swarmforge.conf`. That cleanup window shuts down the tmux sessions and closes the remaining tracked windows. ## What SwarmForge Does @@ -224,8 +240,15 @@ Any fields after the receive mode are passed directly to the agent CLI as additi ```conf window coder copilot wt-coder --yolo window architect claude wt-arch task --dangerously-skip-permissions +window senior-dev claude wt-senior task --model claude-opus-4-8 --effort high ``` +One special token is intercepted before passthrough: + +| Token | Applies to | Effect | +|-------|-----------|--------| +| `advisor=` | claude only | writes `advisorModel` into the worktree's `.claude/settings.local.json` instead of being passed as a CLI flag | + You can define as many windows as your project needs. Each `role` maps to a corresponding prompt file at `swarmforge/roles/.prompt`, so a config containing `architect`, `coder`, `reviewer`, `research`, and `release` windows would expect: - `swarmforge/roles/architect.prompt` diff --git a/bb.edn b/bb.edn index def1fe3..1b99978 100644 --- a/bb.edn +++ b/bb.edn @@ -7,4 +7,6 @@ (require 'swarmforge.script-test) (let [{:keys [fail error]} (clojure.test/run-tests 'swarmforge.handoff-test 'swarmforge.script-test)] - (System/exit (+ fail error))))}}} + (System/exit (+ fail error))))} + fork-test {:doc "Run fork extension tests (fork.bb overrides)" + :task (shell "bb" "test/fork_runner.bb")}}} diff --git a/docs/adr/0001-permanent-fork-synced-by-merge.md b/docs/adr/0001-permanent-fork-synced-by-merge.md new file mode 100644 index 0000000..ef0d616 --- /dev/null +++ b/docs/adr/0001-permanent-fork-synced-by-merge.md @@ -0,0 +1,11 @@ +--- +status: accepted +--- + +# Permanent fork of unclebob/swarm-forge, synced by merge + +This repo is a permanent fork of `unclebob/swarm-forge` (remote `upstream`); nothing is contributed back. Upstream moves fast, so we keep current by **merging** `upstream/` into our branches — never rebasing — because the fork is published/shared and rebasing would rewrite shared history and re-surface every conflict on each sync. `git rerere` is enabled (`rerere.enabled`, `rerere.autoupdate`) so conflict resolutions replay automatically. Every divergence should be **additive** (a new file or an appended rule) and recorded as its own ADR in this directory; a non-additive edit to an upstream line is a conscious, documented cost. Two branches are maintained: `main` (shared scripts + these docs) and `six-pack` (runnable: role prompts, `swarmforge.conf`, templates). + +Because the fork can be hard-reset back to a pristine upstream commit (see the `backup/*-pre-reset` branches), the merge history that would otherwise encode the integration point is not a dependable anchor. The upstream baseline each branch's fork layer is re-applied onto is therefore recorded **explicitly**: a SHA line in `docs/fork-change-manifest.md` and an annotated `fork-base/-` tag at each sync, both surviving a reset. + +Two merge styles, by source: **fork divergences are squash-merged** — every divergence PR lands as a single commit, so the fork layer reads as one clean, revertible, re-appliable commit per divergence. **Upstream is integrated by a history-preserving merge** — never squashed and never rebased, so upstream's commit story stays intact and `rerere`-replayable. A landed commit is never rewritten afterward. diff --git a/docs/adr/0002-idle-gate-and-clear-first-delivery.md b/docs/adr/0002-idle-gate-and-clear-first-delivery.md new file mode 100644 index 0000000..ce6b8f8 --- /dev/null +++ b/docs/adr/0002-idle-gate-and-clear-first-delivery.md @@ -0,0 +1,22 @@ +--- +status: accepted +--- + +# Idle gate and clear-first delivery + +The fork uses upstream's handoff harness as-is (queue, scripts, `logbook.jsonl`); the only engine discrepancy is **delivery**. Upstream does setup work at startup and never clears context between tasks — it types each handoff straight into the terminal and lets the terminal buffer it whether the agent is working or not. The fork instead requires every role to (1) do nothing until it receives a handoff and (2) start each task from a cleared session. + +**Idle gate** — a prompt rule ("Wait for a handoff. Do not act without one.") plus removal of the startup-install directives from role prompts (install work moves to a separate setup skill). Additive prompt edits. + +**Clear-first delivery** — `/clear` clears the session for **any** agent, so it cannot be sent to a working agent. Delivery therefore must know whether the receiver is idle or busy. Upstream tracks no such state, so the fork adds a minimal per-role **idle/busy marker**. Delivery then has two cases, both running `/clear` → re-inject the role bundle → send the task message: + +- receiver **busy** — the handoff waits in upstream's queue (`.swarmforge/handoffs/queue/`); the receiver's **Stop hook** delivers it when the agent next stops. +- receiver **idle** — deliver immediately, because no stop will occur to trigger the hook. + +The marker is set *busy* when a delivery starts and *idle* when the Stop hook finds the queue empty; the hook re-checks the queue before declaring idle to close the narrow "went idle just as a sender judged it busy" race. + +**Re-injection is universal.** `/clear` wipes the session regardless of backend, so the role bundle is always re-sent after `/clear`. + +**Delivery engine.** The central `swarm-handoff` script handles queueing, busy checking, and backend-aware delivery. The engine writes every outgoing message to a pending queue, checks the busy marker, and either delivers immediately (idle) or exits and lets the Stop hook drain the queue later (busy). The delivery path is backend-aware: for `claude` roles, `/clear` + re-inject is sent before the handoff body; for other backends, the handoff is delivered directly until their own clear-first mechanism is built. + +Ready is implicit (idle + empty queue = ready). Upstream's startup "I'm awake" ping is kept only as an operator-visible **presence** signal — stamped a distinct `presence` type and excluded from the clear-first path, so the Stop hook never clears for it. diff --git a/docs/adr/0003-setup-is-a-one-time-skill.md b/docs/adr/0003-setup-is-a-one-time-skill.md new file mode 100644 index 0000000..7d4f295 --- /dev/null +++ b/docs/adr/0003-setup-is-a-one-time-skill.md @@ -0,0 +1,23 @@ +--- +status: accepted +--- + +# Setup is a one-time skill, not in-execution work + +Adapting a project to the swarm — installing the project's language quality tools (mutation, CRAP, DRY, the Acceptance Pipeline commands), enabling session tracking, and granting the permissions the agents need — lives in a **`setup-swarm` skill** that ships inside the swarm install. It is the operator's *first* action on a project (`/setup-swarm`); the run path does no project provisioning. (Installing the swarm's *own* pinned `entire` skills is launcher bootstrap, not project setup — that belongs to ADR 0018.) + +**Setup runs first; the run path only guards.** `setup-swarm` writes a **swarm-ready marker** (`.swarmforge/setup-complete`) when it finishes. `./swarm` checks that marker before launching any role and, if it is absent, refuses and tells the operator to run `setup-swarm` first — it never runs setup itself. (An earlier design had `./swarm` auto-run setup on first launch; that is superseded — setup is an explicit operator step and the launcher merely verifies it happened.) `./swarm` still fetches its own code when missing, bootstraps its own pinned skills (ADR 0018), and does per-launch plumbing (worktrees, sessions, copying constitution files); it never adapts the *project* to its stack. The operator deletes the marker to force a re-run. + +**The only edits to upstream files are four role-prompt lines.** The "At startup, install the language tools" directives in `coder`, `QA`, `cleaner`, and `hardender` are removed; that install work moves into the setup skill and runs once. ADR 0002 already removes these same lines for the idle gate (a role does nothing until handed off); here they go for a second, complementary reason — tool install is a one-time setup step, not per-task startup work. The removal is the seam between the two decisions; neither owns it alone. + +**Why a skill rather than functions added to the launch script.** A skill is a new fork-owned file, so it adds zero upstream merge-conflict surface — exactly the additive divergence ADR 0001 asks for. Adding setup functions inside `swarmforge.sh` would instead edit an upstream-tracked file, a permanent conflict point on every sync. A skill also lets setup *reason about the stack* (which tools for Go vs Java vs Clojure, which gates matter), which a deterministic script cannot. + +**Why replace rather than overlay.** Setup is an explicit one-time step; the run path stays pure "start the agents." The accepted cost is that the swarm no longer self-installs project tooling on first run — the operator runs the setup skill once before the first `./swarm`. Any setup step this moves out of the run path is named and documented so the divergence stays auditable. + +**Setup also lays down the project scaffold.** Beyond tooling, the skill writes the one-time repository scaffold the swarm assumes: a `.gitignore` covering the swarm's runtime artifacts (`logbook.jsonl`, `tmp/`, `.swarmforge/`), the project's default branch probed once (`git symbolic-ref refs/remotes/origin/HEAD`) and recorded in `swarmforge.conf`, and a small, targeted set of permission allow-rules in `.claude/settings.json` (for example `Bash(gh pr merge*)` for the integrator, `Bash(git reset --hard origin/)` for the specifier). Under autonomous permission mode (ADR 0019) those allow-rules are advisory hints rather than a load-bearing whitelist, so the set is kept deliberately small. + +## Pending implementation + +- The `setup-swarm` skill, shipped at `swarmforge/skills/setup-swarm/` (mirroring `agent-retro`): it reasons about the stack and writes the project tooling, session tracking (`entire enable …` plus `entire agent add ` per `swarmforge.conf` backend), the permission allow-rules, and the `.gitignore`/default-branch scaffold, then writes the marker. *How* it detects the stack is the skill's own domain — deliberately not prescribed here, since reasoning about the stack is the whole reason setup is a skill and not a script. +- `main`: `./swarm` checks `.swarmforge/setup-complete` before launching roles and refuses (with a message to run `setup-swarm`) if it is absent. +- The `entire` skill install is **not** part of this skill — it is launcher bootstrap (ADR 0018). diff --git a/docs/adr/0004-rework-routes-back.md b/docs/adr/0004-rework-routes-back.md new file mode 100644 index 0000000..62e9893 --- /dev/null +++ b/docs/adr/0004-rework-routes-back.md @@ -0,0 +1,21 @@ +--- +status: accepted +--- + +# Rework routes back to its cause + +Upstream fixes a problem wherever it is found — the QA role's prompt says plainly "fix bugs found by the QA suite." That keeps the line moving but lets fixes pile up downstream of the stage that caused them, and the responsible stage never learns it did its job wrong. The fork instead sends the work **back to the stage whose decision it exposes as flawed**, so the fix lands at the cause. + +The trigger is not only a defect. Any finding that an earlier stage's work must change routes back — a failing behavior (a bug), a refactor blocked because the structure rests on a bad earlier decision, or a design/spec revision surfaced when a later stage tries to hold a behavior the specification can't carry. A defect is the most obvious case, not the only one. + +**Only structural rework routes back.** It routes back when resolving it means re-opening an earlier stage's job — an ambiguous or missing specification, a weak or missing acceptance test, a design that can't hold the behavior. The stage that owns that work gets it back and corrects the root cause. **Local** work — anything the finder can resolve without re-opening an earlier stage's decision — stays with the finder. Routing a contained, local change backward only adds a round trip and teaches no one. + +**Two caps, at two scopes.** A *single finding* routes back to its cause **at most once**: if it returns still unresolved, the finder resolves it in place and flags it, so two stages never volley the same item. Independently, a *feature* tolerates **at most three back-route cycles total** (depth cap N=3), tracked by a routing count carried in the handoff trail; after the third the routing role stops and asks the user rather than looping. The first cap stops ping-pong on one issue; the second stops a feature from churning through endless distinct bounces. (The role prompts — ux-engineer, integrator — carry the N=3 feature-level cap.) + +## Pending implementation + +- How a finding is attributed to an origin stage (the line must be able to trace it back to the spec, test, or design that owns it). + +## Implementation notes + +- Back-routing always uses `git_handoff` with the sender's current branch HEAD as the commit, even when the sender authored no functional lines. Two distinct rules (forwarding vs. back-routing) replace the old single "no functional change" block in `swarmforge/constitution/articles/handoffs.prompt`. diff --git a/docs/adr/0005-qa-refutes-not-confirms.md b/docs/adr/0005-qa-refutes-not-confirms.md new file mode 100644 index 0000000..cc5ab72 --- /dev/null +++ b/docs/adr/0005-qa-refutes-not-confirms.md @@ -0,0 +1,19 @@ +--- +status: accepted +--- + +# QA refutes rather than confirms + +Upstream QA verifies that the accepted specification is met and fixes what fails — a *confirm* posture. It converts the specifier's written QA procedures into executable scripts and runs them through the real user interface. The fork flips the posture: QA assumes the build does **not** meet the spec and the acceptance tests are too weak to notice, until it proves otherwise. Its job is to make the claim "this meets the spec and the tests prove it" *fail*. + +**Refute against the spec, not beyond it.** QA attacks the specified contract — it hunts specified-but-untested behavior, proves the acceptance tests too weak to catch a real violation, and throws inputs designed to break the specified behavior. It does **not** invent new requirements. A genuinely unspecified gap it stumbles on is not a QA pass/fail; it is a finding that routes back to the specifier. This keeps QA adversarial but bounded, so it never blocks the line on behavior no one agreed to. + +**Conversion fidelity.** When QA turns the specifier's written procedures into executable scripts, the script must encode the procedure's full intent — not a weakened version that passes. QA refutes its *own* conversion. This is the highest-leverage guard in the line because the QA end-to-end suite is the one suite the hardener's mutation testing explicitly does not cover: a weak conversion ("test theater" — a green test that asserts nothing real) that hides there is caught by nothing else. + +**Findings route back; QA owns the attack, not the routing.** A structural weakness QA surfaces routes back to its cause (a weak acceptance test or an ambiguous spec → the specifier); a local defect QA fixes in place — per the back-routing decision. Refuting QA is the engine that *generates* structural findings; it needs no routing rule of its own. + +## Pending implementation + +- Prompt change on `six-pack`. +- Conversion fidelity is made auditable by the surface-harness conversion rule (ADR 0010): every Expected bullet maps to a harness assertion or is marked `NOT AUTOMATED — `, so a dropped bullet is visible rather than taken on QA's word. +- Whether QA's converted end-to-end suite should itself be mutation-tested (the hardener currently ignores it) — the objective way to detect a theatrical conversion rather than relying on QA's self-judgment. diff --git a/docs/adr/0006-harness-enforced-holdout.md b/docs/adr/0006-harness-enforced-holdout.md new file mode 100644 index 0000000..e6ff310 --- /dev/null +++ b/docs/adr/0006-harness-enforced-holdout.md @@ -0,0 +1,23 @@ +--- +status: accepted +--- + +# Harness-enforced holdout of the QA suite + +Upstream holds the end-to-end QA suite back from the coder by prompt instruction alone: the coder's prompt says "ignore the specifier's end-to-end QA suite," but the files sit in the coder's own worktree (every worktree is `git worktree add -B … HEAD`, a full checkout of the commit the specifier wrote the suite into). The wall is honor-system. The fork makes it **mechanical**: the QA suite is physically absent from the worktree of every role that shapes the implementation, so "ignore it" becomes "cannot reach it." + +**Why mechanical, not instructional.** The verification-loop reference is explicit that the scenario suite is a *holdout* — "never visible to the code generation agent" — and names the failure mode directly: "holdout leakage … must be enforced architecturally (filesystem isolation, separate repos, access controls)," not by a prompt. A holdout the implementer can read is a holdout the implementer can quietly fit to; the suite then stops being a blind test and QA running it proves nothing. This is the prevention layer that the detection layers (mutation testing + refuting QA, ADR 0005) cannot supply: detection catches a gamed suite after the fact; the wall stops the gaming. + +**Mechanism: `git sparse-checkout`, not file deletion.** The worktree-prep step the harness already runs sets a sparse-checkout on each role worktree that excludes the QA-suite path. Sparse-checkout makes the file *absent from disk but still tracked in the commit* — so the role cannot read it, yet its commit cannot accidentally drop it downstream. Naive deletion (`rm` from the worktree) was rejected for exactly this reason: the role commits with `git add`, the deletion gets staged, and the suite vanishes for QA. A separate QA-only branch was rejected as more flow change for no extra protection. + +**Scope: hide from implementers, keep for author and verifier.** The exclusion applies to every worktree *except* the **specifier's** (it authors the suite) and **QA's** (it runs the suite — it is the verifier). Key the exclusion on the specifier *role*, not a fixed worktree name: it is the `master` worktree on upstream today, but ADR 0008 moves the specifier to its own `specifier` worktree, and this rule must follow it. Coder, UX Engineer, cleaner, architect, and hardener all touch the implementation before QA and so are walled. The integrator never touches implementation; its worktree is irrelevant either way. + +**Precondition: a fixed QA-suite path.** For the harness to exclude the suite it must live at a deterministic path; the specifier writes the end-to-end QA suite under a pinned location (e.g. `qa/`). This is the only added convention. The existing coder-prompt "ignore it" line stays as defense-in-depth. + +**Scope boundary: only the end-to-end QA suite.** The Gherkin acceptance tests and the acceptance pipeline stay fully visible — the coder builds and runs them. The holdout is the specifier's end-to-end QA suite alone. + +## Pending implementation + +- Add the sparse-checkout exclusion to the worktree-prep step (`six-pack`/scripts), keyed to skip the specifier's worktree (whatever its name — `master` today, `specifier` once ADR 0008 lands) and QA's. +- Pin the end-to-end QA-suite path in the specifier prompt. +- Confirm sparse-checkout interacts cleanly with the coder→cleaner→…→QA handoff commits (the excluded path must survive each role's commit untouched). diff --git a/docs/adr/0007-ux-engineer-role.md b/docs/adr/0007-ux-engineer-role.md new file mode 100644 index 0000000..1c8d1bf --- /dev/null +++ b/docs/adr/0007-ux-engineer-role.md @@ -0,0 +1,25 @@ +--- +status: accepted +--- + +# UX Engineer role and UX Intent + +Upstream has no UX role — nothing in the line owns whether the product is *usable*, only whether it is correct. The fork adds a **UX Engineer** (six-pack only) that runs the built product and **fixes** visual and usability mismatches in rendering code, leaving a regression check behind. It is an engineer, not a flag-only reviewer: the fork's pattern is that every stage fixes in place and leaves a durable artifact, so a report-only role is the anti-pattern it rejects. + +**It checks against UX Intent.** The specifier authors a **UX Intent** section inline in the feature file — concrete, observable statements of what the feature should look and feel like. UX Intent is part of the swarm and travels with the feature. A feature with no UX Intent is the signal to skip: the UX Engineer passes straight through to the next stage, the same "no work, no handoff" pattern used elsewhere. + +**Optional design inputs are referenced, not owned.** When a project supplies design artifacts — a DESIGN.md (visual system), an EXPERIENCE.md (interaction and feel), mockups (concrete visual targets) — the specifier **references** them from the feature file, and the UX Engineer consults them alongside UX Intent. These are optional project inputs; the swarm neither defines, scaffolds, nor requires them, and does not walk the directory tree to discover them — the only link is an explicit reference from the feature file. + +**It also enforces a universal visual-quality bar, independent of any project input.** Beyond UX Intent and any DESIGN.md, the role applies a fixed standard the prompt enumerates: AI-aesthetic anti-patterns (unjustified default purple/indigo, gradient noise, uniformly maximal rounding, oversized equal padding, shadow-heavy chrome, missing loading/error/empty states), type-hierarchy rules (primary content must dominate; no skipped heading levels), and colour rules including **WCAG contrast minimums (4.5:1 normal, 3:1 large) and "colour is never the sole state indicator."** These hold even when a feature has no UX Intent and the project has no DESIGN.md — they are the floor, not project preferences. + +**It leaves a durable, re-runnable artifact.** "Fixes, leaves a regression check behind" is concrete: per verified flow the role commits re-runnable scenarios to `observation-harness/` using the project's surface tool (ADR 0010), plus golden-file snapshots per state and rendering invariants for structural properties. These are the permanent regression record and must pass against the committed code — and QA re-executes them downstream (ADR 0010), routing back here if a user-facing surface has none. + +**Framework-agnostic.** The role defines the *class* of check — the running product matches its stated UX — and leaves the specific visual-testing tool to the project's constitution. No terminal-UI assumptions live in the role. + +**Placement and routing.** The UX Engineer sits immediately after the coder, so the downstream roles (cleaner, architect, hardener, QA) see implementation and rendering code together in one pass rather than running twice. When a mismatch cannot be fixed in rendering alone and needs a model-state change, it routes back to the coder — using the back-routing rule already decided (`0004`), not a separate mechanism. The back-route message carries what UX Intent says, what the implementation does, what must change, and the current routing count; the role observes the N=3 feature-level cap (`0004`) and stops to ask the user after the third cycle. + +## Pending implementation + +- Six-pack only: new `ux-engineer` role prompt; UX Intent authoring in the specifier and the feature template; coder reads UX Intent; `swarmforge.conf` adds the window after the coder. +- Routing follows `0004`; durable artifact (`observation-harness/`, snapshots, rendering invariants) follows `0010`. +- DESIGN.md is referenced from the feature file only — the specifier does not scaffold it and the ux-engineer does not walk the tree to find it. diff --git a/docs/adr/0008-integrator-role.md b/docs/adr/0008-integrator-role.md new file mode 100644 index 0000000..2a83ae2 --- /dev/null +++ b/docs/adr/0008-integrator-role.md @@ -0,0 +1,24 @@ +--- +status: accepted +--- + +# Integrator role lands work behind a CI gate + +Upstream has no integrator: when QA signals done, the **specifier** merges the work ad hoc (a local `git merge`) and asks for the next feature. There is no gate between "QA passed" and "landed on the main branch." The fork adds a dedicated **integrator** as the terminal stage of the line that owns *landing* the work — and nothing lands except through a green CI gate. + +**Landing is PR + CI, with no fallback.** From the QA-approved commit the integrator opens a pull request, watches CI, and merges only when CI is green; then it runs a **post-merge gate** — it watches the resulting main-branch CI run and, if the project defines a full verification suite, runs that on green too — before handing off. It never merges locally — a local merge is exactly what the specifier already did, so the integrator's whole value is that the main branch only ever receives green-CI'd work. **CI is therefore a hard precondition, not optional:** a project without CI is not swarm-ready, and ensuring CI is in place belongs to project setup (`0003`). + +**It hands off to the curator.** The integrator is the last *code* stage, but not the last stage: on a green landing it notifies the **curator** (ADR 0013), which promotes the run's retro knowledge and only then releases the specifier for the next feature. + +**One PR per feature.** Rework updates the same PR; a second PR is never opened for the same feature. + +**Failure routing reuses back-routing.** A CI failure routes to the role that owns it — a failing test to the coder, a failing cleanliness gate to the cleaner, a failing architecture check to the architect; a trivially autofixable failure (lint/format) the integrator fixes in place on the PR branch and re-runs. This is the back-routing rule already decided (`0004`) with the integrator as the finder, capped at N=3 (`0004`): it tracks the cycle depth by counting its own failure comments on the PR, and after three it posts a final `FAILED: depth cap reached` comment and stops rather than looping. The post-merge gate's CI-red is routed the same way as pre-merge. + +**The specifier stops merging.** Merging moves entirely to the integrator, so the specifier no longer needs the main checkout — it moves from the `master` worktree to its own worktree and starts each feature from a clean reset to the default branch. + +## Pending implementation + +- Runnable branch (`six-pack`): new terminal `integrator` role; `swarmforge.conf` window; specifier worktree change and removal of its merge step. (four-pack is frozen per ADR 0001 / the change manifest.) +- The PR/CI mechanism (platform, e.g. `gh`) named at implementation. +- CI-in-place enforced as a setup precondition (`0003`); routing per `0004`. +- Terminal handoff target is the curator (`0013`), not the specifier; autofixable lint/format is the integrator's only allowed code change. diff --git a/docs/adr/0009-feature-file-spec-header.md b/docs/adr/0009-feature-file-spec-header.md new file mode 100644 index 0000000..2262dd6 --- /dev/null +++ b/docs/adr/0009-feature-file-spec-header.md @@ -0,0 +1,20 @@ +--- +status: accepted +--- + +# Feature files open with a structured spec header + +Upstream feature files are pure Gherkin: a `Feature:` line, then scenarios. The fork prepends a **structured spec header** — a block of comment sections the specifier fills in before writing any scenario, captured in a template (`swarmforge/templates/feature.feature`) that the specifier starts every feature from. + +The header is the **spec-authoring layer** the reference verification loop puts ahead of the scenarios (its Step 1): the Gherkin scenarios are the contract *by example*, but they cannot state what is out of scope, what was assumed, what non-functional targets apply, or what side effects must be observed. The header carries exactly that — the WHAT/WHY around the examples — so those concerns are stated once, up front, where every downstream role reads them. + +**Sections (seven base):** `TRACKING` (traceability to an issue), `CONTRACT` (every input, every response shape and status, fields deliberately absent), `CONSTRAINTS` (dataset bounds, validation, exclusions), `SEQUENCING` (ordering / async dependencies, defaults `none`), `NFR` (latency, idempotency key+window, in-flight UI, error distinguishability), `SIDE EFFECTS` (public-contract changes, derived artifacts to regenerate, defaults `none`), `SCOPE` (`Does NOT:` exclusions and `ASSUMED:` assumptions). Each section pairs an `Ask:` (the questions that elicit it) with a `Format:` (how to write the answer). + +**Six-pack adds an eighth section, `UX INTENT`**, with four dimensions — Visual Composition, Information Hierarchy, Interaction Feel, State Transitions — written as concrete observable statements. Its content and semantics are owned by ADR 0007; the header is merely its home in the feature file. It is six-pack-only because the UX Engineer that consumes it is six-pack-only. + +**Address every section; do not fill every section.** `SEQUENCING`, `SIDE EFFECTS`, and (six-pack) `UX INTENT` default to `none`. `none` is a deliberate answer, not a skipped one — and for `UX INTENT`, `none` is the signal that tells the UX Engineer to pass through (ADR 0007). The sections are comments (`#`), so the Gherkin parser ignores them and the acceptance pipeline is unaffected. + +## Pending implementation + +- Template already drafted on `six-pack` (8 sections, with `UX INTENT`); land it. (four-pack is frozen per ADR 0001 / the change manifest — it keeps pure Gherkin, no header.) +- Specifier phase 1 starts from the template and addresses all header sections before scenarios. Fix the stale count in the **six-pack** specifier prompt: it says "complete all seven header sections" but the six-pack template has eight — change to "eight" (or "all"). diff --git a/docs/adr/0010-surface-harness-doctrine.md b/docs/adr/0010-surface-harness-doctrine.md new file mode 100644 index 0000000..a34c4be --- /dev/null +++ b/docs/adr/0010-surface-harness-doctrine.md @@ -0,0 +1,28 @@ +--- +status: accepted +--- + +# Live verification runs through a declared surface harness + +Two defects (a screen blink and a runaway key-repeat) once survived a 250-scenario, eight-role pipeline. The cause was structural: no gate ever drove the *running* system through its real production interface — every check ran below the surface, against functions and return values. The fork closes this with a **surface harness doctrine**: the roles that own live verification drive the running system through its actual surface, using a declared tool, and every surface carries a permanent idle baseline. + +This is the reference verification loop's execute-and-observe layer (its Steps 5–7) made concrete: build the real thing, drive it through its surface, assert on what comes out. + +**Surface tool table (in `engineering.prompt`).** Following the existing language-tool-table pattern, the constitution declares the harness tool per surface type: tmux/PTY for a TUI (`send-keys -l` for raw input at controlled timing, `capture-pane` for screen state over time), Playwright for web, an HTTP client for HTTP APIs, event-injection-at-ingress for headless services. Roles owning live verification — **QA** and the **UX Engineer** (six-pack, ADR 0007) — identify the project's surface *from the codebase* and acquire the matching tool before their first harness run, exactly as they acquire language tools. + +**No surface field in `project.prompt`.** Roles read the code to know the surface; an explicit declaration would be a meaningless placeholder until the project is customised. + +**Every surface carries a mandatory baseline scenario**, committed alongside the flow scenarios: TUI → idle stability (no input, consecutive captures identical, zero scrollback growth); web → idle page loads with no console errors; headless → a no-op event produces no state change. The baseline is what the tetris defects would have hit — they were *idle-state* failures invisible to any flow test, because flow tests only assert while the user is acting. + +**The harness scenarios are committed and re-run, not throwaway.** Per verified flow, the live-verification role commits re-runnable scenarios to a project `observation-harness/` directory using the surface tool — alongside the per-surface baseline — as a permanent regression record that must pass against the committed code (on six-pack the UX Engineer authors these, ADR 0007; it also adds golden-file snapshots per state and rendering invariants for structural properties). **QA re-executes the committed `observation-harness/` scenarios before its own final verification**, and routes back (ADR 0004) if a user-facing surface exists but has no scenarios. This is what makes the surface check durable: a defect fixed once stays fixed because its scenario re-runs every cycle. + +**QA verifies through the declared surface harness, not "the UI" (idea Q).** Upstream QA's "operate through the user interface only" was right in intent but mechanically silent — it let in-process function calls masquerade as UI verification. The fork replaces the phrase with "through the declared surface harness," and adds an auditable conversion rule: **every Expected bullet maps to a harness assertion, or is explicitly marked `NOT AUTOMATED — `.** This is the mechanism that makes the conversion-fidelity guard of ADR 0005 checkable rather than a matter of QA's word — a silently dropped bullet becomes a visible marker. Findings route back per ADR 0004. + +**The hardener pins pure rendering with property tests.** Where rendering is a pure function of state (`state → string`), the hardener writes property-based tests over that function — the structural complement to the UX Engineer's golden snapshots and rendering invariants. A snapshot pins one concrete state's output; a property pins the rule across the input space (every state renders without truncation, every cell stays within bounds), catching the rendering defects that no single captured frame happens to exhibit. + +## Pending implementation + +- Add the surface tool table + context-driven acquisition rule to `engineering.prompt` on `six-pack`. (four-pack is frozen per ADR 0001 / the change manifest; all `six-pack`-only below for the same reason.) +- Change QA's "through the UI only" to "through the declared surface harness" and add the Expected-bullet → assertion / `NOT AUTOMATED` rule in `QA.prompt` (`six-pack`). +- Require the per-surface baseline scenario to be committed with every feature's flow scenarios. +- `six-pack`: add the rendering-invariant property-test rule for pure rendering functions to `hardender.prompt`. Source: recover from `backup/six-pre-reset`. diff --git a/docs/adr/0011-dependency-fidelity-manifest.md b/docs/adr/0011-dependency-fidelity-manifest.md new file mode 100644 index 0000000..f967718 --- /dev/null +++ b/docs/adr/0011-dependency-fidelity-manifest.md @@ -0,0 +1,20 @@ +--- +status: accepted +--- + +# Dependencies are declared by fidelity tier in a manifest + +A scenario that rests on an emulated dependency the emulator does not actually implement passes green and proves nothing — the system was never exercised against the behavior the scenario claims to cover. The fork makes dependency fidelity **explicit and refusable** through a new constitution sub-file, `swarmforge/dependency-manifest.prompt`, that declares every dependency beyond the system itself by fidelity tier. This is the reference loop's Digital-Twin discipline: a twin is only trustworthy if its fidelity — and its gaps — are stated. + +**A separate constitution file, not `project.prompt`.** The manifest holds project-specific dependency data that would clutter `project.prompt`; it lives in its own file, auto-resolved by the same bundle resolver as the other constitution sub-files. It ships on `six-pack` (four-pack is frozen) and defaults to `(none)` — a project with no external dependencies declares nothing. + +**Three tiers (the system itself is always implicit).** Tier 1 — owned infrastructure run locally as the real engine (e.g. Postgres in Docker). Tier 2 — stateful, protocol-level emulation (preference order: vendor-official emulator > established third-party > a swarm-built twin only as last resort). Tier 3 — external domain the swarm does not own (third-party APIs, other teams' services), wire-level stubbed against a referenced contract. Entry format: `name: tier N; implementation; gaps: `. + +**Declared gaps are machine-readable and binding.** The specifier and QA must not write or accept scenarios that rest on a declared gap — so a known emulator limitation can never masquerade as covered behavior. Supporting rules: every harness scenario starts from a declared seed state and resets dependency state between scenarios; tier-2/3 dependencies must expose post-interaction state for assertion (the message landed in the emulator's outbox), so scenarios assert *effects*, not only the system's own surface; and a swarm-built twin must not be authored by the role that wrote the system code it emulates, and must be validated against recorded real traffic or the vendor's official SDK tests. + +**The specifier owns the manifest.** Before writing scenarios it reads the manifest; if a feature touches an external system not yet declared, it stops, proposes name/tier/implementation/gaps to the user, and waits for approval before adding the entry — tier assignment is an architectural decision the user must own, mirroring the other specifier approval gates. + +## Pending implementation + +- Add `swarmforge/dependency-manifest.prompt` (tier definitions inline, body `(none)`) on `six-pack`. (four-pack is frozen per ADR 0001 / the change manifest.) +- Add the read-manifest / propose-on-undeclared rule to `specifier.prompt` (`six-pack`); QA's refusal of gap-resting scenarios is part of refuting QA (ADR 0005). diff --git a/docs/adr/0012-per-role-model-effort-advisor.md b/docs/adr/0012-per-role-model-effort-advisor.md new file mode 100644 index 0000000..430073f --- /dev/null +++ b/docs/adr/0012-per-role-model-effort-advisor.md @@ -0,0 +1,24 @@ +--- +status: accepted +--- + +# Per-role advisor model in `swarmforge.conf` + +Different roles benefit from different advisor models for in-editor suggestions. Upstream added generic `extra-cli-args` passthrough for per-role agent flags (model, effort, etc.), so those no longer need fork-specific handling. The one remaining fork addition is `advisor=X` — there is no `--advisor` CLI flag; it must be written into the worktree's `.claude/settings.local.json`. + +**Syntax: `advisor=` as an extra token on the window line.** It is intercepted before passthrough; all other extra tokens are forwarded to the agent CLI verbatim. + +```conf +# model/effort are raw extra-args (upstream passthrough) +window specifier claude specifier task --model claude-opus-4-8 --effort xhigh advisor=claude-sonnet-4-6 +window coder claude coder task --model claude-sonnet-4-6 + +# advisor only (no model override needed) +window architect claude architect advisor=claude-sonnet-4-6 +``` + +| Token | Applies to | Effect | +|-------|-----------|--------| +| `advisor=` | claude only | writes `advisorModel` into the worktree's `.claude/settings.local.json` — there is no `--advisor` CLI flag | + +**Implementation:** `parse-config` strips `advisor=X` tokens from the trailing fields before building `extra-args`; the extracted value is stored as `:advisor` on the row. `write-worktree-settings!` in `fork.bb` writes it to settings.local.json at launch time. diff --git a/docs/adr/0013-curator-knowledge-promotion.md b/docs/adr/0013-curator-knowledge-promotion.md new file mode 100644 index 0000000..12e7908 --- /dev/null +++ b/docs/adr/0013-curator-knowledge-promotion.md @@ -0,0 +1,28 @@ +--- +status: accepted +--- + +# Curator role and the knowledge-promotion loop + +Upstream ends the line at QA: the specifier merges and asks for the next feature, and whatever the run *learned* — a wrong path taken, a convention discovered, a gate that should have existed — lives only in a session retro that no one reads again. The fork adds a terminal **curator** role, after the integrator, that turns those retros into **versioned repo knowledge** via one self-merging PR per run, then releases the specifier for the next feature. + +**Pipeline position: integrator → curator → specifier.** The integrator notifies the curator on a green landing; the curator promotes the run's knowledge and only then notifies the specifier. An empty run (no unprocessed retros) notifies the specifier immediately with no PR — the line never stalls on the curator. The curator makes no code changes; it may only write `AGENTS.md` and files under `.agents/` (ADR 0014). + +**Capture everything; discard once, at the curator.** The retro skill tags every action with a scope — `project | swarmforge | skill | ephemeral` — and captures all of them without filtering for "obviousness." The single discard gate is the curator's **non-inferable check**: could a future agent reach this fix from the error output and the files it names, with no foreknowledge? If yes, it is not worth promoting. Putting the one filter here, not at capture, means nothing is lost before a consistent judge sees it. + +**Promote to the highest rung that fits (the routing ladder).** A mechanical fix (config line, CI gate, script guard) goes to the enforcement-gate backlog — a gate beats documentation. Otherwise: `AGENTS.md` for universal invariants, `.agents/roles/.md` for one role's operational knowledge, `.agents/references/.md` for deep dives (each needs a pointer line or it never loads), `.agents/skills//` only on the second occurrence of a need, `.agents/upstream/.md` for `swarmforge`-scoped items, ledger-only for ephemeral and rejected. A learning whose fix is global routes *up* the ladder, never into `AGENTS.md`, and is discarded only when the gap is already mechanically closed. Every item is rewritten from a phenomenon ("X can fail because Y") into a rule ("every X MUST Z because Y") before it is promoted. + +**The ledger is the append-only audit.** `.agents/ledger.md` records one never-pruned line per processed item — `date | session-id | role | failure-class | verdict` — so recurrence is provable: an item rejected before and now recurring has proven itself non-trivial and is promoted rather than rejected again. + +**The curator self-merges from day one.** The knowledge PR is merged in-role with no user confirmation; the PR body (a metric line plus one verbatim bullet per promoted rule) and the ledger are the asynchronous review surface. Budgets hold the knowledge small: `AGENTS.md` ≤ 60 lines, each role file ≤ 40 — over budget, the stalest or now-inferable lines are pruned and ledgered. + +**Loop health is self-reported.** Each PR body carries running totals (`promoted | rejected | upstream | ephemeral`). Kill criterion: fewer than three promotions that survive contact with later sessions over 90 days → disable the curator window; the ledger and promoted docs stay. + +**Retros are captured automatically, from the transcript, before idle.** The loop only has something to promote because every role runs `agent-retro` as its last step before going idle — a line added to every role prompt — so a retro is produced for each role-session with no one asking. The skill reconstructs the session from its transcript rather than the role's from-memory account: it extracts via the `entire` CLI (`entire session current` → `session info --transcript`), falling back to Claude Code's `~/.claude/projects/` transcript path when `entire` is absent. Grounding the retro in the transcript is what lets the curator (and `retro-triage`, ADR 0021) judge against what actually happened, not what the role remembers happening. + +## Pending implementation + +- `six-pack` (four-pack is frozen per ADR 0001 / the change manifest): new `curator` role prompt; `swarmforge.conf` gains the curator window (last); rewire — integrator notifies the curator, specifier waits on the curator before the next feature, `workflow.prompt` documents the integrator→curator→specifier chain. +- `main`: upgrade the `agent-retro` skill — scope tag on every action, capture-first (no pre-filter), and an autonomous mode that marks actions `pending-curation` without prompting a human. +- `main`: `agent-retro` transcript capture (`entire session current` → `session info --transcript`, with the `~/.claude/projects/` fallback); add the "run `agent-retro` before going idle" line to every role prompt. Source: `feat/issue-20-a-retro-skill-upgrade`. +- Pairs with ADR 0014 (the `.agents/` contract the curator writes and the launcher injects). diff --git a/docs/adr/0014-agents-knowledge-injection.md b/docs/adr/0014-agents-knowledge-injection.md new file mode 100644 index 0000000..eea76c3 --- /dev/null +++ b/docs/adr/0014-agents-knowledge-injection.md @@ -0,0 +1,17 @@ +--- +status: accepted +--- + +# `.agents/` knowledge contract injected into every bundle + +Promoted knowledge is worthless if it never reaches the agent that needs it. Upstream bundles only the constitution and the role prompt into an agent's context, so there is no channel for project-specific, accumulated knowledge. The fork defines a versioned knowledge contract in the project repo and **injects it into every role bundle at launch**, closing the loop the curator (ADR 0013) feeds. + +**The contract lives in the project repo, under `.agents/` plus a root `AGENTS.md`.** `AGENTS.md` is the navigation map and universal invariants (≤ 60 lines); `.agents/roles/.md` is one role's operational knowledge (≤ 40 lines); `.agents/references/.md` holds deep dives reached by pointer; `.agents/skills//` holds promoted procedures; `.agents/backlog.md` is the enforcement-gate backlog; `.agents/ledger.md` is the append-only audit. All of it is written only by the curator and **versioned in the project**, not in `~/.claude` — so a fresh clone carries every promoted lesson and nothing depends on a machine's local memory. + +**Injection is automatic and role-scoped.** When the launcher builds a role's bundle it appends, when the files exist, the root `AGENTS.md` (so every role gets the universal invariants) and that role's `.agents/roles/.md` (so a role gets only its own operational knowledge). References are not injected — they load on demand when an included line points to them, which is why every reference must be pointed at from `AGENTS.md` or a role file. Missing files are silently skipped: a project that has not bootstrapped its knowledge yet launches cleanly with no knowledge blocks. + +## Pending implementation + +- `main`: extend the bundle generator (`write_agent_instruction_file` in `swarmforge.sh`) to append `AGENTS.md` and `.agents/roles/.md` from the project root when present, and add the preamble sentence telling the agent these knowledge files (and on-demand references) are included. +- Acceptance: a scratch project with an `AGENTS.md` → every generated bundle carries it; adding `.agents/roles/coder.md` → only the coder's bundle gains it; removing both → bundles generate with no knowledge blocks and no errors. +- Pairs with ADR 0013 (the curator is the only writer of this contract). diff --git a/docs/adr/0015-platform-feasibility-stop-rule.md b/docs/adr/0015-platform-feasibility-stop-rule.md new file mode 100644 index 0000000..f969ca6 --- /dev/null +++ b/docs/adr/0015-platform-feasibility-stop-rule.md @@ -0,0 +1,15 @@ +--- +status: accepted +--- + +# Platform-feasibility stop rule + +Upstream has no rule for what a role does when a spec requirement conflicts with what the platform can actually deliver. So the role improvises — it ships a silent workaround and leaves a code comment acknowledging the conflict, and behavior diverges from the spec with no one having decided that trade-off. The fork adds a constitution rule: **when a spec requirement conflicts with a real platform capability, stop and report to the user before proceeding.** + +**The workaround comment is the smell.** A comment in the code acknowledging a spec-vs-platform conflict is the signal that this rule fired and was suppressed — it is treated as a defect, not an accepted note. + +**Narrow on purpose.** This is not a general "stop when confused" rule; it fires specifically on spec-versus-platform-capability conflicts. It lives in the constitution (`workflow.prompt`), so it binds every role that reads the constitution rather than being repeated per role. + +## Pending implementation + +- `six-pack`: add the rule to `swarmforge/constitution/workflow.prompt`. (four-pack is frozen per ADR 0001 / the change manifest.) diff --git a/docs/adr/0016-boundary-logic-detection.md b/docs/adr/0016-boundary-logic-detection.md new file mode 100644 index 0000000..64ee55e --- /dev/null +++ b/docs/adr/0016-boundary-logic-detection.md @@ -0,0 +1,15 @@ +--- +status: accepted +--- + +# Boundary-logic detection + +Boundary files — environmentally-unsuitable adapter shells like TUI drivers, OS input handlers, and environment adapters — are excluded by design from every quality tool's worklist, because they can't run under test. Upstream leaves it there, so pure logic that gets embedded in a boundary file is invisible to mutation, CRAP, and coverage alike. The fork closes that hole: the cleaner (six-pack) / refactorer (four-pack) **also scans boundary files**, at a lower threshold, and extracts the logic when it finds too much. + +**A lower threshold, because boundary files should be thin.** Testable source keeps the existing 100-mutation-site split threshold; boundary files trigger at ~15–20 sites — above that, the file holds implementation, not adaptation, and the logic is extracted to a testable module before handoff. Extraction funnels that logic into the normal mutation/CRAP/coverage loops automatically, so no new test type is needed. + +**"Tested only through a stripped view" counts as untested.** A test that asserts a simplified projection of the output — ANSI-stripped text when the real output includes the escape codes and newline placement the function exists to add — does not cover the behavior. This is an explicit anti-pattern the cleaner/refactorer treats as missing coverage. + +## Pending implementation + +- `six-pack`: extend `swarmforge/roles/cleaner.prompt` to scan boundary files at the ~15–20 site threshold and add the stripped-view anti-pattern. (four-pack — whose equivalent role is `refactorer` — is frozen per ADR 0001 / the change manifest; no change there.) diff --git a/docs/adr/0017-inlined-prompt-bundle.md b/docs/adr/0017-inlined-prompt-bundle.md new file mode 100644 index 0000000..625ef57 --- /dev/null +++ b/docs/adr/0017-inlined-prompt-bundle.md @@ -0,0 +1,15 @@ +--- +status: accepted +--- + +# The agent context is one inlined, deduplicated prompt bundle + +Upstream builds a role's launch context by concatenating its constitution and role prompt, following `*.prompt` references with a simple recursive read — no deduplication and no structure, just text appended to text. The fork replaces this with a **resolved prompt bundle**: a breadth-first walk over the `*.prompt` reference graph that visits each file once (dedup by resolved path, already-visited references skipped so a cycle cannot loop), emitted as a single XML envelope `` with each source file in its own `` block. + +**The bundle is the unit of delivery, not just of launch.** Clear-first delivery (ADR 0002) wipes the session with `/clear` and then *re-injects the role bundle* before every task. That re-injection needs a single, complete, deduplicated context to re-send — which is exactly what the resolver produces. A naive recursive concatenation is fine to build once at launch but is the wrong shape to re-send reliably on every handoff. + +**It is the prerequisite for knowledge injection.** ADR 0014 appends the project's `AGENTS.md` and the role's `.agents/` file into this same envelope. There is nowhere to append them, and no well-defined boundary to append them at, until the context is a structured bundle rather than flat concatenated text. 0014 builds on top of the bundle. + +**Why an XML envelope.** Explicit `` boundaries let the agent tell its constitution from its role prompt from its promoted knowledge, instead of inferring breaks in a wall of concatenated text; and the BFS dedup keeps a cross-referenced constitution (articles, the dependency manifest) from appearing two or three times. + +This divergence is taken in its **minimal translated form**: the resolver and envelope are ported onto upstream's current tmux delivery harness. diff --git a/docs/adr/0018-swarm-pins-and-upgrades-itself.md b/docs/adr/0018-swarm-pins-and-upgrades-itself.md new file mode 100644 index 0000000..208053a --- /dev/null +++ b/docs/adr/0018-swarm-pins-and-upgrades-itself.md @@ -0,0 +1,20 @@ +--- +status: accepted +--- + +# The swarm pins and upgrades its own dependencies + +The swarm depends on an external skill set (the `entire` skills) that it installs into the target project's `.claude/skills/`. The fork makes that dependency **pinned and upgradable**: a SHA recorded in `swarmforge/scripts/install-pins.conf`, installed automatically at launch, and refreshable through an explicit `./swarm upgrade`. + +**Pinned, not floating.** `install-pins.conf` records `ENTIRE_SKILLS_SHA`; the swarm installs exactly that SHA and writes it to `.swarmforge/skills-installed`. Moving versions means bumping the pin and committing it on `main` — so two runs weeks apart install identical skills, and an upstream skill change can never alter a run mid-flight. + +**Auto-install is launcher bootstrap, not project setup.** `ensure_skills_installed` runs at launch: if the recorded sentinel matches the pin it does nothing, otherwise it (re)installs. This is the program fetching its own dependencies — the same category as `./swarm` self-fetching its scripts — and is deliberately kept separate from the two things that are *not* automatic: project provisioning (the `setup-swarm` skill, ADR 0003) and role work (the idle gate, ADR 0002). It does not contradict "roles do nothing at startup": the launcher, not any role, installs the skills, and it does so before a single role starts. Skill installation therefore lives here, not in `setup-swarm`. + +**`./swarm upgrade` refreshes the installation.** It re-pulls the scripts (from `main`) and the role prompts (from the branch recorded in `.swarmforge/source-branch`) and forces a skill reinstall. `source-branch` is written on first run so `upgrade` knows whether a checkout's prompts came from `six-pack` or `four-pack`. + +**Why the swarm needs this at all.** A tool whose job is to adapt arbitrary projects must itself be reproducible and updatable in place; without a pin, runs drift; without `upgrade`, an operator's only way to take a fix is to re-clone. + +## Pending implementation + +- `main`: `install_skills` + `ensure_skills_installed` (pin-aware, idempotent) in `swarmforge.sh`, plus the new `swarmforge/scripts/install-pins.conf`. Source: `backup/main-pre-reset` (`~L946`). +- Runnable branches (`six-pack`/`four-pack`, root `swarm` bootstrap — not `main`): the `upgrade` subcommand, `download_from_main`, `write_source_branch`, and `.swarmforge/source-branch` tracking. Source: `swarm` bootstrap commit `8994322`. diff --git a/docs/adr/0019-autonomous-permission-mode.md b/docs/adr/0019-autonomous-permission-mode.md new file mode 100644 index 0000000..28934eb --- /dev/null +++ b/docs/adr/0019-autonomous-permission-mode.md @@ -0,0 +1,17 @@ +--- +status: accepted +--- + +# Roles run unattended in autonomous permission mode + +Upstream launches the `claude` and `grok` roles with `--permission-mode acceptEdits`, which auto-approves file edits but still raises an interactive permission prompt on every bash/tool call. The fork's roles run **fully unattended** in isolated worktrees — there is no human present to answer that prompt, so for the fork the prompt is not a safety net, it is a silent hang. The fork launches with `--permission-mode auto`. + +**Why `auto` and not the other never-prompt modes.** Claude Code offers three modes that never block on a prompt: `auto`, `dontAsk`, and `bypassPermissions`. `bypassPermissions` ignores all allow/deny rules and ships no safety checks — unacceptable for worktrees that touch a real repository and the network. `dontAsk` is deterministic but runs only an explicit allow-list and denies everything else, which would mean building and maintaining an exhaustive command allow-list spanning every language and tool the swarm drives — ongoing complexity the fork chooses not to take on. `auto` keeps roles moving with near-zero configuration while retaining built-in guardrails (it still refuses force-pushes to the main branch, mass deletion, and similar high-blast-radius actions). Because `auto` is in force, the permission allow-rules that `setup-swarm` writes (ADR 0003) stay a small, targeted, advisory set rather than a load-bearing whitelist. + +**This is a real mode, deliberately verified.** `auto` is one of Claude Code's documented `--permission-mode` values — unlike the per-role advisor knob of ADR 0012, which turned out to have no CLI flag and had to be written to settings instead. The lesson there was applied here before committing to the divergence. + +The `codex` backend launches with no permission-mode flag at all, so this change touches only the `claude` and `grok` launch lines. + +## Pending implementation + +- `main`: change `--permission-mode acceptEdits` → `auto` on the `claude` and `grok` lines in `launch_role` (a one-word change on each line; reapply on every upstream sync). Source: `backup/main-pre-reset` commit `1097233`. diff --git a/docs/adr/0020-worktree-auto-compaction.md b/docs/adr/0020-worktree-auto-compaction.md new file mode 100644 index 0000000..0aa26b7 --- /dev/null +++ b/docs/adr/0020-worktree-auto-compaction.md @@ -0,0 +1,15 @@ +--- +status: accepted +--- + +# Role worktrees auto-compact before context overflow + +A swarm role can run a long, many-turn session — build, run the suite, read failures, fix, re-verify — that walks its context toward the model's window limit. Upstream leaves context management to the client's defaults. The fork provisions each role worktree so the role **compacts its own context before it overflows** rather than failing partway through a task. + +**The settings.** Each worktree's `.claude/settings.local.json` is given `autoCompactEnabled: true`, `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE: "88"` (compact at 88% of the window) and `CLAUDE_CODE_AUTO_COMPACT_WINDOW: "200000"`. The thresholds are tunable; these are the fork's current defaults, set to leave headroom ahead of a hard limit so compaction happens on the role's terms, not as a crash. + +**Why per-worktree `settings.local.json`.** The file is fork-owned and not upstream-tracked, so writing to it adds no merge-conflict surface — the additive divergence ADR 0001 asks for. It is also the same provisioning seam already used to write the per-role advisor (ADR 0012); both perform a read-modify-write into this one file, so they share a single mechanism rather than each inventing its own. + +## Pending implementation + +- `main`: write the three keys into each worktree's `.claude/settings.local.json` (a `write_worktree_permissions` step, or folded into the existing advisor writer), called from `prepare_worktrees`; share the read-modify-write with the ADR 0012 advisor writer. Source: `backup/main-pre-reset` (`write_worktree_permissions`, ~L679; commit `93f8c5d`). diff --git a/docs/adr/0021-retro-triage-skill.md b/docs/adr/0021-retro-triage-skill.md new file mode 100644 index 0000000..146bd9d --- /dev/null +++ b/docs/adr/0021-retro-triage-skill.md @@ -0,0 +1,20 @@ +--- +status: accepted +--- + +# retro-triage: operator root-cause diagnosis, distinct from the curator + +The fork keeps a `retro-triage` skill: an operator-invoked tool that turns a *batch* of session retros into a validated, cross-session **root-cause diagnosis** from which a human files issues. It lives in `.claude/skills/` (an operator tool), not `swarmforge/skills/` (the skills the swarm's own roles run). + +**Why it exists alongside the curator.** The curator (ADR 0013) already consumes session retros — but autonomously, one item at a time, to promote agent-facing knowledge into the repo. retro-triage is its complement, not a duplicate. The curator fixes *"the swarm doesn't **know** X"* — a missing rule becomes repo knowledge. retro-triage fixes *"the swarm is **structurally doing** X wrong"* — a pipeline, tooling, or strategy defect becomes a filed issue for a human to act on. The structural causes it hunts (one upstream decision surfacing as different pains across five roles) are precisely what a per-item consumer like the curator cannot see, because they live *across* retros and below any single retro's notice. + +**Diagnosis is the product, not sorting.** The skill exists to prevent two failure modes that occurred in real runs: codifying a workaround as a win (a slick technique that only exists to cope with a self-inflicted problem is evidence of cost, not a pattern), and inheriting a retro's own proposed fix (the retro reports a symptom; its suggested fix is a hypothesis, not a finding). Every root cause is recorded with re-pullable receipts — a transcript quote by session id, git output, a `file:line` — and validated against the artifacts; an unvalidated cause is not a finding and cannot be filed. + +**Why `.claude/skills/`, not `swarmforge/skills/`.** It is a human's meta-analysis tool, not a step any swarm role executes. Keeping it with the operator skills leaves the swarm's own skill set to the things the swarm itself runs (`agent-retro`, `setup-swarm`). + +**Sharing the retro pool without starvation.** Both the curator and retro-triage read `~/.claude/worklog/retros/`. They must not consume each other's unseen retros: the curator processes and archives retros to `processed/` each run (ADR 0013), and retro-triage reads the full history — live pool plus archive — while tracking its own consolidation independently of the curator's mark. Neither destroys what the other has not yet seen. + +## Pending implementation + +- `main`: restore `.claude/skills/retro-triage/SKILL.md` as-is (byte-identical across branches). Source: `feat/issue-20-a-retro-skill-upgrade`. +- Make retro-triage's retro detector glob the curator's `processed/` archive in addition to the live `~/.claude/worklog/retros/` directory, so curated retros remain visible to a later diagnosis. diff --git a/docs/adr/0022-specifier-frontier-gate.md b/docs/adr/0022-specifier-frontier-gate.md new file mode 100644 index 0000000..faaca2f --- /dev/null +++ b/docs/adr/0022-specifier-frontier-gate.md @@ -0,0 +1,17 @@ +--- +status: accepted +--- + +# Specifier gates on frontier intent, not on formal spec + +Upstream gates human review on the full formal spec — Gherkin + QA suite — before handing off to the coder. The specifier writes everything, then asks for approval. + +The fork moves the gate earlier: the specifier drafts a **frontier brief** (one-sentence intent, 2–4 prose scenarios, and explicit fog exclusions) and gets confirmation before generating any formal artifact. Everything after confirmation — `grill-with-docs`, Gherkin with headers, QA suite, handoff — runs autonomously. + +**Why:** Gherkin and its header sections are a mechanical encoding of confirmed intent, not a new decision. Reviewing the encoding adds no human judgment — only friction. The frontier brief is the decision point. The formal spec is how that decision is expressed. + +**Fog of war:** the brief names only what is knowable and decidable at the frontier. Uncertain items are listed as "not in scope" — explicitly excluded, not silently omitted — and become `Does NOT:` entries in the Gherkin SCOPE header. The specifier does not plan past the fog. + +**Contradiction rule:** during autonomous generation, the specifier may add scenarios that are natural consequences of confirmed behavior. The only trigger to surface to the user is a contradiction or mismatch — something that makes the confirmed brief inconsistent or impossible to implement as specified. + +**Handoff** is fully automatic after brief confirmation; there is no second human gate. diff --git a/docs/tool-analysis-crap-dry-mutation.md b/docs/tool-analysis-crap-dry-mutation.md new file mode 100644 index 0000000..48b39b7 --- /dev/null +++ b/docs/tool-analysis-crap-dry-mutation.md @@ -0,0 +1,211 @@ +# Code Quality Tools: CRAP / DRY / Mutation + +**Scope:** JavaScript/TypeScript, Python, Rust source targets +**Goal:** Cover all three tool families for all three stacks, matching Uncle Bob's Go/Clj/Java tools + +--- + +## Status + +| Stack | CRAP | DRY | Mutation | +|-------|------|-----|----------| +| **JS/TS** | `crap4js` v0.1.0 — **done** | `drywall` — **done** | `mutate` Rust binary — **todo** | +| **Python** | `crap4py` — **todo** | `drywall` — **done** | `mutate4py` — **todo** | +| **Rust** | `cargo-crap` — **reuse** | `drywall` — **done** | `mutate4rs` Rust binary — **todo** | + +--- + +## 1. What Exists Today + +### crap4js v0.1.0 +- **Install:** `npm install --save-dev github:gabadi/crap4js#v0.1.0` +- **Source:** `github.com/gabadi/crap4js` +- Branch coverage (BRDA) and `?.` CC exclusion are both implemented +- Distributed via GitHub releases — no npm registry + +### drywall v0.1.0 +- **Install:** `cargo install --git https://github.com/gabadi/drywall --tag v0.1.0` +- **Source:** `github.com/gabadi/drywall` +- Single Rust binary covering JS/TS (OXC), Python (tree-sitter-python), Rust (syn) +- Implements Uncle Bob's AST subtree Jaccard algorithm +- Drop-in CLI compatible with dry4go + +### cargo-crap +- Reuse as-is (pre-1.0 but functional) +- Requires lcov.info from `cargo llvm-cov --lcov` + +--- + +## 2. What to Build + +### 2.1 crap4py — Python CRAP script (~200 LOC) + +New implementation for Python source. Not a port of crap4go (which analyzes Go source) — +crap4py analyzes Python source using Python's own `ast` module. + +**Inputs:** +- Python source files (walked from a root directory) +- LCOV tracefile (from pytest-cov or coverage.py with `branch = True` in `.coveragerc`) + +**Output:** same column format as crap4go — Function, Module, CC, Cov%, CRAP — sorted worst first + +**CC decision points in Python AST:** +`If`, `IfExp` (ternary), `BoolOp` (`and`/`or`), `For`, `While`, `ExceptHandler`, each `match case` + +**Branch coverage:** reads `BRDA:` records from LCOV; `cov(m) = BRH_in_range / BRF_in_range × 100` + +### 2.2 mutate — Rust binary (JS/TS + Rust) + +One binary, two language targets. Ports mutate4go's algorithm to JS/TS and Rust. +OXC parser is already used in drywall — this reuses the same investment. + +``` +mutate --lang [flags] path/to/file +``` + +| Flag | Default | Description | +|------|---------|-------------| +| `--test-command` | (required) | Test command to run | +| `--since-last-run` | false | Differential: skip functions whose hash matches manifest | +| `--mutate-all` | false | Force full run, ignore manifest | +| `--reuse-coverage` | false | Skip coverage regeneration | +| `--lcov` | — | Path to pre-generated LCOV file | +| `--max-workers` | 1 | Parallel mutation workers | +| `--scan` | false | Count mutation sites only, no tests | +| `--verbose` | false | Log actions to stderr | + +### 2.3 mutate4py — Python mutation script + +Same algorithm as the Rust binary, for Python source. +Uses Python's `ast` module. Reads LCOV from pytest-cov / coverage.py. + +``` +mutate4py [flags] path/to/file.py +``` + +Same flags as `mutate`. + +--- + +## 3. Key Decisions + +### Why port mutation instead of reusing StrykerJS / mutmut / cargo-mutants + +All three existing tools lack the property that makes mutate4go useful in practice: +an embedded-in-source manifest. + +| Property | mutate4go | StrykerJS | mutmut | cargo-mutants | +|----------|-----------|-----------|--------|---------------| +| Manifest stored in source file | Yes | No | No | No | +| Survives repo clone | Yes | No | No | No | +| Team-shared automatically | Yes | No | No | No | +| Zero CI setup for incremental | Yes | No | No | No | +| Incremental granularity | per-function hash | per-mutant position | per-function hash | line (external diff) | + +The manifest is embedded as comments in the source file footer, committed with the code. +Any developer who pulls the repo gets differential reruns automatically. StrykerJS's +incremental JSON and mutmut's SQLite cache are external artifacts that each require +explicit CI cache configuration and provide nothing to developers working locally. + +### Why crap4py is not a port of crap4go + +crap4go analyzes **Go** source. crap4py analyzes **Python** source. They implement the +same CRAP formula but are completely separate tools using their language's native AST. +There is no Python port of crap4go to reuse — it does not exist. + +### Why one mutate binary covers JS/TS and Rust + +OXC (JS/TS parser) and `syn` (Rust parser) are both Rust crates. Building them in one +binary reuses the OXC investment already made for drywall and avoids distributing two +separate binaries. + +--- + +## 4. Mutation Algorithm + +Same for all three ports. Matches mutate4go's implementation. + +**Per run:** +1. Parse source file → walk functions → normalize each (identifiers → `_ID`, literals → `_LIT`) → hash (FNV-1a) +2. Read embedded manifest from file footer; skip functions whose hash matches +3. For changed functions: read LCOV → for each covered mutation site → apply operator → run test command → restore +4. Write updated manifest to file footer + +**Operator set** (Uncle Bob's spec): + +| Category | Mutations | +|----------|-----------| +| Arithmetic | `+` ↔ `-`, `*` → `/` | +| Comparison | `>` ↔ `>=`, `<` ↔ `<=` | +| Equality | `==` ↔ `!=` | +| Boolean | `true` ↔ `false` | +| Logical | `&&` ↔ `||` | +| Constant | `0` ↔ `1` (inline, in expressions) | +| Unary | remove `-a` → `a`, remove `!a` → `a` | +| Null | replace return value with `null` / `None` | + +**Manifest format** (identical across all three, language-native comments): + +```python +# mutate4py-manifest: version=1 +# fn:compute_score hash=a3f9c1d2 lines=5-25 tested=2026-06-21 +# fn:validate_input hash=b7e2a4f1 lines=30-48 tested=2026-06-21 +``` + +```typescript +// mutate4js-manifest: version=1 +// fn:computeScore hash=a3f9c1d2 lines=5-25 tested=2026-06-21 +``` + +--- + +## 5. Open Questions + +- Whether dry4go's 0.82 Jaccard threshold needs calibration for JS/Python codebases +- Whether `syn` or `tree-sitter-rust` is the better choice for mutate4rs normalization + (`syn` has higher semantic fidelity; `tree-sitter-rust` is consistent with the other parsers in drywall) +- Whether cargo-crap's pre-1.0 status is a blocker or acceptable for internal use + +--- + +## Appendix A — LCOV Format Reference + +LCOV tracefile format is produced by Jest, Vitest, c8, nyc, pytest-cov, coverage.py, +cargo-llvm-cov, and cargo-tarpaulin. + +| Record | Syntax | Meaning | +|--------|--------|---------| +| `SF` | `SF:` | Opens a source file section | +| `FN` | `FN:[,],` | Function declaration | +| `FNDA` | `FNDA:,` | Function execution count | +| `BRDA` | `BRDA:,,,` | Branch edge (`taken='-'` means unreachable) | +| `BRF` | `BRF:` | Total branch records | +| `BRH` | `BRH:` | Branch records with taken > 0 | +| `DA` | `DA:,` | Line execution count | +| `end_of_record` | `end_of_record` | Closes a source file section | + +CRAP uses branch coverage: `cov(m) = BRH_in_function / BRF_in_function × 100`. +Dead branches (`taken='-'`) are excluded from both numerator and denominator. + +**coverage.py requirement:** add `branch = True` to `.coveragerc` to emit BRDA records. + +--- + +## Appendix B — Uncle Bob Reference CLIs + +### crap4go +``` +crap4go [--test-command ] [--max-workers ] [path-fragment ...] +``` +Deletes stale coverage → runs test command → parses LCOV + AST → prints CRAP per function, sorted worst first. + +### dry4go +``` +dry4go [--threshold 0.82] [--min-lines 4] [--min-nodes 20] [--format text|json] [path ...] +``` + +### mutate4go +``` +mutate4go [--since-last-run] [--mutate-all] [--scan] [--test-command ] [--max-workers 1] path/to/file.go +``` +Embedded manifest in source file footer. Differential by default when manifest exists. diff --git a/swarmforge/constitution/articles/handoffs.prompt b/swarmforge/constitution/articles/handoffs.prompt index e425d04..9258b66 100644 --- a/swarmforge/constitution/articles/handoffs.prompt +++ b/swarmforge/constitution/articles/handoffs.prompt @@ -1,21 +1,7 @@ # Handoff Rules -## Startup Notification -- After reading the constitution and your role prompt at startup, send an awake - notification to the main branch agent, if any, unless you are that agent or - your role prompt says otherwise. -- Write a draft file with: - -```text -type: awake -to: -priority: 50 -``` - -- Run `swarm_handoff.sh `. -- This startup notification is only a presence signal; it does not replace any role-specific handoff rule. - ## Sending Handoffs +- Always run `swarm_handoff.sh` from your assigned worktree. Invoking it from another directory silently delivers to the wrong outbox. - Write a draft handoff file with only structured headers, then run `swarm_handoff.sh `. - Use only these message types: - `awake` @@ -41,13 +27,15 @@ type: git_handoff to: [,...] priority: NN task: -commit: <10-character-commit-abbrev> +commit: $(git rev-parse --short=10 HEAD) ``` -- Do not send or forward a `git_handoff` when the received commit produces no - functional project change. Complete the inbound task instead. +- Do not send or forward a `git_handoff` downstream when the received commit + produces no functional project change. Complete the inbound task instead. - Treat manifest-only, audit-only, generated metadata, formatting-only, and other non-functional churn as no forwardable change. +- When routing work back upstream, commit your findings file first so the commit + carries context, then send `git_handoff` with that HEAD as the commit. - Preserve the received task name when forwarding work for the same task. If the handoff starts new work, invent a short stable task name. - For `note`, write: @@ -78,7 +66,7 @@ message: current batch in helper-delivered order. - Use only the task information printed by the helper scripts. - If a tmux wake-up arrives while already working on a task, ignore it. -- When the task or batch is fully complete, run `done_with_current.sh`. +- When the task or batch is fully complete, send any outbound handoff first, then run `agent-retro`, then run `done_with_current.sh`. - `note` handoffs are tasks too; after reading or acting on a note, run `done_with_current.sh` before accepting any other handoff. - If `done_with_current.sh` prints `TASK: `, treat the printed `PAYLOAD` diff --git a/swarmforge/constitution/articles/workflow.prompt b/swarmforge/constitution/articles/workflow.prompt index b130bdd..ee61199 100644 --- a/swarmforge/constitution/articles/workflow.prompt +++ b/swarmforge/constitution/articles/workflow.prompt @@ -12,3 +12,10 @@ ## Failure Conditions - If the expected git layout or assigned worktree is missing, stop and report instead of silently working in the wrong place. + +## Task Completion +- When work is done, send any outbound handoff first, then run `agent-retro`. +- When routing back upstream, commit your findings file before sending `git_handoff` so the commit carries context. + +## Idle Gate +- Wait for a handoff. Do not act without one. diff --git a/swarmforge/scripts/fork.bb b/swarmforge/scripts/fork.bb new file mode 100644 index 0000000..ac6e3eb --- /dev/null +++ b/swarmforge/scripts/fork.bb @@ -0,0 +1,228 @@ +;; Fork-specific extensions loaded into swarmforge namespace via (load-file ...). +;; No ns declaration — evaluated in swarmforge namespace at load time. +;; Add new ADR implementations here to minimize future swarmforge.bb merge conflicts. + +(require '[cheshire.core :as json]) + +;;; ADR 0020 + 0012: Worktree settings (auto-compaction + advisor model) + +(defn write-worktree-settings! + "Write .claude/settings.local.json with auto-compaction keys and optional advisor model." + ([worktree-path] (write-worktree-settings! worktree-path "")) + ([worktree-path advisor-model] + (let [settings-dir (fs/path worktree-path ".claude") + settings-file (fs/path settings-dir "settings.local.json")] + (fs/create-dirs settings-dir) + (let [cfg (try (json/parse-string (slurp (str settings-file)) true) + (catch Exception _ {})) + cfg (-> cfg + (assoc :autoCompactEnabled true) + (assoc-in [:env :CLAUDE_AUTOCOMPACT_PCT_OVERRIDE] "88") + (assoc-in [:env :CLAUDE_CODE_AUTO_COMPACT_WINDOW] "200000")) + marker-path (str worktree-path "/.swarmforge/agent-running") + cfg (-> cfg + (assoc-in [:hooks :UserPromptSubmit] [{:hooks [{:type "command" :command (str "touch " marker-path)}]}]) + (assoc-in [:hooks :Stop] [{:hooks [{:type "command" :command (str "rm -f " marker-path)}]}])) + swarm-allow ["Bash(gh pr merge*)" "Bash(git reset --hard origin/*)"] + existing-allow (get-in cfg [:permissions :allow] []) + cfg (assoc-in cfg [:permissions :allow] + (into existing-allow (remove (set existing-allow) swarm-allow))) + cfg (if (seq advisor-model) + (assoc cfg :advisorModel advisor-model) + cfg)] + (spit (str settings-file) (json/generate-string cfg {:pretty true})))))) + +;;; ADR 0017: Inlined prompt bundle + swarm-persona skill + +(defn resolve-prompt-bundle + "Collect all .prompt files referenced transitively from constitution + role prompt." + [working-dir constitution-file roles-dir role] + (let [working-dir-str (str working-dir)] + (loop [queue [(str constitution-file) (str (fs/path roles-dir (str role ".prompt")))] + seen #{} + bundle []] + (if-let [file (first queue)] + (let [rel (str/replace-first file (str working-dir-str "/") "")] + (if (or (contains? seen rel) (not (fs/exists? (fs/path file)))) + (recur (rest queue) seen bundle) + (let [content (slurp file) + refs (->> (re-seq #"swarmforge/[A-Za-z0-9_./-]+\.prompt" content) + distinct + (map #(str working-dir-str "/" %)) + (remove #(contains? seen (str/replace-first % (str working-dir-str "/") "")))) + article-files (when (str/ends-with? file "constitution.prompt") + (let [articles-dir (fs/path (str/replace file "constitution.prompt" "constitution/articles"))] + (when (fs/exists? articles-dir) + (->> (fs/list-dir articles-dir) + (filter #(str/ends-with? (str (fs/file-name %)) ".prompt")) + (map str) + (remove #(contains? seen (str/replace-first % (str working-dir-str "/") ""))))))) + new-queue (concat (rest queue) refs article-files)] + (recur new-queue (conj seen rel) (conj bundle rel))))) + bundle)))) + +(defn write-persona-skill-file! + "Create .claude/skills/swarm-persona/SKILL.md with bundled role+constitution." + [ctx role worktree-path] + (let [working-dir (:working-dir ctx) + skill-dir (fs/path worktree-path ".claude" "skills" "swarm-persona") + skill-file (fs/path skill-dir "SKILL.md") + bundle-files (resolve-prompt-bundle working-dir (:constitution-file ctx) (:roles-dir ctx) role) + knowledge-files ["AGENTS.md" (str ".agents/roles/" role ".md")]] + (fs/create-dirs skill-dir) + (spit (str skill-file) + (str "---\n" + "name: swarm-persona\n" + "description: Load this agent's SwarmForge role, constitution, and operating instructions\n" + "---\n\n" + "\n" + "\n" + "This prompt bundle is pre-resolved. Do not open or re-read any swarmforge/*.prompt files" + " — all relevant instructions are already included below. Project knowledge files" + " (AGENTS.md and your role file under .agents/roles/) are included below when present.\n" + "\n" + (apply str + (for [rel bundle-files + :let [abs (fs/path (str working-dir) rel)] + :when (fs/exists? abs)] + (str "\n" (slurp (str abs)) "\n\n"))) + (apply str + (for [rel knowledge-files + :let [abs (fs/path (str working-dir) rel)] + :when (fs/exists? abs)] + (str "\n" (slurp (str abs)) "\n\n"))) + "\n")))) + +;; Override upstream's write-agent-instruction-file! to use swarm-persona skill pointer. +(defn write-agent-instruction-file! [role prompt-file] + (spit (str prompt-file) + (str "You are the " role " in a SwarmForge multi-agent development swarm. " + "Your full role, constitution, and operating instructions are in your swarm-persona skill.\n"))) + +;;; ADR 0006: Sparse checkout to hide QA holdout from non-QA/specifier worktrees + +(defn sparse-checkout-setup! + "Configure sparse checkout to exclude qa-holdout-path for non-QA/specifier roles." + [worktree-path qa-holdout-path role] + (when-not (#{"specifier" "QA"} role) + (process/sh {:continue true} "git" "-C" (str worktree-path) "sparse-checkout" "init" "--no-cone") + (let [git-dir-res (process/sh {:continue true} "git" "-C" (str worktree-path) "rev-parse" "--git-dir") + git-dir (str/trim (:out git-dir-res)) + git-dir-path (if (fs/absolute? (fs/path git-dir)) + (fs/path git-dir) + (fs/path worktree-path git-dir)) + sparse-file (fs/path git-dir-path "info" "sparse-checkout")] + (fs/create-dirs (fs/parent sparse-file)) + (spit (str sparse-file) (str "/*\n!/" qa-holdout-path "/\n"))) + (process/sh {:continue true} "git" "-C" (str worktree-path) "read-tree" "-mu" "HEAD"))) + +;;; ADR 0018: Skill installation + +(defn- parse-pins-file [pins-file] + (into {} (for [line (str/split-lines (slurp (str pins-file))) + :let [line (str/trim line)] + :when (and (seq line) + (not (str/starts-with? line "#")) + (str/includes? line "=")) + :let [sep (str/index-of line "=")] + :when sep] + [(str/trim (subs line 0 sep)) (str/trim (subs line (inc sep)))]))) + +(defn install-skills! + "Install local skills and pinned entire and mattpocock skills into .claude/skills/." + [ctx] + (let [pins-file (fs/path (:script-dir ctx) "install-pins.conf")] + (when (fs/exists? pins-file) + (let [pins (parse-pins-file pins-file) + entire-sha (get pins "ENTIRE_SKILLS_SHA") + mattpocock-sha (get pins "MATTPOCOCK_SKILLS_SHA") + mattpocock-include (when-let [inc (get pins "MATTPOCOCK_SKILLS_INCLUDE")] + (set (map str/trim (str/split inc #",")))) + skills-src (fs/path (:script-dir ctx) ".." "skills") + skills-dst (fs/path (:working-dir ctx) ".claude" "skills")] + (println (str cyan "Installing skills..." reset)) + (fs/create-dirs (:state-dir ctx)) + (fs/create-dirs skills-dst) + (when (fs/exists? skills-src) + (doseq [skill-dir (->> (fs/list-dir skills-src) (filter fs/directory?))] + (let [skill-name (str (fs/file-name skill-dir)) + dst (fs/path skills-dst skill-name)] + (when (fs/exists? dst) (fs/delete-tree dst)) + (fs/copy-tree skill-dir dst) + (println (str " " green "✓" reset " " skill-name " (local)"))))) + (when entire-sha + (let [tmp-dir (str (fs/create-temp-dir)) + url (str "https://github.com/entireio/skills/archive/" entire-sha ".tar.gz") + result (process/sh {:continue true} "sh" "-c" + (str "curl -fsSL " (sq url) " | tar -xz --strip-components=1 -C " (sq tmp-dir)))] + (if (zero? (:exit result)) + (do + (let [skills-extracted (fs/path tmp-dir "skills")] + (when (fs/exists? skills-extracted) + (doseq [skill-dir (->> (fs/list-dir skills-extracted) (filter fs/directory?))] + (let [skill-name (str (fs/file-name skill-dir)) + dst (fs/path skills-dst skill-name)] + (when (fs/exists? dst) (fs/delete-tree dst)) + (fs/copy-tree skill-dir dst))))) + (fs/delete-tree tmp-dir) + (println (str " " green "✓" reset " entire skills (" (subs entire-sha 0 8) ")")) + (spit (str (fs/path (:state-dir ctx) "skills-installed")) entire-sha)) + (do + (fs/delete-tree tmp-dir) + (println (str " " yellow "⚠" reset " entire skills unavailable (no network?) — proceeding without them")))))) + (when mattpocock-sha + (let [tmp-dir (str (fs/create-temp-dir)) + url (str "https://github.com/mattpocock/skills/archive/" mattpocock-sha ".tar.gz") + result (process/sh {:continue true} "sh" "-c" + (str "curl -fsSL " (sq url) " | tar -xz --strip-components=1 -C " (sq tmp-dir)))] + (if (zero? (:exit result)) + (do + (let [skills-root (fs/path tmp-dir "skills")] + (when (fs/exists? skills-root) + (doseq [subdir (->> (fs/list-dir skills-root) (filter fs/directory?)) + skill-dir (->> (fs/list-dir subdir) (filter fs/directory?)) + :let [skill-name (str (fs/file-name skill-dir))] + :when (or (nil? mattpocock-include) (contains? mattpocock-include skill-name))] + (let [dst (fs/path skills-dst skill-name)] + (when (fs/exists? dst) (fs/delete-tree dst)) + (fs/copy-tree skill-dir dst))))) + (fs/delete-tree tmp-dir) + (println (str " " green "✓" reset " mattpocock skills (" (subs mattpocock-sha 0 8) ")")) + (spit (str (fs/path (:state-dir ctx) "mattpocock-skills-installed")) mattpocock-sha)) + (do + (fs/delete-tree tmp-dir) + (println (str " " yellow "⚠" reset " mattpocock skills unavailable (no network?) — proceeding without them")))))))))) + +(defn ensure-skills-installed! + "Install skills if pins changed or first run." + [ctx] + (let [pins-file (fs/path (:script-dir ctx) "install-pins.conf")] + (when (fs/exists? pins-file) + (let [pins (parse-pins-file pins-file) + entire-sha (get pins "ENTIRE_SKILLS_SHA") + mattpocock-sha (get pins "MATTPOCOCK_SKILLS_SHA") + sentinel (fs/path (:state-dir ctx) "skills-installed") + mattpocock-sentinel (fs/path (:state-dir ctx) "mattpocock-skills-installed")] + (when-not (and (fs/exists? sentinel) + (= entire-sha (str/trim (slurp (str sentinel)))) + (or (nil? mattpocock-sha) + (and (fs/exists? mattpocock-sentinel) + (= mattpocock-sha (str/trim (slurp (str mattpocock-sentinel))))))) + (install-skills! ctx)))))) + +;;; ADR 0021: Curator skill links + +(defn link-curator-skills! + "Symlink .agents/skills/* into .claude/skills/." + [target-path] + (let [agents-skills-dir (fs/path target-path ".agents" "skills") + claude-skills-dir (fs/path target-path ".claude" "skills")] + (when (fs/exists? agents-skills-dir) + (fs/create-dirs claude-skills-dir) + (doseq [skill-dir (->> (fs/list-dir agents-skills-dir) (filter fs/directory?))] + (let [skill-name (str (fs/file-name skill-dir)) + link (fs/path claude-skills-dir skill-name)] + (when-not (fs/exists? link) + (process/sh {:continue true} "ln" "-sfn" + (str "../../.agents/skills/" skill-name) + (str link)))))))) diff --git a/swarmforge/scripts/handoffd.bb b/swarmforge/scripts/handoffd.bb index ba93225..438c799 100755 --- a/swarmforge/scripts/handoffd.bb +++ b/swarmforge/scripts/handoffd.bb @@ -92,17 +92,21 @@ ".swarmforge" "handoffs" "inbox" "new" filename)) (defn notify! [socket session] - (let [send-text (sh "tmux" "-S" socket "send-keys" "-t" session "-l" wake-message) - _ (Thread/sleep 150) - send-carriage-return (sh "tmux" "-S" socket "send-keys" "-t" session "C-m") - _ (Thread/sleep 50) - send-line-feed (sh "tmux" "-S" socket "send-keys" "-t" session "C-j")] - (when-not (zero? (:exit send-text)) - (throw (ex-info "tmux send text failed" send-text))) - (when-not (zero? (:exit send-carriage-return)) - (throw (ex-info "tmux send carriage return failed" send-carriage-return))) - (when-not (zero? (:exit send-line-feed)) - (throw (ex-info "tmux send line feed failed" send-line-feed))))) + (letfn [(send! [text] + (let [r (sh "tmux" "-S" socket "send-keys" "-t" session "-l" text)] + (when-not (zero? (:exit r)) + (throw (ex-info (str "tmux send failed: " text) r))))) + (enter! [] + (let [r (sh "tmux" "-S" socket "send-keys" "-t" session "Enter")] + (when-not (zero? (:exit r)) + (throw (ex-info "tmux send Enter failed" r)))))] + (send! "/clear") + (Thread/sleep 500) + (enter!) + (Thread/sleep 2000) + (send! (str "/swarm-persona " wake-message)) + (Thread/sleep 150) + (enter!))) (defn move-with-collision [source target-dir] (fs/create-dirs target-dir) @@ -136,8 +140,9 @@ delivered (add-delivery-headers message recipient)] (fs/create-dirs (fs/parent target)) (when-not (fs/exists? target) - (spit (str target) (render-message (:headers delivered) (:body delivered)))) - (notify! socket (:session role-info))))) + (spit (str target) (render-message (:headers delivered) (:body delivered))) + (when-not (fs/exists? (fs/path (:worktree-path role-info) ".swarmforge" "agent-running")) + (notify! socket (:session role-info))))))) (move-with-collision path (fs/path (get-in roles [sender-role :worktree-path]) ".swarmforge" "handoffs" "sent")) diff --git a/swarmforge/scripts/install-pins.conf b/swarmforge/scripts/install-pins.conf new file mode 100644 index 0000000..ff74631 --- /dev/null +++ b/swarmforge/scripts/install-pins.conf @@ -0,0 +1,9 @@ +# Pinned external dependency versions for swarm install/upgrade. +# Bump a SHA here and commit on main to pull in a newer version. + +# entireio/skills — installed to .claude/skills/ in the target project +ENTIRE_SKILLS_SHA=4c9a02513c3ec6ebabd9a9dc6bd8240854a218ac + +# mattpocock/skills — selective install; only listed skills are copied +MATTPOCOCK_SKILLS_SHA=6eeb81b5fcfeeb5bd531dd47ab2f9f2bbea27461 +MATTPOCOCK_SKILLS_INCLUDE=grill-with-docs,domain-modeling,grilling diff --git a/swarmforge/scripts/ready_for_next_batch.bb b/swarmforge/scripts/ready_for_next_batch.bb index e61e2fd..0c3203b 100755 --- a/swarmforge/scripts/ready_for_next_batch.bb +++ b/swarmforge/scripts/ready_for_next_batch.bb @@ -2,6 +2,7 @@ (ns ready-for-next-batch (:require [babashka.fs :as fs] + [clojure.java.shell :as sh] [clojure.string :as str])) (defn inbox-dir [] @@ -96,6 +97,22 @@ (println "BATCH_ITEM:" (inc index)) (print-task file)))) +(defn sync-to-trunk! [] + (let [fetch-result (sh/sh "git" "fetch" "origin")] + (when-not (zero? (:exit fetch-result)) + (binding [*out* *err*] + (println "WARNING: git fetch failed:" (str/trim (:err fetch-result)))))) + (let [branch-result (sh/sh "git" "symbolic-ref" "--short" "refs/remotes/origin/HEAD") + default-branch (when (zero? (:exit branch-result)) + (str/trim (:out branch-result)))] + (if default-branch + (let [reset-result (sh/sh "git" "reset" "--hard" default-branch)] + (when-not (zero? (:exit reset-result)) + (binding [*out* *err*] + (println "WARNING: git reset --hard" default-branch "failed:" (str/trim (:err reset-result)))))) + (binding [*out* *err*] + (println "WARNING: could not resolve default branch; skipping trunk sync"))))) + (defn fail! [status & lines] (binding [*out* *err*] (doseq [line lines] @@ -143,6 +160,7 @@ (set-header! target-file "dequeued_at" (timestamp)))) (when (empty? selected-files) (fail! 2 (str "AMBIGUOUS_TASK_STATE: no tasks selected for batch priority " batch-priority "."))) + (sync-to-trunk!) (print-batch batch-dir)))))))) (-main) diff --git a/swarmforge/scripts/ready_for_next_task.bb b/swarmforge/scripts/ready_for_next_task.bb index 3312621..b4a459a 100755 --- a/swarmforge/scripts/ready_for_next_task.bb +++ b/swarmforge/scripts/ready_for_next_task.bb @@ -2,6 +2,7 @@ (ns ready-for-next-task (:require [babashka.fs :as fs] + [clojure.java.shell :as sh] [clojure.string :as str])) (defn state-dir [] @@ -81,6 +82,22 @@ (println "PAYLOAD:") (print (body file)))) +(defn sync-to-trunk! [] + (let [fetch-result (sh/sh "git" "fetch" "origin")] + (when-not (zero? (:exit fetch-result)) + (binding [*out* *err*] + (println "WARNING: git fetch failed:" (str/trim (:err fetch-result)))))) + (let [branch-result (sh/sh "git" "symbolic-ref" "--short" "refs/remotes/origin/HEAD") + default-branch (when (zero? (:exit branch-result)) + (str/trim (:out branch-result)))] + (if default-branch + (let [reset-result (sh/sh "git" "reset" "--hard" default-branch)] + (when-not (zero? (:exit reset-result)) + (binding [*out* *err*] + (println "WARNING: git reset --hard" default-branch "failed:" (str/trim (:err reset-result)))))) + (binding [*out* *err*] + (println "WARNING: could not resolve default branch; skipping trunk sync"))))) + (defn fail! [status & lines] (binding [*out* *err*] (doseq [line lines] @@ -115,6 +132,7 @@ (fail! 2 (str "AMBIGUOUS_TASK_STATE: target in-process file already exists: " target-file))) (fs/move source-file target-file) (set-header! target-file "dequeued_at" (timestamp)) + (sync-to-trunk!) (print-task target-file)))))))) (-main) diff --git a/swarmforge/scripts/swarm_handoff.bb b/swarmforge/scripts/swarm_handoff.bb index 00dbb59..8c8fa00 100755 --- a/swarmforge/scripts/swarm_handoff.bb +++ b/swarmforge/scripts/swarm_handoff.bb @@ -78,8 +78,17 @@ role (exit! 1 "Set SWARMFORGE_ROLE."))) +(defn worktree-path-for-role [role] + (let [lines (str/split-lines (slurp (str (roles-file)))) + line (some (fn [l] + (when (= role (first (str/split l #"\t"))) l)) + lines)] + (if line + (nth (str/split line #"\t") 2) + (exit! 1 (format "Role '%s' not found in roles.tsv" role))))) + (defn state-dir [] - (fs/path (System/getProperty "user.dir") ".swarmforge" "handoffs")) + (fs/path (worktree-path-for-role (sender-role)) ".swarmforge" "handoffs")) (defn timestamp [] (.format java.time.format.DateTimeFormatter/ISO_INSTANT @@ -211,7 +220,7 @@ (cond (str/blank? commit) [nil "Missing required header 'commit' for git_handoff."] (not (re-matches #"[0-9a-fA-F]{10}" commit)) - [nil (format "Header 'commit' must be exactly 10 hexadecimal characters; got '%s'." commit)] + [nil (format "Header 'commit' must be exactly 10 hexadecimal characters; got '%s'. Run: git rev-parse --short=10 HEAD" commit)] :else (canonical-commit commit)) [nil nil]) git-errors (cond-> [] diff --git a/swarmforge/scripts/swarmforge.bb b/swarmforge/scripts/swarmforge.bb index b26322c..191fb17 100755 --- a/swarmforge/scripts/swarmforge.bb +++ b/swarmforge/scripts/swarmforge.bb @@ -14,6 +14,12 @@ (def bold "\u001b[1m") (def reset "\u001b[0m") + +;; Forward-declare symbols defined in fork.bb (loaded at runtime via load-file before -main). +(declare sparse-checkout-setup! link-curator-skills! + write-persona-skill-file! write-worktree-settings! + ensure-skills-installed!) + (defn sh [& args] (apply process/sh args)) @@ -154,6 +160,8 @@ extra-arg-tokens (if (#{"task" "batch"} (first trailing)) (rest trailing) trailing) + advisor (some #(when (str/starts-with? % "advisor=") (subs % 8)) extra-arg-tokens) + extra-arg-tokens (remove #(str/starts-with? % "advisor=") extra-arg-tokens) extra-args (when (seq extra-arg-tokens) (str/join " " extra-arg-tokens))] (when-not (= "window" keyword) @@ -182,7 +190,8 @@ :worktree-name worktree :worktree-path worktree-path :receive-mode receive-mode - :extra-args extra-args}] + :extra-args extra-args + :advisor (or advisor "")}] (recur (next lines) (conj rows row) (conj roles role) @@ -255,7 +264,8 @@ :when (not (#{"none" "master"} worktree-name))] (when-not (or (fs/exists? (fs/path worktree-path ".git")) (fs/directory? (fs/path worktree-path ".git"))) - (sh "git" "-C" (str (:working-dir ctx)) "worktree" "add" "--force" "-B" branch-name (str worktree-path) "HEAD")))) + (sh "git" "-C" (str (:working-dir ctx)) "worktree" "add" "--force" "-B" branch-name (str worktree-path) "HEAD")) + (sparse-checkout-setup! worktree-path (:qa-holdout-path ctx) (:role row)))) (defn prepare-handoff-dirs! [ctx] (doseq [row (:roles ctx) @@ -282,7 +292,8 @@ (fs/copy (:sessions-file ctx) (fs/path role-state-dir "sessions.tsv") {:replace-existing true}) (fs/copy (:roles-file ctx) (fs/path role-state-dir "roles.tsv") {:replace-existing true}) (fs/copy (:tmux-socket-file ctx) (fs/path role-state-dir "tmux-socket") {:replace-existing true}) - (fs/copy (:tmux-env-file ctx) (fs/path role-state-dir "tmux-env") {:replace-existing true})))) + (fs/copy (:tmux-env-file ctx) (fs/path role-state-dir "tmux-env") {:replace-existing true}) + (link-curator-skills! worktree-path)))) (defn check-dependency! [command] (when-not (command-exists? command) @@ -318,14 +329,16 @@ base (str "export SWARMFORGE_ROLE=" (sq role) " && export PATH=" (sq (str role-script-dir)) ":$PATH" " && cd " (sq (str role-worktree)) - " && ")] + " && ") + permission-mode (when-not (str/includes? (or (:extra-args row) "") "--permission-mode") + " --permission-mode auto")] (write-agent-instruction-file! role prompt-file) (cond-> (str base (case agent - "claude" (str "claude --append-system-prompt-file " (sq (str prompt-file)) " --permission-mode acceptEdits -n " (sq (str "SwarmForge " display)) " " (extra-args-prefix row) "\"$(cat " (sq (str prompt-file)) ")\"") + "claude" (str "claude --append-system-prompt-file " (sq (str prompt-file)) permission-mode " -n " (sq (str "SwarmForge " display)) " " (extra-args-prefix row)) "codex" (str "codex -C " (sq (str role-worktree)) " " (extra-args-prefix row) "\"$(cat " (sq (str prompt-file)) ")\"") "copilot" (str "copilot -C " (sq (str role-worktree)) " --name " (sq (str "SwarmForge " display)) " " (extra-args-prefix row) "-i \"$(cat " (sq (str prompt-file)) ")\"") - "grok" (str "grok --cwd " (sq (str role-worktree)) " --permission-mode acceptEdits " (extra-args-prefix row) "--rules \"$(cat " (sq (str prompt-file)) ")\" --verbatim \"$(cat " (sq (str prompt-file)) ")\""))) + "grok" (str "grok --cwd " (sq (str role-worktree)) permission-mode " " (extra-args-prefix row) "--rules \"$(cat " (sq (str prompt-file)) ")\" --verbatim \"$(cat " (sq (str prompt-file)) ")\""))) (= index 0) (str "; exit_code=$?; SWARMFORGE_TERMINAL_BACKEND=" (sq (:terminal-backend ctx)) " nohup " (sq (str (fs/path (:script-dir ctx) "swarm-cleanup.sh"))) @@ -339,6 +352,8 @@ display (:display-name row) prompt-file (fs/path (:prompts-dir ctx) (str (:role row) ".md")) command (launch-command ctx index row)] + (write-persona-skill-file! ctx (:role row) (:worktree-path row)) + (write-worktree-settings! (:worktree-path row) (or (:advisor row) "")) (sh "tmux" "-S" (:tmux-socket ctx) "send-keys" "-t" (tmux-agent-target display (:tmux-pane-base-index ctx) session) command "Enter") @@ -445,7 +460,8 @@ :tmux-socket-file (fs/path state-dir "tmux-socket") :tmux-env-file (fs/path state-dir "tmux-env") :tmux-window-base-index 0 - :tmux-pane-base-index 0})) + :tmux-pane-base-index 0 + :qa-holdout-path (or (System/getenv "SWARMFORGE_QA_HOLDOUT_PATH") "qa-e2e")})) (defn prepare-ctx [ctx] (-> ctx @@ -473,6 +489,9 @@ (let [ctx (prepare-ctx ctx)] (check-backend-dependencies! ctx) (prepare-workspace! ctx) + (ensure-skills-installed! ctx) + (when-not (fs/exists? (fs/path (:state-dir ctx) "setup-complete")) + (fail! (str red "Error:" reset " project is not swarm-ready. Run /setup-swarm first."))) (prepare-worktrees! ctx) (prepare-handoff-dirs! ctx) (let [ctx (assoc ctx :terminal-backend (detect-terminal-backend))] @@ -542,6 +561,9 @@ (drop 2 args)) "--test-agent-start-delay" (println (env-long "SWARMFORGE_AGENT_START_DELAY_MS" 1500)) "--test-tmux-base-indexes" (test-tmux-base-indexes! (second args)) + "start" (run-main! (System/getProperty "user.dir")) (run-main! (or (first args) (System/getProperty "user.dir"))))) +(load-file (str (fs/parent *file*) "/fork.bb")) + (apply -main *command-line-args*) diff --git a/swarmforge/skills/agent-retro/SKILL.md b/swarmforge/skills/agent-retro/SKILL.md new file mode 100644 index 0000000..58a8bd0 --- /dev/null +++ b/swarmforge/skills/agent-retro/SKILL.md @@ -0,0 +1,183 @@ +--- +name: agent-retro +description: Run a conversation retrospective — analyze what happened in this session, what worked, what didn't, and propose concrete improvements. Use when the user says "retro", "retrospective", "what happened in this session", "session review", "what did we do", "analyze this conversation", or when wrapping up a long session. Especially useful after using a skill you're developing. In swarmforge: invoked automatically as the last step before each role goes idle. +compatibility: Primary — requires `entire` CLI (0.6.2+) for transcript extraction. Fallback — Claude Code ~/.claude/projects/ path. Python 3.8+ for the extraction script. +metadata: + author: gabadi/swarm-forge (fork of giannimassi/agent-retro) + version: "0.1.0" +--- + +# agent-retro + +## Step 1 — Extract Session Data + +**Primary path (entire):** +1. Run `entire session current --json` to get the active session ID and worktree path. +2. If a session ID is returned: + - Run `entire session info --transcript > /tmp/retro-session.jsonl` + - Verify: `python3 ${CLAUDE_SKILL_DIR}/scripts/extract.py /tmp/retro-session.jsonl --metadata-only` + - If verification succeeds, run full extraction: `python3 ${CLAUDE_SKILL_DIR}/scripts/extract.py /tmp/retro-session.jsonl --summary > /tmp/retro-extract.json` + - Proceed to Step 2 with `/tmp/retro-extract.json`. + +**Fallback path (Claude Code only):** +If `entire` is not installed or `entire session current` returns no session: +1. Look for session pid files in `~/.claude/sessions/*.json`. Read each, match `cwd` to `$PWD`. Take the most recently modified matching entry. +2. If found: use the `sessionId` to find the transcript in `~/.claude/projects//.jsonl`. +3. If not found via pid: take the most recently modified `.jsonl` in `~/.claude/projects//`. +4. Verify: `python3 ${CLAUDE_SKILL_DIR}/scripts/extract.py --metadata-only` +5. Run full extraction: `python3 ${CLAUDE_SKILL_DIR}/scripts/extract.py --summary > /tmp/retro-extract.json` + +**If no transcript is found:** Report "No session transcript found" and stop. Do not fabricate data. + +Raw JSONL is 1MB+ per session — never stream transcript bytes inline into context. Always write to a temp file and pass the path to extract.py. + +--- + +## Step 2 — Read the Conversation Arc + +Read `conversation_arc` from `/tmp/retro-extract.json`. This is the full story of the session: every user message and assistant response in order. + +Identify: +- User corrections ("no, not that", "stop", "undo", "wrong") +- Redirects (user changing direction mid-task) +- Repeated instructions (same request given more than once) +- Pivots (abandoned approaches) +- Friction moments (back-and-forth on a single point) + +--- + +## Step 3 — Classify Outcomes + +Classify what the session produced: +- New code / feature +- Bug fix +- Communication (messages, comments, docs) +- Setup / configuration changes +- Spec or design artifact +- Process improvement +- Review or analysis +- Research +- Skill development + +A session may have multiple outcomes. + +--- + +## Step 4 — Analyze What Worked + +Identify: +- First-try successes (task completed without corrections) +- Efficient delegation (agents dispatched with clear scope) +- Good skill matches (right skill for the task) +- Clean conversation flow (no redirects) +- Smart tool choice (right tool, right scope) + +--- + +## Step 5 — Analyze What Didn't Work + +Identify friction patterns: +- User corrections, redirects, repetitions, stops, frustration signals +- Wasted agent dispatches (dispatched but result unused) +- Oversized tool results (large reads never referenced) +- Tool call retries (same tool called multiple times for the same target) +- Abandoned approaches (started, then discarded) +- Over-engineering (more than the task required) +- Under-specification (task started with insufficient context) + +For skill-development retros: read the active SKILL.md (`${CLAUDE_SKILL_DIR}/SKILL.md` of the skill being developed) and identify which instruction caused each friction. + +Read `tool_result_sizes` from the extract — flag any tool result over 50KB that was followed by no further reference to that file. + +--- + +## Step 6 — Propose Actions + +Lead with the defense-first question: **"What defensive rule did this session's work absorb that future maintainers must keep intact?"** Answer it before cataloging friction — rule-shaped learnings surface before cause-shaped ones. + +Capture-first guard: enumerate every candidate learning from Steps 4–5 in full before writing anything to the retro file. Do not filter for "obviousness" or "self-correcting" here — capture everything; the curation stage downstream owns discards. + +For each friction pattern, propose one of these action types: +- `skill-update` — change an existing skill. Include before/after text. +- `skill-create` — create a new skill. +- `rule-update` — change a rule or instruction in CLAUDE.md or a role prompt. +- `rule-create` — create a new rule. +- `setup-change` — change a configuration or environment setting. +- `memory-update` — update or create a memory entry. +- `investigate` — flag something for human review (uncertain root cause). +- `acknowledge` — nothing to change; note what worked well. + +Be specific. "Improve X" is not a proposal. "Change the wording in Step 3 from Y to Z" is a proposal. + +**Scope** — tag every proposed action with exactly one scope value: +- `project` — knowledge about the target project (its code, config, tools, conventions). +- `swarmforge` — knowledge about the harness itself (role prompts, constitution, scripts, pipeline mechanics). +- `skill` — a reusable procedure that should become or amend a skill. +- `ephemeral` — true one-offs; recorded for audit, never promoted. + +--- + +## Step 7 — Write the Retro File + +Write to `~/.claude/worklog/retros/YYYY-MM-DD-.md` where `` is a 3–5 word kebab-case summary of the session. + +Structure: +```markdown +# Session Retro: +Date: YYYY-MM-DD +Session ID: +Role: +Branch: +Duration: m +Cost: $ + +## Token Budget +| Category | Tokens | Cost | +|---|---|---| +| Input | N | $N | +| Output | N | $N | +| Cache create | N | $N | +| Cache read | N | $N | +| **Total** | **N** | **$N** | + +## Tool Result Waste + + +## What Worked + + +## What Didn't Work + + +## Actions +| # | Type | Scope | Description | Target | +|---|------|-------|-------------|--------| +| 1 | skill-update | project | ... | ... | +``` + +--- + +## Step 8 — Walk Through Actions + +Determine the mode: + +**Interactive session (a human is present):** +- Present the retro file path and summary counts (N worked, N didn't work, N actions). +- Walk through each proposed action one by one: show type, scope, description, target. Ask: "Apply? [y/n/defer]". Apply approved actions immediately; mark deferred/skipped in the table. +- After the walkthrough, show the final action table with statuses. + +**Autonomous session (swarmforge role, no human in the loop):** +- Do not ask anything. Do not apply any action. +- Mark every action's status as `pending-curation` in the table and finish the retro file. +- The curator role consumes the file downstream; your only job is complete, well-tagged capture. + +--- + +## Step 9 — Preemptive Handoff Recommendation + +Check `session` metadata from the extract: +- If `turn_count` > 500, `duration_seconds` > 14400 (4h), or `estimated_cost_usd` > 300: + - Add a `investigate` action: "Session size threshold reached — consider handoff" + - Include two ready-to-paste prompts: + - For `/compact`: "Continue from: " + - For `/clear`: "Resume from: — key context: <3 bullet points>" diff --git a/swarmforge/skills/agent-retro/scripts/extract.py b/swarmforge/skills/agent-retro/scripts/extract.py new file mode 100644 index 0000000..21c8794 --- /dev/null +++ b/swarmforge/skills/agent-retro/scripts/extract.py @@ -0,0 +1,630 @@ +#!/usr/bin/env python3 +""" +Extract structured data from a Claude Code session transcript (JSONL). + +Usage: + python extract.py [--subagents-dir ] [--summary] [--metadata-only] + +Outputs JSON to stdout. Use --summary for a compact version that omits +individual tool call details (just counts and key events). +Use --metadata-only for cheap session verification (head/tail read only). +""" + +import json +import sys +import os +import glob +from collections import Counter, defaultdict +from pathlib import Path +from datetime import datetime + +# Approximate pricing per million tokens, by model family. +# Update these when Anthropic changes pricing and bump PRICING_LAST_VERIFIED. +# cache_create = 1.25x input (5-minute TTL); cache_read = 0.1x input. +PRICING_LAST_VERIFIED = "2026-06-14" +PRICE_TABLE = { + "opus": {"input": 5.0, "output": 25.0, "cache_create": 6.25, "cache_read": 0.50}, + "sonnet": {"input": 3.0, "output": 15.0, "cache_create": 3.75, "cache_read": 0.30}, + "haiku": {"input": 1.0, "output": 5.0, "cache_create": 1.25, "cache_read": 0.10}, + "fable": {"input": 10.0, "output": 50.0, "cache_create": 12.5, "cache_read": 1.00}, +} +# Fall back to the most expensive family for an unknown/empty model so cost is +# never silently understated. +DEFAULT_PRICE_FAMILY = "opus" + + +def price_for_model(model): + """Map a model id/name to its pricing family. Unknown models fall back to + DEFAULT_PRICE_FAMILY.""" + m = (model or "").lower() + for family in ("haiku", "sonnet", "opus", "fable"): + if family in m: + return PRICE_TABLE[family] + return PRICE_TABLE[DEFAULT_PRICE_FAMILY] + + +def compute_cost(tokens, model): + """Cost in USD for a token-usage dict, priced for the given model.""" + p = price_for_model(model) + return ( + tokens["input_tokens"] / 1_000_000 * p["input"] + + tokens["output_tokens"] / 1_000_000 * p["output"] + + tokens["cache_creation_input_tokens"] / 1_000_000 * p["cache_create"] + + tokens["cache_read_input_tokens"] / 1_000_000 * p["cache_read"] + ) + +SCHEMA_VERSION = "0.1.0" + +# Head/tail buffer size for lite reads (matches Claude Code's LITE_READ_BUF_SIZE) +LITE_READ_BUF_SIZE = 65536 + + +def stream_jsonl(path): + """Yield parsed records one at a time without loading the full file.""" + with open(path) as f: + for line in f: + line = line.strip() + if line: + try: + yield json.loads(line) + except json.JSONDecodeError: + continue + + +def read_head_tail(path): + """Read first and last 64KB of a file. Returns (head_str, tail_str, file_size).""" + size = os.path.getsize(path) + with open(path, "rb") as f: + head_bytes = f.read(LITE_READ_BUF_SIZE) + head = head_bytes.decode("utf-8", errors="replace") + + if size <= LITE_READ_BUF_SIZE: + return head, head, size + + f.seek(max(0, size - LITE_READ_BUF_SIZE)) + tail_bytes = f.read(LITE_READ_BUF_SIZE) + tail = tail_bytes.decode("utf-8", errors="replace") + + return head, tail, size + + +def extract_json_field(text, key): + """Extract a JSON string field value without full parsing (regex-free). + Matches '"key":"value"' or '"key": "value"' patterns.""" + for pattern in [f'"{key}":"', f'"{key}": "']: + idx = text.find(pattern) + if idx < 0: + continue + start = idx + len(pattern) + i = start + while i < len(text): + if text[i] == "\\": + i += 2 + continue + if text[i] == '"': + return text[start:i] + i += 1 + return None + + +def extract_metadata_lite(path): + """Extract session metadata from head/tail only — no full parse. + Used for session verification and discovery.""" + head, tail, size = read_head_tail(path) + + # Extract from head (start of session) + session_id = extract_json_field(head, "sessionId") + cwd = extract_json_field(head, "cwd") + git_branch = extract_json_field(head, "gitBranch") + version = extract_json_field(head, "version") + start_time = extract_json_field(head, "timestamp") + + # Extract from tail (end of session). When the file exceeds the buffer the + # tail starts mid-line, so the first split element is a partial record whose + # timestamp would be wrong — drop it before scanning. Scan complete lines + # backwards for the last timestamp. + tail_lines = tail.split("\n") + if size > LITE_READ_BUF_SIZE and tail_lines: + tail_lines = tail_lines[1:] + end_time = None + for line in reversed(tail_lines): + ts = extract_json_field(line, "timestamp") + if ts: + end_time = ts + break + + # First user message for verification + first_prompt = None + for line in head.split("\n"): + if '"role":"user"' not in line and '"role": "user"' not in line: + continue + if '"tool_result"' in line: + continue + # Try to extract text content + text = extract_json_field(line, "text") + if text and not text.startswith(""): + first_prompt = text[:200] + break + + duration_seconds = None + if start_time and end_time: + start = parse_ts(start_time) + end = parse_ts(end_time) + if start and end: + duration_seconds = round((end - start).total_seconds()) + + return { + "session_id": session_id, + "cwd": cwd, + "git_branch": git_branch, + "version": version, + "start_time": start_time, + "end_time": end_time, + "duration_seconds": duration_seconds, + "file_size_bytes": size, + "first_prompt": first_prompt, + } + + +def parse_ts(ts_str): + """Parse ISO 8601 timestamp string to datetime.""" + if not ts_str: + return None + try: + return datetime.fromisoformat(ts_str.replace("Z", "+00:00")) + except (ValueError, TypeError): + return None + + +def extract_all_streaming(jsonl_path, subagents_dir=None, summary_mode=False): + """Main extraction pipeline using streaming — processes line-by-line.""" + + # Session metadata + session = { + "session_id": None, + "cwd": None, + "git_branch": None, + "version": None, + "start_time": None, + "end_time": None, + "duration_seconds": None, + "branches_seen": set(), + "model": None, + } + + # Token totals + tokens_total = { + "input_tokens": 0, + "output_tokens": 0, + "cache_creation_input_tokens": 0, + "cache_read_input_tokens": 0, + } + turn_count = 0 + + # Tool tracking + tool_calls = [] + tool_counts = Counter() + total_tool_calls = 0 + + # Tool result sizes (tool_use_id -> size in bytes) + tool_result_sizes = {} + + # Conversation arc + arc = [] + + # Git tracking + branches = set() + commits = [] + prs = [] + + # File tracking + files = defaultdict(set) + + for rec in stream_jsonl(jsonl_path): + # --- Session metadata --- + if rec.get("sessionId") and not session["session_id"]: + session["session_id"] = rec["sessionId"] + if rec.get("cwd") and not session["cwd"]: + session["cwd"] = rec["cwd"] + if rec.get("gitBranch"): + if not session["git_branch"]: + session["git_branch"] = rec["gitBranch"] + session["branches_seen"].add(rec["gitBranch"]) + branches.add(rec["gitBranch"]) + if rec.get("version") and not session["version"]: + session["version"] = rec["version"] + + ts = rec.get("timestamp") + if ts: + if not session["start_time"]: + session["start_time"] = ts + session["end_time"] = ts + + msg = rec.get("message", {}) + role = msg.get("role") + content = msg.get("content", "") + usage = msg.get("usage", {}) + + # --- Token usage (assistant messages only) --- + if usage and role == "assistant": + tokens_total["input_tokens"] += usage.get("input_tokens", 0) + tokens_total["output_tokens"] += usage.get("output_tokens", 0) + tokens_total["cache_creation_input_tokens"] += usage.get("cache_creation_input_tokens", 0) + tokens_total["cache_read_input_tokens"] += usage.get("cache_read_input_tokens", 0) + turn_count += 1 + if not session["model"] and msg.get("model"): + session["model"] = msg.get("model") + + # --- Process content blocks --- + if isinstance(content, list): + for block in content: + if not isinstance(block, dict): + continue + + block_type = block.get("type") + + # Tool use blocks (assistant calling tools) + if block_type == "tool_use": + name = block.get("name", "unknown") + tool_input = block.get("input", {}) + tool_counts[name] += 1 + total_tool_calls += 1 + + call_summary = { + "name": name, + "timestamp": ts, + "tool_use_id": block.get("id", ""), + } + + if name == "Agent": + call_summary["agent_description"] = tool_input.get("description", "") + call_summary["agent_type"] = tool_input.get("subagent_type", "") + call_summary["agent_model"] = tool_input.get("model", "") + call_summary["agent_prompt_preview"] = tool_input.get("prompt", "")[:300] + call_summary["run_in_background"] = tool_input.get("run_in_background", False) + elif name == "Skill": + call_summary["skill_name"] = tool_input.get("skill", "") + call_summary["skill_args"] = tool_input.get("args", "") + elif name == "Bash": + call_summary["command"] = tool_input.get("command", "")[:300] + elif name in ("Read", "Write", "Edit"): + call_summary["file_path"] = tool_input.get("file_path", "") + elif name in ("Grep", "Glob"): + call_summary["pattern"] = tool_input.get("pattern", "") + elif name in ("TaskCreate", "TaskUpdate", "TaskList", "TaskOutput"): + call_summary["task_detail"] = { + k: v for k, v in tool_input.items() + if k in ("description", "status", "id") + } + elif name == "AskUserQuestion": + questions = tool_input.get("questions", []) + call_summary["questions"] = [q.get("question", "") for q in questions] + elif name.startswith("mcp__"): + call_summary["mcp_inputs_preview"] = json.dumps(tool_input)[:300] + + tool_calls.append(call_summary) + + # Track files + fp = tool_input.get("file_path", "") + if fp: + if name == "Read": + files["read"].add(fp) + elif name == "Write": + files["written"].add(fp) + elif name == "Edit": + files["edited"].add(fp) + + # Track git activity from bash commands + if name == "Bash": + cmd = tool_input.get("command", "") + if "git commit" in cmd: + commits.append({"command": cmd[:200], "timestamp": ts}) + if "gh pr" in cmd: + prs.append({"command": cmd[:200], "timestamp": ts}) + + # Tool result blocks — capture SIZE only, not content + elif block_type == "tool_result": + tool_use_id = block.get("tool_use_id", "") + result_content = block.get("content", "") + if isinstance(result_content, str): + size_bytes = len(result_content.encode("utf-8", errors="replace")) + elif isinstance(result_content, list): + # Multi-block results (e.g., images + text) + size_bytes = 0 + for rb in result_content: + if isinstance(rb, dict): + text = rb.get("text", "") + if text: + size_bytes += len(text.encode("utf-8", errors="replace")) + # Image/binary blocks — estimate from base64 if present + data = rb.get("data", "") + if data: + size_bytes += len(data) + elif isinstance(rb, str): + size_bytes += len(rb.encode("utf-8", errors="replace")) + else: + size_bytes = len(json.dumps(result_content).encode("utf-8")) + + if tool_use_id: + tool_result_sizes[tool_use_id] = size_bytes + + # Text blocks — conversation arc (both assistant AND user) + elif block_type == "text": + text = block.get("text", "").strip() + if role == "assistant" and text and len(text) > 20: + arc.append({ + "role": "assistant", + "text": text[:1000], + "timestamp": ts, + }) + + # After processing all blocks in a list-format user message, + # collect text blocks into the arc + if role == "user" and isinstance(content, list): + user_text = "" + for block in content: + if isinstance(block, dict) and block.get("type") == "text": + user_text += block.get("text", "") + elif isinstance(block, str): + user_text += block + user_text = user_text.strip() + if user_text and not user_text.startswith(""): + arc.append({ + "role": "user", + "text": user_text[:2000], + "timestamp": ts, + }) + + # User messages with string content (simple format) + elif role == "user": + text = "" + if isinstance(content, str): + text = content + text = text.strip() + if text and not text.startswith(""): + arc.append({ + "role": "user", + "text": text[:2000], + "timestamp": ts, + }) + + # --- Post-processing --- + + # Compute duration + if session["start_time"] and session["end_time"]: + start = parse_ts(session["start_time"]) + end = parse_ts(session["end_time"]) + if start and end: + session["duration_seconds"] = round((end - start).total_seconds()) + session["branches_seen"] = sorted(session["branches_seen"]) + + # Compute cost (priced for the session's own model) + cost = compute_cost(tokens_total, session["model"]) + + # Attach result sizes to tool calls + for call in tool_calls: + tid = call.get("tool_use_id", "") + if tid in tool_result_sizes: + call["result_size_bytes"] = tool_result_sizes[tid] + + # Compute tool result size stats + result_size_stats = {} + if tool_result_sizes: + sizes_by_tool = defaultdict(list) + for call in tool_calls: + if "result_size_bytes" in call: + sizes_by_tool[call["name"]].append(call["result_size_bytes"]) + + for tool_name, sizes in sorted(sizes_by_tool.items(), key=lambda x: -sum(x[1])): + result_size_stats[tool_name] = { + "count": len(sizes), + "total_bytes": sum(sizes), + "avg_bytes": round(sum(sizes) / len(sizes)), + "max_bytes": max(sizes), + } + + # Extract agents + agents = _extract_agents(tool_calls, subagents_dir) + + # Extract skills + skills = [ + {"name": c.get("skill_name", ""), "args": c.get("skill_args", ""), "timestamp": c.get("timestamp")} + for c in tool_calls if c["name"] == "Skill" + ] + + # Warn if agents exist but have no cost data (subagents_dir missing) + agents_without_cost = [a for a in agents if a.get("estimated_cost_usd") is None + and a.get("description") and not a.get("description", "").startswith("[unmatched")] + if agents_without_cost: + print(f"Warning: {len(agents_without_cost)} agent dispatch(es) have no subagent cost data. " + f"Pass --subagents-dir to attribute subagent costs.", + file=sys.stderr) + + # Build result + result = { + "schema_version": SCHEMA_VERSION, + "session": session, + "tokens": { + "total": tokens_total, + "turn_count": turn_count, + "estimated_cost_usd": round(cost, 4), + }, + "agents": agents, + "skills": skills, + "git": { + "branches": sorted(branches), + "commits": commits, + "pr_operations": prs, + }, + "files": {k: sorted(v) for k, v in files.items()}, + "conversation_arc": arc, + "tool_result_sizes": result_size_stats, + } + + if summary_mode: + result["tools"] = { + "counts": dict(tool_counts.most_common()), + "total_calls": total_tool_calls, + } + else: + result["tools"] = { + "calls": tool_calls, + "counts": dict(tool_counts.most_common()), + "total_calls": total_tool_calls, + } + + return result + + +def _extract_agents(tool_calls, subagents_dir=None): + """Extract agent dispatch details and match with subagent JSONL files.""" + agents = [] + for call in tool_calls: + if call["name"] == "Agent": + agent = { + "description": call.get("agent_description", ""), + "type": call.get("agent_type", "") or "general-purpose", + "model": call.get("agent_model", "") or "inherited", + "prompt_preview": call.get("agent_prompt_preview", ""), + "background": call.get("run_in_background", False), + "timestamp": call.get("timestamp"), + "tool_use_id": call.get("tool_use_id", ""), + "tokens": None, + "estimated_cost_usd": None, + } + if "result_size_bytes" in call: + agent["result_size_bytes"] = call["result_size_bytes"] + agents.append(agent) + + if subagents_dir and os.path.isdir(subagents_dir): + _match_subagent_files(agents, subagents_dir) + + return agents + + +def _match_subagent_files(agents, subagents_dir): + """Match subagent JSONL files to dispatches using timestamp proximity.""" + subagent_files = sorted(glob.glob(os.path.join(subagents_dir, "*.jsonl"))) + MAX_MATCH_WINDOW_S = 60 + + subagent_info = [] + for sa_file in subagent_files: + sa_tokens = {"input_tokens": 0, "output_tokens": 0, + "cache_creation_input_tokens": 0, "cache_read_input_tokens": 0} + sa_start = None + sa_model = None + turn_count = 0 + + for rec in stream_jsonl(sa_file): + msg = rec.get("message", {}) + usage = msg.get("usage", {}) + if usage and msg.get("role") == "assistant": + sa_tokens["input_tokens"] += usage.get("input_tokens", 0) + sa_tokens["output_tokens"] += usage.get("output_tokens", 0) + sa_tokens["cache_creation_input_tokens"] += usage.get("cache_creation_input_tokens", 0) + sa_tokens["cache_read_input_tokens"] += usage.get("cache_read_input_tokens", 0) + turn_count += 1 + if sa_model is None and msg.get("model"): + sa_model = msg.get("model") + if sa_start is None and "timestamp" in rec: + sa_start = parse_ts(rec["timestamp"]) + + # Price each subagent for the model it actually ran on. + sa_cost = compute_cost(sa_tokens, sa_model) + + # Load meta file if present + meta = None + meta_file = sa_file.replace(".jsonl", ".meta.json") + if os.path.exists(meta_file): + with open(meta_file) as f: + meta = json.load(f) + + subagent_info.append({ + "file": os.path.basename(sa_file), + "tokens": sa_tokens, + "cost": round(sa_cost, 4), + "start_time": sa_start, + "model": sa_model, + "meta": meta, + }) + + # Match by timestamp proximity + matched_dispatches = set() + matched_subagents = set() + + for sa_idx, sa in enumerate(subagent_info): + if not sa["start_time"]: + continue + best_match = None + best_delta = None + + for ag_idx, agent in enumerate(agents): + if ag_idx in matched_dispatches: + continue + dispatch_time = parse_ts(agent["timestamp"]) + if not dispatch_time: + continue + delta = abs((sa["start_time"] - dispatch_time).total_seconds()) + if delta <= MAX_MATCH_WINDOW_S and (best_delta is None or delta < best_delta): + best_match = ag_idx + best_delta = delta + + if best_match is not None: + agents[best_match]["tokens"] = sa["tokens"] + agents[best_match]["estimated_cost_usd"] = sa["cost"] + if sa["model"]: + agents[best_match]["model"] = sa["model"] + agents[best_match]["subagent_file"] = sa["file"] + agents[best_match]["match_delta_s"] = round(best_delta, 1) + if sa["meta"]: + agents[best_match]["meta"] = sa["meta"] + matched_dispatches.add(best_match) + matched_subagents.add(sa_idx) + + # Report unmatched subagents + for sa_idx, sa in enumerate(subagent_info): + if sa_idx not in matched_subagents: + agents.append({ + "description": f"[unmatched subagent: {sa['file']}]", + "type": "unknown", + "model": sa["model"] or "unknown", + "prompt_preview": "", + "background": False, + "timestamp": str(sa["start_time"]) if sa["start_time"] else None, + "tool_use_id": "", + "tokens": sa["tokens"], + "estimated_cost_usd": sa["cost"], + "subagent_file": sa["file"], + "match_confidence": "unmatched", + "meta": sa["meta"], + }) + + +if __name__ == "__main__": + if len(sys.argv) < 2: + print("Usage: python extract.py [--subagents-dir ] [--summary] [--metadata-only]") + sys.exit(1) + + jsonl_path = sys.argv[1] + subagents_dir = None + summary_mode = "--summary" in sys.argv + metadata_only = "--metadata-only" in sys.argv + + if metadata_only: + result = extract_metadata_lite(jsonl_path) + print(json.dumps(result, indent=2, default=str)) + sys.exit(0) + + if "--subagents-dir" in sys.argv: + idx = sys.argv.index("--subagents-dir") + if idx + 1 < len(sys.argv): + subagents_dir = sys.argv[idx + 1] + else: + # Auto-detect: look for sibling directory with same name as the JSONL + stem = Path(jsonl_path).stem + candidate = Path(jsonl_path).parent / stem / "subagents" + if candidate.is_dir(): + subagents_dir = str(candidate) + + result = extract_all_streaming(jsonl_path, subagents_dir, summary_mode) + print(json.dumps(result, indent=2, default=str)) diff --git a/swarmforge/skills/setup-swarm/SKILL.md b/swarmforge/skills/setup-swarm/SKILL.md new file mode 100644 index 0000000..4c1b460 --- /dev/null +++ b/swarmforge/skills/setup-swarm/SKILL.md @@ -0,0 +1,127 @@ +--- +name: setup-swarm +description: One-time project setup for SwarmForge. Run before the first `./swarm` launch. Installs language-appropriate quality tools, wires session tracking, writes permission allow-rules, and scaffolds .gitignore. Triggers on "setup swarm", "setup the swarm", "/setup-swarm", "first time setup", or "prepare project for swarm". +compatibility: Requires git, Python 3. Optional but recommended: entire CLI (0.6.2+) for session tracking. +metadata: + author: gabadi/swarm-forge + version: "0.1.0" +--- + +# setup-swarm + +Run this skill **once** before invoking `./swarm`. It prepares the project so the swarm can operate without interruption. If you need to re-run setup, delete `.swarmforge/setup-complete` first. + +--- + +## Step 1 — Ask the operator for the project stack + +Read `swarmforge/constitution/articles/engineering.prompt` and extract the stacks listed under "Language tool table". Present only those stacks as numbered options — do not offer stacks that are not in that table. + +Ask the operator: + +> Which stack is this project? +> (list the stacks found in engineering.prompt, numbered) + +Wait for the operator's answer before proceeding. Do not infer or detect the stack from the repository. + +Once the operator answers, stamp the chosen language into the local engineering article so all agents know the project language. Append to `swarmforge/constitution/articles/local-engineering.prompt`: +```bash +printf '\n## Project Language\n- Project language: .\n' >> swarmforge/constitution/articles/local-engineering.prompt +``` +Where `` is the language name exactly as it appears in the engineering.prompt tool table entry. + +--- + +## Step 2 — Install quality tools + +Read the "Language tool table" section of `swarmforge/constitution/articles/engineering.prompt`. For the chosen stack, install the mutation, CRAP, and DRY tools listed there — use the exact repositories and install method specified in that table. + +Also install the Acceptance Pipeline Specification (APS) tools: +``` +git clone https://github.com/unclebob/Acceptance-Pipeline-Specification /tmp/aps-build +cd /tmp/aps-build && go build -o gherkin-parser ./cmd/gherkin-parser && go build -o gherkin-mutator ./cmd/gherkin-mutator +cp gherkin-parser gherkin-mutator /usr/local/bin/ 2>/dev/null || cp gherkin-parser gherkin-mutator ~/.local/bin/ +``` +Warn and continue if the build fails (APS tools are quality-of-life, not blocking). + +--- + +## Step 3 — Session tracking with entire + +```bash +entire enable --no-github --telemetry=false +``` + +Then, for each unique backend listed in `swarmforge/swarmforge.conf` column 3 (e.g. `claude`, `codex`, `copilot`, `grok`): +```bash +entire agent add +``` + +If `entire` is not installed: print a warning ("entire not found — session tracking skipped") and continue. Setup never blocks on this. + +--- + +## Step 4 — Permission allow-rules + +Write minimal allow-rules to `.claude/settings.json` so the integrator and specifier can run their necessary git/gh commands unattended. Read the current file first (create `{}` if absent), merge in these two rules, and write it back: + +```json +{ + "permissions": { + "allow": [ + "Bash(gh pr merge*)", + "Bash(git reset --hard*)" + ] + } +} +``` + +Use Python to merge (preserve any existing `allow` entries): +```python +import json, pathlib +p = pathlib.Path('.claude/settings.json') +cfg = json.loads(p.read_text()) if p.exists() else {} +cfg.setdefault('permissions', {}).setdefault('allow', []) +for rule in ['Bash(gh pr merge*)', 'Bash(git reset --hard*)']: + if rule not in cfg['permissions']['allow']: + cfg['permissions']['allow'].append(rule) +p.parent.mkdir(exist_ok=True) +p.write_text(json.dumps(cfg, indent=2)) +``` + +--- + +## Step 5 — Scaffold .gitignore and probe default branch + +Ensure these entries exist in `.gitignore` (append if missing, do not duplicate): +``` +.swarmforge/ +.worktrees/ +tmp/ +.claude/skills/swarm-persona/ +``` + +Probe the repository's default remote branch: +```bash +git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's|refs/remotes/origin/||' +``` + +If this resolves to a branch name (e.g. `main`, `master`), record it: +```bash +mkdir -p .swarmforge +echo "" > .swarmforge/default-branch +``` +This file lets the specifier reset to origin's default branch without hard-coding the name. + +--- + +## Step 6 — Emit the swarm-ready marker + +```bash +mkdir -p .swarmforge +printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$(git rev-parse HEAD 2>/dev/null || echo 'no-git')" > .swarmforge/setup-complete +``` + +Print: `SwarmForge setup complete. Run ./swarm to start the session.` + +The marker's presence is the signal to `./swarm` that the project is ready. If you need to re-run setup, delete this file. diff --git a/test/fork_runner.bb b/test/fork_runner.bb new file mode 100644 index 0000000..fc89a25 --- /dev/null +++ b/test/fork_runner.bb @@ -0,0 +1,88 @@ +#!/usr/bin/env bb +;; Fork extension tests — verify fork.bb overrides exist and produce correct output. +;; Run: bb test/fork_runner.bb + +(require '[babashka.fs :as fs] + '[babashka.process :as process] + '[clojure.string :as str]) + +;; Stubs for swarmforge.bb constants used only in install-skills! (not under test here). +(def cyan "") (def green "") (def yellow "") (def reset "") +(defn sq [v] (str "'" v "'")) + +(load-file (str (fs/cwd) "/swarmforge/scripts/fork.bb")) + +(def failures (atom [])) + +(defn check [label ok?] + (if ok? + (println (str " ok " label)) + (do (println (str " FAIL " label)) + (swap! failures conj label)))) + +;;; write-agent-instruction-file! + +(let [tmp (str (fs/create-temp-file {:prefix "test-instr" :suffix ".md"}))] + (write-agent-instruction-file! "coder" tmp) + (let [content (slurp tmp)] + (check "agent-instruction: contains role identity" + (str/includes? content "You are the coder in a SwarmForge multi-agent development swarm.")) + (check "agent-instruction: points to swarm-persona skill" + (str/includes? content "swarm-persona skill")) + (check "agent-instruction: no Invoke directive (double-load guard)" + (not (str/includes? content "Invoke")))) + (fs/delete (fs/path tmp))) + +;;; write-worktree-settings! + +(let [tmp (str (fs/create-temp-dir {:prefix "test-wt-"}))] + (write-worktree-settings! tmp) + (let [content (slurp (str (fs/path tmp ".claude" "settings.local.json")))] + (check "worktree-settings: autoCompactEnabled" (str/includes? content "autoCompactEnabled")) + (check "worktree-settings: CLAUDE_AUTOCOMPACT_PCT_OVERRIDE" (str/includes? content "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE")) + (check "worktree-settings: CLAUDE_CODE_AUTO_COMPACT_WINDOW" (str/includes? content "CLAUDE_CODE_AUTO_COMPACT_WINDOW")) + (check "worktree-settings: UserPromptSubmit hook" (str/includes? content "UserPromptSubmit")) + (check "worktree-settings: Stop hook" (str/includes? content "Stop")) + (check "worktree-settings: gh pr merge allow rule" (str/includes? content "gh pr merge")) + (check "worktree-settings: git reset allow rule" (str/includes? content "git reset --hard origin/"))) + (fs/delete-tree (fs/path tmp))) + +;;; write-persona-skill-file! (exercises resolve-prompt-bundle transitively) + +(let [root (str (fs/create-temp-dir {:prefix "test-persona-root-"})) + wt (str (fs/create-temp-dir {:prefix "test-persona-wt-"}))] + (fs/create-dirs (fs/path root "swarmforge" "constitution" "articles")) + (spit (str (fs/path root "swarmforge" "constitution.prompt")) "# Constitution\n") + (spit (str (fs/path root "swarmforge" "constitution" "articles" "workflow.prompt")) "# Workflow\n") + (fs/create-dirs (fs/path root "swarmforge" "roles")) + (spit (str (fs/path root "swarmforge" "roles" "coder.prompt")) "# Coder\n") + (let [ctx {:working-dir (fs/path root) + :constitution-file (fs/path root "swarmforge" "constitution.prompt") + :roles-dir (fs/path root "swarmforge" "roles")} + skill-file (str (fs/path wt ".claude" "skills" "swarm-persona" "SKILL.md"))] + (write-persona-skill-file! ctx "coder" wt) + (let [content (slurp skill-file)] + (check "persona-skill: SKILL.md created" (fs/exists? (fs/path skill-file))) + (check "persona-skill: name: swarm-persona" (str/includes? content "name: swarm-persona")) + (check "persona-skill: bundles role file" (str/includes? content "swarmforge/roles/coder.prompt")) + (check "persona-skill: bundles constitution article" (str/includes? content "swarmforge/constitution")))) + (fs/delete-tree (fs/path root)) + (fs/delete-tree (fs/path wt))) + +;;; link-curator-skills! + +(let [tmp (str (fs/create-temp-dir {:prefix "test-curator-"}))] + (fs/create-dirs (fs/path tmp ".agents" "skills" "my-skill")) + (spit (str (fs/path tmp ".agents" "skills" "my-skill" "SKILL.md")) "test\n") + (link-curator-skills! tmp) + (check "link-curator: symlink created in .claude/skills/" + (fs/exists? (fs/path tmp ".claude" "skills" "my-skill"))) + (fs/delete-tree (fs/path tmp))) + +;;; Report + +(println) +(if (empty? @failures) + (do (println (str "All " "fork.bb extension tests passed.")) (System/exit 0)) + (do (println (str (count @failures) " failure(s): " (str/join ", " @failures))) + (System/exit 1)))