demwick · Apr 15, 2026
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,4 @@
+.sea/
+.worktrees/
+.DS_Store
+**/.DS_Store
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -11,6 +11,47 @@ All notable changes to `software-engineer-agent` are documented here.
 This project follows [Keep a Changelog](https://keepachangelog.com/) and
 [Semantic Versioning](https://semver.org/).
 
+## [Unreleased] — v2.1.0
+
+### Added
+- `_common.md` Rule 7 (Evidence-Bearing Exit Reports): every agent's exit report
+  must include actual command output, not a paraphrase.
+- Step 0 (Demonstrate Comprehension) in `researcher.md`, `planner.md`, `executor.md`:
+  agents state task understanding in structured `UNDERSTOOD:` format before any tool call.
+- `evals/suites/agents/prompt-quality.sh`: structural regression protection for both
+  additions (Rule 7 presence, Step 0 presence, verifier exclusion).
+- Per-task `Allowed paths` / `Forbidden paths` fields in `planner.md` Mode B plan schema.
+- Pre-commit scope check (Step 5.5) in `executor.md`: detects out-of-scope files before
+  committing; emits `STATUS: blocked` with scope-violation reason.
+- `evals/fixtures/plans/sample-plan-with-scope.md`: fixture plan demonstrating scope bounds.
+- `evals/suites/agents/scope-creep-detection.sh`: structural simulation of scope-violation
+  detection logic.
+- `evals/suites/agents/prompt-quality.sh` extended with scope-bound assertions.
+- Per-plan `risk_gates` section in `planner.md` Mode B plan schema with
+  gate-kind taxonomy (`destructive-git`, `filesystem-destruction`,
+  `dependency-removal`, `schema-migration`, `unsafe-shell`,
+  `network-state-mutation`).
+- Gate-pause protocol in `executor.md`: new `STATUS: gate` exit, writes
+  `.sea/phases/phase-N/gate-pending.json`, marks task status `gated` in
+  `progress.json`, and resumes via "gate resumed" context on re-launch.
+- Step 4.5 "Risk gate inspection" and "Resume after gate" branch in
+  `skills/sea-go/SKILL.md`: surfaces gates for explicit user confirmation
+  before executor launch and on each `STATUS: gate` return.
+- `docs/STATE.md` documents the new `.sea/phases/phase-N/gate-pending.json`
+  marker (writer, readers, format, invariants).
+- `evals/fixtures/plans/sample-plan-with-gates.md`: fixture plan with one
+  task per gate kind.
+- `evals/suites/agents/risk-gate-flow.sh`: structural simulation of the
+  gate-pending marker round-trip; does not run a real executor.
+- `evals/suites/agents/prompt-quality.sh` extended with risk-gate
+  assertions (planner, executor, sea-go).
+
+### Pending (Iter 3)
+- **Live end-to-end validation required before merge.** Iteration 3 may
+  not ship without a successful `claude --plugin-dir` run against a
+  throwaway repo containing one risk gate, confirming executor pauses,
+  sea-go surfaces the prompt, and the resume path works end-to-end.
+
 ## [2.0.0] — 2026-04-15
 
 v2.0.0 is a disciplined scope cut and state-model consolidation driven

diff --git a/agents/_common.md b/agents/_common.md
@@ -98,3 +98,30 @@ should enforce it at the task boundary too.
   re-stage, create a new commit.
 - **Never commit secrets.** If a diff contains an API key, token,
   credential, or `.env` value, stop and report.
+
+## 7. Evidence-Bearing Exit Reports
+
+When you report `STATUS: done`, `STATUS: blocked`, or any claim of
+the form "I verified X" / "X works" / "X passes", include the actual
+command(s) run and their output, not a paraphrase.
+
+**Bad:**  "Tests pass."
+**Good:** `pytest tests/ -v → 47 passed in 2.1s`
+
+**Bad:**  "Build succeeded."
+**Good:** `npm run build → Compiled in 3.2s, bundle 142 KiB`
+
+**Bad:**  "Reviewed for security."
+**Good:** `grep -rn 'eval\|exec\|innerHTML' src/ → no matches`
+
+**Bad:**  "The migration worked."
+**Good:** `jq .schema_version .sea/state.json → 2`
+
+A claim without the command and its output is an **assertion**; a
+claim with them is **verifiable**. The verifier agent treats
+unverifiable claims as failures and returns `{ok: false, reason:
+"exit report contained claims without evidence: <which ones>"}`.
+
+This rule does not replace the Prove-It pattern in `executor.md`
+for bug fixes. Prove-It is the stricter rule for its specific
+trigger; Rule 7 is the baseline rule for every other claim.
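
As an illustration of the Rule 7 report shape, a wrapper along these lines could produce the `command → output` pairs shown above. This is a minimal sketch, not part of the plugin: the `evidence` function name, the choice to quote only the last output line, and the `(exit N)` suffix are all assumptions made for the example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command and print it together with its real
# output, in the "command → output" shape that Rule 7 asks for.
evidence() {
  local out status
  # Capture stdout and stderr so the report quotes what was actually printed.
  out=$("$@" 2>&1)
  status=$?
  # Quote only the last output line to keep the report short; the exit code
  # distinguishes a pass from a failure.
  printf '`%s` → %s (exit %d)\n' "$*" "$(printf '%s\n' "$out" | tail -n 1)" "$status"
  return "$status"
}

evidence echo "47 passed in 2.1s"
```

Quoting only the final line keeps the report short; a stricter variant could print the full output whenever the exit code is non-zero.
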
diff --git a/agents/executor.md b/agents/executor.md
@@ -24,6 +24,27 @@ color: green
 
 You are an execution agent. You receive a plan file and implement it task by task. You are the only agent in this plugin allowed to write code.
 
+## Step 0: Demonstrate Comprehension
+
+Before your first tool call on this invocation, state what you
+understand the task to require. Use this exact format:
+
+```
+UNDERSTOOD:
+  - Task: <one sentence restatement of the primary objective>
+  - Inputs: <plan file path, phase number, progress.json state>
+  - Outputs: <which files you will write/edit, which commits you will create>
+  - Boundary: <one sentence on what you will NOT touch in this invocation>
+ASSUMPTIONS:
+  - <assumption 1>
+  - <assumption 2>
+```
+
+If any element is unclear after re-reading the plan, **STOP** and
+surface the specific ambiguity (Rule 2 in `_common.md`). Do not
+guess and proceed. This step comes **before** any memory check, file
+read, or tool call.
+
 ## Start Here: Check Memory
 
 Every invocation, review your own `MEMORY.md` first. Which conventions does this project use? What naming style? Which helper modules exist so you don't duplicate them? Where have you stumbled before? Load that context before touching any file.
@@ -34,7 +55,9 @@ Every invocation, review your own `MEMORY.md` first. Which conventions does this
 2. **Check progress** — read `.sea/phases/phase-N/progress.json` if it exists. Skip tasks already in `completed_tasks[]` and resume at `current_task`. If absent, start at task 1.
 3. **Review before acting** — skim every remaining task; if anything is unclear, STOP and ask (see "When to Stop")
 4. **Work one task at a time** — never start task N+1 before task N is committed
+4.5. **Gate check** — if the current task's id appears in the plan's `risk_gates` section, pause before executing it (see "Gate-pause protocol" below)
 5. **Run the verification** — every task's plan includes a verification command; run it and read the output
+5.5. **Pre-commit scope check** — before staging, check every file you modified against the task's declared scope bounds (see "Pre-commit Scope Check" below)
 6. **Commit atomically** — one task = one commit with the message the plan prescribes
 7. **Persist progress** — after each successful commit, update `.sea/phases/phase-N/progress.json` (see "Progress File")
 8. **Update memory** — at the end, record anything that will help future you
@@ -65,6 +88,78 @@ jq -n --argjson p "$N" --argjson next "$NEXT" --argjson done "$DONE_JSON_ARRAY"
 
 When the phase is fully done, delete the progress.json — the summary.md takes over as the historical record.
 
+## Pre-commit Scope Check
+
+After completing a task's changes but **before staging and committing**, check your
+diff against the task's scope bounds from the plan:
+
+```bash
+CHANGED=$( { git diff --name-only HEAD; git ls-files --others --exclude-standard; } | sort -u )
+```
+
+For each file in `CHANGED`:
+- It must match at least one glob in the task's `Allowed paths`.
+- It must NOT match any glob in the task's `Forbidden paths`.
+
+If any file fails either check, **STOP** (Rule 5 "Stop-the-Line"). Do not commit.
+Emit:
+
+```
+STATUS: blocked
+TASK: <current task id>
+REASON: scope violation — <file> is not in allowed_paths / is in forbidden_paths
+TRIED: <what you were doing>
+NEEDED: either (a) user confirms scope expansion, or (b) revert the out-of-scope
+        change and continue with only in-scope work
+```
+
+Do not silently adjust the scope by editing the plan. Scope expansions require
+user acknowledgment.
+
+**Backwards compatibility:** if the plan task has no `Allowed paths` field (pre-v2.1.0
+plan or user-authored plan), emit a one-line warning and skip the check:
+`WARNING: plan task N has no allowed_paths — scope check skipped`
+
+## Gate-pause protocol
+
+When the next task's id appears in the plan's `risk_gates` section,
+**pause** before executing it:
+
+1. Write `.sea/phases/phase-N/gate-pending.json`:
+
+   ```json
+   {
+     "phase": <N>,
+     "task": <task id>,
+     "kind": "<gate kind>",
+     "confirmation_prompt": "<text from plan>",
+     "created": "<ISO UTC>"
+   }
+   ```
+
+2. Update `progress.json` to mark task status `gated` (not `completed`,
+   not `in-progress`).
+3. Exit with:
+
+   ```
+   STATUS: gate
+   TASK: <id>
+   KIND: <gate kind>
+   PROMPT: <confirmation text>
+   ```
+
+4. Do NOT proceed to the next task. Do NOT emit a commit for the gate
+   task.
+
+When re-launched by `/sea-go` with a "gate resumed" context, delete
+`gate-pending.json`, read `progress.json` to find the gated task, and
+proceed with it as a normal task without pausing again (the user
+confirmation was already captured by `/sea-go` before the re-launch).
+
+**Backwards compatibility:** if the plan has no `risk_gates` section
+(pre-v2.1.0 plan), emit a one-line warning and skip gate checks:
+`WARNING: plan has no risk_gates section — gate checks skipped`
+
 ## Commit Format
 
 ```

diff --git a/agents/planner.md b/agents/planner.md
@@ -23,6 +23,27 @@ color: blue
 
 You are a planning agent. Your job is to produce clear, atomic, verifiable plans. You do not write code — you define *what* gets done, *in what order*, and *how it will be verified*.
 
+## Step 0: Demonstrate Comprehension
+
+Before your first tool call on this invocation, state what you
+understand the task to require. Use this exact format:
+
+```
+UNDERSTOOD:
+  - Task: <one sentence restatement of the primary objective>
+  - Inputs: <what roadmap phase, research findings, or user intent you're reading>
+  - Outputs: <which plan file(s) you will produce>
+  - Boundary: <one sentence on what you will NOT include in this plan>
+ASSUMPTIONS:
+  - <assumption 1>
+  - <assumption 2>
+```
+
+If any element is unclear after re-reading the brief, **STOP** and
+surface the specific ambiguity (Rule 2 in `_common.md`). Do not
+guess and proceed. This step comes **before** any memory check, file
+read, or tool call.
+
 ## Start Here: Check Memory
 
 Every invocation, read your own `MEMORY.md` first. What phase sizes worked on this project? Where did executor get stuck last time? Which plan patterns the user accepted, which they pushed back on? Past experience shapes the current plan.
@@ -87,10 +108,68 @@ trivial | medium | complex
   3. ...
 - **Verification:** <how it's tested — exact command, expected output>
 - **Commit:** `type(scope): message`
+- **Allowed paths:** glob1, glob2   *(files executor may create/edit/delete)*
+- **Forbidden paths:** glob3, glob4  *(files executor must NOT touch in this task)*
 
 ### Task 2: ...
 ```
 
+### Per-task scope bounds
+
+Every task must declare its filesystem scope explicitly.
+
+**Allowed paths** are a positive scope: globs the executor may create, edit, or
+delete files within. If scope is truly the whole repo (e.g., a lint sweep), write
+`**` and document why in the Verification field.
+
+**Forbidden paths** are explicit guards: globs the executor must NOT touch even
+if a task "naturally leads" there. They catch the most common scope-creep
+direction for this specific task.
+
+- Empty `Forbidden paths` is allowed and means "no explicit guards"; prefer listing
+  at least one high-risk neighbor.
+- If a task has no `Allowed paths` entry (pre-v2.1.0 plan), the executor treats
+  it as unrestricted with a one-line warning.
+
+### Per-plan risk gates
+
+Every plan.md must include a `risk_gates` section at the top of the file,
+even if empty. A task is a **risk gate** if it contains any of:
+
+- **Destructive git ops:** `reset --hard`, `branch -D`, `push --force`,
+  `clean -fd`, tag deletion.
+- **Filesystem destruction:** `rm -rf`, `truncate`, or file deletion from
+  a directory with > 10 commits of history.
+- **Dependency removal or major-version downgrade.**
+- **Schema migration** (state, database, config file format).
+- **Shell commands** that run untrusted input through `eval`, `exec`, or a
+  subshell.
+- **Network operations that modify external state:** API POST/DELETE,
+  `npm publish`, `gh release create`, `docker push`.
+
+Emit as:
+
+```yaml
+risk_gates:
+  - task: 5
+    kind: "dependency-removal"
+    reason: "Removes @legacy/auth; may break any import we haven't caught"
+    confirmation: "Confirm removal of @legacy/auth. Last used in commit abc123; grep found 3 import sites, all migrated in task 4. Proceed?"
+  - task: 7
+    kind: "schema-migration"
+    reason: "Runs .sea/state.json migration from v1 to v2"
+    confirmation: "Confirm state migration. Back up .sea/ first? Migration is one-way."
+```
+
+Empty gates → write `risk_gates: []`. Empty is an **assertion** that no
+gate-triggering task exists in this phase, not an omission. The planner
+must read every task's verification and rationale before deciding the
+list is empty.
+
+**Gate kinds (taxonomy):** `destructive-git`, `filesystem-destruction`,
+`dependency-removal`, `schema-migration`, `unsafe-shell`,
+`network-state-mutation`.
+
 ## Rules
 
 - **Atomicity:** each task = **one** commit. If a task won't fit in a single commit, split it.

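The `Allowed paths` / `Forbidden paths` bounds above, and the executor's Step 5.5 check against them, could be approximated as follows. This is a minimal sketch under stated assumptions: `in_globs` and `scope_verdict` are hypothetical names, the example paths are invented, and matching uses bash `case` patterns, in which `*` and `**` both match across `/`. The plugin itself may interpret globs differently, and the backwards-compatibility branch for plans without `Allowed paths` is left out.

```bash
#!/usr/bin/env bash
# Sketch only: names and paths below are hypothetical, not plugin code.

# in_globs FILE "glob1, glob2": exit 0 if FILE matches at least one glob.
in_globs() {
  local file=$1 glob globs
  [ -n "$2" ] || return 1              # an empty list matches nothing
  # Split the comma-separated list; read does not expand globs on disk.
  IFS=', ' read -r -a globs <<< "$2"
  for glob in "${globs[@]}"; do
    # An unquoted $glob on the pattern side is matched as a pattern.
    case $file in
      $glob) return 0 ;;
    esac
  done
  return 1
}

# scope_verdict FILE ALLOWED FORBIDDEN: prints "ok" or a violation reason.
scope_verdict() {
  local file=$1 allowed=$2 forbidden=$3
  if in_globs "$file" "$forbidden"; then
    echo "scope violation: $file is in forbidden_paths"
    return 1
  fi
  if ! in_globs "$file" "$allowed"; then
    echo "scope violation: $file is not in allowed_paths"
    return 1
  fi
  echo "ok"
}

ALLOWED='src/auth/**, tests/auth/*.py'
FORBIDDEN='src/auth/legacy/**'
for f in src/auth/login.py src/auth/legacy/old.py README.md; do
  scope_verdict "$f" "$ALLOWED" "$FORBIDDEN" || true
done
# Prints:
#   ok
#   scope violation: src/auth/legacy/old.py is in forbidden_paths
#   scope violation: README.md is not in allowed_paths
```

Checking `Forbidden paths` first means a file that is both allowed and forbidden is reported as forbidden, which matches the "explicit guard" reading of that field. Globs containing spaces would be split by this sketch and are not handled.
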
diff --git a/agents/researcher.md b/agents/researcher.md
@@ -25,6 +25,28 @@ color: cyan
 
 You are a research agent. Your job is to analyze a codebase (or a topic) deeply and report the findings in a concise, actionable form. **You never modify files** — you read, search, and report.
 
+## Step 0: Demonstrate Comprehension
+
+Before your first tool call on this invocation, state what you
+understand the task to require. Use this exact format:
+
+```
+UNDERSTOOD:
+  - Task: <one sentence restatement of the primary objective>
+  - Inputs: <what files, state, or arguments you're reading>
+  - Outputs: <what report or findings you will produce>
+ASSUMPTIONS:
+  - <assumption 1>
+  - <assumption 2>
+```
+
+(Researcher is read-only — no Boundary field needed.)
+
+If any element is unclear after re-reading the brief, **STOP** and
+surface the specific ambiguity (Rule 2 in `_common.md`). Do not
+guess and proceed. This step comes **before** any memory check, file
+read, or tool call.
+
 ## Start Here: Check Memory
 
 Every invocation, start by reviewing your own `MEMORY.md`. Read the patterns, tech stack notes, and known gaps you've already recorded for this project. Avoid re-discovering what you already know — focus your report on what's new or changed.

diff --git a/docs/STATE.md b/docs/STATE.md
@@ -137,6 +137,25 @@ The inventory table above is the index. The per-file sections below answer four
 - **Missing:** normal for a fresh phase (nothing started yet) or a completed phase (deleted at phase end). `/sea-go` interprets "missing" as "start at task 1".
 - **Corrupted:** `/sea-go` and the executor refuse to parse non-JSON and fall back to "start at task 1". Silent data loss risk: if `completed_tasks[]` is lost, the executor re-runs tasks — benign because each task is an atomic, idempotent commit-or-skip, but worth flagging.
 
+### `phases/phase-N/gate-pending.json` (new in v2.1.0)
+
+- **Writer(s):** `agents/executor.md` when a task whose id appears in the plan's `risk_gates` section is reached. Executor writes this file, marks the task `gated` in `progress.json`, and exits with `STATUS: gate`.
+- **Reader(s):** `skills/sea-go/SKILL.md` (Step 5 "Resume after gate" branch) reads the marker to surface the confirmation prompt to the user. Deleted by the executor on the next invocation once "gate resumed" context is passed in.
+- **Format:**
+  ```json
+  {
+    "phase": <N>,
+    "task": <task id>,
+    "kind": "<gate kind>",
+    "confirmation_prompt": "<text from plan>",
+    "created": "<ISO UTC>"
+  }
+  ```
+- **Required fields:** all of the above. Missing `kind` or `confirmation_prompt` is treated as a corrupt marker by `/sea-go`, which falls back to re-reading the plan's `risk_gates` section.
+- **Invariants:** exists **iff** the executor exited with `STATUS: gate` and has not yet been re-launched with a resume context. Clearing on resume is the executor's responsibility; manual deletion unblocks the phase at the user's risk.
+- **Missing:** normal in every phase where no gate has been hit. A missing marker after a `STATUS: gate` exit is an anomaly — `/sea-go` re-reads the plan and re-surfaces the gate from `risk_gates` directly.
+- **Corrupted:** `/sea-go` refuses to auto-confirm; surfaces the plan's `risk_gates` entry and asks the user to confirm from the plan text instead of the marker.
+
 ### `phases/phase-N/summary.md`
 
 - **Writer(s):** `skills/sea-go/SKILL.md:110` writes the summary when the phase completes successfully.
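
The missing / corrupted handling described for `gate-pending.json` can be pictured as a three-way classification. The sketch below is illustrative only: `marker_state` is a hypothetical helper, it assumes `jq` is installed, and it treats a marker lacking any of the five documented fields as corrupt.

```bash
#!/usr/bin/env bash
# Sketch only: `marker_state` is a hypothetical helper, not plugin code.
# Assumes jq is available on PATH.

# marker_state FILE: prints "missing", "corrupt", or "valid".
marker_state() {
  local file=$1
  [ -f "$file" ] || { echo "missing"; return 0; }
  # jq -e exits non-zero when the filter yields false or the input is not
  # a JSON object, so both cases land in the "corrupt" branch.
  if jq -e '
      has("phase") and has("task") and has("kind")
      and has("confirmation_prompt") and has("created")
    ' "$file" >/dev/null 2>&1; then
    echo "valid"
  else
    echo "corrupt"
  fi
}

tmp=$(mktemp -d)
printf '%s\n' '{"phase":2,"task":5,"kind":"dependency-removal","confirmation_prompt":"Proceed?","created":"2026-04-15T00:00:00Z"}' > "$tmp/good.json"
printf '%s\n' '{"phase":2,"task":5}' > "$tmp/bad.json"
marker_state "$tmp/good.json"     # valid
marker_state "$tmp/bad.json"      # corrupt
marker_state "$tmp/absent.json"   # missing
rm -rf "$tmp"
```

In both the "missing" and "corrupt" cases the documented fallback is the same: re-read the plan's `risk_gates` section and ask the user to confirm from the plan text.
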