intrepideai · alexandertomana · Jun 10, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -2,6 +2,29 @@
 
 ## Unreleased
 
+- **`guards.protect` + `no_protected_edits`** — pin the files that define what
+  your checks *mean* (package.json scripts, lint/test/build configs). They're
+  hashed into the baseline like the donefile itself; any change, deletion, or
+  new shadowing file is a finding. Closes the `"test": "exit 0"` hole. Falls
+  back to the git diff when there's no baseline, so it works in CI too.
+- **`SubagentStop` adapter (Claude Code)** — `donegate install claude` now
+  also wires a guards-only tamper scan at every subagent boundary
+  (`donegate hook claude --subagent`). No checks run, so fan-out workflows are
+  gated per node at git-diff cost; subagent bounces use their own ledger.
+- **`donegate check --against <ref>`** — judge mode: evaluate checks + guards
+  against an explicit git ref, ignoring the session baseline. Makes donegate
+  scriptable as the deterministic judge in fan-out workflows (grade each
+  worktree against its fork point) and re-derives verdicts from git history
+  alone. Receipts record the comparison as `explicit`; a nonexistent ref is a
+  config error, not a silent pass.
+- **Progress-aware bounce budget** — `gate.max_bounces` now counts
+  *consecutive bounces without new progress*: a stop attempt with strictly
+  fewer failing checks + tripped guards than the session's best refreshes the
+  budget (and says so). An agent steadily fixing a long list is never cut off
+  mid-fix; "best ever" as the bar keeps total bounces bounded.
+- `docs/agent-loops.md` — where donegate sits in agentic loops and dynamic
+  workflows: terminal gate, per-subagent guard scan, judge mode, worktree
+  behavior.
 - **The donefile can no longer be deleted or broken out of the way.** The stop
   hook used to treat a missing DONE.md as "not my repo" and an unparseable one
   as a config typo — both fail-open, both one `rm` or one bad edit away from

diff --git a/README.md b/README.md
@@ -139,9 +139,12 @@ guards:
   no_done_edits: true      # this file edited mid-session         → fail
   no_new_todos: warn
   no_debug_artifacts: warn
+  protect:                 # files that define what the checks MEAN
+    - package.json         # ("test": "exit 0" is not a fix)
+    - eslint.config.js
 
 gate:
-  max_bounces: 3           # re-prompts per session before giving up
+  max_bounces: 3           # no-progress re-prompts before giving up
 ```
 ````
 
@@ -162,6 +165,7 @@ tries to finish, it diffs reality against that baseline:
 | `no_deleted_tests` | deleted test files, per-file test counts dropping | fail |
 | `no_disabled_lint` | `eslint-disable` `biome-ignore` `@ts-ignore` `# noqa` `# type: ignore` `//nolint` `#[allow(...)]` `@SuppressWarnings` `rubocop:disable` — added anywhere | fail |
 | `no_done_edits` | DONE.md modified or deleted mid-session | fail |
+| `no_protected_edits` | files listed in `guards.protect` (package.json, lint/test configs — the files that define what the checks *mean*) changed, deleted, or shadowed | fail |
 | `no_new_todos` | `TODO` / `FIXME` / `HACK` introduced in code | warn |
 | `no_debug_artifacts` | `console.log` `debugger` `breakpoint()` `pdb.set_trace` `binding.pry` `dbg!` left in non-test code | warn |
 
@@ -176,16 +180,20 @@ followed, so moving a test file is never "deleting" it.
 
 Guards are a **ratchet, not a sandbox**: they make the cheap, common shortcuts
 loud and expensive, with receipts. An agent with shell access can still find
-quieter moves — weakening assertions, redefining what `npm test` means in
-package.json, re-blessing the baseline itself. What the gate catches, what it
-deliberately doesn't, and why CI is the copy of the gate an agent can't touch:
-[docs/threat-model.md](docs/threat-model.md).
+quieter moves — weakening assertions, re-blessing the baseline itself. What
+the gate catches, what it deliberately doesn't, and why CI is the copy of the
+gate an agent can't touch: [docs/threat-model.md](docs/threat-model.md).
+
+Running fan-out workflows with subagents and worktrees? donegate gates those
+boundaries too — a guards-only scan at every `SubagentStop`, and
+`check --against <ref>` as the deterministic judge over any diff:
+[docs/agent-loops.md](docs/agent-loops.md).
 
 ## Works with
 
 | | command | mechanism |
 |---|---|---|
-| **Claude Code** | `donegate install claude` | `Stop` hook — blocks the stop, feeds failures back |
+| **Claude Code** | `donegate install claude` | `Stop` hook — blocks the stop, feeds failures back · `SubagentStop` — guards-only scan per subagent |
 | **Codex CLI** | `donegate install codex` | `Stop` hook (`.codex/hooks.json`) |
 | **Cursor** | `donegate install cursor` | `stop` hook → `followup_message` |
 | **GitHub Actions** | `donegate install ci` | gates PRs, posts the receipt as a comment |
@@ -239,6 +247,11 @@ is not the agent's to edit.
 **Won't it delete the failing test?** That trips `no_deleted_tests` — file
 deletions *and* per-file test-count drops.
 
+**Won't it just change what `npm test` means in package.json?** List the
+files your checks depend on in `guards.protect`; any edit to them then trips
+`no_protected_edits`, because they're hashed into the baseline like the
+donefile itself.
+
 **Does this replace CI?** No — it runs *before* the agent declares victory,
 while it still has context to fix things. CI stays as the backstop (and
 `donegate install ci` makes CI speak DONE.md too).
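The `guards.protect` answer above boils down to comparing hashes recorded at session start with hashes taken at stop time. A minimal sketch of that comparison follows. `hashFile` and `protectedFindings` are hypothetical names, SHA-256 is an assumption (the docs only say the files are "hashed"), and detection of new shadowing files is deliberately left out.

```javascript
const { createHash } = require("node:crypto");
const { existsSync, readFileSync } = require("node:fs");

// Assumption: SHA-256 over the raw bytes. Returns null when the file is missing.
function hashFile(path) {
  if (!existsSync(path)) return null;
  return createHash("sha256").update(readFileSync(path)).digest("hex");
}

// baseline: { [path]: hash } recorded when the session started.
// current:  { [path]: hash | null } computed when the agent tries to stop.
function protectedFindings(baseline, current) {
  const findings = [];
  for (const [path, before] of Object.entries(baseline)) {
    const after = current[path] ?? null;
    if (after === null) findings.push({ path, kind: "deleted" });
    else if (after !== before) findings.push({ path, kind: "changed" });
  }
  return findings;
}
```

Under this sketch an untouched file produces no finding, so a session that never edits package.json is never flagged for it.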

diff --git a/docs/agent-loops.md b/docs/agent-loops.md
@@ -0,0 +1,90 @@
+# donegate in agent loops
+
+Coding agents run a loop: gather context → take action → verify → repeat.
+Increasingly that loop fans out — orchestrators spawn subagents, subagents get
+their own worktrees, workflow scripts coordinate the lot. donegate has a
+specific seat at three points of that topology, and this page maps them.
+
+## The three seats
+
+| where | mechanism | what runs | cost |
+|---|---|---|---|
+| **terminal stop** | `Stop` hook | full gate: checks + guards | your test suite |
+| **subagent boundary** | `SubagentStop` hook (`hook claude --subagent`) | guards only | git diffs + regexes — fast |
+| **judge in a fan-out** | `donegate check --against <ref> --json` | checks + guards vs an explicit ref | your call (use `--only` to scope) |
+
+### Terminal stop — the gate on the loop's exit
+
+The classic donegate role: the agent tries to finish, the gate runs the
+repo's definition of done, failure bounces the agent back with the report in
+its context. This is the **deterministic verifier** in the loop's
+verify-work phase — exit codes and diffs, no LLM judging anything, which also
+means it can't share an LLM judge's self-preference for the code that was
+just written.
+
+### Subagent boundary — tamper scan per node
+
+A full test suite per subagent would be brutal; a tamper scan isn't. The
+`SubagentStop` hook (installed automatically by `donegate install claude`)
+runs **guards only**: did this subagent skip or delete tests, silence the
+linter, touch a protected file, edit the donefile? Findings block the
+subagent's completion the same way the stop hook blocks the session — the
+finding lands while the subagent still has the context to undo it, instead of
+surfacing at the terminal stop after its output was already absorbed.
+
+Read-only subagents (searchers, reviewers) change nothing, trip nothing, and
+pay one git diff. Subagent bounces are tracked in their own ledger
+(`<session>:subagent`), so a noisy fan-out can't burn the bounce budget the
+terminal gate relies on.
+
+### Judge mode — `--against` in workflow scripts
+
+Fan-out patterns end with verification: N agents produced N diffs, something
+deterministic should grade them before anything merges. `--against` pins the
+comparison to an explicit ref — the worktree's fork point, the PR base —
+instead of whatever baseline/merge-base resolution would guess. With `--json`
+the receipt is machine-readable; the exit code is the verdict
+(0 done / 1 checks failed / 3 bar was lowered).
+
+```js
+// inside a workflow script: judge each worktree before accepting it
+const verdict = await bash(
+  `cd "${worktree}" && npx -y donegate check --against "${forkPoint}" --json --quiet`,
+);
+// exit 0 → accept; exit 3 → the diff "passes" because the bar moved — reject loudly
+```
+
+`--against` deliberately **ignores the session baseline** — judge mode judges
+a diff, not a session. That also makes it the answer to a re-blessed
+baseline: `donegate check --against origin/main` re-derives the verdict from
+git history alone.
+
+## Worktree behavior
+
+Linked worktrees get their own `.donegate/` (it's per-root and gitignored).
+Inside a fresh worktree there is usually **no session baseline**, so guards
+fall back to git comparisons — added-line scans against HEAD or merge-base
+still work; baseline-only detections (count drops in untouched files,
+protected-file hashes) degrade gracefully. For full-strength guards in a
+worktree, record a baseline when it's created (`donegate baseline`) or judge
+it from outside with `--against <fork point>`.
+
+## Loop-until-done, bounded
+
+A fixed bounce cap fights the loop: an agent steadily fixing a long failure
+list gets cut off mid-fix. donegate's budget counts **consecutive bounces
+without new progress** — when a stop attempt's failure count (failing checks +
+tripped guards) drops below the session's best, the budget refreshes and the
+agent is told so. "Best ever" is the bar, not "better than last time," so
+oscillating between two failure sets can't farm refreshes; total bounces stay
+bounded and a wedged session still exits with a red receipt.
+
+## What this does not change
+
+The loop's failure modes that donegate addresses are the *mechanical* ones:
+declaring done early (agentic laziness that trips a check), lowering the bar
+to get green (guards), drifting past the definition of done (DONE.md is
+re-read from disk every stop — compaction can't summarize it away). The
+*semantic* failure modes — weakened assertions, vacuous tests, an agent
+grading its own homework — still need a clean-context reviewer or a human;
+see [threat-model.md](threat-model.md) for the honest boundary.
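The judge-mode snippet in this file leaves the exit-code handling as a comment. One way a workflow script might act on it is sketched here, using the exit codes the page lists (0 done, 1 checks failed, 3 bar was lowered). `verdictFromExit` and `judgeWorktree` are hypothetical helpers, and treating every other status as an error is this sketch's assumption.

```javascript
const { spawnSync } = require("node:child_process");

// 0 / 1 / 3 come from the docs; any other status (config error, crash,
// killed by a signal, which spawnSync reports as null) maps to "error" here.
function verdictFromExit(status) {
  switch (status) {
    case 0: return "accept";
    case 1: return "reject:checks-failed";
    case 3: return "reject:bar-lowered";
    default: return "error";
  }
}

// Passing the arguments as an array avoids shell quoting of paths and refs.
function judgeWorktree(worktree, forkPoint) {
  const run = spawnSync(
    "npx",
    ["-y", "donegate", "check", "--against", forkPoint, "--json", "--quiet"],
    { cwd: worktree, encoding: "utf8" },
  );
  // The receipt is returned as raw text; its JSON shape is not assumed here.
  return { verdict: verdictFromExit(run.status), receipt: run.stdout };
}
```

Keeping `reject:bar-lowered` separate from `reject:checks-failed` preserves the distinction the threat model draws for exit 3, so a workflow can alert on it rather than simply retry.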
diff --git a/docs/hooks.md b/docs/hooks.md
@@ -1,12 +1,18 @@
 # Agent integrations
 
-`donegate install <target>` wires the gate into an agent's lifecycle. Two hooks
-get installed per agent:
+`donegate install <target>` wires the gate into an agent's lifecycle:
 
 - **session start** → `donegate baseline --if-missing --quiet` — snapshots
-  test files and DONE.md so the tamper guards have something to diff against.
+  test files, protected files, and DONE.md so the tamper guards have something
+  to diff against.
 - **stop** → `donegate hook <agent>` — runs the full gate when the agent tries
   to finish, and blocks the stop (with the failure report) if the verdict is red.
+- **subagent stop** (Claude Code only) → `donegate hook claude --subagent` —
+  a **guards-only** tamper scan at every subagent boundary. No checks run, so
+  it's cheap enough to pay per subagent; a subagent that skipped tests or
+  touched a protected file is bounced while it still has the context to undo
+  it. Subagent bounces use their own ledger so a noisy fan-out can't burn the
+  terminal gate's budget.
 
 Project-level installs are the default and are **shareable** — commit the config
 and every teammate's agent is gated too. Add `--global` to install at the user
@@ -29,6 +35,9 @@ budget — keep their sum under the stop timeout.
     "Stop": [
       { "hooks": [{ "type": "command", "command": "npx -y donegate hook claude" }] }
     ],
+    "SubagentStop": [
+      { "hooks": [{ "type": "command", "command": "npx -y donegate hook claude --subagent" }] }
+    ],
     "SessionStart": [
       { "hooks": [{ "type": "command", "command": "npx -y donegate baseline --if-missing --quiet" }] }
     ]
@@ -37,7 +46,9 @@ budget — keep their sum under the stop timeout.
 ```
 
 On a red verdict the hook prints `{"decision": "block", "reason": "<report>"}`
-— Claude Code keeps the session going and feeds the report to the model.
+— Claude Code keeps the session going and feeds the report to the model. The
+`SubagentStop` entry speaks the same contract but runs guards only (see
+[agent-loops.md](agent-loops.md)).
 
 ## Codex CLI
 
@@ -85,10 +96,16 @@ because tests were deleted" is visible right in the review.
 
 A stop hook that can block forever is a hostage situation, so every block
 increments a per-session bounce counter (`.donegate/state.json`, pruned after
-24h). After `gate.max_bounces` (default 3) the gate stops blocking and lets the
-stop through with a loud warning — but it keeps verifying, so the receipt
-always tells the truth. Sessions that recover reset their counter on the first
-green run.
+24h). After `gate.max_bounces` (default 3) **consecutive attempts without new
+progress** the gate stops blocking and lets the stop through with a loud
+warning — but it keeps verifying, so the receipt always tells the truth.
+
+Progress refreshes the budget: when a stop attempt's failure count (failing
+checks + tripped guards) drops strictly below the session's best so far, the
+counter resets and the agent is told so — a session steadily fixing a long
+list is never cut off mid-fix. "Best ever" is the bar rather than "better
+than last time", so oscillating between failure sets can't farm refreshes and
+total bounces stay bounded. Sessions that go green reset entirely.
 
 ## When the gate itself is the target
 

diff --git a/docs/spec.md b/docs/spec.md
@@ -40,14 +40,24 @@ guards:               # optional — tamper detection levels
   no_disabled_lint: true     # eslint-disable/noqa/@ts-ignore/nolint added
   no_new_todos: warn         # TODO/FIXME/HACK introduced
   no_debug_artifacts: warn   # console.log/debugger/pdb.set_trace left behind
+  no_protected_edits: true   # `protect` files changed, deleted, or shadowed
   test_globs:                # optional — what counts as a test file
     ["**/*.test.*", "**/*.spec.*", "**/test_*.py", "**/*_test.go", "..."]
   exclude: []                # optional — files exempt from guard analysis
                              # (for code that legitimately CONTAINS the
                              # patterns: lint configs, scanners, donegate itself)
+  protect: []                # optional — globs for files the verdict depends on
+                             # but the gate doesn't run: the files that define
+                             # what the check commands MEAN (package.json,
+                             # eslint/jest/pytest/tsconfig configs). Hashed into
+                             # the baseline; any change, deletion, or new
+                             # shadowing file trips no_protected_edits.
 
 gate:                 # optional
-  max_bounces: 3      # stop-hook re-prompts per session before giving up (1-20)
+  max_bounces: 3      # consecutive no-progress stop-hook re-prompts per
+                      # session before giving up (1-20); progress — a strictly
+                      # lower failing-check + tripped-guard count than the
+                      # session's best — refreshes the budget
 ```
 
 Guard levels: `true` (findings fail the gate), `"warn"` (findings are reported
@@ -73,10 +83,17 @@ pass?"* Guards ask ***"was the bar lowered so it would pass?"*** They compare
 the current tree against a **baseline**:
 
 1. a **session baseline** recorded when an agent session starts (test-file
-   hashes, test/skip counts, the DONE.md hash, and the git HEAD at that moment), or
+   hashes, test/skip counts, hashes of `guards.protect` files, the DONE.md
+   hash, and the git HEAD at that moment), or
 2. **HEAD**, when there's uncommitted work and no session baseline, or
 3. the **merge-base with the default branch**, for clean trees (the CI case).
 
+An **explicit ref** (`donegate check --against <ref>`) overrides all three,
+including the session baseline: judge mode evaluates a diff, not a session.
+The verdict is then derivable from git history alone — useful for grading
+fan-out worktrees from a workflow script, pinning CI to the PR base, or
+re-deriving a verdict past a re-blessed baseline.
+
 All guard findings are deterministic, diff-based, and cite `file:line`
 evidence. Guards never call a model and never make network requests.
 
@@ -110,9 +127,14 @@ tries to finish:
   output tails, guard findings with file:line) is fed back to the agent, which
   keeps working. Each block increments a per-session **bounce counter**.
 - **pass** → the stop proceeds; the bounce counter resets; the receipt is green.
-- **bounces exhausted** (`gate.max_bounces`) → the gate stops *blocking* but
-  never stops *verifying*: the stop is allowed with a loud warning and a red
-  receipt. The gate must not be able to trap an agent in an infinite loop.
+- **progress** → a stop attempt whose failure count (failing checks + tripped
+  guards) is strictly below the session's best **refreshes the bounce budget**:
+  an agent steadily working down a list is never cut off mid-fix. Best-ever is
+  the bar, so alternating between failure sets cannot farm refreshes.
+- **bounces exhausted** (`gate.max_bounces` consecutive attempts without new
+  progress) → the gate stops *blocking* but never stops *verifying*: the stop
+  is allowed with a loud warning and a red receipt. The gate must not be able
+  to trap an agent in an infinite loop.
 - a repo **without** a DONE.md → the hook is a silent no-op. A **broken**
   DONE.md → warn and allow (a config typo must never wedge an agent).
 - user-initiated aborts are never blocked.
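The progress and bounces-exhausted rules above can be read as a small state machine. The sketch below is one possible reading, not donegate's implementation: `createBudget`, `recordAttempt`, and the field names are hypothetical, and counting the refreshing attempt as the first bounce of the new budget is an assumption.

```javascript
function createBudget(maxBounces = 3) {
  // stale = consecutive blocked attempts without new progress
  // best  = lowest failure count seen so far in this session
  return { maxBounces, stale: 0, best: Infinity };
}

// failureCount = failing checks + tripped guards for this stop attempt.
function recordAttempt(budget, failureCount) {
  if (failureCount === 0) {
    // Green run: the stop proceeds and the session resets entirely.
    budget.stale = 0;
    budget.best = Infinity;
    return { action: "pass", refreshed: false };
  }
  const hadBest = Number.isFinite(budget.best);
  const improved = failureCount < budget.best; // strictly below best-ever
  if (improved) {
    budget.best = failureCount;
    budget.stale = 0;
  }
  const refreshed = hadBest && improved;
  if (budget.stale >= budget.maxBounces) {
    // Budget spent: stop blocking, but the receipt stays red.
    return { action: "allow", refreshed };
  }
  budget.stale += 1; // this block counts as one bounce
  return { action: "block", refreshed };
}
```

With a budget of 3, the failure-count sequence 5, 3, 5, 3, 5 yields block, block, block, block, allow: the second visit to 3 is not below the best of 3, so oscillating earns no refresh. Because the best count can only fall, total blocks in this sketch are bounded by the budget times the first failure count.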

diff --git a/docs/threat-model.md b/docs/threat-model.md
@@ -20,6 +20,8 @@ Deterministic, diff-based, with `file:line` receipts:
 | DONE.md edited mid-session | `no_done_edits` → exit 3 |
 | DONE.md **deleted** mid-session | stop bounced — the baseline remembers it existed |
 | DONE.md **broken** mid-session (no longer parses) | stop bounced — its hash no longer matches the baseline |
+| a `guards.protect` file changed/deleted/shadowed (package.json, lint config) | `no_protected_edits` → exit 3 |
+| a subagent tampering inside a fan-out | `SubagentStop` guard scan bounces it at its own boundary |
 
 Exit 3 — *"checks pass but the bar was lowered to get there"* — is
 deliberately a distinct, alarmable signal.
@@ -35,12 +37,16 @@ that rather than guess it:
   count stays flat), updating snapshots to bake a bug in, or writing both the
   bug and the test that blesses it. Catching these means judging *meaning*; a
   diff and a regex can't, and donegate refuses to guess.
-- **Command indirection.** DONE.md is hash-guarded, but `run: npm test`
-  resolves through package.json — and package.json, jest.config, pytest.ini,
-  tsconfig, and the linter config are all agent-editable. `"test": "exit 0"`
-  is green. Mitigations: prefer direct invocations in your checks
-  (`npx vitest run` rather than `npm test`) and treat tooling-config diffs as
-  review-worthy, because they are.
+- **Command indirection — unless you pin it.** DONE.md is hash-guarded, but
+  `run: npm test` resolves through package.json — and package.json,
+  jest.config, pytest.ini, tsconfig, and the linter config are all
+  agent-editable. `"test": "exit 0"` is green. The fix is `guards.protect`:
+  list those files and they're hashed into the baseline; any change, deletion,
+  or new shadowing config trips `no_protected_edits`. What protect *can't* do
+  is know which files you forgot to list — prefer direct invocations in your
+  checks (`npx vitest run` rather than `npm test`), protect the configs those
+  tools read, and treat tooling-config diffs as review-worthy, because they
+  are.
 - **Attacks on donegate's own state.** An agent with shell access can run
   `donegate baseline` to bless its own donefile edit, hand-edit
   `.donegate/state.json` to burn the bounce budget, or patch