fix(hooks): deleting or breaking DONE.md mid-session no longer bypasses the gate #1

Open
alexandertomana wants to merge 1 commit into main from fix/donefile-tamper-bypass

The problem

The stop hook had two fail-open holes, each one command away from disarming the gate mid-session:

  1. rm DONE.md was a total bypass. The hook short-circuits with exit 0 when findDonefile() comes up empty — before any guard can run. The no_done_edits guard's documented "or deleted" finding was unreachable through the hook path, because the hook needs the donefile to load before guards exist. One rm and the agent stops freely.

  2. Corrupting DONE.md was also a bypass. loadConfig() throws → warn on stderr → allow. Reasonable for a human's config typo; exploitable as a one-edit off switch for the exact file the agent is incentivized to break.

Both contradict the README's stated guarantee (no_done_edits ... "DONE.md modified or deleted mid-session → fail") and the FAQ ("Won't the agent just edit DONE.md?").

The fix

The session baseline (.donegate/baseline.json) already records donefile_path + donefile_sha — it just wasn't consulted on these paths. Now:

  • Missing donefile + orphaned baseline → the hook walks up looking for a .donegate/baseline.json whose recorded donefile has vanished, and bounces the stop with restore instructions ({"decision":"block", ...} / followup_message).
  • Unparseable donefile whose hash no longer matches the baseline → same treatment, with the parse error in the report.
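
The decision rules in this PR (block on mid-session deletion or corruption, fail open on pre-existing breakage, stay silent for repos that never opted in) can be sketched as one pure function. This is a minimal illustration, not the code in the PR: only `donefile_path` and `donefile_sha` come from the description above, while `DonefileState`, `Verdict`, and `classifyDonefile` are hypothetical names introduced for the example.

```typescript
// Fields recorded in .donegate/baseline.json, as named in the PR description.
interface Baseline {
  donefile_path: string;
  donefile_sha: string;
}

// Hypothetical summary of what the hook can observe at stop time.
interface DonefileState {
  exists: boolean;
  sha?: string;        // hash of the current contents, when the file exists
  parseError?: string; // set when loading the config threw
}

// Hypothetical verdict type for this sketch.
type Verdict =
  | { action: "allow"; note: string }
  | { action: "block"; reason: string }
  | { action: "run_guards" };

function classifyDonefile(
  baseline: Baseline | undefined,
  state: DonefileState,
): Verdict {
  if (!state.exists) {
    // No donefile and no baseline: the repo never opted in, so stay silent.
    if (baseline === undefined) {
      return { action: "allow", note: "no donefile, no baseline" };
    }
    // Orphaned baseline: the donefile existed at session start and is gone now.
    return {
      action: "block",
      reason: `${baseline.donefile_path} was deleted mid-session; restore it`,
    };
  }

  if (state.parseError !== undefined) {
    // Only treat the breakage as tampering when the baseline proves a change.
    const changedSinceBaseline =
      baseline !== undefined && state.sha !== baseline.donefile_sha;
    if (changedSinceBaseline) {
      return {
        action: "block",
        reason: `${baseline.donefile_path} no longer parses: ${state.parseError}`,
      };
    }
    // No baseline, or hash unchanged: pre-existing breakage still fails open.
    return { action: "allow", note: "pre-existing breakage: warn and allow" };
  }

  // Donefile present and parseable: the normal guard run applies.
  return { action: "run_guards" };
}
```

Keeping the classification free of file-system calls makes each row of the behaviour table easy to unit test; the real hook would still have to locate the baseline and hash the file before calling anything like this.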

Verified end-to-end through the built CLI:

$ donegate baseline --quiet && rm DONE.md
$ echo '{"session_id":"s1", ...}' | donegate hook claude
{"decision":"block","reason":"donegate: NOT DONE — DONE.md was deleted mid-session (attempt 1/3). ..."}

What deliberately did NOT change

  • No-trap guarantees hold. Both new paths share the per-session bounce budget (the default 3 — the donefile that would configure gate.max_bounces is exactly what's missing/unreadable), then give up loudly. Cursor aborted/ctrl-c turns are never gated. Repos that never opted in (no donefile, no baseline) remain silent no-ops.
  • Pre-existing breakage still fails open. A donefile that was already broken when the session started (no baseline, or hash unchanged) warns and allows, as before — a config typo must never trap an agent that didn't cause it.
  • The give-up paths tell humans the legitimate off switch: delete .donegate/ too when removing donegate for real.

Also in this PR

  • docs/threat-model.md — an honest map: what the gate catches outright, what it deliberately can't (semantic cheats like weakened assertions, command indirection through package.json, attacks on donegate's own state from inside the sandbox), and why donegate install ci + branch protection is the actual security boundary.
  • README: guards framed as a "ratchet, not a sandbox" with a pointer to the threat model; FAQ updated to cover delete/corrupt; hooks docs explain the new behavior.
  • Refactor: the three blocking paths share one bounceOrGiveUp() helper instead of triplicating the protocol JSON + bounce-state logic.
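
The per-session bounce budget described above (block up to the default of 3 times, then give up loudly) could look roughly like the sketch below. It is an assumption-laden illustration rather than the PR's `bounceOrGiveUp()` helper: the `Outcome` type, the in-memory `Map`, the function name `bounceOrGiveUpSketch`, and the message wording are all hypothetical, and the real helper also emits the protocol JSON and persists bounce state.

```typescript
// The default budget named in the PR; gate.max_bounces cannot be read when
// the donefile itself is missing or unreadable.
const DEFAULT_MAX_BOUNCES = 3;

// Hypothetical internal result for this sketch (not the hook protocol JSON).
type Outcome =
  | { kind: "bounce"; attempt: number; reason: string }
  | { kind: "give_up"; warning: string };

function bounceOrGiveUpSketch(
  bounces: Map<string, number>, // bounce count per session id (assumed storage)
  sessionId: string,
  problem: string,
  maxBounces: number = DEFAULT_MAX_BOUNCES,
): Outcome {
  const used = bounces.get(sessionId) ?? 0;

  // Budget exhausted: stop blocking so the agent is never trapped.
  if (used >= maxBounces) {
    return {
      kind: "give_up",
      warning: `donegate: giving up after ${maxBounces} bounces: ${problem}`,
    };
  }

  // Otherwise record the attempt and bounce the stop.
  const attempt = used + 1;
  bounces.set(sessionId, attempt);
  return {
    kind: "bounce",
    attempt,
    reason: `donegate: NOT DONE: ${problem} (attempt ${attempt}/${maxBounces})`,
  };
}
```

Routing every blocking path through one function like this is what lets deletion, corruption, and ordinary guard failures draw from the same budget instead of each getting its own three attempts.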

Tests

Three new cases in test/hooks.test.ts (79 total, all passing): deletion bounces ×3, then gives up, then recovers on restore; corruption bounces, then recovers on repair; Cursor aborted turns stay ungated even with the donefile gone. node dist/cli.js check (the repo's own gate) is clean, with all six guards green.

🤖 Generated with Claude Code

…es the gate

The stop hook had two fail-open holes, each one command away from
disarming the gate:

- rm DONE.md → findDonefile() comes up empty → silent no-op allow. The
  no_done_edits guard's "deleted" finding was unreachable through the
  hook: it needs the donefile to load before any guard can run.
- a DONE.md that no longer parses → loadConfig() throws → warn and allow,
  even when the breakage happened mid-session.

Both paths now bounce the stop when the session baseline proves the
donefile existed (and what it hashed to) at session start:

- missing donefile + orphaned .donegate/baseline.json → block with
  restore instructions
- unparseable donefile whose hash no longer matches the baseline → block
  with the parse error in the report

The no-trap guarantees hold: both paths share the per-session bounce
budget (the default 3, since the donefile that would configure
gate.max_bounces is exactly what's missing or unreadable), cursor
ctrl-c/aborted turns are still never gated, repos that never opted in
are still silent no-ops, and a donefile that was already broken before
the session started still fails open.

Also: docs/threat-model.md — an honest map of what the gate catches,
what it deliberately doesn't (semantic cheats, command indirection,
attacks on donegate's own state), and why CI + branch protection is the
actual boundary.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>