Your agent says it's done. DONE.md decides.
One file. One command. Works with Claude Code, Codex, Cursor, CI, and anything that runs in a shell.
Every coding agent session ends the same way:
“All done! I've implemented the feature and all tests are passing ✅”
Sometimes that's true. Often it isn't — teams keep finding a third of the work
missing behind green checkmarks, tests that pass because the agent wrote both
the bug and the test, and suites that got quietly .skip'd into submission.
The transcript says done. The repo disagrees.
donegate is the bouncer at the door. It turns your repo's definition of
done into a file — DONE.md — and physically stops your agent from finishing
until that file is satisfied. Not satisfied? The agent gets bounced back with
the failing output and keeps working. Tried to cheat? The guards catch it,
with file:line receipts.
No model calls. No config sprawl. No SaaS. Exit codes.
npx donegate init # writes DONE.md from your stack (npm/pnpm/uv/go/cargo/…)
npx donegate install # gates every coding agent you use, project-wideThat's the whole setup. From now on, when Claude Code / Codex / Cursor tries to
say "done", donegate runs your checks first. Commit DONE.md and
.claude/settings.json (etc.) and everyone on the team gets the same gate
for free. (donegate status shows the whole posture at a glance — donefile,
baseline, hooks, last receipt.)
Want to see it catch a cheat before touching your own repo? Run the sandboxed demo:
curl -fsSL https://raw.githubusercontent.com/intrepideai/donegate/main/examples/demo.sh | bashAct 1 — the agent finishes, the gate disagrees. The agent tries to stop; donegate runs the checks and bounces it back with evidence:
$ donegate check
✓ lint (104ms)
✗ tests (153ms)
$ npm test exit 1
│ # tests 2
│ # pass 1
│ # fail 1
✗ NOT DONE — 1 of 2 checks failed (377ms)
The stop is blocked. The failing output goes straight back into the agent's context: fix it, then finish.
Act 2 — the agent gets "creative". It skips the failing test instead of fixing the bug. Tests are green now. Watch:
$ donegate check
✓ lint (105ms)
✓ tests (138ms) ← green!
guards
✗ no_deleted_tests
test/discount.test.js — test count dropped from 2 to 1 since the baseline
✗ no_new_skips
test/discount.test.js:9 — test marked as skipped
test.skip('rejects discounts over 100%', () => {
✗ NOT DONE — 2 guards tripped (314ms)
Exit code 3: checks pass, but the bar was lowered to get there. The agent gets told exactly what it did and that it didn't work.
Act 3 — the agent actually fixes the bug.
$ donegate check
✓ lint (104ms)
✓ tests (139ms)
✓ DONE — 2 checks passed, guards clean (320ms)
receipt: .donegate/receipts/latest.json
Now it's allowed to say done — and there's a receipt to prove it.
README.md is for humans. AGENTS.md is for agents. DONE.md is for the
moment they claim to be finished.
# Definition of Done
**Done here means:** types compile, every test passes from a real run,
and nothing was skipped, deleted, or suppressed to get there.
```yaml
version: 1
checks:
- name: typecheck
run: npx tsc --noEmit
- name: tests
run: npm test
timeout: 600
guards:
no_new_skips: true # .skip/.only/xfail/t.Skip added → fail
no_deleted_tests: true # test files/counts reduced → fail
no_disabled_lint: true # eslint-disable/noqa/@ts-ignore added → fail
no_done_edits: true # this file edited mid-session → fail
no_new_todos: warn
no_debug_artifacts: warn
gate:
max_bounces: 3 # re-prompts per session before giving up
```Prose for people, one yaml block for the gate. Checks are your own commands — donegate never invents or modifies them. Full schema: docs/spec.md · copy-paste starters: examples/ (node, python, go, rust, monorepo).
When a session starts, donegate snapshots a baseline: hash of every test file, test/skip counts per file, and the hash of DONE.md itself. When the agent tries to finish, it diffs reality against that baseline:
| guard | catches | default |
|---|---|---|
no_new_skips |
.skip .only xit @pytest.mark.skip xfail t.Skip() #[ignore] @Disabled — added in test files, or skip-counts rising |
fail |
no_deleted_tests |
deleted test files, per-file test counts dropping | fail |
no_disabled_lint |
eslint-disable biome-ignore @ts-ignore # noqa # type: ignore //nolint #[allow(...)] @SuppressWarnings rubocop:disable — added anywhere |
fail |
no_done_edits |
DONE.md modified or deleted mid-session | fail |
no_new_todos |
TODO / FIXME / HACK introduced in code |
warn |
no_debug_artifacts |
console.log debugger breakpoint() pdb.set_trace binding.pry dbg! left in non-test code |
warn |
Everything is deterministic — diffs and regexes against your git history and the session baseline. Same tree, same verdict, on any machine. No LLM judges anything.
Two escape hatches, both human-controlled: files that legitimately contain
these patterns (lint configs, pattern scanners) go in guards.exclude, and a
deliberate DONE.md edit is blessed with donegate baseline. Renames are
followed, so moving a test file is never "deleting" it.
| command | mechanism | |
|---|---|---|
| Claude Code | donegate install claude |
Stop hook — blocks the stop, feeds failures back |
| Codex CLI | donegate install codex |
Stop hook (.codex/hooks.json) |
| Cursor | donegate install cursor |
stop hook → followup_message |
| GitHub Actions | donegate install ci |
gates PRs, posts the receipt as a comment |
| anything else | donegate run -- <cmd> |
baseline → run your agent → gate |
Hooks are installed project-level by default (commit them — the whole team
is gated) or --global for every repo you touch, and they carry explicit
generous timeouts so a long test suite is never cut off by an agent's 60-second
hook default. Repos without a DONE.md are silently ignored, an agent can never
be trapped (see bounce protection in docs/hooks.md), and
ctrl-c always means ctrl-c.
Every run writes .donegate/receipts/latest.json: verdict, every command +
exit code + output tail, every guard finding with file:line, the baseline
used, diffstat, git state, sha-stamped. donegate receipt --md renders it for
PR comments — the CI install does this automatically:
check status time typecheck✅ pass 1.2s tests✅ pass 8.4s donegate v0.1.0 · baseline
c4bbac02e7(session) · 3 files, +21/−18 · receiptab19e3289c073e8c
When someone asks "did the agent actually run the tests?" — the answer is a file, not a feeling.
0 |
done — checks pass, guards clean |
1 |
checks failed |
2 |
config error |
3 |
checks pass but a guard tripped — it was made to look done |
Code 3 is the one to alert a human on.
Won't the agent just edit DONE.md? That trips no_done_edits. The
definition of done is not the agent's to edit.
Won't it delete the failing test? That trips no_deleted_tests — file
deletions and per-file test-count drops.
Does this replace CI? No — it runs before the agent declares victory,
while it still has context to fix things. CI stays as the backstop (and
donegate install ci makes CI speak DONE.md too).
Does it phone home / call a model? No network calls, no LLM, zero runtime dependencies. The entire tool is a few thousand lines of TypeScript you can read in one sitting.
More in docs/faq.md.
- Trust receipts, not transcripts. An agent's claim is not evidence. A command that exited 0 is.
- The bar is not negotiable mid-task. Humans set DONE.md; agents satisfy it.
- Deterministic or it doesn't ship. No model-graded gates. Same tree, same verdict.
- Fail open on infrastructure, fail closed on work. A broken config never traps an agent; a failing test never gets talked past.
Guard patterns for more languages, stack detection, and new agent adapters are all small, well-tested PRs — see CONTRIBUTING.md. The repo is gated by its own DONE.md, naturally.
Built by Intrepide — an AI product studio.
Every agent we ship runs behind guardrails, observability, and audit trails.
donegate is that rule, open-sourced: agents don't declare success — evidence does.
MIT licensed.