task-proof

Independent verification framework for AI coding agents — closes the self-certification gap by spawning a fresh LLM session that has never seen the build context.

The problem

The agent that wrote the code physically cannot judge it cleanly: it has the same context, the same assumptions, and the same blind spots it had while building. Self-certification bias is real, and on tasks with three or more acceptance criteria it shows up reliably as "everything looks good" reports for work that quietly missed AC3.

task-proof closes that gap with two pieces working together:

fresh-verify guard — every git push and git commit is held for review by an LLM that sees only your last task description and the diff. No build context, no anchoring, no rubber-stamping.
task-proof skill — a 7-step proof loop (spec freeze → build → evidence pack → fresh verify → fix loop) so non-trivial tasks have a structured verification trail, not just a builder's word for it.

A proof-recommend nudge fires once per session to suggest the skill when the user's request smells complex (5+ words, multi-step language).

Quickstart

Claude Code

bash <(curl -sSL https://raw.githubusercontent.com/uplift-labs/task-proof/main/remote-install.sh) --with-claude-code

That installs the core under .uplift/task-proof/, registers hooks in .claude/settings.json (idempotently — existing hooks are kept), and drops the task-proof skill into .claude/skills/.

Commit .uplift/task-proof/, .claude/settings.json, and .claude/skills/task-proof/ so the proof loop is available in worktrees and to your collaborators.

Codex

bash <(curl -sSL https://raw.githubusercontent.com/uplift-labs/task-proof/main/remote-install.sh) --with-codex

That installs the core under .uplift/task-proof/, enables Codex hooks in .codex/config.toml, registers lifecycle hooks in .codex/hooks.json, and installs the repo-scoped skill under .agents/skills/task-proof/.

Commit .uplift/task-proof/, .codex/, and .agents/skills/task-proof/. Codex loads project .codex/ config only for trusted projects.

Both hosts

bash <(curl -sSL https://raw.githubusercontent.com/uplift-labs/task-proof/main/remote-install.sh) --with-claude-code --with-codex

Core only

bash <(curl -sSL https://raw.githubusercontent.com/uplift-labs/task-proof/main/remote-install.sh)

Installs only the core multiplexer and guards under .uplift/task-proof/core/. Wire them into your own host (pre-commit hook, GitHub Actions, etc.) following the public CLI contract.

How it works

Claude Code / Codex hook event
        │
        ▼
.uplift/task-proof/adapter/hooks/<host-hook>.sh    ← host adapter (JSON ↔ tags)
        │
        ▼
.uplift/task-proof/core/cmd/task-proof-run.sh pre-commit  ← multiplexer
        │
        ▼
.uplift/task-proof/core/guards/fresh-verify.sh      ← guard (BLOCK/ASK/empty)
        │
        ▼
.uplift/task-proof/core/lib/llm-client.sh           ← backend abstraction
        │
        ▼
TASK_PROOF_LLM_CMD → Codex-session codex exec → claude -p → codex exec → ANTHROPIC_API_KEY

Two layers, on purpose:

core/ is host-agnostic and speaks plain text tags (BLOCK: / ASK: / WARN: / empty). Runs anywhere.
adapters/<host>/ translates those tags to the host's hook protocol. Today: Claude Code and Codex. Future: OpenCode, GitHub Actions, pre-commit, anything with a hook surface.

Runtime proof artifacts are separate from installed code and live under .task-proof/runs/<TASK_ID>/.

Configuration

Everything is environment variables — see CONTRACT.md for the full list. The most useful ones:

Variable	Purpose
`TASK_PROOF_DISABLED=1`	Kill switch for the whole product
`TASK_PROOF_DISABLE_FRESH_VERIFY=1`	Disable just the verifier
`TASK_PROOF_LLM_CMD=...`	Plug in any LLM (ollama, vLLM, openai CLI, mock)
`TASK_PROOF_LLM_BACKEND=codex/claude/anthropic`	Force one built-in backend
`TASK_PROOF_CODEX_ASK_BEHAVIOR=warn`	Let Codex degrade `ASK:` to a warning instead of a block
`ANTHROPIC_API_KEY=...`	Fallback when `claude` CLI is not available
`CI=true`	Skip everything in CI environments

The skill

Once installed, invoke task-proof / $task-proof with a task description. The skill walks the seven steps in the host-specific skill file under adapters/<host>/skills/task-proof/SKILL.md:

Spec freeze — write acceptance criteria, get user confirmation, then treat the spec as immutable.
Build — implement normally with all your other guards active.
Evidence pack — record actual proof for each criterion in .task-proof/runs/<TASK_ID>/evidence.json.
Fresh verify — independent LLM checks spec vs evidence and writes a verdict.
Fix loop — up to 3 iterations on failures; escalate after.
Complete — report final verdict and commit only when appropriate for the host workflow.
Self-improve — reflect on what could have caught the issue sooner.

Layout

task-proof/
├── core/
│   ├── cmd/task-proof-run.sh        ← public CLI entry point
│   ├── guards/{fresh-verify,proof-recommend}.sh
│   └── lib/{json-merge.py,json-field.sh,llm-client.sh}
├── adapters/
│   ├── claude-code/
│   │   ├── settings-hooks.json
│   │   ├── hooks/{pre-bash,user-prompt-submit}.sh
│   │   └── skills/task-proof/SKILL.md
│   └── codex/
│       ├── hooks.json
│       ├── hooks/{codex-pre-tool-use,codex-user-prompt-submit}.sh
│       └── skills/task-proof/SKILL.md
├── templates/{spec.md.tmpl,gitignore.snippet}
├── tests/{run.sh,fixtures/...}
├── install.sh
├── remote-install.sh
└── CONTRACT.md

Tests

bash tests/run.sh

Sets up a throwaway git repo, mocks the LLM backend with TASK_PROOF_LLM_CMD, runs every fixture under tests/fixtures/, and checks the Codex adapter, Codex LLM backend selection, and Codex installer idempotency. True-positive fixtures (tp-*.json) must produce non-empty output; true-negative fixtures (tn-*.json) must stay silent.

Uninstall

python3 .uplift/task-proof/core/lib/json-merge.py .claude/settings.json /dev/null --uninstall
python3 .uplift/task-proof/core/lib/json-merge.py .codex/hooks.json /dev/null --uninstall
rm -rf .uplift/task-proof .claude/skills/task-proof .agents/skills/task-proof

The --uninstall flag removes only hooks whose command contains the /task-proof/adapter/hooks/ marker. Other products' hooks are left untouched. If no other Codex hooks use it, you can also remove codex_hooks = true from .codex/config.toml.

License

MIT.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

task-proof

The problem

Quickstart

Claude Code

Codex

Both hosts

Core only

How it works

Configuration

The skill

Layout

Tests

Uninstall

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 31 Commits
.agents/skills/task-proof		.agents/skills/task-proof
.claude		.claude
.codex		.codex
.uplift/task-proof		.uplift/task-proof
adapters		adapters
core		core
templates		templates
tests		tests
.gitattributes		.gitattributes
.gitignore		.gitignore
.shellcheckrc		.shellcheckrc
AGENTS.md		AGENTS.md
CLAUDE.md		CLAUDE.md
CONTRACT.md		CONTRACT.md
LICENSE		LICENSE
README.md		README.md
install.sh		install.sh
remote-install.sh		remote-install.sh

Folders and files

Latest commit

History

Repository files navigation

task-proof

The problem

Quickstart

Claude Code

Codex

Both hosts

Core only

How it works

Configuration

The skill

Layout

Tests

Uninstall

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages