Overview
Adopt key practices from OpenAI's harness engineering approach to make the repo more agent-friendly. The codebase already has solid docs (docs/design.md, docs/plans/) and comprehensive tests — but lacks the glue that lets agents (Codex, Claude Code, etc.) self-orient and self-validate.
Tasks
1. Add CLAUDE.md as the sole agent instruction file (~100 lines)
The core insight from the article: treat the agent instruction file as a map, not an encyclopedia. A short root-level file that points to deeper sources of truth.
Note: No AGENTS.md exists in this repo. CLAUDE.md is the only agent instruction file — it serves as both the navigational map and the authoritative quick-reference. Keep it tool-agnostic (useful for Codex, Claude Code, Cursor, etc.) and link out to docs/ for anything longer than a paragraph to avoid frequent churn.
Should include:
- Package layout and key interfaces (LLMProvider, Tool[S], ChatInterface)
- Dependency layering rule (see task 3 for the full matrix)
- How to run tests: go test -race ./..., cd debug/frontend && npm ci && npx tsc --noEmit
- Go version: must match go.mod (currently 1.25.6)
- Key invariants (Message.ImageData is raw bytes, providers are stateless, etc.)
- Pointers to docs/design.md and docs/plans/
- Feedback loop protocol (see task 6)
2. Add CI pipeline (.github/workflows/ci.yml)
Agents need a fast feedback loop.
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with: { go-version-file: go.mod }
      - run: go vet ./...
      - run: go test -race ./...
CI Go version must come from go.mod (currently 1.25.6) — use go-version-file: go.mod rather than hardcoding a version string to avoid drift.
TypeScript check: Keep out of CI for now — npx tsc --noEmit requires npm ci and a Node setup step, adding complexity for a small frontend. Document it in CLAUDE.md pre-submit checks instead. Can be added to CI later if the frontend grows.
3. Add TestDependencyLayers structural test
Mechanically enforce the dependency layering invariant (the article's "enforce invariants, not implementations" pattern).
Explicit allowed/forbidden import matrix:
| Source file(s) | May import from root package? | May import concrete providers? | Notes |
| --- | --- | --- | --- |
| provider.go | No (defines core types) | No | Zero intra-package deps |
| tool.go, chat.go | Only provider.go types | No | |
| agent.go (engine) | provider.go, tool.go, chat.go types | No (anthropic.go, gemini.go, openai.go) | Engine must stay provider-agnostic |
| anthropic.go, gemini.go, openai.go | Core types only | No cross-provider imports | Each provider is self-contained |
| debug/ | May import root package | No | |
| Root package | Must not import debug/ | n/a | |
Since the root is a single flat package (not sub-packages), "imports" here means function and type references between files, not Go import paths. The test should use go/ast to parse files and verify that, e.g., agent.go never references AnthropicProvider, GeminiProvider, or OpenAICompatibleProvider.
This test should run as part of go test ./... (no build tags) so CI catches violations automatically.
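A minimal sketch of the go/ast reference check described above, under stated assumptions: the helper name forbiddenRefs is hypothetical, and the source is passed in as bytes so the example runs without the repo. A real TestDependencyLayers would read agent.go with os.ReadFile and call t.Errorf for each hit.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"sort"
)

// forbiddenRefs parses one Go source file and returns the sorted,
// de-duplicated names from forbidden that occur as identifiers in it.
// Mentions inside comments or string literals are not identifiers and
// are therefore ignored.
func forbiddenRefs(filename string, src []byte, forbidden []string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, parser.SkipObjectResolution)
	if err != nil {
		return nil, err
	}

	banned := make(map[string]bool, len(forbidden))
	for _, name := range forbidden {
		banned[name] = true
	}

	seen := map[string]bool{}
	ast.Inspect(file, func(n ast.Node) bool {
		if id, ok := n.(*ast.Ident); ok && banned[id.Name] {
			seen[id.Name] = true
		}
		return true // keep walking the whole tree
	})

	hits := make([]string, 0, len(seen))
	for name := range seen {
		hits = append(hits, name)
	}
	sort.Strings(hits)
	return hits, nil
}

func main() {
	// Inline stand-in for agent.go (hypothetical content).
	src := []byte(`package agent

type Engine struct{ p LLMProvider }

func bad() any { return &GeminiProvider{} }
`)
	providers := []string{"AnthropicProvider", "GeminiProvider", "OpenAICompatibleProvider"}

	hits, err := forbiddenRefs("agent.go", src, providers)
	if err != nil {
		panic(err)
	}
	fmt.Println(hits) // prints [GeminiProvider]
}
```

Matching on identifier names only parses the file and does not type-check it, which keeps the test fast and dependency-free. The trade-off is that an unrelated local variable with the same name as a provider type would also be flagged.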
4. Add pre-submit checks to CLAUDE.md
A machine-readable checklist in CLAUDE.md (not a separate file) that any agent or human runs before committing:
## Pre-submit checks
1. go vet ./...
2. go test -race ./...
3. cd debug/frontend && npx tsc --noEmit (requires: npm ci)
Optionally wire up steps 1-2 as a git pre-commit hook later.
5. Add entropy management: doc-drift detection
The article describes agents that run periodically to find inconsistencies in documentation and constraint violations. Lightweight version for this repo:
Add a TestDocDrift test (runs in go test ./...) that validates that CLAUDE.md and docs/design.md stay in sync with reality:
- Every public interface mentioned in CLAUDE.md actually exists in code
- Every provider listed in the dependency matrix has a corresponding *_test.go file
- The Go version stated in CLAUDE.md matches go.mod
This catches the slow rot where docs describe a codebase that no longer exists. Keeping it as a Go test means CI enforces it automatically — no scheduled jobs or extra infrastructure.
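Two of the checks above could be sketched as follows. This is an illustration under assumptions, not the final test: the helper names (goDirective, checkGoVersion, missingTypes) are hypothetical, the regular expression assumes CLAUDE.md states the version on a line containing "Go version", and the module path example.com/agent is a placeholder. Inputs are plain strings so the sketch runs standalone. A real TestDocDrift would load CLAUDE.md, go.mod, and the .go files with os.ReadFile.

```go
package main

import (
	"bufio"
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"regexp"
	"strings"
)

// goDirective returns the version from the `go` directive of a go.mod file.
func goDirective(gomod string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(gomod))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == "go" {
			return fields[1], true
		}
	}
	return "", false
}

// docGoVersion captures the first dotted number that follows "Go version"
// on the same line (assumed phrasing of the instruction file).
var docGoVersion = regexp.MustCompile(`(?i)go version[^0-9\n]*([0-9]+(?:\.[0-9]+)+)`)

// checkGoVersion returns an error when the documented version and the
// go.mod directive disagree, or when either one cannot be found.
func checkGoVersion(doc, gomod string) error {
	want, ok := goDirective(gomod)
	if !ok {
		return fmt.Errorf("go.mod has no go directive")
	}
	m := docGoVersion.FindStringSubmatch(doc)
	if m == nil {
		return fmt.Errorf("doc does not state a Go version")
	}
	if m[1] != want {
		return fmt.Errorf("doc says Go %s, go.mod says %s", m[1], want)
	}
	return nil
}

// missingTypes reports which wanted names are not declared as top-level
// types in the given Go source.
func missingTypes(src string, wanted []string) ([]string, error) {
	file, err := parser.ParseFile(token.NewFileSet(), "src.go", src, parser.SkipObjectResolution)
	if err != nil {
		return nil, err
	}
	declared := map[string]bool{}
	for _, decl := range file.Decls {
		gen, ok := decl.(*ast.GenDecl)
		if !ok || gen.Tok != token.TYPE {
			continue
		}
		for _, spec := range gen.Specs {
			declared[spec.(*ast.TypeSpec).Name.Name] = true
		}
	}
	var missing []string
	for _, name := range wanted {
		if !declared[name] {
			missing = append(missing, name)
		}
	}
	return missing, nil
}

func main() {
	doc := "- Go version: must match go.mod (currently 1.25.5)\n"
	gomod := "module example.com/agent\n\ngo 1.25.6\n"
	fmt.Println(checkGoVersion(doc, gomod)) // reports the 1.25.5 vs 1.25.6 drift

	code := "package agent\n\ntype LLMProvider interface{}\n"
	missing, err := missingTypes(code, []string{"LLMProvider", "ChatInterface"})
	fmt.Println(missing, err) // ChatInterface is not declared in the sample
}
```

How the list of interface names is extracted from CLAUDE.md is left open here. One option is to keep the names in a small, clearly delimited list in that file so the test does not need to parse free-form prose.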
6. Add "agent struggles = missing context" feedback protocol
The article's central mental model: when an agent produces bad output, treat it as a signal that something is missing (docs, guardrails, tools) and feed it back into the repo — don't just fix the output.
Add a section to CLAUDE.md:
## Feeding back agent failures
When an agent (or a human following this guide) makes a repeated mistake:
1. Identify what was missing — unclear invariant? undocumented convention? missing test?
2. Fix the root cause in this file, docs/, or tests — not just the generated code.
3. If a new invariant emerges, add it to TestDependencyLayers or TestDocDrift.
The goal: every class of mistake only happens once.
This is a process practice, not code — but encoding it in the instruction file means agents internalize it too.
Future directions (not in scope)
- Dynamic context providers (observability data, runtime state) — the article emphasizes these but they're relevant at larger scale
- Full development loop encoding (PR templates, review checklists, automated feedback/recovery) — worth revisiting once the basics are in place
- Custom linter framework beyond go vet (repo is small, not worth the config overhead)
- ArchUnit-style dependency checking tools (the structural test covers it without external deps)
- Separate AGENTS.md / CHECKS.md files (everything lives in CLAUDE.md to avoid split guidance)
References