feat: adopt harness engineering practices for agent-first development

## Overview

Adopt key practices from OpenAI's [harness engineering](https://openai.com/index/harness-engineering/) approach to make the repo more agent-friendly. The codebase already has solid docs (`docs/design.md`, `docs/plans/`) and comprehensive tests — but lacks the glue that lets agents (Codex, Claude Code, etc.) self-orient and self-validate.

## Tasks

### 1. Add `CLAUDE.md` as the sole agent instruction file (~100 lines)

The core insight from the article: treat the agent instruction file as a **map, not an encyclopedia**, that is, a short root-level file that points to deeper sources of truth.

> **Note:** No `AGENTS.md` exists in this repo. `CLAUDE.md` is the only agent instruction file — it serves as both the navigational map and the authoritative quick-reference. Keep it tool-agnostic (useful for Codex, Claude Code, Cursor, etc.) and link out to `docs/` for anything longer than a paragraph to avoid frequent churn.

Should include:
- Package layout and key interfaces (`LLMProvider`, `Tool[S]`, `ChatInterface`)
- Dependency layering rule (see task 3 for the full matrix)
- How to run tests: `go test -race ./...`, `cd debug/frontend && npm ci && npx tsc --noEmit`
- Go version: **must match `go.mod` (currently 1.25.6)**
- Key invariants (Message.ImageData is raw bytes, providers are stateless, etc.)
- Pointers to `docs/design.md` and `docs/plans/`
- Feedback loop protocol (see task 6)

### 2. Add CI pipeline (`.github/workflows/ci.yml`)

Agents need a fast feedback loop.

```yaml
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with: { go-version-file: go.mod }
      - run: go vet ./...
      - run: go test -race ./...
```

**CI Go version must come from `go.mod`** (currently 1.25.6) — use `go-version-file: go.mod` rather than hardcoding a version string to avoid drift.

**TypeScript check**: Keep out of CI for now — `npx tsc --noEmit` requires `npm ci` and a Node setup step, adding complexity for a small frontend. Document it in `CLAUDE.md` pre-submit checks instead. Can be added to CI later if the frontend grows.

### 3. Add `TestDependencyLayers` structural test

Mechanically enforce the dependency layering invariant (the article's "enforce invariants, not implementations" pattern).

**Explicit allowed/forbidden import matrix:**

| Source file(s) | May import from root package? | May import concrete providers? | Notes |
|---|---|---|---|
| `provider.go` | No (defines core types) | No | Zero intra-package deps |
| `tool.go`, `chat.go` | Only `provider.go` types | No | |
| `agent.go` (engine) | `provider.go`, `tool.go`, `chat.go` types | **No** (`anthropic.go`, `gemini.go`, `openai.go`) | Engine must stay provider-agnostic |
| `anthropic.go`, `gemini.go`, `openai.go` | Core types only | No cross-provider imports | Each provider is self-contained |
| `debug/` | May import root package | No | |
| Root package | **Must not** import `debug/` | — | |

Since the root is a single flat package (not sub-packages), "imports" in the file-level rows means **function/type references**, not Go import paths; only the `debug/` rows refer to real import paths. The test should use `go/ast` to parse files and verify that, e.g., `agent.go` never references `AnthropicProvider`, `GeminiProvider`, or `OpenAICompatibleProvider`.

This test should run as part of `go test ./...` (no build tags) so CI catches violations automatically.
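
A minimal sketch of the reference check, not a definitive implementation. It assumes that a plain identifier scan is precise enough for this repo, so any identifier with a forbidden name is flagged, including a local variable that happens to share it. The helper name `forbiddenRefs` and the `package main` clause are placeholders; the real test would live in the repo's own package.

```go
package main

import (
	"go/ast"
	"go/parser"
	"go/token"
	"sort"
)

// forbiddenRefs parses src as a Go file and returns, sorted, every name in
// forbidden that occurs as an identifier. Mentions inside comments or string
// literals are not identifiers and are therefore ignored.
// (Hypothetical helper; the name is illustrative only.)
func forbiddenRefs(filename string, src []byte, forbidden []string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, parser.SkipObjectResolution)
	if err != nil {
		return nil, err
	}

	deny := make(map[string]bool, len(forbidden))
	for _, name := range forbidden {
		deny[name] = true
	}

	found := map[string]bool{}
	ast.Inspect(file, func(n ast.Node) bool {
		if id, ok := n.(*ast.Ident); ok && deny[id.Name] {
			found[id.Name] = true
		}
		return true // keep walking the whole file
	})

	out := make([]string, 0, len(found))
	for name := range found {
		out = append(out, name)
	}
	sort.Strings(out)
	return out, nil
}
```

`TestDependencyLayers` could then read `agent.go` with `os.ReadFile`, call `forbiddenRefs` with the three provider type names, and fail via `t.Errorf` for each name returned. The other rows of the matrix would be further (file, forbidden names) pairs in a table-driven loop.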

### 4. Add pre-submit checks to `CLAUDE.md`

A machine-readable checklist in `CLAUDE.md` (not a separate file) that any agent or human runs before committing:

```markdown
## Pre-submit checks
1. go vet ./...
2. go test -race ./...
3. cd debug/frontend && npx tsc --noEmit  (requires: npm ci)
```

Optionally wire up steps 1-2 as a git pre-commit hook later.

### 5. Add entropy management: doc-drift detection

The article describes **agents that run periodically to find inconsistencies** in documentation and constraint violations. Lightweight version for this repo:

Add a `TestDocDrift` test (runs in `go test ./...`) that validates `CLAUDE.md` and `docs/design.md` stay in sync with reality:

- Every public interface mentioned in `CLAUDE.md` actually exists in code
- Every provider listed in the dependency matrix has a corresponding `*_test.go` file
- The Go version stated in `CLAUDE.md` matches `go.mod`

This catches the slow rot where docs describe a codebase that no longer exists. Keeping it as a Go test means CI enforces it automatically — no scheduled jobs or extra infrastructure.
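
A rough sketch of the Go-version check from the list above, under two assumptions: `CLAUDE.md` states the version on a line beginning with `Go version:`, and a regular expression is good enough to read the `go` directive (a real implementation might prefer `golang.org/x/mod/modfile`). The names `versionDrift`, `goDirective`, and `docVersion` are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// goDirective matches the "go 1.25.6" line in go.mod.
var goDirective = regexp.MustCompile(`(?m)^go[ \t]+(\d+(?:\.\d+){1,2})[ \t]*$`)

// docVersion matches the first version number that follows "Go version:"
// on the same line (assumed wording; adjust to the actual CLAUDE.md text).
var docVersion = regexp.MustCompile(`(?i)go version:[^\d\n]*(\d+(?:\.\d+){1,2})`)

// versionDrift returns nil when the Go version stated in doc equals the
// go directive in goMod, and a descriptive error otherwise.
func versionDrift(goMod, doc string) error {
	mod := goDirective.FindStringSubmatch(goMod)
	if mod == nil {
		return errors.New("no go directive found in go.mod")
	}
	stated := docVersion.FindStringSubmatch(doc)
	if stated == nil {
		return errors.New("no Go version found in doc")
	}
	if mod[1] != stated[1] {
		return fmt.Errorf("doc says Go %s but go.mod says %s", stated[1], mod[1])
	}
	return nil
}
```

`TestDocDrift` could load both files with `os.ReadFile`, pass their contents to `versionDrift`, and call `t.Error` on a non-nil result. The interface and `*_test.go` checks would follow the same pattern: extract names from the doc, then assert they exist on disk or in the parsed source.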

### 6. Add "agent struggles = missing context" feedback protocol

The article's central mental model: **when an agent produces bad output, treat it as a signal that something is missing** (docs, guardrails, tools) and feed it back into the repo — don't just fix the output.

Add a section to `CLAUDE.md`:

```markdown
## Feeding back agent failures

When an agent (or a human following this guide) makes a repeated mistake:
1. Identify what was missing — unclear invariant? undocumented convention? missing test?
2. Fix the root cause in this file, docs/, or tests — not just the generated code.
3. If a new invariant emerges, add it to TestDependencyLayers or TestDocDrift.

The goal: every class of mistake only happens once.
```

This is a process practice, not code — but encoding it in the instruction file means agents internalize it too.

## Future directions (not in scope)

- **Dynamic context providers** (observability data, runtime state) — the article emphasizes these but they're relevant at larger scale
- **Full development loop encoding** (PR templates, review checklists, automated feedback/recovery) — worth revisiting once the basics are in place
- Custom linter framework beyond `go vet` (repo is small, not worth the config overhead)
- ArchUnit-style dependency-checking library (the structural test covers it without external deps)
- Separate `AGENTS.md` / `CHECKS.md` files (everything lives in `CLAUDE.md` to avoid split guidance)

## References

- [Harness engineering: leveraging Codex in an agent-first world | OpenAI](https://openai.com/index/harness-engineering/)
- [Custom instructions with AGENTS.md | OpenAI Developers](https://developers.openai.com/codex/guides/agents-md/)
- [Harness Engineering | Martin Fowler](https://martinfowler.com/articles/exploring-gen-ai/harness-engineering.html)
