epic: agent-reviewed contracts — LLM validation gates with rework feedback #697

## Vision

Transform Wave's contract system from mechanical format checks into real quality gates where a **separate agent** reviews work products, provides structured feedback, and drives rework loops. This closes the gap between "output matches schema" and "output is actually correct."

Today a pipeline step can pass all of its contracts (valid JSON schema, green tests) while producing a no-op PR or a wrong implementation, or while missing the point of the issue entirely. The contract system validates **shape**, not **substance**.

### Target state

```yaml
handover:
  contracts:
    # Mechanical checks run first (fast, cheap)
    - type: test_suite
      command: "{{ project.contract_test_command }}"
      on_failure: retry

    # Agent review runs second (slower, costs tokens, but catches quality issues)
    - type: agent_review
      reviewer: navigator
      model: claude-haiku
      criteria_path: .wave/contracts/impl-review-criteria.md
      context:
        - artifact: assessment
        - artifact: impl-plan
        - source: git_diff
      on_failure: rework
      rework_step: fix-implement
```

The reviewer agent sees the git diff, upstream context (issue, plan), and review criteria. It outputs a structured verdict. If rework is needed, the feedback is injected as an artifact into the rework step — the implementer knows exactly what to fix.

---

## Principles

1. **Separation of concerns** — the agent that did the work never reviews its own work
2. **Cheap first, expensive second** — mechanical checks (schema, tests) run before agent review to avoid wasting tokens on obviously broken output
3. **Feedback flows forward** — review feedback is a first-class artifact, not a log message
4. **Invisible to simple pipelines** — existing `json_schema` / `test_suite` contracts work unchanged. `agent_review` is opt-in per step
5. **Cost-bounded** — agent reviews have token budgets and model pinning (haiku by default)

---

## Child Issues

### 1. Expand ContractResult with structured feedback

**Scope**: `internal/contract/`

Extend `ContractResult` to carry rich review output alongside the existing pass/fail:

```go
type ContractResult struct {
    Pass     bool
    Error    string
    Feedback *ReviewFeedback  // nil for mechanical contracts
}

type ReviewFeedback struct {
    Verdict     string   `json:"verdict"`      // "pass", "rework", "fail"
    Issues      []Issue  `json:"issues"`       // specific problems
    Suggestions []string `json:"suggestions"`  // improvement ideas
    Confidence  float64  `json:"confidence"`   // 0.0-1.0
}

type Issue struct {
    Severity string `json:"severity"` // "critical", "major", "minor"
    File     string `json:"file,omitempty"`
    Detail   string `json:"detail"`
}
```

Backward-compatible — all existing validators return `Feedback: nil`. The executor checks `Feedback` only when non-nil.

**Acceptance criteria:**
- [ ] `ContractResult` has optional `Feedback` field
- [ ] Existing contract types unaffected (return nil feedback)
- [ ] `ReviewFeedback` has JSON tags for serialization
- [ ] Executor logs feedback when present
- [ ] All existing tests pass unchanged
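To make the backward-compatibility rule concrete, here is a minimal sketch of how the executor might log a result, assuming a hypothetical helper `logLines`. Mechanical contracts (`Feedback == nil`) produce the same single line as today; review results add a verdict line and one line per issue. The types repeat the proposal above, and the hypothetical `feedbackJSON` only shows what the JSON tags serialize to.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type ContractResult struct {
	Pass     bool
	Error    string
	Feedback *ReviewFeedback // nil for mechanical contracts
}

type ReviewFeedback struct {
	Verdict     string   `json:"verdict"`
	Issues      []Issue  `json:"issues"`
	Suggestions []string `json:"suggestions"`
	Confidence  float64  `json:"confidence"`
}

type Issue struct {
	Severity string `json:"severity"`
	File     string `json:"file,omitempty"`
	Detail   string `json:"detail"`
}

// logLines (hypothetical) renders a result for the executor log.
// Mechanical contracts yield exactly one line, as before.
func logLines(contract string, r ContractResult) []string {
	status := "pass"
	if !r.Pass {
		status = "fail: " + r.Error
	}
	lines := []string{fmt.Sprintf("contract %s: %s", contract, status)}
	if r.Feedback == nil {
		return lines // mechanical contract: nothing more to log
	}
	lines = append(lines, fmt.Sprintf("  verdict=%s confidence=%.2f", r.Feedback.Verdict, r.Feedback.Confidence))
	for _, iss := range r.Feedback.Issues {
		lines = append(lines, fmt.Sprintf("  [%s] %s", iss.Severity, iss.Detail))
	}
	return lines
}

// feedbackJSON (hypothetical) shows the serialized form of the feedback.
func feedbackJSON(fb ReviewFeedback) (string, error) {
	b, err := json.Marshal(fb)
	return string(b), err
}
```

One design point this surfaces: a nil `Suggestions` slice serializes as `null`, not `[]`, so consumers either tolerate `null` or the review path normalizes nil slices before writing.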

---

### 2. Implement `agent_review` contract validator

**Scope**: `internal/contract/agent_review.go` (new file)

New contract type that spawns a lightweight agent to review step output.

**Input to the reviewer:**
- Git diff of the step's worktree changes
- Step output artifacts
- Upstream artifacts specified in `context` config
- Review criteria from `criteria_path`
- A system prompt enforcing structured JSON output

**Output:** `ReviewFeedback` JSON parsed from the agent's response.
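Even with a system prompt that demands JSON, models sometimes wrap the object in prose. A minimal parsing sketch, assuming a hypothetical `parseReviewFeedback` that receives the agent's raw text: it decodes the outermost `{...}` span and rejects verdicts, severities, or confidence values outside the documented ranges, so a malformed review surfaces as an error instead of a silent pass. The first-brace/last-brace extraction is a heuristic and would misfire if the surrounding prose itself contained braces.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

type Issue struct {
	Severity string `json:"severity"`
	File     string `json:"file,omitempty"`
	Detail   string `json:"detail"`
}

type ReviewFeedback struct {
	Verdict     string   `json:"verdict"`
	Issues      []Issue  `json:"issues"`
	Suggestions []string `json:"suggestions"`
	Confidence  float64  `json:"confidence"`
}

var (
	validVerdicts   = map[string]bool{"pass": true, "rework": true, "fail": true}
	validSeverities = map[string]bool{"critical": true, "major": true, "minor": true}
)

// parseReviewFeedback (hypothetical) extracts and validates the verdict
// object from the reviewer's raw response.
func parseReviewFeedback(raw string) (ReviewFeedback, error) {
	var fb ReviewFeedback
	start := strings.Index(raw, "{")
	end := strings.LastIndex(raw, "}")
	if start < 0 || end < start {
		return fb, errors.New("no JSON object in reviewer output")
	}
	if err := json.Unmarshal([]byte(raw[start:end+1]), &fb); err != nil {
		return fb, fmt.Errorf("decode reviewer JSON: %w", err)
	}
	if !validVerdicts[fb.Verdict] {
		return fb, fmt.Errorf("unknown verdict %q", fb.Verdict)
	}
	for _, iss := range fb.Issues {
		if !validSeverities[iss.Severity] {
			return fb, fmt.Errorf("unknown severity %q", iss.Severity)
		}
	}
	if fb.Confidence < 0 || fb.Confidence > 1 {
		return fb, fmt.Errorf("confidence %v outside [0, 1]", fb.Confidence)
	}
	return fb, nil
}
```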

**Configuration:**
```yaml
type: agent_review
reviewer: navigator           # persona name — MUST differ from step persona
model: claude-haiku           # model override (cheap by default)
criteria_path: .wave/contracts/review.md  # review prompt
context:                      # what the reviewer sees
  - artifact: assessment      # from named prior step artifact
  - artifact: impl-plan
  - source: git_diff          # automatic: worktree diff
max_tokens: 8192              # budget cap for the review
timeout: 120                  # seconds
```

**Key constraint:** The reviewer persona must differ from the step's persona. This should be enforced when the pipeline is parsed, so a misconfigured step is rejected before it runs rather than after the work is done.
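A sketch of that parse-time check, assuming a hypothetical `AgentReviewConfig` struct mirroring the YAML keys above (the `context` list is omitted) and a hypothetical `validateAgentReview` hook called by the pipeline loader. The `claude-haiku` default follows the cost-bounded principle; defaulting `max_tokens` to 8192 and `timeout` to 120 seconds simply reuses the example values and is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// AgentReviewConfig (hypothetical) mirrors the agent_review YAML keys.
type AgentReviewConfig struct {
	Reviewer     string `yaml:"reviewer"`
	Model        string `yaml:"model"`
	CriteriaPath string `yaml:"criteria_path"`
	MaxTokens    int    `yaml:"max_tokens"`
	Timeout      int    `yaml:"timeout"` // seconds
}

// validateAgentReview is meant to run at pipeline load time. It rejects
// self-review and fills in defaults for omitted fields.
func validateAgentReview(stepPersona string, cfg *AgentReviewConfig) error {
	if cfg.Reviewer == "" {
		return errors.New("agent_review: reviewer is required")
	}
	if cfg.Reviewer == stepPersona {
		return fmt.Errorf("agent_review: reviewer %q must differ from the step persona", cfg.Reviewer)
	}
	if cfg.MaxTokens < 0 || cfg.Timeout < 0 {
		return errors.New("agent_review: max_tokens and timeout must be non-negative")
	}
	if cfg.Model == "" {
		cfg.Model = "claude-haiku" // cheap model by default
	}
	if cfg.MaxTokens == 0 {
		cfg.MaxTokens = 8192 // assumed default, taken from the example config
	}
	if cfg.Timeout == 0 {
		cfg.Timeout = 120 // assumed default, taken from the example config
	}
	return nil
}
```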

**Acceptance criteria:**
- [ ] `agent_review` registered as a contract type
- [ ] Reviewer sees git diff + configured context artifacts
- [ ] Structured `ReviewFeedback` parsed from agent output
- [ ] Token budget enforced (`max_tokens`)
- [ ] Timeout enforced
- [ ] Error if reviewer persona == step persona
- [ ] If the reviewer agent is unavailable, falls back to `ContractResult{Pass: true}` (fail-open); fail-open vs. fail-closed is configurable

---

### 3. Wire adapter runner into contract validation

**Scope**: `internal/pipeline/executor.go`, `internal/contract/`

Currently `validateContract()` is purely mechanical — no adapter access. The `agent_review` type needs to spawn a subprocess.

**Change:** Pass the `AdapterRunner` into the contract validation path as an optional dependency. Existing types ignore it.

```go
type ValidatorContext struct {
    Runner       adapter.AdapterRunner  // for agent_review
    WorkspaceDir string                 // for git_diff
    ArtifactDir  string                 // for context injection
    Manifest     *manifest.Manifest     // for persona resolution
}
```

**Acceptance criteria:**
- [ ] Contract validators receive `ValidatorContext` with optional adapter
- [ ] Existing validators (`json_schema`, `test_suite`, etc.) unchanged
- [ ] `agent_review` validator uses the adapter to spawn review agent
- [ ] Review agent runs in the step's worktree (read-only)
- [ ] Review agent tokens tracked in cost ledger

---

### 4. Feed review feedback into rework steps

**Scope**: `internal/pipeline/executor.go` — rework step creation path

When `on_failure: rework` triggers after an `agent_review` failure, the rework step should receive the full `ReviewFeedback` as an injected artifact, not just an error string.

**Change:** In the rework step creation path, if `ContractResult.Feedback` is non-nil:
1. Write `ReviewFeedback` to `.wave/artifacts/review-feedback.json`
2. Inject it into the rework step's artifact context
3. The rework step's prompt can reference specific issues

**Rework step sees:**
```
The implementation was reviewed by navigator and needs rework.

Review verdict: rework
Issues:
  - [critical] Missing error handling in handlers_compare.go:45
  - [major] Tests don't cover the empty-state path
Suggestions:
  - Add a table-driven test for the comparison edge cases

Full review: .wave/artifacts/review-feedback.json
```

**Acceptance criteria:**
- [ ] Review feedback written to `.wave/artifacts/review-feedback.json` on rework
- [ ] Rework step receives feedback as injected artifact
- [ ] Rework prompt template includes review issues
- [ ] Feedback artifact cleaned up after successful rework

---

### 5. Add `git_diff` as automatic context source

**Scope**: `internal/workspace/`, `internal/contract/`

The reviewer needs to see what the step changed. Add a workspace method to produce the git diff, and wire it as an automatic context source for `agent_review`.

```go
// In workspace package
func (w *Workspace) Diff() (string, error) {
    // git diff HEAD in the worktree
}
```

When `context` includes `source: git_diff`, the contract validator calls `workspace.Diff()` and includes the output in the reviewer's context.

**Acceptance criteria:**
- [ ] `Workspace.Diff()` returns the uncommitted diff in the worktree
- [ ] Handles clean worktrees (returns empty diff, not error)
- [ ] Diff truncated at a configurable limit (default 50KB) so it cannot overflow the reviewer's context window
- [ ] Available as `source: git_diff` in `agent_review` context config
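A minimal sketch of the diff source, assuming a hypothetical standalone `gitDiff(dir, limit)` rather than the `Workspace` method, whose fields this issue does not specify; the 50KB default would be passed in as `limit`. It runs `git diff --no-color --no-ext-diff HEAD` in the worktree, which covers staged and unstaged edits and prints nothing (exit status 0) on a clean tree, then truncates at the last complete line under the byte limit without splitting a UTF-8 sequence. One caveat: `git diff HEAD` does not show untracked files, so files newly created by the step stay invisible to the reviewer unless they are staged (e.g. `git add -A`) before the diff is taken.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
	"unicode/utf8"
)

const truncationNote = "\n[diff truncated]\n"

// gitDiff (hypothetical) returns the worktree's changes against HEAD,
// truncated to at most limit bytes plus the truncation note.
// limit <= 0 disables truncation.
func gitDiff(dir string, limit int) (string, error) {
	cmd := exec.Command("git", "diff", "--no-color", "--no-ext-diff", "HEAD")
	cmd.Dir = dir
	out, err := cmd.Output()
	if err != nil {
		return "", fmt.Errorf("git diff HEAD in %s: %w", dir, err)
	}
	diff, _ := truncateDiff(string(out), limit)
	return diff, nil
}

// truncateDiff cuts at the last complete line within limit so the reviewer
// never sees half a line; with no newline in range it falls back to the
// nearest UTF-8 rune boundary. It reports whether anything was cut.
func truncateDiff(diff string, limit int) (string, bool) {
	if limit <= 0 || len(diff) <= limit {
		return diff, false
	}
	cut := strings.LastIndexByte(diff[:limit], '\n') + 1
	if cut == 0 {
		cut = limit
		for cut > 0 && !utf8.RuneStart(diff[cut]) {
			cut--
		}
	}
	return diff[:cut] + truncationNote, true
}
```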

---

### 6. Contract composition — run multiple contracts in sequence

**Scope**: `internal/contract/`, `internal/pipeline/executor.go`

Today a step has one contract. For agent review to work properly, steps need **multiple contracts** that run in order: mechanical first, agent review second.

```yaml
handover:
  contracts:  # plural — ordered list
    - type: test_suite
      command: "{{ project.contract_test_command }}"
      on_failure: retry
    - type: agent_review
      reviewer: navigator
      on_failure: rework
```

The executor runs contracts sequentially. If an early contract fails, later ones are skipped. Each contract can have its own `on_failure` policy.

**Acceptance criteria:**
- [ ] `handover.contracts` (plural) accepted alongside existing `handover.contract` (singular)
- [ ] Contracts run in definition order
- [ ] Early failure skips remaining contracts
- [ ] Each contract has independent `on_failure` policy
- [ ] Singular `contract` still works (backward-compatible)

---

### 7. Upgrade Wave's own pipelines with agent review

**Scope**: `.wave/pipelines/`, `.wave/contracts/`

After the infrastructure is in place, upgrade Wave's own pipelines to use `agent_review`:

**Priority pipelines:**
- `impl-issue` — review the implement step's output (diff + tests)
- `impl-speckit` — review at implement + create-pr steps
- `ops-pr-review` — review the review output itself (meta-review)

**Review criteria files to create:**
- `.wave/contracts/impl-review-criteria.md` — does the diff match the plan? tests adequate? no leaked files?
- `.wave/contracts/pr-review-criteria.md` — is the PR description accurate? changes scoped correctly?

**Acceptance criteria:**
- [ ] `impl-issue` implement step uses `agent_review` with navigator
- [ ] `impl-speckit` implement step uses `agent_review`
- [ ] Review criteria files created and tested
- [ ] At least 3 successful pipeline runs with agent review active
- [ ] False-positive rate < 20% (the review sends fewer than 1 in 5 correct implementations back for rework)

---

### 8. Observability — review verdicts in dashboard and retros

**Scope**: `internal/webui/`, `internal/retro/`

Make agent reviews visible in the dashboard and retrospectives:

- **Run detail page**: Show review verdict per step (pass/rework/fail with expandable issues)
- **Retros**: Track `review_rework` as a friction point type (distinct from `retry` and `contract_failure`)
- **Analytics**: Review pass rate per pipeline, average review tokens, rework-after-review rate

**Acceptance criteria:**
- [ ] Run detail shows review verdicts inline with step cards
- [ ] Retro friction points include `review_rework` type
- [ ] Analytics tracks review token spend
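A sketch of the per-pipeline aggregation behind those analytics, assuming a hypothetical `ReviewRecord` stored once per review and a hypothetical `summarizeReviews` function. It computes the pass rate and average review tokens, and interprets "rework-after-review rate" as the share of reviews that returned a `rework` verdict, which is an assumption.

```go
package main

// ReviewRecord (hypothetical) is one agent review as stored for analytics.
type ReviewRecord struct {
	Pipeline string
	Verdict  string // "pass", "rework", or "fail"
	Tokens   int    // tokens spent by the reviewer
}

// ReviewStats holds per-pipeline aggregates.
type ReviewStats struct {
	Reviews    int
	PassRate   float64
	ReworkRate float64 // assumed meaning: share of reviews with verdict "rework"
	AvgTokens  float64
}

func summarizeReviews(records []ReviewRecord) map[string]ReviewStats {
	type tally struct{ n, pass, rework, tokens int }
	tallies := map[string]*tally{}
	for _, r := range records {
		t := tallies[r.Pipeline]
		if t == nil {
			t = &tally{}
			tallies[r.Pipeline] = t
		}
		t.n++
		t.tokens += r.Tokens
		switch r.Verdict {
		case "pass":
			t.pass++
		case "rework":
			t.rework++
		}
	}
	out := make(map[string]ReviewStats, len(tallies))
	for p, t := range tallies {
		n := float64(t.n) // t.n >= 1: a tally exists only after a record
		out[p] = ReviewStats{
			Reviews:    t.n,
			PassRate:   float64(t.pass) / n,
			ReworkRate: float64(t.rework) / n,
			AvgTokens:  float64(t.tokens) / n,
		}
	}
	return out
}
```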

---

## Implementation Order

```
1. ContractResult expansion          (issue 1; no dependencies, safe)
2. git_diff context source           (issue 5; no dependencies, safe)
3. Contract composition              (issue 6; depends on 1)
4. Wire adapter into validation      (issue 3; depends on 1)
5. agent_review validator            (issue 2; depends on 2, 3, 4)
6. Rework feedback injection         (issue 4; depends on 5)
7. Upgrade Wave's pipelines          (issue 7; depends on 5, 6)
8. Dashboard + retro observability   (issue 8; depends on 5)
```

Steps 1 and 2 have no dependencies and can start in parallel; steps 3 and 4 can proceed in parallel once step 1 lands. Step 5 (the `agent_review` validator) is the core. Steps 7 and 8 validate the system end-to-end.

---

## Cost Model

Agent reviews add a per-step token cost. The estimates below count reviewer input tokens only, at ~$0.25 per million input tokens for Haiku; output tokens (the JSON verdict) are billed separately and are not included.

| Scenario | Context size | Cost per review | Per-pipeline overhead (3 reviewed steps) |
|----------|-------------|-----------------|------------------------------------------|
| Small diff (< 5KB) | ~3K tokens | ~$0.0008 | ~$0.002 |
| Medium diff (5-20KB) | ~10K tokens | ~$0.0025 | ~$0.008 |
| Large diff (20-50KB) | ~25K tokens | ~$0.006 | ~$0.019 |

At 100 pipeline runs/day with medium diffs: **~$0.75/day** additional cost (100 runs × 3 reviews × $0.0025). Negligible compared to the implementation steps themselves.
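The table's arithmetic as a small sketch, with hypothetical function names: cost is linear in input tokens and in price, so the estimate can be re-run if the pinned Haiku model (and therefore its per-token price) changes. As in the table, output tokens are not counted.

```go
package main

// reviewCostUSD is the input-token cost of one review.
func reviewCostUSD(inputTokens int, usdPerMTok float64) float64 {
	return float64(inputTokens) * usdPerMTok / 1_000_000
}

// dailyCostUSD scales one review's cost by the number of reviewed steps per
// pipeline and the number of pipeline runs per day.
func dailyCostUSD(inputTokens, reviewedSteps, runsPerDay int, usdPerMTok float64) float64 {
	return reviewCostUSD(inputTokens, usdPerMTok) * float64(reviewedSteps*runsPerDay)
}
```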

---

## Non-Goals

- **Replacing mechanical contracts** — `test_suite` and `json_schema` remain. Agent review supplements, doesn't replace
- **Blocking on review for every step** — agent review is opt-in per step, not global
- **Self-review** — the step persona reviewing its own output. This is architecturally prevented
- **Human-in-the-loop review** — that's the existing `gate` mechanism. Agent review is fully automated
