alanshurafa · Jun 13, 2026 · Jun 12, 2026
diff --git a/.planning/ROADMAP.md b/.planning/ROADMAP.md
@@ -10,7 +10,19 @@ Co-Evolution is a tooling repo for structured iterative refinement between AI ag
 - [x] **v1.1 Polish & Ergonomics** (shipped 2026-04-17) — v1.0 code review fixes (WR-01/02/03) + runtime ergonomics (REVISE auto-loop, visible live mode, branch/worktree management). 4 phases, 6 requirements closed. PR [#2](https://github.com/alanshurafa/co-evolution/pull/2) · See [`milestones/v1.1-ROADMAP.md`](milestones/v1.1-ROADMAP.md) · [`milestones/v1.1-SUMMARY.md`](milestones/v1.1-SUMMARY.md) · [`milestones/v1.1-REQUIREMENTS.md`](milestones/v1.1-REQUIREMENTS.md)
 - [x] **v1.3 Reliability, Measurement & Cross-Platform** (shipped 2026-06-11) — stranded-fix landing, macOS/bash-3.2+5.2 portability with 3-OS CI, silent-failure hardening, and the bounce measurement stack (state.json, deterministic scorer + marker-fate ledger, blind judge, human report). Headline: 17.6% deletion-convergence measured; Fable-5 judge 7/7 improved. 9 phases. See [`milestones/v1.3-SUMMARY.md`](milestones/v1.3-SUMMARY.md) · audit at `docs/audits/2026-06-10-v13-audit.md`
 
-## Active Milestone: v1.4 Distribution — npm + MCP (2026-06-11)
+## Active Milestone: v1.5 Build with Codex — model ladder + orchestrated execution (2026-06-12)
+
+**Goal:** Adopt the Codex-execution / Fable-orchestration split (per @cjzafir's pattern) in the dev-review runner: fix 3 latent env-export bugs, add per-seat model/effort config, a `--preset codex-build` shortcut, detached background execution with harness-exit-wakeup, a status-reader script, token capture to measure the 50% cost claim, and a `/codex-build` orchestration skill for both the runner and plugin transports. Design basis: `.planning/v1.5-DESIGN.md` (approved 2026-06-12).
+
+- [ ] **Phase 0: Environment + research** (2026-06-12, in progress) — codex symlink + smoke; plugin install; R1 pin `claude -p --output-format json` envelope; R2 pin codex end-of-run token line; register milestone in .planning/; research notes filed.
+- [ ] **Phase 1: Seat plumbing + env-export correctness** — `lib/co-evolution.sh` effort knobs + `invoke_codex_schema` move (B2); `dev-review.sh` `export CODEX_MODEL` (B1) + `export WORKDIR` (B3) + `--verifier/--claude-model` flags + per-seat env via `apply_seat_env`. Gate: `tests/run-all.sh` green; byte-parity with knobs off.
+- [ ] **Phase 2: Claude-verifier hardening + `--preset codex-build`** — fenced-JSON verdict fallback; preset expansion (fable/high → codex/xhigh → fable/max, bounces=2, revise-loop=1); banner; `tests/preset-expansion-simulation.sh`.
+- [ ] **Phase 3: Runner observability + status reader** — `state.json` additions (`current_phase`, `runner_pid`, `pre/post_execute_sha`, `orchestration.parent_run_id`); new `dev-review-status.sh` (~120 lines, exit codes 0/2/3/4/5); `tests/status-reader-simulation.sh`.
+- [ ] **Phase 4: Token capture** — `CO_EVOLVE_TOKEN_CAPTURE=1` (default off); `invoke_claude` gated JSON mode; codex stderr harvest; `collect_token_usage` → `state.json.tokens`; `tests/token-capture-simulation.sh`.
+- [ ] **Phase 5: `/codex-build` skill + docs** — new `skills/codex-build/SKILL.md` (preflight → plan → kick → wake/gate loop, both runner and plugin transports); CLAUDE.md Default Rule update; routing doc updates.
+- [ ] **Phase 6: Dogfood + evidence** — 2–3 real `/codex-build` tasks (ACCEPT / REVISE→ACCEPT / ESCALATE); token evidence note; MCP parity (`vendor.sh` + `npm test`); memory update.
+
+## Previous Milestone: v1.4 Distribution — npm + MCP (2026-06-11)
 
 **Goal:** Make the bounce protocol invocable without `git clone`: a Node/TS
 MCP server (`@alanshurafa/co-evolution-mcp`, one `co_evolve` tool) published

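Review note on Phase 2 of the roadmap hunk above: the "fenced-JSON verdict fallback" is named but not spelled out. One plausible reading is that the verifier sometimes wraps its verdict JSON in a markdown fence instead of emitting bare JSON, and the runner should accept both. A minimal sketch of that reading, in Python for brevity (the runner itself is shell). `parse_verdict` and the `verdict`/`confidence` keys are hypothetical names for illustration, not taken from the repo.

```python
import json
import re
from typing import Optional

# Three backticks, built this way so the example nests cleanly inside markdown.
FENCE = "`" * 3
# Matches fenced blocks, with or without a "json" language tag.
_FENCE_RE = re.compile(FENCE + r"(?:json)?[ \t]*\r?\n(.*?)" + FENCE, re.DOTALL)


def parse_verdict(raw: str) -> Optional[dict]:
    """Return the verdict object, or None when nothing parseable is found."""
    # First choice: the whole output is bare JSON.
    candidates = [raw.strip()]
    # Fallback: the JSON sits inside one or more markdown fences.
    candidates += [body.strip() for body in _FENCE_RE.findall(raw)]
    for text in candidates:
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    return None


if __name__ == "__main__":
    wrapped = "Review done.\n" + FENCE + 'json\n{"verdict": "APPROVED"}\n' + FENCE
    print(parse_verdict(wrapped))
```

Trying the bare form first keeps behaviour unchanged for output that already parses, so the fallback only comes into play on the previously failing case.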
diff --git a/.planning/STATE.md b/.planning/STATE.md
@@ -1,17 +1,17 @@
 ---
 gsd_state_version: 1.0
-milestone: v1.4
-milestone_name: Distribution — npm + MCP
+milestone: v1.5
+milestone_name: Build with Codex — model ladder + orchestrated execution
 status: executing
-stopped_at: v1.4 Phases 0-4 EXECUTED (2026-06-11). mcp/ package built + 4/4 hermetic smoke tests + stdio handshake verified; CI gains 3-OS mcp job; publish-mcp.yml ready. Remaining = Phase 5 (HUMAN: verify npm scope @alanshurafa, add NPM_TOKEN secret, git tag -> auto-publish, Claude Desktop round-trip) + Phase 6 post-ship registry/awesome-list.
-last_updated: "2026-06-10T23:45:00.000Z"
-last_activity: 2026-06-11 -- v1.4 milestone registered; v1.3 archived
+stopped_at: v1.5 Phases 0-5 EXECUTED (b672145..30a9a00). Phase 6 PARTIAL (2026-06-12) — MCP vendor parity green; codex-verifier degrade path now FULLY GREEN end-to-end: model-leak fix + review-verdict.json schema 400 fix both landed, first real /codex-build ACCEPT produced (subtract-helper task, exit 0, APPROVED conf 96, verify tokens captured). Remaining Phase 6: claude /login (human, for full-ladder + non-zero claude_* tokens), REVISE→ACCEPT row, interactive baseline. v1.4 Phase 5 BLOCKED ON HUMAN (npm scope + NPM_TOKEN + git tag) — running in parallel, untouched.
+last_updated: "2026-06-12T17:43:00.000Z"
+last_activity: 2026-06-12 -- v1.5 Phase 6: review-verdict.json schema 400 fixed; codex-verifier degrade path E2E green (first ACCEPT)
 progress:
-  total_phases: 8
-  completed_phases: 8
-  total_plans: 17
-  completed_plans: 18
-  percent: 100
+  total_phases: 7
+  completed_phases: 6
+  total_plans: 0
+  completed_plans: 0
+  percent: 86
 ---
 
 # Project State
@@ -30,7 +30,18 @@ Milestone: v1.4 Distribution — npm + MCP
 Phase: 0-4 complete; Phase 5 (publish) blocked on human items
 Status: npm scope verification + NPM_TOKEN secret + git tag are Alan's; everything else built and CI-gated. v1.2 SC-4 gate still open (VERIFY-SC4.md).
 Last activity: 2026-06-10 -- Phase 0 merges + LF policy + audit report
-Working directories: `~/co-evolution-v13/` on the Mac (per-machine clone; SMB checkout `/Volumes/Project/co-evolution` is sync-only), `C:/Users/alan/Project/co-evolution-*` on the PC
+
+Milestone: v1.5 Build with Codex — model ladder + orchestrated execution
+Phase: 0-5 EXECUTED; Phase 6 (dogfood + evidence) PARTIAL
+Status: Phases 0-5 shipped on feat/v1.5-codex-build (b672145 Phase 0 env + research · fb6862a Phase 1 seats + B1/B2/B3 fixes · ffe765f Phase 2 verifier hardening + preset · 13c2bee Phase 3 observability + status reader · fb965ad Phase 4 token capture · 30a9a00 Phase 5 /codex-build skill + docs). Phase 6 partial — see below.
+Phase 6 progress (2026-06-12):
+  - MCP vendor parity GREEN: `bash mcp/scripts/vendor.sh` clean; `(cd mcp && npm test)` 4/4 pass. `mcp/vendor/` is gitignored (generated-at-publish via `npm run build:vendor`, NOT checked in) — Phase 1/3/4 lib changes were additive and broke nothing.
+  - First real `/codex-build` dogfood (slugify task, scratch repo under $TMPDIR, `--verifier codex` degrade since headless claude is logged out): execute phase SUCCEEDED (slugify landed, all 4 scratch tests pass), but verify phase ERRORED → runner exit 2, `verdict_present: false`, ESCALATE. codex_total_tokens=21497 (token capture works); claude_* totals=0 (ladder not exercised on the degrade path). One re-kick (`--parent-run`, lineage recorded) hit the same error. NO ACCEPT data point yet.
+  - Real bug found AND FIXED (2026-06-12): the documented `--verifier codex` degrade leaked the preset's `VERIFIER_MODEL=fable` into the codex seat (`apply_seat_env`, dev-review.sh:1370-1372) → codex on a ChatGPT account returned HTTP 400 "The 'fable' model is not supported". Fix = cross-agent leak guard in `apply_seat_env` + `resolve_seat_model_string` (drop a wrong-kind model+effort pair as a unit; codex seat falls back to `codex:(default)@(default)`). Sim scenario (h) added (preset-expansion-simulation.sh now 8/8; run-all 25/25 green). Re-run proof: fable 400 GONE, execute SUCCEEDED, scratch run-tests ALL PASS, codex_total_tokens=17237, wall 54s — BUT verdict still null: the verify seat now hit a SEPARATE, pre-existing schema 400 (`invalid_json_schema`: nested `issues.items` missing `additionalProperties:false` in skills/dev-review/schemas/review-verdict.json). Seat fix proven; degrade path was blocked one layer deeper by the schema bug. Detail in .planning/research/2026-06-12-token-evidence.md.
+  - Schema 400 FIXED (2026-06-12): OpenAI strict structured-output requires `additionalProperties:false` + a `required` list covering EVERY property on EVERY object node. `issues.items` was missing both; top-level `required` omitted `scope_creep_detected`/`iteration_notes`. Tightened all THREE canonical copies identically (schemas/, runners/codex-ps/schemas/, skills/dev-review/schemas/) so the drift guard stays green; shell `validate_review_verdict` is unaffected (stays loose, independent of this file). DEGRADE-PATH E2E NOW GREEN: real /codex-build (subtract-helper task, scratch repo under $TMPDIR, `--verifier codex`, --branch auto) → exit 0, verify OK, **verdict APPROVED conf 96**, verdict.json complete (all 6 strict fields), tokens execute=30606 verify=14485 codex_total=45091, wall 59s, scratch run-tests ALL 4 PASS. First full ACCEPT-path evidence row (degrade path). run-all 25/25 green. First ACCEPT data point logged in .planning/research/2026-06-12-token-evidence.md.
+  - Remaining Phase 6 (HUMAN + follow-up): `claude /login` on this Mac (unblocks full ladder + non-zero claude_* tokens; degrade-path claude_* are 0 by design); REVISE→ACCEPT row still owed (ESCALATE + ACCEPT now evidenced); an interactive `/dev-review` baseline for the 50%-claim denominator.
+Last activity: 2026-06-12 -- Phase 6: schema 400 fixed (3 copies); degrade-path E2E green, first ACCEPT (.planning/research/2026-06-12-token-evidence.md)
+Working directories: `~/Project/co-evolution/` on the Mac (per-machine clone; SMB checkout `/Volumes/Project/co-evolution` is sync-only), `C:/Users/alan/Project/co-evolution-*` on the PC
 
 macOS baseline before v1.3 fixes: scorer-verification 11/14; code-proposer sim 1/16; pr-emitter sim 4/12; template-proposer sim 1/8; revise-loop sim aborts. Root causes: bash 3.2 (mapfile, source <(…)), BSD sed GNU-isms. Target after Phase 0.5: all green on macOS.
 

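Review note on the schema 400 described in the STATE.md hunk above: the rule stated there (every object node needs `additionalProperties: false` and a `required` list covering every property) can be checked mechanically before a schema is sent. The sketch below is an illustration in Python, not the repo's drift guard. `strict_violations` is a hypothetical helper, and the toy schema only borrows the `issues`, `scope_creep_detected` and `iteration_notes` names from the note; `verdict` and `summary` are made up. It only inspects nodes that declare `properties`.

```python
from typing import Iterator


def strict_violations(node: object, path: str = "$") -> Iterator[str]:
    """Yield one message per object node that breaks the strict-mode rules."""
    if isinstance(node, dict):
        props = node.get("properties")
        if isinstance(props, dict):
            # Rule 1: extra keys must be explicitly forbidden.
            if node.get("additionalProperties") is not False:
                yield f"{path}: additionalProperties must be false"
            # Rule 2: every declared property must also be listed as required.
            missing = sorted(set(props) - set(node.get("required", [])))
            if missing:
                yield f"{path}: required omits {missing}"
        # Recurse into every nested value (properties, items, and so on).
        for key, child in node.items():
            yield from strict_violations(child, f"{path}.{key}")
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from strict_violations(child, f"{path}[{index}]")


# Toy schema reproducing the two defects described in the note.
schema = {
    "type": "object",
    "additionalProperties": False,
    "required": ["verdict", "issues"],
    "properties": {
        "verdict": {"type": "string"},
        "issues": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"summary": {"type": "string"}},
            },
        },
        "scope_creep_detected": {"type": "boolean"},
        "iteration_notes": {"type": "string"},
    },
}

if __name__ == "__main__":
    for problem in strict_violations(schema):
        print(problem)
```

Running a check like this over all three schema copies in the test suite would flag a nested node such as `issues.items` before the API rejects it.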
diff --git a/.planning/research/2026-06-12-claude-json-envelope.md b/.planning/research/2026-06-12-claude-json-envelope.md
@@ -0,0 +1,71 @@
+# R1: claude -p JSON Envelope — Field Reference
+
+**Date:** 2026-06-12  
+**Phase:** v1.5 Phase 0  
+**Purpose:** Pin the exact JSON output structure of `claude -p --output-format json` for Phase 4 token-capture parsing.
+
+## Command run
+
+```
+claude -p --output-format json --model claude-haiku-4-5-20251001 "Reply with exactly: PING"
+```
+
+## Verbatim output (not-logged-in state; all usage fields present, token counts zero)
+
+```json
+{"type":"result","subtype":"success","is_error":true,"api_error_status":null,"duration_ms":1103,"duration_api_ms":0,"num_turns":1,"result":"Not logged in · Please run /login","stop_reason":"stop_sequence","session_id":"4642f382-c299-4d72-bcaf-3e7bca396c7d","total_cost_usd":0,"usage":{"input_tokens":0,"cache_creation_input_tokens":0,"cache_read_input_tokens":0,"output_tokens":0,"server_tool_use":{"web_search_requests":0,"web_fetch_requests":0},"service_tier":"standard","cache_creation":{"ephemeral_1h_input_tokens":0,"ephemeral_5m_input_tokens":0},"inference_geo":"","iterations":[],"speed":"standard"},"modelUsage":{},"permission_denials":[],"terminal_reason":"completed","fast_mode_state":"off","uuid":"def45115-84e3-4e92-a601-20e0dae05dda"}
+```
+
+Exit code: 1 (error because not logged in; envelope still emitted on stdout)
+
+## Top-level fields
+
+| Field | Type | Notes |
+|---|---|---|
+| `type` | string | Always `"result"` |
+| `subtype` | string | `"success"` even on error |
+| `is_error` | bool | `true` when auth fails or the API returns an error |
+| `api_error_status` | null/int | HTTP status code on API errors; null here |
+| `duration_ms` | int | Wall time in ms |
+| `duration_api_ms` | int | API time in ms |
+| `num_turns` | int | Number of conversation turns |
+| `result` | string | The model's text output (or error message) |
+| `stop_reason` | string | e.g. `"stop_sequence"`, `"end_turn"` |
+| `session_id` | string | UUID |
+| `total_cost_usd` | float | Total cost in USD (0 when not logged in) |
+| `usage` | object | Token usage breakdown — see below |
+| `modelUsage` | object | Per-model usage breakdown (empty when not logged in) |
+| `permission_denials` | array | Tool permission denial events |
+| `terminal_reason` | string | `"completed"` |
+| `fast_mode_state` | string | `"off"` |
+| `uuid` | string | Run UUID |
+
+## usage subfields
+
+| Field | Type | Notes |
+|---|---|---|
+| `input_tokens` | int | Prompt input tokens |
+| `cache_creation_input_tokens` | int | Cache write tokens |
+| `cache_read_input_tokens` | int | Cache hit tokens |
+| `output_tokens` | int | Response tokens |
+| `server_tool_use.web_search_requests` | int | Web search count |
+| `server_tool_use.web_fetch_requests` | int | Web fetch count |
+| `service_tier` | string | `"standard"` or `"priority"` |
+| `cache_creation.ephemeral_1h_input_tokens` | int | 1-hour ephemeral cache write tokens |
+| `cache_creation.ephemeral_5m_input_tokens` | int | 5-min ephemeral cache write tokens |
+| `inference_geo` | string | Inference geography code |
+| `iterations` | array | Per-iteration usage (for multi-turn) |
+| `speed` | string | `"standard"` |
+
+## Notes for Phase 4
+
+- All token fields live under `.usage`. Phase 4's `invoke_claude` gated JSON mode should extract: `.usage.input_tokens`, `.usage.output_tokens`, `.usage.cache_creation_input_tokens`, `.usage.cache_read_input_tokens`.
+- `.total_cost_usd` is a direct top-level field, not nested.
+- The envelope is always emitted to **stdout** even when `is_error=true`.
+- The exit code is 1 on auth error, so Phase 4 must handle a non-zero exit that still carries a usable envelope (capture stdout regardless of exit code, e.g. with `|| true`).
+- **Limitation:** This capture is from a not-logged-in shell. When logged in, `modelUsage` should be populated and `iterations` may carry a per-turn breakdown. Field names are expected to be stable across auth state, but that has not been confirmed against a logged-in capture.
+
+## Auth status at capture time
+
+`claude whoami` returns: `Not logged in · Please run /login`  
+(The Mac's interactive Claude Code sessions authenticate through the Electron app, not this shell. The sub-agent shell does not carry the session token.)
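
Review note on the "Notes for Phase 4" list in the envelope reference above: the extraction it describes is small enough to sketch. The following is an illustration in Python under stated assumptions, not the planned `invoke_claude` code. `extract_usage` and `TOKEN_FIELDS` are hypothetical names, and the sample is a trimmed copy of the not-logged-in capture.

```python
import json
from typing import Any, Dict

# The four token counters the note says Phase 4 should read from `.usage`.
TOKEN_FIELDS = (
    "input_tokens",
    "output_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
)


def extract_usage(stdout: str) -> Dict[str, Any]:
    """Pull token counts, cost, and the error flag out of one envelope."""
    envelope = json.loads(stdout)
    usage = envelope.get("usage") or {}
    # Missing counters default to 0 so a partial envelope still yields a row.
    summary: Dict[str, Any] = {name: int(usage.get(name, 0)) for name in TOKEN_FIELDS}
    # total_cost_usd is a top-level field, not nested under usage.
    summary["total_cost_usd"] = float(envelope.get("total_cost_usd", 0))
    summary["is_error"] = bool(envelope.get("is_error", False))
    return summary


# Trimmed copy of the not-logged-in capture shown in the reference.
sample = (
    '{"type":"result","subtype":"success","is_error":true,"total_cost_usd":0,'
    '"usage":{"input_tokens":0,"cache_creation_input_tokens":0,'
    '"cache_read_input_tokens":0,"output_tokens":0}}'
)

if __name__ == "__main__":
    print(extract_usage(sample))
```

Because the envelope arrives on stdout even when the process exits 1, the caller has to read stdout before acting on the exit status. Carrying `is_error` along lets the token recorder tell an all-zero auth failure apart from a real run.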