From f329d71516086cdc94292871491967a9e65f6f4d Mon Sep 17 00:00:00 2001 From: Taeil Ma Date: Sun, 21 Jun 2026 12:18:17 +0900 Subject: [PATCH] =?UTF-8?q?feat(v1.12.0):=20capture=20fidelity=20=E2=80=94?= =?UTF-8?q?=20capture=5Fstates=20so=20the=20critic=20sees=20where=20each?= =?UTF-8?q?=20dimension=20manifests?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Dogfooding v1.11.0 on f1 (10 grounded rounds) surfaced the next maze variant: measurable improvement is bounded by feedback fidelity. The 5-G critic saw only one low-speed cockpit frame, so the chase view (clearcoat hero car + HDRI), sky, speed effects, and slip-angle physics were invisible and the grade stalled. capture_states drives the artifact into the states where each dimension shows; unreachable state = capture_failure (null), not a low score. plugin 1.11.0→1.12.0; verify.py 64. Co-Authored-By: Claude Opus 4.8 --- .claude-plugin/plugin.json | 2 +- CHANGELOG.md | 23 +++++++++++++++++++++++ config.example.json | 2 ++ skills/evolve/SKILL.md | 20 +++++++++++++++++++- 4 files changed, 45 insertions(+), 2 deletions(-) diff --git a/.claude-plugin/plugin.json b/.claude-plugin/plugin.json index de55683..13a7a13 100644 --- a/.claude-plugin/plugin.json +++ b/.claude-plugin/plugin.json @@ -1,7 +1,7 @@ { "name": "ooda-loop", "displayName": "OODA-loop", - "version": "1.11.0", + "version": "1.12.0", "description": "An autonomous operations layer for your live side project. It watches, re-orients from which PRs you merge and reject, and opens small revertible PRs — bounded by a HALT file, protected paths, and a hard cost cap. Built on Boyd's OODA loop. You stay in command.", "author": { "name": "Taeil Ma", diff --git a/CHANGELOG.md b/CHANGELOG.md index 1f86558..4cea124 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,6 +8,29 @@ independently. Bump there signals migration work for downstream projects. --- +## [v1.12.0] — 2026-06-21 + +### Added — capture fidelity (the feedback-coverage fix) + +Dogfooding v1.11.0 on the f1 game (10 research-grounded grade→ground→regrade +rounds, honest re-grade 0.31 → ~0.41 stills / ~0.58 true) surfaced the next maze +variant: **measurable improvement is bounded by FEEDBACK FIDELITY.** The 5-G +critic captured ONE low-speed first-person cockpit frame (the car kept stopping at +the start gantry), so a chase camera (where the clearcoat hero car + HDRI +reflections live), the HDRI sky, speed motion-blur/FOV, and slip-angle physics — +all real, gated, merged — were INVISIBLE to the grader. The grade stalled and the +critic kept steering effort toward only-what-was-in-that-one-frame. + +- **`capture_states` (evolve 5-G + config).** A dimension is only gradeable in the + state(s) where it MANIFESTS; capture each dimension by driving the artifact INTO + those states (chase view for paint, high-speed for sense-of-speed, pitched-up for + sky) and pass the critic ALL frames. A state that can't be reached is a + capture_failure (null), never a low score — don't grade a dimension from a frame + that structurally can't show it. + +Spec/config only (verify.py unchanged at 64). plugin 1.11.0→1.12.0. +Builds on v1.8.0 per-dimension `capture_method` + v1.11.0 reference grounding. + ## [v1.11.0] — 2026-06-21 ### Added — Research-Grounded OODA (the anti-maze methodology) diff --git a/config.example.json b/config.example.json index 0cafde9..11faab0 100644 --- a/config.example.json +++ b/config.example.json @@ -281,6 +281,8 @@ "name": "visual_fidelity", "weight": 0.25, "capture_method": "screenshot", + "__capture_states_doc__": "v1.12.0 (feedback fidelity) — a dimension is only gradeable in the state(s) where it MANIFESTS; a single default frame under-credits + misdirects the loop. List the states to drive the artifact into before capturing; the critic gets ALL frames. The f1 probe earned this: a chase camera / HDRI sky / speed blur / slip-angle physics were all added but every capture was one low-speed cockpit frame, so the work was invisible and the grade stalled. A state that can't be reached = capture_failure (null), not a low score.", + "capture_states": ["chase view, car stationary on a straight", "chase view at ~80% top speed", "camera pitched up to show the sky/horizon"], "description": "3D visual quality vs SHIPPED games. Score against the reference anchors, not the artifact's past.", "reference": { "score_0.10": "flat-shaded primitive meshes, solid-colour sky, no shadows/post-processing (a 1990s look)", diff --git a/skills/evolve/SKILL.md b/skills/evolve/SKILL.md index 03acfbb..67f8cf2 100644 --- a/skills/evolve/SKILL.md +++ b/skills/evolve/SKILL.md @@ -1908,7 +1908,25 @@ for dim in rubric.dimensions: if not independent: dim_artifact[dim] = null (capture_failure); continue dim_artifact[dim] = run_protected_harness(dim.gameplay_metrics_command) -- metrics JSON else if method == "screenshot": - dim_artifact[dim] = the shared screenshot (captured once) + -- CAPTURE COVERAGE (v1.12.0 — feedback fidelity). A SINGLE default frame + -- under-credits and MISDIRECTS: a dimension is only gradeable in the state(s) + -- where it MANIFESTS. The f1 probe proved this — 10 grounded rounds added a + -- chase camera (where the clearcoat hero car + HDRI reflections live), real + -- HDRI sky, speed motion-blur/FOV, and slip-angle physics, but every capture + -- was one LOW-SPEED COCKPIT frame facing a wall, so NONE of that work was + -- visible to the critic → the still-grade stalled (~0.41) while the true + -- quality (~0.58) was far higher, and the critic kept steering effort toward + -- only-what-was-in-that-frame. Rule: capture each dimension in the state(s) + -- where it shows. dim.capture_states (list) enumerates them, e.g. + -- visual_paint: ["chase_view at speed 0", "chase_view at speed 0.8"] + -- sense_of_speed:["cockpit at speed 0.9 on a straight"] + -- sky/lighting: ["camera pitched up to the horizon"] + -- Drive the artifact INTO each state (set the view/speed/scenario) before the + -- frame; pass the critic ALL of a dimension's frames. A dimension whose state + -- can't be reached is a CAPTURE_FAILURE (null), NOT a low score — never grade + -- a dimension from a frame that structurally cannot show it. + states = dim.capture_states or rubric.capture_states or ["default"] + dim_artifact[dim] = [ capture_in(state) for state in states ] -- 1+ frames else: dim_artifact[dim] = run dim.capture_command (or rubric.capture_command) -- If a dimension's harness is MISSING entirely, score it null (capture_failure) +