
feat(v1.8.0): drive quality to good — fix thrashing guard, per-dimension capture, dimension-lock, remainder #72

Merged: mataeil merged 2 commits into main from feat/v1.8.0-quality-loop on Jun 19, 2026

mataeil commented Jun 19, 2026

The F1 probe stayed crap after three v1.7.0 leaps (artifact 0.394 → 0.447 → 0.472 → 0.522, never reaching bar 0.65). A 13-agent adversarially-verified diagnosis (.claude/ooda-evolution-v1.8.0.md) found the loop detects a quality gap but isn't built to close it.

Root cause

  1. ~45% of the rubric is frozen: driving_feel + fun_challenge were scored from a still screenshot (unmeasurable) and sat unchanged across all 25 cycles.
  2. Settles for +0.05 and rotates targets instead of driving one dimension to bar.
  3. Thrashing guard silently broken — read a nonexistent leap_delta field → fails always 0 → HALT never fired → could thrash forever.
  4. Accepts partial implementation of its own leap plans (leap 3's materials/lighting silently vanished).

The devil's-advocate agent confirmed the leap routing is fine — the binding constraint is perception, not more leap machinery.
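Root cause 3 is easy to miss in review, so here is a minimal sketch of how such a guard can fail silently. It is not the project's code: the state layout, the `target` key on each attempt, the thresholds, and the function signatures are all assumptions made for illustration. Only the field names `leap_delta`, `weakest_dimension`, `leap_attempts[].delta_score`, and `leap_target` come from this PR.

```python
from typing import Any, Dict, List

MIN_GAIN = 0.0   # assumption: a leap "fails" when it does not raise the score
MAX_FAILED = 3   # assumption: HALT after this many failed leaps on one target


def failed_leaps_buggy(state: Dict[str, Any]) -> int:
    """Guessed shape of the old guard: it reads a field nothing ever writes."""
    weakest = state.get("weakest_dimension") or {}
    delta = weakest.get("leap_delta")  # always None, the field does not exist
    return 0 if delta is None else int(delta <= MIN_GAIN)


def failed_leaps(state: Dict[str, Any], min_gain: float = MIN_GAIN) -> int:
    """Count recorded attempts on the current leap_target that did not improve it."""
    target = state["leap_target"]
    attempts: List[Dict[str, Any]] = state.get("leap_attempts", [])
    return sum(
        1
        for attempt in attempts
        if attempt["target"] == target and attempt["delta_score"] <= min_gain
    )


def should_halt(state: Dict[str, Any], max_failed: int = MAX_FAILED) -> bool:
    """HALT once the failed-leap count on the target reaches the limit."""
    return failed_leaps(state) >= max_failed


# Made-up state for illustration only.
state = {
    "leap_target": "fun_challenge",
    "weakest_dimension": {"name": "fun_challenge", "score": 0.40},
    "leap_attempts": [
        {"target": "visual_fidelity", "delta_score": 0.04},
        {"target": "fun_challenge", "delta_score": 0.0},
        {"target": "fun_challenge", "delta_score": 0.0},
        {"target": "fun_challenge", "delta_score": 0.0},
    ],
}

print(failed_leaps_buggy(state))  # 0: the missing field hides every failure
print(failed_leaps(state))        # 3
print(should_halt(state))         # True
```

The point of the sketch is that a lookup with a silent fallback turns a schema mismatch into a plausible-looking zero, which is why the guard never tripped.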

The 4 fixes (ranked)

  1. Thrashing-guard bug fix (prerequisite) — count leap_attempts[].delta_score on leap_target (rubric_score.failed_leaps()).
  2. Per-dimension capture_method (5-G) — experiential axes use a human-authored, hash-verified, protected gameplay_metrics harness; missing → null + skill_gap, never faked.
  3. Dimension lock until bar (2-G) — keep leaping the same below-bar target (rubric_score.lock_target(), config.leap.lock_until_bar).
  4. Auto-queue remainder (5-G) — critic-driven, so dropped scope can't be orphaned.
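Fixes 3 and 4 can be sketched together as below. This is a hedged illustration, not the shipped `rubric_score.lock_target()`: the signature, the `LeapConfig` class, the `("leap" | "halt" | "done", target)` return shape, the tolerance and max-attempts values, and the `queue_remainder` helper are all hypothetical. Only `lock_until_bar` and the 0.65 bar are taken from the PR text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class LeapConfig:
    bar: float = 0.65            # bar value quoted in the PR description
    lock_until_bar: bool = True  # mirrors config.leap.lock_until_bar
    tolerance: float = 0.01      # assumption: width of the tolerance band
    max_attempts: int = 4        # assumption: attempts allowed before HALT


def lock_target(
    scores: Dict[str, Optional[float]],
    current: Optional[str],
    attempts: int,
    cfg: LeapConfig,
) -> Tuple[str, Optional[str]]:
    """Keep leaping the same below-bar target instead of rotating."""
    # Unmeasured dimensions (None) are skipped here, never treated as 0.
    below = {
        dim: score
        for dim, score in scores.items()
        if score is not None and score < cfg.bar - cfg.tolerance
    }
    if cfg.lock_until_bar and current in below:
        if attempts >= cfg.max_attempts:
            return ("halt", current)
        return ("leap", current)
    if not below:
        return ("done", None)
    # No lock applies: fall back to the weakest measurable dimension.
    return ("leap", min(below, key=below.get))


def queue_remainder(
    planned: List[str],
    critic_confirmed: List[str],
    score: float,
    cfg: LeapConfig,
) -> List[str]:
    """Return plan items the independent critic did not confirm as shipped."""
    if score >= cfg.bar - cfg.tolerance:
        return []
    confirmed = set(critic_confirmed)
    return [item for item in planned if item not in confirmed]


cfg = LeapConfig()
scores = {"visual_fidelity": 0.63, "fun_challenge": None}  # made-up snapshot
print(lock_target(scores, "visual_fidelity", 1, cfg))  # ('leap', 'visual_fidelity')
print(queue_remainder(["materials", "lighting", "shadows"], ["materials"], 0.59, cfg))
# ['lighting', 'shadows']
```

Driving the remainder from the critic's confirmed list rather than from the leap's own report is what keeps dropped scope from disappearing unnoticed.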

Rejected (devil's-advocate-validated): raising the bar to 0.80 (not yet); an inner refine loop; an LLM-component-coverage gate; multi_probe.

Validation (leap 4, separate game PR)

Ran a materials/lighting leap under v1.8.0: visual_fidelity 0.59 → 0.63, shipping the previously-dropped shadows/tone-mapping. It hit the screenshot-critique ceiling (~0.63) — proving the bottleneck has moved to perception. The loop's next target (fun_challenge) is unmeasurable by screenshot, so the fixed guard now HALTs requesting a human metrics harness instead of thrashing.
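The HALT described above depends on an unmeasurable dimension reporting "no score" rather than a guessed one. A minimal sketch of that rule follows, assuming a SHA-256 pin for the harness; the function name, its parameters, the `DimensionResult` type, and the skill_gap strings are hypothetical, while `capture_method`, `gameplay_metrics`, and `skill_gap` are the names used in this PR.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DimensionResult:
    score: Optional[float]           # None means "not measured", never 0.0
    skill_gap: Optional[str] = None  # reason a human-provided harness is needed


def capture_dimension(
    capture_method: str,
    measure: Callable[[], float],
    harness: Optional[bytes] = None,
    pinned_sha256: Optional[str] = None,
) -> DimensionResult:
    """Score one rubric dimension according to its capture_method."""
    if capture_method != "gameplay_metrics":
        # Assumption: every other method (e.g. screenshot) needs no harness.
        return DimensionResult(score=measure())
    if harness is None:
        return DimensionResult(None, "gameplay_metrics harness missing")
    if hashlib.sha256(harness).hexdigest() != pinned_sha256:
        # A modified harness is treated like a missing one.
        return DimensionResult(None, "gameplay_metrics harness failed hash check")
    return DimensionResult(score=measure())


# Made-up harness contents and score, for illustration only.
harness_bytes = b"def lap_metrics(): ..."
pin = hashlib.sha256(harness_bytes).hexdigest()

missing = capture_dimension("gameplay_metrics", lambda: 0.5)
verified = capture_dimension("gameplay_metrics", lambda: 0.5, harness_bytes, pin)
print(missing.score, missing.skill_gap)  # None gameplay_metrics harness missing
print(verified.score)                    # 0.5
```

Returning `None` plus a reason lets downstream target selection distinguish "below bar" from "cannot be measured yet", which is the condition that triggers the request for a human metrics harness.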

tests/verify.py 59 → 61. plugin 1.7→1.8, config schema 1.3→1.4.

🤖 Generated with Claude Code

mataeil and others added 2 commits June 19, 2026 14:55
…nsion capture, dimension-lock, remainder

The F1 probe stayed crap after 3 v1.7.0 leaps (artifact 0.394→0.447→0.472→0.522,
never reaching bar 0.65). A 13-agent adversarially-verified diagnosis found the
loop DETECTS a quality gap but isn't built to CLOSE it.

- 2-G thrashing-guard BUG FIX: counted a nonexistent `leap_delta` on
  weakest_dimension → fails was ALWAYS 0, HALT never fired. Now counts
  leap_attempts[].delta_score on leap_target (rubric_score.failed_leaps()).
- 5-G per-dimension capture_method: experiential axes (driving_feel +
  fun_challenge = 45% of weight, frozen across all 25 cycles) use a
  human-authored, hash-verified, protected gameplay_metrics harness; missing →
  null + skill_gap, never a faked/silent-screenshot score.
- 2-G dimension lock until bar: a successful leap below bar keeps the plateau on
  the SAME target (drive-to-bar, not detect-and-nudge+rotate). lock_target();
  config.leap.lock_until_bar; tolerance band + working max-attempts HALT.
- 5-G auto-queue remainder: a gate-passing leap still below bar queues a
  high-RICE remainder, triggered by the independent critic (not self-report).

Rejected: raising bar to 0.80 yet, inner refine loop, LLM-coverage gate,
multi_probe. verify.py 59 → 61. plugin 1.7→1.8, config schema 1.3→1.4.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ttleneck moved to perception

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
mataeil merged commit 4fd791c into main Jun 19, 2026
2 checks passed
mataeil deleted the feat/v1.8.0-quality-loop branch June 19, 2026 06:05