docs(v1.8.1): gameplay_metrics validated end-to-end + harness-behaviour guidance #73

Merged: mataeil merged 1 commit into main from feat/v1.8.1-harness-guidance on Jun 19, 2026

@mataeil mataeil commented Jun 19, 2026

Owner

The f1 probe exercised v1.8.0's per-dimension gameplay_metrics capture on the two frozen experiential axes (driving_feel, fun_challenge — 45% of the rubric a screenshot can't judge).

  • Honest measurement first dropped artifact_quality 0.533 → 0.490 (the screenshot had over-scored both axes: feel 0.51 → 0.41, fun 0.38 → 0.29), confirming that measurement was the bottleneck and that the screenshot-based scores were inflated.
  • Two unlocked leaps: fun_challenge 0.29 → 0.81 and driving_feel 0.41 → 0.78. artifact_quality crossed the bar for the first time (0.687 ≥ 0.65) → an honest grade A.
  • Guidance: a gameplay_metrics harness must MEASURE BEHAVIOUR, never assert an implementation fact (a hardcoded flag can't credit a fix → spurious thrashing-HALT).
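A minimal sketch of the guidance above, in Python. Everything in it is hypothetical and for illustration only: the `Car` toy model, the `fix_applied` flag, the 3-second budget and both scoring functions are assumptions, not the plugin's actual harness API. It contrasts a harness that reads an implementation flag with one that drives the game and scores what it observes.

```python
from dataclasses import dataclass


@dataclass
class Car:
    """Toy stand-in for the game under test (hypothetical model)."""
    speed: float = 0.0
    accel: float = 8.0          # acceleration while throttle is held
    drag: float = 0.5           # fraction of speed lost per second
    fix_applied: bool = False   # implementation flag a harness should NOT read

    def step(self, throttle: float, dt: float) -> None:
        # Simple Euler update: throttle pushes, drag pulls back.
        self.speed += (self.accel * throttle - self.drag * self.speed) * dt


def asserts_implementation(car: Car) -> float:
    """Anti-pattern: scores a flag, so a real fix made elsewhere earns no credit."""
    return 1.0 if car.fix_applied else 0.0


def measures_behaviour(car: Car, target: float = 10.0, budget_s: float = 3.0) -> float:
    """Hold full throttle and score how quickly the car reaches `target` speed."""
    dt = 1 / 60
    steps = int(budget_s * 60)
    for i in range(1, steps + 1):
        car.step(throttle=1.0, dt=dt)
        if car.speed >= target:
            # Faster response leaves more of the time budget unused.
            return 1.0 - (i * dt) / budget_s
    return 0.0  # never reached the target within the budget


# A genuine tuning fix (higher accel) that never touches the flag:
sluggish_score = measures_behaviour(Car(accel=4.0))
fixed_score = measures_behaviour(Car(accel=12.0))
flag_score = asserts_implementation(Car(accel=12.0))
```

In this sketch the behavioural score rises when the car actually responds faster, while the flag-based score stays at zero for the same fix. That is the situation the guidance warns about: the loop sees no improvement and can halt for thrashing even though the fix worked.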

verify.py 61 green. plugin 1.8.0→1.8.1.
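The score movements reported above can be sanity-checked with a short calculation. The per-axis weights are not stated in this PR. The values below (0.25 for driving_feel, 0.20 for fun_challenge) are inferred by assuming artifact_quality is a linear weighted sum, that the two axes together carry the stated 45%, and that no other dimension changed between measurements. Treat them as an assumption, not as the rubric's documented weights.

```python
# Inferred, NOT taken from the source: weights chosen so that the two axes
# sum to the stated 45% and reproduce the reported 0.533 -> 0.490 drop.
W_FEEL = 0.25
W_FUN = 0.20


def shifted(score: float, feel_delta: float, fun_delta: float) -> float:
    """Apply per-axis score changes to a linear weighted total."""
    return score + W_FEEL * feel_delta + W_FUN * fun_delta


# Step 1: honest measurement replaces the screenshot-based scores.
after_honest = shifted(0.533, 0.41 - 0.51, 0.29 - 0.38)

# Step 2: the two improvements on the experiential axes.
after_leaps = shifted(after_honest, 0.78 - 0.41, 0.81 - 0.29)

BAR = 0.65
crossed = after_leaps >= BAR
```

Under these assumed weights the first step lands on 0.490 and the second on roughly 0.6865, which agrees with the reported 0.687 to within rounding of the published per-axis scores.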

🤖 Generated with Claude Code

…r guidance

The f1 probe exercised v1.8.0's per-dimension capture on the frozen experiential
axes. Honest metric measurement first dropped artifact 0.533→0.490 (screenshot
had over-scored feel/fun), then two unlocked leaps (fun_challenge 0.29→0.81,
driving_feel 0.41→0.78) pushed artifact_quality past the bar (0.687) for the first
time — an HONEST grade A.

Guidance added to config.example.json: a gameplay_metrics harness must MEASURE
BEHAVIOUR, not assert an implementation fact (a hardcoded flag can't credit a fix
and would trigger a spurious thrashing-HALT). plugin 1.8.0→1.8.1.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@mataeil mataeil merged commit 50e40f3 into main Jun 19, 2026
2 checks passed
@mataeil mataeil deleted the feat/v1.8.1-harness-guidance branch June 19, 2026 11:07