
docs: surface the v1.7→v1.12 artifact-evaluation arc in the README #80

Merged: mataeil merged 1 commit into main from docs/readme-artifact-evaluation on Jun 26, 2026

mataeil (Owner) commented on Jun 26, 2026

What

The README was last touched at v1.6.1 and never reflected the project's biggest evolution: the shift from measuring process (did a cycle advance?) to measuring whether the artifact is good.

  • New section, "Measuring the artifact, not just the loop" (placed after "Measuring the loop"): tells the honest, probe-driven arc of Goodhart collapse → independent rubric critic → leap cycles, capture fidelity, Ambition, research grounding, and honest ceilings. The arc is framed as: lying A → honest D → earned A → honest F+ vs. real games → honest ceiling.
  • Production Validation: adds one clearly labelled paragraph on the f1 build-quality dogfood probe, explicitly marked as not a third production deployment (it is a private-repo lab test).
  • .gitignore: ignores the f1 dogfood evidence screenshots (kept on disk, never part of the framework source) so git status stays clean.

Why

This is the most compelling part of the project's recent history (v1.7→v1.12, all merged and now tagged), yet it was invisible to anyone reading the README.

Verification

  • Claims were adversarially reviewed through three lenses before shipping: factual accuracy (every score and version checked against the CHANGELOG), honesty/anti-overclaim (no conflating a rubric score with objective "good"; probe ≠ production), and README voice/integration.
  • python3 tests/verify.py: 64/64 passed, 0 failed.
  • Docs-only; no engine/spec/script changes.

🤖 Generated with Claude Code

The README was last touched at v1.6.1 and never reflected the project's
biggest evolution — the move from measuring *process* (did a cycle move?)
to measuring whether the *artifact* is good. Add a "Measuring the artifact,
not just the loop" section telling the honest arc (lying A → honest D →
earned A → honest F+ vs real games → honest ceiling) that the f1 dogfood
probe drove, plus a clearly-labelled build-quality probe note in Production
Validation (explicitly NOT a third production deployment).

Also gitignore the f1 dogfood evidence screenshots (kept on disk, never
framework source) so git status stays clean.

Claims adversarially reviewed (facts/honesty/voice) before shipping.
verify.py 64/64.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
mataeil merged commit f6d6f9c into main on Jun 26, 2026
2 checks passed
mataeil deleted the docs/readme-artifact-evaluation branch on June 26, 2026 at 07:35