From 560eee3f4355fc1d9178d9918a1e5cfde5dcf6e8 Mon Sep 17 00:00:00 2001
From: dadachi
Date: Mon, 4 May 2026 11:56:28 +0900
Subject: [PATCH] Stage 1 rubric: tighten renders-cleanly to actual render failures
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Third iteration of the same fundamental issue. After PR #50 dropped
domain-match from the Stage 1 rubric, the remaining renders-cleanly
criterion still leaked domain-semantic judgment via its phrasing
"placeholder text where real content should appear": the vision judge
interpreted "this is a welcome screen, not a queue dashboard" as
failing that clause. Same domain-match question in different clothes.

Verified live: against the post-substrate-fix iOS welcome screenshot
("Welcome to Vet Clinic Queue" with three sparkles), median-of-3
sampling oscillates between PASS and FAIL across runs. The judge's
FAIL rationales explicitly demand "a functional clinic queue
interface", which is Stage 2 territory per docs/SPEC.md, not Stage 1's
"did it render at launch."

Tightened to actual render-failure detection only:

    Does the screen render without an actual rendering failure — that
    is, no crash dialog, no broken-image-icon glyphs, no text
    overlapping other text, no content cut off the side of the screen?
    A welcome / launch / onboarding screen with decorative graphics
    counts as PASS as long as nothing is technically broken; do not
    judge whether the screen looks "finished" or shows the app's
    domain content.

Explicitly tells the judge: "decorative graphics with welcome text is
PASS." Removes the "where real content should appear" clause that
invited Stage-2 interpretation.

Verified post-fix: both iOS and Android welcome screenshots from the
substrate now PASS both criteria consistently. Sampling variance
should be much lower with unambiguous criterion wording.

Out of scope (Stage 2 territory):
- Domain-semantic UI judging (does the home screen look like a real
  clinic queue?). That's where mobile-mcp navigation past welcome
  lands.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 src/validation/visual-judge.ts | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/validation/visual-judge.ts b/src/validation/visual-judge.ts
index c152f9a..91bff37 100644
--- a/src/validation/visual-judge.ts
+++ b/src/validation/visual-judge.ts
@@ -117,7 +117,7 @@ export const DEFAULT_STAGE1_RUBRIC: readonly Layer3Criterion[] = [
   {
     id: "renders-cleanly",
     question:
-      "Does the home screen render without obvious layout breakage, missing icons or images, or placeholder text where real content should appear?",
+      "Does the screen render without an actual rendering failure — that is, no crash dialog, no broken-image-icon glyphs, no text overlapping other text, no content cut off the side of the screen? A welcome / launch / onboarding screen with decorative graphics counts as PASS as long as nothing is technically broken; do not judge whether the screen looks 'finished' or shows the app's domain content.",
   },
 ];
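
Reviewer note: the "median-of-3 sampling oscillates between PASS and FAIL" observation in the commit message can be pictured with a small sketch. This is not the code in src/validation/visual-judge.ts; `Verdict`, `majorityVerdict`, and `isUnstable` are hypothetical names used only for illustration. It assumes each judge sample is a binary PASS/FAIL verdict, in which case the median of three samples is simply the majority vote.

```typescript
// Hypothetical types and helpers for illustration only; they do not
// come from src/validation/visual-judge.ts.
type Verdict = "PASS" | "FAIL";

// Majority vote over an odd number of judge samples. With only two
// possible verdicts this equals the median of the samples.
function majorityVerdict(samples: readonly Verdict[]): Verdict {
  if (samples.length === 0 || samples.length % 2 === 0) {
    throw new Error("expected an odd, non-empty number of samples");
  }
  const passes = samples.filter((v) => v === "PASS").length;
  return passes * 2 > samples.length ? "PASS" : "FAIL";
}

// A criterion is unstable when repeated runs against the same
// screenshot disagree on the aggregated verdict.
function isUnstable(runs: readonly (readonly Verdict[])[]): boolean {
  const verdicts = new Set(runs.map((run) => majorityVerdict(run)));
  return verdicts.size > 1;
}
```

Under this reading, a run of PASS, PASS, FAIL aggregates to PASS and a run of FAIL, FAIL, PASS aggregates to FAIL, so the same screenshot flips between runs. Aggregation alone cannot hide an ambiguous criterion, which is why the fix targets the wording.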