Layer 3 Phase 5d: runStage1Visual — convenience runner (discover + judge) by dadachi · Pull Request #43 · nativeapptemplate/nativeapptemplate-agent

dadachi · 2026-05-02T22:25:54Z

Summary

One-call wrapper that ties Phase 5c artifact discovery (#42) and Phase 5a visual-judge orchestration (#40) together for both platforms. Returns a Stage1VisualResult shaped to match JudgeInput.visual's per-platform expectation, so callers can pass it through to runJudge directly:

const visual = await runStage1Visual({
  iosDir: "./out/<slug>/ios",
  androidDir: "./out/<slug>/android",
  spec: domain.displayName,
});
const judge = await runJudge({ ..., visual });
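As a rough illustration of what the wrapper does, here is a minimal sketch. It is not the repository's implementation. Only two behaviors come from the description: skipping a platform whose directory is undefined, and the discovery-failure message. The rest is assumed: VisualJudgeResult is reduced to ok/error, Stage1VisualResult holds one optional entry per platform, the Android wording of the message is guessed, and injected discover/judge functions stand in for the Phase 5c and 5a code.

```typescript
// Sketch only: names and shapes below are hypothetical, not the repo's real API.
type Platform = "ios" | "android";

// Assumed minimal shape of a per-platform visual-judge result.
interface VisualJudgeResult {
  ok: boolean;
  error?: string;
}

interface Stage1VisualInput {
  iosDir?: string; // undefined => skip iOS
  androidDir?: string; // undefined => skip Android
  spec: string;
}

// One optional entry per platform; a skipped platform has no key at all.
type Stage1VisualResult = Partial<Record<Platform, VisualJudgeResult>>;

// Injected stand-ins for Phase 5c discovery and Phase 5a judging (assumed signatures).
interface Stage1Deps {
  discover: (platform: Platform, dir: string) => Promise<string | null>;
  judge: (platform: Platform, artifactPath: string, spec: string) => Promise<VisualJudgeResult>;
}

const LABELS: Record<Platform, string> = { ios: "iOS", android: "Android" };

async function runStage1VisualSketch(
  input: Stage1VisualInput,
  deps: Stage1Deps,
): Promise<Stage1VisualResult> {
  const dirs: Record<Platform, string | undefined> = {
    ios: input.iosDir,
    android: input.androidDir,
  };
  const result: Stage1VisualResult = {};
  for (const platform of ["ios", "android"] as const) {
    const dir = dirs[platform];
    if (dir === undefined) continue; // not requested: no work, no entry
    const artifactPath = await deps.discover(platform, dir);
    if (artifactPath === null) {
      // Discovery failure reuses the same shape as a launch/capture failure.
      result[platform] = {
        ok: false,
        error: `${LABELS[platform]} artifact not discovered (run Layer 2 build mode first)`,
      };
      continue;
    }
    result[platform] = await deps.judge(platform, artifactPath, input.spec);
  }
  return result;
}
```

Injecting discover and judge keeps the sketch testable without a simulator, and both Test plan cases (empty result, structured failure) follow directly from it. The sketch judges platforms sequentially. The PR does not say whether the real runner does the same.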

Per-platform behavior

Pass undefined to skip a platform; the runner then returns no result for that platform.
If discovery fails (the build hasn't run, or the project layout is unexpected), the runner returns a structured VisualJudgeResult with ok=false and an actionable error message ("iOS artifact not discovered (run Layer 2 build mode first)"). That is the same shape as a real launch/capture failure, so downstream aggregation in runJudge doesn't need a special case (see #41, Layer 3 Phase 5b: integrate runVisualJudge into runJudge, opt-in).
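Because every entry has the same shape, an aggregator can treat discovery failures and real launch/capture failures identically. The sketch below is hypothetical. The summarizeLayer3 name, the result shapes, and treating an empty result as "skipped" rather than failed are all assumptions. The summary wording only mirrors the "Layer N x/y pass" and "Layer 3 skipped" strings quoted in the Phase 5e commit message.

```typescript
// Hypothetical shapes, assumed for illustration.
interface VisualJudgeResult {
  ok: boolean;
  error?: string;
}
type Stage1VisualResult = Partial<Record<"ios" | "android", VisualJudgeResult>>;

// No special case for discovery failures: every entry is just ok=true/false.
function summarizeLayer3(visual: Stage1VisualResult): {
  pass: boolean;
  summary: string;
  errors: string[];
} {
  const entries = Object.values(visual).filter(
    (r): r is VisualJudgeResult => r !== undefined,
  );
  if (entries.length === 0) {
    // Assumption: no platforms requested means the layer is skipped, not failed.
    return { pass: true, summary: "Layer 3 skipped", errors: [] };
  }
  const passed = entries.filter((r) => r.ok).length;
  const errors: string[] = [];
  for (const r of entries) {
    if (!r.ok && r.error !== undefined) errors.push(r.error);
  }
  return {
    pass: passed === entries.length,
    summary: `Layer 3 ${passed}/${entries.length} pass`,
    errors,
  };
}
```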

Caller responsibilities

Run Layer 2 in build mode first so the .app / .apk exists.
Ensure a simulator/emulator is booted for each platform being judged.
Decide which platforms to judge; pass undefined for the others.
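For the booted-device responsibility, one possible preflight is to parse the output of "adb devices" and "xcrun simctl list --json devices" before calling the runner. The sketch assumes two output formats. For adb, it expects optional header lines followed by one serial/state pair per device. For simctl, it expects a JSON "devices" object keyed by runtime identifier. The helper names are hypothetical and not part of the repository. An adb state of "device" only means the emulator is reachable. Output of 1 from "adb shell getprop sys.boot_completed" is a stronger signal that Android has finished booting.

```typescript
// Hypothetical preflight helpers; they only parse CLI output passed in as strings.

// "adb devices" lists one "<serial>\t<state>" row per device after a header line.
// Header or daemon-status lines never have "device" as their second token.
function hasOnlineAndroidDevice(adbDevicesOutput: string): boolean {
  return adbDevicesOutput
    .split("\n")
    .map((line) => line.trim().split(/\s+/))
    .some(([serial, state]) => Boolean(serial) && state === "device");
}

// Assumed subset of "xcrun simctl list --json devices" output.
interface SimctlDevice {
  name: string;
  udid: string;
  state: string;
}
interface SimctlList {
  devices: Record<string, SimctlDevice[]>; // keyed by runtime identifier
}

function bootedIosSimulators(simctlJson: string): SimctlDevice[] {
  const parsed = JSON.parse(simctlJson) as SimctlList;
  return Object.values(parsed.devices)
    .flat()
    .filter((d) => d.state === "Booted");
}
```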

Test plan

npm run ci — 16/16 green.
Structured failure when artifacts missing (covers the most common path: discovery returns null).
Empty result when no platforms requested (no work done, no error).
After merge: Phase 5e CLI/dispatch wiring.

Out of scope (Phase 5e)

CLI flag / env var that opts dispatch into Stage 1 visual.
Forcing Layer 2 build mode when visual is enabled.
Plumbing the runStage1Visual call into dispatch.ts post-Layer-2.

🤖 Generated with Claude Code

One-call wrapper that ties Phase 5c artifact discovery (#42) and Phase 5a visual-judge orchestration (#40) together for both platforms. Returns a Stage1VisualResult shaped to match JudgeInput.visual's per-platform expectation, so callers can pass it through to runJudge directly:

    const visual = await runStage1Visual({
      iosDir: "./out/<slug>/ios",
      androidDir: "./out/<slug>/android",
      spec: domain.displayName,
    });
    const judge = await runJudge({ ..., visual });

Per-platform behavior:
- Pass undefined to skip the platform.
- If discovery fails (build hasn't happened, project layout unexpected), surfaces a structured VisualJudgeResult with ok=false and an actionable error message ("iOS artifact not discovered (run Layer 2 build mode first)"), the same shape as a real launch/capture failure, so downstream aggregation in runJudge (#41) doesn't need a special case.

Caller responsibilities:
- Run Layer 2 in build mode first so .app / .apk exists
- Ensure a sim/emulator is booted for each platform being judged
- Decide which platforms to judge (the function judges only those passed)

Tests: 16/16 npm run ci green.
- Structured failure when artifacts missing ✓
- Empty result when no platforms requested ✓

Out of scope (Phase 5e, the final integration):
- CLI flag / env var that opts dispatch into Stage 1 visual
- Forcing Layer 2 build mode when visual is enabled
- Plumbing the runStage1Visual call into dispatch.ts post-Layer-2

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Closes the integration loop. With NATIVEAPPTEMPLATE_VISUAL=1 set in the shell, npm run dev:

1. Runs the existing planner + workers + reviewer chain.
2. Calls runJudge with layer2Mode: "build", forcing a real xcodebuild build + ./gradlew assembleDebug instead of the fast-mode toolchain probe. The build outputs are what Stage 1 visual judging needs.
3. Calls runJudge with visual: { iosDir, androidDir, spec }. runJudge in turn calls runStage1Visual (#43) per platform: discoverArtifact (#42) → installAndLaunch (#39) → 3s render wait → captureScreenshot (#38) → runLayer3 (#37) with DEFAULT_STAGE1_RUBRIC (#40).
4. Aggregates Layer 1 + Layer 2 + Layer 3 into JudgeResult. overallPass requires all three to pass; visual failures DO fail the run (matching PLAN.md: "a run that green-builds without passing Layer 3 is a failed run").

Without the flag set, behavior is unchanged from #43: Layer 2 runs in fast mode, Layer 3 is skipped, and the summary reads "Layer 1 3/3 pass · Layer 2 3/3 pass · Layer 3 skipped".

Refactors the JudgeInput.visual shape from per-platform pre-resolved configs ({artifactPath, bundleId} pairs) to outDir-based discovery ({iosDir, androidDir}). runJudge.runVisualPhase now delegates to runStage1Visual, which does discovery + visual judging atomically.

Latency: a cold build + judge run adds ~60s (iOS) and ~120-180s (Android) on top of the fast-mode baseline. Hot rebuilds are much faster, but their speed varies with substrate caches.

Recommendations covered:
- Trigger: env var NATIVEAPPTEMPLATE_VISUAL=1 (canonical stem per the post-#30 convention; renaming the stub flags is deferred to a follow-up PR).
- Build coupling: visual implies Layer 2 build mode.
- Failure semantics: visual failures fail the run.
- Render wait: 3s default for both platforms (DEFAULT_RENDER_WAIT_MS, configurable via the runVisualJudge input).
- Per-platform: judge both platforms when visual is enabled; discovery returns null gracefully if either platform's build is missing.

Tests: 16/16 npm run ci green.

README.md gains an "Optional flags" subsection documenting the trigger and its latency cost.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
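The Phase 5e flow in the commit message above can be sketched as follows. A few pieces are taken from the text: NATIVEAPPTEMPLATE_VISUAL=1, layer2Mode "build" versus the fast mode, the launch, wait, capture, judge order, and the 3s default. Everything else is hypothetical: Env, PlatformSteps, judgePlatform, overallPass, and treating a skipped Layer 3 as non-failing. This is a sketch under those assumptions, not the repository's dispatch code.

```typescript
// Hypothetical sketch of the Phase 5e wiring.
type Env = Record<string, string | undefined>;
type Layer2Mode = "fast" | "build";

const DEFAULT_RENDER_WAIT_MS = 3_000; // 3s default for both platforms

function visualEnabled(env: Env): boolean {
  return env.NATIVEAPPTEMPLATE_VISUAL === "1";
}

// Visual judging needs the real .app / .apk, so it implies build mode.
function layer2ModeFor(env: Env): Layer2Mode {
  return visualEnabled(env) ? "build" : "fast";
}

// Per-platform steps after discovery, injected so the ordering is testable.
interface PlatformSteps {
  installAndLaunch(): Promise<void>;
  sleep(ms: number): Promise<void>;
  captureScreenshot(): Promise<string>;
  judgeScreenshot(screenshotPath: string): Promise<boolean>;
}

async function judgePlatform(
  steps: PlatformSteps,
  renderWaitMs: number = DEFAULT_RENDER_WAIT_MS,
): Promise<boolean> {
  await steps.installAndLaunch();
  await steps.sleep(renderWaitMs); // give the first screen time to render
  const screenshot = await steps.captureScreenshot();
  return steps.judgeScreenshot(screenshot);
}

// A Layer 3 failure fails the run; "skipped" (flag unset) is assumed not to.
function overallPass(layer1: boolean, layer2: boolean, layer3: boolean | "skipped"): boolean {
  return layer1 && layer2 && layer3 !== false;
}
```

In real code the env argument would be process.env. Injecting it (and sleep) keeps the sketch testable without a device or a real 3s delay.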

dadachi merged commit 6f75eb5 into main May 2, 2026
1 check passed

dadachi deleted the layer3-stage1-runner branch May 2, 2026 22:26

dadachi mentioned this pull request May 2, 2026

Layer 3 Phase 5e: dispatch wires NATIVEAPPTEMPLATE_VISUAL=1 #44

Merged


dadachi merged 1 commit into main from layer3-stage1-runner