
Layer 3 Phase 5d: runStage1Visual — convenience runner (discover + judge) #43

Merged
dadachi merged 1 commit into main from layer3-stage1-runner
May 2, 2026


dadachi commented May 2, 2026

Summary

One-call wrapper that ties Phase 5c artifact discovery (#42) and Phase 5a visual-judge orchestration (#40) together for both platforms. It returns a Stage1VisualResult that matches the per-platform shape JudgeInput.visual expects, so callers can pass it straight through to runJudge:

const visual = await runStage1Visual({
  iosDir: "./out/<slug>/ios",
  androidDir: "./out/<slug>/android",
  spec: domain.displayName,
});
const judge = await runJudge({ ..., visual });

Per-platform behavior

  • Pass undefined for a platform to skip it; the runner returns no result for that platform.
  • If discovery fails (the build hasn't run yet, or the project layout is unexpected), the runner returns a structured VisualJudgeResult with ok=false and an actionable error message ("iOS artifact not discovered (run Layer 2 build mode first)"). This is the same shape as a real launch/capture failure, so downstream aggregation in runJudge (#41, "Layer 3 Phase 5b: integrate runVisualJudge into runJudge (opt-in)") doesn't need a special case.
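
The per-platform rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from this PR: the type shapes, the injected discoverArtifact/judgeArtifact dependencies, the name runStage1VisualSketch, and the Android wording of the error message are all hypothetical. Only the skip-on-undefined rule and the iOS error text come from the description.

```typescript
// Minimal sketch of runStage1Visual's per-platform rules.
// All types and dependency signatures here are hypothetical stand-ins.
type Platform = "ios" | "android";

interface VisualJudgeResult {
  platform: Platform;
  ok: boolean;
  error?: string;
}

// One optional entry per platform; a skipped platform has no entry.
interface Stage1VisualResult {
  ios?: VisualJudgeResult;
  android?: VisualJudgeResult;
}

interface Stage1VisualInput {
  iosDir?: string;
  androidDir?: string;
  spec: string;
}

interface Stage1Deps {
  // Resolves the built .app / .apk under dir, or null if none is found.
  discoverArtifact(platform: Platform, dir: string): Promise<string | null>;
  // Launches, captures and judges one artifact.
  judgeArtifact(platform: Platform, artifactPath: string, spec: string): Promise<VisualJudgeResult>;
}

const LABEL: Record<Platform, string> = { ios: "iOS", android: "Android" };

async function judgePlatform(
  platform: Platform,
  dir: string,
  spec: string,
  deps: Stage1Deps,
): Promise<VisualJudgeResult> {
  const artifact = await deps.discoverArtifact(platform, dir);
  if (artifact === null) {
    // Same shape as a launch/capture failure, so aggregation needs no special case.
    return {
      platform,
      ok: false,
      error: `${LABEL[platform]} artifact not discovered (run Layer 2 build mode first)`,
    };
  }
  return deps.judgeArtifact(platform, artifact, spec);
}

async function runStage1VisualSketch(
  input: Stage1VisualInput,
  deps: Stage1Deps,
): Promise<Stage1VisualResult> {
  const result: Stage1VisualResult = {};
  // An undefined dir means "skip this platform": no work, no entry, no error.
  if (input.iosDir !== undefined) {
    result.ios = await judgePlatform("ios", input.iosDir, input.spec, deps);
  }
  if (input.androidDir !== undefined) {
    result.android = await judgePlatform("android", input.androidDir, input.spec, deps);
  }
  return result;
}
```

The sketch judges platforms one after the other. The PR does not say whether the real runner serializes or parallelizes them.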

Caller responsibilities

  • Run Layer 2 in build mode first so .app / .apk exists.
  • Ensure a sim/emulator is booted for each platform being judged.
  • Decide which platforms to judge — pass undefined for the others.
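
For the "booted sim/emulator" responsibility, a caller could run a preflight check before calling runStage1Visual. The sketch below only parses command output. It assumes the caller has already captured the JSON printed by xcrun simctl list --json devices and the text printed by adb devices. The helper names (bootedSimulators, readyAndroidSerials, preflightErrors) are hypothetical, and the parsing assumes the usual output formats of those tools: a devices map keyed by runtime for simctl, and a header line followed by serial/state pairs for adb.

```typescript
// Hypothetical preflight helpers; they parse tool output but do not run the tools.

interface SimctlDevice {
  name: string;
  udid: string;
  state: string;
}

// Assumed shape of `xcrun simctl list --json devices`:
// { "devices": { "<runtime id>": [ { "name", "udid", "state", ... } ] } }
interface SimctlDeviceList {
  devices: Record<string, SimctlDevice[]>;
}

function bootedSimulators(simctlJson: string): SimctlDevice[] {
  const parsed = JSON.parse(simctlJson) as SimctlDeviceList;
  const booted: SimctlDevice[] = [];
  for (const runtime of Object.keys(parsed.devices)) {
    for (const device of parsed.devices[runtime] ?? []) {
      if (device.state === "Booted") booted.push(device);
    }
  }
  return booted;
}

// `adb devices` prints a "List of devices attached" header, then one
// "<serial>\t<state>" line per device; only the state "device" is usable.
function readyAndroidSerials(adbOutput: string): string[] {
  const serials: string[] = [];
  for (const line of adbOutput.split("\n").slice(1)) {
    const [serial, state] = line.trim().split(/\s+/);
    if (serial && state === "device") serials.push(serial);
  }
  return serials;
}

// Returns one message per requested platform that has no usable device.
function preflightErrors(
  want: { ios: boolean; android: boolean },
  simctlJson: string,
  adbOutput: string,
): string[] {
  const errors: string[] = [];
  if (want.ios && bootedSimulators(simctlJson).length === 0) {
    errors.push("no booted iOS simulator");
  }
  if (want.android && readyAndroidSerials(adbOutput).length === 0) {
    errors.push("no ready Android emulator/device");
  }
  return errors;
}
```

An unrequested platform's output is never parsed, which matches the "pass undefined for the others" rule.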

Test plan

  • npm run ci — 16/16 green.
  • Structured failure when artifacts missing (covers the most common path: discovery returns null).
  • Empty result when no platforms requested (no work done, no error).
  • After merge: Phase 5e CLI/dispatch wiring.

Out of scope (Phase 5e)

  • CLI flag / env var that opts dispatch into Stage 1 visual.
  • Forcing Layer 2 build mode when visual is enabled.
  • Plumbing the runStage1Visual call into dispatch.ts post-Layer-2.

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dadachi merged commit 6f75eb5 into main May 2, 2026
1 check passed
dadachi deleted the layer3-stage1-runner branch May 2, 2026 22:26
dadachi added a commit that referenced this pull request May 2, 2026
Closes the integration loop. With NATIVEAPPTEMPLATE_VISUAL=1 set in
the shell, npm run dev:

  1. Runs the existing planner + workers + reviewer chain.
  2. Calls runJudge with layer2Mode: "build", forcing real
     xcodebuild build + ./gradlew assembleDebug instead of the
     fast-mode toolchain probe. The build outputs are what Stage 1
     visual judging needs.
  3. Calls runJudge with visual: { iosDir, androidDir, spec }.
     runJudge in turn calls runStage1Visual (#43) per platform:
     discoverArtifact (#42) → installAndLaunch (#39) → 3s render
     wait → captureScreenshot (#38) → runLayer3 (#37) with
     DEFAULT_STAGE1_RUBRIC (#40).
  4. Aggregates Layer 1 + Layer 2 + Layer 3 into JudgeResult.
     overallPass requires all three to pass; visual failures DO
     fail the run (matches PLAN.md "a run that green-builds without
     passing Layer 3 is a failed run").
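
The per-platform chain in step 3 can be sketched as one async function. The step signatures below are assumptions standing in for discoverArtifact (#42), installAndLaunch (#39), captureScreenshot (#38) and runLayer3 (#37); only the ordering and the 3s render wait come from the description. visualPhase is a hypothetical name, and folding thrown errors into an ok=false result is a design choice that mirrors how missing artifacts are reported.

```typescript
// Hypothetical signatures for the per-platform steps; the real functions may differ.
interface VisualSteps {
  discoverArtifact(dir: string): Promise<string | null>;
  installAndLaunch(artifactPath: string): Promise<void>;
  captureScreenshot(): Promise<string>; // path to the captured image
  runLayer3(screenshotPath: string, spec: string): Promise<boolean>; // judge with the Stage 1 rubric
}

interface PhaseResult {
  ok: boolean;
  error?: string;
}

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

async function visualPhase(
  dir: string,
  spec: string,
  steps: VisualSteps,
  renderWaitMs = 3000, // 3s default from the description
): Promise<PhaseResult> {
  const artifact = await steps.discoverArtifact(dir);
  if (artifact === null) {
    return { ok: false, error: "artifact not discovered (run Layer 2 build mode first)" };
  }
  try {
    await steps.installAndLaunch(artifact);
    await sleep(renderWaitMs); // let the first screen render before capturing
    const screenshot = await steps.captureScreenshot();
    return { ok: await steps.runLayer3(screenshot, spec) };
  } catch (err) {
    // Launch or capture failures surface in the same structured shape.
    return { ok: false, error: String(err) };
  }
}
```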

Without the flag set, behavior is unchanged from #43: Layer 2 in
fast mode, Layer 3 skipped, summary reads
"Layer 1 3/3 pass · Layer 2 3/3 pass · Layer 3 skipped".

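The summary line and the pass rule can be sketched together. This assumes each layer reports a passed/total tally and that a layer passes only when every check does. The "fail" wording, the helper names, and the rule that a skipped Layer 3 does not fail the run (inferred from "behavior is unchanged" above) are assumptions, not code from the repo.

```typescript
// Hypothetical per-layer tally; the real JudgeResult shape may differ.
interface LayerTally {
  passed: number;
  total: number;
}

// Layer 3 is either a tally (visual enabled) or skipped (flag unset).
type Layer3Outcome = LayerTally | "skipped";

function layerPasses(t: LayerTally): boolean {
  return t.passed === t.total;
}

function formatLayer(name: string, t: LayerTally): string {
  return `${name} ${t.passed}/${t.total} ${layerPasses(t) ? "pass" : "fail"}`;
}

function summarize(l1: LayerTally, l2: LayerTally, l3: Layer3Outcome): string {
  const third = l3 === "skipped" ? "Layer 3 skipped" : formatLayer("Layer 3", l3);
  return [formatLayer("Layer 1", l1), formatLayer("Layer 2", l2), third].join(" · ");
}

// With visual enabled, all three layers must pass; a skipped Layer 3 does not fail the run.
function overallPass(l1: LayerTally, l2: LayerTally, l3: Layer3Outcome): boolean {
  return layerPasses(l1) && layerPasses(l2) && (l3 === "skipped" || layerPasses(l3));
}
```

For example, summarize({ passed: 3, total: 3 }, { passed: 3, total: 3 }, "skipped") reproduces the flag-off summary quoted above.
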
Refactors the JudgeInput.visual shape from per-platform pre-resolved
configs ({artifactPath, bundleId} pairs) to outDir-based discovery
({iosDir, androidDir}). runJudge.runVisualPhase now delegates to
runStage1Visual which does discovery + visual-judge atomically.
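
The shape change can be written out as types. The field names come from the description (artifactPath, bundleId, iosDir, androidDir, spec). The type names, the optional-field layout, and the runVisualPhase signature below are assumptions for illustration. The point is that the new shape lets the visual phase forward the whole object to the Stage 1 runner, which does discovery itself.

```typescript
// Before: each platform arrived pre-resolved (hypothetical type names).
interface PreResolvedPlatform {
  artifactPath: string;
  bundleId: string;
}
interface VisualInputBefore {
  ios?: PreResolvedPlatform;
  android?: PreResolvedPlatform;
}

// After: only output directories; discovery happens inside the runner.
interface VisualInputAfter {
  iosDir?: string;
  androidDir?: string;
  spec: string;
}

type Stage1Runner<R> = (input: VisualInputAfter) => Promise<R>;

// Sketch of the delegation: no visual input means Layer 3 is skipped;
// otherwise the whole input is forwarded to the runner unchanged.
async function runVisualPhase<R>(
  visual: VisualInputAfter | undefined,
  runStage1: Stage1Runner<R>,
): Promise<R | "skipped"> {
  if (visual === undefined) return "skipped";
  return runStage1(visual);
}
```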

Latency: a cold build + judge run adds ~60s (iOS) and ~120-180s
(Android) on top of the fast-mode baseline. Hot rebuilds are much
faster but vary with substrate caches.

Recommendations covered:
  - Trigger: env var NATIVEAPPTEMPLATE_VISUAL=1 (canonical stem
    per the post-#30 convention; keep stub flags' rename for a
    follow-up PR).
  - Build coupling: visual implies Layer 2 build mode.
  - Failure semantics: visual failures fail the run.
  - Render wait: 3s default for both platforms (DEFAULT_RENDER_WAIT_MS,
    configurable via the runVisualJudge input).
  - Per-platform: judge both platforms when visual is enabled;
    discovery returns null gracefully if either platform's build
    is missing.
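
The trigger, build-coupling and render-wait choices above can be condensed into one resolver. It takes the environment as a plain object so it is easy to test (a caller would pass process.env). resolveVisualConfig and its result type are hypothetical. Treating only the exact value "1" as enabled is an assumption, and DEFAULT_RENDER_WAIT_MS is redeclared here to mirror the 3s default rather than imported from the repo.

```typescript
type Layer2Mode = "fast" | "build";

interface VisualConfig {
  visualEnabled: boolean;
  layer2Mode: Layer2Mode;
  renderWaitMs: number;
}

// Mirrors the 3s default described above; the real constant lives in the repo.
const DEFAULT_RENDER_WAIT_MS = 3000;

function resolveVisualConfig(
  env: Record<string, string | undefined>,
  renderWaitMs: number = DEFAULT_RENDER_WAIT_MS,
): VisualConfig {
  // Assumption: only the exact value "1" turns Stage 1 visual judging on.
  const visualEnabled = env["NATIVEAPPTEMPLATE_VISUAL"] === "1";
  return {
    visualEnabled,
    // Visual judging needs real .app / .apk outputs, so it implies build mode.
    layer2Mode: visualEnabled ? "build" : "fast",
    renderWaitMs,
  };
}
```

Deriving layer2Mode from the same flag means a caller cannot enable visual judging without also getting a real build, which is the coupling the recommendation asks for.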

Tests: 16/16 npm run ci green.
README.md gains an "Optional flags" subsection documenting the
trigger and its latency cost.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>