Layer 3 Phase 5c: artifact discovery for iOS .app and Android .apk #42

Merged
dadachi merged 1 commit into main from layer3-artifact-discovery
May 2, 2026
dadachi commented May 2, 2026

Summary

After Layer 2 build mode produces the iOS .app / Android .apk, callers need the artifact path + identifier to feed into runVisualJudge (#40, #41). Hardcoding paths or guessing identifiers from the slug is fragile — the substrate's iOS bundle ID is com.<slugflat>.<pascal>App.ios${TEAM} where ${TEAM} is the developer's signing team, resolved at build time. The build outputs are the source of truth.

API

discoverIosArtifact(iosDir): Promise<{appPath, bundleId} | null>
discoverAndroidArtifact(androidDir): Promise<{apkPath, packageName} | null>

iOS discovery:

  1. Find *.xcodeproj, scheme = filename minus extension.
  2. xcodebuild -showBuildSettings -json → reads BUILT_PRODUCTS_DIR + WRAPPER_NAME.
  3. plutil -extract CFBundleIdentifier raw on the built Info.plist.
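
Step 2 can be sketched as a pure parsing function. This is a minimal sketch, not the PR's implementation: `resolveAppPath` and `BuildSettingsEntry` are hypothetical names, and the code assumes that `xcodebuild -showBuildSettings -json` prints a JSON array whose entries each carry a `buildSettings` object.

```typescript
// Hypothetical shape of one entry in the xcodebuild JSON output (assumption).
interface BuildSettingsEntry {
  buildSettings?: Record<string, string>;
}

// Hypothetical helper: joins BUILT_PRODUCTS_DIR and WRAPPER_NAME from the
// first entry that has both. Returns null instead of throwing, matching the
// "return null gracefully" contract described in this PR.
function resolveAppPath(showBuildSettingsJson: string): string | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(showBuildSettingsJson);
  } catch {
    return null; // malformed JSON is treated as "not discovered"
  }
  if (!Array.isArray(parsed)) return null;
  for (const entry of parsed as BuildSettingsEntry[]) {
    const settings = entry?.buildSettings;
    const dir = settings?.BUILT_PRODUCTS_DIR;
    const wrapper = settings?.WRAPPER_NAME;
    if (dir && wrapper) return `${dir}/${wrapper}`;
  }
  return null;
}
```

Keeping the parsing separate from the process spawn makes the failure paths (bad JSON, missing keys) easy to unit-test without Xcode installed.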

Android discovery:

  1. apkPath = app/build/outputs/apk/debug/app-debug.apk (predictable).
  2. Parse applicationId = "..." from app/build.gradle.kts.
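
The Android steps can be illustrated with a small regex-based parser. This is a sketch under stated assumptions: `parseApplicationId` and `DEBUG_APK_RELATIVE_PATH` are hypothetical names, and the pattern only handles the plain `applicationId = "..."` form quoted above (no string interpolation, no per-flavor overrides).

```typescript
// Relative APK location quoted in the PR description.
const DEBUG_APK_RELATIVE_PATH = "app/build/outputs/apk/debug/app-debug.apk";

// Hypothetical helper: returns the first `applicationId = "..."` value that
// starts a line in build.gradle.kts, or null when no such line exists.
// Lines that begin with a comment marker are not matched.
function parseApplicationId(gradleKts: string): string | null {
  const match = /^\s*applicationId\s*=\s*"([^"]+)"/m.exec(gradleKts);
  return match?.[1] ?? null;
}
```

A caller would still need to confirm that the file at `DEBUG_APK_RELATIVE_PATH` exists under `androidDir` before returning a non-null result.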

Both return null gracefully when the build hasn't happened, the project layout doesn't match, or the tooling fails — no exceptions thrown.
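
Because both resolvers signal failure with null rather than an exception, callers can branch on the result directly. The following is a caller-side sketch only: the two result shapes come from the API signatures above, while `partitionPlatforms` is a hypothetical helper that is not part of this PR.

```typescript
// Result shapes taken from the API signatures in this PR description.
interface IosArtifact { appPath: string; bundleId: string }
interface AndroidArtifact { apkPath: string; packageName: string }

// Hypothetical caller-side helper: splits the platforms into those that are
// ready for visual judging and those that still need a Layer 2 build.
function partitionPlatforms(
  ios: IosArtifact | null,
  android: AndroidArtifact | null,
): { ready: string[]; missing: string[] } {
  const ready: string[] = [];
  const missing: string[] = [];
  (ios ? ready : missing).push("ios");
  (android ? ready : missing).push("android");
  return { ready, missing };
}
```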

Why post-build for iOS

The substrate's PRODUCT_BUNDLE_IDENTIFIER is com.<slugflat>.<pascal>App.ios${SAMPLE_CODE_DISAMBIGUATOR}, with SAMPLE_CODE_DISAMBIGUATOR=${DEVELOPMENT_TEAM}. Pre-build it's a template; post-build the .app's Info.plist has the resolved value (e.g. .iosNNYDL5U3V3 with the user's team ID). Reading the resolved form makes installAndLaunch's bundleId match what's actually installed on the simulator.

Test plan

  • npm run ci — 14/14 green.
  • Real-mode smoke against existing out/vet-clinic-queue/: Android returned null (no apk built yet); iOS returned a fully-resolved {appPath, bundleId} from a prior xcodebuild run that lived in DerivedData. Both behaved as designed.
  • After merge: a Phase 5d runner can call discoverIosArtifact + discoverAndroidArtifact post-Layer-2-build and feed the results into runJudge's opt-in visual field.

Out of scope (Phase 5d)

  • Higher-level runner: build → discover → runVisualJudge in one call.
  • Plumbing through dispatch.ts with a flag/env var to opt into Stage 1 visual judging.

🤖 Generated with Claude Code

After Layer 2 build mode produces the app bundle / APK, callers need
the artifact path + identifier to feed into runVisualJudge (#40, #41).
Hardcoding paths or guessing identifiers from the slug is fragile —
the substrate's iOS bundle ID is `com.<slugflat>.<pascal>App.ios${TEAM}`
where ${TEAM} is the developer's signing team, resolved at build time.
The right source of truth is the build outputs.

Adds two resolvers:

  discoverIosArtifact(iosDir):
    1. Find *.xcodeproj, scheme = filename minus extension
    2. xcodebuild -showBuildSettings -json → BUILT_PRODUCTS_DIR + WRAPPER_NAME
    3. plutil -extract CFBundleIdentifier raw on the built Info.plist
    Returns {appPath, bundleId} | null

  discoverAndroidArtifact(androidDir):
    1. apkPath = app/build/outputs/apk/debug/app-debug.apk (predictable)
    2. Parse `applicationId = "..."` from app/build.gradle.kts
    Returns {apkPath, packageName} | null

Both return null gracefully when:
  - Build hasn't happened (.app / .apk missing)
  - Project layout doesn't match (missing .xcodeproj, missing
    build.gradle.kts, etc.)
  - Tooling fails (xcodebuild / plutil exit non-zero, JSON parse
    fails)

Why post-build for iOS (vs. parsing project.pbxproj at the source):
the substrate's PRODUCT_BUNDLE_IDENTIFIER is
`com.<slugflat>.<pascal>App.ios${SAMPLE_CODE_DISAMBIGUATOR}` where
SAMPLE_CODE_DISAMBIGUATOR=${DEVELOPMENT_TEAM}. Pre-build it's a
template; post-build the .app's Info.plist has the resolved value
(e.g. ".iosNNYDL5U3V3" with the user's team ID). Reading the resolved
form makes installAndLaunch's bundle ID match what's actually
installed.

Real-mode smoke: against existing out/vet-clinic-queue/, Android
returned null (no apk built yet), iOS returned a fully-resolved
{appPath, bundleId} from a prior xcodebuild run that lived in
DerivedData. Both behaved as designed.

Tests: 14/14 npm run ci green.

Out of scope (Phase 5d):
  - Wire discovery into a higher-level runner that does build →
    discover → runVisualJudge in one call.
  - Plumb that runner into dispatch with a flag/env var to opt in to
    Stage 1 visual judging.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dadachi merged commit 5de8beb into main May 2, 2026
1 check passed
dadachi deleted the layer3-artifact-discovery branch May 2, 2026 22:23
dadachi added a commit that referenced this pull request May 2, 2026
One-call wrapper that ties Phase 5c artifact discovery (#42) and
Phase 5a visual-judge orchestration (#40) together for both
platforms. Returns a Stage1VisualResult shaped to match
JudgeInput.visual's per-platform expectation, so callers can pass
it through to runJudge directly:

  const visual = await runStage1Visual({
    iosDir: "./out/<slug>/ios",
    androidDir: "./out/<slug>/android",
    spec: domain.displayName,
  });
  const judge = await runJudge({ ..., visual });

Per-platform behavior:
  - Pass undefined to skip the platform.
  - If discovery fails (build hasn't happened, project layout
    unexpected), surfaces a structured VisualJudgeResult with
    ok=false and an actionable error message ("iOS artifact not
    discovered (run Layer 2 build mode first)") — same shape as
    a real launch/capture failure, so downstream aggregation in
    runJudge (#41) doesn't need a special case.
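
The per-platform behavior above can be sketched as follows. This is an illustration, not the committed code: the `VisualJudgeResult` and `Stage1VisualResult` shapes are assumed minimal versions, `discoveryFailure` and `resultsForMissingArtifacts` are hypothetical helpers, and the Android error string is an assumption modeled on the iOS message quoted above.

```typescript
// Assumed minimal shapes; the real types likely carry more fields.
interface VisualJudgeResult { ok: boolean; error?: string }
interface Stage1VisualResult { ios?: VisualJudgeResult; android?: VisualJudgeResult }

type Platform = "ios" | "android";

// Builds the structured failure returned when discovery yields null.
function discoveryFailure(platform: Platform): VisualJudgeResult {
  const label = platform === "ios" ? "iOS" : "Android";
  return {
    ok: false,
    error: `${label} artifact not discovered (run Layer 2 build mode first)`,
  };
}

// Models the case where no artifact was discovered for any requested
// platform. A platform whose directory is undefined is skipped entirely.
function resultsForMissingArtifacts(
  requested: { iosDir?: string; androidDir?: string },
): Stage1VisualResult {
  const result: Stage1VisualResult = {};
  if (requested.iosDir !== undefined) result.ios = discoveryFailure("ios");
  if (requested.androidDir !== undefined) result.android = discoveryFailure("android");
  return result;
}
```

Returning the same shape for discovery failures and launch/capture failures is what lets downstream aggregation avoid a special case.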

Caller responsibilities:
  - Run Layer 2 in build mode first so .app / .apk exists
  - Ensure a sim/emulator is booted for each platform being judged
  - Decide which platforms to judge (the function judges only
    those passed)

Tests: 16/16 npm run ci green.
  - Structured failure when artifacts missing ✓
  - Empty result when no platforms requested ✓

Out of scope (Phase 5e, the final integration):
  - CLI flag / env var that opts dispatch into Stage 1 visual
  - Forcing Layer 2 build mode when visual is enabled
  - Plumbing the runStage1Visual call into dispatch.ts post-Layer-2

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dadachi added a commit that referenced this pull request May 2, 2026
Closes the integration loop. With NATIVEAPPTEMPLATE_VISUAL=1 set in
the shell, npm run dev:

  1. Runs the existing planner + workers + reviewer chain.
  2. Calls runJudge with layer2Mode: "build", forcing real
     xcodebuild build + ./gradlew assembleDebug instead of the
     fast-mode toolchain probe. The build outputs are what Stage 1
     visual judging needs.
  3. Calls runJudge with visual: { iosDir, androidDir, spec }.
     runJudge in turn calls runStage1Visual (#43) per platform:
     discoverIosArtifact / discoverAndroidArtifact (#42) →
     installAndLaunch (#39) → 3s render wait → captureScreenshot
     (#38) → runLayer3 (#37) with DEFAULT_STAGE1_RUBRIC (#40).
  4. Aggregates Layer 1 + Layer 2 + Layer 3 into JudgeResult.
     overallPass requires all three to pass; visual failures DO
     fail the run (matches PLAN.md "a run that green-builds without
     passing Layer 3 is a failed run").
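
The aggregation rule in step 4 can be sketched as below. All names here (`LayerTally`, `overallPass`, `summarize`) are hypothetical, the passed/total tally is an assumed representation of a layer's result, and treating a skipped Layer 3 as non-failing is an assumption based on the flag-off behavior described in this commit message. The summary format mirrors the flag-off summary string quoted in this message; the word "fail" for a failing layer is a guess.

```typescript
// Assumed shape: how many checks in a layer passed, out of how many.
interface LayerTally { passed: number; total: number }
type LayerOutcome = LayerTally | "skipped";

const layerPassed = (t: LayerTally): boolean => t.passed === t.total;

// overallPass requires every layer that ran to pass. A skipped Layer 3
// does not fail the run here (assumption).
function overallPass(l1: LayerTally, l2: LayerTally, l3: LayerOutcome): boolean {
  const l3Ok = l3 === "skipped" ? true : layerPassed(l3);
  return layerPassed(l1) && layerPassed(l2) && l3Ok;
}

// Formats one layer as "<name> <passed>/<total> pass|fail" or "<name> skipped".
function describeLayer(name: string, outcome: LayerOutcome): string {
  if (outcome === "skipped") return `${name} skipped`;
  const verdict = layerPassed(outcome) ? "pass" : "fail";
  return `${name} ${outcome.passed}/${outcome.total} ${verdict}`;
}

function summarize(l1: LayerTally, l2: LayerTally, l3: LayerOutcome): string {
  return [
    describeLayer("Layer 1", l1),
    describeLayer("Layer 2", l2),
    describeLayer("Layer 3", l3),
  ].join(" · ");
}
```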

Without the flag set, behavior is unchanged from #43: Layer 2 in
fast mode, Layer 3 skipped, summary reads
"Layer 1 3/3 pass · Layer 2 3/3 pass · Layer 3 skipped".

Refactors the JudgeInput.visual shape from per-platform pre-resolved
configs ({artifactPath, bundleId} pairs) to outDir-based discovery
({iosDir, androidDir}). runJudge.runVisualPhase now delegates to
runStage1Visual which does discovery + visual-judge atomically.

Latency: a cold build + judge run adds ~60s (iOS) and ~120-180s
(Android) on top of the fast-mode baseline. Hot rebuilds are much
faster but vary with substrate caches.

Recommendations covered:
  - Trigger: env var NATIVEAPPTEMPLATE_VISUAL=1 (canonical stem
    per the post-#30 convention; renaming the stub flags is left
    for a follow-up PR).
  - Build coupling: visual implies Layer 2 build mode.
  - Failure semantics: visual failures fail the run.
  - Render wait: 3s default for both platforms (in
    DEFAULT_RENDER_WAIT_MS, configurable via runVisualJudge input).
  - Per-platform: judge both when visual enabled — discovery
    returns null gracefully if either platform's build is missing.
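
The trigger, build coupling, and render-wait default listed above can be summarized in one small function. This is a sketch, not the code in dispatch.ts: `resolveRunOptions` and `RunOptions` are hypothetical names, and the literal "fast" for the non-build mode is an assumption. Only NATIVEAPPTEMPLATE_VISUAL, layer2Mode: "build", and DEFAULT_RENDER_WAIT_MS appear in this commit message.

```typescript
const DEFAULT_RENDER_WAIT_MS = 3000; // 3s default, per this commit message

interface RunOptions {
  layer2Mode: "build" | "fast"; // "fast" as a literal value is an assumption
  visualEnabled: boolean;
  renderWaitMs: number;
}

// Hypothetical helper. The environment is passed in as a parameter so the
// function can be exercised without reading the real process environment.
function resolveRunOptions(env: Record<string, string | undefined>): RunOptions {
  const visualEnabled = env["NATIVEAPPTEMPLATE_VISUAL"] === "1";
  return {
    // Build coupling: enabling visual judging implies Layer 2 build mode.
    layer2Mode: visualEnabled ? "build" : "fast",
    visualEnabled,
    renderWaitMs: DEFAULT_RENDER_WAIT_MS,
  };
}
```

In a real entry point this would presumably be called with the process environment.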

Tests: 16/16 npm run ci green.
README.md gains an "Optional flags" subsection documenting the
trigger and its latency cost.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>