Draft
Conversation
…-server dispatch Android is driven by the existing `simulator-server` binary through its `android --id <serial>` subcommand, which exposes the same HTTP/WebSocket/ stdin protocol as iOS. The blueprint now selects the subcommand based on the shape of the udid, so every gesture tool (gesture-tap/swipe/pinch/rotate/ custom, button, keyboard, rotate, screenshot, run-sequence) works on both platforms without callers branching. Things that can't route through simulator-server use platform-specific paths: - describe — uiautomator dump on Android, AXRuntime + native-devtools on iOS - launch-app, restart-app — `am start`/`monkey` on Android, simctl + native devtools on iOS - open-url — `am start VIEW` with shell-escaped URL on Android, simctl openurl on iOS - reinstall-app — `adb install -r` on Android (with optional -g/-d), simctl uninstall+install on iOS Adds 4 android-only tools (android-list-emulators, android-boot-emulator, android-stop-app, android-logcat) and workspace introspection for `android_application_id` and `android_has_gradle`. iOS behavior is preserved: platform dispatch gates every Android branch, and the simulator-server blueprint only calls `ensureAutomationEnabled` for iOS udids. Tests pin each preserved path (launch/restart/reinstall/open-url on iOS) against mock execFile so a future regression surfaces in CI. Covered by 40+ new repro tests including a blueprint-level test that asserts subcommand dispatch, stdio pipe behavior (the server treats stdin EOF as shutdown), AX-automation warmup, and press-key protocol invariants.
- MCP instructions now describe the unified tool surface (iOS + Android dispatch on udid shape) and list platform-specific extras. - Package descriptions updated for both platforms. - README prerequisites split by platform (Xcode for iOS, Android SDK platform tools + emulator package for Android). - Adds unified-surface assertions to auto-screenshot test so any regression in the allow-list shows up immediately.
- Adds `argent-android-emulator-setup` and `argent-android-emulator-interact` SKILLs mirroring their iOS counterparts. The interact skill documents the unified tool surface and Android-specific gotchas (Metro reachability via `adb reverse`, first-launch permission prompts, locked screen / DRM). - `argent.md` rule gains a `<platform_dispatch>` section explaining how the udid shape selects iOS vs Android internally, plus updated skill routing that points to the right platform-specific skill. - `argent-simulator-interact`, `argent-test-ui-flow`, `argent-react-native-app-workflow`, and `argent-metro-debugger` now cover both platforms (RN Metro reachability, gradle, logcat). - `argent-environment-inspector` reports Android applicationId and gradle wrapper presence so downstream workflow skills can drive `./gradlew` builds without extra inspection.
…vice `list-simulators` and `android-list-emulators` collapse into a single `list-devices` that returns iOS simulators and Android devices/emulators in one tagged array (each entry carries a `platform` discriminator), plus the available Android AVDs. Callers no longer have to know which platform to query first. `boot-simulator` and `android-boot-emulator` collapse into a single `boot-device`. Pass `udid` to boot an iOS simulator or `avdName` to launch an Android emulator — the tool picks the platform from which argument is provided and returns a tagged payload. The Android boot stages (AVD validate → spawn → adb register → wait-for-device → boot_completed → PackageManager sanity) are unchanged, including cold-boot default and cleanup-on-failure. Existing unified-surface tools (gesture-tap, describe, launch-app, etc.) continue to dispatch on the udid shape — no changes there.
…rface Tool descriptions should tell the caller what the tool does, when to use it, and what it returns — not which binary or protocol drives it. Strips references to `simulator-server`, `xcrun`, `adb`, `uiautomator`, `AXRuntime`, and `USB HID` from descriptions that the agent reads when picking a tool. The behavior itself is unchanged; implementation details stay in the code (where they belong) and in the skill docs where they actually inform workflow. Also updates `udid` parameter descriptions to point at `list-devices` as the canonical source, rather than restating the platform-shape heuristic in every tool.
…il leaks Skill files now direct the agent to `list-devices` + `boot-device` (one flow for both platforms) instead of the removed platform-specific `list- simulators` / `boot-simulator` / `android-list-emulators` / `android-boot- emulator` pairs. Also trims protocol-layer explanations from skill surfaces where they don't change the caller's behavior.
Platform detection now looks up the udid in the actual inventories from `xcrun simctl list` and `adb devices`. If the id lives in simctl's list it is iOS; if it lives in adb's list it is Android. The shape heuristic survives only as a last-resort fallback when both tools are unavailable (no Xcode AND no adb installed). This drops the 8-16 hex short form from the iOS shape pattern — that form is physical-device-only and routing it to simctl used to produce an opaque "Invalid device" error (review #8). The classifier is async. `list-devices` warms a per-udid cache so every subsequent tool call is O(1); the cache TTL is 30 s so short-lived changes on the host still propagate. Call sites that were in async context switch straight to `classifyDevice`. `launch-app` and `restart-app` move from the tool-object form to a factory (`createLaunchAppTool(registry)`) so the NativeDevtools service resolution can defer into `execute` and share the async classify call. The earlier iOS behavior — ensureEnvReady before `xcrun simctl launch`, refresh before relaunch — is unchanged and pinned by tests.
…ll surface Review findings #1 and #7. Every Android branch interpolates user-supplied `bundleId` (and sometimes `activity` / `tag`) directly into an `adb -s <serial> shell "<template>"` string that the on-device shell re-parses — so a bundleId containing `;`, backtick, `$(…)`, or `&&` gives the caller arbitrary on-device shell execution. Empty udid passed the old schema and flowed into `adb -s "" shell …`, which silently targets whichever device adb picks by default. - `bundleId` is constrained to `[A-Za-z0-9._-]+` (union of Android package grammar and the iOS bundle-id dash extension) via zod `.regex`. - `activity` on launch-app adds `/` to the safe alphabet for `pkg/.Activity`. - `tag` on android-logcat is constrained to the same safe alphabet so it cannot smuggle shell metachars into the logcat filter spec. - `udid` has `.min(1)` across every cross-platform schema. - The four Android-shell-interpolating tools (launch-app, restart-app, android-stop-app, android-logcat) also call `zodSchema.parse(params)` inside `execute` as defense-in-depth, because internal callers like `flow-run` / `flow-add-step` hit `registry.invokeTool` without running per-tool zod validation. 47 injection / empty-udid repros in the new hardening test file pin every combination: semicolons, backticks, command substitution, logical AND, pipe, newline, quote break-out.
Review findings #5, #6, #10. - Numeric character references (`&#N;` decimal, `&#xH;` hex) are now decoded alongside the five named entities. Out-of-range codepoints (past 0x10FFFF or in the surrogate pair range) are replaced with empty instead of throwing, so one bad glyph does not sink the whole describe. - Node conversion is iterative with an explicit work stack. Deeply nested hierarchies (15k-deep RecyclerView + overlays in the review) used to throw `Maximum call stack size exceeded`; now they parse cleanly. - `describe` on Android writes its dump to a per-call path under /data/local/tmp with a random suffix, and removes the file afterwards. The old fixed path (/sdcard/window_dump.xml) raced on concurrent describes of the same serial — one call's `cat` could read the other call's partial write. /data/local/tmp is world-writable on every supported Android version so the new path works where /sdcard does not under scoped storage.
#2 — `adb start-server` now runs BEFORE the `serialsBefore` snapshot. If the daemon was down pre-call, the old order snapshotted an empty list, then once adb came up every already-connected emulator looked "new" and the tool could hand back an unrelated emulator as "booted". #3 — `readAvdName` now probes `ro.boot.qemu.avd_name` (emulator release 30 / Android 11+) first, falling back to the legacy `ro.kernel.qemu.avd_name`. On modern images the legacy key is empty, so AVD-name disambiguation silently failed when two emulators booted concurrently. The helper prefers the new key when both are present. #4 — `waitForBootCompleted` accepts a `shouldAbort` callback and the wait-for-device stage races against an `earlyExitError` poller. An emulator crash between stages 2 and 4 now surfaces as its specific exit- code error within ~1 s instead of blocking for the 180 s / 300 s budgets and throwing a generic timeout. #9 — `listAvds` filter replaced prefix-based `!startsWith(INFO|HAX)` with `/^[A-Za-z0-9._-]+$/`. AVD names created by avdmanager are identifier-only, so legitimate names like `HAX-Pixel-6` or `INFO_BuildBot_Pixel7` are no longer silently dropped. #11 — `bootAndroid` pre-flights `adb version` before spawning the detached emulator. Without this, adb-missing failures orphaned the emulator child process that the user then had to kill manually. Adds a light-weight `listAndroidSerials` helper used by the classifier so a cold-classify is one `adb devices` call instead of 1 + 3N getprop round- trips through the enriched `listAndroidDevices`.
The description-quality CI runs SpiderShield against the tool-server's
extracted descriptions. Four new tools on this branch — list-devices,
boot-device, android-logcat, android-stop-app — plus the new
argent-android-emulator-interact skill were scoring below the 9.0
threshold.
Two concrete causes:
1. Tool descriptions were written as concatenated string literals
("a" + "b" + ...). `scripts/extract-tools.mjs` only captures the
first string segment, so only the opening sentence reached the
scorer. Switched each to a single template literal so the whole
description is graded.
2. The extract regex uses a non-greedy `([\s\S]*?)` against template
literals and does not understand escaped backticks (\`foo\`).
It therefore stops at the first `\`` inside a description and
drops the rest of the text. Removed the backtick-quoted code
spans from these four descriptions — single quotes read as well
and survive extraction intact.
With (1) and (2) fixed I made the remaining text carry the scoring
signals SpiderShield looks for: an imperative verb lead, a `Use when`
scenario trigger, an explicit `Returns { ... }`, and a `Fails when`
failure mode. All four tools now score 10/10 and the corpus average
clears the gate.
Skill fix is narrower: `argent-android-emulator-interact` used
`Use alongside` which doesn't match the grader's `Use when` regex.
Reworded the trigger.
Nothing functional changed. Local spidershield run: 9.11 / 10 (prev
8.73); grade-skills: 10.0 / 10.
Documents and pins concrete issues found while auditing this branch: - AUDIT #1: list-devices description claims it fails when neither Xcode nor adb is on PATH, but every sub-call is try/catch-swallowed so it silently resolves to {devices:[],avds:[]}. - AUDIT #2: iOS and Android entries share only platform+state — there is no common id/name field, so generic MCP clients cannot read an id without narrowing on platform first. - AUDIT #6a: android-logcat priority param description says Default: I, but the code pushes no priority filter when omitted (effective V). Two tests fail on this branch; the rest document current behaviour.
Three independent bugs in the Android-path code that reviewers repro'd: 1. uiautomator entity decoder double-decoded. The decoder ran numeric references as one replace pass, then each of the five named entities as its own pass. An ampersand decoded in the first pass fed straight into the second: `&lt;` (correct XML encoding of the literal string `<`) collapsed to `<`, violating XML 1.0 §4.6. Replaced with a single regex alternation so every match is consumed once. 2. `launch-app` Android path used two different launch mechanisms — a blocking `am start -W` when the caller passed an `activity`, and a fire-and-forget `monkey … LAUNCHER 1` when they didn't. The monkey path returned as soon as the intent was injected, leaving a window where describe/tap raced a still-forking process. Unified on `am start -W` by resolving the default activity up-front via `cmd package resolve-activity --brief`. Also replaced the brittle `/Error|Exception/ && !/Status: ok/` matcher with a positive match on `Status: ok` — the old regex false-succeeded on `Status: null` (activity threw in onCreate) and would have false-failed if Android ever dropped the `Status:` banner from a release that keeps benign strings like `Activity: com.example.ErrorReportingActivity` in the output. 3. `describe` Android path shell-chained cleanup with `&&`, so a failing `uiautomator dump` (keyguard, MFA flap, secure overlay) short-circuited before `rm -f` ever ran and leaked a file per attempt under /data/local/tmp. One-char fix: trailing `; rm -f` instead of `&& rm -f`. Regression tests added for all three: `&lt;` / `&lt;` / `&amp;` stay literal, `am start` success/failure permutations, and a shell-string assertion pinning the `;` before `rm -f`.
Two follow-ups to feat/android-emulator-support review:
1. Orphan emulator on stage-2 timeout (review R1#1). The emulator is
spawned with `{detached: true, stdio: "ignore"}` + `child.unref()`.
If it starts but never registers with adb within the 60s budget,
`serial` stays null and `killEmulatorQuietly(null)` is a no-op —
the emulator process keeps running and the user has to find the
PID and kill it by hand. The tool's description already promises
the opposite ("the spawned emulator is terminated so the next
retry starts clean").
Fix: retain the ChildProcess and, in all error exits before a
serial resolves, call `killDetachedEmulator(child)` which sends
SIGTERM and schedules a SIGKILL 2s later if the child ignores
the first signal (unref'd so the timer doesn't hold the
event loop open).
2. Boot success did not warm the classify cache. The next tool call
after a successful boot — typically `launch-app` or `describe` —
re-ran `xcrun simctl list` + `adb devices` to classify the same
id we just booted. Added `warmDeviceCache([{udid, platform}])`
to both the iOS and Android success paths, matching what
`list-devices` already does.
Regression test pins the SIGTERM signal on the detached child and
asserts `did not register within ...` is the surfaced error.
Four descriptions claimed behavior the code did not implement. SpiderShield
rewarded the scenario/return/error keywords these sentences added (see the
earlier `docs: tighten …` commit) but the substance drifted from reality.
Fix each so the description matches what the code does:
* list-devices — said "Fails when neither Xcode nor adb is on PATH".
It doesn't: every sub-call is try/catch-swallowed and the tool
returns `{devices:[], avds:[]}`. Rewrote to describe the actual
contract: an empty result typically means no tooling is available.
* android-logcat — said priority "Default: I." There's no default in
the code; omitting the priority leaves logcat at its own default
(verbose). Rewrote the schema describe to say so.
* android-stop-app — said "Fails when the udid is not an Android
serial". Unreachable: classifyDevice falls back to "android" for
any non-UUID string, so the actual failure for a bogus id is
adb's own "device not found", not our "not a serial" branch.
Rewrote to describe "udid not registered with adb" which is the
real failure signature.
* mcp-server instructions — claimed the unified tools "auto-dispatch
by the id's shape (UUID → iOS, anything else → Android adb serial)".
Stopped being true when classifyDevice became list-based. Rewrote
to match: "cross-references it against `xcrun simctl list` and
`adb devices`" — pass the id `list-devices` returned and the tools
resolve the platform.
Also fixes argent.md's stale reference to a nonexistent `android-describe-screen`
tool (review R1#3) — the unified `describe` already dispatches to Android
uiautomator internally.
feat/android-emulator-support removes two public MCP tool names and replaces them with new ones: boot-simulator → boot-device list-simulators → list-devices Any consumer pinned to 0.5.x and auto-updating to the tip of 0.5.x would silently lose those tool ids. 0.6.0 is the smallest pre-1.0 semver bump that signals "call surface changed, re-check your tool references" (MINOR for additive/breaking changes in 0.x.y per semver §4). native-devtools-ios is unchanged and stays at 0.5.1.
This file was committed by an earlier review-scaffolding run (dcb825d) with tests designed to FAIL while the bugs they pinned still existed. The previous three commits in this PR fixed the description-drift issues (AUDIT #1, #6a, #6b, #6c) and the tests therefore started failing the opposite way — now asserting the ABSENCE of the old buggy strings. Updates in this commit: - AUDIT #1: assertion flipped from "description promises a throw" to "description no longer promises a throw and now explicitly says it does not throw". - AUDIT #2: DESIGN-push-back, marked `describe.skip`. iOS entries exposing `udid` and Android entries exposing `serial` is the deliberate discriminated-union shape — the underlying tooling names them that way, and the mcp-server instructions explicitly tell agents to pass the id from list-devices. - AUDIT #6a: assertion flipped to expect "logcat's own default (V)" in the priority description. - AUDIT #6b: assertion flipped to expect "cross-referencing it against" (the list-based dispatch phrasing) in mcp-server.ts. - AUDIT #6c: assertion flipped to expect "not registered with adb" in place of the unreachable "not an Android serial" branch. AUDIT #3, #5, #7, #8 are unchanged — each still passes on the current code as before.
This reverts commit c3c669f.
Both `native-devtools` and `ios-profiler-session` blueprints reached deep into simctl / launchctl / xctrace before noticing they'd been handed a target the underlying tooling can't drive. On iOS-only setups that was fine — every udid classified as iOS. With Android serials now appearing in list-devices, an agent that feeds `emulator-5554` to `native-describe-screen` used to surface as an opaque socket/launchctl failure; similarly an ios-profiler call against an Android serial produced an xctrace error from further down the stack. Gate both blueprint factories with a one-line classifyDevice check and throw a specific "iOS-only" error that points the caller at list-devices. Covers ~10 tools at the blueprint boundary instead of adding per-tool asserts. Regression test asserts: the gate rejects Android-classified udids with the platform-specific error for each blueprint, and does NOT false-positive when given an iOS-classified udid.
…romise.all Two small follow-ups to the review: 1. Description leakage. Commit 05a6194 set out to remove binary-name references (xcrun / adb / emulator / am / pidof / etc.) from tool descriptions, on the rationale that an agent picking a tool shouldn't care which CLI drives it. The SpiderShield tightening pass (47b1503) reintroduced some of those names to satisfy the "Fails when..." keyword check. Rewording the relevant clauses to satisfy both goals: boot-device "xcrun / emulator / adb is missing from PATH" → "the required platform developer tooling is missing" list-devices "xcode-select, Android platform-tools" → "the platform SDK is not installed" android-stop-app "equivalent to am force-stop" → "force-stops the process and its background services" + "udid is not registered with adb" → "udid is not in list-devices" android-logcat "resolved via pidof" → "resolved to the app's PID" + "not an Android serial" → "not in list-devices" Local SpiderShield: 9.11 / 10 (unchanged). 2. describe-android-race test now uses Promise.all instead of two sequential awaits. The test existed to guard against the shared-path regression, but sequential awaits hide it — the first call completes before the second starts, so even a constant-path impl would pass. `Promise.all` makes the regression actually reachable. Also flipped the corresponding AUDIT #6c assertion in android-emulator-support_audit.test.ts to match the new description wording ("not in list-devices") after the rename above.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
TBD - this project is still in the really early stages of development
Just works thanks to either sim server or metro:
gesture-*button,keyboardscreenshotrun-sequence,flow-*- just programaticAdjusted to work on android too:
describeuiautomator dump→DescribeNodeAXServiceApilaunch-appcmd package resolve-activity+am start -Wsimctl launchrestart-appam force-stop+am start -Wsimctl terminate+simctl launchreinstall-appadb install -r(+-g/-d)simctl uninstall+simctl installopen-urlam start -a VIEW -d <url>simctl openurllist-devicesadb devices+emulator -list-avdsxcrun simctl list --jsonAndroid-only tools:
android-logcat,android-stop-appDoesn't work on android (yet):
boot-device"works" but it's cold booting every time:emulator -avd+ 5-stage wait (adb-register → wait-for-device →sys.boot_completed→pm path android)rotate"works" thanks to sim server but has some issues (i need to dig into this more)native-devtools-*(7 total)ios-profiler-*(3 total)pasteI have to double check:
debugger-*react-profiler-*profiler-*