feat: mobile QA via Revyl cloud devices + browse-mobile Appium driver by TenzinDhonyoe · Pull Request #604 · garrytan/gstack

TenzinDhonyoe · 2026-03-28T17:21:04Z

Summary

Adds full mobile QA support to the /qa and /qa-only skills. Claude Code can now test iOS and Android apps on real cloud-hosted devices via Revyl, with a local Appium + iOS Simulator fallback.

What's new for users:

Run /qa on any React Native / mobile project and it automatically detects mobile, provisions a cloud device, builds your app, uploads it, and runs a full QA pass — zero manual setup
Revyl AI-grounded targeting — tap/swipe/type using natural language descriptions (--target "the Sign In button") instead of brittle element selectors
Dev loop mode — Metro + Cloudflare tunnel for ~2s hot-reload fix cycles on cloud devices (vs ~5min full rebuilds)
Fully automated — auto-configures all Claude Code permissions on first run so QA runs without any approval prompts
Falls back to local Appium + iOS Simulator if Revyl CLI isn't installed

Key Components

`browse-mobile/` — New Appium-backed mobile driver

Pure HTTP Appium client (zero npm deps — replaced webdriverio)
W3C WebDriver spec compliant (/execute/sync endpoint)
Reference system for element targeting (@e1, @e2, etc.)
Self-contained binary: JS bundle + shell launcher
Comprehensive test suite (smoke, ref-system, server, setup-check)

`scripts/gen-skill-docs.ts` — QA template generator updates

generateBrowseMobileSetup() — detects mobile projects, checks Revyl CLI, provisions devices
generateQAMethodology() — full Revyl interaction flow with AI-grounded targeting
Auto-permission configuration (90+ bash permission patterns for fully automated QA)
Host-aware path resolution (works on both Claude Code and Codex)
Debug builds by default (faster than Release, usually already cached in DerivedData)
Smart HMR diagnostic handling (don't kill working dev loops on [hmr] Metro health: FAILED warnings)
Fast-fail DNS checks (single 15s check, 3 attempts — no retry of entire dev loop)

Template changes

qa/SKILL.md.tmpl + qa-only/SKILL.md.tmpl — mobile QA integration
SKILL.md.tmpl (root) — documents mobile QA capabilities
All generated SKILL.md files regenerated

Bug Fixes (from 7 rounds of live QA testing)

| Round | Issue | Fix |
|-------|-------|-----|
| 1 | Wrong Revyl CLI commands (`swipe`, `type`) | Corrected to `revyl device swipe/type` |
| 1 | Metro port conflict | Detect and kill existing Metro before dev loop |
| 2 | Tunnel DNS resolution failures | Added retry + DNS check with fast-fail |
| 2 | Stale cached builds on dead tunnels | Detect and rebuild |
| 3 | Auth/permissions blocking first-time users | Auto-configure all permissions |
| 3 | Xcode detection failures | Better detection logic |
| 4 | Release builds too slow for QA | Switch to Debug builds (often already cached) |
| 4 | Tunnel DNS retry too slow | Single 15s check, 3 attempts, no full-loop retry |
| 5 | Priority flip removed Revyl advantage | Reverted; Revyl stays preferred (AI targeting > local speed) |
| 6 | HMR warnings killed working dev loops | Narrowed grep to fatal errors only (`fatal\|panic\|exited with`) |
| 7 | Every bash command prompted for approval | 90+ comprehensive permission patterns |

Also Included (from merged PRs on this branch)

/cso v2 — infrastructure-first security audit (secrets archaeology, supply chain, CI/CD)
Codex 1024-char description limit enforcement + auto-heal stale installs
zsh glob compatibility fix for skill preambles
/review → /ship handoff fix

Files Changed

85 files changed, ~9,500 lines added, ~500 removed
New: browse-mobile/ (5 source files, 5 test files, built binary)
Modified: scripts/gen-skill-docs.ts (+600 lines), all SKILL.md files regenerated
New tests: test/skill-e2e-cso.test.ts, test/gen-skill-docs.test.ts additions

How to Try It

# Install gstack from this branch
cd ~/.claude/skills && git clone https://github.com/TenzinDhonyoe/gstack.git -b feat/browse-mobile && cd gstack && bun install && bun run build

# Then in any React Native project:
# /qa

Requires: Revyl CLI for cloud devices, or Xcode + Appium for local fallback.

Test Plan

bun test passes (skill validation, gen-skill-docs quality, browse integration)
7 rounds of live mobile QA testing on real React Native app
Revyl cloud device provisioning, screenshot, tap, swipe, type all verified
Dev loop with Metro + Cloudflare tunnel verified
HMR warning handling verified (no false kills)
Auto-permission configuration verified (zero approval prompts)
Codex path transformation verified (no ~/.claude/ leaks in Codex output)
bun run test:evals (LLM judge + E2E)

🤖 Generated with Claude Code

New module that implements the same HTTP command protocol as browse/ but backed by Appium WebDriver for mobile app automation. Enables /qa to test Expo/React Native apps on iOS Simulator.

Key components:
- ref-system.ts: Parse Appium XML accessibility tree into @e refs
- mobile-driver.ts: WebDriverIO wrapper with click, fill, screenshot, snapshot
- server.ts: HTTP server (same protocol as browse: bearer auth, state file)
- cli.ts: CLI entry point + setup-check for dependency validation
- platform/ios.ts: iOS Simulator boot, device listing, app management

Tested against real Expo app (Gluco): snapshot, click, fill, screenshot all working. 43 tests passing, 0 failures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
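The ref-system.ts idea described above (turning the Appium XML accessibility tree into @e refs) might look roughly like the following. This is a minimal sketch, not the PR's implementation: `assignRefs`, `readAttr`, and the `RefEntry` shape are hypothetical names, the regex scan stands in for a real XML parser, and skipping elements marked `visible="false"` is an assumption.

```typescript
// Hypothetical sketch: number the elements of an Appium XML page source as @e1, @e2, ...
// All names here are illustrative assumptions, not taken from ref-system.ts.

interface RefEntry {
  ref: string;   // e.g. "@e1"
  type: string;  // XML tag name, e.g. "XCUIElementTypeButton"
  label: string; // accessibility label, empty string when missing
}

// Read one attribute value out of a raw attribute string.
function readAttr(attrs: string, name: string): string {
  const m = new RegExp(`(?:^|\\s)${name}="([^"]*)"`).exec(attrs);
  return m?.[1] ?? "";
}

// Walk opening tags in document order and hand out sequential refs.
function assignRefs(xml: string): RefEntry[] {
  const entries: RefEntry[] = [];
  const tagPattern = /<([A-Za-z][\w.]*)((?:\s+[\w:-]+="[^"]*")*)\s*\/?>/g;
  let match: RegExpExecArray | null;
  while ((match = tagPattern.exec(xml)) !== null) {
    const type = match[1] ?? "";
    const attrs = match[2] ?? "";
    // Assumption: invisible elements are not worth a ref.
    if (readAttr(attrs, "visible") === "false") continue;
    entries.push({
      ref: `@e${entries.length + 1}`,
      type,
      label: readAttr(attrs, "label"),
    });
  }
  return entries;
}
```

An entry with an empty `label` is the kind of element the QA flow could report as missing accessibility props.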

QA skills now auto-detect Expo/React Native projects and switch to mobile mode. When app.json is found and browse-mobile is available:
- Automatically starts Appium if not running
- Boots iOS Simulator if needed
- Builds/installs app if not on simulator
- Navigates through Expo dev launcher to actual app
- Uses $BM instead of $B for all browse commands
- Falls back to ~"Label" selector for RN components missing accessibilityRole
- Flags missing accessibility props as QA findings

Web QA behavior is completely unchanged: mobile branches are gated on detection.

Files changed:
- scripts/gen-skill-docs.ts: BROWSE_MOBILE_SETUP placeholder + mobile detection in QA methodology + Expo/RN framework guidance
- qa/SKILL.md.tmpl: mobile setup block + platform parameter
- qa-only/SKILL.md.tmpl: same mobile additions (report-only)
- SKILL.md.tmpl: Mobile Testing section with $BM command reference
- TODOS.md: 3 new items from eng review

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The compiled binary is 58MB (bundles entire Bun runtime + webdriverio). Same pattern as browse/dist/ which is already gitignored. Users build it locally via: bun build --compile browse-mobile/src/cli.ts --outfile browse-mobile/dist/browse-mobile Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The compiled binary couldn't find server.ts when deployed outside the gstack repo. Now the CLI spawns itself with --server flag to run the server in-process, same pattern as browse/. Works both in dev mode (bun run cli.ts) and as compiled binary. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…iled binary

Three fixes:
1. Switch from bun --compile (can't resolve webdriverio transitive deps) to bun build (JS bundle) + shell launcher script. 3.2MB bundle vs 58MB binary, and all npm deps resolve correctly at runtime.
2. Filter --server from process.argv in server.ts so bundle ID isn't clobbered when CLI spawns itself in server mode.
3. CLI finds the bundled cli.js relative to itself, works from any directory.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Bug 1: handleCommand() threw immediately if not connected. Now it auto-reconnects to Appium when the first command arrives, handling the common case where WDA takes 30-60s to compile on first session.

Bug 2: CLI didn't pass BROWSE_MOBILE_BUNDLE_ID env var when spawning the server subprocess. Now extracts bundle ID from goto app://... and forwards it so the Appium session is created with the correct app.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
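The Bug 2 fix (pulling the bundle ID out of a `goto app://...` command and forwarding it through the environment) could be sketched as below. Only the `BROWSE_MOBILE_BUNDLE_ID` variable name and the `app://` scheme come from the commit message; `bundleIdFromGoto` and `serverEnv` are hypothetical helpers, and the real CLI may parse its arguments differently.

```typescript
// Hypothetical sketch of forwarding the bundle ID to the spawned server process.

type Env = Record<string, string | undefined>;

// Return the bundle ID from ["goto", "app://<bundleId>"], or null for anything else.
function bundleIdFromGoto(args: string[]): string | null {
  if (args[0] !== "goto") return null;
  const target = args[1] ?? "";
  const prefix = "app://";
  if (!target.startsWith(prefix)) return null;
  const id = target.slice(prefix.length);
  return id.length > 0 ? id : null;
}

// Build the environment for the server subprocess without mutating the caller's env.
function serverEnv(base: Env, args: string[]): Env {
  const id = bundleIdFromGoto(args);
  return id ? { ...base, BROWSE_MOBILE_BUNDLE_ID: id } : { ...base };
}
```

The result of `serverEnv` would be passed as the `env` option when the CLI spawns itself in server mode.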

Rewrote mobile-driver.ts to use raw fetch() for all Appium WebDriver protocol calls instead of webdriverio. This eliminates the transitive dependency bundling problem permanently.

Results:
- Bundle: 119KB (was 3.2MB with webdriverio)
- Dependencies: 0 npm packages (was webdriverio + 230 transitive deps)
- All Appium commands work via W3C WebDriver REST protocol over HTTP

Also fixed:
- CLI timeout: 180s for goto (Appium connect), 60s for other commands
- Removed webdriverio from package.json

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

/execute returns 404 on Appium — the correct W3C route is /execute/sync. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
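For readers unfamiliar with the route, the W3C WebDriver "Execute Script" command is a POST to `/session/{session id}/execute/sync` with a JSON body containing `script` and `args`. The sketch below only builds such a request so it can be handed to `fetch()`; `executeSyncRequest` and the `WebDriverRequest` shape are hypothetical names, and the base URL in the usage note is an assumed local Appium address.

```typescript
// Hypothetical sketch: build a W3C WebDriver execute/sync request for use with fetch().

interface WebDriverRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function executeSyncRequest(
  baseUrl: string,
  sessionId: string,
  script: string,
  args: unknown[] = [],
): WebDriverRequest {
  const root = baseUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  return {
    url: `${root}/session/${encodeURIComponent(sessionId)}/execute/sync`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // The spec requires both keys, even when args is empty.
    body: JSON.stringify({ script, args }),
  };
}
```

Usage would be along the lines of `const req = executeSyncRequest("http://127.0.0.1:4723", sessionId, "mobile: launchApp", [{ bundleId }]); await fetch(req.url, req);`, assuming an Appium server on its default port.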

When /qa detects a mobile project for the first time, it checks if browse-mobile bash permissions exist in the user's settings.json. If not, offers to add them — one-time setup that enables fully automated mobile QA without per-command approval prompts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

1. Expanded permission patterns to cover inline bash (SID=..., curl -X POST, JAVA_HOME=...) that the QA skill generates. Previous patterns only matched commands starting with $BM.
2. Added speed guidance: batch multiple $BM commands in single bash calls using && instead of separate tool calls. Take screenshots at milestones only, not after every tap.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

browseDir is ~/.claude/skills/gstack/browse/dist — need ../../ to reach the gstack root, not ../ which only goes up to browse/. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…p command

Three fixes:
1. Changed ~"Label" to label:Label syntax. The ~ was being interpreted by zsh as home directory expansion, breaking accessibility label clicks.
2. Added tap <x> <y> command for coordinate-based tapping when elements can't be found by ref or label.
3. Updated all skill templates and help text to use new label: syntax.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
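To make the two targeting syntaxes concrete, here is a minimal sketch of how a click target string might be classified as an @e ref or a `label:` selector. The `Target` union and `parseTarget` are hypothetical; the commit only establishes the `@eN` and `label:Label` forms, so how the real driver treats anything else is unknown.

```typescript
// Hypothetical sketch: classify a click target as an @e ref or a label: selector.

type Target =
  | { kind: "ref"; ref: string }
  | { kind: "label"; label: string }
  | { kind: "unknown"; raw: string };

const LABEL_PREFIX = "label:";

function parseTarget(raw: string): Target {
  // Refs look like @e1, @e2, ...
  if (/^@e\d+$/.test(raw)) return { kind: "ref", ref: raw };
  // label:Sign In selects by accessibility label; unlike ~"Sign In" it has no
  // leading character that zsh expands.
  if (raw.startsWith(LABEL_PREFIX) && raw.length > LABEL_PREFIX.length) {
    return { kind: "label", label: raw.slice(LABEL_PREFIX.length) };
  }
  return { kind: "unknown", raw };
}
```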

Adds Revyl as a second mobile QA backend alongside browse-mobile (Appium). When Revyl is authenticated, /qa and /qa-only prefer cloud devices over local simulator, so no Xcode/Appium/Java setup is needed.

Changes:
- Revyl auth detection in browse-mobile setup
- Full Revyl QA path: init → app detection → dev loop (with tunnel verification + 30s timeout) → static fallback → build caching → device provisioning → command mapping
- YAML validation + auto-fix after revyl init (known CLI bug)
- App-id auto-detection with AskUserQuestion for ambiguous matches
- Mobile auth strategy (sign-up attempt, credential request, Apple Sign-In scope limitation)
- Mobile exploration checklist (8 items: transitions, scroll, keyboard, back nav, empty/loading states, orientation, accessibility)
- Fix Rule 5 contradiction: scoped "never read source" to testing phases
- Batch re-verification for mobile fixes (rebuild once after all fixes)
- Mobile QA timing expectations in setup section
- 3 new TODOs: Revyl E2E test, /browse Revyl integration, Android support

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Revyl is available as MCP tools (start_device_session, screenshot, device_tap, etc.), not a CLI binary. The bash-based `revyl auth status` check always failed because there's no `revyl` in PATH. Now the skill tells Claude to check for Revyl MCP tool availability directly — if the tools exist in the conversation context, always use Revyl for mobile QA. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The revyl CLI is installed on the user's machine — detection should check `command -v revyl` in bash. Previous commit wrongly switched to MCP tool detection which doesn't work in bash context. Now: if `revyl` CLI exists in PATH → REVYL_READY, always preferred over Appium. Auth status printed for diagnostics. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

When a mobile project is detected but revyl CLI isn't installed, AskUserQuestion now tells the user how to install it and offers three options: install now, use local Appium, or skip mobile QA. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The skill templates used `revyl screenshot` but the actual CLI command is `revyl device screenshot --out <path>`. All device interaction lives under the `device` subcommand. Also adds --out flag for explicit output path control. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Static mode fallback works perfectly — this is a DX improvement for reusing an existing Metro process instead of starting a conflicting one. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Bundle IDs and simulator UDIDs are passed to shell commands via string interpolation. Validate they don't contain shell metacharacters to prevent command injection. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
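A validation step of the kind described above could be an allow-list check applied before any value is interpolated into a command string. The patterns and the names `requireSafe` and `terminateCommand` below are assumptions for illustration; the PR's ios.ts may use different rules and different simctl invocations.

```typescript
// Hypothetical sketch: allow-list validation before interpolating into a shell command.

// Assumed shapes: reverse-DNS bundle IDs and hex-and-dash simulator UDIDs.
const SAFE_BUNDLE_ID = /^[A-Za-z0-9][A-Za-z0-9.-]*$/;
const SAFE_UDID = /^[A-Fa-f0-9-]+$/;

// Return the value unchanged if it matches, otherwise refuse loudly.
function requireSafe(value: string, pattern: RegExp, what: string): string {
  if (!pattern.test(value)) {
    throw new Error(`Refusing unsafe ${what}: ${JSON.stringify(value)}`);
  }
  return value;
}

// Example command string that is only built from validated parts.
function terminateCommand(udid: string, bundleId: string): string {
  const safeUdid = requireSafe(udid, SAFE_UDID, "simulator UDID");
  const safeBundle = requireSafe(bundleId, SAFE_BUNDLE_ID, "bundle ID");
  return `xcrun simctl terminate ${safeUdid} ${safeBundle}`;
}
```

An alternative design is to avoid the shell entirely by passing arguments as an array to `execFileSync`, which removes the need for metacharacter checks; the allow-list is still useful as an early, readable error.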

…llowing

- DRY: pointer action construction was duplicated 4x (performClick, tapCoordinates, fill coordinate fallback, scroll). Extract tapAction() and swipeAction() helpers.
- findElement() now distinguishes "no such element" (returns null) from actual errors like timeouts and network failures (rethrows).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
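As an illustration of what shared pointer-action helpers can look like, the sketch below builds W3C WebDriver touch sequences for a tap and a swipe. The function names match the commit message, but their signatures, the `finger1` id, and the pause and swipe durations are assumptions rather than the PR's actual code.

```typescript
// Hypothetical sketch of tapAction()/swipeAction() as W3C pointer action builders.

interface PointerStep {
  type: "pointerMove" | "pointerDown" | "pointerUp" | "pause";
  duration?: number;
  x?: number;
  y?: number;
  button?: number;
}

interface PointerSequence {
  type: "pointer";
  id: string;
  parameters: { pointerType: "touch" };
  actions: PointerStep[];
}

// Wrap a list of steps in a single-finger touch input source.
function touchSequence(actions: PointerStep[]): PointerSequence {
  return { type: "pointer", id: "finger1", parameters: { pointerType: "touch" }, actions };
}

function tapAction(x: number, y: number): PointerSequence {
  return touchSequence([
    { type: "pointerMove", duration: 0, x, y },
    { type: "pointerDown", button: 0 },
    { type: "pause", duration: 50 }, // assumed hold time in ms
    { type: "pointerUp", button: 0 },
  ]);
}

function swipeAction(x1: number, y1: number, x2: number, y2: number, durationMs = 300): PointerSequence {
  return touchSequence([
    { type: "pointerMove", duration: 0, x: x1, y: y1 },
    { type: "pointerDown", button: 0 },
    { type: "pointerMove", duration: durationMs, x: x2, y: y2 },
    { type: "pointerUp", button: 0 },
  ]);
}
```

Either sequence would be sent as `{ actions: [sequence] }` in a POST to `/session/{session id}/actions`.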

…Alive

- server.ts: tap command now validates args are valid numbers before passing to tapCoordinates, preventing silent NaN propagation.
- cli.ts: isPidAlive now returns true for EPERM (process exists but different user), false only for ESRCH (process doesn't exist).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
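The tap argument check can be sketched as a small parser that rejects anything that is not a finite number. `parseTapArgs` and `toCoord` are hypothetical names, and the error messages are illustrative. Note that `Number("")` evaluates to `0`, so blank strings need an explicit check to avoid a silent tap at the origin.

```typescript
// Hypothetical sketch: validate `tap <x> <y>` arguments before calling the driver.

// Convert one argument to a number, mapping blank input to NaN instead of 0.
function toCoord(arg: string): number {
  const trimmed = arg.trim();
  return trimmed === "" ? NaN : Number(trimmed);
}

function parseTapArgs(args: string[]): { x: number; y: number } {
  if (args.length !== 2) {
    throw new Error("usage: tap <x> <y>");
  }
  const x = toCoord(args[0] ?? "");
  const y = toCoord(args[1] ?? "");
  if (!Number.isFinite(x) || !Number.isFinite(y)) {
    throw new Error(`tap: coordinates must be numbers, got ${JSON.stringify(args)}`);
  }
  return { x, y };
}
```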

browse-mobile source changes now trigger QA evals and the new browse-mobile-basic test category. Rebuilt dist with all fixes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- swipe: add --x 220 --y 500 (required start coordinates)
- type: add --target param (single command, no separate tap needed)
- dev loop: detect existing Metro on :8081, verify it's node/metro before killing to avoid port conflict with Revyl
- Update all command references across gen-skill-docs.ts and both qa/qa-only templates for consistency
- Add TODO for Revyl command table validation test (P2)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Auto-detect Revyl auth status and run `revyl auth login` if needed instead of passive prose instruction
- Add Revyl permissions to Claude Code settings (Step 0) so commands don't trigger 30-50 permission prompts per QA session
- Detect Xcode before attempting local build; try EAS cloud build as fallback; give clear guidance if neither is available
- Add cost/billing note for Revyl cloud device sessions
- Add TODO for headless/CI auth environments (P3)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…l QA

Real-world QA testing revealed 6 issues:
1. revyl dev start reports "ready" with broken tunnel. Now parse HMR diagnostics and fall back to static mode if all checks fail
2. App loads from cached build with no hot reload. Now detect and warn
3. Background process polling was undocumented. Add explicit 5s poll loop
4. revyl dev stop doesn't exist. Document kill procedure
5. Session times out during fix phases. Add keepalive guidance
6. Permission check was weak (grep count). Now checks specific patterns

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

… cache When HMR diagnostics fail but the app still launches, compare the on-device build's git SHA against HEAD. If they differ, explicitly warn that testing is on stale code and force static mode rebuild. This catches the most dangerous failure mode: app appears to work but recent changes are invisible. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Cloudflare tunnel DNS is inherently racy — first attempt often fails. Now the skill retries once (kill → wait 5s → restart) before falling back to static mode. Also adds direct DNS resolution check via nslookup before HTTP polling, which catches the root cause faster than waiting for curl timeouts. The flow is now: attempt 1 → verify HMR + DNS → if broken, retry → attempt 2 → if still broken, stale build check → static fallback. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace AskUserQuestion permission prompts with automatic setup. Both /qa and /qa-only now auto-add a comprehensive set of allow rules to ~/.claude/settings.json on first run, covering browse, revyl, appium, git, curl, and all other commands used during QA. Uses a marker comment to only run once. Also expanded the Revyl permission list to include nslookup, xcode-select, npx eas, and other commands added in recent fixes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
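One way to make such a first-run setup safe to repeat is to merge the allow rules idempotently. This is a swapped-in technique: the commit uses a marker to run only once, while the sketch below deduplicates on every run. The `Settings` shape (a `permissions.allow` string array) and the example rule strings are assumptions about the settings file, and `addAllowRules` is a hypothetical helper.

```typescript
// Hypothetical sketch: idempotently merge allow rules into a parsed settings object.

interface Settings {
  permissions?: { allow?: string[] };
  [key: string]: unknown;
}

// Returns a new settings object; existing rules keep their order, new ones are appended once.
function addAllowRules(settings: Settings, rules: string[]): Settings {
  const merged = [...(settings.permissions?.allow ?? [])];
  for (const rule of rules) {
    if (!merged.includes(rule)) merged.push(rule);
  }
  return {
    ...settings,
    permissions: { ...settings.permissions, allow: merged },
  };
}

// Assumed rule format, shown for illustration only.
const exampleRules = ["Bash(revyl:*)", "Bash(nslookup:*)"];
```

The caller would read the settings file, pass the parsed JSON through `addAllowRules`, and write it back only if the rule list grew.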

Three fixes from live mobile QA testing:
1. Priority flip: check local simulator first (0s setup), then DerivedData Debug build (~30s), then Revyl cloud devices. Solo devs with the app already running skip Revyl entirely.
2. Fast-fail tunnel DNS: single 15s DNS check instead of 120s x2 retry loop. If tunnel is dead, fall back immediately instead of burning 4+ minutes.
3. Debug builds instead of Release: much faster to build, likely already cached in DerivedData from normal dev work. Release builds are unnecessary for QA testing.

Net effect: mobile QA setup drops from ~10 min to ~30s for devs with local tooling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Reverts priority flip (local sim first): Revyl's AI-grounded targeting is too valuable to skip. Keeps fast-fail DNS (15s) and Debug builds.

Also fixes ~/.claude/ path leaking into Codex-generated SKILL.md files:
- Settings path now transformed to ~/.codex/ during codex generation
- Browse-mobile permission uses ctx.paths.skillRoot
- Single host-aware cat permission entry

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
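The path fix amounts to rewriting host-specific prefixes in generated text and then checking that none remain. A minimal sketch, assuming a plain string replacement is enough; `toCodexPaths` and `hasClaudePathLeak` are hypothetical names, and the real generator may resolve paths through its context object instead.

```typescript
// Hypothetical sketch: rewrite ~/.claude/ paths for Codex output and detect leaks.

function toCodexPaths(text: string): string {
  return text.replace(/~\/\.claude\//g, "~/.codex/");
}

// True when generated Codex output still mentions the Claude Code home directory.
function hasClaudePathLeak(text: string): boolean {
  return text.includes("~/.claude/");
}
```

A generation test could assert `hasClaudePathLeak(output) === false` for every Codex SKILL.md file.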

The polling grep matched "failed" in HMR diagnostic lines like "[hmr] Metro health: FAILED" and treated them as fatal errors, killing a working dev loop that was still provisioning the device. Now only fatal errors (panic, process died, ENOSPC) trigger DEV_LOOP_FAILED. HMR warnings emit DEV_LOOP_HMR_WARNING instead — the device continues provisioning and loads from the cached build. Hot reload is degraded but QA testing can proceed immediately. This was the root cause of the 10-minute wasted setup: the skill killed the process twice over non-fatal warnings. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
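The distinction drawn above can be sketched as a line classifier. The signal names `DEV_LOOP_FAILED` and `DEV_LOOP_HMR_WARNING` and the example HMR line come from the commit message, and the fatal pattern is the one quoted in the bug-fix table (`fatal|panic|exited with`); `classifyLogLine`, the `[hmr]` prefix test, and the `OK` value are assumptions.

```typescript
// Hypothetical sketch: classify dev-loop log lines so HMR warnings are not treated as fatal.

type DevLoopSignal = "DEV_LOOP_FAILED" | "DEV_LOOP_HMR_WARNING" | "OK";

const FATAL_PATTERN = /fatal|panic|exited with/i;
const HMR_PREFIX = /^\[hmr\]/i;

function classifyLogLine(line: string): DevLoopSignal {
  const text = line.trim();
  // HMR diagnostics are handled first so a FAILED health check only degrades hot reload.
  if (HMR_PREFIX.test(text)) {
    return /failed/i.test(text) ? "DEV_LOOP_HMR_WARNING" : "OK";
  }
  // The bare word "failed" is deliberately not in the fatal pattern.
  return FATAL_PATTERN.test(text) ? "DEV_LOOP_FAILED" : "OK";
}
```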

The QA skill's auto-configure step was missing permissions for variable assignments (METRO_PID=, TUNNEL_URL=, etc.), shell constructs (for, if, [), and common tools (echo, ps, sed, head, etc.). Commands starting with these prefixes would prompt for approval, breaking automation. Added ~60 new permission patterns covering all commands used in the QA and Revyl mobile flows. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…solver system

Upstream refactored gen-skill-docs.ts into scripts/resolvers/ modules. Port our mobile QA code into the new architecture:
- Create scripts/resolvers/mobile-qa.ts (BROWSE_MOBILE_SETUP + mobile QA sections)
- Inject mobile sections into generateQAMethodology via generateMobileQASections()
- Register BROWSE_MOBILE_SETUP in resolver index
- Fix codex path leak: add catch-all ~/.claude/ → ~/.codex/ replacement
- Fix zsh glob safety: use find instead of ls for variant-*.png
- Sync package.json version to 0.13.2.0 matching VERSION file
- Add browse-mobile-basic to E2E_TIERS
- Resolve .gitignore, package.json, touchfiles.ts conflicts (both sides)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Conflicts:
- design-shotgun/SKILL.md.tmpl: both sides fixed the same zsh glob safety issue differently (we used `find`, upstream used `setopt` guard). Kept `find`, which avoids needing the setopt workaround entirely.
- design-shotgun/SKILL.md: generated file, same resolution as template.
- package.json: version 0.13.2.0 (ours) vs 0.13.3.0 (upstream). Took upstream's 0.13.3.0 since it's the newer release.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

TenzinDhonyoe and others added 30 commits March 23, 2026 12:59

fix: use /execute/sync endpoint (W3C WebDriver spec)

6ceed17

fix: correct browse-mobile global path — go up 2 levels from browseDir

e94951a

chore: add TODO for Revyl dev loop Metro port conflict (P3)

05bfb7b

fix(security): sanitize shell arguments in ios.ts execSync calls

b954928

chore: add browse-mobile to eval touchfiles + rebuild dist

07c9da0

TenzinDhonyoe and others added 4 commits March 27, 2026 18:40

TenzinDhonyoe wants to merge 34 commits into garrytan:main from TenzinDhonyoe:TenzinDhonyoe/fix-mobile-qa-bugs
