feat: mobile QA via Revyl cloud devices + browse-mobile Appium driver #604

Open
TenzinDhonyoe wants to merge 34 commits into garrytan:main from TenzinDhonyoe:TenzinDhonyoe/fix-mobile-qa-bugs

@TenzinDhonyoe

Summary

Adds full mobile QA support to the /qa and /qa-only skills. Claude Code can now test iOS and Android apps on real cloud-hosted devices via Revyl, with a local Appium + iOS Simulator fallback.

What's new for users:

  • Run /qa on any React Native / mobile project and it automatically detects mobile, provisions a cloud device, builds your app, uploads it, and runs a full QA pass — zero manual setup
  • Revyl AI-grounded targeting — tap/swipe/type using natural language descriptions (--target "the Sign In button") instead of brittle element selectors
  • Dev loop mode — Metro + Cloudflare tunnel for ~2s hot-reload fix cycles on cloud devices (vs ~5min full rebuilds)
  • Fully automated — auto-configures all Claude Code permissions on first run so QA runs without any approval prompts
  • Falls back to local Appium + iOS Simulator if Revyl CLI isn't installed

Key Components

browse-mobile/ — New Appium-backed mobile driver

  • Pure HTTP Appium client (zero npm deps — replaced webdriverio)
  • W3C WebDriver spec compliant (/execute/sync endpoint)
  • Reference system for element targeting (@e1, @e2, etc.)
  • Self-contained binary: JS bundle + shell launcher
  • Comprehensive test suite (smoke, ref-system, server, setup-check)

scripts/gen-skill-docs.ts — QA template generator updates

  • generateBrowseMobileSetup() — detects mobile projects, checks Revyl CLI, provisions devices
  • generateQAMethodology() — full Revyl interaction flow with AI-grounded targeting
  • Auto-permission configuration (90+ bash permission patterns for fully automated QA)
  • Host-aware path resolution (works on both Claude Code and Codex)
  • Debug builds by default (faster than Release, usually already cached in DerivedData)
  • Smart HMR diagnostic handling (don't kill working dev loops on [hmr] Metro health: FAILED warnings)
  • Fast-fail DNS checks (single 15s check, 3 attempts — no retry of entire dev loop)

Template changes

  • qa/SKILL.md.tmpl + qa-only/SKILL.md.tmpl — mobile QA integration
  • SKILL.md.tmpl (root) — documents mobile QA capabilities
  • All generated SKILL.md files regenerated

Bug Fixes (from 7 rounds of live QA testing)

| Round | Issue | Fix |
|-------|-------|-----|
| 1 | Wrong Revyl CLI commands (swipe, type) | Corrected to revyl device swipe/type |
| 1 | Metro port conflict | Detect and kill existing Metro before dev loop |
| 2 | Tunnel DNS resolution failures | Added retry + DNS check with fast-fail |
| 2 | Stale cached builds on dead tunnels | Detect and rebuild |
| 3 | Auth/permissions blocking first-time users | Auto-configure all permissions |
| 3 | Xcode detection failures | Better detection logic |
| 4 | Release builds too slow for QA | Switch to Debug builds (often already cached) |
| 4 | Tunnel DNS retry too slow | Single 15s check, 3 attempts, no full-loop retry |
| 5 | Priority flip removed Revyl advantage | Reverted; Revyl stays preferred (AI targeting > local speed) |
| 6 | HMR warnings killed working dev loops | Narrowed grep to fatal errors only (fatal\|panic\|exited with) |
| 7 | Every bash command prompted for approval | 90+ comprehensive permission patterns |

Also Included (from merged PRs on this branch)

  • /cso v2 — infrastructure-first security audit (secrets archaeology, supply chain, CI/CD)
  • Codex 1024-char description limit enforcement + auto-heal stale installs
  • zsh glob compatibility fix for skill preambles
  • /review → /ship handoff fix

Files Changed

  • 85 files changed, ~9,500 lines added, ~500 removed
  • New: browse-mobile/ (5 source files, 5 test files, built binary)
  • Modified: scripts/gen-skill-docs.ts (+600 lines), all SKILL.md files regenerated
  • New tests: test/skill-e2e-cso.test.ts, test/gen-skill-docs.test.ts additions

How to Try It

# Install gstack from this branch
cd ~/.claude/skills && git clone https://github.com/TenzinDhonyoe/gstack.git -b feat/browse-mobile && cd gstack && bun install && bun run build

# Then in any React Native project:
# /qa

Requires: Revyl CLI for cloud devices, or Xcode + Appium for local fallback.

Test Plan

  • bun test passes (skill validation, gen-skill-docs quality, browse integration)
  • 7 rounds of live mobile QA testing on real React Native app
  • Revyl cloud device provisioning, screenshot, tap, swipe, type all verified
  • Dev loop with Metro + Cloudflare tunnel verified
  • HMR warning handling verified (no false kills)
  • Auto-permission configuration verified (zero approval prompts)
  • Codex path transformation verified (no ~/.claude/ leaks in Codex output)
  • bun run test:evals (LLM judge + E2E)

🤖 Generated with Claude Code

TenzinDhonyoe and others added 30 commits March 23, 2026 12:59
New module that implements the same HTTP command protocol as browse/
but backed by Appium WebDriver for mobile app automation. Enables
/qa to test Expo/React Native apps on iOS Simulator.

Key components:
- ref-system.ts: Parse Appium XML accessibility tree into @e refs
- mobile-driver.ts: WebDriverIO wrapper with click, fill, screenshot, snapshot
- server.ts: HTTP server (same protocol as browse — bearer auth, state file)
- cli.ts: CLI entry point + setup-check for dependency validation
- platform/ios.ts: iOS Simulator boot, device listing, app management

Tested against real Expo app (Gluco) — snapshot, click, fill, screenshot
all working. 43 tests passing, 0 failures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
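
The ref system described above (turning the Appium XML accessibility tree into @e refs) could look roughly like the sketch below. This is not the code in ref-system.ts: the function name `buildRefs`, the `ElementRef` shape, and the rule "keep only elements that have a label or name" are assumptions made for illustration, and the regex-based parsing only handles double-quoted attributes without entity decoding.

```typescript
// Hypothetical sketch: assign @e refs to labeled elements in Appium page-source XML.
interface ElementRef {
  ref: string;   // e.g. "@e1"
  type: string;  // XML tag name, e.g. "XCUIElementTypeButton"
  label: string; // accessibility label (falls back to the name attribute)
}

function buildRefs(xml: string): ElementRef[] {
  const refs: ElementRef[] = [];
  // Matches opening and self-closing tags; closing tags and <?xml ...?> are skipped.
  const tagPattern = /<([A-Za-z][\w.]*)((?:\s+[\w:.-]+="[^"]*")*)\s*\/?>/g;
  let tag: RegExpExecArray | null;
  while ((tag = tagPattern.exec(xml)) !== null) {
    const type = tag[1];
    const attrText = tag[2];
    const attrs = new Map<string, string>();
    const attrPattern = /([\w:.-]+)="([^"]*)"/g;
    let attr: RegExpExecArray | null;
    while ((attr = attrPattern.exec(attrText)) !== null) {
      attrs.set(attr[1], attr[2]);
    }
    // Assumption: unlabeled containers are not useful targets, so they get no ref.
    const label = attrs.get("label") ?? attrs.get("name") ?? "";
    if (label === "") continue;
    refs.push({ ref: `@e${refs.length + 1}`, type, label });
  }
  return refs;
}
```

Refs are numbered in document order, so a later command such as `click @e2` can be resolved by index against the most recent snapshot.
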
QA skills now auto-detect Expo/React Native projects and switch to
mobile mode. When app.json is found and browse-mobile is available:
- Automatically starts Appium if not running
- Boots iOS Simulator if needed
- Builds/installs app if not on simulator
- Navigates through Expo dev launcher to actual app
- Uses $BM instead of $B for all browse commands
- Falls back to ~"Label" selector for RN components missing accessibilityRole
- Flags missing accessibility props as QA findings

Web QA behavior is completely unchanged — mobile branches are gated
on detection.

Files changed:
- scripts/gen-skill-docs.ts: BROWSE_MOBILE_SETUP placeholder + mobile
  detection in QA methodology + Expo/RN framework guidance
- qa/SKILL.md.tmpl: mobile setup block + platform parameter
- qa-only/SKILL.md.tmpl: same mobile additions (report-only)
- SKILL.md.tmpl: Mobile Testing section with $BM command reference
- TODOS.md: 3 new items from eng review

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The compiled binary is 58MB (bundles entire Bun runtime + webdriverio).
Same pattern as browse/dist/ which is already gitignored.
Users build it locally via: bun build --compile browse-mobile/src/cli.ts --outfile browse-mobile/dist/browse-mobile

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The compiled binary couldn't find server.ts when deployed outside the
gstack repo. Now the CLI spawns itself with --server flag to run the
server in-process, same pattern as browse/. Works both in dev mode
(bun run cli.ts) and as compiled binary.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…iled binary

Three fixes:
1. Switch from bun build --compile (can't resolve webdriverio transitive deps)
   to bun build (JS bundle) + shell launcher script. 3.2MB bundle vs 58MB
   binary, and all npm deps resolve correctly at runtime.
2. Filter --server from process.argv in server.ts so bundle ID isn't
   clobbered when CLI spawns itself in server mode.
3. CLI finds the bundled cli.js relative to itself, works from any directory.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bug 1: handleCommand() threw immediately if not connected. Now it
auto-reconnects to Appium when the first command arrives, handling
the common case where WDA takes 30-60s to compile on first session.

Bug 2: CLI didn't pass BROWSE_MOBILE_BUNDLE_ID env var when spawning
the server subprocess. Now extracts bundle ID from goto app://... and
forwards it so the Appium session is created with the correct app.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
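
As an illustration of Bug 2, the bundle ID can be pulled out of a `goto app://...` target and forwarded through the environment. Only the `BROWSE_MOBILE_BUNDLE_ID` variable name comes from the commit message; `bundleIdFromGoto` and `serverEnvFor` are hypothetical helpers, and the exact URL shape the CLI accepts is an assumption.

```typescript
// Hypothetical helper: extract "com.example.app" from "app://com.example.app".
function bundleIdFromGoto(target: string): string | null {
  const prefix = "app://";
  if (!target.startsWith(prefix)) return null;
  // Drop any trailing path, query, or fragment.
  const id = target.slice(prefix.length).split(/[/?#]/)[0] ?? "";
  return id.length > 0 ? id : null;
}

// Build the environment for the spawned server process.
function serverEnvFor(
  target: string,
  baseEnv: Record<string, string | undefined>,
): Record<string, string | undefined> {
  const bundleId = bundleIdFromGoto(target);
  return bundleId === null
    ? { ...baseEnv }
    : { ...baseEnv, BROWSE_MOBILE_BUNDLE_ID: bundleId };
}
```
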
Rewrote mobile-driver.ts to use raw fetch() for all Appium WebDriver
protocol calls instead of webdriverio. This eliminates the transitive
dependency bundling problem permanently.

Results:
- Bundle: 119KB (was 3.2MB with webdriverio)
- Dependencies: 0 npm packages (was webdriverio + 230 transitive deps)
- All Appium commands work via W3C WebDriver REST protocol over HTTP

Also fixed:
- CLI timeout: 180s for goto (Appium connect), 60s for other commands
- Removed webdriverio from package.json

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
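
A dependency-free WebDriver call of the kind described above might be sketched as follows. The names `wdCommand`, `HttpClient`, and `WebDriverError` are hypothetical and this is not the code in mobile-driver.ts. It relies only on the W3C WebDriver convention that results arrive as `{ value: ... }` and errors arrive with an HTTP error status and `{ value: { error, message, stacktrace } }`.

```typescript
// Minimal sketch of a raw-fetch W3C WebDriver call (all names are hypothetical).
interface HttpResponse {
  status: number;
  json(): Promise<unknown>;
}
type HttpInit = { method: string; headers: Record<string, string>; body?: string };
type HttpClient = (url: string, init: HttpInit) => Promise<HttpResponse>;

class WebDriverError extends Error {
  readonly code: string;
  constructor(code: string, message: string) {
    super(message);
    this.name = "WebDriverError";
    this.code = code; // W3C error code, e.g. "no such element"
  }
}

async function wdCommand(
  baseUrl: string,
  method: "GET" | "POST" | "DELETE",
  path: string,
  body?: unknown,
  http: HttpClient = (url, init) => fetch(url, init), // injectable for tests
): Promise<unknown> {
  const init: HttpInit = { method, headers: { "Content-Type": "application/json" } };
  if (body !== undefined) init.body = JSON.stringify(body);

  const res = await http(`${baseUrl}${path}`, init);
  const payload = (await res.json()) as { value?: unknown };
  if (res.status >= 400) {
    const err = (payload.value ?? {}) as { error?: string; message?: string };
    throw new WebDriverError(err.error ?? "unknown error", err.message ?? `HTTP ${res.status}`);
  }
  return payload.value;
}
```

Passing the HTTP client as a parameter keeps the helper testable without a running Appium server; in normal use the default wraps the global `fetch`.
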
/execute returns 404 on Appium — the correct W3C route is /execute/sync.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
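
For reference, the W3C Execute Script command is `POST /session/{session id}/execute/sync` with a `script` and an `args` array in the body. A small request builder, with the hypothetical name `executeSyncRequest`, shows the shape:

```typescript
// Hypothetical builder for a W3C "Execute Script" request.
interface WebDriverRequest {
  method: "POST";
  path: string;
  body: { script: string; args: unknown[] };
}

function executeSyncRequest(
  sessionId: string,
  script: string,
  args: unknown[] = [],
): WebDriverRequest {
  return {
    method: "POST",
    // The spec route ends in /execute/sync; a bare /execute is not a W3C route.
    path: `/session/${encodeURIComponent(sessionId)}/execute/sync`,
    body: { script, args },
  };
}
```

The session ID, script name, and arguments are whatever the driver supports; nothing in this sketch is specific to browse-mobile.
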
When /qa detects a mobile project for the first time, it checks if
browse-mobile bash permissions exist in the user's settings.json.
If not, offers to add them — one-time setup that enables fully
automated mobile QA without per-command approval prompts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. Expanded permission patterns to cover inline bash (SID=..., curl -X
   POST, JAVA_HOME=...) that the QA skill generates. Previous patterns
   only matched commands starting with $BM.

2. Added speed guidance: batch multiple $BM commands in single bash calls
   using && instead of separate tool calls. Take screenshots at milestones
   only, not after every tap.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
browseDir is ~/.claude/skills/gstack/browse/dist — need ../../ to reach
the gstack root, not ../ which only goes up to browse/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…p command

Three fixes:
1. Changed ~"Label" to label:Label syntax — the ~ was being interpreted
   by zsh as home directory expansion, breaking accessibility label clicks.
2. Added tap <x> <y> command for coordinate-based tapping when elements
   can't be found by ref or label.
3. Updated all skill templates and help text to use new label: syntax.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
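
One way to picture the selector forms after this change is a small parser that distinguishes `@e` refs from `label:` selectors. The `Selector` type and `parseSelector` function below are hypothetical, not the project's actual parser.

```typescript
// Hypothetical selector parser for "@e3" refs and "label:Sign In" selectors.
type Selector =
  | { kind: "ref"; index: number }
  | { kind: "label"; label: string }
  | { kind: "unknown"; raw: string };

function parseSelector(raw: string): Selector {
  const refMatch = /^@e(\d+)$/.exec(raw);
  if (refMatch) return { kind: "ref", index: Number(refMatch[1]) };

  const labelPrefix = "label:";
  if (raw.startsWith(labelPrefix) && raw.length > labelPrefix.length) {
    // Unlike the old ~"Label" form, this prefix has no special meaning in zsh.
    return { kind: "label", label: raw.slice(labelPrefix.length) };
  }
  return { kind: "unknown", raw };
}
```
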
Adds Revyl as a second mobile QA backend alongside browse-mobile (Appium).
When Revyl is authenticated, /qa and /qa-only prefer cloud devices over
local simulator — no Xcode/Appium/Java setup needed.

Changes:
- Revyl auth detection in browse-mobile setup
- Full Revyl QA path: init → app detection → dev loop (with tunnel
  verification + 30s timeout) → static fallback → build caching →
  device provisioning → command mapping
- YAML validation + auto-fix after revyl init (known CLI bug)
- App-id auto-detection with AskUserQuestion for ambiguous matches
- Mobile auth strategy (sign-up attempt, credential request, Apple
  Sign-In scope limitation)
- Mobile exploration checklist (8 items: transitions, scroll, keyboard,
  back nav, empty/loading states, orientation, accessibility)
- Fix Rule 5 contradiction: scoped "never read source" to testing phases
- Batch re-verification for mobile fixes (rebuild once after all fixes)
- Mobile QA timing expectations in setup section
- 3 new TODOs: Revyl E2E test, /browse Revyl integration, Android support

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Revyl is available as MCP tools (start_device_session, screenshot,
device_tap, etc.), not a CLI binary. The bash-based `revyl auth status`
check always failed because there's no `revyl` in PATH.

Now the skill tells Claude to check for Revyl MCP tool availability
directly — if the tools exist in the conversation context, always
use Revyl for mobile QA.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The revyl CLI is installed on the user's machine — detection should
check `command -v revyl` in bash. Previous commit wrongly switched
to MCP tool detection which doesn't work in bash context.

Now: if `revyl` CLI exists in PATH → REVYL_READY, always preferred
over Appium. Auth status printed for diagnostics.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a mobile project is detected but revyl CLI isn't installed,
AskUserQuestion now tells the user how to install it and offers
three options: install now, use local Appium, or skip mobile QA.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
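
The backend choice described in the last two commits reduces to a small decision function. The sketch below uses hypothetical names (`decideMobileBackend`, `MobileQaDecision`) and models only the behavior stated in the commit messages: Revyl is always preferred when the CLI is on PATH, otherwise the user is asked.

```typescript
// Hypothetical model of the mobile backend decision.
type MobileQaDecision =
  | { kind: "web" }
  | { kind: "revyl" }
  | { kind: "ask-user"; options: string[] };

function decideMobileBackend(isMobileProject: boolean, revylOnPath: boolean): MobileQaDecision {
  if (!isMobileProject) return { kind: "web" };   // web QA is unchanged
  if (revylOnPath) return { kind: "revyl" };      // REVYL_READY: preferred over Appium
  return {
    kind: "ask-user",
    options: ["install Revyl now", "use local Appium", "skip mobile QA"],
  };
}
```
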
The skill templates used `revyl screenshot` but the actual CLI command
is `revyl device screenshot --out <path>`. All device interaction lives
under the `device` subcommand. Also adds --out flag for explicit output
path control.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Static mode fallback works perfectly — this is a DX improvement for
reusing an existing Metro process instead of starting a conflicting one.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Bundle IDs and simulator UDIDs are passed to shell commands via string
interpolation. Validate they don't contain shell metacharacters to
prevent command injection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
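
A minimal sketch of this kind of validation follows. It uses an allowlist rather than a metacharacter denylist, which is a substitution on my part and may be stricter than the check in the commit. The patterns and the name `assertSafeShellArg` are assumptions.

```typescript
// Assumed formats: reverse-DNS bundle IDs and hex-plus-hyphen simulator UDIDs.
const BUNDLE_ID_PATTERN = /^[A-Za-z0-9][A-Za-z0-9.-]*$/;
const UDID_PATTERN = /^[A-Fa-f0-9-]+$/;

function assertSafeShellArg(kind: "bundleId" | "udid", value: string): string {
  const pattern = kind === "bundleId" ? BUNDLE_ID_PATTERN : UDID_PATTERN;
  if (!pattern.test(value)) {
    // Reject before the value ever reaches string interpolation in a shell command.
    throw new Error(`Unsafe ${kind}: ${JSON.stringify(value)}`);
  }
  return value;
}
```

Passing arguments as an array to a spawn-style API, instead of interpolating them into a command string, avoids the problem entirely where that is an option.
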
…llowing

- DRY: pointer action construction was duplicated 4x (performClick,
  tapCoordinates, fill coordinate fallback, scroll). Extract tapAction()
  and swipeAction() helpers.
- findElement() now distinguishes "no such element" (returns null) from
  actual errors like timeouts and network failures (rethrows).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
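
The helper names `tapAction()` and `swipeAction()` come from the commit above; their bodies are not shown in this PR, so the following is only a guess at their shape based on the W3C Actions payload (sent to `POST /session/{session id}/actions`). The pointer ID, pause length, and default swipe duration are assumptions.

```typescript
// Sketch of W3C pointer-action payloads for a touch tap and swipe.
type PointerStep = Record<string, string | number>;
interface PointerSequence {
  type: "pointer";
  id: string;
  parameters: { pointerType: "touch" };
  actions: PointerStep[];
}

function touchSequence(steps: PointerStep[]): { actions: PointerSequence[] } {
  return {
    actions: [{ type: "pointer", id: "finger1", parameters: { pointerType: "touch" }, actions: steps }],
  };
}

function tapAction(x: number, y: number): { actions: PointerSequence[] } {
  return touchSequence([
    { type: "pointerMove", duration: 0, x, y },
    { type: "pointerDown", button: 0 },
    { type: "pause", duration: 50 },
    { type: "pointerUp", button: 0 },
  ]);
}

function swipeAction(
  fromX: number, fromY: number, toX: number, toY: number, durationMs = 300,
): { actions: PointerSequence[] } {
  return touchSequence([
    { type: "pointerMove", duration: 0, x: fromX, y: fromY },
    { type: "pointerDown", button: 0 },
    { type: "pointerMove", duration: durationMs, x: toX, y: toY },
    { type: "pointerUp", button: 0 },
  ]);
}
```
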
…Alive

- server.ts: tap command now validates args are valid numbers before
  passing to tapCoordinates, preventing silent NaN propagation.
- cli.ts: isPidAlive now returns true for EPERM (process exists but
  different user), false only for ESRCH (process doesn't exist).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
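
Both fixes can be sketched in a few lines. `parseTapArgs` is a hypothetical name, and rethrowing error codes other than EPERM and ESRCH is an assumption about the intended behavior. Sending signal 0 with `process.kill` checks for a process without signaling it.

```typescript
// Validate "tap <x> <y>" arguments instead of letting NaN reach the driver.
function parseTapArgs(args: string[]): { x: number; y: number } {
  const numeric = /^-?\d+(\.\d+)?$/;
  if (args.length !== 2 || !numeric.test(args[0]) || !numeric.test(args[1])) {
    throw new Error("Usage: tap <x> <y> (both must be numbers)");
  }
  return { x: Number(args[0]), y: Number(args[1]) };
}

// Signal 0 performs an existence/permission check only.
function isPidAlive(
  pid: number,
  kill: (pid: number, signal: number) => unknown = (p, s) => process.kill(p, s),
): boolean {
  try {
    kill(pid, 0);
    return true;
  } catch (err) {
    const code = (err as { code?: string }).code;
    if (code === "EPERM") return true;  // exists, but owned by another user
    if (code === "ESRCH") return false; // no such process
    throw err;                          // assumption: surface anything unexpected
  }
}
```
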
browse-mobile source changes now trigger QA evals and the new
browse-mobile-basic test category. Rebuilt dist with all fixes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- swipe: add --x 220 --y 500 (required start coordinates)
- type: add --target param (single command, no separate tap needed)
- dev loop: detect existing Metro on :8081, verify it's node/metro
  before killing to avoid port conflict with Revyl
- Update all command references across gen-skill-docs.ts and both
  qa/qa-only templates for consistency
- Add TODO for Revyl command table validation test (P2)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Auto-detect Revyl auth status and run `revyl auth login` if needed
  instead of passive prose instruction
- Add Revyl permissions to Claude Code settings (Step 0) so commands
  don't trigger 30-50 permission prompts per QA session
- Detect Xcode before attempting local build; try EAS cloud build as
  fallback; give clear guidance if neither is available
- Add cost/billing note for Revyl cloud device sessions
- Add TODO for headless/CI auth environments (P3)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…l QA

Real-world QA testing revealed 6 issues:
1. revyl dev start reports "ready" with broken tunnel — now parse HMR
   diagnostics and fall back to static mode if all checks fail
2. App loads from cached build with no hot reload — now detect and warn
3. Background process polling was undocumented — add explicit 5s poll loop
4. revyl dev stop doesn't exist — document kill procedure
5. Session times out during fix phases — add keepalive guidance
6. Permission check was weak (grep count) — now checks specific patterns

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… cache

When HMR diagnostics fail but the app still launches, compare the on-device
build's git SHA against HEAD. If they differ, explicitly warn that testing
is on stale code and force static mode rebuild. This catches the most
dangerous failure mode: app appears to work but recent changes are invisible.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cloudflare tunnel DNS is inherently racy — first attempt often fails.
Now the skill retries once (kill → wait 5s → restart) before falling
back to static mode. Also adds direct DNS resolution check via nslookup
before HTTP polling, which catches the root cause faster than waiting
for curl timeouts. The flow is now: attempt 1 → verify HMR + DNS →
if broken, retry → attempt 2 → if still broken, stale build check →
static fallback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace AskUserQuestion permission prompts with automatic setup.
Both /qa and /qa-only now auto-add a comprehensive set of allow
rules to ~/.claude/settings.json on first run, covering browse,
revyl, appium, git, curl, and all other commands used during QA.
Uses a marker comment to only run once. Also expanded the Revyl
permission list to include nslookup, xcode-select, npx eas, and
other commands added in recent fixes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
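
An idempotent merge of allow rules might look like the sketch below. It assumes the settings file keeps rules in a `permissions.allow` string array and uses de-duplication rather than the marker described above, so it models the "only run once" effect without reproducing the actual mechanism. The rule strings in the usage note are illustrative, not the PR's real list.

```typescript
// Hypothetical, pure merge of permission rules into a settings object.
interface SettingsLike {
  permissions?: { allow?: string[]; [key: string]: unknown };
  [key: string]: unknown;
}

function addAllowRules(
  settings: SettingsLike,
  rules: string[],
): { settings: SettingsLike; added: string[] } {
  const existing = settings.permissions?.allow ?? [];
  const seen = new Set(existing);
  const added: string[] = [];
  for (const rule of rules) {
    if (seen.has(rule)) continue; // already present, or repeated in the input
    seen.add(rule);
    added.push(rule);
  }
  return {
    settings: {
      ...settings,
      permissions: { ...settings.permissions, allow: [...existing, ...added] },
    },
    added,
  };
}
```

Calling it twice with the same rules, for example `["Bash(revyl:*)", "Bash(nslookup:*)"]`, adds nothing the second time, so the caller can skip rewriting the file when `added` is empty.
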
Three fixes from live mobile QA testing:

1. Priority flip: check local simulator first (0s setup), then
   DerivedData Debug build (~30s), then Revyl cloud devices. Solo
   devs with the app already running skip Revyl entirely.

2. Fast-fail tunnel DNS: single 15s DNS check instead of 120s x2
   retry loop. If tunnel is dead, fall back immediately instead of
   burning 4+ minutes.

3. Debug builds instead of Release: much faster to build, likely
   already cached in DerivedData from normal dev work. Release
   builds are unnecessary for QA testing.

Net effect: mobile QA setup drops from ~10 min to ~30s for devs
with local tooling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
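
The fast-fail DNS check (item 2) can be sketched as a bounded retry. The skill itself uses shell tooling for this; the TypeScript below is only a model. It reads "single 15s check, 3 attempts" as three attempts spaced about five seconds apart, which is my interpretation, and `tunnelResolves` is a hypothetical name.

```typescript
// Bounded DNS check: give up after a few attempts instead of looping for minutes.
type Resolver = (hostname: string) => Promise<unknown>;

async function tunnelResolves(
  hostname: string,
  resolve: Resolver,
  attempts = 3,
  delayMs = 5000,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((done) => setTimeout(done, ms)),
): Promise<boolean> {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      await resolve(hostname);
      return true; // hostname resolves; the tunnel can be polled over HTTP next
    } catch {
      // Not resolvable yet; fall through to the delay and retry.
    }
    if (attempt < attempts) await sleep(delayMs);
  }
  return false; // caller falls back to static mode
}
```

In Node or Bun the resolver could be `resolve4` from `node:dns/promises`. A resolver that hangs would still exceed the budget, so a per-attempt timeout would be needed for a strict 15s bound.
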
Reverts priority flip (local sim first) — Revyl's AI-grounded targeting
is too valuable to skip. Keeps fast-fail DNS (15s) and Debug builds.

Also fixes ~/.claude/ path leaking into Codex-generated SKILL.md files:
- Settings path now transformed to ~/.codex/ during codex generation
- Browse-mobile permission uses ctx.paths.skillRoot
- Single host-aware cat permission entry

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
TenzinDhonyoe and others added 4 commits March 27, 2026 18:40
The polling grep matched "failed" in HMR diagnostic lines like
"[hmr] Metro health: FAILED" and treated them as fatal errors,
killing a working dev loop that was still provisioning the device.

Now only fatal errors (panic, process died, ENOSPC) trigger
DEV_LOOP_FAILED. HMR warnings emit DEV_LOOP_HMR_WARNING instead —
the device continues provisioning and loads from the cached build.
Hot reload is degraded but QA testing can proceed immediately.

This was the root cause of the 10-minute wasted setup: the skill
killed the process twice over non-fatal warnings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
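
The narrowed matching can be modeled as a line classifier. The signal names `DEV_LOOP_FAILED` and `DEV_LOOP_HMR_WARNING` and the fatal pattern `fatal|panic|exited with` are taken from this PR; the HMR pattern, the case-insensitive matching, and the function name `classifyDevLoopLine` are assumptions. The real skill does this with grep in a polling loop.

```typescript
// Classify one line of dev-loop output.
type DevLoopSignal = "DEV_LOOP_FAILED" | "DEV_LOOP_HMR_WARNING" | null;

const FATAL_PATTERN = /fatal|panic|exited with/i;
const HMR_WARNING_PATTERN = /^\[hmr\].*failed/i;

function classifyDevLoopLine(line: string): DevLoopSignal {
  // HMR diagnostics are checked first so "FAILED" in them is never treated as fatal.
  if (HMR_WARNING_PATTERN.test(line.trim())) return "DEV_LOOP_HMR_WARNING";
  if (FATAL_PATTERN.test(line)) return "DEV_LOOP_FAILED";
  return null; // ordinary output: keep polling
}
```

With this split, a line such as `[hmr] Metro health: FAILED` degrades hot reload but lets provisioning continue, while a process exit still aborts the dev loop.
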
The QA skill's auto-configure step was missing permissions for variable
assignments (METRO_PID=, TUNNEL_URL=, etc.), shell constructs (for, if,
[), and common tools (echo, ps, sed, head, etc.). Commands starting with
these prefixes would prompt for approval, breaking automation.

Added ~60 new permission patterns covering all commands used in the QA
and Revyl mobile flows.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…solver system

Upstream refactored gen-skill-docs.ts into scripts/resolvers/ modules.
Port our mobile QA code into the new architecture:
- Create scripts/resolvers/mobile-qa.ts (BROWSE_MOBILE_SETUP + mobile QA sections)
- Inject mobile sections into generateQAMethodology via generateMobileQASections()
- Register BROWSE_MOBILE_SETUP in resolver index
- Fix codex path leak: add catch-all ~/.claude/ → ~/.codex/ replacement
- Fix zsh glob safety: use find instead of ls for variant-*.png
- Sync package.json version to 0.13.2.0 matching VERSION file
- Add browse-mobile-basic to E2E_TIERS
- Resolve .gitignore, package.json, touchfiles.ts conflicts (both sides)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Conflicts:
- design-shotgun/SKILL.md.tmpl: both sides fixed the same zsh glob
  safety issue differently (we used `find`, upstream used `setopt` guard).
  Kept `find` — avoids needing the setopt workaround entirely.
- design-shotgun/SKILL.md: generated file, same resolution as template.
- package.json: version 0.13.2.0 (ours) vs 0.13.3.0 (upstream).
  Took upstream's 0.13.3.0 since it's the newer release.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>