AI-powered Android UI test automation platform — write test cases in plain English (or import from xmind / Markdown), have an AI agent run them on real devices, and get visual replay reports. Doubles as a general phone-automation tool with self-learning replay.
English · 简体中文
example-hq.mp4
The demo above shows three panels recorded simultaneously for the same test case:
- Left — Phone camera: physical proof. The device sits on a stand with no hands in frame; "Show Touches" is enabled so every synthetic tap appears as a white dot on the screen.
- Middle — Backend log + Web UI report: live `uvicorn` output (agent thoughts, JSON-RPC calls like `tap_element` / `screenshot`, verifier verdicts) on top, with the test management page (step replay + per-step reasoning + pass/fail verdict) below.
- Right — Phone screen mirror: the actual UI as seen by the AI agent.
The whole sequence is unedited — what you see is the agent operating the device end-to-end based on a single plain-language test case.
No USB cable in this demo. The phone talks to the server over WiFi only, via the Portal App's reverse WebSocket — the laptop and the phone don't even need to be on the same network. Run your devices anywhere (4G / 5G / corporate WiFi).
Note: if the video doesn't render in your viewer, download it directly (15 MB) or browse the Releases page.
You write:
```
Open Settings, find About Phone, capture the version number.
Expected: System version is shown, no error dialog.
```
The AI agent finds the path, taps, and verifies — every step has a screenshot and a thought trace. When a case fails, a "lesson learned" is automatically extracted; the next time the same task runs, the agent avoids the same mistake.
No XPath, no Appium, no recorded scripts.
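To make the self-learning loop concrete, here is a minimal Python sketch of the idea: distill a failed step into a guardrail and prepend it to the next run's prompt. Every name here (`LessonLearned`, `extract_lesson`, `build_prompt`) is hypothetical; the real extraction and injection live in the agent's memory layer.

```python
# Hypothetical sketch of the lesson-learned loop (names are illustrative,
# not the project's actual API).
from dataclasses import dataclass

@dataclass
class LessonLearned:
    task: str       # the plain-language test case this lesson applies to
    mistake: str    # what went wrong
    guardrail: str  # instruction injected into future runs of the same task

def extract_lesson(task: str, failed_step: str, verdict: str) -> LessonLearned:
    """Distill a failed step plus the verifier's verdict into a reusable guardrail."""
    return LessonLearned(
        task=task,
        mistake=f"Step '{failed_step}' failed: {verdict}",
        guardrail=f"Avoid repeating '{failed_step}'. Reason: {verdict}",
    )

def build_prompt(task: str, lessons: list[LessonLearned]) -> str:
    """Re-inject past lessons as guardrails ahead of the task description."""
    guardrails = "\n".join(f"- {l.guardrail}" for l in lessons if l.task == task)
    return f"Known pitfalls:\n{guardrails}\n\nTask: {task}" if guardrails else task
```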
- Plain-language test cases — write in Chinese or English; import from YAML / Excel / xmind / Markdown
- Dual perception — screenshot (vision) + a11y tree (semantic), fused decision
- Multi-LLM — OpenAI, Anthropic, Gemini, Zhipu GLM, Groq, Ollama
- Any-network device — Portal App opens a reverse WebSocket; runs over 4G / 5G / corporate WiFi without ADB
- Test management UI — suites, cases, run history, step replay, run comparison, pass-rate trend
- Self-contained HTML reports — single-file export with screenshots, thoughts, actions, verdicts
- Planner + Subagent — complex tasks decomposed into subgoals, each with isolated context
- Page-aware reasoning — current Activity class + recent-pages trail injected, so the agent recognizes "wrong screen" instead of blindly tapping
- Two-shot verifier — at-action frame (catches transient toasts) + settled frame, both used for pass/fail judgment
- Learn from mistakes — `LessonLearned` entries auto-extracted from past runs and re-injected as guardrails
- Auto-recovery — 4-level escalation when stuck (warn → back → restart → fail); see the sketch after this list
- Observability — token usage, perception/LLM/action timing per step, pass-rate trend chart
- CI/CD — CLI runner, webhook notifications (Feishu / DingTalk / Slack)
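The auto-recovery ladder above is easiest to picture as a tiny state machine. A minimal sketch, assuming one escalation level per consecutive stuck step; the real thresholds and actions belong to the agent's recovery layer:

```python
# Hypothetical sketch of the 4-level stuck-recovery escalation
# (warn → back → restart → fail); the one-level-per-stuck-step
# mapping is an assumption.
from enum import Enum

class Recovery(Enum):
    WARN = 1     # inject a warning into the agent's context
    BACK = 2     # press the system Back button
    RESTART = 3  # kill and relaunch the app under test
    FAIL = 4     # give up and mark the case failed

def escalate(stuck_steps: int) -> Recovery:
    """Map consecutive no-progress steps to the next recovery action."""
    ladder = [Recovery.WARN, Recovery.BACK, Recovery.RESTART, Recovery.FAIL]
    return ladder[min(max(stuck_steps, 1), len(ladder)) - 1]

assert escalate(1) is Recovery.WARN
assert escalate(4) is Recovery.FAIL  # stays at FAIL from here on
```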
Full comparison and roadmap: Comparison · Roadmap
- Python 3.9+
- Node.js 18+
- An Android device (real device or emulator)
```bash
git clone https://github.com/rejigtian/Smart-AI-Bot.git
cd Smart-AI-Bot

# Backend
cd backend
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
uvicorn main:app --reload --host 0.0.0.0 --port 8000

# Frontend (new terminal)
cd frontend
npm install
npm run dev
```

Or one command:

```bash
./start.sh
```

Open http://localhost:5173 and drop your LLM API keys into Settings.
Option A — prebuilt APK (fastest)
Download from the latest release and install:
```bash
adb install -r ~/Downloads/app-v1.0.0.apk
```

Option B — build from source

```bash
cd android
./gradlew assembleDebug
adb install -r app/build/outputs/apk/debug/app-debug.apk
```

First launch:

- In the app's Settings, set the Server WebSocket URL (e.g. `ws://192.168.1.10:8000/v1/providers/join`) and a Device Token (generate one in the Web UI's Devices page).
- System Settings → Accessibility → enable `AgentAccessibilityService`.
- Back in the app, tap Start Connection. The persistent foreground notification means you're online.
In the Test Suites page, create a suite and add a case:
```
Path: Open Settings, navigate to About Phone, capture the version number
Expected: System version info is shown, no error dialog
```
Pick a device + model, hit Run.
```bash
cd backend
python cli.py run --suite <id> --device <id> --json
```

Exit code: 0 = all passed, 1 = at least one failed.
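In a CI pipeline the documented exit code is all you need to gate a build. A minimal Python wrapper (the suite and device ids are placeholders, and the `--json` payload schema is not specified here, so only the exit code is relied on):

```python
# Minimal CI gate around the CLI runner; relies only on the documented
# exit codes (0 = all passed, 1 = at least one failure).
import subprocess
import sys

result = subprocess.run(
    ["python", "cli.py", "run", "--suite", "42", "--device", "7", "--json"],
    cwd="backend",           # assumes the job starts at the repo root
    capture_output=True,
    text=True,
)
print(result.stdout)          # JSON run report for the build log
sys.exit(result.returncode)   # propagate pass/fail to the CI job
```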
```
Browser (management UI)
        │ REST + SSE
FastAPI server
  ├── Planner (decomposes complex tasks)
  │     └── SubAgent #1..N (isolated context per subgoal)
  ├── TestCaseAgent (6-layer + VLM fallback)
  │     perception → decision → action → memory → verification → replay
  └── SQLite + webhook + CLI
        Device / Suite / Case / Run / Result / StepLog
        │
        │ WebSocket JSON-RPC
Android device (Portal App)
  tap / swipe / input / screenshot / get_ui_state
```
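To make the device leg concrete, here is a hedged Python sketch of one JSON-RPC round trip. The method names come from the diagram; the envelope (JSON-RPC 2.0 framing, id matching) is an assumption about the wire format, and the real device side is the Kotlin Portal App, not Python:

```python
# Hypothetical JSON-RPC dispatch on the device side; this Python sketch
# only illustrates the message flow, not the Portal App's actual code.
import json

def handle_frame(frame: str, actions: dict) -> str:
    """Decode one JSON-RPC request, run the matching device action,
    and encode the reply with the same id so the server can match it."""
    req = json.loads(frame)
    method = req["method"]  # e.g. tap / swipe / input / screenshot / get_ui_state
    result = actions[method](**req.get("params", {}))
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})

# Toy action table standing in for the Accessibility-service implementations:
actions = {"tap": lambda x, y: {"ok": True, "x": x, "y": y}}
reply = handle_frame(
    json.dumps({"jsonrpc": "2.0", "id": 1, "method": "tap",
                "params": {"x": 540, "y": 1200}}),
    actions,
)
```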
Detailed design: docs/agent-architecture.md.
| Doc | What it covers |
|---|---|
| Agent Architecture | 6-layer agent + Planner / Subagent design |
| Android Portal | Portal App performance & connection stability |
| Test KB | Building the test knowledge base for your own app |
| Roadmap | Done features + priorities |
| Comparison | DroidRun / Midscene / AutoGLM technical comparison |
| Troubleshooting | Common issues — connection / screenshot / recognition |
This project is inspired by:
- droidrun / droidrun-portal — the Portal App's reverse WebSocket and connection-stability patterns (library-level ping/pong, reconnect budget, terminal-error detection) are directly inspired by droidrun-portal.
- Midscene.js — the Set-of-Marks visual annotation idea inspired our a11y element overlay. We ended up using magenta crosshairs instead of numbered bubbles to avoid confusion with in-game content.
- AutoGLM — the Planner / Grounder split influenced our dual-perception fusion architecture.
PRs and issues welcome. Common contribution paths:
- New LLM provider — add a branch in `agent/base.py`
- New Portal App action — define the tool in `agent/tools.py` and implement it in `ws_device.py` (see the hypothetical sketch after this list)
- New test case format parser — `core/test_parser.py`
- Documentation / i18n
MIT — see LICENSE.



