
Smart-AI-Bot

AI-powered Android UI test automation platform — write test cases in plain English (or import from xmind / Markdown), have an AI agent run them on real devices, and get visual replay reports. Doubles as a general phone-automation tool with self-learning replay.

English · 简体中文


Demo

example-hq.mp4

The demo above shows three panels recorded simultaneously for the same test case:

  • Left — Phone camera: physical proof. The device sits on a stand with no hands in frame; "Show Touches" is enabled so every synthetic tap appears as a white dot on the screen.
  • Middle — Backend log + Web UI report: live uvicorn output (agent thoughts, JSON-RPC calls like tap_element / screenshot, verifier verdicts) on top, with the test management page (step replay + per-step reasoning + pass/fail verdict) below.
  • Right — Phone screen mirror: the actual UI as seen by the AI agent.

The whole sequence is unedited — what you see is the agent operating the device end-to-end based on a single plain-language test case.

No USB cable in this demo. The phone talks to the server over WiFi only, via the Portal App's reverse WebSocket — the laptop and the phone don't even need to be on the same network. Run your devices anywhere (4G / 5G / corporate WiFi).

Note: if the video doesn't render in your viewer, download it directly (15 MB) or browse the Releases page.




Why Smart-AI-Bot

You write:

Open Settings, find About Phone, capture the version number.
Expected: System version is shown, no error dialog.

The AI agent finds the path, taps, verifies — every step has a screenshot and a thought trace. Failed cases automatically extract a "lesson learned"; the next time the same task runs, the agent avoids the same mistake.

No XPath, no Appium, no recorded scripts.
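
The same case can also live in an imported file. A rough YAML sketch (field names here are illustrative; check the actual import schema for your version):

cases:
  - name: About Phone version
    path: Open Settings, find About Phone, capture the version number
    expected: System version is shown, no error dialog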


Features

  • Plain-language test cases — write in Chinese or English; import from YAML / Excel / xmind / Markdown
  • Dual perception — screenshot (vision) + a11y tree (semantic), fused decision
  • Multi-LLM — OpenAI, Anthropic, Gemini, Zhipu GLM, Groq, Ollama
  • Any-network device — Portal App opens a reverse WebSocket; runs over 4G / 5G / corporate WiFi without ADB
  • Test management UI — suites, cases, run history, step replay, run comparison, pass-rate trend
  • Self-contained HTML reports — single-file export with screenshots, thoughts, actions, verdicts
  • Planner + Subagent — complex tasks decomposed into subgoals, each with isolated context
  • Page-aware reasoning — current Activity class + recent-pages trail injected, so the agent recognizes "wrong screen" instead of blindly tapping
  • Two-shot verifier — at-action frame (catches transient toasts) + settled frame, both used for pass/fail judgment
  • Learn from mistakes — LessonLearned auto-extracted from past runs and re-injected as guardrails
  • Auto-recovery — 4-level escalation when stuck (warn → back → restart → fail); a sketch follows this list
  • Observability — token usage, perception/LLM/action timing per step, pass-rate trend chart
  • CI/CD — CLI runner, webhook notifications (Feishu / DingTalk / Slack)
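
A rough sketch of how the 4-level Auto-recovery escalation could be tracked (class and method names are illustrative, not the project's actual code):

from enum import Enum
from typing import Optional

class RecoveryAction(Enum):
    WARN = "warn"        # inject a warning into the agent's next prompt
    BACK = "back"        # press the system Back button
    RESTART = "restart"  # relaunch the app under test
    FAIL = "fail"        # give up and mark the case as failed

class StuckTracker:
    # Escalate one level for each consecutive step where the screen did not change.
    LADDER = [RecoveryAction.WARN, RecoveryAction.BACK,
              RecoveryAction.RESTART, RecoveryAction.FAIL]

    def __init__(self) -> None:
        self.stuck_steps = 0

    def on_step(self, screen_changed: bool) -> Optional[RecoveryAction]:
        if screen_changed:
            self.stuck_steps = 0   # progress made, reset the ladder
            return None
        action = self.LADDER[min(self.stuck_steps, len(self.LADDER) - 1)]
        self.stuck_steps += 1
        return action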

Full comparison and roadmap: Comparison · Roadmap


Screenshots

  • Portal App on Android — pair the device by setting WebSocket URL + Token, enable Accessibility, and tap Connect
  • Quick Task in Web UI — describe a task in plain language, pick a device + LLM model, hit Run
  • Test Report — pass/fail counts, pass rate, token usage, run time, and per-case verdict with verifier reasoning
  • Step Replay — every action with screenshot, agent reasoning, and tool call (e.g. tap_element({"index": 5}))

Quick Start

Prerequisites

  • Python 3.9+
  • Node.js 18+
  • An Android device (real device or emulator)

Run the backend & frontend

git clone https://github.com/rejigtian/Smart-AI-Bot.git
cd Smart-AI-Bot

# Backend
cd backend
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
uvicorn main:app --reload --host 0.0.0.0 --port 8000

# Frontend (new terminal)
cd frontend
npm install
npm run dev

Or one command:

./start.sh

Open http://localhost:5173 and drop your LLM API keys into Settings.
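
To confirm the backend came up, FastAPI's interactive docs should answer on the same port (assuming the project keeps the default /docs route enabled):

curl http://localhost:8000/docs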

Install the Portal App

Option A — prebuilt APK (fastest)

Download from the latest release and install:

adb install -r ~/Downloads/app-v1.0.0.apk

Option B — build from source

cd android
./gradlew assembleDebug
adb install -r app/build/outputs/apk/debug/app-debug.apk

First launch:

  1. In the app's Settings, set the Server WebSocket URL (e.g. ws://192.168.1.10:8000/v1/providers/join) and a Device Token (generate one in the Web UI's Devices page).
  2. System Settings → Accessibility → enable AgentAccessibilityService.
  3. Back in the app, tap Start Connection. The persistent foreground notification means you're online.

Write a test case

In the Test Suites page, create a suite and add a case:

Path: Open Settings, navigate to About Phone, capture the version number
Expected: System version info is shown, no error dialog

Pick a device + model, hit Run.

CLI (CI/CD integration)

cd backend
python cli.py run --suite <id> --device <id> --json

Exit code: 0 = all passed, 1 = at least one failed.
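
A minimal CI step only needs to run the command and let the exit code gate the pipeline. For example (the $SUITE_ID / $DEVICE_ID variables and the report.json path are placeholders):

cd backend
python cli.py run --suite "$SUITE_ID" --device "$DEVICE_ID" --json > report.json
# a non-zero exit code (at least one failed case) fails the job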


Architecture

Browser (management UI)
  │ REST + SSE
FastAPI server
  ├── Planner (decomposes complex tasks)
  │     └── SubAgent #1..N (isolated context per subgoal)
  ├── TestCaseAgent (6-layer + VLM fallback)
  │     perception → decision → action → memory → verification → replay
  └── SQLite + webhook + CLI
        Device / Suite / Case / Run / Result / StepLog
  │
  │ WebSocket JSON-RPC
Android device (Portal App)
  tap / swipe / input / screenshot / get_ui_state

Detailed design: docs/agent-architecture.md.
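
For illustration, a tap request over that WebSocket might look roughly like this, assuming a standard JSON-RPC 2.0 envelope around the tool calls listed above (the actual method names and response shapes are defined by the Portal App):

{"jsonrpc": "2.0", "id": 17, "method": "tap_element", "params": {"index": 5}}
{"jsonrpc": "2.0", "id": 18, "method": "screenshot", "params": {}}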


More Docs

  • Agent Architecture — 6-layer agent + Planner / Subagent design
  • Android Portal — Portal App performance & connection stability
  • Test KB — building the test knowledge base for your own app
  • Roadmap — done features + priorities
  • Comparison — DroidRun / Midscene / AutoGLM technical comparison
  • Troubleshooting — common issues: connection / screenshot / recognition

Acknowledgments

This project is inspired by:

  • droidrun / droidrun-portal — the Portal App's reverse WebSocket and connection-stability patterns (library-level ping/pong, reconnect budget, terminal-error detection) are directly inspired by droidrun-portal.
  • Midscene.js — the Set-of-Marks visual annotation idea inspired our a11y element overlay. We ended up using magenta crosshairs instead of numbered bubbles to avoid confusion with in-game content.
  • AutoGLM — the Planner / Grounder split influenced our dual-perception fusion architecture.

Contributing

PRs and issues welcome. Common contribution paths:

  • New LLM provider — add a branch in agent/base.py (see the sketch after this list)
  • New Portal App action — define the tool in agent/tools.py + implement it in ws_device.py
  • New test case format parser — core/test_parser.py
  • Documentation / i18n
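
As a very rough illustration of the first path, a provider branch has roughly this shape (everything below is hypothetical; the real dispatch in agent/base.py uses its own class and method names):

from dataclasses import dataclass

@dataclass
class MyProviderClient:
    # Stand-in for your SDK client; expose whatever chat interface the agent expects.
    api_key: str
    model: str

def build_client(provider: str, api_key: str, model: str):
    # Hypothetical dispatch; add one branch per provider.
    if provider == "my_provider":
        return MyProviderClient(api_key=api_key, model=model)
    raise ValueError(f"unsupported provider: {provider}")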

License

MIT — see LICENSE.
