AI-powered Android UI test automation platform — write test cases in plain English (or import from xmind / Markdown), have an AI agent run them on real devices, and get visual replay reports. Doubles as a general phone-automation tool with self-learning replay.
English · 简体中文
example-hq.mp4
The demo above shows three panels recorded simultaneously for the same test case:
- Left — Phone camera: physical proof. The device sits on a stand with no hands in frame; "Show Touches" is enabled so every synthetic tap appears as a white dot on the screen.
- Middle — Backend log + Web UI report: live `uvicorn` output (agent thoughts, JSON-RPC calls like `tap_element` / `screenshot`, verifier verdicts) on top, with the test management page (step replay + per-step reasoning + pass/fail verdict) below.
- Right — Phone screen mirror: the actual UI as seen by the AI agent.
The whole sequence is unedited — what you see is the agent operating the device end-to-end based on a single plain-language test case.
No USB cable in this demo. The phone talks to the server over WiFi only, via the Portal App's reverse WebSocket — the laptop and the phone don't even need to be on the same network. Run your devices anywhere (4G / 5G / corporate WiFi).
Note: if the video doesn't render in your viewer, download it directly (15 MB) or browse the Releases page.
You write:
```
Open Settings, find About Phone, capture the version number.
Expected: System version is shown, no error dialog.
```
The AI agent finds the path, taps, and verifies — every step has a screenshot and a thought trace. When a case fails, a "lesson learned" is automatically extracted; the next time the same task runs, the agent avoids the same mistake.
No XPath, no Appium, no recorded scripts.
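To make the self-learning loop concrete, here is a minimal Python sketch of the idea: distill a failed step into a guardrail and prepend it to the next run's prompt. Every name here (`LessonLearned`, `extract_lesson`, `build_prompt`) is hypothetical; the real extraction and injection live in the agent's memory layer.

```python
# Hypothetical sketch of the lesson-learned loop (names are illustrative,
# not the project's actual API).
from dataclasses import dataclass

@dataclass
class LessonLearned:
    task: str       # the plain-language test case this lesson applies to
    mistake: str    # what went wrong
    guardrail: str  # instruction injected into future runs of the same task

def extract_lesson(task: str, failed_step: str, verdict: str) -> LessonLearned:
    """Distill a failed step plus the verifier's verdict into a reusable guardrail."""
    return LessonLearned(
        task=task,
        mistake=f"Step '{failed_step}' failed: {verdict}",
        guardrail=f"Avoid repeating '{failed_step}'. Reason: {verdict}",
    )

def build_prompt(task: str, lessons: list[LessonLearned]) -> str:
    """Re-inject past lessons as guardrails ahead of the task description."""
    guardrails = "\n".join(f"- {l.guardrail}" for l in lessons if l.task == task)
    return f"Known pitfalls:\n{guardrails}\n\nTask: {task}" if guardrails else task
```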
- Plain-language test cases — write in Chinese or English; import from YAML / Excel / xmind / Markdown
- Dual perception — screenshot (vision) + a11y tree (semantic), fused decision
- Multi-LLM — OpenAI, Anthropic, Gemini, Zhipu GLM, Groq, Ollama
- Any-network device — Portal App opens a reverse WebSocket; runs over 4G / 5G / corporate WiFi without ADB
- Test management UI — suites, cases, run history, step replay, run comparison, pass-rate trend
- Self-contained HTML reports — single-file export with screenshots, thoughts, actions, verdicts
- Planner + Subagent — complex tasks decomposed into subgoals, each with isolated context
- Page-aware reasoning — current Activity class + recent-pages trail injected, so the agent recognizes "wrong screen" instead of blindly tapping
- Two-shot verifier — at-action frame (catches transient toasts) + settled frame, both used for pass/fail judgment
- Learn from mistakes — `LessonLearned` entries auto-extracted from past runs and re-injected as guardrails
- Auto-recovery — 4-level escalation when stuck (warn → back → restart → fail); see the sketch after this list
- Observability — token usage, perception/LLM/action timing per step, pass-rate trend chart
- CI/CD — CLI runner, webhook notifications (Feishu / DingTalk / Slack)
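The auto-recovery ladder above is easiest to picture as a tiny state machine. A minimal sketch, assuming one escalation level per consecutive stuck step; the real thresholds and actions belong to the agent's recovery layer:

```python
# Hypothetical sketch of the 4-level stuck-recovery escalation
# (warn → back → restart → fail); the one-level-per-stuck-step
# mapping is an assumption.
from enum import Enum

class Recovery(Enum):
    WARN = 1     # inject a warning into the agent's context
    BACK = 2     # press the system Back button
    RESTART = 3  # kill and relaunch the app under test
    FAIL = 4     # give up and mark the case failed

def escalate(stuck_steps: int) -> Recovery:
    """Map consecutive no-progress steps to the next recovery action."""
    ladder = [Recovery.WARN, Recovery.BACK, Recovery.RESTART, Recovery.FAIL]
    return ladder[min(max(stuck_steps, 1), len(ladder)) - 1]

assert escalate(1) is Recovery.WARN
assert escalate(4) is Recovery.FAIL  # stays at FAIL from here on
```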
Full comparison and roadmap: Comparison · Roadmap
- Python 3.9+
- Node.js 18+
- An Android device (real device or emulator)
```bash
git clone https://github.com/rejigtian/Smart-AI-Bot.git
cd Smart-AI-Bot

# Backend
cd backend
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
uvicorn main:app --reload --host 0.0.0.0 --port 8000

# Frontend (new terminal)
cd frontend
npm install
npm run dev
```

Or one command:

```bash
./start.sh
```

Open http://localhost:5173 and drop your LLM API keys into Settings.
Option A — prebuilt APK (fastest)
Download from the latest release and install:
```bash
adb install -r ~/Downloads/app-v1.0.0.apk
```

Option B — build from source

```bash
cd android
./gradlew assembleDebug
adb install -r app/build/outputs/apk/debug/app-debug.apk
```

First launch:

- In the app's Settings, set the Server WebSocket URL (e.g. `ws://192.168.1.10:8000/v1/providers/join`) and a Device Token (generate one in the Web UI's Devices page).
- System Settings → Accessibility → enable `AgentAccessibilityService`.
- Back in the app, tap Start Connection. The persistent foreground notification means you're online.
In the Test Suites page, create a suite and add a case:
```
Path: Open Settings, navigate to About Phone, capture the version number
Expected: System version info is shown, no error dialog
```
Pick a device + model, hit Run.
```bash
cd backend
python cli.py run --suite <id> --device <id> --json
```

Exit code: 0 = all passed, 1 = at least one failed.
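In a CI pipeline the documented exit code is all you need to gate a build. A minimal Python wrapper (the suite and device ids are placeholders, and the `--json` payload schema is not specified here, so only the exit code is relied on):

```python
# Minimal CI gate around the CLI runner; relies only on the documented
# exit codes (0 = all passed, 1 = at least one failure).
import subprocess
import sys

result = subprocess.run(
    ["python", "cli.py", "run", "--suite", "42", "--device", "7", "--json"],
    cwd="backend",           # assumes the job starts at the repo root
    capture_output=True,
    text=True,
)
print(result.stdout)          # JSON run report for the build log
sys.exit(result.returncode)   # propagate pass/fail to the CI job
```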
```
Browser (management UI)
        │ REST + SSE
FastAPI server
  ├── Planner (decomposes complex tasks)
  │     └── SubAgent #1..N (isolated context per subgoal)
  ├── TestCaseAgent (6-layer + VLM fallback)
  │     perception → decision → action → memory → verification → replay
  └── SQLite + webhook + CLI
        Device / Suite / Case / Run / Result / StepLog
        │
        │ WebSocket JSON-RPC
Android device (Portal App)
  tap / swipe / input / screenshot / get_ui_state
```
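To make the device leg concrete, here is a hedged Python sketch of one JSON-RPC round trip. The method names come from the diagram; the envelope (JSON-RPC 2.0 framing, id matching) is an assumption about the wire format, and the real device side is the Kotlin Portal App, not Python:

```python
# Hypothetical JSON-RPC dispatch on the device side; this Python sketch
# only illustrates the message flow, not the Portal App's actual code.
import json

def handle_frame(frame: str, actions: dict) -> str:
    """Decode one JSON-RPC request, run the matching device action,
    and encode the reply with the same id so the server can match it."""
    req = json.loads(frame)
    method = req["method"]  # e.g. tap / swipe / input / screenshot / get_ui_state
    result = actions[method](**req.get("params", {}))
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})

# Toy action table standing in for the Accessibility-service implementations:
actions = {"tap": lambda x, y: {"ok": True, "x": x, "y": y}}
reply = handle_frame(
    json.dumps({"jsonrpc": "2.0", "id": 1, "method": "tap",
                "params": {"x": 540, "y": 1200}}),
    actions,
)
```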
Detailed design: docs/agent-architecture.md.
| Doc | What it covers |
|---|---|
| Agent Architecture | 6-layer agent + Planner / Subagent design |
| Android Portal | Portal App performance & connection stability |
| Test KB | Building the test knowledge base for your own app |
| Roadmap | Done features + priorities |
| Comparison | DroidRun / Midscene / AutoGLM technical comparison |
| Troubleshooting | Common issues — connection / screenshot / recognition |
This project is inspired by:
- droidrun / droidrun-portal — the Portal App's reverse WebSocket and connection-stability patterns (library-level ping/pong, reconnect budget, terminal-error detection) are directly inspired by droidrun-portal.
- Midscene.js — the Set-of-Marks visual annotation idea inspired our a11y element overlay. We ended up using magenta crosshairs instead of numbered bubbles to avoid confusion with in-game content.
- AutoGLM — the Planner / Grounder split influenced our dual-perception fusion architecture.
PRs and issues welcome. Common contribution paths:
- New LLM provider — add a branch in `agent/base.py`
- New Portal App action — define the tool in `agent/tools.py` and implement it in `ws_device.py` (see the hypothetical sketch after this list)
- New test case format parser — `core/test_parser.py`
- Documentation / i18n
MIT — see LICENSE.



