Desktop GUI automation driven by natural language. Captures screenshots, sends them to a cloud-based hybrid vision model, and executes the returned actions on the local machine — click, type, scroll, drag, and more.
User ──► Local client ──► Cloud Server ──► Local client executes action
│ │
screenshot model selection
│
┌──────────┴──────────┐
│ │
Mano Model Claude CUA Model
(fast, lightweight) (complex reasoning)
At each step, mano-cua captures the current screenshot and sends it along with the task description to the cloud server. The server analyzes the task and automatically selects the most suitable model:
- Mano Model — optimized for straightforward, repetitive GUI tasks. Low latency, high throughput. Ideal for simple clicks, form filling, and navigation.
- Claude CUA (Computer Use Agent) — handles tasks that require deeper visual understanding and multi-step reasoning. Used when the task involves complex layouts, ambiguous UI elements, or decision-making.
The server evaluates task complexity in real time and routes accordingly, balancing speed and accuracy.
brew tap Mininglamp-AI/tap
brew install mano-cuamano-cua run "your task description"mano-cua run "Open WeChat and tell FTY that the meeting is postponed"
mano-cua run "Search for AI news in Xiaohongshu and show the first post"click · type · hotkey · scroll · drag · mouse move · screenshot · wait · app launch
Screen Recording and Accessibility (Keyboard/Mouse control) permissions are required. Grant these in System Preferences > Privacy & Security before running.
A small UI panel is displayed on the top-right corner of the screen to track and manage the current session status.
- User consent is required before proceeding with steps that may be sensitive or potentially dangerous.
- Screenshots are captured and applications may be started or closed during the session. Avoid exposing apps with sensitive or critical data while a task is running.
- Do not use the mouse or keyboard during the task. Manual input while mano-cua is running may cause unexpected behavior.
- Multiple displays: only the primary display is used. All mouse movements, clicks, and screenshots are restricted to that display.
This is a Beta release. macOS is the preferred and most tested platform. Adaptations for Windows and Linux are not yet fully completed — minor issues are expected.
MIT