Project 05 — Computer Use Agent

Autonomous agent that operates a virtualized Ubuntu desktop to automate workflows in systems that expose no API. It reads screenshots, decides on the next action, and executes mouse clicks and keyboard input. It applies Anthropic's Computer Use capability to back-office automation.

Industrial use case: RPA for legacy systems without APIs, back-office automation in banking/insurance.

What this project does

Receives a task in natural language (e.g. "extract Q3 sales from the legacy system and save to CSV") and operates an Ubuntu VM in a loop: capture a screenshot → Claude with the computer_use tool decides the next action → xdotool executes it → repeat until the task is complete or the maximum number of steps is reached.
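
The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual code: `Action`, `run_task`, and the injected `capture`, `decide`, and `execute` callables are hypothetical names, and the default step limit of 25 is arbitrary.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    """One step chosen by the model (hypothetical shape)."""
    kind: str                                  # "click", "type", "key", "scroll" or "done"
    args: dict = field(default_factory=dict)   # e.g. {"x": 10, "y": 20}


def run_task(
    task: str,
    capture: Callable[[], bytes],
    decide: Callable[[str, bytes, list], Action],
    execute: Callable[[Action], None],
    max_steps: int = 25,
):
    """Run capture -> decide -> execute until "done" or max_steps.

    Returns (completed, history), where history lists the executed actions.
    """
    history = []
    for _ in range(max_steps):
        screenshot = capture()                       # e.g. scrot on Xvfb
        action = decide(task, screenshot, history)   # e.g. a Claude call
        if action.kind == "done":
            return True, history
        execute(action)                              # e.g. xdotool
        history.append(action)
    return False, history                            # step budget exhausted
```

Passing the three stages in as callables keeps the loop testable without a VM: stubs can stand in for the screenshot tool, the model, and the input driver.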

Architecture

Task: "Extract Q3 sales from legacy system"
   │
   ▼
[Planner] Claude → initial plan
   │
   ▼
ACTION LOOP:
   ├──► [Screenshot capture] scrot on Xvfb
   ├──► [Vision + Reasoning] Claude with computer_use tool
   │      → click(x, y) | type(text) | key(combo) | scroll
   ├──► [Action Executor] xdotool
   ├──► [Verifier] Claude — "did the step complete? unexpected dialog?"
   ├──► [State Updater] LangGraph state with screenshot history
   └──► continue until task_complete or max_steps
   │
   ▼
[Final Validator] Claude → does output satisfy original task?
   │
   ▼
[Audit Log] PostgreSQL
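
As an illustration of the Action Executor stage, the sketch below translates the four action kinds in the diagram into xdotool command lines. The dictionary shape of an action, the function names, the 50 ms typing delay, and the default display `:1` are assumptions made for this example; only the xdotool subcommands themselves (`mousemove`, `click`, `type`, `key`) are standard.

```python
import os
import subprocess


def to_xdotool_argv(action: dict) -> list[str]:
    """Translate one action dict (hypothetical shape) into an xdotool argv."""
    kind = action["kind"]
    if kind == "click":
        # Move the pointer, then press the left mouse button (button 1).
        return ["xdotool", "mousemove", str(action["x"]), str(action["y"]), "click", "1"]
    if kind == "type":
        # "--" ends option parsing so text starting with "-" is typed literally.
        return ["xdotool", "type", "--delay", "50", "--", action["text"]]
    if kind == "key":
        return ["xdotool", "key", action["combo"]]        # e.g. "ctrl+s"
    if kind == "scroll":
        # In X11 the scroll wheel is reported as button 4 (up) or 5 (down).
        button = "4" if action["direction"] == "up" else "5"
        return ["xdotool", "click", "--repeat", str(action.get("amount", 3)), button]
    raise ValueError(f"unsupported action kind: {kind!r}")


def run_action(action: dict, display: str = ":1") -> None:
    """Execute the action against the given X display (assumed to be Xvfb)."""
    env = {**os.environ, "DISPLAY": display}
    subprocess.run(to_xdotool_argv(action), check=True, env=env)
```

Keeping the translation separate from the `subprocess` call means the mapping can be unit-tested on a machine that has neither xdotool nor an X server.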

Roadmap to v1.0.0

Stack

Layer	Technology
Computer Use	Anthropic Claude Sonnet 4.5 with `computer_20250124` tool
VM	Docker container — Ubuntu 22.04 + Xvfb + VNC
Screenshots	scrot or gnome-screenshot
Input automation	xdotool
Orchestration	LangGraph with screenshot history in state
Storage	PostgreSQL for action logs, filesystem for screenshots
Frontend	Next.js with live VNC viewer
Observability	LangSmith with action-by-action traces

Definition of Done — project-specific

Reproducible VM via docker compose up
20 custom tasks defined with ground truth
Success rate reported by category (form filling, data extraction, web nav, multi-step)
Demo lets the user pick a task and watch the agent execute it as a live stream
5 recorded videos (MP4 in the repo or unlisted on YouTube) of real executions
Safety layer documented (blocked actions list)
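
The per-category success rate in the list above could be computed along these lines. The function name and the `(category, succeeded)` input shape are hypothetical, and the sample values in the usage note are made up to show the format; they are not measured results.

```python
from collections import defaultdict


def success_rate_by_category(results):
    """Return {category: fraction of runs that succeeded}.

    `results` is an iterable of (category, succeeded) pairs, one per task run.
    """
    totals = defaultdict(int)
    passed = defaultdict(int)
    for category, succeeded in results:
        totals[category] += 1
        passed[category] += int(bool(succeeded))
    return {category: passed[category] / totals[category] for category in totals}
```

For example, `success_rate_by_category([("form filling", True), ("form filling", False), ("web nav", True)])` returns `{"form filling": 0.5, "web nav": 1.0}`.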

Plus the 12 universal DoD blocks.
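
One way the safety layer's blocked actions list might be enforced is a check that runs before any action reaches xdotool. Everything in this sketch is an assumption: the action dictionary shape, the `is_blocked` name, and the specific key combos and text patterns are placeholders, not the project's documented list.

```python
import re

# Placeholder entries; the real blocked actions list is project-specific.
BLOCKED_KEYS = {"ctrl+alt+delete", "ctrl+alt+backspace"}
BLOCKED_TEXT_PATTERNS = [
    re.compile(r"\brm\s+-rf\b"),
    re.compile(r"\bsudo\b"),
    re.compile(r"\bshutdown\b"),
]


def is_blocked(action: dict) -> bool:
    """Return True if the action matches the (hypothetical) blocklist."""
    kind = action.get("kind")
    if kind == "key":
        # Compare case-insensitively so "ctrl+alt+Delete" is also caught.
        return action["combo"].lower() in BLOCKED_KEYS
    if kind == "type":
        return any(p.search(action["text"]) for p in BLOCKED_TEXT_PATTERNS)
    return False  # clicks and scrolls are not filtered in this sketch
```

A pattern blocklist like this is easy to audit but also easy to bypass, so it is best read as one layer alongside VM isolation rather than a complete safeguard.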

License

MIT.
