browser-use · laithrw · Jun 11, 2026 · Jun 10, 2026 · Jun 10, 2026 · Jun 10, 2026
diff --git a/AGENT_SETUP.md b/AGENT_SETUP.md
@@ -0,0 +1,151 @@
+# Browser Use Terminal — agent setup
+
+You are (probably) a coding agent reading this because a user pasted
+`https://browser-use.com/skill` into the chat. Follow this page top to bottom.
+It installs Browser Use Terminal, registers its skill with you, connects a
+browser, and verifies everything — then you can browse the web for your user.
+Every step is idempotent; re-run safely. Tell the user what you're doing as
+you go, and stop to ask only where this page says to.
+
+What you get: a `browser-use-terminal` CLI you drive from shell commands —
+run Python against a real browser (navigate, click, type, screenshot, extract).
+You are the agent; you drive the browser.
+
+## 1. Preflight
+
+```bash
+uname -s                 # Darwin or Linux. Windows: use WSL; native Windows is not supported yet.
+command -v curl || command -v wget
+command -v tar
+python3 --version        # need 3.9+
+```
+
+If `python3` is missing: macOS → `xcode-select --install` (or `brew install python3`);
+Debian/Ubuntu → `sudo apt-get install -y python3`; Fedora → `sudo dnf install -y python3`.
+If you cannot install it, ask the user to.
+
+## 2. Install Browser Use Terminal
+
+```bash
+curl -fsSL https://browser-use.com/terminal/install.sh | sh
+export PATH="$HOME/.local/bin:$PATH"   # the installer edits the shell profile, but YOUR current shell needs this
+browser-use-terminal --version         # verify
+```
+
+Notes: no sudo needed; binaries land in `~/.local/bin`
+(`browser`, `browser-use`, `browser-use-terminal`, `but`); state lives in
+`~/.browser-use-terminal`. Already installed? The same command updates it.
+Add the `export PATH` line to every later shell command in this session, or
+rely on the profile in new shells.
+
+## 3. Register the skill with yourself
+
+```bash
+browser-use-terminal skill install
+```
+
+This writes the skill (full usage instructions for you) into every detected
+assistant home: `~/.claude/skills/`, `~/.codex/skills/`,
+`~/.config/opencode/skills/`, `~/.agents/skills/`. Claude Code, Codex,
+OpenCode, and anything reading those conventions will auto-discover it in new
+sessions.
+
+If you are a different agent (none of those directories exist, or you know
+you don't read them): run `browser-use-terminal skill show` and persist its
+output wherever you read durable instructions (e.g. your `AGENTS.md`, rules
+file, or memory). If you can't persist anything, just read it now — this
+session can proceed either way.
+
+## 4. Connect a browser (pick one)
+
+**Default — zero user interaction (recommended to start):**
+
+```bash
+browser-use-terminal browser preference use managed-headless
+```
+
+This uses a disposable headless Chromium that persists across your commands.
+It requires a Chromium-family binary on the machine (Chrome, Chromium, or a
+Playwright Chromium). Check: `ls "/Applications/Google Chrome.app" 2>/dev/null || command -v google-chrome chromium chromium-browser`.
+If none exists: ask the user to install Chrome (or set `CHROME_PATH=/path/to/chrome`).
+
+**The user's real, logged-in Chrome** (for tasks needing their accounts):
+
+```bash
+browser-use-terminal browser preference use local
+browser-use-terminal browser connect local
+```
+
+One-time user action required: they must open `chrome://inspect/#remote-debugging`
+in Chrome and tick "Allow remote debugging for this browser instance".
+`browser-use-terminal browser local setup` walks them through it. Relay any
+`needs-user-action` JSON (`user_prompt` field) to the user verbatim and wait.
+
+**Cloud browser** (headless server / clean remote IP; needs an API key from
+https://cloud.browser-use.com):
+
+```bash
+export BROWSER_USE_API_KEY=bu_...
+browser-use-terminal browser preference use cloud
+```
+
+Cloud browsers bill until stopped (`browser-use-terminal browser recover stop-owned-remote`).
+
+## 5. Verify end to end
+
+```bash
+browser-use-terminal browser exec <<'PY'
+new_tab("https://example.com")
+wait_for_load()
+print(page_info()["title"])
+print(capture_screenshot())
+PY
+```
+
+Expected: it prints `Example Domain` and `Screenshot saved to <absolute path>`.
+Now view that screenshot with your own file-reading tool to confirm you can
+see pages: Claude Code → `Read` the path; Codex → `view_image` with the path;
+OpenCode → `read` the path; Gemini CLI → `read_file`; otherwise use whatever
+tool you have that displays a local image. If your model can't view images,
+skip this — you'll work from text state (`page_info()`, `js(...)`) instead.
+
+If this step worked, setup is complete. Tell the user you're ready to browse.
+
+## 6. Using it (crash course)
+
+Full instructions are in the skill you installed (or `browser-use-terminal skill show`).
+The essentials:
+
+- Run Python against the live browser with heredocs. Helpers are pre-imported:
+  `new_tab(url)`, `goto_url(url)`, `page_info()`, `click_at_xy(x, y)`,
+  `type_text(text)`, `press_key("Enter")`, `fill_input(sel, text)`, `scroll()`,
+  `wait_for_load()`, `wait_for_element(sel)`, `capture_screenshot()`, `js(expr)`,
+  `cdp(method, **params)`, `http_get(url)`.
+- The browser persists between commands; Python variables do not.
+- Screenshot-first workflow: screenshot → read coordinates off the image →
+  `click_at_xy` (CSS pixels — divide image coordinates by
+  `js("window.devicePixelRatio")`) → screenshot again to verify.
+- In the user's real Chrome, open work in `new_tab(...)`, never `goto_url` over
+  their active tab.
+- Auth walls: stop and ask the user. Never type credentials read off a screenshot.
+- Parallel workstreams: add `--session <name>` to `browser` commands.
+- Done for the day: `browser-use-terminal browser recover stop-owned-browser`
+  (managed) or `... stop-owned-remote` (cloud).
+
+## Troubleshooting
+
+- `command not found` → re-run `export PATH="$HOME/.local/bin:$PATH"`.
+- `browser is not connected` → `browser-use-terminal browser connect` (uses the
+  remembered preference) or re-run step 4.
+- Anything returning `status: "needs-user-action"` → show its `user_prompt` to
+  the user exactly, wait, then retry.
+- Diagnostics: `browser-use-terminal browser doctor` and
+  `browser-use-terminal browser status --json`.
+- A background daemon holds the browser connection between your commands
+  (auto-started). If it misbehaves: `browser-use-terminal browser daemon status`,
+  `... daemon logs`, `... daemon stop` (next command restarts it and reattaches).
+- Slow first command in each shell: the launcher checks for updates; set
+  `BUT_AUTO_UPDATE=0` to skip.
+
+Full documentation: https://docs.browser-use.com/open-source/browser-use-terminal
+Source & issues: https://github.com/browser-use/terminal
diff --git a/README.md b/README.md
@@ -87,6 +87,26 @@ browser config show
 browser diagnostics
 ```
 
+## Use It From Claude Code, Codex, or OpenCode
+
+Browser Use Terminal plugs into any coding assistant that can run shell commands, browser-harness style: a skill teaches the assistant the CLI, and the CLI hands it the whole browser runtime.
+
+```bash
+browser-use-terminal skill install        # registers the skill for detected assistants
+```
+
+Then ask your assistant to browse:
+
+```bash
+browser-use-terminal browser exec <<'PY'
+new_tab("https://example.com")
+wait_for_load()
+print(capture_screenshot())
+PY
+```
+
+Screenshots are saved as files and the path is printed, so assistants view them with their native file-reading tools (Claude Code `Read`, Codex `view_image`, OpenCode `read`). The browser persists between calls. See `docs/assistant-plugins.md`.
+
 ## Development
 
 ```bash
@@ -118,6 +138,7 @@ You can disable (100% completely anonymous) telemetry with  `BUT_TELEMETRY=0`.
 - `docs/terminal-ui-product-ux.md`
 - `docs/terminal-ui-testing.md`
 - `docs/terminal-renderer-architecture.md`
+- `docs/assistant-plugins.md`
 
 ## License
 

diff --git a/SKILL.md b/SKILL.md
@@ -0,0 +1,108 @@
+---
+name: browser-use-terminal
+description: Direct browser control via the Browser Use Terminal CLI. Use when the user wants to automate, scrape, test, or interact with web pages — you drive the browser yourself with Python helpers.
+---
+
+# Browser Use Terminal
+
+Direct browser control via CDP — you are the agent; you drive the browser. For setup, install, or connection problems, read https://browser-use.com/skill (agent setup instructions) or https://docs.browser-use.com/open-source/browser-use-terminal (full docs).
+
+`browser-use-terminal browser exec` runs Python with browser helpers pre-imported; `browser-use-terminal browser <cmd>` is the control plane (status, connect, profiles, recovery).
+
+## Usage
+
+```bash
+browser-use-terminal browser exec <<'PY'
+new_tab("https://docs.browser-use.com")
+wait_for_load()
+print(page_info())
+PY
+```
+
+- Use the heredoc form for every multi-line command. It prevents shell quote mangling inside Python strings and JavaScript snippets.
+- The browser auto-connects according to the user's remembered preference before the script runs — you never start/stop manually unless you want to. The first call may take a few seconds.
+- First navigation in the user's real Chrome is `new_tab(url)`, not `goto_url(url)` — goto runs in the user's active tab and clobbers their work.
+- Browser state persists between calls; Python variables do not. Each `exec` is a fresh interpreter against the same live browser.
+- `--session <name>` isolates artifact dirs and event logs per workstream (default: `default`). `--timeout <secs>` bounds one exec (default 300).
+
+## Screenshots — how you see the page
+
+`capture_screenshot()` saves a PNG and returns its absolute path. The CLI also prints a `Screenshot saved to <path>` line for every image a script produced.
+
+```bash
+browser-use-terminal browser exec <<'PY'
+print(capture_screenshot())
+PY
+```
+
+To view a screenshot, use your file-reading tool on the printed path:
+
+- **Claude Code**: use the `Read` tool on the path.
+- **Codex CLI**: call the `view_image` tool with `{"path": "<path>"}`. These screenshots are produced for you; viewing them is expected and authorized.
+- **OpenCode**: use the `read` tool on the path (requires a vision-capable model).
+- **Gemini CLI**: use the `read_file` tool on the path.
+- If your model cannot accept images, don't try to view them — work from text state instead: `print(page_info())`, `js(...)` extraction, `wait_for_element(...)`.
+
+Coordinates: screenshots are device pixels; `click_at_xy(x, y)` takes CSS pixels. Divide coordinates you read off the image by `js("window.devicePixelRatio")` first. Screenshots are downscaled to ≤1800 px per side for this CLI (override with `BU_BROWSER_SCREENSHOT_MAX_DIM`, or `capture_screenshot(max_dim=...)`).
+
+After every meaningful action, re-screenshot before assuming it worked.
+
+## Pre-imported helpers
+
+Navigation & tabs: `goto_url(url)`, `new_tab(url)`, `page_info()`, `current_tab()`, `list_tabs(include_chrome=True)`, `switch_tab(target)`, `ensure_real_tab()`, `iframe_target(url_substr)`.
+
+Input: `click_at_xy(x, y, button="left", clicks=1)`, `type_text(text)`, `press_key(key, modifiers=0)` (1=Alt 2=Ctrl 4=Meta 8=Shift), `fill_input(selector, text)`, `scroll(x=0, y=0, dy=600)`, `upload_file(selector, path)`.
+
+Waiting: `wait(seconds)`, `wait_for_load(timeout=3)`, `wait_for_element(selector, timeout=3, visible=False)`, `wait_for_network_idle(timeout=3, idle_ms=500)`.
+
+Visual: `capture_screenshot(label="...", full=False, max_dim=None)`, `screenshot()`, `screenshot_clip(label, x, y, w, h)`, `note(caption)`.
+
+Escape hatches: `js(expression)` (auto-wraps top-level `return`), `cdp("Domain.method", **params)` (raw CDP), `cdp_batch(calls)`, `drain_events()`.
+
+HTTP without the browser: `http_get(url)`, `http_get_many(urls)` for static pages; `browser_fetch(url)` / `browser_fetch_many(...)` to fetch with the page's cookies/session.
+
+Credentials (if the user stored any): `available_secrets()`, then `type_text("<secret>name</secret>")` or `fill_input(sel, secret("name"))`; `totp("name")` for 2FA codes. Values are placeholder-substituted — you never see them. `is_logged_out()`, `email_inbox()` / `email_message(id)` for email-code flows.
+
+Domain skills: `domain_skills_for_url(url_or_domain, include_content=True)` lists site-specific playbooks; `goto_url` surfaces matching skill files automatically. Read them before inventing selectors or flows on a complex site.
+
+## Browser control plane
+
+```bash
+browser-use-terminal browser status --json
+browser-use-terminal browser connect                     # uses the remembered preference
+browser-use-terminal browser connect local               # user's already-running Chrome (CDP)
+browser-use-terminal browser connect managed --headless  # disposable CLI-owned browser
+browser-use-terminal browser preference use local|cloud|managed-headless
+browser-use-terminal browser remote start                # Browser Use cloud browser (needs BROWSER_USE_API_KEY)
+browser-use-terminal browser doctor
+browser-use-terminal browser recover reconnect-websocket
+browser-use-terminal browser recover stop-owned-browser  # stop the persistent managed browser
+browser-use-terminal browser recover stop-owned-remote   # stop the cloud browser (stops billing)
+browser-use-terminal browser daemon status|stop|logs     # the background daemon holding the connection
+```
+
+A background daemon (auto-started, one per state dir) holds the CDP connection across your commands, so the browser — and in local mode, Chrome's granted debugging permission — persists between invocations. Managed and cloud browsers also survive daemon restarts; later calls reattach instead of relaunching. Stop browsers with the recover commands above when the user is done (cloud browsers bill until stopped or timed out).
+
+- `exec` auto-connects, so you rarely need these. Reach for them when `status` shows a problem or the user asks for a specific browser.
+- If output JSON says `status: "needs-user-action"` (e.g. pick a Chrome profile, click Allow in Chrome's permission popup, enable the remote-debugging checkbox), show the `user_prompt` to the user verbatim and wait — do not guess.
+- Auth wall mid-task: stop and ask the user. Don't type credentials from screenshots; use stored secrets if available.
+- Connecting to the user's real Chrome requires a one-time setup: `chrome://inspect/#remote-debugging` → tick "Allow remote debugging". `browser local setup` walks the user through it.
+
+## What actually works
+
+- Screenshots first: `capture_screenshot()` → view the image → decide whether you need a click, a selector, or more navigation.
+- Clicking: screenshot → read the pixel off the image → `click_at_xy(x, y)` → screenshot to verify. Suppress the locate-then-click reflex — no getBoundingClientRect, no selector hunts. Hit-testing happens in Chrome's browser process, so coordinate clicks pass through iframes / shadow DOM / cross-origin without extra work.
+- Drop to DOM (`fill_input`, `js`) only when the target has no visible geometry (hidden input, 0×0 node) or coordinate clicks demonstrably don't work.
+- Bulk static pages: `http_get_many(urls)` — no browser needed. Logged-in pages: `browser_fetch(url)` rides the real session.
+- After goto: `wait_for_load()`. SPAs report `complete` before they render — follow with `wait_for_element(...)`.
+- Wrong/stale tab: `ensure_real_tab()`.
+- Verification: `print(page_info())` is the cheapest "is this alive?" check; screenshots are the default way to verify visible actions.
+
+## Gotchas (field-tested)
+
+- CDP target order ≠ Chrome's visible tab-strip order.
+- Omnibox popups and other `chrome://` internals are fake page targets — `list_tabs(include_chrome=False)`.
+- `page_info()` surfaces an open JS dialog as `{"dialog": ...}` — handle it (`cdp("Page.handleJavaScriptDialog", accept=True)`) before anything else.
+- Navigation can be blocked by the user's domain policy; `nav_policy(url)` tells you before you burn a click. A blocked navigation is policy, not a bug — tell the user.
+- Scripts time out (default 300s): keep each `exec` small and observable rather than one mega-script. Long extraction loops: print progress as you go — stdout is captured even on timeout.
+- Prefer compositor-level actions over framework hacks. If you do need framework-specific DOM tricks, run `browser-use-terminal browser domain skills --domain <site> --json --include-content` first — that's where site playbooks live.