diff --git a/AGENT_SETUP.md b/AGENT_SETUP.md new file mode 100644 index 00000000..90169d7a --- /dev/null +++ b/AGENT_SETUP.md @@ -0,0 +1,151 @@ +# Browser Use Terminal — agent setup + +You are (probably) a coding agent reading this because a user pasted +`https://browser-use.com/skill` into the chat. Follow this page top to bottom. +It installs Browser Use Terminal, registers its skill with you, connects a +browser, and verifies everything — then you can browse the web for your user. +Every step is idempotent; re-run safely. Tell the user what you're doing as +you go, and stop to ask only where this page says to. + +What you get: a `browser-use-terminal` CLI you drive from shell commands — +run Python against a real browser (navigate, click, type, screenshot, extract). +You are the agent; you drive the browser. + +## 1. Preflight + +```bash +uname -s # Darwin or Linux. Windows: use WSL; native Windows is not supported yet. +command -v curl || command -v wget +command -v tar +python3 --version # need 3.9+ +``` + +If `python3` is missing: macOS → `xcode-select --install` (or `brew install python3`); +Debian/Ubuntu → `sudo apt-get install -y python3`; Fedora → `sudo dnf install -y python3`. +If you cannot install it, ask the user to. + +## 2. Install Browser Use Terminal + +```bash +curl -fsSL https://browser-use.com/terminal/install.sh | sh +export PATH="$HOME/.local/bin:$PATH" # the installer edits the shell profile, but YOUR current shell needs this +browser-use-terminal --version # verify +``` + +Notes: no sudo needed; binaries land in `~/.local/bin` +(`browser`, `browser-use`, `browser-use-terminal`, `but`); state lives in +`~/.browser-use-terminal`. Already installed? The same command updates it. +Add the `export PATH` line to every later shell command in this session, or +rely on the profile in new shells. + +## 3. 
Register the skill with yourself + +```bash +browser-use-terminal skill install +``` + +This writes the skill (full usage instructions for you) into every detected +assistant home: `~/.claude/skills/`, `~/.codex/skills/`, +`~/.config/opencode/skills/`, `~/.agents/skills/`. Claude Code, Codex, +OpenCode, and anything reading those conventions will auto-discover it in new +sessions. + +If you are a different agent (none of those directories exist, or you know +you don't read them): run `browser-use-terminal skill show` and persist its +output wherever you read durable instructions (e.g. your `AGENTS.md`, rules +file, or memory). If you can't persist anything, just read it now — this +session can proceed either way. + +## 4. Connect a browser (pick one) + +**Default — zero user interaction (recommended to start):** + +```bash +browser-use-terminal browser preference use managed-headless +``` + +This uses a disposable headless Chromium that persists across your commands. +It requires a Chromium-family binary on the machine (Chrome, Chromium, or a +Playwright Chromium). Check: `ls "/Applications/Google Chrome.app" 2>/dev/null || command -v google-chrome chromium chromium-browser`. +If none exists: ask the user to install Chrome (or set `CHROME_PATH=/path/to/chrome`). + +**The user's real, logged-in Chrome** (for tasks needing their accounts): + +```bash +browser-use-terminal browser preference use local +browser-use-terminal browser connect local +``` + +One-time user action required: they must open `chrome://inspect/#remote-debugging` +in Chrome and tick "Allow remote debugging for this browser instance". +`browser-use-terminal browser local setup` walks them through it. Relay any +`needs-user-action` JSON (`user_prompt` field) to the user verbatim and wait. + +**Cloud browser** (headless server / clean remote IP; needs an API key from +https://cloud.browser-use.com): + +```bash +export BROWSER_USE_API_KEY=bu_... 
+browser-use-terminal browser preference use cloud +``` + +Cloud browsers bill until stopped (`browser-use-terminal browser recover stop-owned-remote`). + +## 5. Verify end to end + +```bash +browser-use-terminal browser exec <<'PY' +new_tab("https://example.com") +wait_for_load() +print(page_info()["title"]) +print(capture_screenshot()) +PY +``` + +Expected: it prints `Example Domain` and `Screenshot saved to `. +Now view that screenshot with your own file-reading tool to confirm you can +see pages: Claude Code → `Read` the path; Codex → `view_image` with the path; +OpenCode → `read` the path; Gemini CLI → `read_file`; otherwise use whatever +tool you have that displays a local image. If your model can't view images, +skip this — you'll work from text state (`page_info()`, `js(...)`) instead. + +If this step worked, setup is complete. Tell the user you're ready to browse. + +## 6. Using it (crash course) + +Full instructions are in the skill you installed (or `browser-use-terminal skill show`). +The essentials: + +- Run Python against the live browser with heredocs. Helpers are pre-imported: + `new_tab(url)`, `goto_url(url)`, `page_info()`, `click_at_xy(x, y)`, + `type_text(text)`, `press_key("Enter")`, `fill_input(sel, text)`, `scroll()`, + `wait_for_load()`, `wait_for_element(sel)`, `capture_screenshot()`, `js(expr)`, + `cdp(method, **params)`, `http_get(url)`. +- The browser persists between commands; Python variables do not. +- Screenshot-first workflow: screenshot → read coordinates off the image → + `click_at_xy` (CSS pixels — divide image coordinates by + `js("window.devicePixelRatio")`) → screenshot again to verify. +- In the user's real Chrome, open work in `new_tab(...)`, never `goto_url` over + their active tab. +- Auth walls: stop and ask the user. Never type credentials read off a screenshot. +- Parallel workstreams: add `--session ` to `browser` commands. 
+- Done for the day: `browser-use-terminal browser recover stop-owned-browser` + (managed) or `... stop-owned-remote` (cloud). + +## Troubleshooting + +- `command not found` → re-run `export PATH="$HOME/.local/bin:$PATH"`. +- `browser is not connected` → `browser-use-terminal browser connect` (uses the + remembered preference) or re-run step 4. +- Anything returning `status: "needs-user-action"` → show its `user_prompt` to + the user exactly, wait, then retry. +- Diagnostics: `browser-use-terminal browser doctor` and + `browser-use-terminal browser status --json`. +- A background daemon holds the browser connection between your commands + (auto-started). If it misbehaves: `browser-use-terminal browser daemon status`, + `... daemon logs`, `... daemon stop` (next command restarts it and reattaches). +- Slow first command in each shell: the launcher checks for updates; set + `BUT_AUTO_UPDATE=0` to skip. + +Full documentation: https://docs.browser-use.com/open-source/browser-use-terminal +Source & issues: https://github.com/browser-use/terminal diff --git a/README.md b/README.md index d3c6cea9..2407a235 100644 --- a/README.md +++ b/README.md @@ -87,6 +87,26 @@ browser config show browser diagnostics ``` +## Use It From Claude Code, Codex, or OpenCode + +Browser Use Terminal plugs into any coding assistant that can run shell commands, browser-harness style: a skill teaches the assistant the CLI, and the CLI hands it the whole browser runtime. + +```bash +browser-use-terminal skill install # registers the skill for detected assistants +``` + +Then ask your assistant to browse: + +```bash +browser-use-terminal browser exec <<'PY' +new_tab("https://example.com") +wait_for_load() +print(capture_screenshot()) +PY +``` + +Screenshots are saved as files and the path is printed, so assistants view them with their native file-reading tools (Claude Code `Read`, Codex `view_image`, OpenCode `read`). The browser persists between calls. See `docs/assistant-plugins.md`. 
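A minimal sketch of how a wrapper script might pick those printed paths out of the CLI's output. It assumes each path follows the literal prefix `Screenshot saved to ` on its own line, which is inferred from the wording in these docs rather than from a documented output contract. `extract_screenshot_paths` is a hypothetical helper name, not something the CLI ships.

```python
# Sketch only: assumes each screenshot is announced on its own line as
# "Screenshot saved to " followed by the path. This format is an assumption.
PREFIX = "Screenshot saved to "


def extract_screenshot_paths(cli_output: str) -> list[str]:
    """Return every path announced after the assumed prefix, in order."""
    paths = []
    for line in cli_output.splitlines():
        line = line.strip()
        if line.startswith(PREFIX):
            path = line[len(PREFIX):].strip()
            if path:  # ignore a bare prefix with nothing after it
                paths.append(path)
    return paths


# Illustrative input; the path below is made up for the example.
sample = "Example Domain\nScreenshot saved to /tmp/shots/0001.png\n"
print(extract_screenshot_paths(sample))
```

An assistant would then hand each returned path to its own file-reading tool.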
+ ## Development ```bash @@ -118,6 +138,7 @@ You can disable (100% completely anonymous) telemetry with `BUT_TELEMETRY=0`. - `docs/terminal-ui-product-ux.md` - `docs/terminal-ui-testing.md` - `docs/terminal-renderer-architecture.md` +- `docs/assistant-plugins.md` ## License diff --git a/SKILL.md b/SKILL.md new file mode 100644 index 00000000..1405657b --- /dev/null +++ b/SKILL.md @@ -0,0 +1,108 @@ +--- +name: browser-use-terminal +description: Direct browser control via the Browser Use Terminal CLI. Use when the user wants to automate, scrape, test, or interact with web pages — you drive the browser yourself with Python helpers. +--- + +# Browser Use Terminal + +Direct browser control via CDP — you are the agent; you drive the browser. For setup, install, or connection problems, read https://browser-use.com/skill (agent setup instructions) or https://docs.browser-use.com/open-source/browser-use-terminal (full docs). + +`browser-use-terminal browser exec` runs Python with browser helpers pre-imported; `browser-use-terminal browser ` is the control plane (status, connect, profiles, recovery). + +## Usage + +```bash +browser-use-terminal browser exec <<'PY' +new_tab("https://docs.browser-use.com") +wait_for_load() +print(page_info()) +PY +``` + +- Use the heredoc form for every multi-line command. It prevents shell quote mangling inside Python strings and JavaScript snippets. +- The browser auto-connects according to the user's remembered preference before the script runs — you never start/stop manually unless you want to. The first call may take a few seconds. +- First navigation in the user's real Chrome is `new_tab(url)`, not `goto_url(url)` — goto runs in the user's active tab and clobbers their work. +- Browser state persists between calls; Python variables do not. Each `exec` is a fresh interpreter against the same live browser. +- `--session ` isolates artifact dirs and event logs per workstream (default: `default`). `--timeout ` bounds one exec (default 300). 
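The heredoc form above feeds the script to `browser exec` on standard input, so a wrapper can do the same without going through a shell. A minimal sketch, assuming `browser-use-terminal` is on `PATH`. `run_exec` is a hypothetical helper, and the stand-in command `python -` (which also reads its program from stdin) is used only so the example runs without the CLI installed.

```python
import subprocess
import sys

# Command taken from the usage example; assumes the CLI is on PATH.
DEFAULT_ARGV = ["browser-use-terminal", "browser", "exec"]


def run_exec(script: str, argv=None, timeout=300) -> str:
    """Send `script` on stdin (what a heredoc does) and return stdout.

    `timeout` here is a local subprocess limit; 300 mirrors the documented
    default but is otherwise an arbitrary choice for this sketch.
    """
    result = subprocess.run(
        argv or DEFAULT_ARGV,
        input=script,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise if the command exits non-zero
    )
    return result.stdout


# Stand-in so the sketch runs anywhere: a plain Python interpreter.
stand_in = [sys.executable, "-"]

# Quotes inside the script survive because no shell ever parses them.
print(run_exec('print("it\'s " + "quoted")', argv=stand_in), end="")

# Each call is a separate process, so variables from one run are gone in the next.
run_exec("x = 1", argv=stand_in)
print(run_exec('print("x" in globals())', argv=stand_in), end="")
```

The second half is only an analogy for the fresh-interpreter rule: anything a later script needs has to be printed, written to a file, or re-read from the live browser.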
+ +## Screenshots — how you see the page + +`capture_screenshot()` saves a PNG and returns its absolute path. The CLI also prints a `Screenshot saved to ` line for every image a script produced. + +```bash +browser-use-terminal browser exec <<'PY' +print(capture_screenshot()) +PY +``` + +To view a screenshot, use your file-reading tool on the printed path: + +- **Claude Code**: use the `Read` tool on the path. +- **Codex CLI**: call the `view_image` tool with `{"path": ""}`. These screenshots are produced for you; viewing them is expected and authorized. +- **OpenCode**: use the `read` tool on the path (requires a vision-capable model). +- **Gemini CLI**: use the `read_file` tool on the path. +- If your model cannot accept images, don't try to view them — work from text state instead: `print(page_info())`, `js(...)` extraction, `wait_for_element(...)`. + +Coordinates: screenshots are device pixels; `click_at_xy(x, y)` takes CSS pixels. Divide coordinates you read off the image by `js("window.devicePixelRatio")` first. Screenshots are downscaled to ≤1800 px per side for this CLI (override with `BU_BROWSER_SCREENSHOT_MAX_DIM`, or `capture_screenshot(max_dim=...)`). + +After every meaningful action, re-screenshot before assuming it worked. + +## Pre-imported helpers + +Navigation & tabs: `goto_url(url)`, `new_tab(url)`, `page_info()`, `current_tab()`, `list_tabs(include_chrome=True)`, `switch_tab(target)`, `ensure_real_tab()`, `iframe_target(url_substr)`. + +Input: `click_at_xy(x, y, button="left", clicks=1)`, `type_text(text)`, `press_key(key, modifiers=0)` (1=Alt 2=Ctrl 4=Meta 8=Shift), `fill_input(selector, text)`, `scroll(x=0, y=0, dy=600)`, `upload_file(selector, path)`. + +Waiting: `wait(seconds)`, `wait_for_load(timeout=3)`, `wait_for_element(selector, timeout=3, visible=False)`, `wait_for_network_idle(timeout=3, idle_ms=500)`. 
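Two pieces of arithmetic from the sections above lend themselves to small helpers: converting a point read off a screenshot (device pixels) into the CSS pixels `click_at_xy` expects, and combining `press_key` modifier bits. A minimal sketch; `to_css_pixels` and `modifier_mask` are hypothetical names, not helpers shipped by the CLI, and the conversion deliberately ignores any screenshot downscaling.

```python
# Hypothetical helpers, not part of the CLI's pre-imported set.
# Bit values are the ones listed for press_key: 1=Alt 2=Ctrl 4=Meta 8=Shift.
MODIFIER_BITS = {"Alt": 1, "Ctrl": 2, "Meta": 4, "Shift": 8}


def to_css_pixels(x, y, device_pixel_ratio):
    """Divide image coordinates by devicePixelRatio to get CSS pixels."""
    if device_pixel_ratio <= 0:
        raise ValueError("device_pixel_ratio must be positive")
    return x / device_pixel_ratio, y / device_pixel_ratio


def modifier_mask(*names):
    """OR together the bits for the named modifiers."""
    mask = 0
    for name in names:
        mask |= MODIFIER_BITS[name]  # KeyError on an unknown modifier name
    return mask


# A point at (800, 450) in a screenshot taken at devicePixelRatio 2.
print(to_css_pixels(800, 450, 2))
# Ctrl+Shift as a single modifiers value.
print(modifier_mask("Ctrl", "Shift"))
```

Inside an `exec` script the ratio would presumably come from `js("window.devicePixelRatio")`, with the converted pair passed to `click_at_xy` and the mask passed as `modifiers=` to `press_key`.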
+ +Visual: `capture_screenshot(label="...", full=False, max_dim=None)`, `screenshot()`, `screenshot_clip(label, x, y, w, h)`, `note(caption)`. + +Escape hatches: `js(expression)` (auto-wraps top-level `return`), `cdp("Domain.method", **params)` (raw CDP), `cdp_batch(calls)`, `drain_events()`. + +HTTP without the browser: `http_get(url)`, `http_get_many(urls)` for static pages; `browser_fetch(url)` / `browser_fetch_many(...)` to fetch with the page's cookies/session. + +Credentials (if the user stored any): `available_secrets()`, then `type_text("name")` or `fill_input(sel, secret("name"))`; `totp("name")` for 2FA codes. Values are placeholder-substituted — you never see them. `is_logged_out()`, `email_inbox()` / `email_message(id)` for email-code flows. + +Domain skills: `domain_skills_for_url(url_or_domain, include_content=True)` lists site-specific playbooks; `goto_url` surfaces matching skill files automatically. Read them before inventing selectors or flows on a complex site. + +## Browser control plane + +```bash +browser-use-terminal browser status --json +browser-use-terminal browser connect # uses the remembered preference +browser-use-terminal browser connect local # user's already-running Chrome (CDP) +browser-use-terminal browser connect managed --headless # disposable CLI-owned browser +browser-use-terminal browser preference use local|cloud|managed-headless +browser-use-terminal browser remote start # Browser Use cloud browser (needs BROWSER_USE_API_KEY) +browser-use-terminal browser doctor +browser-use-terminal browser recover reconnect-websocket +browser-use-terminal browser recover stop-owned-browser # stop the persistent managed browser +browser-use-terminal browser recover stop-owned-remote # stop the cloud browser (stops billing) +browser-use-terminal browser daemon status|stop|logs # the background daemon holding the connection +``` + +A background daemon (auto-started, one per state dir) holds the CDP connection across your commands, so the 
browser — and in local mode, Chrome's granted debugging permission — persists between invocations. Managed and cloud browsers also survive daemon restarts; later calls reattach instead of relaunching. Stop browsers with the recover commands above when the user is done (cloud browsers bill until stopped or timed out). + +- `exec` auto-connects, so you rarely need these. Reach for them when `status` shows a problem or the user asks for a specific browser. +- If output JSON says `status: "needs-user-action"` (e.g. pick a Chrome profile, click Allow in Chrome's permission popup, enable the remote-debugging checkbox), show the `user_prompt` to the user verbatim and wait — do not guess. +- Auth wall mid-task: stop and ask the user. Don't type credentials from screenshots; use stored secrets if available. +- Connecting to the user's real Chrome requires a one-time setup: `chrome://inspect/#remote-debugging` → tick "Allow remote debugging". `browser local setup` walks the user through it. + +## What actually works + +- Screenshots first: `capture_screenshot()` → view the image → decide whether you need a click, a selector, or more navigation. +- Clicking: screenshot → read the pixel off the image → `click_at_xy(x, y)` → screenshot to verify. Suppress the locate-then-click reflex — no getBoundingClientRect, no selector hunts. Hit-testing happens in Chrome's browser process, so coordinate clicks pass through iframes / shadow DOM / cross-origin without extra work. +- Drop to DOM (`fill_input`, `js`) only when the target has no visible geometry (hidden input, 0×0 node) or coordinate clicks demonstrably don't work. +- Bulk static pages: `http_get_many(urls)` — no browser needed. Logged-in pages: `browser_fetch(url)` rides the real session. +- After goto: `wait_for_load()`. SPAs report `complete` before they render — follow with `wait_for_element(...)`. +- Wrong/stale tab: `ensure_real_tab()`. +- Verification: `print(page_info())` is the cheapest "is this alive?" 
check; screenshots are the default way to verify visible actions. + +## Gotchas (field-tested) + +- CDP target order ≠ Chrome's visible tab-strip order. +- Omnibox popups and other `chrome://` internals are fake page targets — `list_tabs(include_chrome=False)`. +- `page_info()` surfaces an open JS dialog as `{"dialog": ...}` — handle it (`cdp("Page.handleJavaScriptDialog", accept=True)`) before anything else. +- Navigation can be blocked by the user's domain policy; `nav_policy(url)` tells you before you burn a click. A blocked navigation is policy, not a bug — tell the user. +- Scripts time out (default 300s): keep each `exec` small and observable rather than one mega-script. Long extraction loops: print progress as you go — stdout is captured even on timeout. +- Prefer compositor-level actions over framework hacks. If you do need framework-specific DOM tricks, run `browser-use-terminal browser domain skills --domain --json --include-content` first — that's where site playbooks live. diff --git a/crates/browser-use-agent/src/tools/handlers/browser.rs b/crates/browser-use-agent/src/tools/handlers/browser.rs index e07dfc9a..0c98f72e 100644 --- a/crates/browser-use-agent/src/tools/handlers/browser.rs +++ b/crates/browser-use-agent/src/tools/handlers/browser.rs @@ -614,6 +614,19 @@ pub(crate) fn browser_command_is_passive(words: &[&str]) -> bool { words, ["browser", "status", ..] | ["status", ..] + // Read-only / informational commands must never trigger an + // auto-connect: `help`/`doctor`/`domain` are the first things an + // external assistant runs to orient itself. + | ["browser", "help", ..] + | ["help", ..] + | ["browser", "--help", ..] + | ["--help", ..] + | ["browser", "-h", ..] + | ["-h", ..] + | ["browser", "doctor", ..] + | ["doctor", ..] + | ["browser", "domain", ..] + | ["domain", ..] | ["browser", "connect", ..] | ["connect", ..] | ["browser", "local", "list", ..] 
@@ -2917,105 +2930,17 @@ impl ToolRuntime for BrowserTool { BrowserAction::Command { command } => { let selected_browser_mode = selected_browser_mode.as_deref(); let out = if let Some(persistence) = &persistence { - let store = persistence.store.lock().map_err(|_| { - ToolError::Other(anyhow::anyhow!("store mutex poisoned")) - })?; - if let Some(content) = dispatch_browser_preference_command_for_mode( - &store, + run_browser_command_with_shared_store( + &persistence.store, backend.as_ref(), &session_id, &cwd, &artifact_dir, &command, selected_browser_mode, - ) - .map_err(|error| ToolError::Rejected(format!("{error:#}")))? - { - BrowserCommandOutput { - content, - events: Vec::new(), - } - } else { - let resolved = resolve_browser_command_for_selected_mode( - Some(&store), - &command, - selected_browser_mode, - selected_browser_profile_id.as_deref(), - ) - .map_err(|error| ToolError::Rejected(format!("{error:#}")))?; - let preferred_browser = selected_local_browser.clone().or_else(|| { - store - .get_setting(BROWSER_PREF_BROWSER) - .ok() - .flatten() - .filter(|browser| !browser.trim().is_empty()) - }); - let effective_mode = - effective_browser_mode(Some(&store), selected_browser_mode) - .map_err(|error| ToolError::Rejected(format!("{error:#}")))?; - let store_profile_id = if matches!(effective_mode, "local" | "cloud") { - stored_profile_for_mode(&store, effective_mode) - .map_err(|error| ToolError::Rejected(format!("{error:#}")))? 
- } else { - None - }; - let default_profile_id = - selected_browser_profile_id.clone().or(store_profile_id); - let default_profile_id = if matches!(effective_mode, "local" | "cloud") - { - default_profile_id - } else { - None - }; - let has_default_profile = default_profile_id.is_some(); - drop(store); - if let Some(preflight) = local_connect_default_profile_preflight( - has_default_profile, - preferred_browser.as_deref(), - backend.as_ref(), - &session_id, - &cwd, - &artifact_dir, - &resolved, - ) - .map_err(|error| ToolError::Rejected(format!("{error:#}")))? - { - preflight - } else { - open_default_profile_before_local_connect( - backend.as_ref(), - &session_id, - &cwd, - &artifact_dir, - &resolved, - default_profile_id.as_deref(), - ) - .map_err(ToolError::Other)?; - let output = backend - .command(&session_id, &cwd, &artifact_dir, &resolved) - .map_err(ToolError::Other)?; - let output = enrich_local_profiles_with_default_profile( - output, - &resolved, - default_profile_id.as_deref(), - ); - let output = enforce_local_connect_default_profile_context( - output, - &resolved, - default_profile_id.as_deref(), - ); - let output = enrich_local_connect_recovery_with_default_profile( - output, - &resolved, - default_profile_id.as_deref(), - ); - enrich_status_with_selected_browser_mode( - output, - &resolved, - Some(effective_mode), - ) - } - } + selected_browser_profile_id.as_deref(), + selected_local_browser.as_deref(), + )? } else { let resolved = resolve_browser_command_for_selected_mode( None, @@ -3048,62 +2973,21 @@ impl ToolRuntime for BrowserTool { } BrowserAction::Execute { script, .. 
} => { if let Some(persistence) = &persistence { - let store = persistence.store.lock().map_err(|_| { - ToolError::Other(anyhow::anyhow!("store mutex poisoned")) - })?; - let mode = - effective_browser_mode(Some(&store), selected_browser_mode.as_deref()) - .map_err(|error| ToolError::Rejected(format!("{error:#}")))?; - let default_profile_id = if mode == "local" { - selected_browser_profile_id.clone().or_else(|| { - stored_profile_for_mode(&store, "local") - .ok() - .flatten() - .filter(|profile| !profile.trim().is_empty()) - }) - } else { - None - }; - let preferred_browser = if mode == "local" { - selected_local_browser.clone().or_else(|| { - store - .get_setting(BROWSER_PREF_BROWSER) - .ok() - .flatten() - .filter(|browser| !browser.trim().is_empty()) - }) - } else { - None - }; - drop(store); - if mode == "local" && default_profile_id.is_none() { - if let Some(preflight) = local_connect_default_profile_preflight( - false, - preferred_browser.as_deref(), - backend.as_ref(), - &session_id, - &cwd, - &artifact_dir, - "browser connect local", - ) - .map_err(|error| ToolError::Rejected(format!("{error:#}")))? - { - return Ok(map_command_output(preflight)); - } - } - ensure_browser_ready_for_work( + if let Some(preflight) = prepare_browser_for_script_with_shared_store( + &persistence.store, backend.as_ref(), &session_id, &cwd, &artifact_dir, - mode, - default_profile_id.as_deref(), - ) - .map_err(ToolError::Other)?; + selected_browser_mode.as_deref(), + selected_browser_profile_id.as_deref(), + selected_local_browser.as_deref(), + )? { + return Ok(map_command_output(preflight)); + } } // Re-resolve the secrets + nav policy on every run (fail closed) - // so secret/domain changes take effect mid-session. Cheap now - // that values live in an encrypted file, not the OS keychain. 
+ // so secret/domain changes take effect mid-session if let Some(persistence) = &persistence { let store = persistence.store.lock().map_err(|_| { ToolError::Other(anyhow::anyhow!("store mutex poisoned")) @@ -3174,6 +3058,337 @@ impl ToolRuntime for BrowserTool { } } +/// Run a `browser ` control-plane command with full store-backed +/// behavior: preference dispatch, mode/profile resolution, local-profile +/// preflight, and the default-profile enrichers. +/// +/// Shared by the in-session [`BrowserTool`] command path and the external +/// assistant CLI surface ([`run_external_browser_command`]) so both see +/// identical heuristics. +#[allow(clippy::too_many_arguments)] +fn run_browser_command_with_shared_store( + shared_store: &SharedStore, + backend: &dyn BrowserBackend, + session_id: &str, + cwd: &std::path::Path, + artifact_dir: &std::path::Path, + command: &str, + selected_browser_mode: Option<&str>, + selected_browser_profile_id: Option<&str>, + selected_local_browser: Option<&str>, +) -> Result { + let store = shared_store + .lock() + .map_err(|_| ToolError::Other(anyhow::anyhow!("store mutex poisoned")))?; + if let Some(content) = dispatch_browser_preference_command_for_mode( + &store, + backend, + session_id, + cwd, + artifact_dir, + command, + selected_browser_mode, + ) + .map_err(|error| ToolError::Rejected(format!("{error:#}")))? 
+ { + return Ok(BrowserCommandOutput { + content, + events: Vec::new(), + }); + } + let resolved = resolve_browser_command_for_selected_mode( + Some(&store), + command, + selected_browser_mode, + selected_browser_profile_id, + ) + .map_err(|error| ToolError::Rejected(format!("{error:#}")))?; + let preferred_browser = selected_local_browser.map(str::to_string).or_else(|| { + store + .get_setting(BROWSER_PREF_BROWSER) + .ok() + .flatten() + .filter(|browser| !browser.trim().is_empty()) + }); + let effective_mode = effective_browser_mode(Some(&store), selected_browser_mode) + .map_err(|error| ToolError::Rejected(format!("{error:#}")))?; + let store_profile_id = if matches!(effective_mode, "local" | "cloud") { + stored_profile_for_mode(&store, effective_mode) + .map_err(|error| ToolError::Rejected(format!("{error:#}")))? + } else { + None + }; + let default_profile_id = selected_browser_profile_id + .map(str::to_string) + .or(store_profile_id); + let default_profile_id = if matches!(effective_mode, "local" | "cloud") { + default_profile_id + } else { + None + }; + let has_default_profile = default_profile_id.is_some(); + drop(store); + if let Some(preflight) = local_connect_default_profile_preflight( + has_default_profile, + preferred_browser.as_deref(), + backend, + session_id, + cwd, + artifact_dir, + &resolved, + ) + .map_err(|error| ToolError::Rejected(format!("{error:#}")))? 
+ { + return Ok(preflight); + } + open_default_profile_before_local_connect( + backend, + session_id, + cwd, + artifact_dir, + &resolved, + default_profile_id.as_deref(), + ) + .map_err(ToolError::Other)?; + let output = backend + .command(session_id, cwd, artifact_dir, &resolved) + .map_err(ToolError::Other)?; + let output = enrich_local_profiles_with_default_profile( + output, + &resolved, + default_profile_id.as_deref(), + ); + let output = enforce_local_connect_default_profile_context( + output, + &resolved, + default_profile_id.as_deref(), + ); + let output = enrich_local_connect_recovery_with_default_profile( + output, + &resolved, + default_profile_id.as_deref(), + ); + Ok(enrich_status_with_selected_browser_mode( + output, + &resolved, + Some(effective_mode), + )) +} + +/// Make the browser ready for a `browser_script` run: resolve the effective +/// mode from the store, run the local default-profile preflight, and +/// auto-connect/auto-start the configured browser when needed. +/// +/// Returns `Some(preflight)` when browser work is blocked on a user decision +/// (e.g. no default local Chrome profile chosen yet); the caller must surface +/// that output instead of running the script. +/// +/// Shared by the in-session [`BrowserTool`] execute path and the external +/// assistant CLI surface ([`run_external_browser_script`]). 
+#[allow(clippy::too_many_arguments)] +fn prepare_browser_for_script_with_shared_store( + shared_store: &SharedStore, + backend: &dyn BrowserBackend, + session_id: &str, + cwd: &std::path::Path, + artifact_dir: &std::path::Path, + selected_browser_mode: Option<&str>, + selected_browser_profile_id: Option<&str>, + selected_local_browser: Option<&str>, +) -> Result, ToolError> { + let store = shared_store + .lock() + .map_err(|_| ToolError::Other(anyhow::anyhow!("store mutex poisoned")))?; + let mode = effective_browser_mode(Some(&store), selected_browser_mode) + .map_err(|error| ToolError::Rejected(format!("{error:#}")))?; + let default_profile_id = if mode == "local" { + selected_browser_profile_id.map(str::to_string).or_else(|| { + stored_profile_for_mode(&store, "local") + .ok() + .flatten() + .filter(|profile| !profile.trim().is_empty()) + }) + } else { + None + }; + let preferred_browser = if mode == "local" { + selected_local_browser.map(str::to_string).or_else(|| { + store + .get_setting(BROWSER_PREF_BROWSER) + .ok() + .flatten() + .filter(|browser| !browser.trim().is_empty()) + }) + } else { + None + }; + drop(store); + if mode == "local" && default_profile_id.is_none() { + if let Some(preflight) = local_connect_default_profile_preflight( + false, + preferred_browser.as_deref(), + backend, + session_id, + cwd, + artifact_dir, + "browser connect local", + ) + .map_err(|error| ToolError::Rejected(format!("{error:#}")))? + { + return Ok(Some(preflight)); + } + } + ensure_browser_ready_for_work( + backend, + session_id, + cwd, + artifact_dir, + mode, + default_profile_id.as_deref(), + ) + .map_err(ToolError::Other)?; + Ok(None) +} + +// ============================================================================ +// External assistant CLI surface +// ============================================================================ +// +// `browser-use-terminal browser ...` lets external coding assistants +// (Claude Code, Codex, OpenCode, ...) 
drive the browser through one-shot CLI +// invocations. These blocking entry points reuse the +// exact preference resolution, connect heuristics, security policy, and event +// persistence of the in-session `browser` / `browser_script` tools so an +// external assistant and the built-in agent see identical behavior. + +/// Outcome of an external `browser exec` script request. +#[derive(Debug)] +pub enum ExternalBrowserScriptOutcome { + /// The script ran. `output.ok` may still be false on script errors. + Ran(BrowserScriptOutput), + /// Browser work is blocked on a user decision before any script can run + /// (e.g. no default local Chrome profile is chosen yet). The payload is the + /// preflight JSON with `user_prompt` / `next_step` guidance. + Blocked(Value), +} + +fn external_tool_error(error: ToolError) -> anyhow::Error { + match error { + ToolError::Rejected(message) => anyhow!(message), + ToolError::Other(error) => error, + ToolError::Sandboxed(denial) => anyhow!( + "browser call denied by sandbox: {}", + denial.output.stderr.trim() + ), + } +} + +/// Configure a backend with the store-preferred browser mode so its +/// auto-connect behavior (the harness `ensure_daemon()` analog) targets the +/// right browser. +fn set_backend_browser_mode_from_store( + shared_store: &SharedStore, + backend: &dyn BrowserBackend, +) -> anyhow::Result<()> { + let store = shared_store + .lock() + .map_err(|_| anyhow!("store mutex poisoned"))?; + let mode = preferred_browser_mode(Some(&store))?; + drop(store); + backend.set_browser_mode(Some(mode.to_string())); + Ok(()) +} + +/// Run a `browser ` control-plane command for an external assistant CLI. +/// +/// Blocking. Resolves the preferred mode from the store (so plain +/// `browser connect` honors `browser preference use ...`), runs the same +/// preflights/enrichers as the in-session tool, and records browser events to +/// the session's durable event log. 
+pub fn run_external_browser_command( + shared_store: &SharedStore, + session_id: &str, + cwd: &std::path::Path, + artifact_dir: &std::path::Path, + command: &str, +) -> anyhow::Result { + let backend = RealBackend::default(); + set_backend_browser_mode_from_store(shared_store, &backend)?; + let out = run_browser_command_with_shared_store( + shared_store, + &backend, + session_id, + cwd, + artifact_dir, + command, + None, + None, + None, + ) + .map_err(external_tool_error)?; + if let Ok(store) = shared_store.lock() { + let _ = record_browser_command_response_events( + &store, + session_id, + "browser", + &format!("browser-cli-{session_id}"), + &out, + ); + } + Ok(out) +} + +/// Run a `browser_script` Python snippet for an external assistant CLI. +/// +/// Blocking: auto-connects the preferred browser when needed, installs the +/// secrets/navigation policy, runs the script to completion, and records the +/// response (text, images, artifacts, browser events) to the session's durable +/// event log. Screenshots taken by the script land in `artifact_dir` and their +/// paths are reported in the returned output's `images`. +pub fn run_external_browser_script( + shared_store: &SharedStore, + session_id: &str, + cwd: &std::path::Path, + artifact_dir: &std::path::Path, + code: &str, + timeout_secs: u64, +) -> anyhow::Result { + let backend = RealBackend::default(); + set_backend_browser_mode_from_store(shared_store, &backend)?; + if let Some(preflight) = prepare_browser_for_script_with_shared_store( + shared_store, + &backend, + session_id, + cwd, + artifact_dir, + None, + None, + None, + ) + .map_err(external_tool_error)? 
+ { + return Ok(ExternalBrowserScriptOutcome::Blocked(preflight.content)); + } + { + let store = shared_store + .lock() + .map_err(|_| anyhow!("store mutex poisoned"))?; + super::secrets_admin::install_script_security(&store, session_id) + .map_err(|error| anyhow!("failed to apply browser security policy: {error:#}"))?; + } + let out = backend.run_script(session_id, cwd, artifact_dir, code, timeout_secs)?; + if let Ok(store) = shared_store.lock() { + let _ = record_browser_script_response_events_for_tool( + &store, + session_id, + "browser_script", + &format!("browser-cli-{session_id}"), + &out, + ); + } + Ok(ExternalBrowserScriptOutcome::Ran(out)) +} + #[cfg(test)] mod browser_mode_tests { use super::*; diff --git a/crates/browser-use-browser/src/lib.rs b/crates/browser-use-browser/src/lib.rs index 24c6fb13..0e4f30e5 100644 --- a/crates/browser-use-browser/src/lib.rs +++ b/crates/browser-use-browser/src/lib.rs @@ -165,11 +165,18 @@ struct ManagedBrowser { _profile_dir: Option, launch: ManagedLaunch, marker_path: PathBuf, + /// Keep the browser process (and its marker) alive when this handle drops + /// so one-shot CLI invocations can reattach to it later. See + /// [`external_browser_persistence_enabled`]. + persist: bool, } impl Drop for ManagedBrowser { fn drop(&mut self) { unregister_managed_browser_pid(self.child.id()); + if self.persist { + return; + } let _ = self.child.kill(); let _ = self.child.wait(); let _ = fs::remove_file(&self.marker_path); @@ -229,6 +236,10 @@ struct BrowserSession { active_local_profile_id: Option, preferred_browser_context_id: Option, artifact_dir: Option, + /// Marker file of a persistent managed browser this session reattached to + /// (or launched) without owning the child process. Used so explicit stops + /// can terminate the browser even across one-shot CLI processes. 
+ persistent_managed_marker: Option, logs: VecDeque, } @@ -264,6 +275,7 @@ impl Default for BrowserSession { active_local_profile_id: None, preferred_browser_context_id: None, artifact_dir: None, + persistent_managed_marker: None, logs: VecDeque::new(), } } @@ -3200,6 +3212,7 @@ fn dispatch_recover(session: &mut BrowserSession, argv: &[String]) -> Result session.reattach_same_target(), Some("restart-runtime") => session.restart_runtime(), Some("restart-owned-browser") => session.restart_owned_browser(), + Some("stop-owned-browser") => session.stop_owned_browser(), Some("stop-owned-remote") => session.stop_owned_remote(), Some(other) => bail!("unknown browser recover command: {other}"), None => bail!("browser recover requires a recovery action"), @@ -3635,6 +3648,26 @@ impl BrowserSession { profile: ManagedProfile, extra_args: Vec, ) -> Result { + let persist = external_browser_persistence_enabled(); + let profile = if persist { + match profile { + // Persistent managed browsers need a stable profile dir so the + // next one-shot invocation can find the marker and reattach. 
+ ManagedProfile::Temp => ManagedProfile::Path(persistent_managed_profile_path( + self.session_id.as_deref(), + )), + other => other, + } + } else { + profile + }; + if persist { + if let ManagedProfile::Path(path) = &profile { + if let Some(connected) = self.reattach_persistent_managed(path) { + return Ok(connected); + } + } + } self.stop_owned_managed(); let mut launch_errors = Vec::new(); let mut launched = None; @@ -3645,7 +3678,7 @@ impl BrowserSession { headless, extra_args: extra_args.clone(), }; - match launch_managed_browser(launch.clone(), self.session_id.clone()) { + match launch_managed_browser(launch.clone(), self.session_id.clone(), persist) { Ok((managed, http_url)) => { launched = Some((launch, managed, http_url)); break; @@ -3667,6 +3700,7 @@ impl BrowserSession { ); }; let ws_url = resolve_ws_from_http(&http_url)?; + let marker_path = managed.marker_path.clone(); self.managed = Some(managed); if let Err(error) = self.connect_endpoint( Endpoint { @@ -3681,6 +3715,7 @@ impl BrowserSession { self.stop_owned_managed(); return Err(error); } + self.persistent_managed_marker = persist.then_some(marker_path); self.browser_name = Some("Managed Chromium".to_string()); self.profile = Some(match &launch.profile { ManagedProfile::Temp => "temp".to_string(), @@ -3694,7 +3729,85 @@ impl BrowserSession { })) } + /// Reattach to a still-running Browser Use cloud browser recorded by a + /// previous one-shot invocation, instead of creating (and billing) a new + /// one. Returns `None` when the record is missing or the browser is gone. 
+ fn reattach_persistent_cloud(&mut self) -> Option<Value> { + let record_path = persistent_cloud_record_path(self.session_id.as_deref()); + let raw = fs::read_to_string(&record_path).ok()?; + let Ok(record) = serde_json::from_str::<Value>(&raw) else { + let _ = fs::remove_file(&record_path); + return None; + }; + let (Some(id), Some(cdp_url)) = ( + record.get("id").and_then(Value::as_str), + record.get("cdpUrl").and_then(Value::as_str), + ) else { + let _ = fs::remove_file(&record_path); + return None; + }; + let Ok(ws_url) = resolve_ws_from_http(cdp_url) else { + // The cloud browser stopped (timeout or explicit stop); retire the + // record so the caller starts a fresh one. + let _ = fs::remove_file(&record_path); + return None; + }; + self.stop_owned_managed(); + self.connect_endpoint( + Endpoint { + kind: "browser-use-cloud".to_string(), + http_url: Some(cdp_url.to_string()), + ws_url, + candidate_id: None, + }, + BrowserMode::RemoteCloud, + BrowserOwner::Rust, + ) + .ok()?; + self.remote_browser_id = Some(id.to_string()); + self.live_url = record + .get("liveUrl") + .and_then(Value::as_str) + .map(ToOwned::to_owned); + self.browser_name = Some("Browser Use Cloud".to_string()); + self.profile = record + .get("profileId") + .and_then(Value::as_str) + .map(ToOwned::to_owned); + Some(json!({ + "status": "connected", + "reattached": true, + "browser": self.status_json(), + "live_url": self.live_url, + })) + } + + fn persist_cloud_record(&self, browser: &Value, requested_profile_id: Option<&str>) { + let record_path = persistent_cloud_record_path(self.session_id.as_deref()); + if let Some(parent) = record_path.parent() { + let _ = fs::create_dir_all(parent); + } + let record = json!({ + "id": self.remote_browser_id, + "cdpUrl": browser.get("cdpUrl"), + "liveUrl": browser.get("liveUrl"), + "profileId": requested_profile_id, + "started_at_ms": unix_time_ms() as u64, + }); + if let Ok(raw) = serde_json::to_vec_pretty(&record) { + let _ = fs::write(&record_path, raw); + } + } + fn
start_remote_cloud(&mut self, argv: &[String]) -> Result<Value> { + // Plain `remote start` (no explicit profile/timeout/proxy options) may + // reattach to the persistent cloud browser from a prior invocation; + // explicit options always provision a fresh browser. + if external_browser_persistence_enabled() && argv.len() <= 2 { + if let Some(connected) = self.reattach_persistent_cloud() { + return Ok(connected); + } + } let mut body = serde_json::Map::new(); let mut requested_profile_id = None; if let Some(profile_id) = option_value(argv, "--profile-id") { @@ -3757,6 +3870,9 @@ impl BrowserSession { .and_then(Value::as_str) .map(ToOwned::to_owned); self.browser_name = Some("Browser Use Cloud".to_string()); + if external_browser_persistence_enabled() { + self.persist_cloud_record(&browser, requested_profile_id.as_deref()); + } self.profile = requested_profile_id; Ok(json!({ "status": "connected", @@ -3777,6 +3893,7 @@ impl BrowserSession { return Ok(json!({ "stopped": false, "reason": "missing remote browser id" })); }; stop_cloud_browser(&id)?; + let _ = fs::remove_file(persistent_cloud_record_path(self.session_id.as_deref())); if let Some(sid) = self.session_id.clone() { stop_session_capture(&sid); } @@ -4049,11 +4166,86 @@ impl BrowserSession { Ok(json!({ "restarted": true, "browser": self.status_json() })) } + /// Reattach to a still-running persistent managed browser via its marker + /// file instead of launching a new one. Returns `None` when there is no + /// live browser to reattach to (caller falls through to a fresh launch).
+ fn reattach_persistent_managed(&mut self, profile_path: &Path) -> Option<Value> { + let marker_path = profile_path.join(MANAGED_BROWSER_MARKER_FILE); + let raw = fs::read_to_string(&marker_path).ok()?; + let marker = serde_json::from_str::<ManagedBrowserMarker>(&raw).ok()?; + if !process_command_matches_managed_marker(marker.pid, &marker.profile_path) { + return None; + } + let http_url = format!("http://127.0.0.1:{}", marker.port); + let ws_url = resolve_ws_from_http(&http_url).ok()?; + // Clear any current attachment state without killing the browser we + // are about to reattach to. + self.persistent_managed_marker = None; + self.detach_managed_state(); + self.connect_endpoint( + Endpoint { + kind: "cdp-url".to_string(), + http_url: Some(http_url), + ws_url, + candidate_id: None, + }, + BrowserMode::Managed, + BrowserOwner::Rust, + ) + .ok()?; + self.persistent_managed_marker = Some(marker_path); + self.browser_name = Some("Managed Chromium".to_string()); + self.profile = Some(profile_path.display().to_string()); + Some(json!({ + "status": "connected", + "reattached": true, + "browser": self.status_json(), + "next_step": "Continue immediately with the user's requested browser/search/page work in this connected managed browser.", + "model_instruction": "Browser connection is setup only. Do not answer the user's browser/search/page task from memory or stop after connecting; continue with page work now.", + })) + } + + /// Explicitly stop a Rust-owned managed browser, including persistent + /// managed browsers reattached from a previous one-shot invocation.
+ fn stop_owned_browser(&mut self) -> Result<Value> { + let owned = self.managed.is_some() + || self.persistent_managed_marker.is_some() + || (self.owner == BrowserOwner::Rust && self.mode == BrowserMode::Managed); + if !owned { + return Ok(json!({ + "stopped": false, + "reason": "current browser is not a Rust-owned managed browser", + })); + } + self.stop_owned_managed(); + Ok(json!({ "stopped": true })) + } + fn stop_owned_managed(&mut self) { if let Some(mut managed) = self.managed.take() { let _ = managed.child.kill(); let _ = managed.child.wait(); + // Explicit stops always retire the marker, even for persistent + // managed browsers whose Drop would otherwise keep it. + let _ = fs::remove_file(&managed.marker_path); + } else if let Some(marker_path) = self.persistent_managed_marker.take() { + if let Ok(raw) = fs::read_to_string(&marker_path) { + if let Ok(marker) = serde_json::from_str::<ManagedBrowserMarker>(&raw) { + if process_command_matches_managed_marker(marker.pid, &marker.profile_path) { + terminate_process(marker.pid); + } + } + } + let _ = fs::remove_file(&marker_path); + if let Some(profile_dir) = marker_path.parent() { + let _ = fs::remove_file(profile_dir.join("DevToolsActivePort")); + } } + self.persistent_managed_marker = None; + self.detach_managed_state(); + } + + fn detach_managed_state(&mut self) { if self.mode == BrowserMode::Managed { if let Some(sid) = self.session_id.clone() { stop_session_capture(&sid); @@ -6115,9 +6307,17 @@ fn process_command_matches_managed_marker(_pid: u32, _profile_path: &Path) -> bo #[cfg(unix)] fn terminate_process(pid: u32) { let pid = pid.to_string(); - let _ = Command::new("kill").args(["-TERM", &pid]).status(); + let _ = Command::new("kill") + .args(["-TERM", &pid]) + .stdout(Stdio::null()) + .stderr(Stdio::null()) + .status(); thread::sleep(Duration::from_millis(200)); - let _ = Command::new("kill").args(["-KILL", &pid]).status(); + let _ = Command::new("kill") + .args(["-KILL", &pid]) + .stdout(Stdio::null()) + .stderr(Stdio::null())
+ .status(); } #[cfg(not(unix))] @@ -6145,9 +6345,47 @@ fn write_managed_browser_marker( Ok(marker_path) } +/// Whether externally-driven one-shot CLI invocations should keep Rust-owned +/// browsers (managed Chromium, Browser Use cloud) alive across processes and +/// reattach to them, instead of stopping them when the process exits. +/// +/// Set by `browser-use-terminal browser ...` (the assistant plugin surface); +/// long-lived hosts (TUI/SDK) keep the default in-process ownership. +fn external_browser_persistence_enabled() -> bool { + env_bool("BROWSER_USE_TERMINAL_PERSIST_BROWSERS") == Some(true) +} + +/// Root directory for persistent external-browser state (managed profiles and +/// cloud browser records). Override with `BU_EXTERNAL_BROWSER_STATE_DIR` (the +/// external CLI points it inside the resolved state dir). +fn persistent_external_browser_state_root() -> PathBuf { + std::env::var_os("BU_EXTERNAL_BROWSER_STATE_DIR") + .filter(|value| !value.is_empty()) + .map(PathBuf::from) + .unwrap_or_else(|| { + home_dir() + .map(|home| home.join(".browser-use-terminal")) + .unwrap_or_else(|| PathBuf::from(".browser-use-terminal")) + .join("external-browser") + }) +} + +fn persistent_managed_profile_path(session_id: Option<&str>) -> PathBuf { + persistent_external_browser_state_root() + .join("managed") + .join(session_id.unwrap_or("default")) +} + +fn persistent_cloud_record_path(session_id: Option<&str>) -> PathBuf { + persistent_external_browser_state_root() + .join("cloud") + .join(format!("{}.json", session_id.unwrap_or("default"))) +} + fn launch_managed_browser( launch: ManagedLaunch, owner_session_id: Option, + persist: bool, ) -> Result<(ManagedBrowser, String)> { let (profile_path, temp_dir) = match &launch.profile { ManagedProfile::Temp => { @@ -6225,6 +6463,7 @@ fn launch_managed_browser( _profile_dir: temp_dir, launch, marker_path, + persist, }, http_url, )); @@ -7011,7 +7250,8 @@ fn local_profile_cookies(profile: &LocalBrowserProfile) -> Result> { 
headless: true, extra_args: vec!["--no-startup-window".to_string()], }; - let (mut managed, http_url) = launch_managed_browser(launch, None)?; + // Profile inspection browsers are always throwaway; never persist them. + let (mut managed, http_url) = launch_managed_browser(launch, None, false)?; let result = (|| -> Result> { let ws_url = resolve_ws_from_http(&http_url)?; let mut connection = CdpConnection::connect(&ws_url)?; @@ -13943,6 +14183,49 @@ print("large response ok", len(data["blob"])) .contains("DevToolsActivePort")); } + #[test] + fn stop_owned_browser_reports_not_owned_for_fresh_session() { + let mut session = BrowserSession::default(); + let result = session.stop_owned_browser().unwrap(); + assert_eq!(result["stopped"], false); + } + + #[test] + fn stop_owned_browser_terminates_persistent_marker_browser() { + let temp = tempfile::tempdir().unwrap(); + let marker_path = temp.path().join(MANAGED_BROWSER_MARKER_FILE); + let marker = ManagedBrowserMarker { + // A pid that does not match a managed chrome command line, so the + // stop path only retires the marker without killing anything. 
+ pid: 999_999, + port: 9, + executable: "chrome".to_string(), + profile_path: temp.path().to_path_buf(), + owner_session_id: Some("owner-session".to_string()), + started_at_ms: 1, + }; + fs::write(&marker_path, serde_json::to_vec(&marker).unwrap()).unwrap(); + fs::write(temp.path().join("DevToolsActivePort"), "9\nstale\n").unwrap(); + + let mut session = BrowserSession { + persistent_managed_marker: Some(marker_path.clone()), + ..BrowserSession::default() + }; + let result = session.stop_owned_browser().unwrap(); + assert_eq!(result["stopped"], true); + assert!(session.persistent_managed_marker.is_none()); + assert!(!marker_path.exists()); + assert!(!temp.path().join("DevToolsActivePort").exists()); + } + + #[test] + fn persistent_external_browser_paths_are_session_scoped() { + let managed = persistent_managed_profile_path(Some("browser-cli-work")); + assert!(managed.ends_with("managed/browser-cli-work")); + let cloud = persistent_cloud_record_path(None); + assert!(cloud.ends_with("cloud/default.json")); + } + #[test] fn active_managed_browser_marker_is_not_reaped() { let temp = tempfile::tempdir().unwrap(); diff --git a/crates/browser-use-cli/src/main.rs b/crates/browser-use-cli/src/main.rs index d5620df4..63add3ea 100644 --- a/crates/browser-use-cli/src/main.rs +++ b/crates/browser-use-cli/src/main.rs @@ -218,6 +218,14 @@ enum Command { Start { text: String, }, + /// Run a browser task to completion with the configured default + /// provider/model (resolved like the TUI). Streams progress and exits when + /// the task finishes — the delegation entry point for external assistants. 
+ Run { + text: String, + #[arg(long)] + model: Option, + }, RunFake { text: String, #[arg(long)] @@ -338,6 +346,41 @@ enum Command { task_id: String, code: String, }, + /// Browser management for external assistants + /// + /// `browser exec [CODE]` runs Python with browser helpers pre-imported + /// (reads stdin when CODE is omitted — heredoc-friendly) and prints the + /// script output plus `Screenshot saved to ` lines. Any other + /// arguments are forwarded to the browser control plane + /// (`status --json`, `connect local`, `doctor`, ...); run + /// `browser help` for the full command list. The browser auto-connects + /// using the remembered preference, and the browser itself persists across + /// invocations. + Browser { + /// Named workstream; isolates artifact dirs and event logs. + #[arg(long, default_value = "default")] + session: String, + /// Per-exec script timeout in seconds. + #[arg(long, default_value_t = 300)] + timeout: u64, + /// Print the full structured response as JSON. + #[arg(long)] + json: bool, + /// `exec [CODE]` or control-plane command words. + #[arg(trailing_var_arg = true, allow_hyphen_values = true)] + args: Vec, + }, + /// Internal: long-lived daemon backing the `browser` subcommand. Holds the + /// CDP connection so Chrome's per-connection permission prompt fires once, + /// not on every one-shot CLI invocation. Spawned automatically. + #[command(name = "browser-daemon", hide = true)] + BrowserDaemon, + /// Print or install the assistant-facing skill (SKILL.md) that teaches + /// coding assistants how to use the `browser` subcommand. + Skill { + #[command(subcommand)] + command: SkillCommand, + }, SyncCookies { #[arg(value_name = "LOCAL_PROFILE")] profile: Option, @@ -746,6 +789,32 @@ enum SessionsCommand { }, } +#[derive(Debug, Subcommand)] +enum SkillCommand { + /// Print the skill markdown to stdout. + Show, + /// Print the canonical install location for each assistant. + Paths, + /// Install the skill for coding assistants. 
With no argument, installs for + /// every assistant whose home directory exists. + Install { + #[arg(value_enum)] + assistant: Option<SkillAssistant>, + }, +} + +#[derive(Clone, Copy, Debug, PartialEq, Eq, ValueEnum)] +enum SkillAssistant { + /// Claude Code (~/.claude/skills). OpenCode also discovers this location. + Claude, + /// Codex CLI ($CODEX_HOME or ~/.codex, under skills/). + Codex, + /// OpenCode (~/.config/opencode/skills). + Opencode, + /// The cross-assistant agents dir (~/.agents/skills). + Agents, +} + #[derive(Clone, Copy, Debug, PartialEq, Eq, ValueEnum)] enum AuthAccount { Codex, @@ -864,6 +933,15 @@ fn main() -> Result<()> { }; match args.command { Command::Start { text } => start(&store, text), + Command::Run { text, model } => run_default( + &store, + text, + model, + config_profile.as_deref(), + &config_overrides, + collaboration_mode, + &runtime_options, + ), Command::RunFake { text, python_code } => run_fake(&store, text, python_code), Command::RunOpenai { text, model } => run_openai( &store, @@ -1001,6 +1079,14 @@ fn main() -> Result<()> { Command::Events { task_id } => events(&store, &task_id), Command::Python { task_id, code } => python(&store, &task_id, code), Command::BrowserScript { task_id, code } => browser_script(&store, &task_id, code), + Command::Browser { + session, + timeout, + json, + args: browser_args, + } => browser_cli(store, &session, timeout, json, browser_args), + Command::BrowserDaemon => browser_external_daemon(store), + Command::Skill { command } => skill(command), Command::SyncCookies { profile, local_profile, @@ -1267,6 +1353,7 @@ fn main() -> Result<()> { fn command_name(command: &Command) -> &'static str { match command { Command::Start { .. } => "start", + Command::Run { .. } => "run", Command::RunFake { .. } => "run_fake", Command::RunOpenai { .. } => "run_openai", Command::RunBrowserUse { .. } => "run_browser_use", @@ -1292,6 +1379,9 @@ fn command_name(command: &Command) -> &'static str { Command::Events { ..
} => "events", Command::Python { .. } => "python", Command::BrowserScript { .. } => "browser_script", + Command::Browser { .. } => "browser", + Command::BrowserDaemon => "browser_daemon", + Command::Skill { .. } => "skill", Command::SyncCookies { .. } => "sync_cookies", Command::UserShell { .. } => "user_shell", Command::Review { .. } => "review", @@ -2314,6 +2404,143 @@ fn dataset_browser_mode(options: &DatasetRunOptions) -> String { .replace(['_', ' '], "-") } +fn provider_backend_from_env_keys() -> Option<ProviderBackend> { + let has = |key: &str| std::env::var(key).is_ok_and(|value| !value.trim().is_empty()); + if has("ANTHROPIC_API_KEY") { + Some(ProviderBackend::Anthropic) + } else if has("OPENAI_API_KEY") { + Some(ProviderBackend::Openai) + } else if has("BROWSER_USE_API_KEY") { + Some(ProviderBackend::BrowserUse) + } else if has("OPENROUTER_API_KEY") { + Some(ProviderBackend::Openrouter) + } else if has("DEEPSEEK_API_KEY") { + Some(ProviderBackend::Deepseek) + } else if has("GEMINI_API_KEY") || has("GOOGLE_API_KEY") { + Some(ProviderBackend::Google) + } else { + None + } +} + +/// `browser-use-terminal run ""`: provider-agnostic task run. +/// +/// Resolves the provider from the user's config (what `/model` in the TUI +/// persists), falling back to whichever provider API key is set in the +/// environment, then dispatches to the matching `run-*` path. Unlike `start` +/// (which only queues a session), this drives the task to completion.
+fn run_default( + store: &Store, + text: String, + model: Option, + config_profile: Option<&str>, + raw_config_overrides: &[String], + collaboration_mode: CollaborationModeKind, + runtime_options: &CliRuntimeOptions, +) -> Result<()> { + let overrides = parse_cli_config_overrides(raw_config_overrides)?; + let cwd = std::env::current_dir()?; + let configured = + configured_model_provider_id_for_cwd_with_options(&cwd, config_profile, &overrides)?; + let backend = match configured.as_deref().map(str::trim) { + Some(provider_id) if !provider_id.is_empty() => { + ProviderBackend::from_provider_id(provider_id).unwrap_or(ProviderBackend::Openai) + } + _ => provider_backend_from_env_keys().ok_or_else(|| { + anyhow::anyhow!( + "no model provider is configured. Configure one in the TUI (`browser`, then \ + /auth and /model), set a provider API key (ANTHROPIC_API_KEY, OPENAI_API_KEY, \ + OPENROUTER_API_KEY, DEEPSEEK_API_KEY, GEMINI_API_KEY, or BROWSER_USE_API_KEY), or use an \ + explicit command such as `browser-use-terminal run-anthropic \"\"`." 
+ ) + })?, + }; + match backend { + ProviderBackend::Openai => run_openai( + store, + text, + model, + config_profile, + raw_config_overrides, + collaboration_mode, + runtime_options, + ), + ProviderBackend::Anthropic + | ProviderBackend::Openrouter + | ProviderBackend::Deepseek + | ProviderBackend::Google + | ProviderBackend::BrowserUse + | ProviderBackend::Codex => { + let (model, _source) = resolve_cli_model_with_source( + backend, + model, + config_profile, + raw_config_overrides, + )?; + match backend { + ProviderBackend::Anthropic => run_anthropic( + store, + text, + model, + config_profile, + raw_config_overrides, + collaboration_mode, + runtime_options, + ), + ProviderBackend::Openrouter => run_openrouter( + store, + text, + model, + config_profile, + raw_config_overrides, + collaboration_mode, + runtime_options, + ), + ProviderBackend::Deepseek => run_deepseek( + store, + text, + model, + config_profile, + raw_config_overrides, + collaboration_mode, + runtime_options, + ), + ProviderBackend::Google => run_google( + store, + text, + model, + config_profile, + raw_config_overrides, + collaboration_mode, + runtime_options, + ), + ProviderBackend::BrowserUse => run_browser_use( + store, + text, + model, + config_profile, + raw_config_overrides, + collaboration_mode, + runtime_options, + ), + ProviderBackend::Codex => run_codex( + store, + text, + model, + config_profile, + raw_config_overrides, + collaboration_mode, + runtime_options, + ), + _ => unreachable!(), + } + } + ProviderBackend::Fake | ProviderBackend::None => { + bail!("no runnable model provider configured") + } + } +} + fn run_openai( store: &Store, text: String, @@ -3175,7 +3402,21 @@ fn show(store: &Store, task_id: &str) -> Result<()> { let title = task_from_events(&events).unwrap_or_else(|| "untitled task".to_string()); let browser = browser_summary_from_events(&events, "local chrome"); println!("Task: {title}"); - println!("Status: {}", task.status.as_str()); + // A session created by `start` 
(or an SDK queue) has no agent attached yet; + // reporting it as plain "running" sends pollers into infinite loops. + let agent_started = events.iter().any(|event| { + !matches!( + event.event_type.as_str(), + "session.created" | "session.input" + ) + }); + if task.status.is_active() && !agent_started { + println!( + "Status: queued (no agent attached — drive it with `browser-use-terminal run \"\"` next time, or `run-anthropic-session {task_id}` to run this one)" + ); + } else { + println!("Status: {}", task.status.as_str()); + } if let Some(url) = browser.url { println!("Browser: {url}"); } @@ -3316,6 +3557,845 @@ fn browser_script(store: &Store, task_id: &str, code: String) -> Result<()> { ) } +/// Assistant-facing skill document, embedded so installed binaries can write +/// it without a repo checkout. Source of truth: `SKILL.md` at the repo root. +const SKILL_MD: &str = include_str!("../../../SKILL.md"); +const SKILL_DIR_NAME: &str = "browser-use-terminal"; +const EXTERNAL_BROWSER_SESSION_PREFIX: &str = "browser-cli-"; +/// Screenshot downscale ceiling for the external CLI surface. Assistants view +/// screenshots through their file-read tools, several of which resize or +/// reject images above ~2000 px per side (Codex `view_image` caps at 2048). +const EXTERNAL_SCREENSHOT_MAX_DIM: &str = "1800"; + +/// One parsed `browser` CLI invocation: either Python to exec or a +/// control-plane command string. +enum ExternalBrowserAction { + Exec { code: String }, + Command { command: String }, +} + +/// `browser-use-terminal browser ...`: browser management for external coding +/// assistants. One-shot invocations run against a durable named session and +/// route through a long-lived per-state-dir daemon that holds the CDP +/// connection (so Chrome's per-connection permission prompt fires once, like +/// the TUI), falling back to in-process execution if the daemon can't start. 
+fn browser_cli( + store: Store, + session: &str, + timeout: u64, + json: bool, + args: Vec<String>, +) -> Result<()> { + let session_name = sanitize_external_session_name(session)?; + let session_id = format!("{EXTERNAL_BROWSER_SESSION_PREFIX}{session_name}"); + let cwd = std::env::current_dir().context("resolve current dir")?; + let state_dir = store.state_dir().to_path_buf(); + + let mut argv = args; + if argv.first().map(String::as_str) == Some("browser") { + argv.remove(0); + } + if argv.is_empty() { + argv.push("help".to_string()); + } + + // Daemon control operates on the daemon itself and must not side-effect a + // browser session row / artifact dir; intercept before the session ensure. + if argv.first().map(String::as_str) == Some("daemon") { + return browser_external_daemon_control(&state_dir, &argv); + } + + let task = match store.load_session(&session_id)? { + Some(task) => task, + None => { + let artifact_root = store.state_dir().join("artifacts").join(&session_id); + store.create_session_with_id_and_artifact_root( + None, + &cwd, + artifact_root, + session_id.clone(), + )?
+ } + }; + let artifact_dir = PathBuf::from(&task.artifact_root); + + let action = if argv.first().map(String::as_str) == Some("exec") { + let code = if argv.len() == 1 || (argv.len() == 2 && argv[1] == "-") { + let mut buffer = String::new(); + io::stdin() + .read_to_string(&mut buffer) + .context("read Python code from stdin")?; + buffer + } else { + argv[1..].join(" ") + }; + if code.trim().is_empty() { + bail!("no Python code provided: pass it as an argument or on stdin (heredoc)"); + } + ExternalBrowserAction::Exec { code } + } else { + ExternalBrowserAction::Command { + command: join_browser_command_words(&argv), + } + }; + + if external_browser_daemon_enabled() { + match ensure_external_browser_daemon(&state_dir) { + Ok(socket) => { + drop(store); + return browser_cli_via_daemon( + &socket, + &session_id, + &cwd, + &artifact_dir, + timeout, + json, + action, + ); + } + Err(error) => { + eprintln!( + "warning: browser daemon unavailable ({error:#}); running in-process. \ + Local Chrome may re-prompt its debugging permission." + ); + } + } + } + apply_external_browser_env_defaults(&state_dir); + let shared: SharedStore = Arc::new(Mutex::new(store)); + browser_cli_in_process( + &shared, + &session_id, + &cwd, + &artifact_dir, + timeout, + json, + action, + ) +} + +/// Persistence + screenshot env defaults shared by the daemon and the +/// in-process fallback: Rust-owned browsers must survive process exit so +/// later invocations (or a restarted daemon) reattach instead of relaunching. 
+fn apply_external_browser_env_defaults(state_dir: &Path) { + unsafe { + std::env::set_var("BROWSER_USE_TERMINAL_PERSIST_BROWSERS", "1"); + if std::env::var_os("BU_EXTERNAL_BROWSER_STATE_DIR").is_none() { + std::env::set_var( + "BU_EXTERNAL_BROWSER_STATE_DIR", + state_dir.join("external-browser"), + ); + } + if std::env::var_os("BU_BROWSER_SCREENSHOT_MAX_DIM").is_none() + && std::env::var_os("BROWSER_USE_SCREENSHOT_MAX_DIM").is_none() + { + std::env::set_var("BU_BROWSER_SCREENSHOT_MAX_DIM", EXTERNAL_SCREENSHOT_MAX_DIM); + } + } +} + +fn external_browser_daemon_enabled() -> bool { + if !cfg!(unix) { + return false; + } + match std::env::var("BUT_BROWSER_EXTERNAL_DAEMON") { + Ok(value) => !matches!( + value.trim().to_ascii_lowercase().as_str(), + "0" | "false" | "no" | "off" + ), + Err(_) => true, + } +} + +#[allow(clippy::too_many_arguments)] +fn browser_cli_in_process( + shared: &SharedStore, + session_id: &str, + cwd: &Path, + artifact_dir: &Path, + timeout: u64, + json: bool, + action: ExternalBrowserAction, +) -> Result<()> { + match action { + ExternalBrowserAction::Exec { code } => { + let outcome = browser_use_agent::tools::handlers::browser::run_external_browser_script( + shared, + session_id, + cwd, + artifact_dir, + &code, + timeout, + )?; + match outcome { + browser_use_agent::tools::handlers::browser::ExternalBrowserScriptOutcome::Blocked( + content, + ) => print_external_blocked(&content), + browser_use_agent::tools::handlers::browser::ExternalBrowserScriptOutcome::Ran( + out, + ) => print_external_exec_result(&out, json), + } + } + ExternalBrowserAction::Command { command } => { + let out = browser_use_agent::tools::handlers::browser::run_external_browser_command( + shared, + session_id, + cwd, + artifact_dir, + &command, + )?; + print_external_command_content(&out.content); + Ok(()) + } + } +} + +fn print_external_blocked(content: &Value) -> Result<()> { + println!("{}", serde_json::to_string_pretty(content)?); + bail!("browser needs user action 
before scripts can run (see JSON above)") +} + +fn print_external_exec_result( + out: &browser_use_browser::BrowserScriptOutput, + json: bool, +) -> Result<()> { + if json { + println!( + "{}", + serde_json::to_string_pretty(&serde_json::to_value(out)?)? + ); + } else { + print_external_script_output(out); + } + if !out.ok { + bail!( + "{}", + out.error + .clone() + .unwrap_or_else(|| "browser_script failed".to_string()) + ); + } + Ok(()) +} + +fn print_external_command_content(content: &Value) { + match content { + Value::String(text) => println!("{text}"), + other => println!( + "{}", + serde_json::to_string_pretty(other).unwrap_or_else(|_| other.to_string()) + ), + } +} + +// ============================================================================ +// External browser daemon +// ============================================================================ +// +// The `browser` subcommand is invoked once per bash call by external +// assistants, but a CDP attachment to the user's real Chrome must outlive any +// single invocation: Chrome shows its debugging-permission popup per new +// connection, and the in-process connection caches only help within one +// process. So the CLI is a thin client over a per-state-dir unix-socket daemon +// (browser-harness architecture) that hosts the browser session registries — +// and therefore the live CDP websocket — across invocations. 
+ +#[derive(Debug, Serialize, serde::Deserialize)] +#[serde(tag = "kind", rename_all = "snake_case")] +enum ExternalBrowserDaemonRequest { + Ping, + Shutdown, + Command { + session_id: String, + cwd: PathBuf, + artifact_dir: PathBuf, + command: String, + }, + Exec { + session_id: String, + cwd: PathBuf, + artifact_dir: PathBuf, + code: String, + timeout_secs: u64, + }, +} + +#[derive(Debug, Default, Serialize, serde::Deserialize)] +struct ExternalBrowserDaemonResponse { + ok: bool, + #[serde(default, skip_serializing_if = "Option::is_none")] + version: Option<String>, + #[serde(default, skip_serializing_if = "Option::is_none")] + error: Option<String>, + #[serde(default, skip_serializing_if = "Option::is_none")] + content: Option<Value>, + #[serde(default, skip_serializing_if = "Option::is_none")] + script: Option<browser_use_browser::BrowserScriptOutput>, + #[serde(default, skip_serializing_if = "Option::is_none")] + blocked: Option<Value>, +} + +fn fnv1a_hash(bytes: &[u8]) -> u64 { + let mut hash: u64 = 0xcbf2_9ce4_8422_2325; + for byte in bytes { + hash ^= u64::from(*byte); + hash = hash.wrapping_mul(0x0000_0100_0000_01b3); + } + hash +} + +/// Socket lives in the temp dir (AF_UNIX paths are length-limited on macOS), +/// keyed by the canonical state dir so isolated `--state-dir` runs get their +/// own daemon. The log lives inside the state dir. 
+fn external_daemon_socket_path(state_dir: &Path) -> PathBuf { + let canonical = fs::canonicalize(state_dir).unwrap_or_else(|_| state_dir.to_path_buf()); + let hash = fnv1a_hash(canonical.to_string_lossy().as_bytes()); + std::env::temp_dir().join(format!("but-browser-{hash:016x}.sock")) +} + +fn external_daemon_log_path(state_dir: &Path) -> PathBuf { + state_dir.join("external-browser").join("daemon.log") +} + +#[cfg(unix)] +fn external_daemon_send( + socket: &Path, + request: &ExternalBrowserDaemonRequest, + read_timeout: Duration, +) -> Result<ExternalBrowserDaemonResponse> { + use std::os::unix::net::UnixStream; + let stream = UnixStream::connect(socket) + .with_context(|| format!("connect browser daemon socket {}", socket.display()))?; + stream.set_write_timeout(Some(Duration::from_secs(10)))?; + stream.set_read_timeout(Some(read_timeout))?; + let mut payload = serde_json::to_vec(request)?; + payload.push(b'\n'); + (&stream).write_all(&payload)?; + let mut line = String::new(); + io::BufReader::new(&stream) + .read_line(&mut line) + .context("read browser daemon response")?; + serde_json::from_str(&line).context("parse browser daemon response") +} + +#[cfg(not(unix))] +fn external_daemon_send( + _socket: &Path, + _request: &ExternalBrowserDaemonRequest, + _read_timeout: Duration, +) -> Result<ExternalBrowserDaemonResponse> { + bail!("the external browser daemon is only supported on unix") +} + +/// Ping the daemon; spawn it if missing; restart it on version mismatch so a +/// CLI update never talks to a stale daemon. Returns the socket path. 
+fn ensure_external_browser_daemon(state_dir: &Path) -> Result<PathBuf> { + let socket = external_daemon_socket_path(state_dir); + let version = env!("CARGO_PKG_VERSION"); + if let Ok(response) = + external_daemon_send(&socket, &ExternalBrowserDaemonRequest::Ping, PING_TIMEOUT) + { + if response.ok && response.version.as_deref() == Some(version) { + return Ok(socket); + } + let _ = external_daemon_send( + &socket, + &ExternalBrowserDaemonRequest::Shutdown, + PING_TIMEOUT, + ); + let gone_deadline = Instant::now() + Duration::from_secs(5); + while Instant::now() < gone_deadline { + if external_daemon_send(&socket, &ExternalBrowserDaemonRequest::Ping, PING_TIMEOUT) + .is_err() + { + break; + } + thread::sleep(Duration::from_millis(100)); + } + } + let _ = fs::remove_file(&socket); + + let log_path = external_daemon_log_path(state_dir); + if let Some(parent) = log_path.parent() { + fs::create_dir_all(parent) + .with_context(|| format!("create daemon log dir {}", parent.display()))?; + } + let log = fs::OpenOptions::new() + .create(true) + .append(true) + .open(&log_path) + .with_context(|| format!("open daemon log {}", log_path.display()))?; + let exe = std::env::current_exe().context("resolve current executable")?; + let mut command = std::process::Command::new(exe); + command + .arg("--state-dir") + .arg(state_dir) + .arg("browser-daemon") + .stdin(std::process::Stdio::null()) + .stdout(log.try_clone().context("clone daemon log handle")?) + .stderr(log); + #[cfg(unix)] + { + // New process group: the daemon must survive the assistant's shell + // (and any group-wide interrupt) ending this CLI invocation. 
+ std::os::unix::process::CommandExt::process_group(&mut command, 0); + } + command.spawn().context("spawn browser daemon")?; + + let ready_deadline = Instant::now() + Duration::from_secs(10); + while Instant::now() < ready_deadline { + if let Ok(response) = + external_daemon_send(&socket, &ExternalBrowserDaemonRequest::Ping, PING_TIMEOUT) + { + if response.ok { + return Ok(socket); + } + } + thread::sleep(Duration::from_millis(100)); + } + bail!( + "browser daemon did not become ready; see {}", + log_path.display() + ) +} + +const PING_TIMEOUT: Duration = Duration::from_secs(3); +/// Generous request timeout: `browser connect local` can legitimately wait on +/// the user clicking Allow in Chrome's permission popup. +const EXTERNAL_DAEMON_COMMAND_TIMEOUT: Duration = Duration::from_secs(900); + +#[allow(clippy::too_many_arguments)] +fn browser_cli_via_daemon( + socket: &Path, + session_id: &str, + cwd: &Path, + artifact_dir: &Path, + timeout: u64, + json: bool, + action: ExternalBrowserAction, +) -> Result<()> { + match action { + ExternalBrowserAction::Exec { code } => { + let response = external_daemon_send( + socket, + &ExternalBrowserDaemonRequest::Exec { + session_id: session_id.to_string(), + cwd: cwd.to_path_buf(), + artifact_dir: artifact_dir.to_path_buf(), + code, + timeout_secs: timeout, + }, + Duration::from_secs(timeout).saturating_add(EXTERNAL_DAEMON_COMMAND_TIMEOUT), + )?; + if let Some(blocked) = response.blocked { + return print_external_blocked(&blocked); + } + if !response.ok { + bail!( + "{}", + response + .error + .unwrap_or_else(|| "browser daemon request failed".to_string()) + ); + } + let script = response + .script + .ok_or_else(|| anyhow::anyhow!("browser daemon returned no script output"))?; + print_external_exec_result(&script, json) + } + ExternalBrowserAction::Command { command } => { + let response = external_daemon_send( + socket, + &ExternalBrowserDaemonRequest::Command { + session_id: session_id.to_string(), + cwd: cwd.to_path_buf(), 
+ artifact_dir: artifact_dir.to_path_buf(), + command, + }, + EXTERNAL_DAEMON_COMMAND_TIMEOUT, + )?; + if !response.ok { + bail!( + "{}", + response + .error + .unwrap_or_else(|| "browser daemon request failed".to_string()) + ); + } + let content = response.content.unwrap_or(Value::Null); + print_external_command_content(&content); + Ok(()) + } + } +} + +/// `browser daemon status|stop|logs` — operate the daemon itself. +fn browser_external_daemon_control(state_dir: &Path, argv: &[String]) -> Result<()> { + let socket = external_daemon_socket_path(state_dir); + match argv.get(1).map(String::as_str) { + Some("status") | None => { + match external_daemon_send(&socket, &ExternalBrowserDaemonRequest::Ping, PING_TIMEOUT) { + Ok(response) if response.ok => println!( + "running (version {}, socket {})", + response.version.as_deref().unwrap_or("unknown"), + socket.display() + ), + _ => println!("not running (socket {})", socket.display()), + } + Ok(()) + } + Some("stop") => { + match external_daemon_send( + &socket, + &ExternalBrowserDaemonRequest::Shutdown, + PING_TIMEOUT, + ) { + Ok(_) => println!("stopped"), + Err(_) => println!("not running"), + } + Ok(()) + } + Some("logs") => { + let log_path = external_daemon_log_path(state_dir); + let contents = fs::read_to_string(&log_path) + .with_context(|| format!("read {}", log_path.display()))?; + let lines: Vec<&str> = contents.lines().collect(); + let start = lines.len().saturating_sub(100); + for line in &lines[start..] { + println!("{line}"); + } + Ok(()) + } + Some(other) => bail!("unknown browser daemon command: {other} (use status|stop|logs)"), + } +} + +/// Daemon main loop: serve line-delimited JSON requests over the unix socket, +/// executing them against in-process browser registries so the CDP connection +/// (and Chrome's granted debugging permission) persists across CLI calls. 
+#[cfg(unix)] +fn browser_external_daemon(store: Store) -> Result<()> { + use std::os::unix::fs::PermissionsExt; + use std::os::unix::net::UnixListener; + + let state_dir = store.state_dir().to_path_buf(); + apply_external_browser_env_defaults(&state_dir); + let socket = external_daemon_socket_path(&state_dir); + let _ = fs::remove_file(&socket); + let listener = UnixListener::bind(&socket) + .with_context(|| format!("bind browser daemon socket {}", socket.display()))?; + fs::set_permissions(&socket, fs::Permissions::from_mode(0o600)) + .context("restrict browser daemon socket permissions")?; + eprintln!( + "[browser-daemon] started: version={} pid={} state_dir={} socket={}", + env!("CARGO_PKG_VERSION"), + std::process::id(), + state_dir.display(), + socket.display() + ); + let shared: SharedStore = Arc::new(Mutex::new(store)); + for stream in listener.incoming() { + let stream = match stream { + Ok(stream) => stream, + Err(error) => { + eprintln!("[browser-daemon] accept failed: {error:#}"); + continue; + } + }; + match handle_external_daemon_stream(stream, &shared) { + Ok(true) => {} + Ok(false) => break, + Err(error) => eprintln!("[browser-daemon] request failed: {error:#}"), + } + } + let _ = fs::remove_file(&socket); + eprintln!("[browser-daemon] stopped"); + Ok(()) +} + +#[cfg(not(unix))] +fn browser_external_daemon(_store: Store) -> Result<()> { + bail!("the external browser daemon is only supported on unix") +} + +#[cfg(unix)] +fn handle_external_daemon_stream( + stream: std::os::unix::net::UnixStream, + shared: &SharedStore, +) -> Result<bool> { + stream.set_read_timeout(Some(Duration::from_secs(30)))?; + let mut line = String::new(); + io::BufReader::new(&stream).read_line(&mut line)?; + let request: ExternalBrowserDaemonRequest = + serde_json::from_str(&line).context("parse browser daemon request")?; + let version = Some(env!("CARGO_PKG_VERSION").to_string()); + let (response, keep_running) = match request { + ExternalBrowserDaemonRequest::Ping => ( + 
ExternalBrowserDaemonResponse { + ok: true, + version, + ..Default::default() + }, + true, + ), + ExternalBrowserDaemonRequest::Shutdown => ( + ExternalBrowserDaemonResponse { + ok: true, + version, + ..Default::default() + }, + false, + ), + ExternalBrowserDaemonRequest::Command { + session_id, + cwd, + artifact_dir, + command, + } => { + let response = + match browser_use_agent::tools::handlers::browser::run_external_browser_command( + shared, + &session_id, + &cwd, + &artifact_dir, + &command, + ) { + Ok(out) => ExternalBrowserDaemonResponse { + ok: true, + version, + content: Some(out.content), + ..Default::default() + }, + Err(error) => ExternalBrowserDaemonResponse { + ok: false, + version, + error: Some(format!("{error:#}")), + ..Default::default() + }, + }; + (response, true) + } + ExternalBrowserDaemonRequest::Exec { + session_id, + cwd, + artifact_dir, + code, + timeout_secs, + } => { + let response = match browser_use_agent::tools::handlers::browser::run_external_browser_script( + shared, + &session_id, + &cwd, + &artifact_dir, + &code, + timeout_secs, + ) { + Ok( + browser_use_agent::tools::handlers::browser::ExternalBrowserScriptOutcome::Ran( + out, + ), + ) => ExternalBrowserDaemonResponse { + ok: true, + version, + script: Some(out), + ..Default::default() + }, + Ok( + browser_use_agent::tools::handlers::browser::ExternalBrowserScriptOutcome::Blocked( + content, + ), + ) => ExternalBrowserDaemonResponse { + ok: true, + version, + blocked: Some(content), + ..Default::default() + }, + Err(error) => ExternalBrowserDaemonResponse { + ok: false, + version, + error: Some(format!("{error:#}")), + ..Default::default() + }, + }; + (response, true) + } + }; + stream.set_write_timeout(Some(Duration::from_secs(60)))?; + let mut payload = serde_json::to_vec(&response)?; + payload.push(b'\n'); + (&stream).write_all(&payload)?; + Ok(keep_running) +} + +fn print_external_script_output(out: &browser_use_browser::BrowserScriptOutput) { + let text = 
out.text.trim_end(); + if !text.is_empty() { + println!("{text}"); + } + let mut printed_paths = HashSet::new(); + for image in &out.images { + if let Some(path) = image.get("path").and_then(Value::as_str) { + if printed_paths.insert(path.to_string()) { + println!("Screenshot saved to {path}"); + } + } + } + for artifact in &out.artifacts { + if let Some(path) = artifact.get("path").and_then(Value::as_str) { + if printed_paths.insert(path.to_string()) { + println!("Artifact saved to {path}"); + } + } + } +} + +fn sanitize_external_session_name(name: &str) -> Result<String> { + let trimmed = name.trim(); + if trimmed.is_empty() + || trimmed.len() > 64 + || !trimmed + .chars() + .all(|c| c.is_ascii_alphanumeric() || matches!(c, '-' | '_' | '.')) + { + bail!("invalid --session name {name:?}: use 1-64 ASCII letters, digits, '-', '_' or '.'"); + } + Ok(trimmed.to_string()) +} + +/// Re-quote command words for the browser control plane's shell-words parser +/// so arguments containing spaces (e.g. profile ids like +/// `google-chrome:Profile 2`) survive the join. 
+fn join_browser_command_words(words: &[String]) -> String { + words + .iter() + .map(|word| { + if !word.is_empty() + && !word + .chars() + .any(|c| c.is_whitespace() || matches!(c, '"' | '\'' | '\\')) + { + word.clone() + } else { + format!("'{}'", word.replace('\\', "\\\\").replace('\'', "\\'")) + } + }) + .collect::<Vec<_>>() + .join(" ") +} + +fn skill(command: SkillCommand) -> Result<()> { + match command { + SkillCommand::Show => { + print!("{SKILL_MD}"); + Ok(()) + } + SkillCommand::Paths => { + let home = user_home_dir()?; + for assistant in [ + SkillAssistant::Claude, + SkillAssistant::Codex, + SkillAssistant::Opencode, + SkillAssistant::Agents, + ] { + println!( + "{}: {}", + skill_assistant_label(assistant), + skill_install_dir(&home, assistant) + .join("SKILL.md") + .display() + ); + } + Ok(()) + } + SkillCommand::Install { assistant } => skill_install(assistant), + } +} + +fn skill_install(assistant: Option<SkillAssistant>) -> Result<()> { + let home = user_home_dir()?; + let targets = match assistant { + Some(assistant) => vec![assistant], + None => { + let detected = detect_skill_assistants(&home); + if detected.is_empty() { + bail!( + "no assistant homes found (~/.claude, ~/.codex, ~/.config/opencode, ~/.agents). \ + Pass one explicitly: `browser-use-terminal skill install `" + ); + } + detected + } + }; + for target in targets { + let dir = skill_install_dir(&home, target); + fs::create_dir_all(&dir).with_context(|| format!("create skill dir {}", dir.display()))?; + let path = dir.join("SKILL.md"); + fs::write(&path, SKILL_MD).with_context(|| format!("write {}", path.display()))?; + println!( + "Installed {} skill: {}", + skill_assistant_label(target), + path.display() + ); + } + println!( + "\nNew sessions of those assistants will discover the skill automatically. 
\ + Browser setup (one-time): open chrome://inspect/#remote-debugging in Chrome and tick \ + \"Allow remote debugging\", or let the CLI use a managed/cloud browser \ + (`browser-use-terminal browser preference use managed-headless|cloud`)." + ); + Ok(()) +} + +fn skill_assistant_label(assistant: SkillAssistant) -> &'static str { + match assistant { + SkillAssistant::Claude => "Claude Code", + SkillAssistant::Codex => "Codex", + SkillAssistant::Opencode => "OpenCode", + SkillAssistant::Agents => "agents dir", + } +} + +fn skill_install_dir(home: &Path, assistant: SkillAssistant) -> PathBuf { + let base = match assistant { + SkillAssistant::Claude => home.join(".claude"), + SkillAssistant::Codex => std::env::var_os("CODEX_HOME") + .filter(|value| !value.is_empty()) + .map(PathBuf::from) + .unwrap_or_else(|| home.join(".codex")), + SkillAssistant::Opencode => home.join(".config").join("opencode"), + SkillAssistant::Agents => home.join(".agents"), + }; + base.join("skills").join(SKILL_DIR_NAME) +} + +fn detect_skill_assistants(home: &Path) -> Vec<SkillAssistant> { + // OpenCode also discovers Claude-compatible skill paths (~/.claude/skills), + // so a Claude install covers OpenCode users unless they only have + // ~/.config/opencode. 
+ [ + SkillAssistant::Claude, + SkillAssistant::Codex, + SkillAssistant::Opencode, + SkillAssistant::Agents, + ] + .into_iter() + .filter(|assistant| { + skill_install_dir(home, *assistant) + .parent() + .and_then(Path::parent) + .is_some_and(Path::is_dir) + }) + .collect() +} + +fn user_home_dir() -> Result { + std::env::var_os("HOME") + .filter(|value| !value.is_empty()) + .or_else(|| std::env::var_os("USERPROFILE").filter(|value| !value.is_empty())) + .map(PathBuf::from) + .context("could not resolve the home directory (HOME/USERPROFILE unset)") +} + fn remote_cdp_connect_command_from_env() -> Option { std::env::var("BU_CDP_WS") .ok() @@ -9476,6 +10556,17 @@ fn notify_parent_agent_done(store: &Store, task: &browser_use_protocol::SessionM mod tests { use super::*; + #[test] + fn external_daemon_socket_path_is_stable_and_short() { + let a = external_daemon_socket_path(Path::new("/tmp/state-a")); + let b = external_daemon_socket_path(Path::new("/tmp/state-a")); + let c = external_daemon_socket_path(Path::new("/tmp/state-b")); + assert_eq!(a, b, "same state dir must map to the same socket"); + assert_ne!(a, c, "different state dirs must map to different sockets"); + // AF_UNIX socket paths are limited to ~104 bytes on macOS. + assert!(a.as_os_str().len() < 100, "socket path too long: {a:?}"); + } + #[test] fn cli_config_overrides_parse_toml_and_raw_strings_like_codex() -> Result<()> { let parsed = parse_cli_config_overrides(&[ diff --git a/docs/assistant-plugins.md b/docs/assistant-plugins.md new file mode 100644 index 00000000..6413938b --- /dev/null +++ b/docs/assistant-plugins.md @@ -0,0 +1,95 @@ +# Using Browser Use Terminal from coding assistants + +Browser Use Terminal plugs into any coding assistant or agent that can run shell commands — Claude Code, Codex, OpenCode, OpenClaw, Cursor CLI, and friends. 
The model is similar to [browser-use/browser-harness](https://github.com/browser-use/browser-harness): a skill file teaches the assistant the CLI, and the CLI gives it the full browser runtime (connect/recovery control plane, Python page helpers, screenshots-as-files). + +Fastest setup: paste `https://browser-use.com/skill` into your assistant, and it provides instructions on how to install, register the skill, connect a browser, and verify. Full docs: . + +The assistant is the agent. The skill teaches it to drive the browser directly: + +```bash +browser-use-terminal browser exec <<'PY' +new_tab("https://example.com") +wait_for_load() +print(capture_screenshot()) +PY +``` + +(The built-in agent — the TUI and `browser-use-terminal run` — remains a human-facing surface and is not part of the skill.) + +## Install + +1. Install Browser Use Terminal so `browser-use-terminal` is on `$PATH`: + + ```bash + curl -fsSL https://browser-use.com/terminal/install.sh | sh + ``` + +2. Register the skill. With no argument this installs for every assistant whose home directory exists (`~/.claude`, `~/.codex`, `~/.config/opencode`, `~/.agents`): + + ```bash + browser-use-terminal skill install + # or one of: + browser-use-terminal skill install claude + browser-use-terminal skill install codex + browser-use-terminal skill install opencode + browser-use-terminal skill install agents + ``` + + `browser-use-terminal skill paths` prints where each assistant's copy lands; `skill show` prints the markdown. Re-run `skill install` after updating the CLI to refresh the copies. + + Notes: + - OpenCode also discovers Claude-compatible skill paths (`~/.claude/skills/...`), so the `claude` install covers both. + - Gemini CLI has no skills directory; paste the output of `browser-use-terminal skill show` into `~/.gemini/GEMINI.md` (or a project `GEMINI.md`) instead. + +3. Pick a browser (one-time). 
The CLI auto-connects per the remembered preference on every call: + + ```bash + browser-use-terminal browser preference use managed-headless # zero-setup disposable Chromium (default-quality choice) + browser-use-terminal browser preference use local # your real, logged-in Chrome + browser-use-terminal browser preference use cloud # Browser Use cloud (needs BROWSER_USE_API_KEY) + ``` + + For `local`, Chrome needs its one-time remote-debugging opt-in: open `chrome://inspect/#remote-debugging` and tick "Allow remote debugging for this browser instance" (`browser-use-terminal browser local setup` walks the user through it). + +## How it stays stateful across bash calls + +Each CLI invocation is a thin client over a long-lived daemon. The first `browser` command auto-spawns one daemon per state dir. It hosts the browser session registries — and therefore the live CDP websocket — across invocations. This is what keeps Chrome's per-connection debugging-permission popup to a single Allow in local mode, exactly like the TUI. The CLI and daemon speak line-delimited JSON over a unix socket (`$TMPDIR/but-browser-.sock`, mode 0600); a version handshake restarts stale daemons after CLI updates, and if the daemon can't start the CLI falls back to in-process execution with a warning. Operate it with `browser-use-terminal browser daemon status|stop|logs`. + +The browsers themselves also survive daemon restarts: + +- **local** — your Chrome keeps running; the daemon holds one authorized CDP connection to it. +- **managed** — Chromium launches with a stable per-session profile and a marker file (`/external-browser/managed//`), outlives the daemon, and is reattached. Stop it with `browser-use-terminal browser recover stop-owned-browser`. +- **cloud** — the created browser's id/CDP URL is recorded (`/external-browser/cloud/.json`) and reattached until it stops or times out. Stop it (and billing) with `browser-use-terminal browser recover stop-owned-remote`. 
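
The line-delimited JSON exchange and version handshake described above can be sketched with the standard library alone. This is an illustrative sketch, not the CLI's implementation: the real daemon serializes its messages with serde, while this version matches on substrings to stay dependency-free. The names `serve_one`, `ping_matches`, and `demo`, the `sketch-*.sock` path, and the `VERSION` value are all hypothetical. Only the `{"kind":"ping"}` request shape and the `ok`/`version` response fields follow the request and response types in this change.

```rust
use std::io::{BufRead, BufReader, Write};
use std::os::unix::net::{UnixListener, UnixStream};
use std::path::{Path, PathBuf};
use std::thread;

// Hypothetical version string; the real CLI compares against its crate version.
const VERSION: &str = "0.0.0-sketch";

/// Daemon side: accept one connection, read one JSON line, answer with one JSON line.
fn serve_one(listener: &UnixListener) -> std::io::Result<()> {
    let (stream, _) = listener.accept()?;
    let mut line = String::new();
    BufReader::new(&stream).read_line(&mut line)?;
    // Substring match stands in for real JSON parsing in this sketch.
    let ok = line.contains("\"kind\":\"ping\"");
    let reply = format!("{{\"ok\":{ok},\"version\":\"{VERSION}\"}}\n");
    (&stream).write_all(reply.as_bytes())
}

/// Client side: send a ping and report whether the daemon's version matches ours.
fn ping_matches(socket: &Path, expected: &str) -> std::io::Result<bool> {
    let stream = UnixStream::connect(socket)?;
    (&stream).write_all(b"{\"kind\":\"ping\"}\n")?;
    let mut line = String::new();
    BufReader::new(&stream).read_line(&mut line)?;
    let wanted = format!("\"version\":\"{expected}\"");
    Ok(line.contains("\"ok\":true") && line.contains(&wanted))
}

/// Bind a throwaway socket, serve one ping on a background thread, and ping it.
fn demo(tag: &str, expected: &str) -> std::io::Result<bool> {
    let name = format!("sketch-{tag}-{}.sock", std::process::id());
    let socket: PathBuf = std::env::temp_dir().join(name);
    let _ = std::fs::remove_file(&socket);
    let listener = UnixListener::bind(&socket)?;
    let server = thread::spawn(move || serve_one(&listener));
    let matched = ping_matches(&socket, expected);
    let _ = server.join();
    let _ = std::fs::remove_file(&socket);
    matched
}

fn main() -> std::io::Result<()> {
    println!("same version matches: {}", demo("a", VERSION)?);
    println!("stale version matches: {}", demo("b", "9.9.9")?);
    Ok(())
}
```

A mismatch is the case in which the CLI would ask the old daemon to shut down and spawn a fresh one. The sketch only reports the comparison result.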
+ +Page/tab state therefore persists between `browser exec` calls; Python variables do not (each exec is a fresh interpreter). `--session ` gives parallel workstreams isolated artifact dirs, event logs, and managed browsers (requests share one daemon and run serially). + +Everything an assistant does is recorded in the same SQLite event log the TUI uses — inspect with `browser-use-terminal events browser-cli-` or `browser-use-terminal sessions list`. + +## How assistants see screenshots + +Bash output is text-only, so images travel as files: `capture_screenshot()` saves a PNG (downscaled to ≤1800 px per side for this surface) and the CLI prints `Screenshot saved to `. The skill then tells each assistant to use its native image-reading tool: + +| Assistant | Tool | Notes | +|---|---|---| +| Claude Code | `Read` on the path | reads PNG/JPG natively | +| Codex CLI | `view_image` with `{"path": ...}` | resizes >2048 px itself; enabled by default | +| OpenCode | `read` on the path | image support since Oct 2025; needs a vision model | +| Gemini CLI | `read_file` on the path | returns inline image data | +| Cursor CLI | reference the path | the agent reads image files automatically | + +Non-vision models skip screenshots and work from text state (`page_info()`, `js(...)`, `wait_for_element(...)`) — the skill spells out this fallback. + +## Credentials, navigation policy, and safety + +The external CLI surface enforces the same policies as the in-app agent on every call: + +- Secrets/TOTP stored via `browser-use-terminal secrets ...` are available to scripts only as placeholders (`type_text("name")`); raw values never reach the assistant. +- The domain allow/deny policy (`browser-use-terminal domains ...`) guards `Page.navigate` at the Rust layer. +- Blocked states surface as `needs-user-action` JSON with a `user_prompt` the assistant is instructed to relay verbatim instead of guessing. 
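
For the screenshots-as-files flow described earlier, a wrapper that shells out to the CLI can recover the saved file paths from stdout. The sketch below assumes only the two line prefixes the CLI prints (`Screenshot saved to ` and `Artifact saved to `). The helper name `saved_paths` and the sample paths are hypothetical and not part of the CLI.

```rust
/// Collect the file paths reported on stdout, in order, skipping duplicates.
fn saved_paths(stdout: &str) -> Vec<String> {
    // Prefixes as printed by the CLI for screenshots and other artifacts.
    const PREFIXES: [&str; 2] = ["Screenshot saved to ", "Artifact saved to "];
    let mut paths: Vec<String> = Vec::new();
    for line in stdout.lines() {
        for prefix in PREFIXES {
            if let Some(rest) = line.strip_prefix(prefix) {
                let path = rest.trim_end();
                if !path.is_empty() && !paths.iter().any(|known| known == path) {
                    paths.push(path.to_string());
                }
            }
        }
    }
    paths
}

fn main() {
    // Hypothetical CLI output: script text followed by two saved files.
    let stdout = "https://example.com/\n\
                  Screenshot saved to /tmp/shots/a.png\n\
                  Artifact saved to /tmp/out/data.json\n";
    for path in saved_paths(stdout) {
        println!("{path}");
    }
}
```

Each returned path can then be handed to the assistant's native image-reading tool, or skipped entirely by a non-vision model.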
+ +## Files + +- `SKILL.md` (repo root) — the assistant-facing skill, embedded into the binary at build time and written out by `skill install`. +- `crates/browser-use-cli` — `browser` / `skill` subcommands. +- `crates/browser-use-agent/src/tools/handlers/browser.rs` — `run_external_browser_command` / `run_external_browser_script`, the blocking entry points that reuse the in-session preference resolution, auto-connect, security policy, and event persistence. +- `crates/browser-use-browser` — persistent managed/cloud browser reattach (`BROWSER_USE_TERMINAL_PERSIST_BROWSERS`, `BU_EXTERNAL_BROWSER_STATE_DIR`). diff --git a/prompts/browser-tool-description.md b/prompts/browser-tool-description.md index 3d24b0d4..0222d490 100644 --- a/prompts/browser-tool-description.md +++ b/prompts/browser-tool-description.md @@ -93,6 +93,7 @@ Recovery: - `browser recover reattach-same-target`: attaches a fresh CDP session to the same target id. If the target is gone, it reports available targets and does not silently switch. - `browser recover restart-runtime`: resets the Rust connection holder and reconnects to the same endpoint. It does not kill Chrome. - `browser recover restart-owned-browser`: restarts only Rust-owned managed browsers. +- `browser recover stop-owned-browser`: stops only Rust-owned managed browsers (including persistent managed browsers reattached across CLI invocations). - `browser recover stop-owned-remote`: stops only Rust-owned Browser Use cloud browsers. Commands: @@ -134,6 +135,7 @@ browser recover reconnect-websocket browser recover reattach-same-target browser recover restart-runtime browser recover restart-owned-browser +browser recover stop-owned-browser browser recover stop-owned-remote browser script runs --json