From b751173239bd27dd2d7fdbf015897f988422e414 Mon Sep 17 00:00:00 2001 From: Laith Weinberger Date: Wed, 10 Jun 2026 10:51:20 -0700 Subject: [PATCH 1/6] plug terminal into any coding assistant or agent (claude code, codex, opencode, etc) - new `browser` subcommand: heredoc python exec + control-plane passthrough, session-optional (auto-creates browser-cli-), prints screenshot paths - new `skill` subcommand: show/paths/install embedded SKILL.md into ~/.claude, ~/.codex, ~/.config/opencode, ~/.agents skill dirs - SKILL.md at repo root, modeled on browser-harness (helpers, screenshot viewing per assistant, gotchas) - public run_external_browser_command/script in agent crate; extracted shared preference/preflight fns so external CLI and in-session tool share heuristics - persistent managed/cloud browsers across one-shot invocations: marker-file reattach, BROWSER_USE_TERMINAL_PERSIST_BROWSERS + BU_EXTERNAL_BROWSER_STATE_DIR - new `browser recover stop-owned-browser` (+ help text), silence kill noise - screenshots capped at 1800px on external surface (codex view_image limit) - docs/assistant-plugins.md + README section, unit tests for new helpers --- README.md | 21 + SKILL.md | 118 +++++ .../src/tools/handlers/browser.rs | 488 +++++++++++++----- crates/browser-use-browser/src/lib.rs | 291 ++++++++++- crates/browser-use-cli/src/main.rs | 363 +++++++++++++ docs/assistant-plugins.md | 94 ++++ prompts/browser-tool-description.md | 2 + 7 files changed, 1230 insertions(+), 147 deletions(-) create mode 100644 SKILL.md create mode 100644 docs/assistant-plugins.md diff --git a/README.md b/README.md index d3c6cea9..2407a235 100644 --- a/README.md +++ b/README.md @@ -87,6 +87,26 @@ browser config show browser diagnostics ``` +## Use It From Claude Code, Codex, or OpenCode + +Browser Use Terminal plugs into any coding assistant that can run shell commands, browser-harness style: a skill teaches the assistant the CLI, and the CLI hands it the whole browser runtime. + +```bash +browser-use-terminal skill install # registers the skill for detected assistants +``` + +Then ask your assistant to browse: + +```bash +browser-use-terminal browser exec <<'PY' +new_tab("https://example.com") +wait_for_load() +print(capture_screenshot()) +PY +``` + +Screenshots are saved as files and the path is printed, so assistants view them with their native file-reading tools (Claude Code `Read`, Codex `view_image`, OpenCode `read`). The browser persists between calls. See `docs/assistant-plugins.md`. + ## Development ```bash @@ -118,6 +138,7 @@ You can disable (100% completely anonymous) telemetry with `BUT_TELEMETRY=0`. - `docs/terminal-ui-product-ux.md` - `docs/terminal-ui-testing.md` - `docs/terminal-renderer-architecture.md` +- `docs/assistant-plugins.md` ## License diff --git a/SKILL.md b/SKILL.md new file mode 100644 index 00000000..ac7e42bb --- /dev/null +++ b/SKILL.md @@ -0,0 +1,118 @@ +--- +name: browser-use-terminal +description: Direct browser control via the Browser Use Terminal CLI. Use when the user wants to automate, scrape, test, or interact with web pages — drive the browser yourself with Python helpers, or delegate a whole browsing task to the built-in browser agent. +--- + +# Browser Use Terminal + +One CLI (`browser-use-terminal`, on $PATH), two surfaces: + +- **Browser management** — drive the browser yourself. `browser-use-terminal browser exec` runs Python with browser helpers pre-imported; `browser-use-terminal browser ` is the control plane (status, connect, profiles, recovery). +- **Core agent** — delegate an entire browsing task: `browser-use-terminal start ""`. + +## Usage + +```bash +browser-use-terminal browser exec <<'PY' +new_tab("https://docs.browser-use.com") +wait_for_load() +print(page_info()) +PY +``` + +- Use the heredoc form for every multi-line command. It prevents shell quote mangling inside Python strings and JavaScript snippets. +- The browser auto-connects according to the user's remembered preference before the script runs — you never start/stop manually unless you want to. The first call may take a few seconds. +- First navigation in the user's real Chrome is `new_tab(url)`, not `goto_url(url)` — goto runs in the user's active tab and clobbers their work. +- Browser state persists between calls; Python variables do not. Each `exec` is a fresh interpreter against the same live browser. +- `--session ` isolates artifact dirs and event logs per workstream (default: `default`). `--timeout ` bounds one exec (default 300). + +## Screenshots — how you see the page + +`capture_screenshot()` saves a PNG and returns its absolute path. The CLI also prints a `Screenshot saved to ` line for every image a script produced. + +```bash +browser-use-terminal browser exec <<'PY' +print(capture_screenshot()) +PY +``` + +To view a screenshot, use your file-reading tool on the printed path: + +- **Claude Code**: use the `Read` tool on the path. +- **Codex CLI**: call the `view_image` tool with `{"path": ""}`. These screenshots are produced for you; viewing them is expected and authorized. +- **OpenCode**: use the `read` tool on the path (requires a vision-capable model). +- **Gemini CLI**: use the `read_file` tool on the path. +- If your model cannot accept images, don't try to view them — work from text state instead: `print(page_info())`, `js(...)` extraction, `wait_for_element(...)`. + +Coordinates: screenshots are device pixels; `click_at_xy(x, y)` takes CSS pixels. Divide coordinates you read off the image by `js("window.devicePixelRatio")` first. Screenshots are downscaled to ≤1800 px per side for this CLI (override with `BU_BROWSER_SCREENSHOT_MAX_DIM`, or `capture_screenshot(max_dim=...)`). + +After every meaningful action, re-screenshot before assuming it worked. + +## Pre-imported helpers + +Navigation & tabs: `goto_url(url)`, `new_tab(url)`, `page_info()`, `current_tab()`, `list_tabs(include_chrome=True)`, `switch_tab(target)`, `ensure_real_tab()`, `iframe_target(url_substr)`. + +Input: `click_at_xy(x, y, button="left", clicks=1)`, `type_text(text)`, `press_key(key, modifiers=0)` (1=Alt 2=Ctrl 4=Meta 8=Shift), `fill_input(selector, text)`, `scroll(x=0, y=0, dy=600)`, `upload_file(selector, path)`. + +Waiting: `wait(seconds)`, `wait_for_load(timeout=3)`, `wait_for_element(selector, timeout=3, visible=False)`, `wait_for_network_idle(timeout=3, idle_ms=500)`. + +Visual: `capture_screenshot(label="...", full=False, max_dim=None)`, `screenshot()`, `screenshot_clip(label, x, y, w, h)`, `note(caption)`. + +Escape hatches: `js(expression)` (auto-wraps top-level `return`), `cdp("Domain.method", **params)` (raw CDP), `cdp_batch(calls)`, `drain_events()`. + +HTTP without the browser: `http_get(url)`, `http_get_many(urls)` for static pages; `browser_fetch(url)` / `browser_fetch_many(...)` to fetch with the page's cookies/session. + +Credentials (if the user stored any): `available_secrets()`, then `type_text("name")` or `fill_input(sel, secret("name"))`; `totp("name")` for 2FA codes. Values are placeholder-substituted — you never see them. `is_logged_out()`, `email_inbox()` / `email_message(id)` for email-code flows. + +Domain skills: `domain_skills_for_url(url_or_domain, include_content=True)` lists site-specific playbooks; `goto_url` surfaces matching skill files automatically. Read them before inventing selectors or flows on a complex site. + +## Browser control plane + +```bash +browser-use-terminal browser status --json +browser-use-terminal browser connect # uses the remembered preference +browser-use-terminal browser connect local # user's already-running Chrome (CDP) +browser-use-terminal browser connect managed --headless # disposable CLI-owned browser +browser-use-terminal browser preference use local|cloud|managed-headless +browser-use-terminal browser remote start # Browser Use cloud browser (needs BROWSER_USE_API_KEY) +browser-use-terminal browser doctor +browser-use-terminal browser recover reconnect-websocket +browser-use-terminal browser recover stop-owned-browser # stop the persistent managed browser +browser-use-terminal browser recover stop-owned-remote # stop the cloud browser (stops billing) +``` + +Managed and cloud browsers persist across invocations — later calls reattach instead of relaunching. Stop them with the recover commands above when the user is done (cloud browsers bill until stopped or timed out). + +- `exec` auto-connects, so you rarely need these. Reach for them when `status` shows a problem or the user asks for a specific browser. +- If output JSON says `status: "needs-user-action"` (e.g. pick a Chrome profile, click Allow in Chrome's permission popup, enable the remote-debugging checkbox), show the `user_prompt` to the user verbatim and wait — do not guess. +- Auth wall mid-task: stop and ask the user. Don't type credentials from screenshots; use stored secrets if available. +- Connecting to the user's real Chrome requires a one-time setup: `chrome://inspect/#remote-debugging` → tick "Allow remote debugging". `browser local setup` walks the user through it. + +## Delegating to the built-in agent + +For a self-contained browsing task (research, multi-step form filling, comparison shopping), the built-in agent is often faster than driving the browser yourself: + +```bash +browser-use-terminal start "Find the three cheapest direct LAX→SFO flights next Friday and list airline, time, price" +``` + +It runs with the user's configured model, prints progress events, and ends with a result. Inspect afterwards with `browser-use-terminal history`, `show `, `events `. + +## What actually works + +- Screenshots first: `capture_screenshot()` → view the image → decide whether you need a click, a selector, or more navigation. +- Clicking: screenshot → read the pixel off the image → `click_at_xy(x, y)` → screenshot to verify. Suppress the locate-then-click reflex — no getBoundingClientRect, no selector hunts. Hit-testing happens in Chrome's browser process, so coordinate clicks pass through iframes / shadow DOM / cross-origin without extra work. +- Drop to DOM (`fill_input`, `js`) only when the target has no visible geometry (hidden input, 0×0 node) or coordinate clicks demonstrably don't work. +- Bulk static pages: `http_get_many(urls)` — no browser needed. Logged-in pages: `browser_fetch(url)` rides the real session. +- After goto: `wait_for_load()`. SPAs report `complete` before they render — follow with `wait_for_element(...)`. +- Wrong/stale tab: `ensure_real_tab()`. +- Verification: `print(page_info())` is the cheapest "is this alive?" check; screenshots are the default way to verify visible actions. + +## Gotchas (field-tested) + +- CDP target order ≠ Chrome's visible tab-strip order. +- Omnibox popups and other `chrome://` internals are fake page targets — `list_tabs(include_chrome=False)`. +- `page_info()` surfaces an open JS dialog as `{"dialog": ...}` — handle it (`cdp("Page.handleJavaScriptDialog", accept=True)`) before anything else. +- Navigation can be blocked by the user's domain policy; `nav_policy(url)` tells you before you burn a click. A blocked navigation is policy, not a bug — tell the user. +- Scripts time out (default 300s): keep each `exec` small and observable rather than one mega-script. Long extraction loops: print progress as you go — stdout is captured even on timeout. +- Prefer compositor-level actions over framework hacks. If you do need framework-specific DOM tricks, run `browser-use-terminal browser domain skills --domain --json --include-content` first — that's where site playbooks live. diff --git a/crates/browser-use-agent/src/tools/handlers/browser.rs b/crates/browser-use-agent/src/tools/handlers/browser.rs index db632c03..c5f7b0ce 100644 --- a/crates/browser-use-agent/src/tools/handlers/browser.rs +++ b/crates/browser-use-agent/src/tools/handlers/browser.rs @@ -2884,105 +2884,17 @@ impl ToolRuntime for BrowserTool { BrowserAction::Command { command } => { let selected_browser_mode = selected_browser_mode.as_deref(); let out = if let Some(persistence) = &persistence { - let store = persistence.store.lock().map_err(|_| { - ToolError::Other(anyhow::anyhow!("store mutex poisoned")) - })?; - if let Some(content) = dispatch_browser_preference_command_for_mode( - &store, + run_browser_command_with_shared_store( + &persistence.store, backend.as_ref(), &session_id, &cwd, &artifact_dir, &command, selected_browser_mode, - ) - .map_err(|error| ToolError::Rejected(format!("{error:#}")))? - { - BrowserCommandOutput { - content, - events: Vec::new(), - } - } else { - let resolved = resolve_browser_command_for_selected_mode( - Some(&store), - &command, - selected_browser_mode, - selected_browser_profile_id.as_deref(), - ) - .map_err(|error| ToolError::Rejected(format!("{error:#}")))?; - let preferred_browser = selected_local_browser.clone().or_else(|| { - store - .get_setting(BROWSER_PREF_BROWSER) - .ok() - .flatten() - .filter(|browser| !browser.trim().is_empty()) - }); - let effective_mode = - effective_browser_mode(Some(&store), selected_browser_mode) - .map_err(|error| ToolError::Rejected(format!("{error:#}")))?; - let store_profile_id = if matches!(effective_mode, "local" | "cloud") { - stored_profile_for_mode(&store, effective_mode) - .map_err(|error| ToolError::Rejected(format!("{error:#}")))? - } else { - None - }; - let default_profile_id = - selected_browser_profile_id.clone().or(store_profile_id); - let default_profile_id = if matches!(effective_mode, "local" | "cloud") - { - default_profile_id - } else { - None - }; - let has_default_profile = default_profile_id.is_some(); - drop(store); - if let Some(preflight) = local_connect_default_profile_preflight( - has_default_profile, - preferred_browser.as_deref(), - backend.as_ref(), - &session_id, - &cwd, - &artifact_dir, - &resolved, - ) - .map_err(|error| ToolError::Rejected(format!("{error:#}")))? - { - preflight - } else { - open_default_profile_before_local_connect( - backend.as_ref(), - &session_id, - &cwd, - &artifact_dir, - &resolved, - default_profile_id.as_deref(), - ) - .map_err(ToolError::Other)?; - let output = backend - .command(&session_id, &cwd, &artifact_dir, &resolved) - .map_err(ToolError::Other)?; - let output = enrich_local_profiles_with_default_profile( - output, - &resolved, - default_profile_id.as_deref(), - ); - let output = enforce_local_connect_default_profile_context( - output, - &resolved, - default_profile_id.as_deref(), - ); - let output = enrich_local_connect_recovery_with_default_profile( - output, - &resolved, - default_profile_id.as_deref(), - ); - enrich_status_with_selected_browser_mode( - output, - &resolved, - Some(effective_mode), - ) - } - } + selected_browser_profile_id.as_deref(), + selected_local_browser.as_deref(), + )? } else { let resolved = resolve_browser_command_for_selected_mode( None, @@ -3015,62 +2927,21 @@ impl ToolRuntime for BrowserTool { } BrowserAction::Execute { script, .. } => { if let Some(persistence) = &persistence { - let store = persistence.store.lock().map_err(|_| { - ToolError::Other(anyhow::anyhow!("store mutex poisoned")) - })?; - let mode = - effective_browser_mode(Some(&store), selected_browser_mode.as_deref()) - .map_err(|error| ToolError::Rejected(format!("{error:#}")))?; - let default_profile_id = if mode == "local" { - selected_browser_profile_id.clone().or_else(|| { - stored_profile_for_mode(&store, "local") - .ok() - .flatten() - .filter(|profile| !profile.trim().is_empty()) - }) - } else { - None - }; - let preferred_browser = if mode == "local" { - selected_local_browser.clone().or_else(|| { - store - .get_setting(BROWSER_PREF_BROWSER) - .ok() - .flatten() - .filter(|browser| !browser.trim().is_empty()) - }) - } else { - None - }; - drop(store); - if mode == "local" && default_profile_id.is_none() { - if let Some(preflight) = local_connect_default_profile_preflight( - false, - preferred_browser.as_deref(), - backend.as_ref(), - &session_id, - &cwd, - &artifact_dir, - "browser connect local", - ) - .map_err(|error| ToolError::Rejected(format!("{error:#}")))? - { - return Ok(map_command_output(preflight)); - } - } - ensure_browser_ready_for_work( + if let Some(preflight) = prepare_browser_for_script_with_shared_store( + &persistence.store, backend.as_ref(), &session_id, &cwd, &artifact_dir, - mode, - default_profile_id.as_deref(), - ) - .map_err(ToolError::Other)?; + selected_browser_mode.as_deref(), + selected_browser_profile_id.as_deref(), + selected_local_browser.as_deref(), + )? { + return Ok(map_command_output(preflight)); + } } // Re-resolve the secrets + nav policy on every run (fail closed) - // so secret/domain changes take effect mid-session. Cheap now - // that values live in an encrypted file, not the OS keychain. + // so secret/domain changes take effect mid-session if let Some(persistence) = &persistence { let store = persistence.store.lock().map_err(|_| { ToolError::Other(anyhow::anyhow!("store mutex poisoned")) @@ -3141,6 +3012,337 @@ impl ToolRuntime for BrowserTool { } } +/// Run a `browser ` control-plane command with full store-backed +/// behavior: preference dispatch, mode/profile resolution, local-profile +/// preflight, and the default-profile enrichers. +/// +/// Shared by the in-session [`BrowserTool`] command path and the external +/// assistant CLI surface ([`run_external_browser_command`]) so both see +/// identical heuristics. +#[allow(clippy::too_many_arguments)] +fn run_browser_command_with_shared_store( + shared_store: &SharedStore, + backend: &dyn BrowserBackend, + session_id: &str, + cwd: &std::path::Path, + artifact_dir: &std::path::Path, + command: &str, + selected_browser_mode: Option<&str>, + selected_browser_profile_id: Option<&str>, + selected_local_browser: Option<&str>, +) -> Result { + let store = shared_store + .lock() + .map_err(|_| ToolError::Other(anyhow::anyhow!("store mutex poisoned")))?; + if let Some(content) = dispatch_browser_preference_command_for_mode( + &store, + backend, + session_id, + cwd, + artifact_dir, + command, + selected_browser_mode, + ) + .map_err(|error| ToolError::Rejected(format!("{error:#}")))? + { + return Ok(BrowserCommandOutput { + content, + events: Vec::new(), + }); + } + let resolved = resolve_browser_command_for_selected_mode( + Some(&store), + command, + selected_browser_mode, + selected_browser_profile_id, + ) + .map_err(|error| ToolError::Rejected(format!("{error:#}")))?; + let preferred_browser = selected_local_browser.map(str::to_string).or_else(|| { + store + .get_setting(BROWSER_PREF_BROWSER) + .ok() + .flatten() + .filter(|browser| !browser.trim().is_empty()) + }); + let effective_mode = effective_browser_mode(Some(&store), selected_browser_mode) + .map_err(|error| ToolError::Rejected(format!("{error:#}")))?; + let store_profile_id = if matches!(effective_mode, "local" | "cloud") { + stored_profile_for_mode(&store, effective_mode) + .map_err(|error| ToolError::Rejected(format!("{error:#}")))? + } else { + None + }; + let default_profile_id = selected_browser_profile_id + .map(str::to_string) + .or(store_profile_id); + let default_profile_id = if matches!(effective_mode, "local" | "cloud") { + default_profile_id + } else { + None + }; + let has_default_profile = default_profile_id.is_some(); + drop(store); + if let Some(preflight) = local_connect_default_profile_preflight( + has_default_profile, + preferred_browser.as_deref(), + backend, + session_id, + cwd, + artifact_dir, + &resolved, + ) + .map_err(|error| ToolError::Rejected(format!("{error:#}")))? + { + return Ok(preflight); + } + open_default_profile_before_local_connect( + backend, + session_id, + cwd, + artifact_dir, + &resolved, + default_profile_id.as_deref(), + ) + .map_err(ToolError::Other)?; + let output = backend + .command(session_id, cwd, artifact_dir, &resolved) + .map_err(ToolError::Other)?; + let output = enrich_local_profiles_with_default_profile( + output, + &resolved, + default_profile_id.as_deref(), + ); + let output = enforce_local_connect_default_profile_context( + output, + &resolved, + default_profile_id.as_deref(), + ); + let output = enrich_local_connect_recovery_with_default_profile( + output, + &resolved, + default_profile_id.as_deref(), + ); + Ok(enrich_status_with_selected_browser_mode( + output, + &resolved, + Some(effective_mode), + )) +} + +/// Make the browser ready for a `browser_script` run: resolve the effective +/// mode from the store, run the local default-profile preflight, and +/// auto-connect/auto-start the configured browser when needed. +/// +/// Returns `Some(preflight)` when browser work is blocked on a user decision +/// (e.g. no default local Chrome profile chosen yet); the caller must surface +/// that output instead of running the script. +/// +/// Shared by the in-session [`BrowserTool`] execute path and the external +/// assistant CLI surface ([`run_external_browser_script`]). +#[allow(clippy::too_many_arguments)] +fn prepare_browser_for_script_with_shared_store( + shared_store: &SharedStore, + backend: &dyn BrowserBackend, + session_id: &str, + cwd: &std::path::Path, + artifact_dir: &std::path::Path, + selected_browser_mode: Option<&str>, + selected_browser_profile_id: Option<&str>, + selected_local_browser: Option<&str>, +) -> Result, ToolError> { + let store = shared_store + .lock() + .map_err(|_| ToolError::Other(anyhow::anyhow!("store mutex poisoned")))?; + let mode = effective_browser_mode(Some(&store), selected_browser_mode) + .map_err(|error| ToolError::Rejected(format!("{error:#}")))?; + let default_profile_id = if mode == "local" { + selected_browser_profile_id.map(str::to_string).or_else(|| { + stored_profile_for_mode(&store, "local") + .ok() + .flatten() + .filter(|profile| !profile.trim().is_empty()) + }) + } else { + None + }; + let preferred_browser = if mode == "local" { + selected_local_browser.map(str::to_string).or_else(|| { + store + .get_setting(BROWSER_PREF_BROWSER) + .ok() + .flatten() + .filter(|browser| !browser.trim().is_empty()) + }) + } else { + None + }; + drop(store); + if mode == "local" && default_profile_id.is_none() { + if let Some(preflight) = local_connect_default_profile_preflight( + false, + preferred_browser.as_deref(), + backend, + session_id, + cwd, + artifact_dir, + "browser connect local", + ) + .map_err(|error| ToolError::Rejected(format!("{error:#}")))? + { + return Ok(Some(preflight)); + } + } + ensure_browser_ready_for_work( + backend, + session_id, + cwd, + artifact_dir, + mode, + default_profile_id.as_deref(), + ) + .map_err(ToolError::Other)?; + Ok(None) +} + +// ============================================================================ +// External assistant CLI surface +// ============================================================================ +// +// `browser-use-terminal browser ...` lets external coding assistants +// (Claude Code, Codex, OpenCode, ...) drive the browser through one-shot CLI +// invocations. These blocking entry points reuse the +// exact preference resolution, connect heuristics, security policy, and event +// persistence of the in-session `browser` / `browser_script` tools so an +// external assistant and the built-in agent see identical behavior. + +/// Outcome of an external `browser exec` script request. +#[derive(Debug)] +pub enum ExternalBrowserScriptOutcome { + /// The script ran. `output.ok` may still be false on script errors. + Ran(BrowserScriptOutput), + /// Browser work is blocked on a user decision before any script can run + /// (e.g. no default local Chrome profile is chosen yet). The payload is the + /// preflight JSON with `user_prompt` / `next_step` guidance. + Blocked(Value), +} + +fn external_tool_error(error: ToolError) -> anyhow::Error { + match error { + ToolError::Rejected(message) => anyhow!(message), + ToolError::Other(error) => error, + ToolError::Sandboxed(denial) => anyhow!( + "browser call denied by sandbox: {}", + denial.output.stderr.trim() + ), + } +} + +/// Configure a backend with the store-preferred browser mode so its +/// auto-connect behavior (the harness `ensure_daemon()` analog) targets the +/// right browser. +fn set_backend_browser_mode_from_store( + shared_store: &SharedStore, + backend: &dyn BrowserBackend, +) -> anyhow::Result<()> { + let store = shared_store + .lock() + .map_err(|_| anyhow!("store mutex poisoned"))?; + let mode = preferred_browser_mode(Some(&store))?; + drop(store); + backend.set_browser_mode(Some(mode.to_string())); + Ok(()) +} + +/// Run a `browser ` control-plane command for an external assistant CLI. +/// +/// Blocking. Resolves the preferred mode from the store (so plain +/// `browser connect` honors `browser preference use ...`), runs the same +/// preflights/enrichers as the in-session tool, and records browser events to +/// the session's durable event log. +pub fn run_external_browser_command( + shared_store: &SharedStore, + session_id: &str, + cwd: &std::path::Path, + artifact_dir: &std::path::Path, + command: &str, +) -> anyhow::Result { + let backend = RealBackend::default(); + set_backend_browser_mode_from_store(shared_store, &backend)?; + let out = run_browser_command_with_shared_store( + shared_store, + &backend, + session_id, + cwd, + artifact_dir, + command, + None, + None, + None, + ) + .map_err(external_tool_error)?; + if let Ok(store) = shared_store.lock() { + let _ = record_browser_command_response_events( + &store, + session_id, + "browser", + &format!("browser-cli-{session_id}"), + &out, + ); + } + Ok(out) +} + +/// Run a `browser_script` Python snippet for an external assistant CLI. +/// +/// Blocking: auto-connects the preferred browser when needed, installs the +/// secrets/navigation policy, runs the script to completion, and records the +/// response (text, images, artifacts, browser events) to the session's durable +/// event log. Screenshots taken by the script land in `artifact_dir` and their +/// paths are reported in the returned output's `images`. +pub fn run_external_browser_script( + shared_store: &SharedStore, + session_id: &str, + cwd: &std::path::Path, + artifact_dir: &std::path::Path, + code: &str, + timeout_secs: u64, +) -> anyhow::Result { + let backend = RealBackend::default(); + set_backend_browser_mode_from_store(shared_store, &backend)?; + if let Some(preflight) = prepare_browser_for_script_with_shared_store( + shared_store, + &backend, + session_id, + cwd, + artifact_dir, + None, + None, + None, + ) + .map_err(external_tool_error)? + { + return Ok(ExternalBrowserScriptOutcome::Blocked(preflight.content)); + } + { + let store = shared_store + .lock() + .map_err(|_| anyhow!("store mutex poisoned"))?; + super::secrets_admin::install_script_security(&store, session_id) + .map_err(|error| anyhow!("failed to apply browser security policy: {error:#}"))?; + } + let out = backend.run_script(session_id, cwd, artifact_dir, code, timeout_secs)?; + if let Ok(store) = shared_store.lock() { + let _ = record_browser_script_response_events_for_tool( + &store, + session_id, + "browser_script", + &format!("browser-cli-{session_id}"), + &out, + ); + } + Ok(ExternalBrowserScriptOutcome::Ran(out)) +} + #[cfg(test)] mod browser_mode_tests { use super::*; diff --git a/crates/browser-use-browser/src/lib.rs b/crates/browser-use-browser/src/lib.rs index 36a8fef5..d03ce89e 100644 --- a/crates/browser-use-browser/src/lib.rs +++ b/crates/browser-use-browser/src/lib.rs @@ -165,11 +165,18 @@ struct ManagedBrowser { _profile_dir: Option, launch: ManagedLaunch, marker_path: PathBuf, + /// Keep the browser process (and its marker) alive when this handle drops + /// so one-shot CLI invocations can reattach to it later. See + /// [`external_browser_persistence_enabled`]. + persist: bool, } impl Drop for ManagedBrowser { fn drop(&mut self) { unregister_managed_browser_pid(self.child.id()); + if self.persist { + return; + } let _ = self.child.kill(); let _ = self.child.wait(); let _ = fs::remove_file(&self.marker_path); @@ -229,6 +236,10 @@ struct BrowserSession { active_local_profile_id: Option, preferred_browser_context_id: Option, artifact_dir: Option, + /// Marker file of a persistent managed browser this session reattached to + /// (or launched) without owning the child process. Used so explicit stops + /// can terminate the browser even across one-shot CLI processes. + persistent_managed_marker: Option, logs: VecDeque, } @@ -264,6 +275,7 @@ impl Default for BrowserSession { active_local_profile_id: None, preferred_browser_context_id: None, artifact_dir: None, + persistent_managed_marker: None, logs: VecDeque::new(), } } @@ -3166,6 +3178,7 @@ fn dispatch_recover(session: &mut BrowserSession, argv: &[String]) -> Result session.reattach_same_target(), Some("restart-runtime") => session.restart_runtime(), Some("restart-owned-browser") => session.restart_owned_browser(), + Some("stop-owned-browser") => session.stop_owned_browser(), Some("stop-owned-remote") => session.stop_owned_remote(), Some(other) => bail!("unknown browser recover command: {other}"), None => bail!("browser recover requires a recovery action"), @@ -3601,6 +3614,26 @@ impl BrowserSession { profile: ManagedProfile, extra_args: Vec, ) -> Result { + let persist = external_browser_persistence_enabled(); + let profile = if persist { + match profile { + // Persistent managed browsers need a stable profile dir so the + // next one-shot invocation can find the marker and reattach. + ManagedProfile::Temp => ManagedProfile::Path(persistent_managed_profile_path( + self.session_id.as_deref(), + )), + other => other, + } + } else { + profile + }; + if persist { + if let ManagedProfile::Path(path) = &profile { + if let Some(connected) = self.reattach_persistent_managed(path) { + return Ok(connected); + } + } + } self.stop_owned_managed(); let mut launch_errors = Vec::new(); let mut launched = None; @@ -3611,7 +3644,7 @@ impl BrowserSession { headless, extra_args: extra_args.clone(), }; - match launch_managed_browser(launch.clone(), self.session_id.clone()) { + match launch_managed_browser(launch.clone(), self.session_id.clone(), persist) { Ok((managed, http_url)) => { launched = Some((launch, managed, http_url)); break; @@ -3633,6 +3666,7 @@ impl BrowserSession { ); }; let ws_url = resolve_ws_from_http(&http_url)?; + let marker_path = managed.marker_path.clone(); self.managed = Some(managed); if let Err(error) = self.connect_endpoint( Endpoint { @@ -3647,6 +3681,7 @@ impl BrowserSession { self.stop_owned_managed(); return Err(error); } + self.persistent_managed_marker = persist.then_some(marker_path); self.browser_name = Some("Managed Chromium".to_string()); self.profile = Some(match &launch.profile { ManagedProfile::Temp => "temp".to_string(), @@ -3660,7 +3695,85 @@ impl BrowserSession { })) } + /// Reattach to a still-running Browser Use cloud browser recorded by a + /// previous one-shot invocation, instead of creating (and billing) a new + /// one. Returns `None` when the record is missing or the browser is gone. + fn reattach_persistent_cloud(&mut self) -> Option { + let record_path = persistent_cloud_record_path(self.session_id.as_deref()); + let raw = fs::read_to_string(&record_path).ok()?; + let Ok(record) = serde_json::from_str::(&raw) else { + let _ = fs::remove_file(&record_path); + return None; + }; + let (Some(id), Some(cdp_url)) = ( + record.get("id").and_then(Value::as_str), + record.get("cdpUrl").and_then(Value::as_str), + ) else { + let _ = fs::remove_file(&record_path); + return None; + }; + let Ok(ws_url) = resolve_ws_from_http(cdp_url) else { + // The cloud browser stopped (timeout or explicit stop); retire the + // record so the caller starts a fresh one. + let _ = fs::remove_file(&record_path); + return None; + }; + self.stop_owned_managed(); + self.connect_endpoint( + Endpoint { + kind: "browser-use-cloud".to_string(), + http_url: Some(cdp_url.to_string()), + ws_url, + candidate_id: None, + }, + BrowserMode::RemoteCloud, + BrowserOwner::Rust, + ) + .ok()?; + self.remote_browser_id = Some(id.to_string()); + self.live_url = record + .get("liveUrl") + .and_then(Value::as_str) + .map(ToOwned::to_owned); + self.browser_name = Some("Browser Use Cloud".to_string()); + self.profile = record + .get("profileId") + .and_then(Value::as_str) + .map(ToOwned::to_owned); + Some(json!({ + "status": "connected", + "reattached": true, + "browser": self.status_json(), + "live_url": self.live_url, + })) + } + + fn persist_cloud_record(&self, browser: &Value, requested_profile_id: Option<&str>) { + let record_path = persistent_cloud_record_path(self.session_id.as_deref()); + if let Some(parent) = record_path.parent() { + let _ = fs::create_dir_all(parent); + } + let record = json!({ + "id": self.remote_browser_id, + "cdpUrl": browser.get("cdpUrl"), + "liveUrl": browser.get("liveUrl"), + "profileId": requested_profile_id, + "started_at_ms": unix_time_ms() as u64, + }); + if let Ok(raw) = serde_json::to_vec_pretty(&record) { + let _ = fs::write(&record_path, raw); + } + } + fn start_remote_cloud(&mut self, argv: &[String]) -> Result { + // Plain `remote start` (no explicit profile/timeout/proxy options) may + // reattach to the persistent cloud browser from a prior invocation; + // explicit options always provision a fresh browser. + if external_browser_persistence_enabled() && argv.len() <= 2 { + if let Some(connected) = self.reattach_persistent_cloud() { + return Ok(connected); + } + } let mut body = serde_json::Map::new(); let mut requested_profile_id = None; if let Some(profile_id) = option_value(argv, "--profile-id") { @@ -3723,6 +3836,9 @@ impl BrowserSession { .and_then(Value::as_str) .map(ToOwned::to_owned); self.browser_name = Some("Browser Use Cloud".to_string()); + if external_browser_persistence_enabled() { + self.persist_cloud_record(&browser, requested_profile_id.as_deref()); + } self.profile = requested_profile_id; Ok(json!({ "status": "connected", @@ -3743,6 +3859,7 @@ impl BrowserSession { return Ok(json!({ "stopped": false, "reason": "missing remote browser id" })); }; stop_cloud_browser(&id)?; + let _ = fs::remove_file(persistent_cloud_record_path(self.session_id.as_deref())); if let Some(sid) = self.session_id.clone() { stop_session_capture(&sid); } @@ -4015,11 +4132,86 @@ impl BrowserSession { Ok(json!({ "restarted": true, "browser": self.status_json() })) } + /// Reattach to a still-running persistent managed browser via its marker + /// file instead of launching a new one. Returns `None` when there is no + /// live browser to reattach to (caller falls through to a fresh launch). + fn reattach_persistent_managed(&mut self, profile_path: &Path) -> Option { + let marker_path = profile_path.join(MANAGED_BROWSER_MARKER_FILE); + let raw = fs::read_to_string(&marker_path).ok()?; + let marker = serde_json::from_str::(&raw).ok()?; + if !process_command_matches_managed_marker(marker.pid, &marker.profile_path) { + return None; + } + let http_url = format!("http://127.0.0.1:{}", marker.port); + let ws_url = resolve_ws_from_http(&http_url).ok()?; + // Clear any current attachment state without killing the browser we + // are about to reattach to. + self.persistent_managed_marker = None; + self.detach_managed_state(); + self.connect_endpoint( + Endpoint { + kind: "cdp-url".to_string(), + http_url: Some(http_url), + ws_url, + candidate_id: None, + }, + BrowserMode::Managed, + BrowserOwner::Rust, + ) + .ok()?; + self.persistent_managed_marker = Some(marker_path); + self.browser_name = Some("Managed Chromium".to_string()); + self.profile = Some(profile_path.display().to_string()); + Some(json!({ + "status": "connected", + "reattached": true, + "browser": self.status_json(), + "next_step": "Continue immediately with the user's requested browser/search/page work in this connected managed browser.", + "model_instruction": "Browser connection is setup only. Do not answer the user's browser/search/page task from memory or stop after connecting; continue with page work now.", + })) + } + + /// Explicitly stop a Rust-owned managed browser, including persistent + /// managed browsers reattached from a previous one-shot invocation. + fn stop_owned_browser(&mut self) -> Result { + let owned = self.managed.is_some() + || self.persistent_managed_marker.is_some() + || (self.owner == BrowserOwner::Rust && self.mode == BrowserMode::Managed); + if !owned { + return Ok(json!({ + "stopped": false, + "reason": "current browser is not a Rust-owned managed browser", + })); + } + self.stop_owned_managed(); + Ok(json!({ "stopped": true })) + } + fn stop_owned_managed(&mut self) { if let Some(mut managed) = self.managed.take() { let _ = managed.child.kill(); let _ = managed.child.wait(); + // Explicit stops always retire the marker, even for persistent + // managed browsers whose Drop would otherwise keep it. + let _ = fs::remove_file(&managed.marker_path); + } else if let Some(marker_path) = self.persistent_managed_marker.take() { + if let Ok(raw) = fs::read_to_string(&marker_path) { + if let Ok(marker) = serde_json::from_str::(&raw) { + if process_command_matches_managed_marker(marker.pid, &marker.profile_path) { + terminate_process(marker.pid); + } + } + } + let _ = fs::remove_file(&marker_path); + if let Some(profile_dir) = marker_path.parent() { + let _ = fs::remove_file(profile_dir.join("DevToolsActivePort")); + } } + self.persistent_managed_marker = None; + self.detach_managed_state(); + } + + fn detach_managed_state(&mut self) { if self.mode == BrowserMode::Managed { if let Some(sid) = self.session_id.clone() { stop_session_capture(&sid); @@ -6081,9 +6273,17 @@ fn process_command_matches_managed_marker(_pid: u32, _profile_path: &Path) -> bo #[cfg(unix)] fn terminate_process(pid: u32) { let pid = pid.to_string(); - let _ = Command::new("kill").args(["-TERM", &pid]).status(); + let _ = Command::new("kill") + .args(["-TERM", &pid]) + .stdout(Stdio::null()) + .stderr(Stdio::null()) + .status(); thread::sleep(Duration::from_millis(200)); - let _ = Command::new("kill").args(["-KILL", &pid]).status(); + let _ = Command::new("kill") + .args(["-KILL", &pid]) + .stdout(Stdio::null()) + .stderr(Stdio::null()) + .status(); } #[cfg(not(unix))] @@ -6111,9 +6311,47 @@ fn write_managed_browser_marker( Ok(marker_path) } +/// Whether externally-driven one-shot CLI invocations should keep Rust-owned +/// browsers (managed Chromium, Browser Use cloud) alive across processes and +/// reattach to them, instead of stopping them when the process exits. +/// +/// Set by `browser-use-terminal browser ...` (the assistant plugin surface); +/// long-lived hosts (TUI/SDK) keep the default in-process ownership. +fn external_browser_persistence_enabled() -> bool { + env_bool("BROWSER_USE_TERMINAL_PERSIST_BROWSERS") == Some(true) +} + +/// Root directory for persistent external-browser state (managed profiles and +/// cloud browser records). Override with `BU_EXTERNAL_BROWSER_STATE_DIR` (the +/// external CLI points it inside the resolved state dir). +fn persistent_external_browser_state_root() -> PathBuf { + std::env::var_os("BU_EXTERNAL_BROWSER_STATE_DIR") + .filter(|value| !value.is_empty()) + .map(PathBuf::from) + .unwrap_or_else(|| { + home_dir() + .map(|home| home.join(".browser-use-terminal")) + .unwrap_or_else(|| PathBuf::from(".browser-use-terminal")) + .join("external-browser") + }) +} + +fn persistent_managed_profile_path(session_id: Option<&str>) -> PathBuf { + persistent_external_browser_state_root() + .join("managed") + .join(session_id.unwrap_or("default")) +} + +fn persistent_cloud_record_path(session_id: Option<&str>) -> PathBuf { + persistent_external_browser_state_root() + .join("cloud") + .join(format!("{}.json", session_id.unwrap_or("default"))) +} + fn launch_managed_browser( launch: ManagedLaunch, owner_session_id: Option, + persist: bool, ) -> Result<(ManagedBrowser, String)> { let (profile_path, temp_dir) = match &launch.profile { ManagedProfile::Temp => { @@ -6191,6 +6429,7 @@ fn launch_managed_browser( _profile_dir: temp_dir, launch, marker_path, + persist, }, http_url, )); @@ -6977,7 +7216,8 @@ fn local_profile_cookies(profile: &LocalBrowserProfile) -> Result> { headless: true, extra_args: vec!["--no-startup-window".to_string()], }; - let (mut managed, http_url) = launch_managed_browser(launch, None)?; + // Profile inspection browsers are always throwaway; never persist them. + let (mut managed, http_url) = launch_managed_browser(launch, None, false)?; let result = (|| -> Result> { let ws_url = resolve_ws_from_http(&http_url)?; let mut connection = CdpConnection::connect(&ws_url)?; @@ -13883,6 +14123,49 @@ print("large response ok", len(data["blob"])) .contains("DevToolsActivePort")); } + #[test] + fn stop_owned_browser_reports_not_owned_for_fresh_session() { + let mut session = BrowserSession::default(); + let result = session.stop_owned_browser().unwrap(); + assert_eq!(result["stopped"], false); + } + + #[test] + fn stop_owned_browser_terminates_persistent_marker_browser() { + let temp = tempfile::tempdir().unwrap(); + let marker_path = temp.path().join(MANAGED_BROWSER_MARKER_FILE); + let marker = ManagedBrowserMarker { + // A pid that does not match a managed chrome command line, so the + // stop path only retires the marker without killing anything. + pid: 999_999, + port: 9, + executable: "chrome".to_string(), + profile_path: temp.path().to_path_buf(), + owner_session_id: Some("owner-session".to_string()), + started_at_ms: 1, + }; + fs::write(&marker_path, serde_json::to_vec(&marker).unwrap()).unwrap(); + fs::write(temp.path().join("DevToolsActivePort"), "9\nstale\n").unwrap(); + + let mut session = BrowserSession { + persistent_managed_marker: Some(marker_path.clone()), + ..BrowserSession::default() + }; + let result = session.stop_owned_browser().unwrap(); + assert_eq!(result["stopped"], true); + assert!(session.persistent_managed_marker.is_none()); + assert!(!marker_path.exists()); + assert!(!temp.path().join("DevToolsActivePort").exists()); + } + + #[test] + fn persistent_external_browser_paths_are_session_scoped() { + let managed = persistent_managed_profile_path(Some("browser-cli-work")); + assert!(managed.ends_with("managed/browser-cli-work")); + let cloud = persistent_cloud_record_path(None); + assert!(cloud.ends_with("cloud/default.json")); + } + #[test] fn active_managed_browser_marker_is_not_reaped() { let temp = tempfile::tempdir().unwrap(); diff --git a/crates/browser-use-cli/src/main.rs b/crates/browser-use-cli/src/main.rs index 713e50f0..62617e9a 100644 --- a/crates/browser-use-cli/src/main.rs +++ b/crates/browser-use-cli/src/main.rs @@ -328,6 +328,36 @@ enum Command { task_id: String, code: String, }, + /// Browser management for external assistants + /// + /// `browser exec [CODE]` runs Python with browser helpers pre-imported + /// (reads stdin when CODE is omitted — heredoc-friendly) and prints the + /// script output plus `Screenshot saved to ` lines. Any other + /// arguments are forwarded to the browser control plane + /// (`status --json`, `connect local`, `doctor`, ...); run + /// `browser help` for the full command list. The browser auto-connects + /// using the remembered preference, and the browser itself persists across + /// invocations. + Browser { + /// Named workstream; isolates artifact dirs and event logs. + #[arg(long, default_value = "default")] + session: String, + /// Per-exec script timeout in seconds. + #[arg(long, default_value_t = 300)] + timeout: u64, + /// Print the full structured response as JSON. + #[arg(long)] + json: bool, + /// `exec [CODE]` or control-plane command words. + #[arg(trailing_var_arg = true, allow_hyphen_values = true)] + args: Vec, + }, + /// Print or install the assistant-facing skill (SKILL.md) that teaches + /// coding assistants how to use the `browser` subcommand. + Skill { + #[command(subcommand)] + command: SkillCommand, + }, SyncCookies { #[arg(value_name = "LOCAL_PROFILE")] profile: Option, @@ -736,6 +766,32 @@ enum SessionsCommand { }, } +#[derive(Debug, Subcommand)] +enum SkillCommand { + /// Print the skill markdown to stdout. + Show, + /// Print the canonical install location for each assistant. + Paths, + /// Install the skill for coding assistants. With no argument, installs for + /// every assistant whose home directory exists. + Install { + #[arg(value_enum)] + assistant: Option, + }, +} + +#[derive(Clone, Copy, Debug, PartialEq, Eq, ValueEnum)] +enum SkillAssistant { + /// Claude Code (~/.claude/skills). OpenCode also discovers this location. + Claude, + /// Codex CLI ($CODEX_HOME or ~/.codex, under skills/). + Codex, + /// OpenCode (~/.config/opencode/skills). + Opencode, + /// The cross-assistant agents dir (~/.agents/skills). + Agents, +} + #[derive(Clone, Copy, Debug, PartialEq, Eq, ValueEnum)] enum AuthAccount { Codex, @@ -972,6 +1028,13 @@ fn main() -> Result<()> { Command::Events { task_id } => events(&store, &task_id), Command::Python { task_id, code } => python(&store, &task_id, code), Command::BrowserScript { task_id, code } => browser_script(&store, &task_id, code), + Command::Browser { + session, + timeout, + json, + args: browser_args, + } => browser_cli(store, &session, timeout, json, browser_args), + Command::Skill { command } => skill(command), Command::SyncCookies { profile, local_profile, @@ -1261,6 +1324,8 @@ fn command_name(command: &Command) -> &'static str { Command::Events { .. } => "events", Command::Python { .. } => "python", Command::BrowserScript { .. } => "browser_script", + Command::Browser { .. } => "browser", + Command::Skill { .. } => "skill", Command::SyncCookies { .. } => "sync_cookies", Command::UserShell { .. } => "user_shell", Command::Review { .. } => "review", @@ -3240,6 +3305,304 @@ fn browser_script(store: &Store, task_id: &str, code: String) -> Result<()> { ) } +/// Assistant-facing skill document, embedded so installed binaries can write +/// it without a repo checkout. Source of truth: `SKILL.md` at the repo root. +const SKILL_MD: &str = include_str!("../../../SKILL.md"); +const SKILL_DIR_NAME: &str = "browser-use-terminal"; +const EXTERNAL_BROWSER_SESSION_PREFIX: &str = "browser-cli-"; +/// Screenshot downscale ceiling for the external CLI surface. Assistants view +/// screenshots through their file-read tools, several of which resize or +/// reject images above ~2000 px per side (Codex `view_image` caps at 2048). +const EXTERNAL_SCREENSHOT_MAX_DIM: &str = "1800"; + +/// `browser-use-terminal browser ...`: browser management for external coding +/// assistants. One-shot invocations run against a durable named session; the +/// browser itself (user Chrome / managed / cloud) persists between calls. +fn browser_cli( + store: Store, + session: &str, + timeout: u64, + json: bool, + args: Vec, +) -> Result<()> { + let session_name = sanitize_external_session_name(session)?; + let session_id = format!("{EXTERNAL_BROWSER_SESSION_PREFIX}{session_name}"); + let cwd = std::env::current_dir().context("resolve current dir")?; + let task = match store.load_session(&session_id)? { + Some(task) => task, + None => { + let artifact_root = store.state_dir().join("artifacts").join(&session_id); + store.create_session_with_id_and_artifact_root( + None, + &cwd, + artifact_root, + session_id.clone(), + )? + } + }; + let artifact_dir = PathBuf::from(&task.artifact_root); + // One-shot invocations must not tear down Rust-owned browsers on exit: + // managed Chromium / cloud browsers persist and later calls reattach + // (browser-harness daemon semantics, minus the daemon). + unsafe { + std::env::set_var("BROWSER_USE_TERMINAL_PERSIST_BROWSERS", "1"); + if std::env::var_os("BU_EXTERNAL_BROWSER_STATE_DIR").is_none() { + std::env::set_var( + "BU_EXTERNAL_BROWSER_STATE_DIR", + store.state_dir().join("external-browser"), + ); + } + } + let shared: SharedStore = Arc::new(Mutex::new(store)); + + let mut argv = args; + if argv.first().map(String::as_str) == Some("browser") { + argv.remove(0); + } + if argv.is_empty() { + argv.push("help".to_string()); + } + + if argv.first().map(String::as_str) == Some("exec") { + let code = if argv.len() == 1 || (argv.len() == 2 && argv[1] == "-") { + let mut buffer = String::new(); + io::stdin() + .read_to_string(&mut buffer) + .context("read Python code from stdin")?; + buffer + } else { + argv[1..].join(" ") + }; + if code.trim().is_empty() { + bail!("no Python code provided: pass it as an argument or on stdin (heredoc)"); + } + if std::env::var_os("BU_BROWSER_SCREENSHOT_MAX_DIM").is_none() + && std::env::var_os("BROWSER_USE_SCREENSHOT_MAX_DIM").is_none() + { + unsafe { + std::env::set_var("BU_BROWSER_SCREENSHOT_MAX_DIM", EXTERNAL_SCREENSHOT_MAX_DIM); + } + } + let outcome = browser_use_agent::tools::handlers::browser::run_external_browser_script( + &shared, + &session_id, + &cwd, + &artifact_dir, + &code, + timeout, + )?; + match outcome { + browser_use_agent::tools::handlers::browser::ExternalBrowserScriptOutcome::Blocked( + content, + ) => { + println!("{}", serde_json::to_string_pretty(&content)?); + bail!("browser needs user action before scripts can run (see JSON above)"); + } + browser_use_agent::tools::handlers::browser::ExternalBrowserScriptOutcome::Ran(out) => { + if json { + println!( + "{}", + serde_json::to_string_pretty(&serde_json::to_value(&out)?)? + ); + } else { + print_external_script_output(&out); + } + if !out.ok { + bail!( + "{}", + out.error + .unwrap_or_else(|| "browser_script failed".to_string()) + ); + } + Ok(()) + } + } + } else { + let command = join_browser_command_words(&argv); + let out = browser_use_agent::tools::handlers::browser::run_external_browser_command( + &shared, + &session_id, + &cwd, + &artifact_dir, + &command, + )?; + match &out.content { + Value::String(text) => println!("{text}"), + other => println!("{}", serde_json::to_string_pretty(other)?), + } + Ok(()) + } +} + +fn print_external_script_output(out: &browser_use_browser::BrowserScriptOutput) { + let text = out.text.trim_end(); + if !text.is_empty() { + println!("{text}"); + } + let mut printed_paths = HashSet::new(); + for image in &out.images { + if let Some(path) = image.get("path").and_then(Value::as_str) { + if printed_paths.insert(path.to_string()) { + println!("Screenshot saved to {path}"); + } + } + } + for artifact in &out.artifacts { + if let Some(path) = artifact.get("path").and_then(Value::as_str) { + if printed_paths.insert(path.to_string()) { + println!("Artifact saved to {path}"); + } + } + } +} + +fn sanitize_external_session_name(name: &str) -> Result { + let trimmed = name.trim(); + if trimmed.is_empty() + || trimmed.len() > 64 + || !trimmed + .chars() + .all(|c| c.is_ascii_alphanumeric() || matches!(c, '-' | '_' | '.')) + { + bail!("invalid --session name {name:?}: use 1-64 ASCII letters, digits, '-', '_' or '.'"); + } + Ok(trimmed.to_string()) +} + +/// Re-quote command words for the browser control plane's shell-words parser +/// so arguments containing spaces (e.g. profile ids like +/// `google-chrome:Profile 2`) survive the join. +fn join_browser_command_words(words: &[String]) -> String { + words + .iter() + .map(|word| { + if !word.is_empty() + && !word + .chars() + .any(|c| c.is_whitespace() || matches!(c, '"' | '\'' | '\\')) + { + word.clone() + } else { + format!("'{}'", word.replace('\\', "\\\\").replace('\'', "\\'")) + } + }) + .collect::>() + .join(" ") +} + +fn skill(command: SkillCommand) -> Result<()> { + match command { + SkillCommand::Show => { + print!("{SKILL_MD}"); + Ok(()) + } + SkillCommand::Paths => { + let home = user_home_dir()?; + for assistant in [ + SkillAssistant::Claude, + SkillAssistant::Codex, + SkillAssistant::Opencode, + SkillAssistant::Agents, + ] { + println!( + "{}: {}", + skill_assistant_label(assistant), + skill_install_dir(&home, assistant) + .join("SKILL.md") + .display() + ); + } + Ok(()) + } + SkillCommand::Install { assistant } => skill_install(assistant), + } +} + +fn skill_install(assistant: Option) -> Result<()> { + let home = user_home_dir()?; + let targets = match assistant { + Some(assistant) => vec![assistant], + None => { + let detected = detect_skill_assistants(&home); + if detected.is_empty() { + bail!( + "no assistant homes found (~/.claude, ~/.codex, ~/.config/opencode, ~/.agents). \ + Pass one explicitly: `browser-use-terminal skill install `" + ); + } + detected + } + }; + for target in targets { + let dir = skill_install_dir(&home, target); + fs::create_dir_all(&dir).with_context(|| format!("create skill dir {}", dir.display()))?; + let path = dir.join("SKILL.md"); + fs::write(&path, SKILL_MD).with_context(|| format!("write {}", path.display()))?; + println!( + "Installed {} skill: {}", + skill_assistant_label(target), + path.display() + ); + } + println!( + "\nNew sessions of those assistants will discover the skill automatically. \ + Browser setup (one-time): open chrome://inspect/#remote-debugging in Chrome and tick \ + \"Allow remote debugging\", or let the CLI use a managed/cloud browser \ + (`browser-use-terminal browser preference use managed-headless|cloud`)." + ); + Ok(()) +} + +fn skill_assistant_label(assistant: SkillAssistant) -> &'static str { + match assistant { + SkillAssistant::Claude => "Claude Code", + SkillAssistant::Codex => "Codex", + SkillAssistant::Opencode => "OpenCode", + SkillAssistant::Agents => "agents dir", + } +} + +fn skill_install_dir(home: &Path, assistant: SkillAssistant) -> PathBuf { + let base = match assistant { + SkillAssistant::Claude => home.join(".claude"), + SkillAssistant::Codex => std::env::var_os("CODEX_HOME") + .filter(|value| !value.is_empty()) + .map(PathBuf::from) + .unwrap_or_else(|| home.join(".codex")), + SkillAssistant::Opencode => home.join(".config").join("opencode"), + SkillAssistant::Agents => home.join(".agents"), + }; + base.join("skills").join(SKILL_DIR_NAME) +} + +fn detect_skill_assistants(home: &Path) -> Vec { + // OpenCode also discovers Claude-compatible skill paths (~/.claude/skills), + // so a Claude install covers OpenCode users unless they only have + // ~/.config/opencode. + [ + SkillAssistant::Claude, + SkillAssistant::Codex, + SkillAssistant::Opencode, + SkillAssistant::Agents, + ] + .into_iter() + .filter(|assistant| { + skill_install_dir(home, *assistant) + .parent() + .and_then(Path::parent) + .is_some_and(Path::is_dir) + }) + .collect() +} + +fn user_home_dir() -> Result { + std::env::var_os("HOME") + .filter(|value| !value.is_empty()) + .or_else(|| std::env::var_os("USERPROFILE").filter(|value| !value.is_empty())) + .map(PathBuf::from) + .context("could not resolve the home directory (HOME/USERPROFILE unset)") +} + fn remote_cdp_connect_command_from_env() -> Option { std::env::var("BU_CDP_WS") .ok() diff --git a/docs/assistant-plugins.md b/docs/assistant-plugins.md new file mode 100644 index 00000000..5d841e6f --- /dev/null +++ b/docs/assistant-plugins.md @@ -0,0 +1,94 @@ +# Using Browser Use Terminal from coding assistants + +Browser Use Terminal plugs into any coding assistant or agent that can run shell commands — Claude Code, Codex, OpenCode, OpenClaw, Cursor CLI, and friends. The model is similar to [browser-use/browser-harness](https://github.com/browser-use/browser-harness): a skill file teaches the assistant the CLI, and the CLI gives it the full browser runtime (connect/recovery control plane, Python page helpers, screenshots-as-files). + +Two surfaces, one binary: + +- **Browser management** — the assistant drives the browser itself: + ```bash + browser-use-terminal browser exec <<'PY' + new_tab("https://example.com") + wait_for_load() + print(capture_screenshot()) + PY + ``` +- **Core agent** — the assistant delegates a whole task to the built-in browser agent: + ```bash + browser-use-terminal start "Compare M4 MacBook Air prices across three retailers" + ``` + +## Install + +1. Install Browser Use Terminal so `browser-use-terminal` is on `$PATH`: + + ```bash + curl -fsSL https://browser-use.com/terminal/install.sh | sh + ``` + +2. Register the skill. With no argument this installs for every assistant whose home directory exists (`~/.claude`, `~/.codex`, `~/.config/opencode`, `~/.agents`): + + ```bash + browser-use-terminal skill install + # or one of: + browser-use-terminal skill install claude + browser-use-terminal skill install codex + browser-use-terminal skill install opencode + browser-use-terminal skill install agents + ``` + + `browser-use-terminal skill paths` prints where each assistant's copy lands; `skill show` prints the markdown. Re-run `skill install` after updating the CLI to refresh the copies. + + Notes: + - OpenCode also discovers Claude-compatible skill paths (`~/.claude/skills/...`), so the `claude` install covers both. + - Gemini CLI has no skills directory; paste the output of `browser-use-terminal skill show` into `~/.gemini/GEMINI.md` (or a project `GEMINI.md`) instead. + +3. Pick a browser (one-time). The CLI auto-connects per the remembered preference on every call: + + ```bash + browser-use-terminal browser preference use managed-headless # zero-setup disposable Chromium (default-quality choice) + browser-use-terminal browser preference use local # your real, logged-in Chrome + browser-use-terminal browser preference use cloud # Browser Use cloud (needs BROWSER_USE_API_KEY) + ``` + + For `local`, Chrome needs its one-time remote-debugging opt-in: open `chrome://inspect/#remote-debugging` and tick "Allow remote debugging for this browser instance" (`browser-use-terminal browser local setup` walks the user through it). + +## How it stays stateful across bash calls + +Each CLI invocation is a one-shot process, but the browser is not: + +- **local** — your Chrome keeps running; each call rediscovers it via `DevToolsActivePort`. +- **managed** — the CLI launches Chromium with a stable per-session profile and a marker file (`/external-browser/managed//`), leaves it running on exit, and reattaches on the next call. Stop it with `browser-use-terminal browser recover stop-owned-browser`. +- **cloud** — the created browser's id/CDP URL is recorded (`/external-browser/cloud/.json`) and reattached until it stops or times out. Stop it (and billing) with `browser-use-terminal browser recover stop-owned-remote`. + +Page/tab state therefore persists between `browser exec` calls; Python variables do not (each exec is a fresh interpreter). `--session ` gives parallel workstreams isolated artifact dirs, event logs, and managed browsers. + +Everything an assistant does is recorded in the same SQLite event log the TUI uses — inspect with `browser-use-terminal events browser-cli-` or `browser-use-terminal sessions list`. + +## How assistants see screenshots + +Bash output is text-only, so images travel as files: `capture_screenshot()` saves a PNG (downscaled to ≤1800 px per side for this surface) and the CLI prints `Screenshot saved to `. The skill then tells each assistant to use its native image-reading tool: + +| Assistant | Tool | Notes | +|---|---|---| +| Claude Code | `Read` on the path | reads PNG/JPG natively | +| Codex CLI | `view_image` with `{"path": ...}` | resizes >2048 px itself; enabled by default | +| OpenCode | `read` on the path | image support since Oct 2025; needs a vision model | +| Gemini CLI | `read_file` on the path | returns inline image data | +| Cursor CLI | reference the path | the agent reads image files automatically | + +Non-vision models skip screenshots and work from text state (`page_info()`, `js(...)`, `wait_for_element(...)`) — the skill spells out this fallback. + +## Credentials, navigation policy, and safety + +The external CLI surface enforces the same policies as the in-app agent on every call: + +- Secrets/TOTP stored via `browser-use-terminal secrets ...` are available to scripts only as placeholders (`type_text("name")`); raw values never reach the assistant. +- The domain allow/deny policy (`browser-use-terminal domains ...`) guards `Page.navigate` at the Rust layer. +- Blocked states surface as `needs-user-action` JSON with a `user_prompt` the assistant is instructed to relay verbatim instead of guessing. + +## Files + +- `SKILL.md` (repo root) — the assistant-facing skill, embedded into the binary at build time and written out by `skill install`. +- `crates/browser-use-cli` — `browser` / `skill` subcommands. +- `crates/browser-use-agent/src/tools/handlers/browser.rs` — `run_external_browser_command` / `run_external_browser_script`, the blocking entry points that reuse the in-session preference resolution, auto-connect, security policy, and event persistence. +- `crates/browser-use-browser` — persistent managed/cloud browser reattach (`BROWSER_USE_TERMINAL_PERSIST_BROWSERS`, `BU_EXTERNAL_BROWSER_STATE_DIR`). diff --git a/prompts/browser-tool-description.md b/prompts/browser-tool-description.md index 3d24b0d4..0222d490 100644 --- a/prompts/browser-tool-description.md +++ b/prompts/browser-tool-description.md @@ -93,6 +93,7 @@ Recovery: - `browser recover reattach-same-target`: attaches a fresh CDP session to the same target id. If the target is gone, it reports available targets and does not silently switch. - `browser recover restart-runtime`: resets the Rust connection holder and reconnects to the same endpoint. It does not kill Chrome. - `browser recover restart-owned-browser`: restarts only Rust-owned managed browsers. +- `browser recover stop-owned-browser`: stops only Rust-owned managed browsers (including persistent managed browsers reattached across CLI invocations). - `browser recover stop-owned-remote`: stops only Rust-owned Browser Use cloud browsers. Commands: @@ -134,6 +135,7 @@ browser recover reconnect-websocket browser recover reattach-same-target browser recover restart-runtime browser recover restart-owned-browser +browser recover stop-owned-browser browser recover stop-owned-remote browser script runs --json From 5978722fa9506991e2cba4351b55510757da245a Mon Sep 17 00:00:00 2001 From: Laith Weinberger Date: Wed, 10 Jun 2026 11:19:45 -0700 Subject: [PATCH 2/6] auto agent setup --- AGENT_SETUP.md | 151 ++++++++++++++++++++++++++++++++++++++ SKILL.md | 2 + docs/assistant-plugins.md | 2 + 3 files changed, 155 insertions(+) create mode 100644 AGENT_SETUP.md diff --git a/AGENT_SETUP.md b/AGENT_SETUP.md new file mode 100644 index 00000000..6eb898f1 --- /dev/null +++ b/AGENT_SETUP.md @@ -0,0 +1,151 @@ +# Browser Use Terminal — agent setup + +You are (probably) a coding agent reading this because a user pasted +`https://browser-use.com/skill` into the chat. Follow this page top to bottom. +It installs Browser Use Terminal, registers its skill with you, connects a +browser, and verifies everything — then you can browse the web for your user. +Every step is idempotent; re-run safely. Tell the user what you're doing as +you go, and stop to ask only where this page says to. + +What you get: a `browser-use-terminal` CLI you drive from shell commands — +run Python against a real browser (navigate, click, type, screenshot, extract), +or delegate whole browsing tasks to its built-in agent. + +## 1. Preflight + +```bash +uname -s # Darwin or Linux. Windows: use WSL; native Windows is not supported yet. +command -v curl || command -v wget +command -v tar +python3 --version # need 3.9+ +``` + +If `python3` is missing: macOS → `xcode-select --install` (or `brew install python3`); +Debian/Ubuntu → `sudo apt-get install -y python3`; Fedora → `sudo dnf install -y python3`. +If you cannot install it, ask the user to. + +## 2. Install Browser Use Terminal + +```bash +curl -fsSL https://browser-use.com/terminal/install.sh | sh +export PATH="$HOME/.local/bin:$PATH" # the installer edits the shell profile, but YOUR current shell needs this +browser-use-terminal --version # verify +``` + +Notes: no sudo needed; binaries land in `~/.local/bin` +(`browser`, `browser-use`, `browser-use-terminal`, `but`); state lives in +`~/.browser-use-terminal`. Already installed? The same command updates it. +Add the `export PATH` line to every later shell command in this session, or +rely on the profile in new shells. + +## 3. Register the skill with yourself + +```bash +browser-use-terminal skill install +``` + +This writes the skill (full usage instructions for you) into every detected +assistant home: `~/.claude/skills/`, `~/.codex/skills/`, +`~/.config/opencode/skills/`, `~/.agents/skills/`. Claude Code, Codex, +OpenCode, and anything reading those conventions will auto-discover it in new +sessions. + +If you are a different agent (none of those directories exist, or you know +you don't read them): run `browser-use-terminal skill show` and persist its +output wherever you read durable instructions (e.g. your `AGENTS.md`, rules +file, or memory). If you can't persist anything, just read it now — this +session can proceed either way. + +## 4. Connect a browser (pick one) + +**Default — zero user interaction (recommended to start):** + +```bash +browser-use-terminal browser preference use managed-headless +``` + +This uses a disposable headless Chromium that persists across your commands. +It requires a Chromium-family binary on the machine (Chrome, Chromium, or a +Playwright Chromium). Check: `ls "/Applications/Google Chrome.app" 2>/dev/null || command -v google-chrome chromium chromium-browser`. +If none exists: ask the user to install Chrome (or set `CHROME_PATH=/path/to/chrome`). + +**The user's real, logged-in Chrome** (for tasks needing their accounts): + +```bash +browser-use-terminal browser preference use local +browser-use-terminal browser connect local +``` + +One-time user action required: they must open `chrome://inspect/#remote-debugging` +in Chrome and tick "Allow remote debugging for this browser instance". +`browser-use-terminal browser local setup` walks them through it. Relay any +`needs-user-action` JSON (`user_prompt` field) to the user verbatim and wait. + +**Cloud browser** (headless server / clean remote IP; needs an API key from +https://cloud.browser-use.com): + +```bash +export BROWSER_USE_API_KEY=bu_... +browser-use-terminal browser preference use cloud +``` + +Cloud browsers bill until stopped (`browser-use-terminal browser recover stop-owned-remote`). + +## 5. Verify end to end + +```bash +browser-use-terminal browser exec <<'PY' +new_tab("https://example.com") +wait_for_load() +print(page_info()["title"]) +print(capture_screenshot()) +PY +``` + +Expected: it prints `Example Domain` and `Screenshot saved to `. +Now view that screenshot with your own file-reading tool to confirm you can +see pages: Claude Code → `Read` the path; Codex → `view_image` with the path; +OpenCode → `read` the path; Gemini CLI → `read_file`; otherwise use whatever +tool you have that displays a local image. If your model can't view images, +skip this — you'll work from text state (`page_info()`, `js(...)`) instead. + +If this step worked, setup is complete. Tell the user you're ready to browse. + +## 6. Using it (crash course) + +Full instructions are in the skill you installed (or `browser-use-terminal skill show`). +The essentials: + +- Run Python against the live browser with heredocs. Helpers are pre-imported: + `new_tab(url)`, `goto_url(url)`, `page_info()`, `click_at_xy(x, y)`, + `type_text(text)`, `press_key("Enter")`, `fill_input(sel, text)`, `scroll()`, + `wait_for_load()`, `wait_for_element(sel)`, `capture_screenshot()`, `js(expr)`, + `cdp(method, **params)`, `http_get(url)`. +- The browser persists between commands; Python variables do not. +- Screenshot-first workflow: screenshot → read coordinates off the image → + `click_at_xy` (CSS pixels — divide image coordinates by + `js("window.devicePixelRatio")`) → screenshot again to verify. +- In the user's real Chrome, open work in `new_tab(...)`, never `goto_url` over + their active tab. +- Auth walls: stop and ask the user. Never type credentials read off a screenshot. +- Delegate a whole task to the built-in browser agent (needs model auth — + `browser-use-terminal auth` or provider API keys): + `browser-use-terminal start "find the 3 cheapest direct LAX->SFO flights Friday"`. +- Parallel workstreams: add `--session ` to `browser` commands. +- Done for the day: `browser-use-terminal browser recover stop-owned-browser` + (managed) or `... stop-owned-remote` (cloud). + +## Troubleshooting + +- `command not found` → re-run `export PATH="$HOME/.local/bin:$PATH"`. +- `browser is not connected` → `browser-use-terminal browser connect` (uses the + remembered preference) or re-run step 4. +- Anything returning `status: "needs-user-action"` → show its `user_prompt` to + the user exactly, wait, then retry. +- Diagnostics: `browser-use-terminal browser doctor` and + `browser-use-terminal browser status --json`. +- Slow first command in each shell: the launcher checks for updates; set + `BUT_AUTO_UPDATE=0` to skip. + +Full documentation: https://docs.browser-use.com/open-source/browser-use-terminal +Source & issues: https://github.com/browser-use/terminal diff --git a/SKILL.md b/SKILL.md index ac7e42bb..4eaf65d0 100644 --- a/SKILL.md +++ b/SKILL.md @@ -5,6 +5,8 @@ description: Direct browser control via the Browser Use Terminal CLI. Use when t # Browser Use Terminal +For setup, install, or connection problems, read https://browser-use.com/skill (agent setup instructions) or https://docs.browser-use.com/open-source/browser-use-terminal (full docs). + One CLI (`browser-use-terminal`, on $PATH), two surfaces: - **Browser management** — drive the browser yourself. `browser-use-terminal browser exec` runs Python with browser helpers pre-imported; `browser-use-terminal browser ` is the control plane (status, connect, profiles, recovery). diff --git a/docs/assistant-plugins.md b/docs/assistant-plugins.md index 5d841e6f..ccb477f3 100644 --- a/docs/assistant-plugins.md +++ b/docs/assistant-plugins.md @@ -2,6 +2,8 @@ Browser Use Terminal plugs into any coding assistant or agent that can run shell commands — Claude Code, Codex, OpenCode, OpenClaw, Cursor CLI, and friends. The model is similar to [browser-use/browser-harness](https://github.com/browser-use/browser-harness): a skill file teaches the assistant the CLI, and the CLI gives it the full browser runtime (connect/recovery control plane, Python page helpers, screenshots-as-files). +Fastest setup: paste `https://browser-use.com/skill` into your assistant, and it provides instructions on how to install, register the skill, connect a browser, and verify. Full docs: . + Two surfaces, one binary: - **Browser management** — the assistant drives the browser itself: From 56efca4af8a7ca1a8131f37f20b5cf80b801bfad Mon Sep 17 00:00:00 2001 From: Laith Weinberger Date: Wed, 10 Jun 2026 13:27:38 -0700 Subject: [PATCH 3/6] feat(cli): implement background daemon for browser commands, and improve documentation --- AGENT_SETUP.md | 3 + SKILL.md | 3 +- .../src/tools/handlers/browser.rs | 13 + crates/browser-use-cli/src/main.rs | 680 ++++++++++++++++-- docs/assistant-plugins.md | 10 +- 5 files changed, 642 insertions(+), 67 deletions(-) diff --git a/AGENT_SETUP.md b/AGENT_SETUP.md index 6eb898f1..73667c61 100644 --- a/AGENT_SETUP.md +++ b/AGENT_SETUP.md @@ -144,6 +144,9 @@ The essentials: the user exactly, wait, then retry. - Diagnostics: `browser-use-terminal browser doctor` and `browser-use-terminal browser status --json`. +- A background daemon holds the browser connection between your commands + (auto-started). If it misbehaves: `browser-use-terminal browser daemon status`, + `... daemon logs`, `... daemon stop` (next command restarts it and reattaches). - Slow first command in each shell: the launcher checks for updates; set `BUT_AUTO_UPDATE=0` to skip. diff --git a/SKILL.md b/SKILL.md index 4eaf65d0..ff9daeb1 100644 --- a/SKILL.md +++ b/SKILL.md @@ -81,9 +81,10 @@ browser-use-terminal browser doctor browser-use-terminal browser recover reconnect-websocket browser-use-terminal browser recover stop-owned-browser # stop the persistent managed browser browser-use-terminal browser recover stop-owned-remote # stop the cloud browser (stops billing) +browser-use-terminal browser daemon status|stop|logs # the background daemon holding the connection ``` -Managed and cloud browsers persist across invocations — later calls reattach instead of relaunching. Stop them with the recover commands above when the user is done (cloud browsers bill until stopped or timed out). +A background daemon (auto-started, one per state dir) holds the CDP connection across your commands, so the browser — and in local mode, Chrome's granted debugging permission — persists between invocations. Managed and cloud browsers also survive daemon restarts; later calls reattach instead of relaunching. Stop browsers with the recover commands above when the user is done (cloud browsers bill until stopped or timed out). - `exec` auto-connects, so you rarely need these. Reach for them when `status` shows a problem or the user asks for a specific browser. - If output JSON says `status: "needs-user-action"` (e.g. pick a Chrome profile, click Allow in Chrome's permission popup, enable the remote-debugging checkbox), show the `user_prompt` to the user verbatim and wait — do not guess. diff --git a/crates/browser-use-agent/src/tools/handlers/browser.rs b/crates/browser-use-agent/src/tools/handlers/browser.rs index c5f7b0ce..8e50c4fb 100644 --- a/crates/browser-use-agent/src/tools/handlers/browser.rs +++ b/crates/browser-use-agent/src/tools/handlers/browser.rs @@ -581,6 +581,19 @@ pub(crate) fn browser_command_is_passive(words: &[&str]) -> bool { words, ["browser", "status", ..] | ["status", ..] + // Read-only / informational commands must never trigger an + // auto-connect: `help`/`doctor`/`domain` are the first things an + // external assistant runs to orient itself. + | ["browser", "help", ..] + | ["help", ..] + | ["browser", "--help", ..] + | ["--help", ..] + | ["browser", "-h", ..] + | ["-h", ..] + | ["browser", "doctor", ..] + | ["doctor", ..] + | ["browser", "domain", ..] + | ["domain", ..] | ["browser", "connect", ..] | ["connect", ..] | ["browser", "local", "list", ..] diff --git a/crates/browser-use-cli/src/main.rs b/crates/browser-use-cli/src/main.rs index 62617e9a..d1f5fe80 100644 --- a/crates/browser-use-cli/src/main.rs +++ b/crates/browser-use-cli/src/main.rs @@ -352,6 +352,11 @@ enum Command { #[arg(trailing_var_arg = true, allow_hyphen_values = true)] args: Vec, }, + /// Internal: long-lived daemon backing the `browser` subcommand. Holds the + /// CDP connection so Chrome's per-connection permission prompt fires once, + /// not on every one-shot CLI invocation. Spawned automatically. + #[command(name = "browser-daemon", hide = true)] + BrowserDaemon, /// Print or install the assistant-facing skill (SKILL.md) that teaches /// coding assistants how to use the `browser` subcommand. Skill { @@ -1034,6 +1039,7 @@ fn main() -> Result<()> { json, args: browser_args, } => browser_cli(store, &session, timeout, json, browser_args), + Command::BrowserDaemon => browser_external_daemon(store), Command::Skill { command } => skill(command), Command::SyncCookies { profile, @@ -1325,6 +1331,7 @@ fn command_name(command: &Command) -> &'static str { Command::Python { .. } => "python", Command::BrowserScript { .. } => "browser_script", Command::Browser { .. } => "browser", + Command::BrowserDaemon => "browser_daemon", Command::Skill { .. } => "skill", Command::SyncCookies { .. } => "sync_cookies", Command::UserShell { .. } => "user_shell", @@ -3315,9 +3322,18 @@ const EXTERNAL_BROWSER_SESSION_PREFIX: &str = "browser-cli-"; /// reject images above ~2000 px per side (Codex `view_image` caps at 2048). const EXTERNAL_SCREENSHOT_MAX_DIM: &str = "1800"; +/// One parsed `browser` CLI invocation: either Python to exec or a +/// control-plane command string. +enum ExternalBrowserAction { + Exec { code: String }, + Command { command: String }, +} + /// `browser-use-terminal browser ...`: browser management for external coding -/// assistants. One-shot invocations run against a durable named session; the -/// browser itself (user Chrome / managed / cloud) persists between calls. +/// assistants. One-shot invocations run against a durable named session and +/// route through a long-lived per-state-dir daemon that holds the CDP +/// connection (so Chrome's per-connection permission prompt fires once, like +/// the TUI), falling back to in-process execution if the daemon can't start. fn browser_cli( store: Store, session: &str, @@ -3328,6 +3344,7 @@ fn browser_cli( let session_name = sanitize_external_session_name(session)?; let session_id = format!("{EXTERNAL_BROWSER_SESSION_PREFIX}{session_name}"); let cwd = std::env::current_dir().context("resolve current dir")?; + let state_dir = store.state_dir().to_path_buf(); let task = match store.load_session(&session_id)? { Some(task) => task, None => { @@ -3341,19 +3358,6 @@ fn browser_cli( } }; let artifact_dir = PathBuf::from(&task.artifact_root); - // One-shot invocations must not tear down Rust-owned browsers on exit: - // managed Chromium / cloud browsers persist and later calls reattach - // (browser-harness daemon semantics, minus the daemon). - unsafe { - std::env::set_var("BROWSER_USE_TERMINAL_PERSIST_BROWSERS", "1"); - if std::env::var_os("BU_EXTERNAL_BROWSER_STATE_DIR").is_none() { - std::env::set_var( - "BU_EXTERNAL_BROWSER_STATE_DIR", - store.state_dir().join("external-browser"), - ); - } - } - let shared: SharedStore = Arc::new(Mutex::new(store)); let mut argv = args; if argv.first().map(String::as_str) == Some("browser") { @@ -3363,7 +3367,11 @@ fn browser_cli( argv.push("help".to_string()); } - if argv.first().map(String::as_str) == Some("exec") { + if argv.first().map(String::as_str) == Some("daemon") { + return browser_external_daemon_control(&state_dir, &argv); + } + + let action = if argv.first().map(String::as_str) == Some("exec") { let code = if argv.len() == 1 || (argv.len() == 2 && argv[1] == "-") { let mut buffer = String::new(); io::stdin() @@ -3376,64 +3384,601 @@ fn browser_cli( if code.trim().is_empty() { bail!("no Python code provided: pass it as an argument or on stdin (heredoc)"); } + ExternalBrowserAction::Exec { code } + } else { + ExternalBrowserAction::Command { + command: join_browser_command_words(&argv), + } + }; + + if external_browser_daemon_enabled() { + match ensure_external_browser_daemon(&state_dir) { + Ok(socket) => { + drop(store); + return browser_cli_via_daemon( + &socket, + &session_id, + &cwd, + &artifact_dir, + timeout, + json, + action, + ); + } + Err(error) => { + eprintln!( + "warning: browser daemon unavailable ({error:#}); running in-process. \ + Local Chrome may re-prompt its debugging permission." + ); + } + } + } + apply_external_browser_env_defaults(&state_dir); + let shared: SharedStore = Arc::new(Mutex::new(store)); + browser_cli_in_process( + &shared, + &session_id, + &cwd, + &artifact_dir, + timeout, + json, + action, + ) +} + +/// Persistence + screenshot env defaults shared by the daemon and the +/// in-process fallback: Rust-owned browsers must survive process exit so +/// later invocations (or a restarted daemon) reattach instead of relaunching. +fn apply_external_browser_env_defaults(state_dir: &Path) { + unsafe { + std::env::set_var("BROWSER_USE_TERMINAL_PERSIST_BROWSERS", "1"); + if std::env::var_os("BU_EXTERNAL_BROWSER_STATE_DIR").is_none() { + std::env::set_var( + "BU_EXTERNAL_BROWSER_STATE_DIR", + state_dir.join("external-browser"), + ); + } if std::env::var_os("BU_BROWSER_SCREENSHOT_MAX_DIM").is_none() && std::env::var_os("BROWSER_USE_SCREENSHOT_MAX_DIM").is_none() { - unsafe { - std::env::set_var("BU_BROWSER_SCREENSHOT_MAX_DIM", EXTERNAL_SCREENSHOT_MAX_DIM); + std::env::set_var("BU_BROWSER_SCREENSHOT_MAX_DIM", EXTERNAL_SCREENSHOT_MAX_DIM); + } + } +} + +fn external_browser_daemon_enabled() -> bool { + if !cfg!(unix) { + return false; + } + match std::env::var("BUT_BROWSER_EXTERNAL_DAEMON") { + Ok(value) => !matches!( + value.trim().to_ascii_lowercase().as_str(), + "0" | "false" | "no" | "off" + ), + Err(_) => true, + } +} + +#[allow(clippy::too_many_arguments)] +fn browser_cli_in_process( + shared: &SharedStore, + session_id: &str, + cwd: &Path, + artifact_dir: &Path, + timeout: u64, + json: bool, + action: ExternalBrowserAction, +) -> Result<()> { + match action { + ExternalBrowserAction::Exec { code } => { + let outcome = browser_use_agent::tools::handlers::browser::run_external_browser_script( + shared, + session_id, + cwd, + artifact_dir, + &code, + timeout, + )?; + match outcome { + browser_use_agent::tools::handlers::browser::ExternalBrowserScriptOutcome::Blocked( + content, + ) => print_external_blocked(&content), + browser_use_agent::tools::handlers::browser::ExternalBrowserScriptOutcome::Ran( + out, + ) => print_external_exec_result(&out, json), } } - let outcome = browser_use_agent::tools::handlers::browser::run_external_browser_script( - &shared, - &session_id, - &cwd, - &artifact_dir, - &code, - timeout, - )?; - match outcome { - browser_use_agent::tools::handlers::browser::ExternalBrowserScriptOutcome::Blocked( - content, - ) => { - println!("{}", serde_json::to_string_pretty(&content)?); - bail!("browser needs user action before scripts can run (see JSON above)"); + ExternalBrowserAction::Command { command } => { + let out = browser_use_agent::tools::handlers::browser::run_external_browser_command( + shared, + session_id, + cwd, + artifact_dir, + &command, + )?; + print_external_command_content(&out.content); + Ok(()) + } + } +} + +fn print_external_blocked(content: &Value) -> Result<()> { + println!("{}", serde_json::to_string_pretty(content)?); + bail!("browser needs user action before scripts can run (see JSON above)") +} + +fn print_external_exec_result( + out: &browser_use_browser::BrowserScriptOutput, + json: bool, +) -> Result<()> { + if json { + println!( + "{}", + serde_json::to_string_pretty(&serde_json::to_value(out)?)? + ); + } else { + print_external_script_output(out); + } + if !out.ok { + bail!( + "{}", + out.error + .clone() + .unwrap_or_else(|| "browser_script failed".to_string()) + ); + } + Ok(()) +} + +fn print_external_command_content(content: &Value) { + match content { + Value::String(text) => println!("{text}"), + other => println!( + "{}", + serde_json::to_string_pretty(other).unwrap_or_else(|_| other.to_string()) + ), + } +} + +// ============================================================================ +// External browser daemon +// ============================================================================ +// +// The `browser` subcommand is invoked once per bash call by external +// assistants, but a CDP attachment to the user's real Chrome must outlive any +// single invocation: Chrome shows its debugging-permission popup per new +// connection, and the in-process connection caches only help within one +// process. So the CLI is a thin client over a per-state-dir unix-socket daemon +// (browser-harness architecture) that hosts the browser session registries — +// and therefore the live CDP websocket — across invocations. + +#[derive(Debug, Serialize, serde::Deserialize)] +#[serde(tag = "kind", rename_all = "snake_case")] +enum ExternalBrowserDaemonRequest { + Ping, + Shutdown, + Command { + session_id: String, + cwd: PathBuf, + artifact_dir: PathBuf, + command: String, + }, + Exec { + session_id: String, + cwd: PathBuf, + artifact_dir: PathBuf, + code: String, + timeout_secs: u64, + }, +} + +#[derive(Debug, Default, Serialize, serde::Deserialize)] +struct ExternalBrowserDaemonResponse { + ok: bool, + #[serde(default, skip_serializing_if = "Option::is_none")] + version: Option, + #[serde(default, skip_serializing_if = "Option::is_none")] + error: Option, + #[serde(default, skip_serializing_if = "Option::is_none")] + content: Option, + #[serde(default, skip_serializing_if = "Option::is_none")] + script: Option, + #[serde(default, skip_serializing_if = "Option::is_none")] + blocked: Option, +} + +fn fnv1a_hash(bytes: &[u8]) -> u64 { + let mut hash: u64 = 0xcbf2_9ce4_8422_2325; + for byte in bytes { + hash ^= u64::from(*byte); + hash = hash.wrapping_mul(0x0000_0100_0000_01b3); + } + hash +} + +/// Socket lives in the temp dir (AF_UNIX paths are length-limited on macOS), +/// keyed by the canonical state dir so isolated `--state-dir` runs get their +/// own daemon. The log lives inside the state dir. +fn external_daemon_socket_path(state_dir: &Path) -> PathBuf { + let canonical = fs::canonicalize(state_dir).unwrap_or_else(|_| state_dir.to_path_buf()); + let hash = fnv1a_hash(canonical.to_string_lossy().as_bytes()); + std::env::temp_dir().join(format!("but-browser-{hash:016x}.sock")) +} + +fn external_daemon_log_path(state_dir: &Path) -> PathBuf { + state_dir.join("external-browser").join("daemon.log") +} + +#[cfg(unix)] +fn external_daemon_send( + socket: &Path, + request: &ExternalBrowserDaemonRequest, + read_timeout: Duration, +) -> Result { + use std::os::unix::net::UnixStream; + let stream = UnixStream::connect(socket) + .with_context(|| format!("connect browser daemon socket {}", socket.display()))?; + stream.set_write_timeout(Some(Duration::from_secs(10)))?; + stream.set_read_timeout(Some(read_timeout))?; + let mut payload = serde_json::to_vec(request)?; + payload.push(b'\n'); + (&stream).write_all(&payload)?; + let mut line = String::new(); + io::BufReader::new(&stream) + .read_line(&mut line) + .context("read browser daemon response")?; + serde_json::from_str(&line).context("parse browser daemon response") +} + +#[cfg(not(unix))] +fn external_daemon_send( + _socket: &Path, + _request: &ExternalBrowserDaemonRequest, + _read_timeout: Duration, +) -> Result { + bail!("the external browser daemon is only supported on unix") +} + +/// Ping the daemon; spawn it if missing; restart it on version mismatch so a +/// CLI update never talks to a stale daemon. Returns the socket path. +fn ensure_external_browser_daemon(state_dir: &Path) -> Result { + let socket = external_daemon_socket_path(state_dir); + let version = env!("CARGO_PKG_VERSION"); + if let Ok(response) = + external_daemon_send(&socket, &ExternalBrowserDaemonRequest::Ping, PING_TIMEOUT) + { + if response.ok && response.version.as_deref() == Some(version) { + return Ok(socket); + } + let _ = external_daemon_send( + &socket, + &ExternalBrowserDaemonRequest::Shutdown, + PING_TIMEOUT, + ); + let gone_deadline = Instant::now() + Duration::from_secs(5); + while Instant::now() < gone_deadline { + if external_daemon_send(&socket, &ExternalBrowserDaemonRequest::Ping, PING_TIMEOUT) + .is_err() + { + break; } - browser_use_agent::tools::handlers::browser::ExternalBrowserScriptOutcome::Ran(out) => { - if json { - println!( - "{}", - serde_json::to_string_pretty(&serde_json::to_value(&out)?)? - ); - } else { - print_external_script_output(&out); - } - if !out.ok { - bail!( - "{}", - out.error - .unwrap_or_else(|| "browser_script failed".to_string()) - ); - } - Ok(()) + thread::sleep(Duration::from_millis(100)); + } + } + let _ = fs::remove_file(&socket); + + let log_path = external_daemon_log_path(state_dir); + if let Some(parent) = log_path.parent() { + fs::create_dir_all(parent) + .with_context(|| format!("create daemon log dir {}", parent.display()))?; + } + let log = fs::OpenOptions::new() + .create(true) + .append(true) + .open(&log_path) + .with_context(|| format!("open daemon log {}", log_path.display()))?; + let exe = std::env::current_exe().context("resolve current executable")?; + let mut command = std::process::Command::new(exe); + command + .arg("--state-dir") + .arg(state_dir) + .arg("browser-daemon") + .stdin(std::process::Stdio::null()) + .stdout(log.try_clone().context("clone daemon log handle")?) + .stderr(log); + #[cfg(unix)] + { + // New process group: the daemon must survive the assistant's shell + // (and any group-wide interrupt) ending this CLI invocation. + std::os::unix::process::CommandExt::process_group(&mut command, 0); + } + command.spawn().context("spawn browser daemon")?; + + let ready_deadline = Instant::now() + Duration::from_secs(10); + while Instant::now() < ready_deadline { + if let Ok(response) = + external_daemon_send(&socket, &ExternalBrowserDaemonRequest::Ping, PING_TIMEOUT) + { + if response.ok { + return Ok(socket); } } - } else { - let command = join_browser_command_words(&argv); - let out = browser_use_agent::tools::handlers::browser::run_external_browser_command( - &shared, - &session_id, - &cwd, - &artifact_dir, - &command, - )?; - match &out.content { - Value::String(text) => println!("{text}"), - other => println!("{}", serde_json::to_string_pretty(other)?), + thread::sleep(Duration::from_millis(100)); + } + bail!( + "browser daemon did not become ready; see {}", + log_path.display() + ) +} + +const PING_TIMEOUT: Duration = Duration::from_secs(3); +/// Generous request timeout: `browser connect local` can legitimately wait on +/// the user clicking Allow in Chrome's permission popup. +const EXTERNAL_DAEMON_COMMAND_TIMEOUT: Duration = Duration::from_secs(900); + +#[allow(clippy::too_many_arguments)] +fn browser_cli_via_daemon( + socket: &Path, + session_id: &str, + cwd: &Path, + artifact_dir: &Path, + timeout: u64, + json: bool, + action: ExternalBrowserAction, +) -> Result<()> { + match action { + ExternalBrowserAction::Exec { code } => { + let response = external_daemon_send( + socket, + &ExternalBrowserDaemonRequest::Exec { + session_id: session_id.to_string(), + cwd: cwd.to_path_buf(), + artifact_dir: artifact_dir.to_path_buf(), + code, + timeout_secs: timeout, + }, + Duration::from_secs(timeout).saturating_add(EXTERNAL_DAEMON_COMMAND_TIMEOUT), + )?; + if let Some(blocked) = response.blocked { + return print_external_blocked(&blocked); + } + if !response.ok { + bail!( + "{}", + response + .error + .unwrap_or_else(|| "browser daemon request failed".to_string()) + ); + } + let script = response + .script + .ok_or_else(|| anyhow::anyhow!("browser daemon returned no script output"))?; + print_external_exec_result(&script, json) + } + ExternalBrowserAction::Command { command } => { + let response = external_daemon_send( + socket, + &ExternalBrowserDaemonRequest::Command { + session_id: session_id.to_string(), + cwd: cwd.to_path_buf(), + artifact_dir: artifact_dir.to_path_buf(), + command, + }, + EXTERNAL_DAEMON_COMMAND_TIMEOUT, + )?; + if !response.ok { + bail!( + "{}", + response + .error + .unwrap_or_else(|| "browser daemon request failed".to_string()) + ); + } + let content = response.content.unwrap_or(Value::Null); + print_external_command_content(&content); + Ok(()) } - Ok(()) } } +/// `browser daemon status|stop|logs` — operate the daemon itself. +fn browser_external_daemon_control(state_dir: &Path, argv: &[String]) -> Result<()> { + let socket = external_daemon_socket_path(state_dir); + match argv.get(1).map(String::as_str) { + Some("status") | None => { + match external_daemon_send(&socket, &ExternalBrowserDaemonRequest::Ping, PING_TIMEOUT) { + Ok(response) if response.ok => println!( + "running (version {}, socket {})", + response.version.as_deref().unwrap_or("unknown"), + socket.display() + ), + _ => println!("not running (socket {})", socket.display()), + } + Ok(()) + } + Some("stop") => { + match external_daemon_send( + &socket, + &ExternalBrowserDaemonRequest::Shutdown, + PING_TIMEOUT, + ) { + Ok(_) => println!("stopped"), + Err(_) => println!("not running"), + } + Ok(()) + } + Some("logs") => { + let log_path = external_daemon_log_path(state_dir); + let contents = fs::read_to_string(&log_path) + .with_context(|| format!("read {}", log_path.display()))?; + let lines: Vec<&str> = contents.lines().collect(); + let start = lines.len().saturating_sub(100); + for line in &lines[start..] { + println!("{line}"); + } + Ok(()) + } + Some(other) => bail!("unknown browser daemon command: {other} (use status|stop|logs)"), + } +} + +/// Daemon main loop: serve line-delimited JSON requests over the unix socket, +/// executing them against in-process browser registries so the CDP connection +/// (and Chrome's granted debugging permission) persists across CLI calls. +#[cfg(unix)] +fn browser_external_daemon(store: Store) -> Result<()> { + use std::os::unix::fs::PermissionsExt; + use std::os::unix::net::UnixListener; + + let state_dir = store.state_dir().to_path_buf(); + apply_external_browser_env_defaults(&state_dir); + let socket = external_daemon_socket_path(&state_dir); + let _ = fs::remove_file(&socket); + let listener = UnixListener::bind(&socket) + .with_context(|| format!("bind browser daemon socket {}", socket.display()))?; + fs::set_permissions(&socket, fs::Permissions::from_mode(0o600)) + .context("restrict browser daemon socket permissions")?; + eprintln!( + "[browser-daemon] started: version={} pid={} state_dir={} socket={}", + env!("CARGO_PKG_VERSION"), + std::process::id(), + state_dir.display(), + socket.display() + ); + let shared: SharedStore = Arc::new(Mutex::new(store)); + for stream in listener.incoming() { + let stream = match stream { + Ok(stream) => stream, + Err(error) => { + eprintln!("[browser-daemon] accept failed: {error:#}"); + continue; + } + }; + match handle_external_daemon_stream(stream, &shared) { + Ok(true) => {} + Ok(false) => break, + Err(error) => eprintln!("[browser-daemon] request failed: {error:#}"), + } + } + let _ = fs::remove_file(&socket); + eprintln!("[browser-daemon] stopped"); + Ok(()) +} + +#[cfg(not(unix))] +fn browser_external_daemon(_store: Store) -> Result<()> { + bail!("the external browser daemon is only supported on unix") +} + +#[cfg(unix)] +fn handle_external_daemon_stream( + stream: std::os::unix::net::UnixStream, + shared: &SharedStore, +) -> Result { + stream.set_read_timeout(Some(Duration::from_secs(30)))?; + let mut line = String::new(); + io::BufReader::new(&stream).read_line(&mut line)?; + let request: ExternalBrowserDaemonRequest = + serde_json::from_str(&line).context("parse browser daemon request")?; + let version = Some(env!("CARGO_PKG_VERSION").to_string()); + let (response, keep_running) = match request { + ExternalBrowserDaemonRequest::Ping => ( + ExternalBrowserDaemonResponse { + ok: true, + version, + ..Default::default() + }, + true, + ), + ExternalBrowserDaemonRequest::Shutdown => ( + ExternalBrowserDaemonResponse { + ok: true, + version, + ..Default::default() + }, + false, + ), + ExternalBrowserDaemonRequest::Command { + session_id, + cwd, + artifact_dir, + command, + } => { + let response = + match browser_use_agent::tools::handlers::browser::run_external_browser_command( + shared, + &session_id, + &cwd, + &artifact_dir, + &command, + ) { + Ok(out) => ExternalBrowserDaemonResponse { + ok: true, + version, + content: Some(out.content), + ..Default::default() + }, + Err(error) => ExternalBrowserDaemonResponse { + ok: false, + version, + error: Some(format!("{error:#}")), + ..Default::default() + }, + }; + (response, true) + } + ExternalBrowserDaemonRequest::Exec { + session_id, + cwd, + artifact_dir, + code, + timeout_secs, + } => { + let response = match browser_use_agent::tools::handlers::browser::run_external_browser_script( + shared, + &session_id, + &cwd, + &artifact_dir, + &code, + timeout_secs, + ) { + Ok( + browser_use_agent::tools::handlers::browser::ExternalBrowserScriptOutcome::Ran( + out, + ), + ) => ExternalBrowserDaemonResponse { + ok: true, + version, + script: Some(out), + ..Default::default() + }, + Ok( + browser_use_agent::tools::handlers::browser::ExternalBrowserScriptOutcome::Blocked( + content, + ), + ) => ExternalBrowserDaemonResponse { + ok: true, + version, + blocked: Some(content), + ..Default::default() + }, + Err(error) => ExternalBrowserDaemonResponse { + ok: false, + version, + error: Some(format!("{error:#}")), + ..Default::default() + }, + }; + (response, true) + } + }; + stream.set_write_timeout(Some(Duration::from_secs(60)))?; + let mut payload = serde_json::to_vec(&response)?; + payload.push(b'\n'); + (&stream).write_all(&payload)?; + Ok(keep_running) +} + fn print_external_script_output(out: &browser_use_browser::BrowserScriptOutput) { let text = out.text.trim_end(); if !text.is_empty() { @@ -9749,6 +10294,17 @@ fn notify_parent_agent_done(store: &Store, task: &browser_use_protocol::SessionM mod tests { use super::*; + #[test] + fn external_daemon_socket_path_is_stable_and_short() { + let a = external_daemon_socket_path(Path::new("/tmp/state-a")); + let b = external_daemon_socket_path(Path::new("/tmp/state-a")); + let c = external_daemon_socket_path(Path::new("/tmp/state-b")); + assert_eq!(a, b, "same state dir must map to the same socket"); + assert_ne!(a, c, "different state dirs must map to different sockets"); + // AF_UNIX socket paths are limited to ~104 bytes on macOS. + assert!(a.as_os_str().len() < 100, "socket path too long: {a:?}"); + } + #[test] fn cli_config_overrides_parse_toml_and_raw_strings_like_codex() -> Result<()> { let parsed = parse_cli_config_overrides(&[ diff --git a/docs/assistant-plugins.md b/docs/assistant-plugins.md index ccb477f3..8a41a8e5 100644 --- a/docs/assistant-plugins.md +++ b/docs/assistant-plugins.md @@ -56,13 +56,15 @@ Two surfaces, one binary: ## How it stays stateful across bash calls -Each CLI invocation is a one-shot process, but the browser is not: +Each CLI invocation is a thin client over a long-lived daemon. The first `browser` command auto-spawns one daemon per state dir. It hosts the browser session registries — and therefore the live CDP websocket — across invocations. This is what keeps Chrome's per-connection debugging-permission popup to a single Allow in local mode, exactly like the TUI. The CLI and daemon speak line-delimited JSON over a unix socket (`$TMPDIR/but-browser-.sock`, mode 0600); a version handshake restarts stale daemons after CLI updates, and if the daemon can't start the CLI falls back to in-process execution with a warning. Operate it with `browser-use-terminal browser daemon status|stop|logs`. -- **local** — your Chrome keeps running; each call rediscovers it via `DevToolsActivePort`. -- **managed** — the CLI launches Chromium with a stable per-session profile and a marker file (`/external-browser/managed//`), leaves it running on exit, and reattaches on the next call. Stop it with `browser-use-terminal browser recover stop-owned-browser`. +The browsers themselves also survive daemon restarts: + +- **local** — your Chrome keeps running; the daemon holds one authorized CDP connection to it. +- **managed** — Chromium launches with a stable per-session profile and a marker file (`/external-browser/managed//`), outlives the daemon, and is reattached. Stop it with `browser-use-terminal browser recover stop-owned-browser`. - **cloud** — the created browser's id/CDP URL is recorded (`/external-browser/cloud/.json`) and reattached until it stops or times out. Stop it (and billing) with `browser-use-terminal browser recover stop-owned-remote`. -Page/tab state therefore persists between `browser exec` calls; Python variables do not (each exec is a fresh interpreter). `--session ` gives parallel workstreams isolated artifact dirs, event logs, and managed browsers. +Page/tab state therefore persists between `browser exec` calls; Python variables do not (each exec is a fresh interpreter). `--session ` gives parallel workstreams isolated artifact dirs, event logs, and managed browsers (requests share one daemon and run serially). Everything an assistant does is recorded in the same SQLite event log the TUI uses — inspect with `browser-use-terminal events browser-cli-` or `browser-use-terminal sessions list`. From 439d23d5caed28c85afcf563681937653942b833 Mon Sep 17 00:00:00 2001 From: Laith Weinberger Date: Wed, 10 Jun 2026 18:21:32 -0700 Subject: [PATCH 4/6] drop special terminal agent when being used by another agent --- AGENT_SETUP.md | 7 +- SKILL.md | 19 +-- crates/browser-use-cli/src/main.rs | 204 +++++++++++++++++++++++++++-- docs/assistant-plugins.md | 25 ++-- 4 files changed, 207 insertions(+), 48 deletions(-) diff --git a/AGENT_SETUP.md b/AGENT_SETUP.md index 73667c61..90169d7a 100644 --- a/AGENT_SETUP.md +++ b/AGENT_SETUP.md @@ -8,8 +8,8 @@ Every step is idempotent; re-run safely. Tell the user what you're doing as you go, and stop to ask only where this page says to. What you get: a `browser-use-terminal` CLI you drive from shell commands — -run Python against a real browser (navigate, click, type, screenshot, extract), -or delegate whole browsing tasks to its built-in agent. +run Python against a real browser (navigate, click, type, screenshot, extract). +You are the agent; you drive the browser. ## 1. Preflight @@ -128,9 +128,6 @@ The essentials: - In the user's real Chrome, open work in `new_tab(...)`, never `goto_url` over their active tab. - Auth walls: stop and ask the user. Never type credentials read off a screenshot. -- Delegate a whole task to the built-in browser agent (needs model auth — - `browser-use-terminal auth` or provider API keys): - `browser-use-terminal start "find the 3 cheapest direct LAX->SFO flights Friday"`. - Parallel workstreams: add `--session ` to `browser` commands. - Done for the day: `browser-use-terminal browser recover stop-owned-browser` (managed) or `... stop-owned-remote` (cloud). diff --git a/SKILL.md b/SKILL.md index ff9daeb1..1405657b 100644 --- a/SKILL.md +++ b/SKILL.md @@ -1,16 +1,13 @@ --- name: browser-use-terminal -description: Direct browser control via the Browser Use Terminal CLI. Use when the user wants to automate, scrape, test, or interact with web pages — drive the browser yourself with Python helpers, or delegate a whole browsing task to the built-in browser agent. +description: Direct browser control via the Browser Use Terminal CLI. Use when the user wants to automate, scrape, test, or interact with web pages — you drive the browser yourself with Python helpers. --- # Browser Use Terminal -For setup, install, or connection problems, read https://browser-use.com/skill (agent setup instructions) or https://docs.browser-use.com/open-source/browser-use-terminal (full docs). +Direct browser control via CDP — you are the agent; you drive the browser. For setup, install, or connection problems, read https://browser-use.com/skill (agent setup instructions) or https://docs.browser-use.com/open-source/browser-use-terminal (full docs). -One CLI (`browser-use-terminal`, on $PATH), two surfaces: - -- **Browser management** — drive the browser yourself. `browser-use-terminal browser exec` runs Python with browser helpers pre-imported; `browser-use-terminal browser ` is the control plane (status, connect, profiles, recovery). -- **Core agent** — delegate an entire browsing task: `browser-use-terminal start ""`. +`browser-use-terminal browser exec` runs Python with browser helpers pre-imported; `browser-use-terminal browser ` is the control plane (status, connect, profiles, recovery). ## Usage @@ -91,16 +88,6 @@ A background daemon (auto-started, one per state dir) holds the CDP connection a - Auth wall mid-task: stop and ask the user. Don't type credentials from screenshots; use stored secrets if available. - Connecting to the user's real Chrome requires a one-time setup: `chrome://inspect/#remote-debugging` → tick "Allow remote debugging". `browser local setup` walks the user through it. -## Delegating to the built-in agent - -For a self-contained browsing task (research, multi-step form filling, comparison shopping), the built-in agent is often faster than driving the browser yourself: - -```bash -browser-use-terminal start "Find the three cheapest direct LAX→SFO flights next Friday and list airline, time, price" -``` - -It runs with the user's configured model, prints progress events, and ends with a result. Inspect afterwards with `browser-use-terminal history`, `show `, `events `. - ## What actually works - Screenshots first: `capture_screenshot()` → view the image → decide whether you need a click, a selector, or more navigation. diff --git a/crates/browser-use-cli/src/main.rs b/crates/browser-use-cli/src/main.rs index d1f5fe80..09a8fd1a 100644 --- a/crates/browser-use-cli/src/main.rs +++ b/crates/browser-use-cli/src/main.rs @@ -218,6 +218,14 @@ enum Command { Start { text: String, }, + /// Run a browser task to completion with the configured default + /// provider/model (resolved like the TUI). Streams progress and exits when + /// the task finishes — the delegation entry point for external assistants. + Run { + text: String, + #[arg(long)] + model: Option, + }, RunFake { text: String, #[arg(long)] @@ -914,6 +922,15 @@ fn main() -> Result<()> { }; match args.command { Command::Start { text } => start(&store, text), + Command::Run { text, model } => run_default( + &store, + text, + model, + config_profile.as_deref(), + &config_overrides, + collaboration_mode, + &runtime_options, + ), Command::RunFake { text, python_code } => run_fake(&store, text, python_code), Command::RunOpenai { text, model } => run_openai( &store, @@ -1307,6 +1324,7 @@ fn main() -> Result<()> { fn command_name(command: &Command) -> &'static str { match command { Command::Start { .. } => "start", + Command::Run { .. } => "run", Command::RunFake { .. } => "run_fake", Command::RunOpenai { .. } => "run_openai", Command::RunBrowserUse { .. } => "run_browser_use", @@ -2353,6 +2371,149 @@ fn dataset_browser_mode(options: &DatasetRunOptions) -> String { .replace(['_', ' '], "-") } +/// Map a configured `model_provider` id to its CLI backend. +fn provider_backend_for_provider_id(provider_id: &str) -> Option { + match provider_id.trim().to_ascii_lowercase().as_str() { + "openai" => Some(ProviderBackend::Openai), + "anthropic" | "claude" => Some(ProviderBackend::Anthropic), + "openrouter" => Some(ProviderBackend::Openrouter), + "deepseek" => Some(ProviderBackend::Deepseek), + "browser-use" | "browser_use" | "browseruse" => Some(ProviderBackend::BrowserUse), + "codex" | "chatgpt" => Some(ProviderBackend::Codex), + _ => None, + } +} + +fn provider_backend_from_env_keys() -> Option { + let has = |key: &str| std::env::var(key).is_ok_and(|value| !value.trim().is_empty()); + if has("ANTHROPIC_API_KEY") { + Some(ProviderBackend::Anthropic) + } else if has("OPENAI_API_KEY") { + Some(ProviderBackend::Openai) + } else if has("BROWSER_USE_API_KEY") { + Some(ProviderBackend::BrowserUse) + } else if has("OPENROUTER_API_KEY") { + Some(ProviderBackend::Openrouter) + } else if has("DEEPSEEK_API_KEY") { + Some(ProviderBackend::Deepseek) + } else { + None + } +} + +/// `browser-use-terminal run ""`: provider-agnostic task run. +/// +/// Resolves the provider from the user's config (what `/model` in the TUI +/// persists), falling back to whichever provider API key is set in the +/// environment, then dispatches to the matching `run-*` path. Unlike `start` +/// (which only queues a session), this drives the task to completion. +fn run_default( + store: &Store, + text: String, + model: Option, + config_profile: Option<&str>, + raw_config_overrides: &[String], + collaboration_mode: CollaborationModeKind, + runtime_options: &CliRuntimeOptions, +) -> Result<()> { + let overrides = parse_cli_config_overrides(raw_config_overrides)?; + let cwd = std::env::current_dir()?; + let configured = + configured_model_provider_id_for_cwd_with_options(&cwd, config_profile, &overrides)?; + let backend = match configured.as_deref().map(str::trim) { + Some(provider_id) if !provider_id.is_empty() => { + provider_backend_for_provider_id(provider_id).ok_or_else(|| { + anyhow::anyhow!( + "configured model provider {provider_id:?} is not runnable from the CLI; \ + use an explicit command like `browser-use-terminal run-anthropic \"\"`" + ) + })? + } + _ => provider_backend_from_env_keys().ok_or_else(|| { + anyhow::anyhow!( + "no model provider is configured. Configure one in the TUI (`browser`, then \ + /auth and /model), set a provider API key (ANTHROPIC_API_KEY, OPENAI_API_KEY, \ + OPENROUTER_API_KEY, DEEPSEEK_API_KEY, or BROWSER_USE_API_KEY), or use an \ + explicit command such as `browser-use-terminal run-anthropic \"\"`." + ) + })?, + }; + match backend { + ProviderBackend::Openai => run_openai( + store, + text, + model, + config_profile, + raw_config_overrides, + collaboration_mode, + runtime_options, + ), + ProviderBackend::Anthropic + | ProviderBackend::Openrouter + | ProviderBackend::Deepseek + | ProviderBackend::BrowserUse + | ProviderBackend::Codex => { + let (model, _source) = resolve_cli_model_with_source( + backend, + model, + config_profile, + raw_config_overrides, + )?; + match backend { + ProviderBackend::Anthropic => run_anthropic( + store, + text, + model, + config_profile, + raw_config_overrides, + collaboration_mode, + runtime_options, + ), + ProviderBackend::Openrouter => run_openrouter( + store, + text, + model, + config_profile, + raw_config_overrides, + collaboration_mode, + runtime_options, + ), + ProviderBackend::Deepseek => run_deepseek( + store, + text, + model, + config_profile, + raw_config_overrides, + collaboration_mode, + runtime_options, + ), + ProviderBackend::BrowserUse => run_browser_use( + store, + text, + model, + config_profile, + raw_config_overrides, + collaboration_mode, + runtime_options, + ), + ProviderBackend::Codex => run_codex( + store, + text, + model, + config_profile, + raw_config_overrides, + collaboration_mode, + runtime_options, + ), + _ => unreachable!(), + } + } + ProviderBackend::Fake | ProviderBackend::None => { + bail!("no runnable model provider configured") + } + } +} + fn run_openai( store: &Store, text: String, @@ -3171,7 +3332,21 @@ fn show(store: &Store, task_id: &str) -> Result<()> { let title = task_from_events(&events).unwrap_or_else(|| "untitled task".to_string()); let browser = browser_summary_from_events(&events, "local chrome"); println!("Task: {title}"); - println!("Status: {}", task.status.as_str()); + // A session created by `start` (or an SDK queue) has no agent attached yet; + // reporting it as plain "running" sends pollers into infinite loops. + let agent_started = events.iter().any(|event| { + !matches!( + event.event_type.as_str(), + "session.created" | "session.input" + ) + }); + if task.status.is_active() && !agent_started { + println!( + "Status: queued (no agent attached — drive it with `browser-use-terminal run \"\"` next time, or `run-anthropic-session {task_id}` to run this one)" + ); + } else { + println!("Status: {}", task.status.as_str()); + } if let Some(url) = browser.url { println!("Browser: {url}"); } @@ -3345,6 +3520,21 @@ fn browser_cli( let session_id = format!("{EXTERNAL_BROWSER_SESSION_PREFIX}{session_name}"); let cwd = std::env::current_dir().context("resolve current dir")?; let state_dir = store.state_dir().to_path_buf(); + + let mut argv = args; + if argv.first().map(String::as_str) == Some("browser") { + argv.remove(0); + } + if argv.is_empty() { + argv.push("help".to_string()); + } + + // Daemon control operates on the daemon itself and must not side-effect a + // browser session row / artifact dir; intercept before the session ensure. + if argv.first().map(String::as_str) == Some("daemon") { + return browser_external_daemon_control(&state_dir, &argv); + } + let task = match store.load_session(&session_id)? { Some(task) => task, None => { @@ -3359,18 +3549,6 @@ fn browser_cli( }; let artifact_dir = PathBuf::from(&task.artifact_root); - let mut argv = args; - if argv.first().map(String::as_str) == Some("browser") { - argv.remove(0); - } - if argv.is_empty() { - argv.push("help".to_string()); - } - - if argv.first().map(String::as_str) == Some("daemon") { - return browser_external_daemon_control(&state_dir, &argv); - } - let action = if argv.first().map(String::as_str) == Some("exec") { let code = if argv.len() == 1 || (argv.len() == 2 && argv[1] == "-") { let mut buffer = String::new(); diff --git a/docs/assistant-plugins.md b/docs/assistant-plugins.md index 8a41a8e5..6413938b 100644 --- a/docs/assistant-plugins.md +++ b/docs/assistant-plugins.md @@ -4,20 +4,17 @@ Browser Use Terminal plugs into any coding assistant or agent that can run shell Fastest setup: paste `https://browser-use.com/skill` into your assistant, and it provides instructions on how to install, register the skill, connect a browser, and verify. Full docs: . -Two surfaces, one binary: - -- **Browser management** — the assistant drives the browser itself: - ```bash - browser-use-terminal browser exec <<'PY' - new_tab("https://example.com") - wait_for_load() - print(capture_screenshot()) - PY - ``` -- **Core agent** — the assistant delegates a whole task to the built-in browser agent: - ```bash - browser-use-terminal start "Compare M4 MacBook Air prices across three retailers" - ``` +The assistant is the agent. The skill teaches it to drive the browser directly: + +```bash +browser-use-terminal browser exec <<'PY' +new_tab("https://example.com") +wait_for_load() +print(capture_screenshot()) +PY +``` + +(The built-in agent — the TUI and `browser-use-terminal run` — remains a human-facing surface and is not part of the skill.) ## Install From 714efe83b644aed8da02cc7d3c6f06df1318d0a8 Mon Sep 17 00:00:00 2001 From: Laith Weinberger Date: Wed, 10 Jun 2026 19:25:02 -0700 Subject: [PATCH 5/6] run: don't reject custom model_provider ids --- crates/browser-use-cli/src/main.rs | 20 +------------------- 1 file changed, 1 insertion(+), 19 deletions(-) diff --git a/crates/browser-use-cli/src/main.rs b/crates/browser-use-cli/src/main.rs index 09a8fd1a..8f1e239c 100644 --- a/crates/browser-use-cli/src/main.rs +++ b/crates/browser-use-cli/src/main.rs @@ -2371,19 +2371,6 @@ fn dataset_browser_mode(options: &DatasetRunOptions) -> String { .replace(['_', ' '], "-") } -/// Map a configured `model_provider` id to its CLI backend. -fn provider_backend_for_provider_id(provider_id: &str) -> Option { - match provider_id.trim().to_ascii_lowercase().as_str() { - "openai" => Some(ProviderBackend::Openai), - "anthropic" | "claude" => Some(ProviderBackend::Anthropic), - "openrouter" => Some(ProviderBackend::Openrouter), - "deepseek" => Some(ProviderBackend::Deepseek), - "browser-use" | "browser_use" | "browseruse" => Some(ProviderBackend::BrowserUse), - "codex" | "chatgpt" => Some(ProviderBackend::Codex), - _ => None, - } -} - fn provider_backend_from_env_keys() -> Option { let has = |key: &str| std::env::var(key).is_ok_and(|value| !value.trim().is_empty()); if has("ANTHROPIC_API_KEY") { @@ -2422,12 +2409,7 @@ fn run_default( configured_model_provider_id_for_cwd_with_options(&cwd, config_profile, &overrides)?; let backend = match configured.as_deref().map(str::trim) { Some(provider_id) if !provider_id.is_empty() => { - provider_backend_for_provider_id(provider_id).ok_or_else(|| { - anyhow::anyhow!( - "configured model provider {provider_id:?} is not runnable from the CLI; \ - use an explicit command like `browser-use-terminal run-anthropic \"\"`" - ) - })? + ProviderBackend::from_provider_id(provider_id).unwrap_or(ProviderBackend::Openai) } _ => provider_backend_from_env_keys().ok_or_else(|| { anyhow::anyhow!( From 2afd0e0c675d976df58390be5877a24de8f22aaa Mon Sep 17 00:00:00 2001 From: Laith Weinberger Date: Thu, 11 Jun 2026 09:09:09 -0700 Subject: [PATCH 6/6] cover the new google provider --- crates/browser-use-cli/src/main.rs | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/crates/browser-use-cli/src/main.rs b/crates/browser-use-cli/src/main.rs index 944c908a..63add3ea 100644 --- a/crates/browser-use-cli/src/main.rs +++ b/crates/browser-use-cli/src/main.rs @@ -2416,6 +2416,8 @@ fn provider_backend_from_env_keys() -> Option { Some(ProviderBackend::Openrouter) } else if has("DEEPSEEK_API_KEY") { Some(ProviderBackend::Deepseek) + } else if has("GEMINI_API_KEY") || has("GOOGLE_API_KEY") { + Some(ProviderBackend::Google) } else { None } @@ -2448,7 +2450,7 @@ fn run_default( anyhow::anyhow!( "no model provider is configured. Configure one in the TUI (`browser`, then \ /auth and /model), set a provider API key (ANTHROPIC_API_KEY, OPENAI_API_KEY, \ - OPENROUTER_API_KEY, DEEPSEEK_API_KEY, or BROWSER_USE_API_KEY), or use an \ + OPENROUTER_API_KEY, DEEPSEEK_API_KEY, GEMINI_API_KEY, or BROWSER_USE_API_KEY), or use an \ explicit command such as `browser-use-terminal run-anthropic \"\"`." ) })?, @@ -2466,6 +2468,7 @@ fn run_default( ProviderBackend::Anthropic | ProviderBackend::Openrouter | ProviderBackend::Deepseek + | ProviderBackend::Google | ProviderBackend::BrowserUse | ProviderBackend::Codex => { let (model, _source) = resolve_cli_model_with_source( @@ -2502,6 +2505,15 @@ fn run_default( collaboration_mode, runtime_options, ), + ProviderBackend::Google => run_google( + store, + text, + model, + config_profile, + raw_config_overrides, + collaboration_mode, + runtime_options, + ), ProviderBackend::BrowserUse => run_browser_use( store, text,