xhwSkhizein/browser-cli

browser-cli

CLI-first browser automation for AI agents


browser-cli is a browser automation tool for AI agents and developers who need reliable browser control from the command line.

Architecture

┌──────────────────────────────────────────────────────────────────────┐
│  Task/Automation Layer  (task.py + task.meta.json + automation.toml) │
├──────────────────────────────────────────────────────────────────────┤
│  Browser Daemon  ──►  60+ commands  ──►  Semantic Ref System         │
│  ├─ read: one-shot page capture                                      │
│  ├─ open/snapshot/click/fill: interactive control                    │
│  ├─ console/network/trace: observation & debugging                   │
│  ├─ verify-*: assertions                                             │
│  └─ ... 60+ commands total                                           │
├──────────────────────────────────────────────────────────────────────┤
│  Dual Backend: Playwright (default) ◄──► Chrome Extension (opt)      │
└──────────────────────────────────────────────────────────────────────┘
Component           Purpose
Browser Daemon      Long-lived browser instance with daemon-backed CLI commands
Semantic Refs       Stable element identifiers using bridgic-style reconstruction
Task Runtime        Reusable task.py execution through browser_cli.task_runtime
Automation Service  Persistent local service for published automation snapshots

Features

  • Dual Backend Architecture: managed profile mode by default, extension mode when real Chrome is available
  • Semantic Ref System: stable refs that survive many DOM re-renders
  • Agent Isolation: X_AGENT_ID isolates visible tabs while sharing browser storage
  • JSON-First API: daemon-backed commands return structured JSON
  • Task Runtime: package browser logic as task.py + task.meta.json
  • Automation Publish Layer: publish immutable task snapshots and operate them through a local Web UI

Installation

Requirements:

  • Python 3.10+
  • uv
  • Stable Google Chrome

Install as a tool:

uv tool install browser-control-and-automation-cli
browser-cli doctor
browser-cli paths
browser-cli read https://example.com

The published package name is browser-control-and-automation-cli. The installed command remains browser-cli.

Run without installing:

uvx --from browser-control-and-automation-cli browser-cli read https://example.com

Install from Git:

uv tool install git+https://github.com/hongv/browser-cli.git
browser-cli --help

Installed users should start with docs/installed-with-uv.md. For removal and local cleanup guidance, see docs/uninstall.md.

Development

Clone the repository and sync the managed development environment:

git clone https://github.com/hongv/browser-cli.git
cd browser-cli
uv sync --dev

The CLI targets stable Google Chrome. Playwright Chromium is mainly useful for local integration testing and is installed through the repo environment.

Optional: Extension Mode

For real-Chrome execution:

  1. Open chrome://extensions
  2. Enable Developer mode
  3. Click Load unpacked
  4. Select browser-cli-extension/

Once connected, browser-cli status reports extension capability state and the daemon can prefer the extension backend at safe idle points.

Quick Start

If you installed Browser CLI with uv, use the dedicated installed-user guide at docs/installed-with-uv.md. The short version is:

browser-cli doctor
browser-cli paths
browser-cli read https://example.com

One-Shot Read

browser-cli read https://example.com
browser-cli read https://example.com --snapshot
browser-cli read https://example.com --scroll-bottom

Interactive Control

browser-cli open https://example.com
browser-cli snapshot
browser-cli click @8d4b03a9
browser-cli fill @input_ref "value"
browser-cli html
browser-cli status
browser-cli reload

Multi-Agent Tabs

X_AGENT_ID=agent-a browser-cli open https://example.com
X_AGENT_ID=agent-a browser-cli tabs

X_AGENT_ID=agent-b browser-cli open https://example.org
X_AGENT_ID=agent-b browser-cli tabs
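
When agents are driven from a script rather than a shell, the same isolation can be had by setting X_AGENT_ID per subprocess. The following is a minimal sketch; the helper names agent_env and run_as are hypothetical and not part of Browser CLI.

```python
import os
import subprocess


def agent_env(agent_id: str) -> dict[str, str]:
    """Copy the current environment and set X_AGENT_ID for one agent."""
    env = dict(os.environ)  # copy, so the parent environment is untouched
    env["X_AGENT_ID"] = agent_id
    return env


def run_as(agent_id: str, *args: str) -> subprocess.CompletedProcess:
    """Run one browser-cli command on behalf of the given agent."""
    return subprocess.run(
        ["browser-cli", *args],
        env=agent_env(agent_id),
        capture_output=True,
        text=True,
    )
```

For example, run_as("agent-a", "open", "https://example.com") followed by run_as("agent-a", "tabs") mirrors the first pair of shell commands above.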

Task And Automation Model

Browser CLI separates local authoring from durable publication:

  • a task is local, editable source
  • an automation is a published, immutable snapshot

Typical task layout:

tasks/
  my_task/
    task.py
    task.meta.json
    automation.toml
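
A quick pre-flight check of this layout can be sketched in a few lines. It assumes task.py and task.meta.json are required while automation.toml is optional, and it is not a substitute for browser-cli task validate. The names LayoutReport and check_task_dir are hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path

# Assumption: these two files make up the minimal task package.
REQUIRED = ("task.py", "task.meta.json")
# Assumption: the manifest is optional at authoring time.
OPTIONAL_MANIFEST = "automation.toml"


@dataclass
class LayoutReport:
    missing: list[str]   # required files that were not found
    has_manifest: bool   # True if automation.toml is present


def check_task_dir(task_dir: Path) -> LayoutReport:
    """Report which expected files exist in a task directory."""
    missing = [name for name in REQUIRED if not (task_dir / name).is_file()]
    has_manifest = (task_dir / OPTIONAL_MANIFEST).is_file()
    return LayoutReport(missing, has_manifest)
```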

Validate and run a task directly:

browser-cli task validate tasks/my_task
browser-cli task run tasks/my_task --set url=https://example.com

Publish the current task directory into the automation service:

browser-cli automation publish tasks/my_task
browser-cli automation status
browser-cli automation ui

Publication semantics:

  • automation publish snapshots task.py, task.meta.json, and automation.toml together under ~/.browser-cli/automations/<automation-id>/versions/<version>/
  • if source automation.toml exists, Browser CLI treats it as the source of truth for publish-time configuration
  • if source automation.toml is absent, Browser CLI publishes generated defaults and reports that explicitly via manifest_source
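
The snapshot location described above can be computed with pathlib, for instance to inspect a published version from a script. This is a sketch only: the helper names are hypothetical, and the version is passed as an opaque string because its format is not specified here.

```python
from pathlib import Path
from typing import Optional

# The three files that are snapshotted together at publish time.
SNAPSHOT_FILES = ("task.py", "task.meta.json", "automation.toml")


def version_dir(automation_id: str, version: str,
                home: Optional[Path] = None) -> Path:
    """Build ~/.browser-cli/automations/<automation-id>/versions/<version>/."""
    base = (home or Path.home()) / ".browser-cli" / "automations"
    return base / automation_id / "versions" / version


def snapshot_paths(directory: Path) -> list[Path]:
    """Paths where the snapshotted files are expected to live."""
    return [directory / name for name in SNAPSHOT_FILES]
```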

Export a persisted automation back to automation.toml:

browser-cli automation export my_task --output /tmp/my_task.automation.toml

Included examples:

Real-site publish example:

browser-cli task validate tasks/douyin_video_download
browser-cli automation publish tasks/douyin_video_download
browser-cli automation inspect douyin_video_download
browser-cli automation status

Inspect semantics:

  • browser-cli automation inspect <automation-id> shows the current live automation-service configuration
  • browser-cli automation inspect <automation-id> --version <n> shows snapshot_config for the immutable published version and live_config for the current service state
  • latest_run remains a separate operational view

Output Contracts

read

  • stdout: final rendered result only
  • stderr: diagnostics only

Exit codes:

  • 0: success
  • 1: unexpected internal error
  • 2: usage error
  • 66: empty content
  • 69: browser unavailable
  • 73: profile unavailable
  • 75: temporary read failure
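
A caller can turn these exit codes into a simple retry policy. The sketch below encodes the documented codes in an enum. Treating only code 75 as retryable is an assumption of this example, not something Browser CLI prescribes, and classify and read_page are hypothetical helpers.

```python
import subprocess
from enum import IntEnum


class ReadExit(IntEnum):
    """Exit codes documented for `browser-cli read`."""
    SUCCESS = 0
    INTERNAL_ERROR = 1
    USAGE_ERROR = 2
    EMPTY_CONTENT = 66
    BROWSER_UNAVAILABLE = 69
    PROFILE_UNAVAILABLE = 73
    TEMPORARY_FAILURE = 75


# Assumption: only the "temporary read failure" code is worth retrying.
RETRYABLE = {ReadExit.TEMPORARY_FAILURE}


def classify(returncode: int) -> str:
    """Map a return code to 'ok', 'retry', or 'fail'."""
    try:
        code = ReadExit(returncode)
    except ValueError:
        return "fail"  # undocumented code: treat as a hard failure
    if code is ReadExit.SUCCESS:
        return "ok"
    return "retry" if code in RETRYABLE else "fail"


def read_page(url: str) -> tuple[str, str]:
    """Run `browser-cli read URL` and return (classification, stdout)."""
    proc = subprocess.run(
        ["browser-cli", "read", url], capture_output=True, text=True
    )
    return classify(proc.returncode), proc.stdout
```

Because stdout carries only the final rendered result, the second element of the tuple can be used as the page content whenever the classification is "ok".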

Daemon-backed Commands

  • success stdout: JSON only
  • failure stderr: short error summary
  • stable machine-readable error codes include:
    • NO_ACTIVE_TAB
    • AGENT_ACTIVE_TAB_BUSY
    • TAB_NOT_FOUND
    • NO_SNAPSHOT_CONTEXT
    • REF_NOT_FOUND
    • STALE_SNAPSHOT
    • AMBIGUOUS_REF
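
A wrapper script can parse the JSON on success and look for one of the listed codes on failure. The sketch below assumes the error code appears verbatim somewhere in the stderr summary; the exact stderr format is not specified here, so that matching strategy is a guess. Both function names are hypothetical.

```python
import json
import re
from typing import Any, Optional

# Error codes listed above; the real set may be larger.
KNOWN_ERROR_CODES = (
    "NO_ACTIVE_TAB",
    "AGENT_ACTIVE_TAB_BUSY",
    "TAB_NOT_FOUND",
    "NO_SNAPSHOT_CONTEXT",
    "REF_NOT_FOUND",
    "STALE_SNAPSHOT",
    "AMBIGUOUS_REF",
)

# Match a whole token so one code is never found inside a longer identifier.
_CODE_RE = re.compile(r"\b(" + "|".join(KNOWN_ERROR_CODES) + r")\b")


def parse_success(stdout: str) -> Any:
    """Decode the JSON document a daemon-backed command prints on success."""
    return json.loads(stdout)


def find_error_code(stderr: str) -> Optional[str]:
    """Return the first known error code found in stderr, or None.

    Assumption: the code appears verbatim in the error summary.
    """
    match = _CODE_RE.search(stderr)
    return match.group(1) if match else None
```

An agent might, for example, take a fresh snapshot when find_error_code returns STALE_SNAPSHOT; that recovery policy is a design choice for the caller rather than documented behavior.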

Runtime Notes

  • Managed profile mode is the default backend.
  • Extension mode is the preferred real-Chrome backend when connected and healthy.
  • Driver rebinding happens only at safe idle points and is reported as state_reset.
  • runtime.timeout_seconds is the total wall-clock timeout for one automation run in the automation service.

Documentation

Testing

Run lint:

./scripts/lint.sh

Run tests:

./scripts/test.sh

Run guards:

./scripts/guard.sh

Run the full local validation flow:

./scripts/check.sh

Fast Python 3.10 compatibility check:

uv run python scripts/guards/python_compatibility.py

When the runtime behaves unexpectedly, use:

browser-cli status
browser-cli reload
browser-cli status

The integration coverage is fixture-driven and local-first. It exercises:

  • navigation, tabs, and history
  • snapshot and rendered HTML capture
  • semantic ref reconstruction after DOM re-render
  • stale and ambiguous ref failures
  • iframe refs
  • ref-driven element actions
  • console, network, dialogs, trace, video, screenshot, and PDF
  • cookies, storage save/load, and X_AGENT_ID isolation
  • task runtime and automation publishing/service flows

Acknowledgements

This project is deeply inspired by bridgic-browser. Browser CLI keeps the semantic ref and daemon-backed strengths while pushing the product toward a CLI-first, agent-first surface with reusable task and automation layers.
