byp — Browser automation for AI agents

⚠️ Work in progress. Core functionality works; some features (e.g. CAPTCHA solving) are not yet implemented.

byp is a Python library and CLI that gives AI agents a reliable, bot-detection-resistant browser interface. Instead of raw HTML, agents get a structured accessibility tree they can reason over and act on — navigate, click, type, screenshot — all from a simple CLI with JSON output designed for subprocess-based tool use.


Why byp

Most browser automation libraries are built for humans writing test scripts. byp is built for agents:

  • Structured page perception — ARIA accessibility tree output (main frame + all iframes), not raw HTML soup
  • JSON-mode CLI — every command outputs clean JSON, making it trivial to wire up as an agent tool via subprocess
  • Persistent named sessions — agents can resume browser state across calls without re-authenticating or losing cookies
  • Anti-bot evasion — built on Camoufox, a Firefox fork with built-in fingerprint randomisation and UA consistency checks
  • Proxy support — HTTP, SOCKS5, and SOCKS5h (remote DNS) with per-session persistence and DNS leak protection

Status

| Feature | Status |
| --- | --- |
| Navigation, click, type, fill | ✅ Working |
| ARIA tree snapshot | ✅ Working |
| Screenshot (bytes + file) | ✅ Working |
| Persistent named sessions | ✅ Working |
| Proxy support (HTTP/SOCKS5/SOCKS5h) | ✅ Working |
| CAPTCHA solving (playwright-captcha) | 🚧 Planned |
| Async API | 🚧 Planned |
| MCP server mode | 🚧 Planned |

Installation

Requires Python 3.12+ and uv.

git clone https://github.com/nandrzej/byp
cd byp
uv sync

Usage

Navigate to a page

uv run byp navigate https://example.com --session my-session

Get the accessibility tree (what the agent sees)

uv run byp snapshot --session my-session --json

Click an element

uv run byp click @e12 --session my-session --json

Type into a field

uv run byp type @e5 "search query" --session my-session --json

Take a screenshot

uv run byp screenshot --session my-session --path out.png

With proxy

uv run byp navigate https://example.com --session my-session \
  --proxy socks5h://user:pass@proxy-host:1080
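
When proxy credentials contain characters such as `@` or `:`, the URL passed to `--proxy` becomes ambiguous. The sketch below builds a proxy URL with percent-encoded credentials. `build_proxy_url` and `SUPPORTED_SCHEMES` are hypothetical names, not part of byp, and it is an assumption that byp decodes percent-encoded credentials; check this against your proxy setup before relying on it.

```python
from urllib.parse import quote

# Schemes listed in this README; the set itself is illustrative, not byp's API.
SUPPORTED_SCHEMES = {"http", "socks5", "socks5h"}


def build_proxy_url(scheme, host, port, username=None, password=None):
    """Return a proxy URL string suitable for passing to --proxy."""
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported proxy scheme: {scheme!r}")
    auth = ""
    if username is not None:
        # Percent-encode so that '@' and ':' inside credentials
        # cannot be mistaken for URL delimiters.
        auth = quote(username, safe="")
        if password is not None:
            auth += ":" + quote(password, safe="")
        auth += "@"
    return f"{scheme}://{auth}{host}:{port}"


print(build_proxy_url("socks5h", "proxy-host", 1080, "user", "pass"))
```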

All commands support --json for machine-readable output — designed for agent tool-call integration.
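
As a rough illustration of subprocess-based tool use, the sketch below wraps the commands shown above in a small Python helper. `build_byp_command` and `run_byp` are hypothetical names, not part of byp. The only assumption about the output is that stdout holds a single JSON document when `--json` is passed; its schema is not specified here.

```python
import json
import subprocess


def build_byp_command(action, *args, session, extra=()):
    """Build the argv list for one byp CLI call with --json enabled."""
    return ["uv", "run", "byp", action, *args,
            "--session", session, *extra, "--json"]


def run_byp(action, *args, session, extra=(), timeout=60):
    """Run one byp command and return its parsed JSON output.

    Assumes stdout contains exactly one JSON document.
    """
    argv = build_byp_command(action, *args, session=session, extra=extra)
    # check=True raises CalledProcessError on a non-zero exit code.
    result = subprocess.run(argv, capture_output=True, text=True,
                            timeout=timeout, check=True)
    return json.loads(result.stdout)


# Example argv for the click command shown above.
print(build_byp_command("click", "@e12", session="my-session"))
```

A call such as `run_byp("snapshot", session="my-session")` would need to run from the cloned repository directory so that `uv run` can find the project. Whether byp reports failures through its exit code or inside the JSON payload is not documented here, so callers may want to handle both.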


Architecture

Agent
  │
  ▼  subprocess / tool call (JSON in, JSON out)
CLI (byp/cli/main.py)
  │
  ▼
BrowserDriver (byp/engine/driver.py)
  ├── Camoufox (Firefox + anti-fingerprinting)
  ├── SessionManager — persistent XDG-compliant user data dirs
  ├── EnginePatcher — UA consistency checks
  └── AriaTreeProcessor — ARIA snapshot → structured JSON

What the agent sees

Instead of raw HTML, the agent receives a structured accessibility tree:

{
  "role": "WebArea",
  "name": "Example Domain",
  "children": [
    { "role": "heading", "name": "Example Domain", "ref": "@e1" },
    { "role": "paragraph", "ref": "@e2" },
    { "role": "link", "name": "More information...", "ref": "@e3" }
  ]
}

Element refs (e.g. @e3) are used directly in click, type, and fill commands.
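
To show how an agent might pick a ref out of this tree, here is a minimal sketch that walks the structure and collects refs by role and name. `iter_nodes` and `find_refs` are hypothetical helpers, not part of byp, and the code assumes real snapshots follow the shape of the sample above (`role`, optional `name`, optional `ref`, optional `children`). How iframe content is nested is not shown in this README, so that case may need extra handling.

```python
def iter_nodes(node):
    """Yield every node in the tree, depth first."""
    yield node
    for child in node.get("children", []):
        yield from iter_nodes(child)


def find_refs(tree, role, name_contains=None):
    """Return refs of nodes with the given role.

    If name_contains is given, keep only nodes whose name includes it
    (case-insensitive). Nodes without a "ref" key are skipped.
    """
    refs = []
    for node in iter_nodes(tree):
        if node.get("role") != role or "ref" not in node:
            continue
        name = node.get("name", "")
        if name_contains is not None and name_contains.lower() not in name.lower():
            continue
        refs.append(node["ref"])
    return refs


# The sample snapshot from this README.
snapshot = {
    "role": "WebArea",
    "name": "Example Domain",
    "children": [
        {"role": "heading", "name": "Example Domain", "ref": "@e1"},
        {"role": "paragraph", "ref": "@e2"},
        {"role": "link", "name": "More information...", "ref": "@e3"},
    ],
}

print(find_refs(snapshot, "link", "more information"))  # ['@e3']
```

The returned ref can then be passed as the target argument of a click, type, or fill command.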


License

MIT
