snact
AI agent-optimized browser CLI — snap + act

snact lets AI agents control browsers with extreme token efficiency. One snap returns page structure, section content, and every actionable element — enough for an LLM to understand and act in a single turn.

$ snact snap https://www.apple.com/shop/buy-mac/macbook-pro

# Buy MacBook Pro

## Model. Choose your size.
> 14-inch — From $1,699 or $141.58/mo. | 16-inch — From $2,699 or $224.91/mo.
@e35 [input:radio] "14-inch" selected
@e36 [input:radio] "16-inch"

## Chip. Choose from these powerful options.
> M5 Pro — 12-core CPU, 16-core GPU | M5 Max — 16-core CPU, 40-core GPU
@e40 [link]

$ snact click @e36
ok
---
## Model. Choose your size.                    # ← auto re-snap included
> 16-inch — Available with M5 Pro or M5 Max
@e35 [input:radio] "14-inch"
@e36 [input:radio] "16-inch" selected

Every action automatically returns a fresh page snapshot — no manual re-snap needed.

Performance comparison

Task: Visit npmjs.com for 10 React state management libraries (zustand, jotai, recoil, valtio, mobx, redux, xstate, effector, nanostores, legend-state). Collect weekly downloads, last publish date, unpacked size, and dependencies for each.

Video (comparison-compressed.mp4): both sides played at 16x speed. Left: Playwright MCP (5m 17s real time). Right: snact CLI (2m 39s real time).

| | snact CLI | Playwright CLI | Playwright MCP |
| --- | --- | --- | --- |
| Time | 2m 39s | 5m 10s | 5m 17s |
| Total tokens | 34.1K (17%) | 35.4K (18%) | 88K (44%) |
| Message tokens | 18.8K | 20.1K | 73.4K |
| Data accuracy | Correct | Correct | Correct |

snact finished in about half the time. Its token use was close to Playwright CLI's and well under half of Playwright MCP's.

  • Speed: Both Playwright approaches took ~5 minutes; snact finished in 2m 39s.
  • Token efficiency: snact and Playwright CLI used similar total tokens (~34-35K), while Playwright MCP consumed about 2.5x as many (88K) because accessibility tree snapshots accumulate in context.
  • Answer quality: All three produced identical data with minor format differences.

Per-page token measurements

Measured with wc -c / 4 on actual snap output (1 token ≈ 4 chars):

| Site | snact (full) | snact (--focus) |
| --- | --- | --- |
| example.com | 46 | |
| GitHub Login | 172 | 60 |
| GitHub Trending | 2,152 | 614 |
| Hacker News | 2,670 | |
| Apple MacBook Pro | 2,546 | |
| StackOverflow | 4,363 | |
| NYTimes | 2,417 | |

Simple pages: 50-200 tokens. Typical pages: 2K-4K. With --focus: 60-600.

Playwright token estimates from scrolltest.medium.com (MCP ~114K per test session, CLI ~27K). snact numbers are directly measured.
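
The wc -c / 4 measurement above is easy to reproduce. A minimal sketch, assuming a POSIX-style shell; the function name estimate_tokens is made up for this example and is not part of snact:

```shell
#!/usr/bin/env bash
# Rough token estimate for text on stdin, using the heuristic stated
# above (1 token ≈ 4 chars). wc -c counts bytes, so non-ASCII text is
# slightly overcounted.
estimate_tokens() {
  local bytes
  bytes=$(wc -c)
  echo $(( bytes / 4 ))
}

# Example with a literal 16-byte string:
printf 'Example Domain\n\n' | estimate_tokens   # prints 4
```

Piping real output through it, for example snact snap https://example.com | estimate_tokens, should give figures comparable to the table. A real tokenizer will count somewhat differently from this heuristic.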

Record & Replay

Record task: Use snact to record a workflow called "npm-react-state" that visits npmjs.com for these 10 libraries. For each, snap the page and read the sidebar stats.

Replay task: Replay npm-react-state and build me an updated comparison table.

Video (snact-replayx8-small.mp4): played at 8x speed. First: record (2m 18s real time). Then: replay (47s real time).

The replay skips all LLM reasoning — it re-executes the recorded commands directly against Chrome and returns fresh data.

| | Record (first run) | Replay |
| --- | --- | --- |
| Time | 2m 18s | 47s |
| LLM turns | ~20+ | 1 |
| Data | Fresh | Fresh (re-visits pages) |

Why snact?

| | Playwright MCP | Playwright CLI | snact |
| --- | --- | --- | --- |
| Architecture | Persistent MCP server | Daemon + CLI | Stateless CLI |
| After click/fill | Snapshot in response | Manual re-snapshot | Snapshot in response |
| Tokens per page | ~3K-50K | ~1K-13K | ~50-4K (measured) |
| Repeated tasks | Full LLM call | Full LLM call | 0 (workflow replay) |
| Session persistence | Config-based | --persistent flag | session save/load |
| Cron automation | Requires LLM API | Requires LLM API | Shell one-liner |
| Locale/Geo override | Via run-code | Via config | --locale / --geo flags |
| Install | npm + Playwright | npm + Playwright | Single binary (Rust) |
| Multi-browser | Chromium/FF/WebKit | Chromium/FF/WebKit | Chrome only |

Installation

# macOS / Linux
curl -fsSL https://raw.githubusercontent.com/vericontext/snact/main/install.sh | bash
# Windows (PowerShell) — experimental, see #3
irm https://raw.githubusercontent.com/vericontext/snact/main/install.ps1 | iex
# From source (all platforms)
cargo install --path crates/snact-cli

# Verify
snact --version

Quick start

snact browser launch --background          # 1. start Chrome
snact snap https://github.com/trending     # 2. page structure + elements
snact click @e28                           # 3. act (auto re-snap included)
snact browser stop                         # 4. done

snap — structure + content + elements

snact snap https://github.com/trending
# Trending

## NousResearch / hermes-agent
> The agent that grows with you | Python | Star
@e28 [link] href="/NousResearch/hermes-agent"

## microsoft / markitdown
> Python tool for converting files and office documents to Markdown. | Python | Star
@e37 [link] href="/microsoft/markitdown"

Section headings group elements. > lines summarize content. Each @eN reference is stable until the next snap.

act — actions return updated state

snact click @e28
ok
---
# NousResearch/hermes-agent
> The agent that grows with you. Build AI agents...
@e1 [link] "Code" href="/NousResearch/hermes-agent"
@e2 [link] "Issues" href="/NousResearch/hermes-agent/issues"
...

Every mutation (click, fill, type, select, scroll) returns a fresh snap. Use --no-snap to disable.
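
Action output starts with a status line and a --- separator, as in the click example above, so a script can separate the two parts with standard tools. A minimal sketch; the helper names action_status and action_snap are hypothetical, and the sample response is abridged from the MacBook Pro example earlier in this README:

```shell
#!/usr/bin/env bash
# Split an action response of the form "ok", "---", then snap output.
action_status() { sed -n '1p'; }        # first line, e.g. "ok"
action_snap()   { sed '1,/^---$/d'; }   # everything after the first "---" line

# Sample response (abridged from the click example in this README).
response='ok
---
## Model. Choose your size.
@e35 [input:radio] "14-inch"
@e36 [input:radio] "16-inch" selected'

printf '%s\n' "$response" | action_status                # prints: ok
printf '%s\n' "$response" | action_snap | grep -c '^@e'  # prints: 2
```

This sketch assumes the separator is always present. If a response has no --- line, action_snap prints nothing, so check the status line first.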

read — full text content

snact read https://example.com --focus="main"
# Example Domain
This domain is for use in documentation examples.
Learn more

snap = structure + elements + summaries. read = full text when you need more detail.

eval — custom JavaScript

When snap/read can't capture dynamic content (e.g. Amazon product cards):

snact eval "JSON.stringify(Array.from(document.querySelectorAll('.product')).map(p => ({
  title: p.querySelector('h2')?.textContent,
  price: p.querySelector('.price')?.textContent
})))"

session — persist browser state

snact session save github           # cookies + localStorage
snact session load github           # restore later

record & replay — zero LLM cost

snact record start login-flow
snact snap https://app.example.com/login
snact fill @e1 "user@example.com" --no-snap
snact click @e3 --no-snap
snact wait navigation
snact record stop

# Day 2, 3, 4... — no LLM, no tokens
snact replay login-flow
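
For unattended runs (for example from cron), the replay can be wrapped in a small function. This is a sketch under assumptions: the function name replay_to_log and the log path are hypothetical, and it assumes replay needs a running Chrome instance, as in the quick start. It uses only the snact subcommands shown in this README.

```shell
#!/usr/bin/env bash
# Replay a recorded workflow and append timestamped output to a log file.
# Usage: replay_to_log <workflow-name> <log-file>
replay_to_log() {
  local workflow=$1 log=$2
  snact browser launch --background || return 1
  {
    date '+# %Y-%m-%dT%H:%M:%S'   # timestamp header for this run
    snact replay "$workflow"
  } >> "$log"
  local status=$?                 # exit status of the replay itself
  snact browser stop
  return "$status"
}

# Example call (commented out so that sourcing this file has no side effects):
# replay_to_log login-flow "$HOME/snact-login-flow.log"
```

Returning the replay's exit status lets the calling job notice a failed run even though the browser is stopped afterwards.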

Commands

| Command | Description |
| --- | --- |
| snap [url] | Page structure + section summaries + interactable elements |
| read [url] | Full visible text as structured markdown |
| click <@ref> | Click element (returns updated snap) |
| fill <@ref> <value> | Set input value (returns updated snap) |
| type <@ref> <text> | Type character by character (returns updated snap) |
| select <@ref> <value> | Select dropdown option (returns updated snap) |
| scroll [direction] | Scroll page (returns updated snap) |
| eval <expression> | Execute JavaScript on the page |
| screenshot [--file] | Capture page as PNG |
| wait <condition> | Wait for navigation, CSS selector, or timeout (ms) |
| session save\|load\|list\|delete | Manage browser sessions |
| record start\|stop\|list\|delete | Record command sequences |
| replay <name> | Replay a recorded workflow |
| browser launch\|stop\|status | Manage Chrome instance |
| schema [command] | JSON Schema introspection |
| mcp | Start MCP server (JSON-RPC over stdio) |
| init | Create AGENT.md for Claude Code skill discovery |

Global flags

--port <PORT>       Chrome debugging port [default: 9222]
--output <FMT>      Output format: text, json, ndjson [default: text]
--dry-run           Preview action without executing
--no-snap           Skip automatic re-snap after actions
--profile <NAME>    Browser profile name [default: "default"] (browser launch)
--idle-timeout <MIN> Auto-stop Chrome after N minutes of inactivity (browser launch)
--lang <LANG>       Accept-Language header [default: en-US]
--locale <LOCALE>   JS navigator.language override (e.g. en-US, ja-JP)
--geo <LAT,LON>     Geolocation override (e.g. "37.7749,-122.4194")
--user-agent <UA>   Custom User-Agent string
--focus <SEL>       CSS selector to limit scope (snap/read)
--verbose           Debug logging

AI agent integration

Claude Code

snact works as a native CLI tool — no MCP configuration needed:

snact browser launch --background
claude
# "Use snact to find the MacBook Pro M4 Pro price on apple.com"

Run snact init in your project directory to create an AGENT.md skill file for Claude Code.

MCP server

For Claude Desktop or any MCP client:

{
  "mcpServers": {
    "snact": {
      "command": "snact",
      "args": ["mcp"]
    }
  }
}

Piped / scripted

snact snap https://example.com --output=json | jq '.elements | keys[]'
snact snap https://example.com --output=ndjson

Architecture

graph TD
    A["AI Agent (Claude, GPT, ...)"] -->|"CLI stdout/stdin"| B
    A -->|"JSON-RPC stdio"| M

    subgraph snact
        B["snact-cli<br/><small>Thin CLI shell (clap)</small>"]
        M["MCP Server<br/><small>JSON-RPC over stdio</small>"]
        B --> C
        M --> C

        subgraph core["snact-core"]
            C["Snap"] & D["Read"] & E["Action + snap"] & F["Record/Replay"]
            C --> G["Element Map<br/><small>@eN refs</small>"]
            E --> G
            H["Session Storage"]
        end

        core --> I

        I["snact-cdp<br/><small>WebSocket + ~30 hand-written CDP commands</small>"]
    end

    I -->|"WebSocket (CDP)"| J["Chrome"]

Three-crate workspace: cdp handles the Chrome protocol, core is the library, and cli is a thin shell. The MCP server exposes the same core over JSON-RPC for Claude Desktop and other MCP clients.

How contextual snap works

  1. DOMSnapshot.captureSnapshot — Full flattened DOM including Shadow DOM
  2. Accessibility.getFullAXTree — Semantic roles, names, descriptions, properties
  3. Merge — Join DOM nodes with AX nodes by backendNodeId
  4. Extract context — Headings, text blocks (DOM + JS fallback for SPAs)
  5. Filter — Keep only interactable elements, exclude hidden/aria-hidden
  6. Compress — Group by section headings, add content summaries, assign @eN refs

Auto re-snap after actions

Every mutation action (click, fill, type, select, scroll) automatically:

  1. Executes the action via CDP
  2. Waits for settle — detects navigation (waits for page load, 3s timeout) or SPA mutation (300ms settle)
  3. Takes a fresh snap on the same transport connection
  4. Returns ok\n---\n{snap output} so the LLM sees updated state in one turn

Snap output format reference

## Section Heading
> Content summary: prices, options, descriptions (up to 300 chars)
@e1 [role] "label" id="..." href="..." expanded desc="Opens in new tab"
@e2 [input:text] "Search" placeholder="..." required

| Component | Purpose |
| --- | --- |
| ## Heading | Page section structure (h1-h6) |
| > summary | Key text content from that section |
| @eN | Stable element reference for actions |
| [role] | Semantic role (button, link, textbox, etc.) |
| "label" | Accessible name |
| id=, href= | Key attributes |
| expanded, collapsed | Dropdown/accordion state |
| selected | Active tab/option |
| required, readonly | Form field constraints |
| desc="..." | Accessibility description |
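
Because each element sits on its own line starting with @eN, the text output can be filtered with ordinary line tools. A minimal sketch, assuming the line layout shown above (ref, then [role], then an optional quoted label); list_elements is a hypothetical helper, not a snact command:

```shell
#!/usr/bin/env bash
# Print "ref<TAB>role<TAB>label" for each element line read from stdin.
# The label is left empty when no quoted name directly follows the role.
list_elements() {
  awk '/^@e[0-9]+ / {
    ref = $1
    role = $2
    gsub(/^\[|\]$/, "", role)               # strip the surrounding brackets
    label = ""
    if (match($0, /\] "[^"]*"/))            # quoted label right after the role
      label = substr($0, RSTART + 3, RLENGTH - 4)
    printf "%s\t%s\t%s\n", ref, role, label
  }'
}

# Sample lines copied from the MacBook Pro snap earlier in this README.
list_elements <<'EOF'
## Model. Choose your size.
@e35 [input:radio] "14-inch" selected
@e36 [input:radio] "16-inch"
@e40 [link]
EOF
# Output is tab-separated:
# @e35  input:radio  14-inch
# @e36  input:radio  16-inch
# @e40  link
```

For anything beyond quick filtering, --output=json is probably the sturdier choice, since it avoids depending on the text layout.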

Design decisions

  • Hand-written CDP types over generated bindings — ~30 commands, fast compile
  • Disk-based state between invocations — element maps, sessions, workflows as JSON
  • backendNodeId as element identifier — stable within a page load, selector hints for replay
  • Text output by default — optimized for LLM comprehension, not JSON parsing
  • Persistent browser profiles — cookies survive restarts, reduces bot detection
  • Single-threaded tokio — one thing at a time

Data storage

User scope (~/.local/share/snact/ on Linux, ~/Library/Application Support/snact/ on macOS):

snact/
├── element_map.json        # Current @eN → element mappings
├── heartbeat               # Last command timestamp (for --idle-timeout)
├── chrome-{port}.pid       # Chrome process ID
├── profiles/default/       # Persistent Chrome profile
├── sessions/{name}.json    # Saved browser sessions
├── workflows/{name}.json   # Recorded workflows (personal)
└── recording.json          # Active recording state

Project scope (.snact/ in the project directory, created by snact init, git-committable):

.snact/
└── workflows/{name}.json   # Shared workflows (team/repo)

Workflows save to project scope when .snact/ exists, otherwise user scope. On load, project scope takes priority.
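
The scope rule can be expressed as two small lookups. This is an illustrative sketch of the rule as described here, not snact's actual code; the function names and the explicit directory arguments are assumptions made for the example:

```shell
#!/usr/bin/env bash
# Where a workflow would be saved: project scope when .snact/ exists in
# the project directory, otherwise user scope.
workflow_save_path() {
  local name=$1 project_dir=$2 user_dir=$3
  if [ -d "$project_dir/.snact" ]; then
    echo "$project_dir/.snact/workflows/$name.json"
  else
    echo "$user_dir/workflows/$name.json"
  fi
}

# Where a workflow would be loaded from: project scope is checked first,
# so it wins when the same name exists in both scopes.
workflow_load_path() {
  local name=$1 project_dir=$2 user_dir=$3 candidate
  for candidate in "$project_dir/.snact/workflows/$name.json" \
                   "$user_dir/workflows/$name.json"; do
    if [ -f "$candidate" ]; then
      echo "$candidate"
      return 0
    fi
  done
  return 1   # not found in either scope
}

# Example call (paths are illustrative):
# workflow_load_path login-flow "$PWD" "$HOME/.local/share/snact"
```

One consequence of this ordering is that a team workflow committed under .snact/ shadows a personal workflow with the same name.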

Contributing

See CONTRIBUTING.md for development setup, project structure, and commit conventions.

License

MIT
