Autonomous browser agent running entirely on Apple Silicon. No cloud APIs, no Claude Code overhead, no MCP layer. Direct MLX inference + Chrome DevTools Protocol.
User prompt → Local LLM (MLX) → Chrome DevTools Protocol → Brave Browser
↑ ↓
~2–5s per step DOM.pierce + DOM.focus
+ Input.insertText
Default model: Gemma 4 31B Instruct abliterated (4-bit quantized) via MLX on Apple Silicon
Alternative models: any MLX-compatible model — Qwen 3.5 122B (biggest), Llama 3.3 70B (smartest), or anything else — swap via the MLX_MODEL env var
Browser: Brave with remote debugging on port 9222
Protocol: CDP WebSocket — no MCP, no proxy, direct connection
Most news sites (Yahoo, etc.) use third-party comment widgets (OpenWeb/SpotIM) that load inside:
- A cross-origin iframe (JavaScript can't access it)
- A Shadow DOM (normal querySelector can't find elements)
- A ProseMirror rich text editor (innerHTML doesn't work)
Standard browser automation tools (Playwright, Selenium, MCP) fail at all three layers.
Our solution uses CDP primitives that bypass all of these:
DOM.getDocument(depth: -1, pierce: true) # Exposes everything across iframes + Shadow DOM
DOM.performSearch(".ProseMirror") # Finds the editor in any context
DOM.focus(nodeId) # Focuses it regardless of origin
Input.insertText(text) # Types into the focused element
This works because CDP operates at the browser level, not the page level. Same-origin policy doesn't apply.
- macOS with Apple Silicon (M-series), 32 GB+ unified memory recommended
- Brave Browser (or Chrome) with remote debugging
- Python 3.12+ with MLX
# MLX server backend (handles local inference)
pip install mlx mlx-lm websocketsThe agent talks to a local MLX inference server that speaks Anthropic's Messages API.
The server ships with the companion repo claude-code-local — set that up first. Once installed, the server lives at ~/.local/mlx-native-server/server.py and is auto-started by the desktop launcher.
Desktop launcher: double-click Gemma 4 Browser.command (from the claude-code-local repo's launchers/Browser Agent.command). The launcher will:
- Start the MLX server with Gemma 4 31B if it isn't already running
- Start Brave with
--remote-debugging-port=9222if it isn't already running - Ensure at least one page tab exists
- Hand off to the Python agent
python agent.py
# Prompts: "What should I do?"
# Type tasks, get results, stays open for the next task
# Type "quit" to exit
# Errors in one task no longer kill the whole session — you'll just get a
# message and a fresh promptpython agent.py "Find an article about Iran on Yahoo and make a comment"# Override the default model with any MLX-compatible LLM
MLX_MODEL="mlx-community/Qwen2.5-72B-Instruct-4bit" python agent.pyFind an article about Iran on Yahoo and make a comment. Don't post it, just leave it in draft.
The agent will:
- Navigate to Yahoo News
- Find an Iran article via JavaScript (instant, no model needed)
- Click the article
- Read the article content (first 6 paragraphs)
- Generate a relevant 2–3 sentence comment using the model
- Open the Comments section
- Find the comment widget (cross-origin iframe + Shadow DOM)
- Type the comment via DOM.pierce + DOM.focus + Input.insertText
- Scroll so you can see the comment
- NOT click Send — leaves it for your review
Go to Yahoo, find an Iran article. comment: The diplomatic situation demands more transparency from all parties involved.
When the task mentions "comment" plus a topic keyword (iran, trump, etc.):
- JavaScript finds the article — no model needed, instant
- Model generates the comment — reads article paragraphs, writes 2–3 sentences
- CDP types the comment — pierces through iframes and Shadow DOM
The model controls the browser via JSON tool calls:
navigate(url)— go to a pagesnapshot()— get accessibility tree with element UIDsclick(uid)— click an elementtype_text(uid, text)— type into an elementscroll(direction)— scroll up/downjs(code)— run arbitrary JavaScriptdone(message)— task complete
Built-in loop detection: if the same UID gets clicked more than twice in a row, the agent presses Escape (to dismiss any lightbox/overlay) and forces a fresh snapshot so the model can try a different approach.
Any exception during a task (MLX timeout, CDP websocket drop, malformed model output, etc.) is caught by the main loop — you'll see the error printed and return to the prompt rather than the whole agent crashing.
| Metric | Value |
|---|---|
| Navigate + snapshot | ~4s |
| Article finding (JS) | <1s |
| Comment generation | ~8s |
| Comment typing (pierce + type) | ~3s |
| Total for comment task | ~20–30s |
agent.py— The browser agent (single file, ~470 lines)~/.local/mlx-native-server/server.py— MLX inference server with Anthropic API + tool parsing (ships with claude-code-local)launchers/Browser Agent.command— Desktop launcher (ships with claude-code-local, surfaces asGemma 4 Browser.commandon the Desktop)
- MLX — Apple's ML framework for Apple Silicon
- Gemma 4 31B — instruction-tuned, abliterated and 4-bit quantized
- Chrome DevTools Protocol — direct browser control via WebSocket
- No cloud APIs, no subscriptions, no data leaving your machine
Builders running this stack hang out in the NiceDreamzApps Discord — quiet, builder-tone, no bots. Share what you're scraping, what's breaking, what local model worked for which site.