Merged
16 changes: 14 additions & 2 deletions CHANGELOG.md
@@ -1,8 +1,20 @@
 # agent-browser

-## 0.26.0-celeria-camoufox.1
+## 0.26.0-celeria-camoufox.2

 <!-- release:start -->
+### Bug Fixes
+
+- **Fixed `internal-error: name 'ref_id' is not defined` on every `page.snapshot` call under the Camoufox engine.** A prior change to the sidecar's handle-resolution loop renamed a loop-local variable but missed the `ref_cache.put` call at the end of the loop, which broke every snapshot-then-click flow in v0.26.0-celeria-camoufox.1. The fix also clarifies the two-ref distinction in the loop (agent-facing `@eN` vs. DOM `data-__ab-ref` attribute) so a future edit is less likely to break this again. Regression test added: `test_interactive_only_snapshot_then_click_by_ref` exercises the full snapshot → click-by-ref path and would have caught the original NameError. (Celeria fork)
+
+### New Features
+
+- **`scroll` and `scroll into view` on the Camoufox engine.** Previously returned `not-yet-implemented: action 'Runtime.evaluate' is not yet supported on engine=camoufox`. Parity with the Chrome path: `scroll` accepts `{x, y}` pixel deltas or `{direction: up|down|left|right, amount}` (Rust-side normalisation folds direction/amount into deltas before the sidecar call), with optional `selector` (a CSS selector or `@eN` ref) to scroll inside a specific element rather than the window. `scroll into view` centres the matched element via `scrollIntoView({block:'center', inline:'center'})`, matching the Chrome path's JS exactly. Both return structured errors (`selector-not-found`, `ambiguous-selector`, `ref-stale`, `element-detached`) rather than opaque Playwright exceptions. Still deferred to v2: ref-annotated screenshots (`screenshot --annotate`) which need CDP DOM-box extraction the sidecar doesn't yet expose. (Celeria fork)
+<!-- release:end -->
+
+## 0.26.0-celeria-camoufox.1
+
+<!-- old-release:start -->
 ### New Features

 - **`--engine camoufox` — third browser backend (Camoufox / patched Firefox).** Adds Camoufox alongside the existing Chrome (CDP) and Lightpanda (CDP) engines for targets that defeat JS-injection stealth. Camoufox's C++-level patches (canvas/WebGL noise, font fingerprint, WebRTC IP, AudioContext) go deeper than our `--stealth` script. Because Camoufox speaks Juggler, not CDP, the daemon drives it via a persistent Python sidecar over JSON-line stdio instead of the existing `CdpClient`. Stealth is implicit when `engine=camoufox`; combining with `--stealth` is a no-op with a warning (the JS injection would fight the engine-level spoofs). (Celeria fork)
@@ -15,7 +27,7 @@
 ### Requirements

 - Running `--engine camoufox` outside the Celeria E2B template requires a Python 3 runtime with `pip install camoufox camoufox_sidecar` and a one-time `python -m camoufox fetch` to download the Camoufox browser binary. Follows the Lightpanda "install it yourself" precedent; `agent-browser install` is not extended for Camoufox in v1.
-<!-- release:end -->
+<!-- old-release:end -->

 ## 0.26.0-celeria-stealth.1

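The changelog says the Rust side folds `direction`/`amount` into pixel deltas before the sidecar call, so `page.scroll` only ever sees `x`/`y`. The Python sketch below illustrates that folding; it is not the `handle_scroll` code. The name `fold_scroll`, the sign convention (positive `y` scrolls down and positive `x` scrolls right, as with `window.scrollBy`), and the choice to add direction deltas on top of any explicit `x`/`y` are all assumptions.

```python
from typing import Optional, Tuple

# Hypothetical unit vectors per direction, following the window.scrollBy
# convention: +x scrolls right, +y scrolls down.
_DIRECTIONS = {
    "up": (0, -1),
    "down": (0, 1),
    "left": (-1, 0),
    "right": (1, 0),
}


def fold_scroll(
    x: float = 0,
    y: float = 0,
    direction: Optional[str] = None,
    amount: float = 0,
) -> Tuple[float, float]:
    """Fold an optional direction/amount pair into (dx, dy) pixel deltas.

    Assumption: direction deltas are added to any explicit x/y deltas.
    """
    dx, dy = float(x), float(y)
    if direction is not None:
        try:
            ux, uy = _DIRECTIONS[direction]
        except KeyError:
            raise ValueError(f"unknown scroll direction: {direction!r}") from None
        dx += ux * amount
        dy += uy * amount
    return dx, dy


if __name__ == "__main__":
    print(fold_scroll(direction="down", amount=300))  # (0.0, 300.0)
    print(fold_scroll(x=10, direction="left", amount=50))  # (-40.0, 0.0)
```

Keeping this translation in one place means the sidecar's `scroll` can stay a thin `scrollBy(dx, dy)` wrapper.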
2 changes: 1 addition & 1 deletion cli/Cargo.lock

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion cli/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "agent-browser"
-version = "0.26.0-celeria-camoufox.1"
+version = "0.26.0-celeria-camoufox.2"
 edition = "2021"
 description = "Fast browser automation CLI for AI agents"
 license = "Apache-2.0"
23 changes: 21 additions & 2 deletions cli/src/native/actions.rs
@@ -2921,7 +2921,6 @@ async fn handle_hover(cmd: &Value, state: &mut DaemonState) -> Result<Value, Str

 async fn handle_scroll(cmd: &Value, state: &mut DaemonState) -> Result<Value, String> {
     let mgr = state.browser.as_ref().ok_or("Browser not launched")?;
-    let session_id = mgr.active_session_id()?.to_string();
     let selector = cmd.get("selector").and_then(|v| v.as_str());

     let (mut dx, mut dy) = (
@@ -2940,6 +2939,19 @@ async fn handle_scroll(cmd: &Value, state: &mut DaemonState) -> Result<Value, St
         }
     }

+    // Camoufox path: direction/amount are already folded into dx/dy above,
+    // so the sidecar only needs deltas plus an optional selector. Keeping the
+    // Rust side responsible for the translation means a single scroll
+    // semantic lives in `handle_scroll`, not duplicated across engines.
+    if mgr.backend.is_camoufox() {
+        let mut args = json!({ "x": dx, "y": dy });
+        if let Some(sel) = selector {
+            args["selector"] = json!(sel);
+        }
+        return mgr.camoufox_client().call("page.scroll", args).await;
+    }
+
+    let session_id = mgr.active_session_id()?.to_string();
     interaction::scroll(
         &mgr.backend,
         &session_id,
@@ -4420,12 +4432,19 @@ async fn handle_selectall(cmd: &Value, state: &mut DaemonState) -> Result<Value,

 async fn handle_scrollintoview(cmd: &Value, state: &mut DaemonState) -> Result<Value, String> {
     let mgr = state.browser.as_ref().ok_or("Browser not launched")?;
-    let session_id = mgr.active_session_id()?.to_string();
     let selector = cmd
         .get("selector")
         .and_then(|v| v.as_str())
         .ok_or("Missing 'selector' parameter")?;

+    if mgr.backend.is_camoufox() {
+        return mgr
+            .camoufox_client()
+            .call("page.scrollIntoView", json!({ "selector": selector }))
+            .await;
+    }
+
+    let session_id = mgr.active_session_id()?.to_string();
     interaction::scroll_into_view(
         &mgr.backend,
         &session_id,
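In both Camoufox branches the Rust side builds a small JSON argument object: `handle_scroll` always sends `x`/`y` and adds `selector` only when one was given, and `handle_scrollintoview` sends just `{"selector": ...}`. The sidecar is driven over JSON-line stdio, so each request must occupy exactly one line. The sketch below mirrors those argument shapes in Python; the envelope keys `id`, `method`, and `params` and the helpers `scroll_args` / `encode_request` are hypothetical, since the framing inside `camoufox_client().call` is not part of this diff.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # hypothetical monotonically increasing request ids


def scroll_args(dx: float, dy: float, selector: Optional[str] = None) -> dict:
    """Mirror the Rust Camoufox branch: deltas always, selector only if given."""
    args = {"x": dx, "y": dy}
    if selector is not None:
        args["selector"] = selector
    return args


def encode_request(method: str, params: dict) -> str:
    """Encode one request as a single JSON line (hypothetical envelope)."""
    # json.dumps escapes newlines inside strings as \n and emits none of its
    # own without indent=, so one request is exactly one line on stdin.
    return json.dumps({"id": next(_ids), "method": method, "params": params}) + "\n"


if __name__ == "__main__":
    print(encode_request("page.scroll", scroll_args(0, 300)), end="")
    print(encode_request("page.scrollIntoView", {"selector": "@e3"}), end="")
```

Note that an empty selector string still reaches the sidecar in this shape; the sidecar's `scroll` treats `""` the same as an absent selector and scrolls the window.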
2 changes: 1 addition & 1 deletion package.json
@@ -1,6 +1,6 @@
 {
   "name": "agent-browser",
-  "version": "0.26.0-celeria-camoufox.1",
+  "version": "0.26.0-celeria-camoufox.2",
   "description": "Browser automation CLI for AI agents",
   "type": "module",
   "files": [
2 changes: 1 addition & 1 deletion packages/camoufox-sidecar/camoufox_sidecar/__init__.py
@@ -1,3 +1,3 @@
 """camoufox-sidecar: Playwright+Camoufox driver process for agent-browser."""

-__version__ = "0.26.0+celeria.camoufox.1"
+__version__ = "0.26.0+celeria.camoufox.2"
10 changes: 10 additions & 0 deletions packages/camoufox-sidecar/camoufox_sidecar/__main__.py
@@ -147,6 +147,14 @@ async def _cmd_page_screenshot(sidecar: "Sidecar", args: dict) -> dict:
     return await sidecar.session.screenshot(args)


+async def _cmd_page_scroll(sidecar: "Sidecar", args: dict) -> dict:
+    return await sidecar.session.scroll(args)
+
+
+async def _cmd_page_scroll_into_view(sidecar: "Sidecar", args: dict) -> dict:
+    return await sidecar.session.scroll_into_view(args)
+
+
 async def _cmd_tab_new(sidecar: "Sidecar", args: dict) -> dict:
     return await sidecar.session.tab_new(args)

@@ -172,6 +180,8 @@ async def _cmd_tab_list(sidecar: "Sidecar", args: dict) -> dict:
     "page.fill": _cmd_page_fill,
     "page.getText": _cmd_page_get_text,
     "page.screenshot": _cmd_page_screenshot,
+    "page.scroll": _cmd_page_scroll,
+    "page.scrollIntoView": _cmd_page_scroll_into_view,
     "tab.new": _cmd_tab_new,
     "tab.switch": _cmd_tab_switch,
     "tab.close": _cmd_tab_close,
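The `__main__.py` change only adds two thin wrappers and two entries to a method-name to coroutine table; the routing itself is outside this diff. The sketch below shows one plausible way such a table is consumed: look up the method, await the handler, and turn an unknown method into a structured error instead of an exception. `FakeSession`, `dispatch`, the `result`/`error` response shape, and the `unknown-method` code are all hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class FakeSession:
    """Stand-in for the sidecar's Playwright session (hypothetical)."""

    async def scroll(self, args: dict) -> dict:
        return {"scrolled": True, "tabId": args.get("tabId", "t1")}

    async def scroll_into_view(self, args: dict) -> dict:
        return {"scrolled": args["selector"], "tabId": args.get("tabId", "t1")}


class Sidecar:
    def __init__(self, session: FakeSession) -> None:
        self.session = session


Handler = Callable[[Sidecar, dict], Awaitable[dict]]


async def _cmd_page_scroll(sidecar: Sidecar, args: dict) -> dict:
    return await sidecar.session.scroll(args)


async def _cmd_page_scroll_into_view(sidecar: Sidecar, args: dict) -> dict:
    return await sidecar.session.scroll_into_view(args)


# Same shape as the table in __main__.py: wire-level method name -> handler.
COMMANDS: Dict[str, Handler] = {
    "page.scroll": _cmd_page_scroll,
    "page.scrollIntoView": _cmd_page_scroll_into_view,
}


async def dispatch(sidecar: Sidecar, method: str, args: dict) -> dict:
    """Route one request; unknown methods become a structured error."""
    handler = COMMANDS.get(method)
    if handler is None:
        return {"error": {"code": "unknown-method", "message": method}}
    return {"result": await handler(sidecar, args)}


if __name__ == "__main__":
    sc = Sidecar(FakeSession())
    print(asyncio.run(dispatch(sc, "page.scroll", {"x": 0, "y": 200})))
```

With a table like this, exposing a new sidecar command is a one-line registration plus a session method, which is the pattern this PR follows.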
100 changes: 100 additions & 0 deletions packages/camoufox-sidecar/camoufox_sidecar/session.py
@@ -564,6 +564,106 @@ async def get_text(self, args: Optional[dict] = None) -> dict:
         )
         return {"text": text, "origin": _safe_page_url(tab.page), "tabId": tab.tab_id}

+    async def scroll(self, args: Optional[dict] = None) -> dict:
+        """Scroll the page or an element by ``(x, y)`` pixels.
+
+        Parity shape with the Chrome path: ``selector`` is optional; when
+        absent we scroll ``window``, when present we scroll the matched
+        element's own scroll container (via ``el.scrollBy``). The Rust side
+        pre-normalises ``direction`` + ``amount`` into ``x`` / ``y``, so the
+        sidecar only sees deltas.
+        """
+        args = args or {}
+        selector_or_ref = args.get("selector")
+        dx = float(args.get("x", 0) or 0)
+        dy = float(args.get("y", 0) or 0)
+
+        tab = await self._tab_for(args)
+
+        if selector_or_ref is None or selector_or_ref == "":
+            try:
+                await tab.page.evaluate(
+                    "([dx, dy]) => window.scrollBy(dx, dy)", [dx, dy]
+                )
+            except Exception as exc:  # noqa: BLE001
+                raise LaunchError("action-failed", str(exc)) from exc
+            return {"scrolled": True, "tabId": tab.tab_id}
+
+        ref_id = parse_ref(selector_or_ref)
+        if ref_id is not None:
+            handle = _require_ref(tab, ref_id)
+            try:
+                await handle.evaluate(
+                    "(el, [dx, dy]) => el.scrollBy(dx, dy)", [dx, dy]
+                )
+            except Exception as exc:  # noqa: BLE001
+                raise _classify_playwright_error(exc, "<ref>") from exc
+            return {"scrolled": True, "tabId": tab.tab_id}
+
+        locator = tab.page.locator(selector_or_ref)
+        try:
+            count = await locator.count()
+        except Exception as exc:  # noqa: BLE001
+            raise _classify_playwright_error(exc, selector_or_ref) from exc
+        if count == 0:
+            raise LaunchError(
+                "selector-not-found",
+                f"Selector {selector_or_ref!r} did not match any element",
+            )
+        if count > 1:
+            raise LaunchError(
+                "ambiguous-selector",
+                f"Selector {selector_or_ref!r} matched {count} elements; refine it or use a ref",
+            )
+        try:
+            await locator.evaluate("(el, [dx, dy]) => el.scrollBy(dx, dy)", [dx, dy])
+        except Exception as exc:  # noqa: BLE001
+            raise _classify_playwright_error(exc, selector_or_ref) from exc
+        return {"scrolled": True, "tabId": tab.tab_id}
+
+    async def scroll_into_view(self, args: Optional[dict] = None) -> dict:
+        """Scroll ``selector`` into view, centred.
+
+        Mirrors the Chrome path's ``scrollIntoView({block:'center', inline:'center'})``
+        rather than using Playwright's looser ``scroll_into_view_if_needed``,
+        so behaviour matches across engines for the same selector.
+        """
+        args = args or {}
+        selector_or_ref = _require_str(args, "selector")
+
+        tab = await self._tab_for(args)
+        js = "el => el.scrollIntoView({ block: 'center', inline: 'center' })"
+
+        ref_id = parse_ref(selector_or_ref)
+        if ref_id is not None:
+            handle = _require_ref(tab, ref_id)
+            try:
+                await handle.evaluate(js)
+            except Exception as exc:  # noqa: BLE001
+                raise _classify_playwright_error(exc, "<ref>") from exc
+            return {"scrolled": selector_or_ref, "tabId": tab.tab_id}
+
+        locator = tab.page.locator(selector_or_ref)
+        try:
+            count = await locator.count()
+        except Exception as exc:  # noqa: BLE001
+            raise _classify_playwright_error(exc, selector_or_ref) from exc
+        if count == 0:
+            raise LaunchError(
+                "selector-not-found",
+                f"Selector {selector_or_ref!r} did not match any element",
+            )
+        if count > 1:
+            raise LaunchError(
+                "ambiguous-selector",
+                f"Selector {selector_or_ref!r} matched {count} elements; refine it or use a ref",
+            )
+        try:
+            await locator.evaluate(js)
+        except Exception as exc:  # noqa: BLE001
+            raise _classify_playwright_error(exc, selector_or_ref) from exc
+        return {"scrolled": selector_or_ref, "tabId": tab.tab_id}
+
     async def _tab_for(self, args: dict) -> Tab:
         tab_id = args.get("tabId")
         if isinstance(tab_id, str) and tab_id:
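`scroll` and `scroll_into_view` repeat the same CSS-selector guard: count the matches, raise `selector-not-found` for zero and `ambiguous-selector` for more than one. One possible shared helper is sketched below against a fake locator so it runs without a browser. `require_single_match` and `FakeLocator` are hypothetical, and `LaunchError` here is a stand-in with an assumed `code` attribute rather than the sidecar's own class. The sketch also omits the `try`/`except` that the real methods wrap around `count()` to classify Playwright errors.

```python
import asyncio


class LaunchError(Exception):
    """Stand-in for the sidecar's LaunchError, carrying a machine-readable code."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(message)
        self.code = code


async def require_single_match(locator, selector: str):
    """Return ``locator`` if it matches exactly one element, otherwise raise.

    Same checks and messages as the duplicated blocks in ``scroll`` and
    ``scroll_into_view``; ``locator`` only needs an async ``count()``.
    """
    count = await locator.count()
    if count == 0:
        raise LaunchError(
            "selector-not-found",
            f"Selector {selector!r} did not match any element",
        )
    if count > 1:
        raise LaunchError(
            "ambiguous-selector",
            f"Selector {selector!r} matched {count} elements; refine it or use a ref",
        )
    return locator


class FakeLocator:
    """Hypothetical stand-in for a Playwright Locator with ``n`` matches."""

    def __init__(self, n: int) -> None:
        self.n = n

    async def count(self) -> int:
        return self.n


if __name__ == "__main__":
    try:
        asyncio.run(require_single_match(FakeLocator(2), "button"))
    except LaunchError as err:
        print(err.code, "-", err)
```

Refusing to act on an ambiguous selector, rather than silently using the first match, keeps the error surface identical to what the agent would see for a `click` or `fill`.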
13 changes: 10 additions & 3 deletions packages/camoufox-sidecar/camoufox_sidecar/snapshot.py
@@ -300,17 +300,24 @@ async def take_snapshot(
         entry["ref"] = f"e{new_idx}"

     # Resolve each ref back to a live ElementHandle so subsequent click/fill
-    # calls can reach the element without re-running the JS walker.
+    # calls can reach the element without re-running the JS walker. Two refs
+    # are in play here: ``entry["ref"]`` is the agent-facing id (``e1..eM``,
+    # contiguous after any ``interactive_only`` filter), and ``dom_ref`` is
+    # the attribute value the JS walker stamped on the element before the
+    # filter re-numbered refs. We query by the DOM ref and cache under the
+    # agent-facing one, so the next ``click @eN`` from the agent lands on
+    # the right handle.
     for entry in entries:
-        dom_ref = entry.get("_dom_ref", entry["ref"])
+        agent_ref = entry["ref"]
+        dom_ref = entry.get("_dom_ref", agent_ref)
         handle = await page.query_selector(f"[data-__ab-ref='{dom_ref}']")
         if handle is None:
             # Element vanished between the walker and this query_selector —
             # extremely rare but possible under a script that re-renders
             # synchronously. Drop the entry silently rather than emit a
             # dangling ref to the agent.
             continue
-        ref_cache.put(ref_id, handle, role=entry["role"], name=entry["name"])
+        ref_cache.put(agent_ref, handle, role=entry["role"], name=entry["name"])

     lines = [_format_line(entry) for entry in entries]
     if not lines:
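The snapshot fix hinges on each entry carrying two refs: the agent-facing `entry["ref"]`, renumbered to a contiguous `e1..eM` after the `interactive_only` filter, and the DOM ref the JS walker stamped into `data-__ab-ref`. The pure-Python model below reproduces the fixed loop with a dict standing in for the page's attribute lookup and another for `ref_cache`. That the filter stores the walker's original ref under `_dom_ref` before renumbering is an assumption consistent with `entry.get("_dom_ref", agent_ref)`; `renumber_interactive`, `build_ref_cache`, and the `interactive` flag are hypothetical.

```python
from typing import Dict, List


def renumber_interactive(entries: List[dict]) -> List[dict]:
    """Keep interactive entries and renumber their agent-facing refs e1..eM.

    Assumption: the walker's original ref is preserved in ``_dom_ref``.
    """
    kept = [e for e in entries if e["interactive"]]
    for new_idx, entry in enumerate(kept, start=1):
        entry["_dom_ref"] = entry["ref"]
        entry["ref"] = f"e{new_idx}"
    return kept


def build_ref_cache(entries: List[dict], dom: Dict[str, str]) -> Dict[str, str]:
    """Query by DOM ref, cache under the agent-facing ref (the fixed loop)."""
    cache: Dict[str, str] = {}
    for entry in entries:
        agent_ref = entry["ref"]
        dom_ref = entry.get("_dom_ref", agent_ref)
        handle = dom.get(dom_ref)  # stands in for query_selector on data-__ab-ref
        if handle is None:
            continue  # element vanished: skip rather than emit a dangling ref
        cache[agent_ref] = handle
    return cache


if __name__ == "__main__":
    walker_output = [
        {"ref": "e1", "role": "link", "interactive": True},
        {"ref": "e2", "role": "heading", "interactive": False},
        {"ref": "e3", "role": "button", "interactive": True},
    ]
    dom = {"e1": "<a>", "e2": "<h1>", "e3": "<button>"}
    print(build_ref_cache(renumber_interactive(walker_output), dom))
```

Here the button keeps DOM ref `e3` but is shown to the agent as `e2`, so caching under the DOM ref instead would leave an agent's `click @e2` without a matching handle.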
2 changes: 1 addition & 1 deletion packages/camoufox-sidecar/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

 [project]
 name = "camoufox-sidecar"
-version = "0.26.0+celeria.camoufox.1"
+version = "0.26.0+celeria.camoufox.2"
 description = "Sidecar process that drives Camoufox on behalf of agent-browser"
 readme = "README.md"
 requires-python = ">=3.10"
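This release bumps four manifests in lockstep: `cli/Cargo.toml` and `package.json` carry the SemVer pre-release `0.26.0-celeria-camoufox.2`, while the sidecar's `__init__.py` and `pyproject.toml` carry `0.26.0+celeria.camoufox.2`. The split is presumably because PEP 440 has no free-form pre-release labels but does allow a dot-separated local version after `+`. The sketch below derives the Python form from the SemVer one so a release script could check that all four agree; `semver_to_pep440_local` is hypothetical, and its rule is inferred from the version strings in this PR.

```python
def semver_to_pep440_local(version: str) -> str:
    """Map '0.26.0-celeria-camoufox.2' to '0.26.0+celeria.camoufox.2'.

    Inferred rule: the first '-' becomes '+', and any later '-' becomes '.'.
    Versions without a pre-release suffix are returned unchanged.
    """
    base, sep, label = version.partition("-")
    if not sep:
        return version
    return f"{base}+{label.replace('-', '.')}"


if __name__ == "__main__":
    cargo_version = "0.26.0-celeria-camoufox.2"   # Cargo.toml / package.json
    python_version = "0.26.0+celeria.camoufox.2"  # __init__.py / pyproject.toml
    assert semver_to_pep440_local(cargo_version) == python_version
    print("versions agree")
```

A check like this would have to read the four files itself; the sketch only covers the string mapping.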