13 changes: 7 additions & 6 deletions CLAUDE.md
@@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

## Project Overview

**Bridgic Browser** is an LLM-driven browser automation library built on Playwright with built-in stealth mode. It provides 67 browser tools organized into categories, an accessibility tree-based snapshot system, a stable element reference system (refs like "1f79fe5e", "8d4b03a9", …) designed for reliable AI agent interactions, and a `bridgic-browser` CLI tool backed by a persistent daemon.
**Bridgic Browser** is an LLM-driven browser automation library built on Playwright with built-in stealth mode. It provides 69 browser tools organized into categories, an accessibility tree-based snapshot system, a stable element reference system (refs like "1f79fe5e", "8d4b03a9", …) designed for reliable AI agent interactions, and a `bridgic-browser` CLI tool backed by a persistent daemon.

## Commands

@@ -57,7 +57,7 @@ bridgic/browser/
├── _redact.py # Log redaction helpers
├── errors.py # Public BridgicBrowserError hierarchy
├── session/ # Core browser session
│ ├── _browser.py # Browser class – main entry point (all 67 tool methods live here)
│ ├── _browser.py # Browser class – main entry point (all 69 tool methods live here)
│ ├── _browser_model.py # Data models
│ ├── _snapshot.py # SnapshotGenerator + EnhancedSnapshot + RefData
│ ├── _stealth.py # StealthConfig + StealthArgsBuilder (50+ Chrome args)
@@ -67,12 +67,12 @@ bridgic/browser/
│ ├── _launch.py # launch-mode helpers (retriable_launch, etc.)
│ ├── _locator_utils.py # _click_checkable_target and other locator helpers
│ └── _errors.py # session-internal error types
├── tools/ # 67 automation tools (all implemented in _browser.py)
├── tools/ # 69 automation tools (all implemented in _browser.py)
│ ├── _browser_tool_set_builder.py # BrowserToolSetBuilder (category/name selection)
│ └── _browser_tool_spec.py # BrowserToolSpec (wraps tool for agents)
└── cli/ # CLI tool (bridgic-browser command)
    ├── __init__.py # Exports main()
    ├── _commands.py # Click command definitions (67 commands, SectionedGroup)
    ├── _commands.py # Click command definitions (69 commands, SectionedGroup)
    ├── _client.py # Socket client: send_command(), ensure_daemon_running()
    ├── _daemon.py # Daemon: asyncio Unix socket server + Browser instance
    └── _transport.py # Unix-socket transport layer (used by client and daemon)
@@ -108,7 +108,7 @@ bridgic has two independent download pipelines, picked by mode:

| Mode | Pipeline | Notes |
|---|---|---|
| non-CDP (launch / persistent_context) | Playwright's per-context `setDownloadBehavior(allowAndName, downloadPath=<artifactsDir>)` → `download` events fire → `DownloadManager.save_as()` copies to `downloads_path` with the real filename. | Files land at the real filename in `downloads_path`. If `downloads_path` is unset, DownloadManager is not attached and files are lost when Playwright deletes `artifactsDir` on close. |
| non-CDP (launch / persistent_context) | Playwright's per-context `setDownloadBehavior(allowAndName, downloadPath=<artifactsDir>)` → `download` events fire → `DownloadManager.save_as()` copies to `downloads_path` with the real filename. | Files land at the real filename in `downloads_path`. If `downloads_path` is unset, `Browser.__init__` defaults the manager to `~/Downloads` (PR #28), so downloads are still captured — no more silently-lost files. |
| CDP-owned (bridgic creates its own context on the remote Chrome) | Same as non-CDP: Playwright's per-context `allowAndName` routes through `artifactsDir`, DownloadManager copies. | Per-context override targets bridgic's own context, doesn't touch the user. |
| **CDP-borrowed** (`Browser(cdp=...)` against a user's running Chrome) | bridgic's own override on bridgic's tab: `Browser.setDownloadBehavior(allowAndName, downloadPath=<effective>, eventsEnabled=true)` sent **via the page CDP session** (`BrowserContext.new_cdp_session(self._page)`). `CdpDownloadRenamer` subscribes to `Browser.downloadWillBegin/downloadProgress` on the same session and renames `<dir>/<guid>` → `<dir>/<real name>` on completion. | Page-session routing is the *only* form Chrome 138+ honors when the user has "Ask where to save each file" enabled — `Browser.setDownloadBehavior` over a browser-level session and `Page.setDownloadBehavior(allow, ...)` both still pop the dialog. See [empirically-tried alternatives](#empirically-tried-alternatives-for-cdp-borrowed-downloads) below. |
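
The `<dir>/<guid>` → `<dir>/<real name>` step in the CDP-borrowed row can be pictured with a small standalone sketch. This is not bridgic's `CdpDownloadRenamer`; the function name `rename_completed_download`, its parameters, and the collision-suffix scheme are assumptions made for illustration. It only mirrors the behaviour described here: rename on completion, and on failure leave the file at its GUID path with a warning.

```python
import logging
import os
import tempfile
from pathlib import Path
from typing import Optional

logger = logging.getLogger("rename_sketch")


def rename_completed_download(
    download_dir: Path, guid: str, suggested_name: str
) -> Optional[Path]:
    """Rename <dir>/<guid> to <dir>/<real name>; return the final path or None.

    Hypothetical helper: best-effort, never deletes content.
    """
    src = download_dir / guid
    # Keep only the last path component so the name cannot escape the directory.
    safe_name = Path(suggested_name).name
    if safe_name in ("", ".", ".."):
        safe_name = guid
    dst = download_dir / safe_name
    # Assumed policy: avoid clobbering an existing file (report.pdf -> report (1).pdf).
    counter = 1
    while dst.exists():
        stem, suffix = Path(safe_name).stem, Path(safe_name).suffix
        dst = download_dir / f"{stem} ({counter}){suffix}"
        counter += 1
    try:
        os.replace(src, dst)
    except OSError as exc:
        # Missing source, permissions, cross-filesystem moves, etc.
        logger.warning("rename failed, file stays at %s: %s", src, exc)
        return None
    return dst


if __name__ == "__main__":
    demo_dir = Path(tempfile.mkdtemp())
    (demo_dir / "3f2a-guid").write_bytes(b"payload")
    print(rename_completed_download(demo_dir, "3f2a-guid", "report.pdf"))
```

In a real renamer the GUID and suggested filename would come from the `Browser.downloadWillBegin` / `Browser.downloadProgress` events named in the table; here they are passed in directly.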

@@ -159,7 +159,8 @@ agent-browser's `Some(session_id)` argument is the same trick — page-level CDP
#### Caveats

- **bridgic's tab gets the override; user's tabs keep their normal Chrome UX** (intentional — the page-session scope is bridgic's tab only). User-initiated downloads in their other tabs still go to their Chrome's configured directory and obey their "Ask where to save" pref. This is by design and matches the "I gave you full control of *my agent's* tab via `--cdp`" semantics — user's private workspace is untouched.
- **DownloadManager is not attached in CDP-borrowed mode.** Chrome writes directly to the final path; Playwright's per-context `download` event doesn't fire when the file is routed away from `artifactsDir`. `wait_for_download()` is correspondingly **unsupported in CDP-borrowed mode** — use CDP-owned or non-CDP for that.
- **DownloadManager in CDP-borrowed mode is NOT attached to any page or context.** Attaching to the borrowed context would hijack user tabs (privacy boundary); attaching page-scoped to bridgic's own tab was empirically verified to cause duplicate-record bugs — Playwright STILL fires `download` events in CDP-borrowed mode (`setDownloadBehavior(allowAndName)` on a page session does not suppress them), and `_handle_download.save_as` writes a 0-byte placeholder while the real file is already produced by `CdpDownloadRenamer`. Instead, `CdpDownloadRenamer.on_completed → DownloadManager.record_external_download` pipes real completions back into DM, populating `downloaded_files` / `_completed_queue` / `_pending_waiters`. `downloaded_files` / `wait_for_next_download` / `get_downloaded_files_text` therefore work for **downloads triggered on bridgic's primary tab** across all modes. The action-bound `wait_for_download()` variant (which still needs Playwright's `download` event to fire on bridgic's tab) **remains unsupported in CDP-borrowed mode** — use `wait_for_next_download()` instead.
- **Popup-triggered downloads in CDP-borrowed mode are not captured by the renamer.** `setDownloadBehavior(allowAndName, ...)` was sent on a single page-level CDP session bound to bridgic's original `self._page`; that override only scopes the originating target. Downloads triggered from an auto-followed popup fall back to Chrome's native UX (the user's "Ask where to save each file" preference governs) and `wait_for_next_download` will time out. See [docs/KNOWN_LIMITATIONS.md](docs/KNOWN_LIMITATIONS.md).
- **The renamer is best-effort.** If a CDP event is missed or the OS rename fails (cross-FS, permission, etc.) the file stays at its GUID path with a warning logged. It never deletes content.
- **`last_close_artifacts()`** exposes a `rescued_downloads` list when L2 actually moved anything.
- **"Show in Folder"** in Chrome's download bubble is broken whenever `setDownloadBehavior(allowAndName, eventsEnabled=true)` is active. This is a Chromium bug (`#324282051`) affecting all CDP-using tools. See [docs/KNOWN_LIMITATIONS.md](docs/KNOWN_LIMITATIONS.md).
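
The callback wiring described in the caveats above (renamer completion piped into the download manager, which then feeds `downloaded_files` and any waiter) can be sketched in isolation. Everything below is a hypothetical stand-in: `SketchDownloadManager`, `SketchRenamer`, `FileRecord`, and the callback signature are assumptions, not bridgic's actual classes, and the real `record_external_download` may take different arguments.

```python
import asyncio
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class FileRecord:
    """Hypothetical stand-in for a completed-download record."""
    file_name: str
    path: str
    file_size: int


class SketchDownloadManager:
    """Tracks completions that are reported from outside Playwright's events."""

    def __init__(self) -> None:
        self.downloaded_files: List[FileRecord] = []
        self._completed: "asyncio.Queue[FileRecord]" = asyncio.Queue()

    def record_external_download(self, record: FileRecord) -> None:
        # Store the record and wake up anyone waiting for the next download.
        self.downloaded_files.append(record)
        self._completed.put_nowait(record)

    async def wait_for_next_download(
        self, timeout: float = 30.0
    ) -> Optional[FileRecord]:
        try:
            return await asyncio.wait_for(self._completed.get(), timeout=timeout)
        except asyncio.TimeoutError:
            return None


class SketchRenamer:
    """Calls on_completed after a (pretend) successful rename."""

    def __init__(
        self, on_completed: Optional[Callable[[FileRecord], None]] = None
    ) -> None:
        self._on_completed = on_completed

    def finish(self, record: FileRecord) -> None:
        if self._on_completed is not None:
            self._on_completed(record)


async def demo() -> List[Optional[FileRecord]]:
    manager = SketchDownloadManager()
    renamer = SketchRenamer(on_completed=manager.record_external_download)
    renamer.finish(FileRecord("report.pdf", "/tmp/downloads/report.pdf", 2048))
    first = await manager.wait_for_next_download(timeout=0.1)
    # Nothing else was reported, so the second wait times out and yields None.
    second = await manager.wait_for_next_download(timeout=0.1)
    return [first, second]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The second wait timing out is the same shape as the popup caveat: if no completion is ever reported to the manager, the waiter has nothing to return.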
13 changes: 11 additions & 2 deletions README.md
@@ -741,8 +741,17 @@ for f in browser.download_manager.downloaded_files:
    print(f"Downloaded: {f.file_name} ({f.file_size} bytes)")

# CDP-borrowed (CdpDownloadRenamer pipeline; downloads land at downloads_path
# with real filenames; download_manager is None — wait_for_download /
# wait_for_next_download are unsupported here).
# with real filenames). CdpDownloadRenamer pipes completions back into
# DownloadManager via record_external_download, so wait_for_next_download /
# get_downloaded_files_text / downloaded_files work the same way as above
# for downloads triggered on bridgic's primary tab. wait_for_download (the
# action-bound, Playwright-event variant) remains unsupported in
# CDP-borrowed mode — use wait_for_next_download.
#
# Edge case: downloads triggered from an auto-followed popup are NOT
# routed through the renamer (setDownloadBehavior was bound to bridgic's
# original page session, not the popup), so wait_for_next_download will
# time out on those. See docs/KNOWN_LIMITATIONS.md.
browser = Browser(cdp="auto", downloads_path="./downloads")
```
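
Because `Browser.wait_for_next_download` in this PR returns a human-readable string rather than raising, a caller that needs to branch on the outcome (for example to detect the popup timeout case) might classify the message. The sketch below is an assumption-heavy illustration: `WaitOutcome` and `classify_wait_result` are hypothetical names, and the matched prefixes are copied from the `_browser.py` changes in this diff, so they would break if those messages are reworded.

```python
from enum import Enum


class WaitOutcome(Enum):
    COMPLETED = "completed"
    TIMED_OUT = "timed_out"
    UNAVAILABLE = "unavailable"
    UNKNOWN = "unknown"


def classify_wait_result(message: str) -> WaitOutcome:
    """Map a wait_for_next_download() message to a coarse outcome.

    Hypothetical helper; string matching is brittle by nature.
    """
    if message.startswith("Download complete:"):
        return WaitOutcome.COMPLETED
    if message.startswith("No download completed within"):
        return WaitOutcome.TIMED_OUT
    if message.startswith("Download manager not available"):
        return WaitOutcome.UNAVAILABLE
    return WaitOutcome.UNKNOWN


if __name__ == "__main__":
    print(classify_wait_result("No download completed within 30s timeout."))
```

A timeout in CDP-borrowed mode does not by itself prove that nothing was downloaded; per the edge case above, a popup-triggered download may have been handled by Chrome's native UX instead.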

12 changes: 10 additions & 2 deletions README_zh.md
@@ -734,8 +734,16 @@ print(await browser.get_downloaded_files_text())
for f in browser.download_manager.downloaded_files:
    print(f"已下载:{f.file_name}({f.file_size} 字节)")

# CDP-borrowed(CdpDownloadRenamer 流水线;文件以真名落到 downloads_path,
# download_manager 为 None —— wait_for_download / wait_for_next_download 在此模式不支持)
# CDP-borrowed(CdpDownloadRenamer 流水线;文件以真名落到 downloads_path)
# CdpDownloadRenamer 通过 record_external_download 把完成事件回灌到
# DownloadManager,所以 wait_for_next_download / get_downloaded_files_text /
# downloaded_files 在 bridgic 主 tab 触发的下载场景下与非 CDP 模式一致。
# wait_for_download(基于触发动作的 Playwright 事件版本)在此模式仍不支持 ——
# 请改用 wait_for_next_download。
#
# 边界:从 auto-follow popup 上触发的下载不走 renamer(setDownloadBehavior
# 只绑定在 bridgic 原 page 的 CDP session),所以 wait_for_next_download 会
# 超时。详见 docs/KNOWN_LIMITATIONS.md。
browser = Browser(cdp="auto", downloads_path="./downloads")
```

94 changes: 61 additions & 33 deletions bridgic/browser/session/_browser.py
@@ -724,31 +724,49 @@ def downloaded_files(self) -> List[DownloadedFile]:
return self._download_manager.downloaded_files
return []

@staticmethod
def _format_file_size(size_bytes: int) -> str:
"""Format byte count as KB or MB with appropriate precision."""
size_kb = size_bytes / 1024
if size_kb >= 1024:
return f"{size_kb / 1024:.2f} MB"
return f"{size_kb:.1f} KB"

async def get_downloaded_files_text(self) -> str:
"""Return a human-readable summary of all downloads in this session."""
"""Return a human-readable summary of all downloads in this session.

Works across all pipelines:
- non-CDP / CDP-owned: entries come from Playwright's `download`
event via ``DownloadManager._handle_download``.
- CDP-borrowed: entries come from ``CdpDownloadRenamer`` via
``DownloadManager.record_external_download``.
"""
files = self.downloaded_files
if not files:
return "No downloads in this session."
lines = []
for i, f in enumerate(files, 1):
size_kb = f.file_size / 1024
size_str = f"{size_kb / 1024:.2f} MB" if size_kb >= 1024 else f"{size_kb:.1f} KB"
lines.append(f"[{i}] {f.file_name} — {size_str} — {f.path}")
lines = [
f"[{i}] {f.file_name} — {self._format_file_size(f.file_size)} — {f.path}"
for i, f in enumerate(files, 1)
]
return "\n".join(lines)

async def wait_for_next_download(self, timeout: float = 30.0) -> str:
"""Wait up to *timeout* seconds for the next download to complete.

Returns a one-line summary of the downloaded file, or a timeout message.
Returns a one-line summary of the downloaded file, or a timeout
message. Works across all pipelines — in CDP-borrowed mode the
completion record arrives via ``CdpDownloadRenamer`` →
``DownloadManager.record_external_download`` after the rename.
"""
if not self._download_manager:
return "Download manager not available."
file = await self._download_manager.wait_for_next_download(timeout=timeout)
if file is None:
return f"No download completed within {timeout:.0f}s timeout."
size_kb = file.file_size / 1024
size_str = f"{size_kb / 1024:.2f} MB" if size_kb >= 1024 else f"{size_kb:.1f} KB"
return f"Download complete: {file.file_name} — {size_str} — {file.path}"
return (
f"Download complete: {file.file_name} — "
f"{self._format_file_size(file.file_size)} — {file.path}"
)

@property
def headless(self) -> bool:
@@ -1664,8 +1682,18 @@ async def _start(self) -> None:
if override_ok:
self._current_cdp_download_path = take_over_path
try:
# Pipe successful renames into DownloadManager
# so CDP-borrowed downloads surface through the
# same downloaded_files / wait_for_next_download
# API as non-CDP / CDP-owned modes.
on_completed = (
self._download_manager.record_external_download
if self._download_manager is not None
else None
)
self._cdp_download_renamer = CdpDownloadRenamer(
default_dir=take_over_path
default_dir=take_over_path,
on_completed=on_completed,
)
await self._cdp_download_renamer.attach(
self._cdp_download_session
@@ -1689,13 +1717,19 @@ async def _start(self) -> None:
# Playwright's per-context setDownloadBehavior(allowAndName)
# still routes downloads through the artifactsDir, so
# DownloadManager.save_as() can copy files to downloads_path.
# - Borrowed context: NOT attached. Our L1 override took the
# default context to allow + downloadPath, so Chrome writes
# directly to the final path; bridgic is not in the
# file-transfer loop. Trying to `save_as` here would block
# forever (Playwright no longer receives
# Browser.downloadProgress(completed) once the path moved
# out of artifactsDir).
# - Borrowed context: NOT attached anywhere. Attaching to the
# whole context would hijack user tabs (privacy boundary);
# attaching page-scoped to bridgic's own tab was empirically
# verified to cause duplicate-record bugs — Playwright STILL
# fires `download` events in CDP-borrowed mode (the
# assumption that "path moved out of artifactsDir suppresses
# the event" is wrong with `setDownloadBehavior(allowAndName)`
# on a page session), and `_handle_download.save_as` writes a
# 0-byte placeholder to the DM default dir. Instead,
# `CdpDownloadRenamer.on_completed` pipes the real completion
# back into DM via `record_external_download`, which
# populates `downloaded_files` / `_completed_queue` /
# `_pending_waiters` without going through `_handle_download`.
if self._download_manager and self._cdp_context_owned:
self._download_manager.attach_to_context(self._context)

@@ -2958,21 +2992,15 @@ async def _switch_self_page_to(self, new_page: Page) -> None:
await self._switch_video_to_page(new_page)
except Exception as e:
logger.debug("[_switch_self_page_to] video switch failed: %s", e)
# CDP-borrowed mode attaches DownloadManager per-page (not per-context)
# to avoid hijacking the user's private downloads. Migrate handlers so
# downloads triggered from the followed popup still land in bridgic's
# downloads_path.
if self._is_cdp_borrowed and self._download_manager and old is not None:
try:
self._download_manager.detach_from_page(old)
except Exception:
pass
try:
self._download_manager.attach_to_page(new_page)
except Exception as e:
logger.debug(
"[_switch_self_page_to] download manager re-attach failed: %s", e
)
# No DownloadManager migration here: in CDP-borrowed mode the manager
# is intentionally NOT attached to any page (see `_start()` comments —
# attaching causes duplicate-record bugs because Playwright still
# fires `download` events even when allowAndName routes the file
# away from artifactsDir). CdpDownloadRenamer handles bridgic's
# primary tab via `record_external_download`; popup-triggered
# CDP-borrowed downloads are out of scope (see KNOWN_LIMITATIONS).
# In non-CDP / CDP-owned modes the global `attach_to_context` is the
# active pipeline, so no per-page migration is needed either.

async def _select_fallback_page(self, closed_page: Page) -> Optional[Page]:
"""Pick the next `self._page` after `closed_page` is closed.