feat: add Browser Run integration (Loader + Tool) by vamshi694 · Pull Request #41 · cloudflare/langchain-cloudflare

vamshi694 · 2026-04-20T23:06:59Z

Adds Cloudflare Browser Run support to langchain-cloudflare via two new classes:

- `CloudflareBrowserRunLoader` (BaseLoader): document ingestion for RAG pipelines
- `CloudflareBrowserRunTool` (BaseTool): web interaction tool for agent workflows

Browser Run provides serverless headless Chrome on Cloudflare's edge via a REST API. It renders JS-heavy pages, extracts clean markdown, crawls sites, extracts structured JSON using AI, takes screenshots, generates PDFs, and discovers links — all with a single POST request. No local browser needed.
Combined with the rest of this library you get a full Cloudflare-native pipeline:
Browser Run (crawl) → Workers AI (embed) → Vectorize (store) → Workers AI (query)

What's included

`CloudflareBrowserRunLoader` (BaseLoader)

Converts web pages into LangChain Documents for RAG ingestion.

| Mode | Endpoint | Description |
|------|----------|-------------|
| `markdown` | `/markdown` | Clean markdown from any page |
| `crawl` | `/crawl` | Multi-page crawl with async polling |
| `scrape` | `/scrape` | CSS selector-based element extraction |
| `content` | `/content` | Raw rendered HTML |

Supports sync (`load`, `lazy_load`) and async (`aload`, `alazy_load`).
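
As a rough illustration of the mode-to-endpoint mapping in the table above, the sketch below builds a REST URL and JSON body for each loader mode. The base URL path and the `build_request` helper are assumptions made for illustration, not the loader's confirmed internals; the `elements` body field is taken from the request-body tests mentioned in this PR, but its exact shape here is assumed.

```python
from typing import Any, Dict, List, Optional, Tuple

# Assumed base path for the Quick Actions REST API; check the API reference
# for the real path before relying on it.
BASE = "https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering"

# Mode -> endpoint, copied from the loader table above.
LOADER_ENDPOINTS: Dict[str, str] = {
    "markdown": "/markdown",
    "crawl": "/crawl",
    "scrape": "/scrape",
    "content": "/content",
}


def build_request(
    account_id: str,
    mode: str,
    url: str,
    selectors: Optional[List[str]] = None,
) -> Tuple[str, Dict[str, Any]]:
    """Return (endpoint_url, json_body) for one loader mode (hypothetical helper)."""
    if mode not in LOADER_ENDPOINTS:
        raise ValueError(
            f"unknown mode {mode!r}; expected one of {sorted(LOADER_ENDPOINTS)}"
        )
    body: Dict[str, Any] = {"url": url}
    if mode == "scrape":
        if not selectors:
            raise ValueError("scrape mode needs at least one CSS selector")
        # Assumed body shape: one {"selector": ...} entry per CSS selector.
        body["elements"] = [{"selector": s} for s in selectors]
    endpoint = BASE.format(account_id=account_id) + LOADER_ENDPOINTS[mode]
    return endpoint, body
```

Keeping the mapping in one dictionary means an unsupported mode fails early with a clear message instead of producing a 404 from the API.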

`CloudflareBrowserRunTool` (BaseTool)

Gives agents the ability to interact with the live web.

| Mode | Endpoint | Description |
|------|----------|-------------|
| `markdown` | `/markdown` | Read any webpage as markdown |
| `json` | `/json` | AI-powered structured data extraction |
| `links` | `/links` | Discover all links on a page |
| `screenshot` | `/screenshot` | Capture screenshot (base64 PNG) |
| `pdf` | `/pdf` | Generate PDF (base64) |

Each mode auto-generates a unique tool name (e.g. `cloudflare_browser_run_json`) and mode-specific description for agent disambiguation.
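
A minimal sketch of how per-mode names and descriptions could be derived. Only the `cloudflare_browser_run_<mode>` naming pattern and the mode list come from this PR; the `tool_name` and `tool_spec` helpers and the description wording are hypothetical.

```python
from typing import Dict

# Mode -> short description, paraphrased from the tool table above.
TOOL_MODES: Dict[str, str] = {
    "markdown": "Read any webpage as markdown.",
    "json": "Extract structured data from a page using AI.",
    "links": "Discover all links on a page.",
    "screenshot": "Capture a screenshot (base64 PNG).",
    "pdf": "Generate a PDF (base64).",
}


def tool_name(mode: str) -> str:
    """Build the per-mode tool name, e.g. 'cloudflare_browser_run_json'."""
    if mode not in TOOL_MODES:
        raise ValueError(f"unknown mode {mode!r}")
    return f"cloudflare_browser_run_{mode}"


def tool_spec(mode: str) -> Dict[str, str]:
    """Return the name and description an agent would see (hypothetical helper)."""
    return {"name": tool_name(mode), "description": TOOL_MODES[mode]}
```

Distinct names matter when several modes are registered with the same agent: the model chooses a tool by name and description, so two tools with the same name would be indistinguishable.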

Technical notes

- REST-only: Browser Run Quick Actions have no Workers binding. This is the first module without a `binding` parameter (documented in the module docstring).
- Explicit timeouts: all REST calls include a configurable `request_timeout` (default 60s) on both the sync (`requests`) and async (`httpx`) paths.
- Error envelope handling: checks for `{success: false}` API responses. Binary endpoints (screenshot, pdf) validate the content-type before base64-encoding to catch JSON/HTML error payloads.
- Crawl polling: `/crawl` is async (POST to start, GET to poll). Configurable `crawl_timeout` and `crawl_poll_interval`, with `warnings.warn()` on timeout.
- Auth: same `CF_ACCOUNT_ID` + `CF_API_TOKEN` pattern. Requires the Browser Rendering – Edit permission.
- No new dependencies: uses existing `requests` (sync) and `httpx` (async).
- Version: 0.3.5 (rebased on 0.3.4)
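
The crawl polling and error-envelope notes above can be sketched roughly as follows. This is not the PR's implementation: `poll_crawl` and `check_envelope` are hypothetical helpers, the default timeout and interval values are placeholders, and the `result`/`status`/`records`/`errors` field names are assumed from the statuses named in the test plan. Returning partial records on timeout is also an assumption.

```python
import time
import warnings
from typing import Any, Callable, Dict, List


def check_envelope(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Raise if the API envelope reports failure; otherwise return it unchanged."""
    if payload.get("success") is False:
        raise RuntimeError(f"Browser Run API error: {payload.get('errors')}")
    return payload


def poll_crawl(
    get_status: Callable[[], Dict[str, Any]],
    crawl_timeout: float = 300.0,       # placeholder default
    crawl_poll_interval: float = 2.0,   # placeholder default
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[Dict[str, Any]]:
    """Poll a crawl job until it completes, errors, or the deadline passes.

    `get_status` stands in for the GET request that polls the job.
    """
    deadline = clock() + crawl_timeout
    while True:
        result = check_envelope(get_status()).get("result", {})
        status = result.get("status")
        if status == "completed":
            return result.get("records", [])
        if status == "errored":
            raise RuntimeError("crawl job errored")
        if clock() >= deadline:
            # Warn rather than raise, matching the warnings.warn() note above.
            warnings.warn(f"crawl did not finish within {crawl_timeout}s")
            return result.get("records", [])
        sleep(crawl_poll_interval)
```

Injecting `get_status`, `sleep`, and `clock` keeps the loop testable without network access or real waiting.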

Test plan

- `make lint`: ruff check + ruff format + mypy, all clean
- `make test`: 147 passed, 2 skipped (existing known skips)
- 46 unit tests for browser_run including mocked HTTP paths:
  - Error envelopes (`{success: false}`)
  - Binary endpoint non-binary response detection
  - Crawl timeout / errored status / completed record parsing
  - Request body construction per mode
  - Timeout propagation
- 15 integration tests against real Browser Run API (markdown, content, scrape, crawl, json, links, screenshot; sync + async)
- Real-world pipeline tested: Browser Run → Workers AI (Llama 3.3 70B) summarization
- Example notebook (`docs/browser_run.ipynb`) with executed outputs
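
To give a flavour of the mocked HTTP tests listed above, here is a sketch of a timeout-propagation test using `unittest.mock`. The `post_action` helper and the `example.invalid` URL are hypothetical stand-ins; the PR's actual test code and internal function names are not shown on this page.

```python
from typing import Any, Dict
from unittest.mock import Mock


def post_action(
    session: Any,
    url: str,
    body: Dict[str, Any],
    request_timeout: float = 60.0,
) -> Dict[str, Any]:
    """POST one Quick Action and return the decoded JSON (hypothetical helper)."""
    response = session.post(url, json=body, timeout=request_timeout)
    return response.json()


def test_timeout_is_propagated() -> None:
    # A Mock stands in for a requests.Session, so no network call is made.
    session = Mock()
    session.post.return_value.json.return_value = {"success": True, "result": "# Title"}

    payload = post_action(
        session,
        "https://example.invalid/markdown",
        {"url": "https://example.com"},
        request_timeout=5.0,
    )

    assert payload["result"] == "# Title"
    # The configured timeout must reach the HTTP call unchanged.
    session.post.assert_called_once_with(
        "https://example.invalid/markdown",
        json={"url": "https://example.com"},
        timeout=5.0,
    )
```

Asserting on the exact call arguments catches the easy mistake of accepting a timeout setting but never passing it to the HTTP client.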

References

Cloudflare Browser Run (https://developers.cloudflare.com/browser-run/) gives you serverless headless Chrome on Cloudflare's edge. This PR adds two LangChain primitives that wrap the Quick Actions REST API so Python developers can use Browser Run without running a local browser.

Why this matters
----------------
LangChain's existing web loaders (WebBaseLoader, SeleniumLoader, PlaywrightLoader) all need a local browser process. Browser Run is a single POST request: no infra, no dependencies, JS-rendered content.

Combined with the rest of this library you get a full Cloudflare-native RAG pipeline:
Browser Run (crawl) → Workers AI (embed) → Vectorize (store) → Workers AI (query)

What's included
---------------
CloudflareBrowserRunLoader (BaseLoader)
Converts web pages into LangChain Documents for RAG ingestion. Modes: markdown (/markdown), crawl (/crawl with async polling), scrape (/scrape with CSS selectors), content (/content). Supports sync (load, lazy_load) and async (aload, alazy_load).

CloudflareBrowserRunTool (BaseTool)
Gives LangGraph agents the ability to read, extract, and navigate the live web. Modes: markdown, json (/json, AI-powered structured extraction), links, screenshot, pdf. Each mode gets its own tool name (e.g. cloudflare_browser_run_json) and description so agents can pick the right tool.

LangGraph integration tested with:
- Custom nodes in a StateGraph DAG (parallel fan-out)
- ToolNode with tools_condition routing
- Parallel tool calls in a single AIMessage
- Supervisor pattern dispatching to specialist tools
- Research loops with conditional edges (cycles)

Auth follows the existing pattern: CF_ACCOUNT_ID + CF_API_TOKEN env vars, SecretStr, same validation as rerankers.py. Browser Run is REST-only: no Worker binding path exists, as noted in the module docstring.

References:
- Quick Actions docs: https://developers.cloudflare.com/browser-run/quick-actions/
- /json endpoint: https://developers.cloudflare.com/browser-run/quick-actions/json-endpoint/
- /crawl endpoint: https://developers.cloudflare.com/browser-run/quick-actions/crawl-endpoint/
- API reference: https://developers.cloudflare.com/api/resources/browser_rendering/
- Rename announcement: https://developers.cloudflare.com/changelog/post/2026-04-15-br-rename/

Tests: 33 unit + 19 integration (15 endpoint + 4 LangGraph patterns), all passing against the real Browser Run API.

…es, and credential notes

… outputs

Address maintainer review feedback:
- Rename browser_run.py to loaders.py (matches module naming convention)
- Add docs/browser_run.ipynb with executed outputs from real API calls:
  - Workers AI docs loaded as markdown (15K chars from JS-rendered page)
  - books.toscrape.com crawled for knowledge base ingestion
  - cloudflare.com scraped for h1, h2, nav elements
  - Pricing data extracted as structured JSON from cloudflare.com/plans
  - Schema-enforced company extraction from what-is-cloudflare page
  - 88 links discovered from Browser Run docs
  - Screenshot captured (398K base64 PNG)
  - Full research pipeline: discover links -> load 3 pages -> summarize with Llama 3.3 70B
- LangGraph patterns shown as notebook examples per maintainer guidance

1. screenshot/pdf now check content-type before base64-encoding; JSON or HTML error responses raise RuntimeError instead of returning garbage base64 data
2. All REST calls (sync requests + async httpx) now include an explicit timeout (default 60s, configurable via request_timeout)
3. Removed LangGraph pattern tests from integration suite (OOS)
4. Added 13 mocked unit tests covering the failure-prone HTTP paths:
   - error envelopes ({success: false})
   - binary endpoint non-binary responses
   - crawl timeout / errored status / completed records
   - request body construction per mode (viewport, elements, prompt)
   - timeout propagation
5. Trimmed notebook to focus on the integration, not LangGraph patterns

Version: 0.3.5 (rebased on 0.3.4)
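
The content-type check described in item 1 might look something like the sketch below. `encode_binary_response` is a hypothetical helper, not the function name used in the PR; only the behaviour (validate the content-type, raise `RuntimeError` on a JSON or HTML error payload, otherwise base64-encode) is taken from the commit message.

```python
import base64


def encode_binary_response(content_type: str, body: bytes, expected: str) -> str:
    """Base64-encode `body` only if `content_type` matches the expected media type.

    `expected` is e.g. "image/png" for screenshots or "application/pdf" for PDFs.
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != expected:
        # Include a short preview so a JSON or HTML error payload is visible.
        preview = body[:200].decode("utf-8", errors="replace")
        raise RuntimeError(
            f"expected {expected}, got {media_type or 'no content-type'}: {preview}"
        )
    return base64.b64encode(body).decode("ascii")
```

Without this check, a JSON error body would be base64-encoded and handed to the agent as if it were a valid image or PDF.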

Vamshi_BIDS added 3 commits April 21, 2026 14:08

docs: improve Browser Run README with mode tables, scrape/json exampl…

407ba99

…es, and credential notes

vamshi694 force-pushed the feat/browser-run-integration branch from 19071e1 to 200a38b Compare April 21, 2026 19:10

Vamshi_BIDS added 2 commits April 21, 2026 14:22

fix: add missing timeout to crawl pagination GET request

bfb68b4

vamshi694 wants to merge 5 commits into cloudflare:main from vamshi694:feat/browser-run-integration
