feat: add Browser Run integration (Loader + Tool) #41
Open
vamshi694 wants to merge 5 commits into
added 3 commits
April 21, 2026 14:08
Cloudflare Browser Run (https://developers.cloudflare.com/browser-run/) gives you serverless headless Chrome on Cloudflare's edge. This PR adds two LangChain primitives that wrap the Quick Actions REST API so Python developers can use Browser Run without running a local browser.

Why this matters
----------------
Among LangChain's existing web loaders, SeleniumURLLoader and PlaywrightURLLoader need a local browser process, while WebBaseLoader fetches raw HTML without executing JavaScript. With Browser Run, each page is a single POST request: no infrastructure, no local browser dependencies, and JS-rendered content.

Combined with the rest of this library you get a full Cloudflare-native RAG pipeline:

Browser Run (crawl) → Workers AI (embed) → Vectorize (store) → Workers AI (query)

What's included
---------------
CloudflareBrowserRunLoader (BaseLoader)
Converts web pages into LangChain Documents for RAG ingestion.
Modes: markdown (/markdown), crawl (/crawl with async polling), scrape (/scrape with CSS selectors), content (/content).
Supports sync (load, lazy_load) and async (aload, alazy_load).

CloudflareBrowserRunTool (BaseTool)
Gives LangGraph agents the ability to read, extract, and navigate the live web.
Modes: markdown, json (/json, AI-powered structured extraction), links, screenshot, pdf.
Each mode gets its own tool name (e.g. cloudflare_browser_run_json) and description so agents can pick the right tool.

LangGraph integration tested with:
- Custom nodes in a StateGraph DAG (parallel fan-out)
- ToolNode with tools_condition routing
- Parallel tool calls in a single AIMessage
- Supervisor pattern dispatching to specialist tools
- Research loops with conditional edges (cycles)

Auth follows the existing pattern: CF_ACCOUNT_ID + CF_API_TOKEN env vars, SecretStr, and the same validation as rerankers.py. Browser Run is REST-only; no Worker binding path exists, which is noted in the module docstring.
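To make the "single POST request" point concrete, here is a minimal stdlib-only sketch of one Quick Actions call. It is not the PR's code: it assumes the REST path `https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/<endpoint>` (the API reference is still filed under `browser_rendering`) and a bearer-token header, and `build_quick_action_request` is a hypothetical helper.

```python
import json
import urllib.request

# Assumed REST base path: Browser Run was renamed from Browser Rendering and the
# API reference still lives under browser_rendering. Verify before relying on it.
API_BASE = "https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering"


def build_quick_action_request(
    account_id: str, api_token: str, endpoint: str, payload: dict
) -> urllib.request.Request:
    """Build (but do not send) one POST to a Quick Actions endpoint, e.g. "markdown"."""
    # Accept both "markdown" and "/markdown" as the endpoint name.
    url = f"{API_BASE.format(account_id=account_id)}/{endpoint.lstrip('/')}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it is then `urllib.request.urlopen(req, timeout=60)`, where 60s mirrors the PR's default `request_timeout`; the response body should arrive in Cloudflare's usual `{success, errors, result}` JSON envelope.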
References:
- Quick Actions docs: https://developers.cloudflare.com/browser-run/quick-actions/
- /json endpoint: https://developers.cloudflare.com/browser-run/quick-actions/json-endpoint/
- /crawl endpoint: https://developers.cloudflare.com/browser-run/quick-actions/crawl-endpoint/
- API reference: https://developers.cloudflare.com/api/resources/browser_rendering/
- Rename announcement: https://developers.cloudflare.com/changelog/post/2026-04-15-br-rename/

Tests: 33 unit + 19 integration (15 endpoint + 4 LangGraph patterns), all passing; the integration tests run against the real Browser Run API.
…es, and credential notes
… outputs

Address maintainer review feedback:
- Rename browser_run.py to loaders.py (matches module naming convention)
- Add docs/browser_run.ipynb with executed outputs from real API calls:
  - Workers AI docs loaded as markdown (15K chars from a JS-rendered page)
  - books.toscrape.com crawled for knowledge base ingestion
  - cloudflare.com scraped for h1, h2, nav elements
  - Pricing data extracted as structured JSON from cloudflare.com/plans
  - Schema-enforced company extraction from the what-is-cloudflare page
  - 88 links discovered from Browser Run docs
  - Screenshot captured (398K base64 PNG)
  - Full research pipeline: discover links -> load 3 pages -> summarize with Llama 3.3 70B
- LangGraph patterns shown as notebook examples per maintainer guidance
19071e1 to 200a38b
added 2 commits
April 21, 2026 14:22
1. screenshot/pdf now check content-type before base64-encoding;
JSON or HTML error responses raise RuntimeError instead of
returning garbage base64 data
2. All REST calls (sync requests + async httpx) now include an
explicit timeout (default 60s, configurable via request_timeout)
3. Removed LangGraph pattern tests from integration suite (out of scope)
4. Added 13 mocked unit tests covering the failure-prone HTTP paths:
- error envelopes ({success: false})
- binary endpoint non-binary responses
- crawl timeout / errored status / completed records
- request body construction per mode (viewport, elements, prompt)
- timeout propagation
5. Trimmed notebook to focus on the integration, not LangGraph patterns
Version: 0.3.5 (rebased on 0.3.4)
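Items 1 and 4 in the commit message above come down to two small guards. The sketch below illustrates them and is not the PR's implementation: it treats a JSON or HTML content-type from `/screenshot` or `/pdf` as an error (as item 1 describes) and assumes Cloudflare's standard v4 `{success, errors, result}` envelope for JSON responses. The helper names and error messages are hypothetical.

```python
import base64

# Media types that signal an error payload instead of image/PDF bytes (item 1).
ERROR_MEDIA_TYPES = {"application/json", "text/html"}


def encode_binary_response(content_type: str, body: bytes) -> str:
    """Base64-encode a screenshot/pdf body, refusing JSON or HTML error payloads."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in ERROR_MEDIA_TYPES:
        raise RuntimeError(f"expected binary data, got {media_type}: {body[:200]!r}")
    return base64.b64encode(body).decode("ascii")


def unwrap_envelope(payload: dict):
    """Return payload["result"], raising if the API reported success: false."""
    # Assumes the v4 envelope shape: {"success": bool, "errors": [...], "result": ...}
    if not payload.get("success", False):
        raise RuntimeError(f"Browser Run API error: {payload.get('errors')}")
    return payload.get("result")
```

Checking the content-type first means an error body is never mistaken for a valid image or PDF just because it base64-encodes cleanly.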
Adds Cloudflare Browser Run support to `langchain-cloudflare` via two new classes:

- `CloudflareBrowserRunLoader` (`BaseLoader`): document ingestion for RAG pipelines
- `CloudflareBrowserRunTool` (`BaseTool`): web interaction tool for agent workflows

Browser Run provides serverless headless Chrome on Cloudflare's edge via a REST API. It renders JS-heavy pages, extracts clean markdown, crawls sites, extracts structured JSON using AI, takes screenshots, generates PDFs, and discovers links, each started with a single POST request. No local browser is needed.
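Each capability is selected by a mode that maps to one Quick Actions endpoint. A minimal sketch of that dispatch, assuming a plain mode string: the endpoint mapping follows the mode tables in this description, while the `cloudflare_browser_run_<mode>` naming rule is extrapolated from the single documented `cloudflare_browser_run_json` example, and `loader_endpoint`/`tool_name` are hypothetical helpers.

```python
# Mode -> Quick Actions endpoint, following the mode tables in this PR description.
LOADER_ENDPOINTS = {
    "markdown": "/markdown",
    "crawl": "/crawl",
    "scrape": "/scrape",
    "content": "/content",
}
TOOL_ENDPOINTS = {
    "markdown": "/markdown",
    "json": "/json",
    "links": "/links",
    "screenshot": "/screenshot",
    "pdf": "/pdf",
}


def loader_endpoint(mode: str) -> str:
    """Resolve a loader mode to its endpoint, rejecting tool-only modes like "json"."""
    try:
        return LOADER_ENDPOINTS[mode]
    except KeyError:
        raise ValueError(
            f"unsupported loader mode {mode!r}; expected one of {sorted(LOADER_ENDPOINTS)}"
        ) from None


def tool_name(mode: str) -> str:
    """Give each tool mode its own name so an agent can choose between them.

    Assumption: this pattern generalizes the documented cloudflare_browser_run_json.
    """
    if mode not in TOOL_ENDPOINTS:
        raise ValueError(f"unsupported tool mode {mode!r}; expected one of {sorted(TOOL_ENDPOINTS)}")
    return f"cloudflare_browser_run_{mode}"
```

Separate names matter because a model picking among tools typically sees only each tool's name and description, so one generic tool with a mode argument is harder to route correctly.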
Combined with the rest of this library you get a full Cloudflare-native pipeline:
Browser Run (crawl) → Workers AI (embed) → Vectorize (store) → Workers AI (query)
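The crawl stage at the head of this pipeline is the one asynchronous Browser Run call: `/crawl` is started with a POST and then polled with GET, with `crawl_timeout` and `crawl_poll_interval` controlling the loop and `warnings.warn()` raised on timeout. The sketch below models that loop with an injected `get_status` callable so it runs without network access. The `"completed"`/`"errored"` status strings, the default values, and returning the last polled job on timeout are assumptions, not taken from the PR.

```python
import time
import warnings
from typing import Callable


def wait_for_crawl(
    get_status: Callable[[], dict],
    crawl_timeout: float = 300.0,  # illustrative default, not the PR's
    crawl_poll_interval: float = 5.0,  # illustrative default, not the PR's
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll a crawl job until it completes, errors, or times out.

    get_status stands in for the GET against the crawl job and should return
    the job's JSON result, e.g. {"status": "running"}.
    """
    deadline = clock() + crawl_timeout
    while True:
        job = get_status()
        status = job.get("status")
        if status == "completed":  # assumed terminal success status
            return job
        if status == "errored":  # assumed terminal failure status
            raise RuntimeError(f"crawl failed: {job}")
        if clock() >= deadline:
            # Warn rather than raise, and return whatever the last poll produced.
            warnings.warn(
                f"crawl still {status!r} after {crawl_timeout}s; returning partial job",
                stacklevel=2,
            )
            return job
        sleep(crawl_poll_interval)
```

Injecting `sleep` and `clock` keeps the timeout path testable without real waiting, which is the same failure mode the mocked crawl-timeout unit tests target.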
What's included
`CloudflareBrowserRunLoader` (`BaseLoader`)
Converts web pages into LangChain Documents for RAG ingestion.

| Mode | Endpoint |
|---|---|
| `markdown` | `/markdown` |
| `crawl` | `/crawl` |
| `scrape` | `/scrape` |
| `content` | `/content` |

Supports sync (`load`, `lazy_load`) and async (`aload`, `alazy_load`).

`CloudflareBrowserRunTool` (`BaseTool`)
Gives agents the ability to interact with the live web.
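| Mode | Endpoint |
|---|---|
| `markdown` | `/markdown` |
| `json` | `/json` |
| `links` | `/links` |
| `screenshot` | `/screenshot` |
| `pdf` | `/pdf` |

Each mode gets its own tool name (e.g. `cloudflare_browser_run_json`) and a mode-specific description for agent disambiguation.

Technical notes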
- REST-only: no Worker binding path exists, so there is no `binding` parameter (documented in the module docstring).
- Explicit `request_timeout` (default 60s) on both the sync (`requests`) and async (`httpx`) paths.
- `{success: false}` API responses are treated as errors. Binary endpoints (`screenshot`, `pdf`) validate content-type before base64-encoding to catch JSON/HTML error payloads.
- `/crawl` is async (POST to start, GET to poll). Configurable `crawl_timeout` and `crawl_poll_interval`, with `warnings.warn()` on timeout.
- Uses the existing `CF_ACCOUNT_ID` + `CF_API_TOKEN` pattern. Requires the Browser Rendering – Edit permission.
- HTTP clients: `requests` (sync) and `httpx` (async).

Test plan
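- `make lint`: ruff check + ruff format + mypy, all clean
- `make test`: 147 passed, 2 skipped (existing known skips)
- Mocked unit tests for the failure-prone HTTP paths (e.g. error envelopes, `{success: false}`)
- Notebook (`docs/browser_run.ipynb`) with executed outputs

References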
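- `/json` endpoint: https://developers.cloudflare.com/browser-run/quick-actions/json-endpoint/
- `/crawl` endpoint: https://developers.cloudflare.com/browser-run/quick-actions/crawl-endpoint/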