Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 12 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -130,4 +130,15 @@ playwright-report/

# venv
venv/
.venv/
.venv/

# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
*.egg-info/
.pytest_cache/
.mypy_cache/
.ruff_cache/
Original file line number Diff line number Diff line change
@@ -0,0 +1,97 @@
# LangChain Deep Agents + Browserbase (Python)

This example shows the implementation pattern that fits LangChain Deep Agents best in Python:

- Give the main Deep Agent cheap Browserbase-backed tools for `search` and `fetch`
- Add a specialized browser subagent for heavier rendered or interactive browser work
- Gate stateful browser actions behind Deep Agents `interrupt_on`

It intentionally does **not** route the agent through the Browserbase CLI. Deep Agents already wants Python tools, subagents, and interrupt handling, so the clean integration is to expose Browserbase as Python tools directly.

## Architecture

- `browserbase_search`: fast discovery with Browserbase Search
- `browserbase_fetch`: cheap page retrieval with Browserbase Fetch
- `browserbase_rendered_extract`: Stagehand-backed rendered extraction for JS-heavy pages
- `browserbase_interactive_task`: a Stagehand `agent().execute(...)` workflow for clicks, typing, login, or form submission
- `browser-specialist` subagent: isolates browser-heavy work from the main planner

## Requirements

- Python 3.11+
- `BROWSERBASE_API_KEY` for Browserbase Search, Fetch, and browser sessions
- An OpenAI-compatible base URL for the Deep Agent model if you are not using direct OpenAI

The sample uses `BROWSERBASE_API_KEY` as the fallback API key for both:

- Browserbase primitives and Stagehand
- the LangChain chat model client

That means you do not need a second model-provider secret in this sample if you point the Deep Agent model at a compatible gateway endpoint.

The sample defaults to:

- Deep Agent model: `gpt-5.4`
- Stagehand rendered-extract model: `google/gemini-3-flash-preview`
- Stagehand interactive-agent model: `anthropic/claude-sonnet-4-6`

You can override either with environment variables.

## Install

```bash
cd /Users/kylejeong/Desktop/integrations/examples/integrations/langchain/deepagents-browserbase
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

## Environment

```bash
export BROWSERBASE_API_KEY="bb_..."

# Optional overrides
export DEEPAGENT_MODEL="gpt-5.4"
export DEEPAGENT_BASE_URL="https://<your-openai-compatible-gateway>"
export STAGEHAND_MODEL="google/gemini-3-flash-preview"
export STAGEHAND_AGENT_MODEL="anthropic/claude-sonnet-4-6"
```

## Run

Use the default research prompt:

```bash
python main.py
```

Or pass your own:

```bash
python main.py "Research the Browserbase Fetch API and explain when the agent should escalate to a full browser session."
```

## Approval flow

The sample configures `interrupt_on` for `browserbase_interactive_task`.

When the agent wants to click, type, log in, or submit a form, the script pauses and asks you to:

- `approve`
- `edit`
- `reject`

This is the right place to put human approval in a Deep Agents + Browserbase design, because the approval happens at the tool boundary instead of being hidden inside ad hoc shell calls.

## Notes

- The interactive tool now uses `stagehand.agent().execute(...)` instead of a single `sessions.act(...)` call. That makes it better suited to genuine multi-step browser tasks.
- Browserbase’s Stagehand quickstart documents that Model Gateway works with just `BROWSERBASE_API_KEY` for Stagehand browser workflows.
- I did not hardcode a Browserbase model-gateway URL for the LangChain model client because I did not find an official doc page in the Browserbase docs that specifies a general-purpose OpenAI-compatible endpoint for LangChain. The sample therefore accepts `DEEPAGENT_BASE_URL` or `OPENAI_BASE_URL` explicitly.

## Suggested prompts

- `Research Browserbase Search, Fetch, and browser sessions. Give me a decision tree with citations.`
- `Open docs.browserbase.com and extract the limits of the Fetch API from the rendered docs page.`
- `Go to example.com and tell me whether any interactive action would be required to complete the task.`
Original file line number Diff line number Diff line change
@@ -0,0 +1,218 @@
from __future__ import annotations

import asyncio
import json
import os
import re
from typing import Any

from browserbase import Browserbase
from bs4 import BeautifulSoup
from langchain_core.tools import tool
from stagehand import AsyncStagehand

# Using the Browserbase Model Gateway, you only need to pass your Browserbase API key to use frontier models
# Docs: https://docs.browserbase.com/platform/model-gateway/overview

DEFAULT_STAGEHAND_MODEL = os.getenv(
"STAGEHAND_MODEL",
"google/gemini-3-flash-preview",
)
DEFAULT_STAGEHAND_AGENT_MODEL = os.getenv(
"STAGEHAND_AGENT_MODEL",
"anthropic/claude-sonnet-4-6",
)


def _require_env(name: str) -> str:
value = os.getenv(name, "").strip()
if not value:
raise ValueError(f"Missing required environment variable: {name}")
return value


def _browserbase_client() -> Browserbase:
return Browserbase(api_key=_require_env("BROWSERBASE_API_KEY"))


def _normalize(value: Any) -> Any:
if value is None or isinstance(value, (str, int, float, bool)):
return value
if isinstance(value, dict):
return {str(key): _normalize(val) for key, val in value.items()}
if isinstance(value, (list, tuple, set)):
return [_normalize(item) for item in value]
if hasattr(value, "model_dump"):
return _normalize(value.model_dump())
if hasattr(value, "dict"):
return _normalize(value.dict())
if hasattr(value, "__dict__"):
public = {
key: val
for key, val in vars(value).items()
if not key.startswith("_") and not callable(val)
}
if public:
return _normalize(public)
return str(value)


def _json(value: Any) -> str:
return json.dumps(_normalize(value), indent=2, default=str)


def _html_to_text(html: str, max_chars: int) -> tuple[str, str]:
soup = BeautifulSoup(html, "html.parser")
title = soup.title.get_text(" ", strip=True) if soup.title else ""
for tag in soup(["script", "style", "noscript"]):
tag.decompose()
body = soup.body or soup
text = body.get_text("\n", strip=True)
text = re.sub(r"\n{3,}", "\n\n", text)
return title, text[:max_chars]


def _stagehand_client() -> AsyncStagehand:
return AsyncStagehand(
browserbase_api_key=_require_env("BROWSERBASE_API_KEY"),
)


def _run_async(coro: Any) -> Any:
return asyncio.run(coro)


@tool
def browserbase_search(query: str, num_results: int = 5) -> str:
"""Search the web with Browserbase. Use this first for discovery before opening pages."""
bb = _browserbase_client()
response = bb.search.web(query=query, num_results=max(1, min(num_results, 10)))
results = []
for result in getattr(response, "results", []):
results.append(
{
"title": getattr(result, "title", ""),
"url": getattr(result, "url", ""),
"author": getattr(result, "author", None),
"published_date": (
getattr(result, "published_date", None)
or getattr(result, "publishedDate", None)
),
}
)
return _json(
{
"query": query,
"request_id": getattr(response, "request_id", None)
or getattr(response, "requestId", None),
"results": results,
}
)


@tool
def browserbase_fetch(url: str, use_proxy: bool = False, max_chars: int = 12000) -> str:
"""Fetch page content without a browser session. Best for static pages and quick reads."""
bb = _browserbase_client()
response = bb.fetch_api.create(url=url, proxies=use_proxy)
content = getattr(response, "content", "")
content_type = (
getattr(response, "content_type", None)
or getattr(response, "contentType", "")
or ""
).lower()

title = ""
text = str(content)[:max_chars]
if "html" in content_type:
title, text = _html_to_text(str(content), max_chars=max_chars)

return _json(
{
"url": url,
"status_code": getattr(response, "status_code", None)
or getattr(response, "statusCode", None),
"content_type": getattr(response, "content_type", None)
or getattr(response, "contentType", None),
"encoding": getattr(response, "encoding", None),
"title": title,
"text": text,
}
)


@tool
def browserbase_rendered_extract(start_url: str, instruction: str) -> str:
"""Open a full Browserbase browser session and extract rendered content from a page with Stagehand."""
return _run_async(_browserbase_rendered_extract_async(start_url=start_url, instruction=instruction))


async def _browserbase_rendered_extract_async(start_url: str, instruction: str) -> str:
client = _stagehand_client()
start_resp = await client.sessions.start(
model_name=DEFAULT_STAGEHAND_MODEL,
)
session_id = start_resp.data.session_id

try:
await client.sessions.navigate(
id=session_id,
url=start_url,
frame_id="",
)
result = await client.sessions.extract(
id=session_id,
instruction=instruction,
)
extracted = getattr(getattr(result, "data", None), "result", None)
return _json(
{
"start_url": start_url,
"session_id": session_id,
"session_url": f"https://browserbase.com/sessions/{session_id}",
"instruction": instruction,
"result": _normalize(extracted),
}
)
finally:
await client.sessions.end(id=session_id)


@tool
def browserbase_interactive_task(start_url: str, task: str) -> str:
"""Open a Browserbase-hosted Stagehand session and let a Stagehand agent execute a multi-step browser task."""
return _run_async(_browserbase_interactive_task_async(start_url=start_url, task=task))


async def _browserbase_interactive_task_async(start_url: str, task: str) -> str:
client = _stagehand_client()
start_resp = await client.sessions.start(
model_name=DEFAULT_STAGEHAND_AGENT_MODEL,
)
session_id = start_resp.data.session_id

try:
await client.sessions.navigate(
id=session_id,
url=start_url,
frame_id="",
)
result = await client.sessions.execute(
id=session_id,
execute_options={
"instruction": task,
"max_steps": 20,
},
agent_config={
"model": DEFAULT_STAGEHAND_AGENT_MODEL,
"instructions": (
"You are executing a browser task on behalf of a LangChain tool. "
"Be precise, avoid unnecessary actions, and stop once the requested task is complete."
),
},
timeout=300.0,
)
return _json(_normalize(result))
finally:
await client.sessions.end(id=session_id)

Loading