Add `search` tool backed by the browser-use search API by reformedot · Pull Request #67 · browser-use/terminal

reformedot · 2026-06-05T01:07:50Z

What

Adds a client-executed search tool backed by the browser-use search API (search.browser-use.com — a thin proxy in front of Parallel's Search API with browser-use auth + billing), and makes it the only search tool the model sees: the hosted web_search is no longer registered.

History: the tool was first ported from the Python DuckDuckGo Lite action, then swapped to the browser-use search API in this same PR (contract verified against the search service source). The service is live and this PR is verified against it.

Why remove the hosted `web_search`?

Terminal testing showed that with both tools exposed, OpenAI models prefer their native provider-side search (the Responses builder encodes registered web_search as web_search_preview) — so searches bypassed search.browser-use.com and its auth/billing entirely. The hosted handler module remains in-tree (codex-parity model + registry-test fixture), it's just not wired; the dispatcher membership test pins web_search absent / search present.

API integration

POST {base}/search with {"query": …} and the X-Browser-Use-API-Key header — key read from BROWSER_USE_API_KEY (fails fast with an actionable message when unset against production).
Base URL overridable via BROWSER_USE_SEARCH_URL (e.g. a local dev instance, which runs as an open proxy — keyless requests allowed there).
200 → {"results":[{title?, url, published_date?, content}]}: markdown content whitespace-normalized; untitled results fall back to their URL; only http(s):// result URLs surface (javascript:/data:/relative dropped); publication date appended to the title line.
Errors mapped per the service contract (401 invalid key, 402 insufficient balance, 400/429/502/503 with a body snippet) and surfaced as model-facing soft errors.
Output stays token-efficient: titles ≤ 30 chars, descriptions ≤ 125; URLs intact. Serial scheduling (conservative default for a billed API).
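The status mapping in the bullets above can be sketched as a small classifier that turns each HTTP status into a model-facing soft error. This is a minimal std-only illustration, not the PR's `classify_response`. The names `SearchError`, `classify_status`, and `SNIPPET_CHARS` are hypothetical, and so is the exact message wording. The 200-char snippet length comes from the commit notes further down.

```rust
use std::fmt;

/// Hypothetical soft-error type for a failed search call.
#[derive(Debug, PartialEq)]
pub enum SearchError {
    InvalidKey,
    InsufficientBalance,
    Http { status: u16, snippet: String },
}

impl fmt::Display for SearchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Every variant renders with the "Search failed:" prefix the model sees.
        match self {
            SearchError::InvalidKey => write!(f, "Search failed: invalid browser-use API key (401)"),
            SearchError::InsufficientBalance => write!(f, "Search failed: insufficient balance (402)"),
            SearchError::Http { status, snippet } => write!(f, "Search failed: HTTP {status}: {snippet}"),
        }
    }
}

/// Maximum number of body characters carried into the error (per the commit notes).
const SNIPPET_CHARS: usize = 200;

/// Maps a response status and body to success or a soft error.
pub fn classify_status(status: u16, body: &str) -> Result<(), SearchError> {
    match status {
        200 => Ok(()),
        401 => Err(SearchError::InvalidKey),
        402 => Err(SearchError::InsufficientBalance),
        // 400, 429, 502, 503, and anything unexpected keep a short body snippet
        // (cut on a char boundary) so the model can see why the call failed.
        _ => Err(SearchError::Http {
            status,
            snippet: body.chars().take(SNIPPET_CHARS).collect(),
        }),
    }
}
```

Keeping these as soft errors rather than hard failures lets the agent loop continue and the model decide whether to retry or fall back to browsing.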

Key plumbing (signed-in users)

The stored cloud API key (auth.browser_use_cloud.api_key) is exported to BROWSER_USE_API_KEY for all browser modes on both run paths — TUI (prepare_tui_agent_run) and CLI (run_session_via_engine_with_runtime_and_cancel) — so search works for signed-in users without manual env setup. An explicitly exported env key always wins; the store only fills it when unset. (SDK-server runs remain env-only, matching the cloud browser's existing SDK behavior.)
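The env-first, store-fallback precedence can be sketched as a pure function over the two candidate values. This is a minimal illustration. `effective_api_key` and `is_set` are hypothetical names, and treating an empty or whitespace-only env value as "unset" is an assumption, not something the PR states. Actually exporting the result into the process environment is left out.

```rust
/// Assumption: `None` and empty/whitespace-only values both count as "unset".
fn is_set(key: &&str) -> bool {
    !key.trim().is_empty()
}

/// Hypothetical helper: an explicitly exported env key always wins; the
/// stored cloud key only fills the gap when the env value is unset.
fn effective_api_key(env_key: Option<&str>, stored_key: Option<&str>) -> Option<String> {
    env_key
        .filter(is_set)
        // The stored key is consulted only when the env value is unset.
        .or_else(|| stored_key.filter(is_set))
        .map(str::to_owned)
}
```

Keeping the decision pure makes it easy to share between the TUI and CLI run paths and to test without touching the real environment.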

Architecture

Handler follows the same trait stack (Approvable + Sandboxable + ToolRuntime) as the sibling tools; HTTP behind a SearchBackend seam (real reqwest impl + fakes in tests). No new dependencies.
Registered in both default_registry and the production dispatcher, with a dispatcher membership test guarding the production tool set.
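One way to picture the `SearchBackend` seam is a trait the handler depends on, with a canned fake standing in for the real HTTP client in tests. This sketch is synchronous and std-only for brevity. The method name `post_search`, the `(status, body)` return shape, `FakeBackend`, and `run_search` are all hypothetical. The PR's real trait is backed by reqwest and will differ in signature.

```rust
use std::cell::RefCell;

/// Hypothetical seam: the handler only sees this trait.
pub trait SearchBackend {
    /// Sends `query` and returns (HTTP status, response body), or a transport error.
    fn post_search(&self, query: &str) -> Result<(u16, String), String>;
}

/// Test double: returns a canned response and records every query it receives.
pub struct FakeBackend {
    pub status: u16,
    pub body: String,
    pub seen: RefCell<Vec<String>>,
}

impl SearchBackend for FakeBackend {
    fn post_search(&self, query: &str) -> Result<(u16, String), String> {
        self.seen.borrow_mut().push(query.to_string());
        Ok((self.status, self.body.clone()))
    }
}

/// Handler logic written against the trait, so tests never touch the network.
pub fn run_search(backend: &dyn SearchBackend, query: &str) -> String {
    match backend.post_search(query) {
        Ok((200, body)) => body,
        Ok((status, _)) => format!("Search failed: HTTP {status}"),
        Err(e) => format!("Search failed: {e}"),
    }
}
```

Injecting the backend this way mirrors the browser/python/mcp backend-injection pattern mentioned in the commit history, and keeps `cargo test` deterministic and offline.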

Verification

cargo fmt --check ✓ · clippy: no new warnings ✓ · agent/tui/cli suites pass (the 2 PTY shell_tests flakes and slash_palette_layers_over_running_content fail identically on clean origin/main — pre-existing upstream) · uv run pytest ✓
Live against production: /health ok; real POST through HttpSearchBackend with the auth header; 401 correctly classified into the model-facing soft error. Happy path (10 real Parallel results, parsed + formatted) verified end-to-end against a service instance.
Reviewed by a multi-agent integration audit (wire contract vs the Go service source, key plumbing across TUI/CLI/SDK entry points, final-diff bug hunt); all confirmed findings fixed.
main merged in (clean, no conflicts).

🤖 Generated with Claude Code

Port the Python `search` action (DuckDuckGo Lite HTTP search) into the async agent engine as a new locally-dispatched `search` tool. Only the search logic is carried over: the `request_human_control` action and the Controller/DB/session scaffolding are dropped per "keep the logic only". Unlike the existing hosted `web_search` (provider-executed, no local I/O), this tool performs a real HTTP GET against `lite.duckduckgo.com/lite/` and parses the result HTML itself, so it works against any provider.

Implementation notes:

- New handler `tools/handlers/search.rs` follows the same trait stack (Approvable + Sandboxable + ToolRuntime) as the sibling tools, with the HTTP fetch behind a `SearchBackend` seam (real reqwest impl + fake for tests), mirroring the browser/python/mcp backend-injection pattern.
- No new dependencies: the repo deliberately avoids HTML-parser deps (browser DOM comes from CDP), so parsing uses targeted `regex` over the fixed DuckDuckGo Lite markup plus a small hand-rolled percent-decoder and entity decoder, faithful to the original BeautifulSoup logic.
- Registered as `search` in both `default_registry` and the production dispatcher (`build_tool_dispatcher_with_cwd_and_goal_store`) so the live model can actually call it; parallel-safe (read-only).
- Tests are fully deterministic (fixture HTML + fake backend, no network): parsing, URL unwrapping, entity/whitespace handling, response classification, formatting, and orchestrator/registry/dispatcher wiring.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cubic-dev-ai

2 issues found across 6 files


A network-dependent end-to-end check against the real DuckDuckGo Lite endpoint via the default HttpSearchBackend. Ignored by default (so CI and `cargo test` stay deterministic and offline); run manually with: cargo test -p browser-use-agent --lib -- --ignored --nocapture search_live_smoke Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…iciency The formatted model-facing output now trims each result's title to 15 chars and description to 100 chars (ellipsis counted within the cap, on a Unicode char boundary); destination URLs are kept intact so they stay usable. Truncation is applied at the display layer (`format_results`), so `SearchResult` still carries full data for any other consumer. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
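The display-layer truncation rule described in this commit (ellipsis counted within the cap, cut on a Unicode char boundary) can be sketched as below. This is a minimal illustration, not the PR's `format_results`. The helper name `truncate_chars` and the single-character `…` ellipsis are assumptions, and "chars" here means Unicode scalar values rather than grapheme clusters. The constants use the later-tuned caps of 30 and 125.

```rust
/// Hypothetical helper: truncate `s` to at most `max_chars` Unicode scalar
/// values, counting the trailing ellipsis within the cap.
fn truncate_chars(s: &str, max_chars: usize) -> String {
    if s.chars().count() <= max_chars {
        return s.to_string();
    }
    if max_chars == 0 {
        return String::new();
    }
    // Keep max_chars - 1 chars, then append one '…' so the total equals the cap.
    // Iterating with `chars()` guarantees the cut never splits a code point.
    let kept: String = s.chars().take(max_chars - 1).collect();
    format!("{kept}…")
}

const TITLE_MAX: usize = 30;
const DESCRIPTION_MAX: usize = 125;
```

Applying this only when formatting, as the commit notes, keeps the full title and description available to any other consumer of the parsed results.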

Tune the formatted-output truncation limits: titles 15 -> 30 chars, descriptions 100 -> 125 chars (ellipsis still counted within the cap). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…-tool

gregpr07 · 2026-06-05T15:55:09Z

DO NOT MERGE. This does not work well enough in practice and should not be merged in its current form.

The `search` tool now POSTs the query to search.browser-use.com (a thin proxy in front of Parallel's Search API with browser-use auth + billing) instead of scraping DuckDuckGo Lite HTML. Contract verified against the search service source (documents/browser-use/search):

- POST {base}/search with {"query"} and the `X-Browser-Use-API-Key` header (key read from BROWSER_USE_API_KEY, the workspace's existing browser-use cloud auth variable; fails fast with an actionable message when unset).
- Base URL overridable via BROWSER_USE_SEARCH_URL (e.g. a local dev instance, which runs as an open proxy without auth, so keyless requests are allowed through there).
- 200 -> {"results":[{title?, url, published_date?, content}]}; the multi-line markdown content is whitespace-normalized; untitled results fall back to their URL; url-less results are dropped; the publication date is appended to the title line when known.
- Errors mapped per the service's table: 401 invalid key, 402 insufficient balance, other >=400 carried with a 200-char body snippet; all are surfaced to the model as soft errors ("Search failed: ...").

All the DuckDuckGo HTML-parsing machinery (regex extraction, entity decoding, redirect unwrapping, percent decoding) is gone; the title/description truncation (30/125) and output layout are unchanged. Tests rewritten against fixture JSON; live smoke now targets the real service (verified end-to-end against a local instance).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
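The base-URL override and fail-fast key check can be sketched as a pure resolver that takes the two environment values as arguments. This is a minimal illustration under stated assumptions: `SearchConfig`, `resolve_config`, `DEFAULT_BASE`, the `https://` scheme on the default base, the `Content-Type` header, and the error wording are all hypothetical. It also simplifies "unset against production" to "no override given".

```rust
/// Hypothetical resolved request settings for the search tool.
#[derive(Debug, PartialEq)]
pub struct SearchConfig {
    pub endpoint: String,        // "{base}/search"
    pub api_key: Option<String>, // sent as X-Browser-Use-API-Key when present
}

/// Assumed production base (scheme is an assumption).
pub const DEFAULT_BASE: &str = "https://search.browser-use.com";

impl SearchConfig {
    /// Headers for the POST; the key header is omitted for keyless requests.
    pub fn headers(&self) -> Vec<(&'static str, String)> {
        let mut h = vec![("Content-Type", "application/json".to_string())];
        if let Some(k) = &self.api_key {
            h.push(("X-Browser-Use-API-Key", k.clone()));
        }
        h
    }
}

/// `base_override` stands in for BROWSER_USE_SEARCH_URL and `api_key` for
/// BROWSER_USE_API_KEY; values are passed in so the logic stays testable.
pub fn resolve_config(base_override: Option<&str>, api_key: Option<&str>) -> Result<SearchConfig, String> {
    let api_key = api_key.filter(|k| !k.is_empty()).map(str::to_owned);
    let base = base_override.filter(|b| !b.is_empty());
    match (base, api_key) {
        // Production without a key: fail fast with an actionable message.
        (None, None) => Err("BROWSER_USE_API_KEY is not set: export it or sign in to use search".to_string()),
        (None, key) => Ok(SearchConfig { endpoint: format!("{}/search", DEFAULT_BASE), api_key: key }),
        // Overridden base (e.g. a local open-proxy dev instance): keyless is allowed.
        (Some(b), key) => Ok(SearchConfig {
            endpoint: format!("{}/search", b.trim_end_matches('/')),
            api_key: key,
        }),
    }
}
```

Failing before any network call means a missing key surfaces as a clear local message instead of a 401 round-trip.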

…-tool

Conflicts:
- crates/browser-use-agent/src/entrypoint/provider.rs
- crates/browser-use-agent/src/tools/handlers/mod.rs
- crates/browser-use-agent/src/tools/handlers/search.rs
- crates/browser-use-agent/src/tools/handlers/search_tests.rs
- crates/browser-use-agent/src/tools/registry.rs
- crates/browser-use-agent/src/tools/registry_tests.rs

Replace the multi-sentence search description with a concise one-liner matching the house/codex style (cf. web_search: "Search the web for a free-text query."), keeping only the key differentiator (no browser needed). Update the definition test accordingly + add a length guard. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… fixes

search.browser-use.com is live; an integration review against the running service and the Go source surfaced these fixes:

- TUI (`prepare_tui_agent_run`): load + export the stored cloud API key (auth.browser_use_cloud.api_key) for ALL browser modes, not only Browser Use Cloud. The `search` tool reads BROWSER_USE_API_KEY from the env, so a signed-in user on Local Chrome previously got MissingApiKey. An explicitly exported env key wins; the store only fills it when unset.
- CLI (`run_session_via_engine_with_runtime_and_cancel`): same export on the headless run path (env-first, store fallback).
- parse_results: restore the http(s)-only URL allowlist the DuckDuckGo-era code enforced. The output tells the model to navigate to result URLs, so javascript:/data:/relative URLs from upstream are dropped (+ test).
- Contract docs: the service returns 429 (rate limited) and never 422 (upstream 400/422 are sanitized into client 400); fix the three doc sites and the classify_response test statuses.
- registry_tests: pin `search` serial via the hard-coded loop only, removing the self-contradicting constant-based assert.

Live-verified against production: POST to search.browser-use.com with the X-Browser-Use-API-Key header; 401 correctly surfaces as the model-facing soft error. (Happy path previously verified against a dev instance.)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
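The result post-processing (http(s)-only allowlist, whitespace normalization, URL fallback for untitled results, date on the title line) can be sketched over already-decoded items. JSON decoding is omitted to keep the example std-only. `RawResult`, `SearchResult`, `is_navigable`, `normalize_ws`, and the ` (date)` title suffix are assumptions, not the PR's actual types or output layout.

```rust
/// Hypothetical already-deserialized result item.
pub struct RawResult {
    pub title: Option<String>,
    pub url: Option<String>,
    pub published_date: Option<String>,
    pub content: String,
}

#[derive(Debug, PartialEq)]
pub struct SearchResult {
    pub title: String,
    pub url: String,
    pub description: String,
}

/// Only absolute http(s) URLs survive; javascript:, data:, and relative URLs are dropped.
fn is_navigable(url: &str) -> bool {
    let lower = url.trim().to_ascii_lowercase();
    lower.starts_with("http://") || lower.starts_with("https://")
}

/// Collapses every run of whitespace (including newlines) into a single space.
fn normalize_ws(s: &str) -> String {
    s.split_whitespace().collect::<Vec<_>>().join(" ")
}

pub fn parse_results(raw: Vec<RawResult>) -> Vec<SearchResult> {
    raw.into_iter()
        .filter_map(|r| {
            // url-less and non-http(s) results are dropped.
            let url = r.url?.trim().to_string();
            if !is_navigable(&url) {
                return None;
            }
            // Untitled (or blank-titled) results fall back to their URL.
            let mut title = r
                .title
                .map(|t| normalize_ws(&t))
                .filter(|t| !t.is_empty())
                .unwrap_or_else(|| url.clone());
            // Append the publication date to the title line when known.
            if let Some(date) = r.published_date.filter(|d| !d.trim().is_empty()) {
                title = format!("{title} ({})", date.trim());
            }
            Some(SearchResult { title, url, description: normalize_ws(&r.content) })
        })
        .collect()
}
```

Filtering at parse time, rather than in the formatter, means no unsafe URL can reach the model even if another consumer formats the results differently.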

…-tool

With the browser-use `search` tool live, the hosted `web_search` competed with it: on OpenAI backends the Responses builder encodes it as the provider-side `web_search_preview`, and the model prefers its native search — observed in terminal testing, where searches bypassed search.browser-use.com entirely (and its billing/auth). Remove the registration from both the production dispatcher and `default_registry` so no provider-side search is emitted and all searches go through `search`. The handler module stays as the codex-parity model of the hosted capability (and as a pure registry-test fixture); its unused `definitions::web_search()` is deleted. The dispatcher membership test now pins `web_search` ABSENT and `search` present. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…-tool

origin/main's #80 added the BrowserUse provider backend but missed the two exhaustive matches in the CLI's message analytics, so the CLI crate does not compile on main. Cover the variant per the backend's own conventions: an api_key-authenticated provider with id "browser-use" (entrypoint/provider.rs maps it to OpenAiCompatibleCustom with provider_id "browser-use" and BROWSER_USE_API_KEY auth). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

cubic-dev-ai Bot reviewed Jun 5, 2026


Comment thread on crates/browser-use-agent/src/tools/handlers/search.rs (outdated)

Comment thread on crates/browser-use-agent/src/tools/registry.rs (outdated)

reformedot and others added 5 commits June 4, 2026 18:15

Increase search title cap to 30 and description cap to 125

84d8109


Merge remote-tracking branch 'origin/main' into add-duckduckgo-search…

962a2bf

…-tool

Tune search tool guidance and scheduling

af4111c

gregpr07 changed the title ~~Add locally-executed DuckDuckGo search tool~~ DO NOT MERGE: Add locally-executed DuckDuckGo search tool Jun 5, 2026

gregpr07 changed the title ~~DO NOT MERGE: Add locally-executed DuckDuckGo search tool~~ Add locally-executed DuckDuckGo search tool Jun 5, 2026

reformedot changed the title ~~Add locally-executed DuckDuckGo search tool~~ Add search tool backed by the browser-use search API Jun 6, 2026

reformedot and others added 7 commits June 8, 2026 12:59

Merge remote-tracking branch 'origin/main' into add-duckduckgo-search…

9d37fd4

…-tool

Merge remote-tracking branch 'origin/main' into add-duckduckgo-search…

09f4e1a

…-tool


reformedot wants to merge 14 commits into main from add-duckduckgo-search-tool

reformedot commented Jun 5, 2026 (edited)


cubic-dev-ai Bot left a comment (edited)


gregpr07 commented Jun 5, 2026


2 participants
