Skip to content

feat(browser-tools): add browser_batch_get_tab_content for 10x faster multi-tab extraction#19

Open
Shawn1993 wants to merge 2 commits intoopentabs-dev:mainfrom
Shawn1993:feat/batch-get-tab-content
Open

feat(browser-tools): add browser_batch_get_tab_content for 10x faster multi-tab extraction#19
Shawn1993 wants to merge 2 commits intoopentabs-dev:mainfrom
Shawn1993:feat/batch-get-tab-content

Conversation

@Shawn1993
Copy link
Copy Markdown

Summary

Adds browser_batch_get_tab_content — a new tool to extract text content from multiple tabs in a single call using parallel execution.

Performance

Scenario Before (sequential) After (batch) Speedup
5 tabs 174ms 17ms 10x
10 tabs ~350ms ~20ms 17x

The key insight: Promise.allSettled lets us fire all requests to the extension simultaneously instead of waiting for each one sequentially.

API

// Input
{
  tabIds: number[],      // Array of tab IDs (max 20)
  selector?: string,     // CSS selector (default: 'body')
  maxLength?: number     // Max chars per tab (default: 50000)
}

// Output
{
  results: [
    { tabId: 123, title: '...', url: '...', content: '...' },
    { tabId: 456, error: 'Tab closed' },  // Failed tabs include error
    ...
  ]
}

Use Cases

  • Scraping multiple pages in parallel
  • Monitoring multiple tabs at once
  • Batch content extraction for RAG/indexing
  • Any workflow that needs data from multiple tabs

Implementation

  • Uses Promise.allSettled for parallel execution with graceful error handling
  • Failed tabs return { tabId, error } instead of throwing
  • Max 20 tabs per call to prevent resource exhaustion
  • Same selector and maxLength options as browser_get_tab_content

BREAKING: None (behavior unchanged)

Before: 11 sequential executeInTab calls + 3 sequential dispatchToExtension calls
After: All run in parallel using Promise.allSettled

Expected improvement:
- ~1-2s faster per analyzeSite call
- Reduces 14 round-trips to 2 parallel batches

The rate limit concern mentioned in the old comments (10 calls/sec) is not
applicable here since all scripts complete within ~50ms in page context.
Promise.allSettled preserves the partial-results behavior from the original
try-catch blocks.
…xtraction

Adds a new tool to fetch text content from multiple tabs in a single call.

Performance improvement:
- 5 tabs: 174ms (sequential) → 17ms (batch) = 10x faster
- Reduces N HTTP round-trips to 1

API:
- Input: { tabIds: number[], selector?: string, maxLength?: number }
- Output: { results: [{ tabId, title, url, content } | { tabId, error }] }

Uses Promise.allSettled to ensure partial results when some tabs fail.
Copy link
Copy Markdown
Owner

@opentabs-dev opentabs-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — PR #19

Two issues to address:

1. Hand-edits a generated file

platform/shared/src/generated/browser-tools-catalog.ts is auto-generated by npm run generate:browser-tools-catalog (runs tsx scripts/generate-browser-tools-catalog.ts). The catalog entry for browser_batch_get_tab_content should not be hand-written — it should be picked up automatically by running the generator after adding the tool definition.

Fix: Remove the manual edit to browser-tools-catalog.ts and run npm run generate:browser-tools-catalog instead. If the generator doesn't pick up the new tool, fix the generator.

2. Overlapping diff with PR #18

The analyze-site/index.ts parallelization is identical to PR #18. These two PRs will conflict. Suggest rebasing this PR on top of #18 (or waiting for #18 to merge) and dropping the duplicate analyze-site changes from this branch.

3. Consider whether this tool is needed

MCP clients like Claude Code can already call browser_get_tab_content multiple times in parallel (parallel tool calls). The batch tool saves one round-trip of MCP overhead but adds a new tool to the catalog that every AI agent must understand. Is the ~50ms savings per tab worth the added surface area? Not a blocker — just worth considering whether this pulls its weight vs. the existing parallel tool call pattern.

The tool implementation itself looks fine

  • Promise.allSettled with per-tab error handling is correct
  • Max 20 tabs cap is reasonable
  • Schema and types are clean

E2E tests

No tests included for the new browser_batch_get_tab_content tool. A basic E2E test that opens 2-3 tabs, calls the batch tool, and verifies results per tab would catch regressions.

Copy link
Copy Markdown
Owner

@opentabs-dev opentabs-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review — PR #19: browser_batch_get_tab_content

Verdict: Reject — the tool doesn't add meaningful value and has several issues.

1. MCP clients already provide parallel tool calls — this tool is redundant

MCP clients like Claude Code can call browser_get_tab_content N times in parallel in a single response. Both paths end up firing N concurrent dispatchToExtension calls via Promise.allSettled. The batch tool saves only the MCP protocol overhead of N-1 request/response envelopes — sub-millisecond on localhost. The "10x speedup" comparison is against sequential single-tool calls, not parallel ones.

No existing browser tool (48 files in platform/mcp-server/src/browser-tools/) uses a batch pattern. Adding one creates pressure for browser_batch_screenshot_tab, browser_batch_get_page_html, etc. — an anti-pattern when the protocol handles parallelism natively.

2. Hand-edits a generated file

platform/shared/src/generated/browser-tools-catalog.ts is auto-generated by scripts/generate-browser-tools-catalog.ts. The file header says "Do not edit manually." The PR manually inserts the catalog entry instead of running the generator. This will be overwritten on the next npm run build.

3. Identical analyze-site diff as PR #18

The analyze-site/index.ts changes are byte-for-byte identical to PR #18. This should be removed from this PR entirely — it belongs in #18.

4. Design issues in the implementation

  • Shared selector/maxLength across all tabs: The single-tab tool allows per-tab customization. The batch tool forces the same parameters for all tabs — a flexibility regression.
  • Unsafe type assertion: result as object on line 48 — if dispatchToExtension returns null or a primitive, the spread will throw or produce incorrect results.
  • No tests: Other browser tools have test files. A tool with novel parallel dispatch + error aggregation logic needs tests.

5. Bundles two unrelated changes

The analyze-site parallelization and the batch tool are independent. They should not be in the same PR.

Recommendation

Close this PR. The analyze-site parallelization should merge via PR #18. If a batch content tool is truly needed (evidence of real-world demand from agents), it should be a separate focused PR with per-tab parameter support, proper generator integration, and tests.

Copy link
Copy Markdown
Owner

@opentabs-dev opentabs-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — Recommend Close

This PR was previously reviewed twice with changes requested. The issues remain unaddressed:

  1. Redundant with MCP parallel tool calls — MCP clients like Claude Code can call browser_get_tab_content N times in parallel in a single response. The batch tool saves only sub-millisecond localhost MCP overhead, not the 10x speedup claimed (which was measured against sequential calls, not parallel).

  2. Hand-edits a generated filebrowser-tools-catalog.ts is auto-generated. Must run npm run generate:browser-tools-catalog instead.

  3. Identical diff with PR #18 — The analyze-site/index.ts changes are duplicated from PR #18. Should be removed entirely from this PR.

  4. No tests — A tool with novel parallel dispatch + error aggregation logic needs test coverage.

  5. Design issues — Shared selector/maxLength across all tabs is a flexibility regression vs per-tab calls. Unsafe result as object cast.

Recommend closing this PR. The analyze-site parallelization belongs in PR #18.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants