8 changes: 6 additions & 2 deletions CHANGELOG.md
@@ -2,14 +2,18 @@

## [1.6.2](https://github.com/browserless/browserless-mcp/compare/v1.6.1...v1.6.2) (2026-06-08)


### Bug Fixes

* drop stale COPY patches/ from Dockerfile ([#109](https://github.com/browserless/browserless-mcp/issues/109)) ([976e38d](https://github.com/browserless/browserless-mcp/commit/976e38d4b79643d60485a01cdee0c16486b17afd))
- drop stale COPY patches/ from Dockerfile ([#109](https://github.com/browserless/browserless-mcp/issues/109)) ([976e38d](https://github.com/browserless/browserless-mcp/commit/976e38d4b79643d60485a01cdee0c16486b17afd))

## Latest

- Add file upload/download support to `browserless_agent` via the `uploadFile` and `getDownloads` commands, plus a `file-transfers` skill. Downloads **auto-surface** on every agent response as a ledger (never the bytes), without the model having to call `getDownloads`. The ledger lists completed files (handle/path), still-running ones (with progress, so the model re-checks on its next browser touch), and over-cap ones (source URL for a direct fetch). In stdio mode the file is saved locally and you get its path; `uploadFile` accepts a `handle`, a local `path`, or base64 `content`. Honors the server-side 10MB/50MB transfer cap.
- Add out-of-band HTTP file endpoints (httpStream transport), token-gated like the MCP surface: `POST /upload` stages a local file (`curl -F file=@path "<base>/upload?token=<token>"`) and returns a handle for `uploadFile`; `GET /download/<id>?token=<token>` fetches a captured download. Files share a temp store; each file is dropped after one download fetch, a 15-minute TTL, or session end, whichever comes first.
- **Removed the standalone `browserless_download` tool.** File downloads now go through `browserless_agent` (trigger the download, then it auto-surfaces) — a single path that never inlines bytes into context. Replaces the old tool that returned the file as base64.
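
The eviction rules described above (single fetch, 15-minute TTL, session end) can be illustrated with a small in-memory store. This is only a sketch under stated assumptions: `TempFileStore`, `put`, `take`, and the handle format are hypothetical names chosen for illustration, not the API of the repository's `download-store` module. Only the 15-minute TTL and the name `clearSession` come from this change.

```typescript
// Hypothetical sketch of a temp store with three eviction triggers.
// None of these names are taken from the real download-store module.
interface StoredFile {
  sessionId: string;
  bytes: Uint8Array;
  expiresAt: number; // epoch milliseconds
}

const TTL_MS = 15 * 60 * 1000; // 15-minute TTL, as stated in the changelog

class TempFileStore {
  private files = new Map<string, StoredFile>();
  private nextId = 0;
  private now: () => number;

  // The clock is injectable so expiry can be exercised without waiting.
  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  // Stage a file for a session and return an opaque handle.
  put(sessionId: string, bytes: Uint8Array): string {
    const handle = `file-${this.nextId++}`;
    this.files.set(handle, {
      sessionId,
      bytes,
      expiresAt: this.now() + TTL_MS,
    });
    return handle;
  }

  // Single-use fetch: the entry is removed whether it is served or expired.
  take(handle: string): Uint8Array | undefined {
    const entry = this.files.get(handle);
    if (!entry) return undefined;
    this.files.delete(handle);
    return this.now() < entry.expiresAt ? entry.bytes : undefined;
  }

  // Session end: drop everything that belongs to the session.
  clearSession(sessionId: string): void {
    this.files.forEach((entry, handle) => {
      if (entry.sessionId === sessionId) this.files.delete(handle);
    });
  }
}
```

In this sketch expiry is checked lazily at fetch time; a real store would likely also sweep expired entries on a timer so unfetched files do not linger in memory or on disk.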

## v1.6.1

Drop vestigial mcp-proxy postinstall patch that broke `npm install` in consumers

- Dependency updates
23 changes: 11 additions & 12 deletions README.md
@@ -22,18 +22,17 @@ No local install — see [Configuration](#configuration) for per-client snippets

## Tools

| Tool | Description |
| -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `browserless_smartscraper` | Scrape any webpage using cascading strategies (HTTP fetch, proxy, headless browser, captcha solving). Returns content in requested formats: `markdown`, `html`, `screenshot`, `pdf`, `links`. |
| `browserless_search` | Search the web using Browserless and optionally scrape each result. Supports web, news, and image search with geo-targeting and time filters. |
| `browserless_map` | Discover and map all URLs on a website. Crawls via sitemaps and link extraction. Returns URLs with optional titles and descriptions. Useful for site audits and content discovery. |
| `browserless_crawl` | Crawl a website and scrape every discovered page. Supports depth control, path filtering, sitemap strategies, and configurable scrape options. Returns scraped content and metadata for each page. |
| `browserless_performance` | Run Lighthouse audits on any URL. Returns scores and metrics for accessibility, best practices, performance, PWA, and SEO. Optionally filter by category or supply performance budgets. |
| `browserless_function` | Execute custom Puppeteer JavaScript on the Browserless cloud. The function receives a `page` object and optional `context`; return `{ data, type }` to control the payload and Content-Type. |
| `browserless_download` | Run custom Puppeteer code and return the file Chrome downloads during execution (e.g. after clicking a download link). The downloaded file is streamed back to the caller. |
| `browserless_export` | Export a webpage via the Browserless `/export` API. Fetches the URL and returns its native content (HTML, PDF, image, etc.) with automatic content-type detection. |
| `browserless_agent` | Drive a persistent browser session via a ReAct loop: snapshot the page, plan, batch interactions (click, type, scroll, evaluate, etc.), and re-snapshot. Uses ref-based selectors derived from snapshots, supports multi-tab workflows, screenshots, captcha solving, and live URLs. |
| `browserless_skill` | Load an on-demand recipe for a non-trivial page mechanic (shadow DOM, cookie consent, modals, captchas, dynamic content, snapshot misses, screenshots, tabs). Companion to `browserless_agent`. |
| Tool | Description |
| -------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `browserless_smartscraper` | Scrape any webpage using cascading strategies (HTTP fetch, proxy, headless browser, captcha solving). Returns content in requested formats: `markdown`, `html`, `screenshot`, `pdf`, `links`. |
| `browserless_search` | Search the web using Browserless and optionally scrape each result. Supports web, news, and image search with geo-targeting and time filters. |
| `browserless_map` | Discover and map all URLs on a website. Crawls via sitemaps and link extraction. Returns URLs with optional titles and descriptions. Useful for site audits and content discovery. |
| `browserless_crawl` | Crawl a website and scrape every discovered page. Supports depth control, path filtering, sitemap strategies, and configurable scrape options. Returns scraped content and metadata for each page. |
| `browserless_performance` | Run Lighthouse audits on any URL. Returns scores and metrics for accessibility, best practices, performance, PWA, and SEO. Optionally filter by category or supply performance budgets. |
| `browserless_function` | Execute custom Puppeteer JavaScript on the Browserless cloud. The function receives a `page` object and optional `context`; return `{ data, type }` to control the payload and Content-Type. |
| `browserless_export` | Export a webpage via the Browserless `/export` API. Fetches the URL and returns its native content (HTML, PDF, image, etc.) with automatic content-type detection. |
| `browserless_agent` | Drive a persistent browser session via a ReAct loop: snapshot the page, plan, batch interactions (click, type, scroll, evaluate, etc.), and re-snapshot. Uses ref-based selectors derived from snapshots, supports multi-tab workflows, screenshots, captcha solving, live URLs, and file upload/download (captured downloads auto-surface as handles; bytes never enter context). |
| `browserless_skill` | Load an on-demand recipe for a non-trivial page mechanic (shadow DOM, cookie consent, modals, captchas, dynamic content, snapshot misses, screenshots, tabs). Companion to `browserless_agent`. |

## Skills

5 changes: 2 additions & 3 deletions src/@types/types.d.ts
@@ -6,7 +6,6 @@ import type {
SmartScraperResponseSchema,
} from '../tools/smartscraper.js';
import type { FunctionParamsSchema } from '../tools/function.js';
import type { DownloadParamsSchema } from '../tools/download.js';
import type { ExportParamsSchema } from '../tools/export.js';
import type {
SearchSourceSchema,
@@ -233,7 +232,8 @@ export type SkillId =
| 'screenshots'
| 'tabs'
| 'autonomous-login'
| 'auth-profile';
| 'auth-profile'
| 'file-transfers';

export interface DetectContext {
snapshot?: SnapshotResult;
@@ -294,7 +294,6 @@ export type ScrapeFormat = z.infer<typeof ScrapeFormatSchema>;
export type SmartScraperParams = z.infer<typeof SmartScraperParamsSchema>;
export type SmartScraperResponse = z.infer<typeof SmartScraperResponseSchema>;
export type FunctionParams = z.infer<typeof FunctionParamsSchema>;
export type DownloadParams = z.infer<typeof DownloadParamsSchema>;
export type ExportParams = z.infer<typeof ExportParamsSchema>;
export type ProxyOptions = z.infer<typeof ProxyOptionsSchema>;
export type SearchSource = z.infer<typeof SearchSourceSchema>;
86 changes: 28 additions & 58 deletions src/index.ts
@@ -8,7 +8,6 @@ import { getConfig } from './config.js';
import type { BrowserlessSession } from './@types/types.js';
import { registerSmartScraperTool } from './tools/smartscraper.js';
import { registerFunctionTool } from './tools/function.js';
import { registerDownloadTool } from './tools/download.js';
import { registerExportTool } from './tools/export.js';
import { registerAgentTools } from './tools/agent.js';
import { registerSearchTool } from './tools/search.js';
@@ -17,13 +16,14 @@ import { registerCrawlTool } from './tools/crawl.js';
import { registerPerformanceTool } from './tools/performance.js';
import { registerApiDocsResource } from './resources/api-docs.js';
import { registerStatusResource } from './resources/status.js';
import { registerUploadRoute } from './resources/upload-route.js';
import { registerDownloadRoute } from './resources/download-route.js';
import { clearSession } from './lib/download-store.js';
import { registerScrapeUrlPrompt } from './prompts/scrape-url.js';
import { registerExtractContentPrompt } from './prompts/extract-content.js';
import { AnalyticsHelper } from './lib/analytics.js';
import {
resolveApiKey,
installSupabaseTokenTtlPatch,
} from './lib/account-resolver.js';
import { installSupabaseTokenTtlPatch } from './lib/account-resolver.js';
import { resolveBrowserlessAuth } from './lib/http-auth.js';
import { BoundedEventStore } from './lib/bounded-event-store.js';
import { RedisOAuthProxy } from './lib/redis-oauth-proxy.js';
import { Redis } from 'ioredis';
@@ -107,58 +107,21 @@ const hybridAuthenticate =
config.transport === 'httpStream'
? async (request: IncomingMessage) => {
const params = new URLSearchParams(request.url?.split('?')[1] ?? '');
const authHeader = request.headers.authorization as string | undefined;
const headerToken = authHeader?.startsWith('Bearer ')
? authHeader.slice(7)
: authHeader;

const apiUrl =
(request.headers['x-browserless-api-url'] as string) ??
params.get('browserlessUrl') ??
config.browserlessApiUrl;

// A pre-created session id to attach to, threaded by the autologin
// runner. The agent tool opens /chromium/agent?sessionId=<this> instead
// of doing its own POST /profile.
const attachSessionId =
(request.headers['x-browserless-session-id'] as string) ??
params.get('browserlessSessionId') ??
undefined;

// JWTs have 3 dot-separated base64url segments; plain API keys do not.
const isJwt = headerToken ? headerToken.split('.').length === 3 : false;

// apiUrl/attachSessionId are the same across every auth path; only the
// resolved token differs.
const session = (token: string): BrowserlessSession =>
({ token, apiUrl, attachSessionId }) as BrowserlessSession;

// 1. Authorization header with plain API key
if (headerToken && !isJwt) {
return session(headerToken);
}

// 2. ?token= query param
const directToken = params.get('token') || undefined;
if (directToken) {
return session(directToken);
}

// 3. Authorization header with JWT → decode Supabase token directly
if (isJwt && headerToken) {
const { apiKey } = await resolveApiKey(
config.supabaseUrl,
config.supabaseServiceRoleKey,
headerToken,
);
return session(apiKey);
}

throw new Error(
'No Browserless API token provided. ' +
'Pass it as Authorization: Bearer <token> header, ' +
'?token= query parameter, or authenticate via OAuth.',
);
return (await resolveBrowserlessAuth(
{
authHeader: request.headers.authorization as string | undefined,
tokenQuery: params.get('token') || undefined,
apiUrlHeader: request.headers['x-browserless-api-url'] as
| string
| undefined,
browserlessUrlQuery: params.get('browserlessUrl') || undefined,
sessionIdHeader: request.headers['x-browserless-session-id'] as
| string
| undefined,
sessionIdQuery: params.get('browserlessSessionId') || undefined,
},
config,
)) as BrowserlessSession;
}
: undefined;

@@ -171,7 +134,6 @@ const server = new FastMCP<BrowserlessSession>({

registerSmartScraperTool(server, config, analytics);
registerFunctionTool(server, config, analytics);
registerDownloadTool(server, config, analytics);
registerExportTool(server, config, analytics);
registerAgentTools(server, config, analytics);
registerSearchTool(server, config, analytics);
@@ -190,6 +152,8 @@ server.on('connect', (event) => {

server.on('disconnect', (event) => {
const id = event.session.sessionId ?? 'stdio';
// Drop any files staged/captured for this session (TTL is the backstop).
clearSession(event.session.sessionId);
console.error(`[browserless-mcp] Client disconnected: ${id}`);
});

@@ -203,6 +167,12 @@ if (config.transport === 'httpStream') {
stateless: false,
},
});
// Out-of-band file staging for uploads (the LLM curls a file here and gets a
// handle, instead of base64-ing it through the conversation). httpStream only.
registerUploadRoute(server, config);
// Single-use, out-of-band fetch for captured downloads (the LLM GETs the file
// instead of pulling bytes through the conversation). httpStream only.
registerDownloadRoute(server, config);
console.error(
`[browserless-mcp] HTTP Streamable server listening on port ${config.port}`,
);