diff --git a/.agent/README.md b/.agent/README.md
index 9d1dc67..d7ace3c 100644
--- a/.agent/README.md
+++ b/.agent/README.md
@@ -379,7 +379,7 @@ npx @modelcontextprotocol/inspector scrapegraph-mcp
 ## 📅 Changelog
 
 ### April 2026
-- ✅ Migrated MCP client and tools to **API v2** ([scrapegraph-py#84](https://github.com/ScrapeGraphAI/scrapegraph-py/pull/84)): base `https://api.scrapegraphai.com/api/v2`, `SGAI-APIKEY` header (matches SDK wire format), new crawl/monitor/credits/history tools; removed sitemap, agentic_scrapper, status polling tools. Env vars aligned with SDK: `SGAI_API_URL`, `SGAI_TIMEOUT` (legacy alias `SGAI_TIMEOUT_S` still honored).
+- ✅ Migrated MCP client and tools to **API v2** ([scrapegraph-py#84](https://github.com/ScrapeGraphAI/scrapegraph-py/pull/84)): base `https://v2-api.scrapegraphai.com/api`, `SGAI-APIKEY` header (matches SDK wire format), new crawl/monitor/credits/history tools; removed sitemap, agentic_scrapper, status polling tools. Env vars aligned with SDK: `SGAI_API_URL`, `SGAI_TIMEOUT` (legacy alias `SGAI_TIMEOUT_S` still honored).
 - ✅ Added `monitor_activity` tool for paginated tick history (GET /monitor/:id/activity), mirroring `sgai.monitor.activity()` in scrapegraph-py v2.
 
 ### January 2026
diff --git a/.agent/system/mcp_protocol.md b/.agent/system/mcp_protocol.md
index 7d39bba..822a1de 100644
--- a/.agent/system/mcp_protocol.md
+++ b/.agent/system/mcp_protocol.md
@@ -86,10 +86,10 @@ The **Model Context Protocol** (MCP) is an open standard that defines how AI ass
 └────────────┬────────────────────┘
              │ HTTPS API requests
              ▼
-┌─────────────────────────────────┐
-│        ScrapeGraphAI API        │
-│  https://api.scrapegraphai.com  │
-└─────────────────────────────────┘
+┌───────────────────────────────────┐
+│         ScrapeGraphAI API         │
+│   v2-api.scrapegraphai.com/api    │
+└───────────────────────────────────┘
 ```
 
 ### FastMCP Framework
diff --git a/.agent/system/project_architecture.md b/.agent/system/project_architecture.md
index b219517..94e5657 100644
--- a/.agent/system/project_architecture.md
+++ b/.agent/system/project_architecture.md
@@ -130,7 +130,7 @@ AI Assistant (Claude/Cursor)
     ↓ (stdio via MCP)
 FastMCP Server (this project)
     ↓ (HTTPS API calls)
-ScrapeGraphAI API (default https://api.scrapegraphai.com/api/v2)
+ScrapeGraphAI API (default https://v2-api.scrapegraphai.com/api)
     ↓ (web scraping)
 Target Websites
 ```
@@ -141,7 +141,7 @@ The server follows a simple, single-file architecture:
 
 **`ScapeGraphClient` Class:**
 - HTTP client wrapper for ScrapeGraphAI API v2 ([scrapegraph-py#84](https://github.com/ScrapeGraphAI/scrapegraph-py/pull/84))
-- Base URL: `https://api.scrapegraphai.com/api/v2` (override with env `SGAI_API_URL`)
+- Base URL: `https://v2-api.scrapegraphai.com/api` (override with env `SGAI_API_URL`)
 - Auth: `SGAI-APIKEY`, `X-SDK-Version: scrapegraph-mcp@2.0.0` (matches scrapegraph-py v2)
 - v2 methods include `scrape_v2`, `extract`, `search_api`, `crawl_*`, `monitor_*`, `credits`, `history`, plus compatibility wrappers used by MCP tools
 
@@ -391,7 +391,7 @@ If status is "completed":
 
 ### ScrapeGraphAI API
 
-**Base URL:** `https://api.scrapegraphai.com/api/v2` (configurable via `SGAI_API_URL`)
+**Base URL:** `https://v2-api.scrapegraphai.com/api` (configurable via `SGAI_API_URL`)
 
 **Authentication:**
 - Headers: `SGAI-APIKEY: ` (matches scrapegraph-py v2 wire format)
diff --git a/README.md b/README.md
index d50fa99..e7b24a3 100644
--- a/README.md
+++ b/README.md
@@ -28,11 +28,11 @@ A production-ready [Model Context Protocol](https://modelcontextprotocol.io/intr
 
 ## API v2
 
-This MCP server targets **ScrapeGraph API v2** (`https://api.scrapegraphai.com/api/v2`), aligned 1:1 with
+This MCP server targets **ScrapeGraph API v2** (`https://v2-api.scrapegraphai.com/api`), aligned 1:1 with
 [scrapegraph-py PR #84](https://github.com/ScrapeGraphAI/scrapegraph-py/pull/84). Auth uses the `SGAI-APIKEY` header.
 Environment variables mirror the Python SDK:
 
-- **`SGAI_API_URL`** — override the base URL (default `https://api.scrapegraphai.com/api/v2`)
+- **`SGAI_API_URL`** — override the base URL (default `https://v2-api.scrapegraphai.com/api`)
 - **`SGAI_TIMEOUT`** — request timeout in seconds (default `120`)
 - **`SGAI_API_KEY`** — API key (can also be passed via MCP `scrapegraphApiKey` or `X-API-Key` header)
 
@@ -670,7 +670,7 @@ For comprehensive developer documentation, see:
 ### API Integration
 
 - **ScrapeGraph AI API** - Enterprise web scraping service
-- **Base URL**: `https://api.scrapegraphai.com/v1`
+- **Base URL**: `https://v2-api.scrapegraphai.com/api`
 - **Authentication**: API key-based
 
 ## License
diff --git a/server.json b/server.json
index d7c057b..d16eccc 100644
--- a/server.json
+++ b/server.json
@@ -24,7 +24,7 @@
           "name": "SGAI_API_KEY"
         },
         {
-          "description": "Override API base URL (default https://api.scrapegraphai.com/api/v2)",
+          "description": "Override API base URL (default https://v2-api.scrapegraphai.com/api)",
           "isRequired": false,
           "format": "string",
           "isSecret": false,
diff --git a/src/scrapegraph_mcp/server.py b/src/scrapegraph_mcp/server.py
index ac2c1a4..99dcf58 100644
--- a/src/scrapegraph_mcp/server.py
+++ b/src/scrapegraph_mcp/server.py
@@ -24,7 +24,7 @@ Removed on v2 (no API equivalent): sitemap,
 agentic_scrapper, markdownify_status, smartscraper_status.
 
 Environment variables (match scrapegraph-py v2):
-- SGAI_API_URL (default https://api.scrapegraphai.com/api/v2) — base URL override
+- SGAI_API_URL (default https://v2-api.scrapegraphai.com/api) — base URL override
 - SGAI_TIMEOUT (default 120) — request timeout in seconds
 - SCRAPEGRAPH_API_BASE_URL — legacy alias for SGAI_API_URL (still honored)
 - SGAI_TIMEOUT_S — legacy alias for SGAI_TIMEOUT (still honored)
@@ -88,8 +88,8 @@ logger = logging.getLogger(__name__)
 
 MCP_SERVER_VERSION = "2.0.0"
 
-# Matches scrapegraph-py v2 (env.py): https://api.scrapegraphai.com/api/v2
-DEFAULT_API_BASE_URL = "https://api.scrapegraphai.com/api/v2"
+# Matches scrapegraph-py v2 (env.py): https://v2-api.scrapegraphai.com/api
+DEFAULT_API_BASE_URL = "https://v2-api.scrapegraphai.com/api"
 
 
 def _api_base_url() -> str:
@@ -662,7 +662,7 @@ def web_scraping_guide() -> str:
 1. Use **markdownify** or **scrape** before **smartscraper** when you only need readable text.
 2. Multi-page **AI** extraction: run **smartscraper** per URL, or use **monitor_create** on a schedule.
 3. Poll **smartcrawler_fetch_results** until the crawl finishes.
-4. Override API host with env **SGAI_API_URL** if needed (default `https://api.scrapegraphai.com/api/v2`).
+4. Override API host with env **SGAI_API_URL** if needed (default `https://v2-api.scrapegraphai.com/api`).
 """
 
 
@@ -727,7 +727,7 @@ def api_status() -> str:
     return """# ScapeGraph API Status (MCP v2)
 
 - **MCP package version**: 2.0.0 (matches [scrapegraph-py#84](https://github.com/ScrapeGraphAI/scrapegraph-py/pull/84) API surface)
-- **Default API base**: `https://api.scrapegraphai.com/api/v2` (override with `SGAI_API_URL`)
+- **Default API base**: `https://v2-api.scrapegraphai.com/api` (override with `SGAI_API_URL`)
 - **Auth headers**: `SGAI-APIKEY`, `X-SDK-Version: scrapegraph-mcp@2.0.0`
 
 ## Tools