Merged
18 changes: 9 additions & 9 deletions .agent/README.md
@@ -12,7 +12,7 @@ Complete system architecture documentation including:
- **Technology Stack** - Python 3.10+, FastMCP, httpx dependencies
- **Project Structure** - File organization and key files
- **Core Architecture** - MCP design, server architecture, patterns
- **MCP Tools** - API v2 tools (markdownify, scrape, smartscraper, searchscraper, crawl, credits, history, monitor, …)
- **MCP Tools** - API v3 tools (scrape, extract, search, crawl_start, crawl_get_status, schema, credits, history, monitor_*)
- **API Integration** - ScrapeGraphAI API endpoints and credit system
- **Deployment** - Smithery, Claude Desktop, Cursor, Docker setup
- **Recent Updates** - SmartCrawler integration and latest features
@@ -95,14 +95,14 @@ Complete Model Context Protocol integration documentation:

**...available tools and their parameters:**
- Read: [Project Architecture - MCP Tools](./system/project_architecture.md#mcp-tools)
- Quick reference: see README “Available Tools” table (v2: + scrape, crawl_stop/resume, credits, sgai_history, monitor_*; removed sitemap, agentic_scrapper, *\_status tools)
- Quick reference: see README “Available Tools” table (v2: + scrape, crawl_stop/resume, credits, history, monitor_*; removed sitemap, agentic_scrapper, *\_status tools)

**...error handling:**
- Read: [MCP Protocol - Error Handling](./system/mcp_protocol.md#error-handling)
- Pattern: Return `{"error": "message"}` instead of raising exceptions
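  That return-errors-as-data pattern can be sketched as follows (a minimal illustration, not the server's actual implementation; the URL check is a made-up stand-in for real failure modes such as HTTP errors or credit limits):

  ```python
  from typing import Any, Dict

  def scrape(website_url: str) -> Dict[str, Any]:
      """Sketch of the MCP error pattern: failures come back as data,
      never as raised exceptions, so the client always receives valid JSON."""
      try:
          if not website_url.startswith(("http://", "https://")):
              # Stand-in for real failure modes (HTTP errors, credit limits, ...)
              raise ValueError(f"invalid URL: {website_url}")
          return {"result": "# Example\n\nMarkdown content..."}
      except Exception as exc:
          return {"error": str(exc)}

  print(scrape("ftp://example.com"))  # {'error': 'invalid URL: ftp://example.com'}
  ```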

**...how SmartCrawler works:**
- Read: [Project Architecture - Tool #4 & #5](./system/project_architecture.md#4-smartcrawler_initiate)
- Read: [Project Architecture - Tool #4 & #5](./system/project_architecture.md#4-crawl_start)
- Pattern: Initiate (async) → Poll fetch_results until complete
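  The initiate-then-poll pattern can be sketched like this (hypothetical helper names; `fetch_status` is a local stand-in for the real status tool, which it simulates by completing on the third poll):

  ```python
  import itertools
  import time

  def fetch_status(request_id: str, _polls=itertools.count(1)) -> dict:
      """Stand-in for the real status call; pretends to finish on the 3rd poll."""
      if next(_polls) < 3:
          return {"status": "processing"}
      return {"status": "completed", "results": ["https://example.com/page1"]}

  def poll_until_complete(request_id: str, interval: float = 0.0) -> dict:
      # Initiation returns a request_id; poll until status == "completed"
      while True:
          resp = fetch_status(request_id)
          if resp["status"] == "completed":
              return resp
          time.sleep(interval)

  print(poll_until_complete("req-123")["status"])  # completed
  ```

  In production code the interval should back off (and a timeout should bound the loop) rather than polling in a tight loop.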

---
@@ -133,7 +133,7 @@ npx @modelcontextprotocol/inspector scrapegraph-mcp

**Manual Testing (stdio):**
```bash
echo '{"jsonrpc":"2.0","method":"tools/call","params":{"name":"markdownify","arguments":{"website_url":"https://scrapegraphai.com"}},"id":1}' | scrapegraph-mcp
echo '{"jsonrpc":"2.0","method":"tools/call","params":{"name":"scrape","arguments":{"website_url":"https://scrapegraphai.com","output_format":"markdown"}},"id":1}' | scrapegraph-mcp
# (v2: same tool name; backend calls POST /scrape)
```

@@ -177,11 +177,11 @@ Quick reference to all MCP tools:

| Tool | Notes |
|------|--------|
| `markdownify` / `scrape` | POST /scrape (v2) |
| `smartscraper` | POST /extract; URL only |
| `searchscraper` | POST /search; num_results 3–20 |
| `scrape` | POST /scrape (v2) |
| `extract` | POST /extract; URL only |
| `search` | POST /search; num_results 3–20 |
| `smartcrawler_*`, `crawl_stop`, `crawl_resume` | POST/GET /crawl |
| `credits`, `sgai_history` | GET /credits, /history |
| `credits`, `history` | GET /credits, /history |
| `monitor_*` | /monitor namespace |

For detailed tool documentation, see [Project Architecture - MCP Tools](./system/project_architecture.md#mcp-tools).
@@ -229,7 +229,7 @@ For detailed tool documentation, see [Project Architecture - MCP Tools](./system

**Issue: SmartCrawler not returning results**
- **Cause:** Still processing (async operation)
- **Solution:** Keep polling `smartcrawler_fetch_results()` until `status == "completed"`
- **Solution:** Keep polling `crawl_get_status()` until `status == "completed"`

**Issue: Python version error**
- **Cause:** Python < 3.10
40 changes: 20 additions & 20 deletions .agent/system/mcp_protocol.md
@@ -41,7 +41,7 @@ The **Model Context Protocol** (MCP) is an open standard that defines how AI ass
- Functions exposed by the server
- Have typed parameters and return values
- Automatically discovered by AI assistants
- **Examples:** `markdownify()`, `smartscraper()`
- **Examples:** `scrape()`, `extract()`

**5. Resources**
- Data exposed by the server (optional)
@@ -104,7 +104,7 @@ mcp = FastMCP("ScapeGraph API MCP Server")

# Define tools with decorators
@mcp.tool()
def markdownify(website_url: str) -> Dict[str, Any]:
def scrape(website_url: str) -> Dict[str, Any]:
"""Convert a webpage to markdown."""
# Implementation...
return {"result": "..."}
@@ -135,7 +135,7 @@ mcp.run(transport="stdio")
**Example Flow:**
```
Client → Server (stdin):
{"jsonrpc": "2.0", "method": "tools/call", "params": {"name": "markdownify", "arguments": {"website_url": "https://example.com"}}, "id": 1}
{"jsonrpc": "2.0", "method": "tools/call", "params": {"name": "scrape", "arguments": {"website_url": "https://example.com"}}, "id": 1}

Server → Client (stdout):
{"jsonrpc": "2.0", "result": {"result": "# Example\n\nMarkdown content..."}, "id": 1}
@@ -151,7 +151,7 @@ MCP uses JSON-RPC 2.0 for message structure:
"jsonrpc": "2.0",
"method": "tools/call",
"params": {
"name": "smartscraper",
"name": "extract",
"arguments": {
"user_prompt": "Extract product names",
"website_url": "https://example.com"
@@ -199,7 +199,7 @@ Response:
"result": {
"tools": [
{
"name": "markdownify",
"name": "scrape",
"description": "Convert a webpage into clean, formatted markdown.",
"inputSchema": {
"type": "object",
@@ -232,12 +232,12 @@ Response:

Each tool exposed by the server has a schema that defines its parameters and return type.

### Example: `markdownify` Tool
### Example: `scrape` Tool

**Python Definition:**
```python
@mcp.tool()
def markdownify(website_url: str) -> Dict[str, Any]:
def scrape(website_url: str) -> Dict[str, Any]:
"""
Convert a webpage into clean, formatted markdown.

@@ -253,7 +253,7 @@ def markdownify(website_url: str) -> Dict[str, Any]:
**Generated MCP Schema:**
```json
{
"name": "markdownify",
"name": "scrape",
"description": "Convert a webpage into clean, formatted markdown.",
"inputSchema": {
"type": "object",
@@ -275,12 +275,12 @@ def markdownify(website_url: str) -> Dict[str, Any]:
- Python `Dict[str, Any]` → JSON Schema `"type": "object"`
- Python `Optional[str]` → JSON Schema `"type": ["string", "null"]`
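The mapping above can be illustrated with a small sketch built on the stdlib `typing` introspection helpers (simplified and incomplete; the real schema generation is handled by the framework, not by code like this):

```python
from typing import Any, Dict, Optional, Union, get_args, get_origin

_PRIMITIVES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def json_type(annotation: Any) -> Any:
    """Simplified Python-annotation -> JSON Schema type mapping."""
    if get_origin(annotation) is Union:
        args = get_args(annotation)
        if type(None) in args:  # Optional[T] becomes ["<T>", "null"]
            inner = next(a for a in args if a is not type(None))
            return [json_type(inner), "null"]
    if get_origin(annotation) is dict or annotation is dict:
        return "object"
    return _PRIMITIVES.get(annotation, "object")

print(json_type(str))            # string
print(json_type(Optional[str]))  # ['string', 'null']
print(json_type(Dict[str, Any])) # object
```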

### Example: `smartscraper` Tool (with optional parameters)
### Example: `extract` Tool (with optional parameters)

**Python Definition:**
```python
@mcp.tool()
def smartscraper(
def extract(
user_prompt: str,
website_url: str,
number_of_scrolls: int = None,
@@ -293,7 +293,7 @@ def smartscraper(
**Generated MCP Schema:**
```json
{
"name": "smartscraper",
"name": "extract",
"description": "Extract structured data from a webpage using AI.",
"inputSchema": {
"type": "object",
@@ -458,8 +458,8 @@ AI: "I wasn't able to convert the webpage because your ScrapeGraphAI account has
User: "What are the main features of ScrapeGraphAI?"

Claude (internal):
1. Determines that markdownify tool could help
2. Calls: markdownify("https://scrapegraphai.com")
1. Determines that scrape tool could help
2. Calls: scrape("https://scrapegraphai.com")
3. Receives markdown content
4. Analyzes content
5. Responds to user
@@ -518,7 +518,7 @@ async def main():

# Call a tool
result = await session.call_tool(
"markdownify",
"scrape",
arguments={"website_url": "https://example.com"}
)
print(f"Result: {result}")
@@ -543,7 +543,7 @@ else:
Currently, the server does not implement tool versioning. All tools are v1 implicitly.

**Future Consideration:**
- Add version to tool names: `smartscraper_v2()`
- Add version to tool names: `extract_v2()`
- Maintain backward compatibility with deprecated tools
- Use MCP metadata for version info

@@ -552,11 +552,11 @@ Currently, the server does not implement tool versioning. All tools are v1 impli
MCP supports streaming results for long-running operations. This could be useful for SmartCrawler:

**Current Approach (polling):**
1. Call `smartcrawler_initiate()` → get `request_id`
2. Repeatedly call `smartcrawler_fetch_results(request_id)` until complete
1. Call `crawl_start()` → get `request_id`
2. Repeatedly call `crawl_get_status(request_id)` until complete

**Potential Streaming Approach:**
1. Call `smartcrawler_initiate()` → server keeps connection open
1. Call `crawl_start()` → server keeps connection open
2. Server streams progress updates: `{"status": "processing", "pages": 10}`
3. Server sends final result: `{"status": "completed", "results": [...]}`

@@ -618,8 +618,8 @@ logging.basicConfig(
logger = logging.getLogger(__name__)

@mcp.tool()
def markdownify(website_url: str) -> Dict[str, Any]:
logger.info(f"markdownify called with URL: {website_url}")
def scrape(website_url: str) -> Dict[str, Any]:
logger.info(f"scrape called with URL: {website_url}")
# ...
```
