From 3ae7796ff2670f0faaba0114b181d10ba5791d5e Mon Sep 17 00:00:00 2001
From: Marco Vinciguerra
Date: Mon, 27 Apr 2026 13:14:56 +0200
Subject: [PATCH 1/2] docs: expand README sections and update API base URL

Flesh out per-command sections (Scrape, Extract, Search, Crawl, Monitor,
History) with short descriptions and example invocations instead of bare
docs links. Update the documented SGAI_API_URL default in README and
SKILL.md to https://v2-api.scrapegraphai.com.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 README.md                   | 68 ++++++++++++++++++++++++++++++++++++-
 skills/just-scrape/SKILL.md |  2 +-
 2 files changed, 68 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 214138b..e73a15e 100644
--- a/README.md
+++ b/README.md
@@ -79,7 +79,7 @@ Four ways to provide it:
 | Variable       | Description           | Default                                |
 | -------------- | --------------------- | -------------------------------------- |
 | `SGAI_API_KEY` | ScrapeGraph API key   | —                                      |
-| `SGAI_API_URL` | Override API base URL | `https://api.scrapegraphai.com/api/v2` |
+| `SGAI_API_URL` | Override API base URL | `https://v2-api.scrapegraphai.com`     |
 | `SGAI_TIMEOUT` | Timeout (seconds)     | `120`                                  |
 | `SGAI_DEBUG`   | Debug logs            | `0`                                    |
@@ -95,36 +95,102 @@ just-scrape history scrape --json | jq '.[].id'
 
 ## Scrape
 
+Fetch a URL and return one or more formats: `markdown`, `html`, `screenshot`, `branding`, `links`, `images`, `summary`, or `json` (AI extraction). Default: `markdown`.
+
+```bash
+just-scrape scrape https://example.com
+just-scrape scrape https://example.com -f markdown,links,images
+just-scrape scrape https://example.com -f json -p "Extract all products"
+just-scrape scrape https://app.example.com --mode js --stealth --scrolls 5
+```
+
 [docs](https://docs.scrapegraphai.com/api-reference/scrape?utm_source=skill&utm_medium=readme&utm_campaign=skill)
 
 ## Extract
 
+Extract structured JSON from a known URL with AI. A dedicated endpoint optimized for extraction; equivalent to `scrape -f json` but tuned for that path.
+
+```bash
+just-scrape extract https://store.example.com -p "Extract product names and prices"
+just-scrape extract https://news.example.com -p "Get headlines and dates" \
+  --schema '{"type":"object","properties":{"articles":{"type":"array"}}}'
+just-scrape extract https://app.example.com -p "Extract user stats" \
+  --cookies '{"session":"abc123"}' --stealth
+```
+
 [docs](https://docs.scrapegraphai.com/api-reference/extract?utm_source=skill&utm_medium=readme&utm_campaign=skill)
 
 ## Search
 
+Search the web and optionally extract structured data from the results.
+
+```bash
+just-scrape search "Best Python web frameworks in 2026" --num-results 10
+just-scrape search "Top 5 cloud providers pricing" \
+  -p "Extract provider name and free-tier details"
+just-scrape search "AI regulation EU" --time-range past_week --country eu
+```
+
 [docs](https://docs.scrapegraphai.com/api-reference/search?utm_source=skill&utm_medium=readme&utm_campaign=skill)
 
 ## Crawl
 
+Crawl multiple pages from a starting URL. Returns a job that's polled until completion.
+
+```bash
+just-scrape crawl https://docs.example.com --max-pages 50 --max-depth 3
+just-scrape crawl https://example.com \
+  --include-patterns '["^https://example\\.com/blog/.*"]' \
+  --exclude-patterns '[".*\\.pdf$"]'
+just-scrape crawl https://example.com -f markdown,links,images --max-pages 20
+```
+
 [docs](https://docs.scrapegraphai.com/api-reference/crawl?utm_source=skill&utm_medium=readme&utm_campaign=skill)
 
 ## Monitor
 
+Schedule a page to be re-scraped on a cron interval and (optionally) post diffs to a webhook. Actions: `create`, `list`, `get`, `update`, `pause`, `resume`, `delete`, `activity`.
+
+```bash
+just-scrape monitor create \
+  --url https://store.example.com/pricing \
+  --interval 1h \
+  --webhook-url https://hooks.example.com/pricing
+just-scrape monitor list
+just-scrape monitor activity --id mon_abc123 --limit 50
+just-scrape monitor pause --id mon_abc123
+```
+
+`--interval` accepts a cron expression (`0 * * * *`) or shorthand (`1h`, `30m`, `1d`).
+
 [docs](https://docs.scrapegraphai.com/api-reference/monitor?utm_source=skill&utm_medium=readme&utm_campaign=skill)
 
 ## History
 
+Browse past requests. Interactive by default (arrow keys); pass an ID to view a specific request. Services: `scrape`, `extract`, `search`, `crawl`, `monitor`.
+
+```bash
+just-scrape history # all services, interactive
+just-scrape history extract
+just-scrape history scrape req_abc123 --json
+just-scrape history crawl --json --page-size 100 | jq '.[] | {id, status}'
+```
+
 [docs](https://docs.scrapegraphai.com/api-reference/history?utm_source=skill&utm_medium=readme&utm_campaign=skill)
 
 ## Credits
 
+Check your remaining credit balance.
+
 ```bash id="m6c9tb"
 just-scrape credits
+just-scrape credits --json | jq '.remaining'
 ```
 
 ## Validate
 
+Health-check the API and validate your key.
+
 ```bash id="c2a2f9"
 just-scrape validate
 ```
diff --git a/skills/just-scrape/SKILL.md b/skills/just-scrape/SKILL.md
index fa3b81e..ce5f451 100644
--- a/skills/just-scrape/SKILL.md
+++ b/skills/just-scrape/SKILL.md
@@ -290,7 +290,7 @@ just-scrape scrape https://protected.example.com --mode js --stealth
 | Variable | Description | Default |
 |---|---|---|
 | `SGAI_API_KEY` | ScrapeGraph API key | — |
-| `SGAI_API_URL` | Override API base URL | `https://api.scrapegraphai.com/api/v2` |
+| `SGAI_API_URL` | Override API base URL | `https://v2-api.scrapegraphai.com` |
 | `SGAI_TIMEOUT` | Request timeout (seconds) | `120` |
 | `SGAI_DEBUG` | Debug logging to stderr (`1` to enable) | `0` |

From 10b5b6e09e2c086318e2c91b7b56061890ed2e68 Mon Sep 17 00:00:00 2001
From: Marco Vinciguerra
Date: Mon, 27 Apr 2026 13:16:54 +0200
Subject: [PATCH 2/2] docs: remove broken per-section docs links from README

The docs.scrapegraphai.com/api-reference/{scrape,extract,search,crawl,
monitor,history} URLs do not exist. Drop them from each command section
in the README.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 README.md | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/README.md b/README.md
index e73a15e..3518a22 100644
--- a/README.md
+++ b/README.md
@@ -104,8 +104,6 @@ just-scrape scrape https://example.com -f json -p "Extract all products"
 just-scrape scrape https://app.example.com --mode js --stealth --scrolls 5
 ```
 
-[docs](https://docs.scrapegraphai.com/api-reference/scrape?utm_source=skill&utm_medium=readme&utm_campaign=skill)
-
 ## Extract
 
 Extract structured JSON from a known URL with AI. A dedicated endpoint optimized for extraction; equivalent to `scrape -f json` but tuned for that path.
@@ -118,8 +116,6 @@ just-scrape extract https://news.example.com -p "Get headlines and dates" \
   --schema '{"type":"object","properties":{"articles":{"type":"array"}}}'
 just-scrape extract https://app.example.com -p "Extract user stats" \
   --cookies '{"session":"abc123"}' --stealth
 ```
 
-[docs](https://docs.scrapegraphai.com/api-reference/extract?utm_source=skill&utm_medium=readme&utm_campaign=skill)
-
 ## Search
 
 Search the web and optionally extract structured data from the results.
@@ -131,8 +127,6 @@ just-scrape search "Top 5 cloud providers pricing" \
   -p "Extract provider name and free-tier details"
 just-scrape search "AI regulation EU" --time-range past_week --country eu
 ```
 
-[docs](https://docs.scrapegraphai.com/api-reference/search?utm_source=skill&utm_medium=readme&utm_campaign=skill)
-
 ## Crawl
 
 Crawl multiple pages from a starting URL. Returns a job that's polled until completion.
@@ -145,8 +139,6 @@ just-scrape crawl https://example.com \
   --include-patterns '["^https://example\\.com/blog/.*"]' \
   --exclude-patterns '[".*\\.pdf$"]'
 just-scrape crawl https://example.com -f markdown,links,images --max-pages 20
 ```
 
-[docs](https://docs.scrapegraphai.com/api-reference/crawl?utm_source=skill&utm_medium=readme&utm_campaign=skill)
-
 ## Monitor
 
 Schedule a page to be re-scraped on a cron interval and (optionally) post diffs to a webhook. Actions: `create`, `list`, `get`, `update`, `pause`, `resume`, `delete`, `activity`.
@@ -163,8 +155,6 @@ just-scrape monitor pause --id mon_abc123
 
 `--interval` accepts a cron expression (`0 * * * *`) or shorthand (`1h`, `30m`, `1d`).
 
-[docs](https://docs.scrapegraphai.com/api-reference/monitor?utm_source=skill&utm_medium=readme&utm_campaign=skill)
-
 ## History
 
 Browse past requests. Interactive by default (arrow keys); pass an ID to view a specific request. Services: `scrape`, `extract`, `search`, `crawl`, `monitor`.
@@ -176,8 +166,6 @@ just-scrape history scrape req_abc123 --json
 just-scrape history crawl --json --page-size 100 | jq '.[] | {id, status}'
 ```
 
-[docs](https://docs.scrapegraphai.com/api-reference/history?utm_source=skill&utm_medium=readme&utm_campaign=skill)
-
 ## Credits
 
 Check your remaining credit balance.