feat(tinyfish-web-agent): Rewrite skill as web agent with improved scripts#8
feat(tinyfish-web-agent): Rewrite skill as web agent with improved scripts#8giaphutran12 wants to merge 2 commits intotinyfish-io:mainfrom
Conversation
…ripts - Reframe from scraping tool to web agent (form filling, multi-step workflows) - Add run.sh with jq-based JSON, api_integration tracking, error handling - Keep extract.sh as backward-compat wrapper - Add Apache 2.0 LICENSE - Use sync endpoint as primary, SSE as advanced option - Add best practices, error handling, and limitations sections
📝 WalkthroughWalkthroughThe pull request adds an Apache-2.0 LICENSE, significantly expands SKILL.md with new examples and guidance for browser automation (form filling, multi-step workflows, stealth, proxy/geo, SSE/streaming semantics), replaces extract.sh with a deprecation wrapper delegating to a new scripts/run.sh, and introduces run.sh which validates deps and API key, builds payloads, and supports synchronous (POST /v1/automation/run) and asynchronous streaming (POST /v1/automation/run-sse) execution flows. Sequence Diagram(s)sequenceDiagram
participant User as User/CLI
participant Script as run.sh
participant API as TinyFish API
participant Handler as Response Handler
User->>Script: Invoke with URL, goal, options (--async, --stealth, --proxy)
Script->>Script: Check jq dependency
Script->>Script: Check TINYFISH_API_KEY
Script->>Script: Parse args & build JSON payload
alt Async Mode (--async)
Script->>API: POST /v1/automation/run-sse (payload)
activate API
API-->>Script: SSE stream events
deactivate API
Script->>User: Stream events to stdout
else Sync Mode (default)
Script->>API: POST /v1/automation/run (payload, 120s timeout)
activate API
API-->>Script: HTTP response with JSON {status, result?, error?}
deactivate API
Script->>Handler: Inspect status
alt status == "COMPLETED"
Handler-->>User: Print result (exit 0)
else status == "FAILED"
Handler-->>User: Print error (exit 1)
else
Handler-->>User: Print unexpected status and body (exit 2)
end
end
🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
skills/tinyfish-web-agent/scripts/run.sh (1)
103-107: Make the sync timeout configurable.The helper defaults to the sync path, but this hard-caps every run at 120s while
SKILL.mdlater documents a ~5 minute server-side timeout. That means legitimate multi-step runs can fail locally before the service times out. An env var or flag would keep the safe default without baking in the shorter limit.Suggested refactor
+MAX_TIME="${TINYFISH_MAX_TIME:-120}" + echo "Running..." >&2 -HTTP_RESPONSE=$(curl --max-time 120 -s -w "\n%{http_code}" -X POST \ +HTTP_RESPONSE=$(curl --max-time "$MAX_TIME" -s -w "\n%{http_code}" -X POST \ "https://agent.tinyfish.ai/v1/automation/run" \ -H "X-API-Key: ${TINYFISH_API_KEY}" \ -H "Content-Type: application/json" \ -d "$PAYLOAD") || {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@skills/tinyfish-web-agent/scripts/run.sh` around lines 103 - 107, The curl call that sets HTTP_RESPONSE currently hard-caps the client timeout with --max-time 120; make this configurable by replacing the literal 120 with an environment-backed variable (e.g., ${TINYFISH_SYNC_TIMEOUT:-300}) so callers can override the timeout while keeping a sensible default (300s to match SKILL.md). Update the curl invocation that assigns HTTP_RESPONSE to use the new variable (--max-time "${TINYFISH_SYNC_TIMEOUT:-300}") and ensure any callers or README mention the TINYFISH_SYNC_TIMEOUT env var.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@skills/tinyfish-web-agent/scripts/run.sh`:
- Around line 122-134: The script currently sets STATUS by running jq directly
on HTTP_BODY which can fail under set -euo pipefail for non-JSON responses;
modify the logic around STATUS (the STATUS variable and the parsing step that
pipes HTTP_BODY to jq) to first detect/guard against invalid JSON (or use jq's
safe parsing via try/ or -e) and if parsing fails print the raw HTTP_BODY to
stderr and exit 1, otherwise extract .status and continue with the existing
COMPLETED/FAILED/else branching; ensure references to HTTP_BODY, STATUS, and the
jq parse are the only places changed so the fallback error path always runs for
non-JSON responses.
- Around line 91-96: The curl call inside the ASYNC branch (in run.sh where
ASYNC, PAYLOAD and TINYFISH_API_KEY are used) currently treats non-2xx SSE
responses as success; modify the exec curl invocation to include
--fail-with-body (or -f --fail-with-body) so curl will return a non-zero exit
code on HTTP 4xx/5xx and propagate failure from the SSE endpoint instead of
exiting 0.
In `@skills/tinyfish-web-agent/SKILL.md`:
- Around line 196-202: The fenced code block in SKILL.md is missing a language
tag and triggers markdownlint MD040; update the specific fenced block that lists
the numbered steps (the four-line example starting with "1. Click the Login
button") by adding the language identifier "text" after the opening triple
backticks so the block becomes ```text, which will satisfy the linter and
preserve the example content.
---
Nitpick comments:
In `@skills/tinyfish-web-agent/scripts/run.sh`:
- Around line 103-107: The curl call that sets HTTP_RESPONSE currently hard-caps
the client timeout with --max-time 120; make this configurable by replacing the
literal 120 with an environment-backed variable (e.g.,
${TINYFISH_SYNC_TIMEOUT:-300}) so callers can override the timeout while keeping
a sensible default (300s to match SKILL.md). Update the curl invocation that
assigns HTTP_RESPONSE to use the new variable (--max-time
"${TINYFISH_SYNC_TIMEOUT:-300}") and ensure any callers or README mention the
TINYFISH_SYNC_TIMEOUT env var.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 15ef5f81-6618-4347-825b-5369d92ebe7f
📒 Files selected for processing (4)
skills/tinyfish-web-agent/LICENSEskills/tinyfish-web-agent/SKILL.mdskills/tinyfish-web-agent/scripts/extract.shskills/tinyfish-web-agent/scripts/run.sh
- Add --fail-with-body and -sS to async curl so HTTP 4xx/5xx errors are detected instead of silently exiting 0 - Guard jq parsing with proper error handling for non-JSON responses under set -euo pipefail - Use printf over echo for safer string handling - Add 'text' language tag to SKILL.md code fence (MD040)
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
skills/tinyfish-web-agent/SKILL.md (1)
244-256:⚠️ Potential issue | 🟡 MinorHarden the SSE curl example to surface handshake failures.
This example still uses
curl -N -s, so a non-2xx response from/v1/automation/run-ssecan look like a successful command. Matching the helper script’s-sS --fail-with-bodyflags will make the documented one-liner fail correctly too.Suggested fix
-curl -N -s -X POST "https://agent.tinyfish.ai/v1/automation/run-sse" \ +curl -N -sS --fail-with-body -X POST "https://agent.tinyfish.ai/v1/automation/run-sse" \🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@skills/tinyfish-web-agent/SKILL.md` around lines 244 - 256, The curl example in SKILL.md uses "curl -N -s" which hides HTTP handshake failures; update the curl flags in the /v1/automation/run-sse example to match the helper script by using -sS --fail-with-body (and keep -N for streaming), so replace the "-N -s" invocation with "-N -sS --fail-with-body" in the curl command to surface non-2xx responses; ensure the change targets the curl one-liner shown in the diff for the run-sse POST example.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@skills/tinyfish-web-agent/scripts/run.sh`:
- Around line 53-59: In the --proxy) case branch, reject option-like next tokens
by checking that the second argument ($2) is not missing and does not start with
a dash before assigning PROXY_COUNTRY; add a guard such as: if [[ "$2" == -* ]];
then echo "Error: --proxy requires a country code argument" >&2; exit 1; fi so
that option-like values like "--async" are treated as invalid, then proceed to
set PROXY_COUNTRY="$2" and shift 2.
- Around line 103-109: The curl invocation that sets HTTP_RESPONSE uses -s which
hides curl's error diagnostics; update that curl command (the one assigning
HTTP_RESPONSE and posting PAYLOAD to
"https://agent.tinyfish.ai/v1/automation/run" with X-API-Key
${TINYFISH_API_KEY}) to use -sS instead of -s so curl will still silence
progress but preserve stderr error messages for transport/TLS/DNS failures; keep
the rest of the flags and the existing error-handling block unchanged.
---
Duplicate comments:
In `@skills/tinyfish-web-agent/SKILL.md`:
- Around line 244-256: The curl example in SKILL.md uses "curl -N -s" which
hides HTTP handshake failures; update the curl flags in the
/v1/automation/run-sse example to match the helper script by using -sS
--fail-with-body (and keep -N for streaming), so replace the "-N -s" invocation
with "-N -sS --fail-with-body" in the curl command to surface non-2xx responses;
ensure the change targets the curl one-liner shown in the diff for the run-sse
POST example.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8feb44cb-6563-43ab-a5cf-f2e593767915
📒 Files selected for processing (2)
skills/tinyfish-web-agent/SKILL.mdskills/tinyfish-web-agent/scripts/run.sh
| --proxy) | ||
| if [ $# -lt 2 ]; then | ||
| echo "Error: --proxy requires a country code argument" >&2 | ||
| exit 1 | ||
| fi | ||
| PROXY_COUNTRY="$2" | ||
| shift 2 |
There was a problem hiding this comment.
Reject option-like values after --proxy.
--proxy --async is currently accepted and sent as "country_code": "--async". Treat a missing or option-like next token as invalid input here so flag order mistakes fail fast locally.
Suggested fix
--proxy)
- if [ $# -lt 2 ]; then
+ if [ $# -lt 2 ] || [[ "$2" == --* ]]; then
echo "Error: --proxy requires a country code argument" >&2
exit 1
fi
PROXY_COUNTRY="$2"
shift 2📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| --proxy) | |
| if [ $# -lt 2 ]; then | |
| echo "Error: --proxy requires a country code argument" >&2 | |
| exit 1 | |
| fi | |
| PROXY_COUNTRY="$2" | |
| shift 2 | |
| --proxy) | |
| if [ $# -lt 2 ] || [[ "$2" == --* ]]; then | |
| echo "Error: --proxy requires a country code argument" >&2 | |
| exit 1 | |
| fi | |
| PROXY_COUNTRY="$2" | |
| shift 2 |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@skills/tinyfish-web-agent/scripts/run.sh` around lines 53 - 59, In the
--proxy) case branch, reject option-like next tokens by checking that the second
argument ($2) is not missing and does not start with a dash before assigning
PROXY_COUNTRY; add a guard such as: if [[ "$2" == -* ]]; then echo "Error:
--proxy requires a country code argument" >&2; exit 1; fi so that option-like
values like "--async" are treated as invalid, then proceed to set
PROXY_COUNTRY="$2" and shift 2.
| HTTP_RESPONSE=$(curl --max-time 120 -s -w "\n%{http_code}" -X POST \ | ||
| "https://agent.tinyfish.ai/v1/automation/run" \ | ||
| -H "X-API-Key: ${TINYFISH_API_KEY}" \ | ||
| -H "Content-Type: application/json" \ | ||
| -d "$PAYLOAD") || { | ||
| echo "Error: curl request failed" >&2 | ||
| exit 1 |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's check if the file exists and read the relevant section
if [ -f "skills/tinyfish-web-agent/scripts/run.sh" ]; then
echo "=== File found. Reading lines 100-115 for context ==="
sed -n '100,115p' "skills/tinyfish-web-agent/scripts/run.sh" | cat -n
else
echo "File not found at that path"
fiRepository: tinyfish-io/skills
Length of output: 660
🏁 Script executed:
# Let's also check if there are other curl calls in the script
echo "=== All curl commands in the script ==="
rg "curl" "skills/tinyfish-web-agent/scripts/run.sh" -A 3 -nRepository: tinyfish-io/skills
Length of output: 600
🏁 Script executed:
# Let's verify the total lines in the file
wc -l "skills/tinyfish-web-agent/scripts/run.sh"Repository: tinyfish-io/skills
Length of output: 106
Use -sS for the sync curl call to preserve error diagnostics.
With -s alone, transport failures (DNS, TLS, connection errors) only hit the generic fallback message without curl's diagnostic output. -sS preserves stderr while keeping the current HTTP body and status code parsing unchanged. Note that line 93 in the same script already uses -sS for consistency.
Suggested fix
-HTTP_RESPONSE=$(curl --max-time 120 -s -w "\n%{http_code}" -X POST \
+HTTP_RESPONSE=$(curl --max-time 120 -sS -w "\n%{http_code}" -X POST \📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| HTTP_RESPONSE=$(curl --max-time 120 -s -w "\n%{http_code}" -X POST \ | |
| "https://agent.tinyfish.ai/v1/automation/run" \ | |
| -H "X-API-Key: ${TINYFISH_API_KEY}" \ | |
| -H "Content-Type: application/json" \ | |
| -d "$PAYLOAD") || { | |
| echo "Error: curl request failed" >&2 | |
| exit 1 | |
| HTTP_RESPONSE=$(curl --max-time 120 -sS -w "\n%{http_code}" -X POST \ | |
| "https://agent.tinyfish.ai/v1/automation/run" \ | |
| -H "X-API-Key: ${TINYFISH_API_KEY}" \ | |
| -H "Content-Type: application/json" \ | |
| -d "$PAYLOAD") || { | |
| echo "Error: curl request failed" >&2 | |
| exit 1 |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@skills/tinyfish-web-agent/scripts/run.sh` around lines 103 - 109, The curl
invocation that sets HTTP_RESPONSE uses -s which hides curl's error diagnostics;
update that curl command (the one assigning HTTP_RESPONSE and posting PAYLOAD to
"https://agent.tinyfish.ai/v1/automation/run" with X-API-Key
${TINYFISH_API_KEY}) to use -sS instead of -s so curl will still silence
progress but preserve stderr error messages for transport/TLS/DNS failures; keep
the rest of the flags and the existing error-handling block unchanged.
|
Fix 1 — scripts/run.sh — Async curl error detection Fix 2 — scripts/run.sh — Guard JSON parsing Fix 3 — SKILL.md — Markdown linter fix Replace with: |
|
These are fix suggestions, please take a look |
|
All 3 fixes applied and pushed → |
Summary
Rewrites the tinyfish-web-agent skill to better reflect TinyFish's capabilities as a web agent, not just a scraper.
Changes
SKILL.md (rewritten)
/v1/automation/run(simpler, blocks until done) with SSE as an advanced optionscripts/run.sh (new)
jqfor safe JSON construction (no more string concatenation)api_integration: "openclaw"in all requests--max-time 120on sync requests to prevent silent timeoutsresulton success, error to stderr on failure--stealth,--proxy COUNTRY,--asyncscripts/extract.sh (backward-compat wrapper)
run.shwith a deprecation noticeLICENSE (new)
Files Changed
skills/tinyfish-web-agent/SKILL.md— rewritten (278 lines, was 151)skills/tinyfish-web-agent/scripts/run.sh— new primary scriptskills/tinyfish-web-agent/scripts/extract.sh— replaced with wrapperskills/tinyfish-web-agent/LICENSE— newTesting
openclaw install-skill giaphutran12/skills/skills/tinyfish-web-agentTINYFISH_API_KEYenv var./scripts/run.sh "https://example.com" 'Extract the page title as JSON: {"title": "string"}'