Skip to content

feat(tinyfish-web-agent): Rewrite skill as web agent with improved scripts#8

Open
giaphutran12 wants to merge 2 commits intotinyfish-io:mainfrom
giaphutran12:polish-tinyfish-skill
Open

feat(tinyfish-web-agent): Rewrite skill as web agent with improved scripts#8
giaphutran12 wants to merge 2 commits intotinyfish-io:mainfrom
giaphutran12:polish-tinyfish-skill

Conversation

@giaphutran12
Copy link
Copy Markdown

Summary

Rewrites the tinyfish-web-agent skill to better reflect TinyFish's capabilities as a web agent, not just a scraper.

Changes

SKILL.md (rewritten)

  • Web agent framing: 80% of examples now show browser automation (form filling, multi-step workflows, stealth browsing, geo-proxying) instead of just data extraction
  • Sync endpoint as primary: Uses /v1/automation/run (simpler, blocks until done) with SSE as an advanced option
  • Best practices section: Distilled from the prompting guide — JSON schema tips, numbered steps, guardrails, edge case handling
  • Error handling section: HTTP error codes + task-level failure troubleshooting
  • Limitations documented: CAPTCHAs, infinite scroll, session persistence, timeout

scripts/run.sh (new)

  • Uses jq for safe JSON construction (no more string concatenation)
  • Includes api_integration: "openclaw" in all requests
  • --max-time 120 on sync requests to prevent silent timeouts
  • Parses response: prints result on success, error to stderr on failure
  • Proper exit codes (0 success, 1 failure)
  • Flags: --stealth, --proxy COUNTRY, --async

scripts/extract.sh (backward-compat wrapper)

  • Now a thin wrapper that calls run.sh with a deprecation notice
  • Same CLI interface preserved for existing users

LICENSE (new)

  • Apache 2.0

Files Changed

  • skills/tinyfish-web-agent/SKILL.md — rewritten (278 lines, was 151)
  • skills/tinyfish-web-agent/scripts/run.sh — new primary script
  • skills/tinyfish-web-agent/scripts/extract.sh — replaced with wrapper
  • skills/tinyfish-web-agent/LICENSE — new

Testing

  1. Install the skill: openclaw install-skill giaphutran12/skills/skills/tinyfish-web-agent
  2. Set TINYFISH_API_KEY env var
  3. Try: ./scripts/run.sh "https://example.com" 'Extract the page title as JSON: {"title": "string"}'

…ripts

- Reframe from scraping tool to web agent (form filling, multi-step workflows)
- Add run.sh with jq-based JSON, api_integration tracking, error handling
- Keep extract.sh as backward-compat wrapper
- Add Apache 2.0 LICENSE
- Use sync endpoint as primary, SSE as advanced option
- Add best practices, error handling, and limitations sections
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 9, 2026

📝 Walkthrough

Walkthrough

The pull request adds an Apache-2.0 LICENSE, significantly expands SKILL.md with new examples and guidance for browser automation (form filling, multi-step workflows, stealth, proxy/geo, SSE/streaming semantics), replaces extract.sh with a deprecation wrapper delegating to a new scripts/run.sh, and introduces run.sh which validates deps and API key, builds payloads, and supports synchronous (POST /v1/automation/run) and asynchronous streaming (POST /v1/automation/run-sse) execution flows.

Sequence Diagram(s)

sequenceDiagram
    participant User as User/CLI
    participant Script as run.sh
    participant API as TinyFish API
    participant Handler as Response Handler

    User->>Script: Invoke with URL, goal, options (--async, --stealth, --proxy)
    Script->>Script: Check jq dependency
    Script->>Script: Check TINYFISH_API_KEY
    Script->>Script: Parse args & build JSON payload

    alt Async Mode (--async)
        Script->>API: POST /v1/automation/run-sse (payload)
        activate API
        API-->>Script: SSE stream events
        deactivate API
        Script->>User: Stream events to stdout
    else Sync Mode (default)
        Script->>API: POST /v1/automation/run (payload, 120s timeout)
        activate API
        API-->>Script: HTTP response with JSON {status, result?, error?}
        deactivate API
        Script->>Handler: Inspect status
        alt status == "COMPLETED"
            Handler-->>User: Print result (exit 0)
        else status == "FAILED"
            Handler-->>User: Print error (exit 1)
        else
            Handler-->>User: Print unexpected status and body (exit 2)
        end
    end
Loading
🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main changes: rewriting the tinyfish-web-agent skill and improving its scripts.
Description check ✅ Passed The description is well-organized, clearly related to the changeset, and provides specific details about the rewrite and new features.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
skills/tinyfish-web-agent/scripts/run.sh (1)

103-107: Make the sync timeout configurable.

The helper defaults to the sync path, but this hard-caps every run at 120s while SKILL.md later documents a ~5 minute server-side timeout. That means legitimate multi-step runs can fail locally before the service times out. An env var or flag would keep the safe default without baking in the shorter limit.

Suggested refactor
+MAX_TIME="${TINYFISH_MAX_TIME:-120}"
+
 echo "Running..." >&2
 
-HTTP_RESPONSE=$(curl --max-time 120 -s -w "\n%{http_code}" -X POST \
+HTTP_RESPONSE=$(curl --max-time "$MAX_TIME" -s -w "\n%{http_code}" -X POST \
   "https://agent.tinyfish.ai/v1/automation/run" \
   -H "X-API-Key: ${TINYFISH_API_KEY}" \
   -H "Content-Type: application/json" \
   -d "$PAYLOAD") || {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skills/tinyfish-web-agent/scripts/run.sh` around lines 103 - 107, The curl
call that sets HTTP_RESPONSE currently hard-caps the client timeout with
--max-time 120; make this configurable by replacing the literal 120 with an
environment-backed variable (e.g., ${TINYFISH_SYNC_TIMEOUT:-300}) so callers can
override the timeout while keeping a sensible default (300s to match SKILL.md).
Update the curl invocation that assigns HTTP_RESPONSE to use the new variable
(--max-time "${TINYFISH_SYNC_TIMEOUT:-300}") and ensure any callers or README
mention the TINYFISH_SYNC_TIMEOUT env var.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@skills/tinyfish-web-agent/scripts/run.sh`:
- Around line 122-134: The script currently sets STATUS by running jq directly
on HTTP_BODY which can fail under set -euo pipefail for non-JSON responses;
modify the logic around STATUS (the STATUS variable and the parsing step that
pipes HTTP_BODY to jq) to first detect/guard against invalid JSON (or use jq's
safe parsing via try/ or -e) and if parsing fails print the raw HTTP_BODY to
stderr and exit 1, otherwise extract .status and continue with the existing
COMPLETED/FAILED/else branching; ensure references to HTTP_BODY, STATUS, and the
jq parse are the only places changed so the fallback error path always runs for
non-JSON responses.
- Around line 91-96: The curl call inside the ASYNC branch (in run.sh where
ASYNC, PAYLOAD and TINYFISH_API_KEY are used) currently treats non-2xx SSE
responses as success; modify the exec curl invocation to include
--fail-with-body (or -f --fail-with-body) so curl will return a non-zero exit
code on HTTP 4xx/5xx and propagate failure from the SSE endpoint instead of
exiting 0.

In `@skills/tinyfish-web-agent/SKILL.md`:
- Around line 196-202: The fenced code block in SKILL.md is missing a language
tag and triggers markdownlint MD040; update the specific fenced block that lists
the numbered steps (the four-line example starting with "1. Click the Login
button") by adding the language identifier "text" after the opening triple
backticks so the block becomes ```text, which will satisfy the linter and
preserve the example content.

---

Nitpick comments:
In `@skills/tinyfish-web-agent/scripts/run.sh`:
- Around line 103-107: The curl call that sets HTTP_RESPONSE currently hard-caps
the client timeout with --max-time 120; make this configurable by replacing the
literal 120 with an environment-backed variable (e.g.,
${TINYFISH_SYNC_TIMEOUT:-300}) so callers can override the timeout while keeping
a sensible default (300s to match SKILL.md). Update the curl invocation that
assigns HTTP_RESPONSE to use the new variable (--max-time
"${TINYFISH_SYNC_TIMEOUT:-300}") and ensure any callers or README mention the
TINYFISH_SYNC_TIMEOUT env var.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 15ef5f81-6618-4347-825b-5369d92ebe7f

📥 Commits

Reviewing files that changed from the base of the PR and between c997951 and d190151.

📒 Files selected for processing (4)
  • skills/tinyfish-web-agent/LICENSE
  • skills/tinyfish-web-agent/SKILL.md
  • skills/tinyfish-web-agent/scripts/extract.sh
  • skills/tinyfish-web-agent/scripts/run.sh

- Add --fail-with-body and -sS to async curl so HTTP 4xx/5xx errors
  are detected instead of silently exiting 0
- Guard jq parsing with proper error handling for non-JSON responses
  under set -euo pipefail
- Use printf over echo for safer string handling
- Add 'text' language tag to SKILL.md code fence (MD040)
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
skills/tinyfish-web-agent/SKILL.md (1)

244-256: ⚠️ Potential issue | 🟡 Minor

Harden the SSE curl example to surface handshake failures.

This example still uses curl -N -s, so a non-2xx response from /v1/automation/run-sse can look like a successful command. Matching the helper script’s -sS --fail-with-body flags will make the documented one-liner fail correctly too.

Suggested fix
-curl -N -s -X POST "https://agent.tinyfish.ai/v1/automation/run-sse" \
+curl -N -sS --fail-with-body -X POST "https://agent.tinyfish.ai/v1/automation/run-sse" \
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skills/tinyfish-web-agent/SKILL.md` around lines 244 - 256, The curl example
in SKILL.md uses "curl -N -s" which hides HTTP handshake failures; update the
curl flags in the /v1/automation/run-sse example to match the helper script by
using -sS --fail-with-body (and keep -N for streaming), so replace the "-N -s"
invocation with "-N -sS --fail-with-body" in the curl command to surface non-2xx
responses; ensure the change targets the curl one-liner shown in the diff for
the run-sse POST example.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@skills/tinyfish-web-agent/scripts/run.sh`:
- Around line 53-59: In the --proxy) case branch, reject option-like next tokens
by checking that the second argument ($2) is not missing and does not start with
a dash before assigning PROXY_COUNTRY; add a guard such as: if [[ "$2" == -* ]];
then echo "Error: --proxy requires a country code argument" >&2; exit 1; fi so
that option-like values like "--async" are treated as invalid, then proceed to
set PROXY_COUNTRY="$2" and shift 2.
- Around line 103-109: The curl invocation that sets HTTP_RESPONSE uses -s which
hides curl's error diagnostics; update that curl command (the one assigning
HTTP_RESPONSE and posting PAYLOAD to
"https://agent.tinyfish.ai/v1/automation/run" with X-API-Key
${TINYFISH_API_KEY}) to use -sS instead of -s so curl will still silence
progress but preserve stderr error messages for transport/TLS/DNS failures; keep
the rest of the flags and the existing error-handling block unchanged.

---

Duplicate comments:
In `@skills/tinyfish-web-agent/SKILL.md`:
- Around line 244-256: The curl example in SKILL.md uses "curl -N -s" which
hides HTTP handshake failures; update the curl flags in the
/v1/automation/run-sse example to match the helper script by using -sS
--fail-with-body (and keep -N for streaming), so replace the "-N -s" invocation
with "-N -sS --fail-with-body" in the curl command to surface non-2xx responses;
ensure the change targets the curl one-liner shown in the diff for the run-sse
POST example.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8feb44cb-6563-43ab-a5cf-f2e593767915

📥 Commits

Reviewing files that changed from the base of the PR and between d190151 and 0bd3210.

📒 Files selected for processing (2)
  • skills/tinyfish-web-agent/SKILL.md
  • skills/tinyfish-web-agent/scripts/run.sh

Comment on lines +53 to +59
--proxy)
if [ $# -lt 2 ]; then
echo "Error: --proxy requires a country code argument" >&2
exit 1
fi
PROXY_COUNTRY="$2"
shift 2
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Reject option-like values after --proxy.

--proxy --async is currently accepted and sent as "country_code": "--async". Treat a missing or option-like next token as invalid input here so flag order mistakes fail fast locally.

Suggested fix
     --proxy)
-      if [ $# -lt 2 ]; then
+      if [ $# -lt 2 ] || [[ "$2" == --* ]]; then
         echo "Error: --proxy requires a country code argument" >&2
         exit 1
       fi
       PROXY_COUNTRY="$2"
       shift 2
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
--proxy)
if [ $# -lt 2 ]; then
echo "Error: --proxy requires a country code argument" >&2
exit 1
fi
PROXY_COUNTRY="$2"
shift 2
--proxy)
if [ $# -lt 2 ] || [[ "$2" == --* ]]; then
echo "Error: --proxy requires a country code argument" >&2
exit 1
fi
PROXY_COUNTRY="$2"
shift 2
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skills/tinyfish-web-agent/scripts/run.sh` around lines 53 - 59, In the
--proxy) case branch, reject option-like next tokens by checking that the second
argument ($2) is not missing and does not start with a dash before assigning
PROXY_COUNTRY; add a guard such as: if [[ "$2" == -* ]]; then echo "Error:
--proxy requires a country code argument" >&2; exit 1; fi so that option-like
values like "--async" are treated as invalid, then proceed to set
PROXY_COUNTRY="$2" and shift 2.

Comment on lines +103 to +109
HTTP_RESPONSE=$(curl --max-time 120 -s -w "\n%{http_code}" -X POST \
"https://agent.tinyfish.ai/v1/automation/run" \
-H "X-API-Key: ${TINYFISH_API_KEY}" \
-H "Content-Type: application/json" \
-d "$PAYLOAD") || {
echo "Error: curl request failed" >&2
exit 1
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's check if the file exists and read the relevant section
if [ -f "skills/tinyfish-web-agent/scripts/run.sh" ]; then
  echo "=== File found. Reading lines 100-115 for context ==="
  sed -n '100,115p' "skills/tinyfish-web-agent/scripts/run.sh" | cat -n
else
  echo "File not found at that path"
fi

Repository: tinyfish-io/skills

Length of output: 660


🏁 Script executed:

# Let's also check if there are other curl calls in the script
echo "=== All curl commands in the script ==="
rg "curl" "skills/tinyfish-web-agent/scripts/run.sh" -A 3 -n

Repository: tinyfish-io/skills

Length of output: 600


🏁 Script executed:

# Let's verify the total lines in the file
wc -l "skills/tinyfish-web-agent/scripts/run.sh"

Repository: tinyfish-io/skills

Length of output: 106


Use -sS for the sync curl call to preserve error diagnostics.

With -s alone, transport failures (DNS, TLS, connection errors) only hit the generic fallback message without curl's diagnostic output. -sS preserves stderr while keeping the current HTTP body and status code parsing unchanged. Note that line 93 in the same script already uses -sS for consistency.

Suggested fix
-HTTP_RESPONSE=$(curl --max-time 120 -s -w "\n%{http_code}" -X POST \
+HTTP_RESPONSE=$(curl --max-time 120 -sS -w "\n%{http_code}" -X POST \
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
HTTP_RESPONSE=$(curl --max-time 120 -s -w "\n%{http_code}" -X POST \
"https://agent.tinyfish.ai/v1/automation/run" \
-H "X-API-Key: ${TINYFISH_API_KEY}" \
-H "Content-Type: application/json" \
-d "$PAYLOAD") || {
echo "Error: curl request failed" >&2
exit 1
HTTP_RESPONSE=$(curl --max-time 120 -sS -w "\n%{http_code}" -X POST \
"https://agent.tinyfish.ai/v1/automation/run" \
-H "X-API-Key: ${TINYFISH_API_KEY}" \
-H "Content-Type: application/json" \
-d "$PAYLOAD") || {
echo "Error: curl request failed" >&2
exit 1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skills/tinyfish-web-agent/scripts/run.sh` around lines 103 - 109, The curl
invocation that sets HTTP_RESPONSE uses -s which hides curl's error diagnostics;
update that curl command (the one assigning HTTP_RESPONSE and posting PAYLOAD to
"https://agent.tinyfish.ai/v1/automation/run" with X-API-Key
${TINYFISH_API_KEY}) to use -sS instead of -s so curl will still silence
progress but preserve stderr error messages for transport/TLS/DNS failures; keep
the rest of the flags and the existing error-handling block unchanged.

@KrishnaAgarwal7531
Copy link
Copy Markdown

Fix 1 — scripts/run.sh — Async curl error detection
Find this line (around line 93):
bashexec curl -N -s -X POST "https://agent.tinyfish.ai/v1/automation/run-sse"
Replace with:
bashexec curl -N -sS --fail-with-body -X POST "https://agent.tinyfish.ai/v1/automation/run-sse" \

Fix 2 — scripts/run.sh — Guard JSON parsing
Find this line (around line 122):
bashSTATUS=$(echo "$HTTP_BODY" | jq -r '.status // empty')
Replace with:
bashif ! STATUS=$(printf '%s' "$HTTP_BODY" | jq -er '.status' 2>/dev/null); then
echo "Error: Response was not valid JSON with a status field" >&2
echo "$HTTP_BODY" >&2
exit 1
fi
Then find this line a few lines below:
bash echo "$HTTP_BODY" | jq '.result'
Replace with:
bash printf '%s' "$HTTP_BODY" | jq '.result'
Then find this line:
bash echo "$HTTP_BODY" | jq '.error' >&2
Replace with:
bash printf '%s' "$HTTP_BODY" | jq '.error' >&2

Fix 3 — SKILL.md — Markdown linter fix
Find this (around line 196):

1. Click the Login button
2. Enter username "testuser" and password "testpass"
3. Navigate to Settings > Billing
4. Extract the current plan name

Replace with:

1. Click the Login button
2. Enter username "testuser" and password "testpass"
3. Navigate to Settings > Billing
4. Extract the current plan name

@KrishnaAgarwal7531
Copy link
Copy Markdown

These are fix suggestions, please take a look

@giaphutran12
Copy link
Copy Markdown
Author

All 3 fixes applied and pushed → 0bd3210. Thanks Krishna!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants