Turn share links into desktop briefing reports.
English | 简体中文
openclaw-content-processor is an OpenClaw skill and standalone CLI tool that takes one or more share links, extracts the useful content, and saves a local report as report.md + report.json.
If you want another OpenClaw agent to install and bootstrap this skill for you, copy this prompt:
```text
Install this OpenClaw skill from GitHub and make it ready to use:
https://github.com/jjjojoj/openclaw-content-processor.git

After installing:
1. Run the required bootstrap/setup steps.
2. Check whether dependencies such as ffmpeg and whisper-cli are available.
3. Tell me the exact command or usage prompt I can use right away to process links.

If the skill list does not refresh immediately, restart OpenClaw once.
```
It is designed for:
- GitHub repositories
- regular article pages
- dynamic pages such as WeChat / Zhihu / CSDN / Toutiao
- video and social links such as Bilibili, Xiaohongshu, Weibo, X/Twitter, Douyin, and YouTube
Most link summarizers either stay inside chat or only handle one platform well. This project is opinionated in a different way:
- local-first: always write a report to disk first
- multi-source: accept one or many links in one run
- layered fallback: use different extractors for GitHub, static web, dynamic pages, and media
- automation-friendly: emit both Markdown and structured JSON
Current stable release: v2.3.0
Stable-release validation last refreshed on 2026-03-26:
| Platform | Status | Notes |
|---|---|---|
| GitHub | Stable | Uses GitHub API + README extraction |
| Generic web pages | Stable | Main path uses trafilatura |
| WeChat | Stable | Usually succeeds via Scrapling |
| Zhihu / CSDN | Stable | Real links verified |
| Toutiao | Usually works | Depends on page structure and anti-bot behavior |
| Bilibili | Usually works | Subtitles first, then whisper-cli fallback |
| Xiaohongshu | Usually works | May need media transcription |
| X/Twitter | Mixed | Public video posts can work, but quality depends on transcription |
| Weibo | Mixed | Short noisy videos may become metadata-only partial |
| Douyin / YouTube | Supported | Paths implemented; use real links to verify your scenario |
The current stable release is backed by two validation layers:
- installation validation:

  ```bash
  bash scripts/bootstrap.sh --install-python
  bash scripts/bootstrap.sh
  .venv/bin/python -m py_compile ...
  .venv/bin/python -m unittest discover -s tests -v
  ```

- live-link validation: public GitHub, Zhihu, CSDN, Toutiao, Bilibili, WeChat, Xiaohongshu, X/Twitter, and Weibo samples were checked on 2026-03-26
See docs/release-validation.md for the latest release checklist, command set, and platform notes.
| Capability | What it does |
|---|---|
| GitHub extractor | Pulls repo metadata, topics, stars, default branch, and README |
| Web extractor | Uses trafilatura for article-style pages |
| Dynamic-page fallback | Uses Scrapling for harder pages |
| Media pipeline | Uses yt-dlp subtitles first, then ffmpeg + whisper-cli |
| Local analysis | Produces summary, highlights, keywords, and analysis text |
| Structured output | Saves report.md, report.json, and per-item JSON files |
| Batch-safe execution | One bad source does not kill the whole run |
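The batch-safe behavior in the last row can be sketched roughly as follows. This is an illustrative sketch, not the project's actual code; `extract` and `fake_extract` are hypothetical stand-ins for the real per-source extractors:

```python
# Sketch of batch-safe execution: one failing source must not abort the run.
# `extract` is a hypothetical callable standing in for the real extractors.
def process_batch(sources, extract):
    items = []
    for url in sources:
        try:
            items.append({"source": url, "status": "success", "data": extract(url)})
        except Exception as exc:  # isolate the failure to this single item
            items.append({"source": url, "status": "failed", "error": str(exc)})
    return items

def fake_extract(url):
    # Hypothetical extractor that fails for one source.
    if "bad" in url:
        raise ValueError("unreachable")
    return {"title": url}

results = process_batch(["https://ok.example", "https://bad.example"], fake_extract)
```

Here the second source fails, but the run still returns a result for every item, which is what lets the report record per-item success/partial/failed counts.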
macOS:
```bash
brew install ffmpeg whisper-cpp
```

Then bootstrap the local Python runtime:

```bash
bash scripts/bootstrap.sh --install-python
```

This installs the skill-local runtime into `.venv/`, including:

- yt-dlp
- trafilatura
- Scrapling
Process a single link:

```bash
bash scripts/run.sh "https://github.com/openai/openai-python"
```

Or let it also check system dependencies:

```bash
bash scripts/run.sh --auto-bootstrap "https://github.com/openai/openai-python"
```

Process several links in one run:

```bash
bash scripts/run.sh \
  "https://github.com/openai/openai-python" \
  "https://mp.weixin.qq.com/s/xxxxxxxx"
```

With a custom title and explicit sources:

```bash
bash scripts/run.sh \
  --title "Today's Link Briefing" \
  --source "https://x.com/..." \
  --source "https://video.weibo.com/show?fid=..."
```

With browser cookies and a referer for harder pages:

```bash
bash scripts/run.sh \
  --cookies-from-browser chrome \
  --referer "https://mp.weixin.qq.com/" \
  --source "https://mp.weixin.qq.com/s/xxxxxxxx"
```

Run the live-link regression preset:

```bash
python scripts/run_regression.py --preset core
```

Default output root:
```
~/Desktop/内容摘要/YYYY-MM-DD/<timestamp>/
```

Each run produces:

```
report.md
report.json
items/
  source-1.json
  source-2.json
```
report.json includes:
- overall run status
- counts for success / partial / failed items
- tool and analysis metadata
- per-item summaries, warnings, extract methods, and content stats
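A minimal consumer for that file might look like this. The field names follow the sample CLI response shown in this README; treat any field not shown there as an assumption:

```python
import json

# Sketch: read a report.json and print a one-line summary.
# Field names (status, *_count) follow the sample response in this README
# and are otherwise assumptions about the schema.
def summarize(report_path):
    with open(report_path, encoding="utf-8") as f:
        report = json.load(f)
    return (f"{report.get('status', 'unknown')}: "
            f"{report.get('success_count', 0)} ok / "
            f"{report.get('partial_count', 0)} partial / "
            f"{report.get('failed_count', 0)} failed")
```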
Typical CLI response:

```json
{
  "schema_version": "1.0.0",
  "status": "success",
  "report_title": "GitHub validation",
  "output_dir": "/Users/you/Desktop/内容摘要/2026-03-26/20260326_024343_GitHub验证",
  "report_md": "/Users/you/Desktop/内容摘要/2026-03-26/20260326_024343_GitHub验证/report.md",
  "report_json": "/Users/you/Desktop/内容摘要/2026-03-26/20260326_024343_GitHub验证/report.json",
  "item_count": 1,
  "success_count": 1,
  "partial_count": 0,
  "failed_count": 0
}
```

Different sources use different pipelines on purpose:
- GitHub repositories: GitHub API + README
- Regular web pages: trafilatura
- Dynamic / anti-bot pages: Scrapling
- Media links: yt-dlp subtitles first
- No usable subtitles: ffmpeg + whisper-cli
- Analysis layer: OpenAI-compatible responses first, then local heuristic fallback
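That per-source routing can be sketched roughly as below. The hostname lists and pipeline names here are illustrative assumptions, not the project's actual dispatch table:

```python
from urllib.parse import urlparse

# Illustrative routing sketch: hostnames and pipeline names are assumptions.
MEDIA_HOSTS = ("bilibili.com", "xiaohongshu.com", "weibo.com",
               "x.com", "twitter.com", "douyin.com", "youtube.com")

def pick_pipeline(url: str) -> str:
    host = urlparse(url).hostname or ""
    if host.endswith("github.com"):
        return "github_api"     # GitHub API + README extraction
    if any(host.endswith(h) for h in MEDIA_HOSTS):
        return "media"          # yt-dlp subtitles, then ffmpeg + whisper-cli
    return "web"                # trafilatura, with Scrapling as fallback
```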
See .env.example for the full list.
Most useful variables:

- `OPENAI_API_KEY`
- `OPENAI_BASE_URL`
- `CONTENT_PROCESSOR_ANALYSIS_MODE`
- `CONTENT_PROCESSOR_ANALYSIS_MODEL`
- `CONTENT_PROCESSOR_COOKIES_FILE`
- `CONTENT_PROCESSOR_COOKIES_FROM_BROWSER`
- `CONTENT_PROCESSOR_COOKIE_HEADER`
- `CONTENT_PROCESSOR_REFERER`
- `WHISPER_MODEL`
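A minimal `.env` sketch wiring some of these together; all values below are placeholders or assumptions, so check `.env.example` for the authoritative defaults:

```
# Analysis backend (OpenAI-compatible endpoint); values are placeholders
OPENAI_API_KEY=sk-...
OPENAI_BASE_URL=https://api.openai.com/v1
CONTENT_PROCESSOR_ANALYSIS_MODEL=your-model-name

# Cookies / headers for dynamic pages
CONTENT_PROCESSOR_COOKIES_FROM_BROWSER=chrome
CONTENT_PROCESSOR_REFERER=https://mp.weixin.qq.com/

# Local transcription model for whisper-cli
WHISPER_MODEL=base
```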
This repository contains both human-facing and OpenClaw-facing files:
- README.md: human documentation
- SKILL.md: OpenClaw skill instructions
- agents/openai.yaml: UI metadata for OpenClaw skill lists and default prompts
If you only want the CLI workflow, agents/openai.yaml is not required.
```
.
├── assets/
│   └── report-preview.svg
├── docs/
│   ├── release-validation.md
│   └── release-validation.zh-CN.md
├── README.md
├── README.zh-CN.md
├── CHANGELOG.md
├── LICENSE
├── SKILL.md
├── .env.example
├── .github/workflows/ci.yml
├── agents/openai.yaml
├── scripts/
│   ├── bootstrap.sh
│   ├── run.sh
│   ├── process_share_links.py
│   └── run_regression.py
└── tests/
    └── test_process_share_links.py
```
Run local checks:
```bash
python3 -m py_compile scripts/process_share_links.py scripts/run_regression.py
python3 -m unittest discover -s tests -v
python3 scripts/run_regression.py --preset github
```

GitHub Actions runs:
- local runtime bootstrap
- Python compile checks
- unit tests
CI does not run all live platform regressions.
- The slowest path is media transcription; some runs can take minutes
- Some platforms need cookies, browser sessions, or referers to be reliable
- Very short or mostly-music videos may only produce metadata-only partial results
- Anti-bot behavior can change over time, especially on social platforms
See CONTRIBUTING.md.
MIT. See LICENSE.