OpenClaw Content Processor

Turn share links into desktop briefing reports.

openclaw-content-processor is an OpenClaw skill and standalone CLI tool that takes one or more share links, extracts the useful content, and saves a local report as report.md + report.json.

Install In OpenClaw

If you want another OpenClaw agent to install and bootstrap this skill for you, copy this prompt:

Install this OpenClaw skill from GitHub and make it ready to use:
https://github.com/jjjojoj/openclaw-content-processor.git

After installing:
1. Run the required bootstrap/setup steps.
2. Check whether dependencies such as ffmpeg and whisper-cli are available.
3. Tell me the exact command or usage prompt I can use right away to process links.

If the skill list does not refresh immediately, restart OpenClaw once.

The tool is designed for these link types:

GitHub repositories
regular article pages
dynamic pages such as WeChat / Zhihu / CSDN / Toutiao
video and social links such as Bilibili, Xiaohongshu, Weibo, X/Twitter, Douyin, and YouTube

Why This Exists

Most link summarizers either stay inside chat or handle only one platform well. This project takes a different approach:

local-first: always write a report to disk first
multi-source: accept one or many links in one run
layered fallback: use different extractors for GitHub, static web, dynamic pages, and media
automation-friendly: emit both Markdown and structured JSON

Validated Status

Current stable release: v2.3.0

Stable-release validation last refreshed on 2026-03-26:

Platform	Status	Notes
GitHub	Stable	Uses GitHub API + README extraction
Generic web pages	Stable	Main path uses `trafilatura`
WeChat	Stable	Usually succeeds via `Scrapling`
Zhihu / CSDN	Stable	Real links verified
Toutiao	Usually works	Depends on page structure and anti-bot behavior
Bilibili	Usually works	Subtitles first, then `whisper-cli` fallback
Xiaohongshu	Usually works	May need media transcription
X/Twitter	Mixed	Public video posts can work, but quality depends on transcription
Weibo	Mixed	Short noisy videos may become `metadata-only partial`
Douyin / YouTube	Supported	Paths implemented; use real links to verify your scenario

Release Validation

The current stable release is backed by two validation layers:

installation validation: bash scripts/bootstrap.sh --install-python, bash scripts/bootstrap.sh, .venv/bin/python -m py_compile ..., and .venv/bin/python -m unittest discover -s tests -v
live-link validation: public GitHub, Zhihu, CSDN, Toutiao, Bilibili, WeChat, Xiaohongshu, X/Twitter, and Weibo samples were checked on 2026-03-26

See docs/release-validation.md for the latest release checklist, command set, and platform notes.

Features

Capability	What it does
GitHub extractor	Pulls repo metadata, topics, stars, default branch, and README
Web extractor	Uses `trafilatura` for article-style pages
Dynamic-page fallback	Uses `Scrapling` for harder pages
Media pipeline	Uses `yt-dlp` subtitles first, then `ffmpeg + whisper-cli`
Local analysis	Produces summary, highlights, keywords, and analysis text
Structured output	Saves `report.md`, `report.json`, and per-item JSON files
Batch-safe execution	One bad source does not kill the whole run
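
The batch-safe behavior in the table above can be illustrated with a minimal sketch. This is not the project's actual implementation: `process_batch` and `fake_extract` are hypothetical names, and the result keys only loosely mirror the counts described for report.json.

```python
from typing import Callable, Dict, List


def process_batch(sources: List[str], extract: Callable[[str], str]) -> Dict[str, object]:
    """Run `extract` on every source and record failures instead of aborting."""
    items = []
    for url in sources:
        try:
            text = extract(url)
            items.append({"source": url, "status": "success", "chars": len(text)})
        except Exception as exc:  # deliberately broad: one bad source must not stop the run
            items.append({"source": url, "status": "failed", "error": str(exc)})
    failed = sum(1 for item in items if item["status"] == "failed")
    return {
        "item_count": len(items),
        "success_count": len(items) - failed,
        "failed_count": failed,
        "items": items,
    }


def fake_extract(url: str) -> str:
    # Hypothetical extractor used only for this demonstration.
    if "broken" in url:
        raise ValueError("could not extract content")
    return "example content for " + url


summary = process_batch(
    ["https://example.com/a", "https://example.com/broken", "https://example.com/b"],
    fake_extract,
)
print(summary["success_count"], summary["failed_count"])
```

The failing source is recorded with its error message, and the two remaining sources are still processed.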

Quick Start

1. Install system dependencies

macOS:

brew install ffmpeg whisper-cpp

2. Install local Python runtime

bash scripts/bootstrap.sh --install-python

This installs the skill-local runtime into .venv/, including:

yt-dlp
trafilatura
Scrapling

3. Run it

bash scripts/run.sh "https://github.com/openai/openai-python"

Or let it also check system dependencies:

bash scripts/run.sh --auto-bootstrap "https://github.com/openai/openai-python"

Usage

Basic CLI

bash scripts/run.sh \
  "https://github.com/openai/openai-python" \
  "https://mp.weixin.qq.com/s/xxxxxxxx"

Explicit title and sources

bash scripts/run.sh \
  --title "Today's Link Briefing" \
  --source "https://x.com/..." \
  --source "https://video.weibo.com/show?fid=..."

With browser session / cookies

bash scripts/run.sh \
  --cookies-from-browser chrome \
  --referer "https://mp.weixin.qq.com/" \
  --source "https://mp.weixin.qq.com/s/xxxxxxxx"
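
For callers that drive the CLI from Python, the flag combinations shown in the usage examples above could be assembled as follows. This is a sketch under assumptions: `build_run_command` is a hypothetical helper, it uses only the flags that appear in this README, and it assumes those flags can be combined freely in one invocation.

```python
import shlex
from typing import List, Optional


def build_run_command(
    sources: List[str],
    title: Optional[str] = None,
    cookies_from_browser: Optional[str] = None,
    referer: Optional[str] = None,
) -> List[str]:
    """Build an argv list for scripts/run.sh without invoking a shell."""
    if not sources:
        raise ValueError("at least one source URL is required")
    cmd = ["bash", "scripts/run.sh"]
    if title:
        cmd += ["--title", title]
    if cookies_from_browser:
        cmd += ["--cookies-from-browser", cookies_from_browser]
    if referer:
        cmd += ["--referer", referer]
    for url in sources:
        cmd += ["--source", url]
    return cmd


cmd = build_run_command(
    ["https://mp.weixin.qq.com/s/xxxxxxxx"],
    cookies_from_browser="chrome",
    referer="https://mp.weixin.qq.com/",
)
print(shlex.join(cmd))  # shell-safe rendering, useful for logging
```

Passing the returned list to `subprocess.run(cmd, check=True)` from the repository root would start the run. Building a list rather than a single string avoids shell-quoting problems with URLs that contain `?` or `&`.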

Lightweight regression

python scripts/run_regression.py --preset core

Output

Default output root (the folder name 内容摘要 means "content summary"):

~/Desktop/内容摘要/YYYY-MM-DD/<timestamp>/

Each run produces:

report.md
report.json
items/
  source-1.json
  source-2.json

report.json includes:

overall run status
counts for success / partial / failed items
tool and analysis metadata
per-item summaries, warnings, extract methods, and content stats

Typical CLI response:

{
  "schema_version": "1.0.0",
  "status": "success",
  "report_title": "GitHub validation",
  "output_dir": "/Users/you/Desktop/内容摘要/2026-03-26/20260326_024343_GitHub验证",
  "report_md": "/Users/you/Desktop/内容摘要/2026-03-26/20260326_024343_GitHub验证/report.md",
  "report_json": "/Users/you/Desktop/内容摘要/2026-03-26/20260326_024343_GitHub验证/report.json",
  "item_count": 1,
  "success_count": 1,
  "partial_count": 0,
  "failed_count": 0
}
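
An automation script might consume a response like the one above as follows. This is a minimal sketch: `summarize_response` is a hypothetical helper, and it relies only on fields that appear in the sample.

```python
import json

# Abbreviated copy of the sample response shown above.
SAMPLE_RESPONSE = """
{
  "schema_version": "1.0.0",
  "status": "success",
  "report_md": "/Users/you/Desktop/内容摘要/2026-03-26/20260326_024343_GitHub验证/report.md",
  "item_count": 1,
  "success_count": 1,
  "partial_count": 0,
  "failed_count": 0
}
"""


def summarize_response(raw: str) -> dict:
    """Parse the CLI response and flag runs that contain partial or failed items."""
    data = json.loads(raw)
    incomplete = data["partial_count"] + data["failed_count"]
    return {
        "status": data["status"],
        "report_md": data["report_md"],
        "needs_review": incomplete > 0,
        "line": f"{data['success_count']}/{data['item_count']} items succeeded",
    }


summary = summarize_response(SAMPLE_RESPONSE)
print(summary["line"])
```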

Extraction Strategy

Different sources use different pipelines on purpose:

GitHub repositories: GitHub API + README
Regular web pages: trafilatura
Dynamic / anti-bot pages: Scrapling
Media links: yt-dlp subtitles first
No usable subtitles: ffmpeg + whisper-cli
Analysis layer: OpenAI-compatible responses first, then local heuristic fallback
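
The routing described above can be sketched as a lookup from URL host to an ordered list of extractors. This is an illustration only: the host lists, `host_matches`, and `pick_pipeline` are assumptions made for the example, and the real tool may classify links differently.

```python
from typing import List
from urllib.parse import urlparse

# Assumed host lists for illustration; not taken from the project source.
MEDIA_HOSTS = ("bilibili.com", "xiaohongshu.com", "weibo.com", "x.com",
               "twitter.com", "douyin.com", "youtube.com")
DYNAMIC_HOSTS = ("mp.weixin.qq.com", "zhihu.com", "csdn.net", "toutiao.com")

# Extractors are listed in the order they would be tried.
PIPELINES = {
    "github": ["GitHub API + README"],
    "media": ["yt-dlp subtitles", "ffmpeg + whisper-cli"],
    "dynamic": ["Scrapling"],
    "static": ["trafilatura"],
}


def host_matches(host: str, domain: str) -> bool:
    """True for the domain itself or any of its subdomains."""
    return host == domain or host.endswith("." + domain)


def pick_pipeline(url: str) -> List[str]:
    """Return the ordered extractor names for a URL."""
    host = (urlparse(url).hostname or "").lower()
    if host_matches(host, "github.com"):
        return PIPELINES["github"]
    if any(host_matches(host, d) for d in MEDIA_HOSTS):
        return PIPELINES["media"]
    if any(host_matches(host, d) for d in DYNAMIC_HOSTS):
        return PIPELINES["dynamic"]
    return PIPELINES["static"]


print(pick_pipeline("https://video.weibo.com/show?fid=1"))
```

Matching on the parsed hostname rather than on a substring of the URL keeps a link such as `https://example.com/?ref=github.com` from being misrouted.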

Configuration

See .env.example for the full list.

Most useful variables:

OPENAI_API_KEY
OPENAI_BASE_URL
CONTENT_PROCESSOR_ANALYSIS_MODE
CONTENT_PROCESSOR_ANALYSIS_MODEL
CONTENT_PROCESSOR_COOKIES_FILE
CONTENT_PROCESSOR_COOKIES_FROM_BROWSER
CONTENT_PROCESSOR_COOKIE_HEADER
CONTENT_PROCESSOR_REFERER
WHISPER_MODEL
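
A wrapper script could read a subset of these variables as shown below. This is a sketch, not the tool's own loader: `Settings` and `load_settings` are hypothetical, and treating a blank value as unset is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    openai_api_key: Optional[str]
    openai_base_url: Optional[str]
    analysis_mode: Optional[str]
    analysis_model: Optional[str]
    cookies_from_browser: Optional[str]
    referer: Optional[str]
    whisper_model: Optional[str]


# Field name -> environment variable name, as listed in this README.
ENV_NAMES = {
    "openai_api_key": "OPENAI_API_KEY",
    "openai_base_url": "OPENAI_BASE_URL",
    "analysis_mode": "CONTENT_PROCESSOR_ANALYSIS_MODE",
    "analysis_model": "CONTENT_PROCESSOR_ANALYSIS_MODEL",
    "cookies_from_browser": "CONTENT_PROCESSOR_COOKIES_FROM_BROWSER",
    "referer": "CONTENT_PROCESSOR_REFERER",
    "whisper_model": "WHISPER_MODEL",
}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the variables from `env` (default: os.environ); blank values become None."""
    env = os.environ if env is None else env
    values = {}
    for field_name, env_name in ENV_NAMES.items():
        value = env.get(env_name, "").strip()
        values[field_name] = value or None
    return Settings(**values)


settings = load_settings({"OPENAI_API_KEY": "sk-example", "WHISPER_MODEL": "  "})
configured = [name for name in ENV_NAMES if getattr(settings, name)]
print(configured)  # names only, so secrets never reach the log
```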

OpenClaw Integration

This repository contains both human-facing and OpenClaw-facing files:

README.md: human documentation
SKILL.md: OpenClaw skill instructions
agents/openai.yaml: UI metadata for OpenClaw skill lists and default prompts

If you only want the CLI workflow, agents/openai.yaml is not required.

Repository Layout

.
├── assets/
│   └── report-preview.svg
├── docs/
│   ├── release-validation.md
│   └── release-validation.zh-CN.md
├── README.md
├── README.zh-CN.md
├── CHANGELOG.md
├── LICENSE
├── SKILL.md
├── .env.example
├── .github/workflows/ci.yml
├── agents/openai.yaml
├── scripts/
│   ├── bootstrap.sh
│   ├── run.sh
│   ├── process_share_links.py
│   └── run_regression.py
└── tests/
    └── test_process_share_links.py

Development

Run local checks:

python3 -m py_compile scripts/process_share_links.py scripts/run_regression.py
python3 -m unittest discover -s tests -v
python3 scripts/run_regression.py --preset github

GitHub Actions runs:

local runtime bootstrap
Python compile checks
unit tests

CI does not run all live platform regressions.

Limitations

The slowest path is media transcription; some runs can take minutes
Some platforms need cookies, browser sessions, or referers to be reliable
Very short or mostly-music videos may yield only a metadata-only partial result
Anti-bot behavior can change over time, especially on social platforms

Contributing

See CONTRIBUTING.md.

License

MIT. See LICENSE.
