Transcribe any video URL to text with one command.
An MCP Server + Agent Skill for AI coding assistants, and a standalone CLI tool. Give your AI agent the ability to transcribe any video: just paste a link.
Supports Bilibili, YouTube, TikTok/Douyin, Twitter/X, Vimeo, and 1000+ sites via yt-dlp.
```bash
pip install vidscribe[mcp]
```

Add to your MCP client config (e.g. `claude_desktop_config.json` or Cursor MCP settings):
```json
{
  "mcpServers": {
    "vidscribe": {
      "command": "vidscribe-mcp"
    }
  }
}
```

Then just ask your AI: "Transcribe this video: https://..."
Copy the skill into your personal skills directory:
```bash
# Cursor
cp -r agent-skill ~/.cursor/skills/vidscribe

# Windsurf
cp -r agent-skill ~/.windsurf/skills/vidscribe

# Codex
cp -r agent-skill ~/.codex/skills/vidscribe
```

Then just tell your AI agent: "Transcribe this video: https://..." It will know how to use vidscribe automatically.
```bash
pip install vidscribe

# Set up any one provider (see below)
export OPENAI_API_KEY="sk-..."

# Transcribe!
vidscribe "https://www.youtube.com/watch?v=dQw4w9WgXcQ" -o transcript.txt
```

- MCP Server - works with Claude Desktop, Cursor, and any MCP-compatible client
- Agent Skill - plug into Cursor / Windsurf / Codex as a reusable AI skill
- One command - just paste a URL, get text
- 1000+ sites - any platform supported by yt-dlp
- 5 ASR providers - choose by quality, speed, price, or language
- Auto-detection - picks the best available provider from your env vars
- Offline mode - local Whisper model, no API key needed
- Multilingual - Chinese, English, Japanese, and 50+ languages
```bash
# Core (with cloud providers that need no extra deps)
pip install vidscribe

# With MCP server support
pip install vidscribe[mcp]       # MCP Server

# With specific provider support
pip install vidscribe[openai]    # OpenAI Whisper API
pip install vidscribe[deepgram]  # Deepgram
pip install vidscribe[aliyun]    # Aliyun DashScope
pip install vidscribe[local]     # Local faster-whisper (offline)

pip install vidscribe[all]       # Everything (MCP + all providers)
```

System dependencies:

- ffmpeg - for audio conversion (`brew install ffmpeg` / `apt install ffmpeg`)
| Provider | Best For | Price | Speed | Setup |
|---|---|---|---|---|
| Volcengine | Chinese content | ~$0.11/hr | Fast | Guide |
| OpenAI | Multilingual | $0.36/hr | Fast | Guide |
| Aliyun | Chinese content | ~$0.17/hr | Fast | Guide |
| Deepgram | Speed & cost | $0.26/hr | Fastest | Guide |
| Local | Privacy / offline | Free | Slow (CPU) | Guide |
Cheapest cloud option for Chinese content. Powered by Doubao (豆包) ASR.
```bash
export VOLC_APP_KEY="your_app_id"
export VOLC_ACCESS_KEY="your_access_token"
```

Get credentials: Volcengine Console
Best multilingual quality. Works globally.
```bash
export OPENAI_API_KEY="sk-..."
```

Get API key: OpenAI Platform
Good for Chinese. Free trial: 3 months, 2 hours/day.
```bash
export DASHSCOPE_API_KEY="sk-..."
```

Get API key: Bailian Console
Fastest transcription. $200 free credit on signup, no credit card needed.
```bash
export DEEPGRAM_API_KEY="..."
```

Get API key: Deepgram Console
Free and offline. No API key needed. Runs on your CPU (slower but private).
```bash
pip install vidscribe[local]
vidscribe "https://..." -p local
```

```bash
# Auto-detect provider from environment variables
vidscribe "https://www.bilibili.com/video/BV1xxx"

# Specify provider explicitly
vidscribe "https://youtu.be/xxx" -p openai

# Save to file
vidscribe "https://youtu.be/xxx" -o transcript.txt

# Language hint (helps accuracy for non-Chinese/English)
vidscribe "https://youtu.be/xxx" -l ja-JP

# Keep the downloaded audio file
vidscribe "https://youtu.be/xxx" -o transcript.txt --keep-audio

# Use as a Python module
python -m vidscribe "https://..."
```

Bilibili, YouTube, TikTok, Douyin, Twitter/X, Vimeo, Dailymotion, Twitch, Instagram, Facebook, Reddit, NicoNico, and 1000+ more.
Chinese, English, Japanese, Korean, French, German, Spanish, Russian, Arabic, and 50+ more (varies by provider).
```
Video URL --> yt-dlp (download audio) --> ASR Provider --> Text
                                              |
                        [cloud providers]: upload to temp host
                        [local provider]:  process locally
```
- Download: yt-dlp extracts the audio track from any video URL
- Upload (cloud only): Audio is uploaded to a temporary file host for the ASR API to access
- Transcribe: The ASR provider converts speech to text
- Output: Full transcript printed to stdout or saved to file
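The four steps above can be sketched in a few lines of Python. The `yt-dlp` flags are real, but `build_download_cmd` and `transcribe` are illustrative names, not vidscribe's actual internals:

```python
import subprocess
import tempfile
from pathlib import Path


def build_download_cmd(url: str, out_dir: str) -> list[str]:
    """Step 1: yt-dlp extracts the audio track, converting it via ffmpeg."""
    out_tmpl = str(Path(out_dir) / "audio.%(ext)s")
    return [
        "yt-dlp",
        "-x",                     # extract audio only
        "--audio-format", "mp3",  # have ffmpeg convert to mp3
        "-o", out_tmpl,           # output filename template
        url,
    ]


def transcribe(url: str, provider) -> str:
    """Steps 1-4 glued together; `provider` is any callable taking an audio path."""
    with tempfile.TemporaryDirectory() as tmp:
        subprocess.run(build_download_cmd(url, tmp), check=True)
        audio = next(Path(tmp).glob("audio.*"))
        # Cloud providers would upload `audio` to a temp host first;
        # the local provider reads the file directly.
        return provider(audio)
```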
For a typical 15-minute video:
| Provider | Cost | Notes |
|---|---|---|
| Volcengine | ~$0.03 | Cheapest cloud option |
| OpenAI | ~$0.09 | Best quality |
| Aliyun | ~$0.04 | Free trial available |
| Deepgram | ~$0.06 | $200 free credit |
| Local | Free | Requires CPU time (~20 min) |
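The per-video figures follow directly from the hourly rates in the provider comparison table, since cloud ASR bills by audio duration. A small sketch (the rate table and function are illustrative, not part of vidscribe):

```python
RATES_PER_HOUR = {  # USD per audio hour, from the provider table above
    "volcengine": 0.11,
    "openai": 0.36,
    "aliyun": 0.17,
    "deepgram": 0.26,
}


def cost_usd(provider: str, minutes: float) -> float:
    """Cost scales linearly with duration: rate * (minutes / 60)."""
    return round(RATES_PER_HOUR[provider] * minutes / 60, 2)
```

For a 15-minute video, `cost_usd("openai", 15)` gives 0.09, matching the ~$0.09 row above.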
You can set environment variables in your shell profile (`~/.zshrc`, `~/.bashrc`) or use a `.env` file:

```bash
cp .env.example .env
# Edit .env with your keys
```

vidscribe auto-detects providers in this priority order:
1. Volcengine (if `VOLC_APP_KEY` is set)
2. OpenAI (if `OPENAI_API_KEY` is set)
3. Aliyun (if `DASHSCOPE_API_KEY` is set)
4. Deepgram (if `DEEPGRAM_API_KEY` is set)
5. Local (if `faster-whisper` is installed)

Override with the `-p <provider>` flag.
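The detection logic amounts to walking that priority list and returning the first configured provider. A minimal sketch, assuming the documented order (the names and function here are illustrative, not vidscribe's internals):

```python
import importlib.util
import os

# Detection order, mirroring the documented priority list.
PRIORITY = [
    ("volcengine", lambda: "VOLC_APP_KEY" in os.environ),
    ("openai", lambda: "OPENAI_API_KEY" in os.environ),
    ("aliyun", lambda: "DASHSCOPE_API_KEY" in os.environ),
    ("deepgram", lambda: "DEEPGRAM_API_KEY" in os.environ),
    ("local", lambda: importlib.util.find_spec("faster_whisper") is not None),
]


def detect_provider() -> str:
    """Return the first configured provider, like running vidscribe without -p."""
    for name, is_configured in PRIORITY:
        if is_configured():
            return name
    raise RuntimeError("No ASR provider configured; see the setup section above.")
```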
vidscribe can run as an MCP (Model Context Protocol) server, exposing video transcription as a tool that any MCP-compatible AI client can call directly.
What is MCP? Model Context Protocol is an open standard for connecting AI assistants to external tools. It lets AI models call your tools natively: no shell commands, no copy-paste.
Install & run:
```bash
pip install vidscribe[mcp]
```

Configure your MCP client:
Claude Desktop
Edit `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS) or `%APPDATA%\Claude\claude_desktop_config.json` (Windows):
```json
{
  "mcpServers": {
    "vidscribe": {
      "command": "vidscribe-mcp"
    }
  }
}
```

Cursor
Add to your Cursor MCP settings (`.cursor/mcp.json`):
```json
{
  "mcpServers": {
    "vidscribe": {
      "command": "vidscribe-mcp"
    }
  }
}
```

Exposed MCP tools:
| Tool | Description |
|---|---|
| `transcribe` | Transcribe a video URL to text. Params: `url`, `provider`, `lang`, `output_path` |
| `list_available_providers` | List all ASR providers and whether they are configured |
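Under the hood, an MCP client invokes these tools with standard JSON-RPC `tools/call` requests over stdio. A request to the `transcribe` tool might look like this (the argument values are illustrative):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "transcribe",
    "arguments": {
      "url": "https://youtu.be/xxx",
      "provider": "openai",
      "lang": "en-US"
    }
  }
}
```

Your MCP client builds and sends these requests for you; you never write them by hand.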
Environment variables: The MCP server reads the same env vars as the CLI (VOLC_APP_KEY, OPENAI_API_KEY, etc.). Set them in your shell profile before starting the MCP server.
vidscribe ships with a ready-to-use agent-skill/SKILL.md that works with any AI coding assistant that supports Agent Skills (Cursor, Windsurf, Codex, etc.).
What is an Agent Skill? Agent Skills are reusable capabilities you can add to AI coding assistants. Once installed, the AI automatically knows when and how to use the tool: no manual prompting needed.
Install:
```bash
# Clone the repo
git clone https://github.com/XFWang522/vidscribe.git

# Copy the skill to your assistant
cp -r vidscribe/agent-skill ~/.cursor/skills/vidscribe
```

How it works:
1. You tell your AI: "Transcribe this video: https://www.bilibili.com/video/BV1xxx"
2. The AI reads the SKILL.md and understands the tool's capabilities
3. It runs `vidscribe` with the right parameters
4. You get the full transcript in your editor
Works with any video platform: Bilibili, YouTube, TikTok, Twitter/X, and 1000+ more.
| | MCP Server | Agent Skill |
|---|---|---|
| Protocol | Standard MCP (JSON-RPC over stdio) | Markdown file read by AI |
| Works with | Claude Desktop, Cursor, any MCP client | Cursor, Windsurf, Codex |
| Setup | `pip install` + config JSON | Copy a folder |
| Tool discovery | Automatic via MCP protocol | AI reads SKILL.md |
| Best for | Claude Desktop users; standardized tool integration | Cursor/Windsurf users who prefer skills |
You can use both simultaneously; they don't conflict.