An agent skill for Codex and Claude Code that turns any product input — a doc, a webpage, a GitHub repo, or just an idea — into a launch-quality explainer video. One prompt in, polished MP4 out.
A 30-second demo of the skill in action, captured live as it produces a launch video. The user types a prompt; the skill drafts a storyboard, asks for approval, then renders a 1:1 square video with synced music in about 3 minutes.
https://github.com/encircleacity2/bobyte-explainer/blob/main/assets/demos/demo.mp4
In your Codex or Claude Code session, paste one of:
| You have | What you say |
|---|---|
| A product page / docs URL | "Make an explainer video for https://example.com" |
| A Lark / Feishu doc | "Turn this Lark doc into a launch video: [doc URL]" |
| A GitHub repo | "Produce a Shorts video for this GitHub repo: [repo URL]" |
| PDFs + screenshots | "Make an explainer from these files" (attach them) |
| Just an idea | "Make a 30s video about [describe the product]" |
The skill handles everything from intake → storyboard → render → music → delivery. A typical 30-45s video lands in ~/Downloads/ in under 5 minutes at $0–$0.20 in API cost.
The default for product launches. Polished motion + UI + typography + music — no talking head, no portrait photo needed at onboarding. Saves ~80% of cost vs hybrid mode and produces the OpenAI/Apple-style aesthetic modern product videos use.
Phase 1 asks where the video will live and chooses the right aspect ratio + duration sweet spot automatically:
| Channel | Aspect | Sweet-spot duration |
|---|---|---|
| X / LinkedIn / IG feed | 1:1 (1440×1440) | 30–60s |
| TikTok / Reels / Shorts | 9:16 (1080×1920) | 21–34s |
| YouTube / website hero | 16:9 (1920×1080) | 60–180s |
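The table above can be read as a simple lookup from channel to render spec. The following is a minimal sketch of that idea, not the skill's actual code: `ChannelSpec`, `CHANNEL_SPECS`, `clamp_duration`, and the channel keys are hypothetical names; only the numbers come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelSpec:
    aspect: str
    width: int
    height: int
    min_seconds: int
    max_seconds: int


# Values copied from the channel table; the dictionary keys are hypothetical.
CHANNEL_SPECS = {
    "feed": ChannelSpec("1:1", 1440, 1440, 30, 60),         # X / LinkedIn / IG feed
    "shorts": ChannelSpec("9:16", 1080, 1920, 21, 34),      # TikTok / Reels / Shorts
    "landscape": ChannelSpec("16:9", 1920, 1080, 60, 180),  # YouTube / website hero
}


def clamp_duration(channel: str, requested_seconds: float) -> float:
    """Clamp a requested duration into the channel's sweet-spot window."""
    spec = CHANNEL_SPECS[channel]
    return max(spec.min_seconds, min(spec.max_seconds, requested_seconds))
```

For example, a 45-second request for a Shorts video would be pulled back to the 34-second upper bound of that channel's window.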
- openai-clean: geometric bold sans + lavender liquid + minimal
- anthropic-warm: warm earth tones + serif italic + editorial
- linear-minimal: dark mode + neon accents + technical
- apple-keynote: deep black + hero typography + cinematic
- brand-bold: high-contrast + oversized type + color-block
Each preset ships a real design.md with full tokens (palette, typography, motion easings, scene recipe). Or paste your own design.md and the skill renders against your brand.
The skill enforces an 8-pattern narrative discipline before any render runs — protagonist with a real name (not "the user"), a small canon of specific entities preserved across frames, a central artifact that echoes across multiple beats, etc. The built-in auditor blocks "screen catalogue" storyboards (a sequence of UI shots with no story) — the #1 reason finished videos don't communicate.
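To make the auditor idea concrete, here is a rough sketch of three of the checks described above (named protagonist, echoing central artifact, no "screen catalogue"). The storyboard field names (`protagonist`, `central_artifact`, `segments`, `kind`, `beat`, `entities`) are assumptions made for illustration; the real schema is not shown in this README, and the actual auditor covers more patterns.

```python
GENERIC_NAMES = {"", "user", "the user", "customer", "someone"}


def audit_storyboard(storyboard: dict) -> list[str]:
    """Return blocking findings; an empty list means these checks passed."""
    findings = []

    # Check 1: the protagonist must have a real name, not a placeholder.
    protagonist = storyboard.get("protagonist", "").strip().lower()
    if protagonist in GENERIC_NAMES:
        findings.append("protagonist needs a real name")

    # Check 2: the central artifact should appear in at least two segments.
    segments = storyboard.get("segments", [])
    artifact = storyboard.get("central_artifact")
    echoes = sum(
        1 for seg in segments if artifact and artifact in seg.get("entities", [])
    )
    if echoes < 2:
        findings.append("central artifact must echo across multiple beats")

    # Check 3: "screen catalogue" means every segment is a UI shot with no story beat.
    if segments and all(
        seg.get("kind") == "ui" and not seg.get("beat") for seg in segments
    ):
        findings.append("screen catalogue: UI shots with no story")

    return findings
```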
After drafting the storyboard, the skill shows it inline and asks you in your conversation language: Approve / Suggest changes / Stop. No render runs on "looks good 👍". If you suggest changes, the model revises and re-presents — looping until you click approve or stop.
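The approval gate described above only lets a render start on an explicit Approve. A minimal sketch of that rule, assuming hypothetical names (`Decision`, `parse_decision`, `may_render`) and assuming the reply arrives as the text of the chosen option:

```python
from enum import Enum
from typing import Optional


class Decision(Enum):
    APPROVE = "approve"
    SUGGEST_CHANGES = "suggest_changes"
    STOP = "stop"


def parse_decision(choice: str) -> Optional[Decision]:
    """Map an explicit option to a Decision; any other text returns None."""
    normalized = choice.strip().lower().replace(" ", "_")
    for decision in Decision:
        if normalized == decision.value:
            return decision
    return None


def may_render(choice: str) -> bool:
    """Only an explicit Approve unlocks the render; casual praise does not."""
    return parse_decision(choice) is Decision.APPROVE
```

Treating unrecognized replies as "no decision" rather than guessing intent is what keeps a casual "looks good" from triggering a render.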
Before render and after, a unified verify.py runs validators for storyboard craft, overlap, assets, launch-grade motion, layout guardrails, zoom logic, camera overflow, render spec, audio levels, and pixel-edge bleed. Auto-fix mechanically repairs the safe subset (cap camera scales, re-encode keyframes, re-mix audio gain, deconflict tracks) and re-verifies up to N iterations.
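The verify, auto-fix, re-verify cycle can be pictured as a bounded loop. This is a sketch under stated assumptions, not the contents of verify.py: the function name, the `auto_fixable` flag on findings, and the default of 3 iterations are all hypothetical.

```python
from typing import Callable


def verify_with_autofix(
    run_validators: Callable[[], list[dict]],
    apply_fix: Callable[[dict], bool],
    max_iterations: int = 3,
) -> list[dict]:
    """Run validators, repair fixable findings, and re-verify.

    Returns the findings that remain after the last verification pass.
    """
    findings = run_validators()
    for _ in range(max_iterations):
        fixable = [f for f in findings if f.get("auto_fixable")]
        if not fixable:
            break  # nothing left that is safe to repair mechanically
        fixed_any = False
        for finding in fixable:
            if apply_fix(finding):
                fixed_any = True
        if not fixed_any:
            break  # avoid looping when no repair succeeded
        findings = run_validators()
    return findings
```

Bounding the loop matters because a fix can introduce a new finding; without a cap, two conflicting repairs could alternate indefinitely.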
Every video renders at 60fps --quality high — visibly smoother than the 30fps default most tools settle for. Motion follows a "house style" reference that bans linear easings and codifies entrance / exit / camera curves, layout safety, and target-led zooms so motion craft stays consistent across compositions.
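One way to enforce a "no linear easings" rule is a text lint over the composition source. The sketch below is illustrative only: it assumes eases are written as GSAP-style string values such as `ease: "none"` or `ease: "linear"`, and the function name is hypothetical. The skill's real validator may work differently.

```python
import re

# Matches ease declarations whose string value is "none" or "linear".
LINEAR_EASE = re.compile(r"""ease\s*:\s*["'](?:none|linear)["']""", re.IGNORECASE)


def find_linear_easings(composition_source: str) -> list[int]:
    """Return the 1-based line numbers that declare a linear ease."""
    return [
        number
        for number, line in enumerate(composition_source.splitlines(), start=1)
        if LINEAR_EASE.search(line)
    ]
```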
If enabled at onboarding, the skill generates a custom instrumental via Volcengine's music API per video (matching the storyboard's mood), then sidechain-ducks against any voice. Cost: ~$0.20 per track. Disable entirely if you prefer to add music yourself.
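Sidechain ducking can be done with ffmpeg's `sidechaincompress` filter, which lowers the music whenever the voice is present. The sketch below only builds the command; the function name and the threshold, ratio, attack, and release values are illustrative assumptions, not the skill's actual settings.

```python
def build_ducking_command(music_path: str, voice_path: str, output_path: str) -> list[str]:
    """Build an ffmpeg command that ducks music under voice, then mixes both."""
    filter_graph = (
        # The voice is needed twice: once as the sidechain key, once in the mix.
        "[1:a]asplit=2[key][voice];"
        # Compress the music whenever the key (voice) exceeds the threshold.
        "[0:a][key]sidechaincompress=threshold=0.05:ratio=8:attack=20:release=300[ducked];"
        # Mix the ducked music with the untouched voice.
        "[ducked][voice]amix=inputs=2:duration=first[out]"
    )
    return [
        "ffmpeg", "-y",
        "-i", music_path,
        "-i", voice_path,
        "-filter_complex", filter_graph,
        "-map", "[out]",
        output_path,
    ]
```

The returned list can be passed to `subprocess.run(..., check=True)` on a machine where ffmpeg is installed.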
For customer overviews, benchmark walkthroughs, and B-roll-first product videos, generate narration independently from Seedance A-roll. This makes timing easier to control, lets you regenerate voice without re-spending video credits, and keeps the final mix clean by rendering voice tracks before adding a continuous music bed.
Provider keys and base URLs can be routed through local api_profiles or an internal proxy. This lets teams switch between BytePlus, Volcengine, ElevenLabs, or a gateway without hardcoding secrets in project files.
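As an illustration of profile-based routing, the sketch below keeps only a base URL and the name of an environment variable in each profile, so no secret is stored in the project. The profile layout, the profile names, the example URLs, and `resolve_profile` are all hypothetical; the skill's real api_profiles format is not shown in this README.

```python
import os
from typing import Mapping, Optional

# Hypothetical profile layout: each profile names where the key is found,
# never the key itself.
API_PROFILES = {
    "direct": {
        "base_url": "https://provider.example.com/v1",
        "key_env": "PROVIDER_API_KEY",
    },
    "gateway": {
        "base_url": "https://gateway.internal.example.com/v1",
        "key_env": "GATEWAY_API_KEY",
    },
}


def resolve_profile(name: str, env: Optional[Mapping[str, str]] = None) -> dict:
    """Return the base URL and API key for a profile, reading the key from env."""
    env = os.environ if env is None else env
    profile = API_PROFILES[name]
    key = env.get(profile["key_env"])
    if not key:
        raise KeyError(f"missing environment variable {profile['key_env']}")
    return {"base_url": profile["base_url"], "api_key": key}
```

Switching providers then means changing the profile name, not editing project files.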
For personal-brand videos where you want to appear on camera: hybrid mode generates a Seedance 2.0 AI talking-head from your portrait photo + reference voice clip, then composes it with B-roll. Skip this mode for product launches — it's not needed and adds cost.
Codex:
git clone https://github.com/encircleacity2/bobyte-explainer.git \
~/.codex/skills/explainer-video
Claude Code:
git clone https://github.com/encircleacity2/bobyte-explainer.git \
~/.claude/skills/explainer-video
Restart your agent session. The skill is auto-discovered from SKILL.md.
The first time you trigger the skill, it walks you through a one-time setup that writes ~/.explainer-video/config.json (mode 600). Two paths:
- Credential file (fast): fill in a copy of credentials.template.md and paste its path. The template is designed to be emailed to teammates so each person onboards in one step.
- Step-by-step: the skill prompts for each item interactively, in your conversation language.
It collects:
- Output folder: where finished videos land (default ~/Downloads)
- AI background music: yes / no; if yes, Volcengine music AK / SK
- BytePlus ModelArk + IAM keys: only if you plan to use hybrid mode (avatar A-roll)
- Personal portrait photo + reference video: only if you plan to use hybrid mode
- Standalone TTS provider: optional; useful for B-roll-only narration
- API proxy / provider profiles: optional; useful for team key management
Pure-broll users only need the first two items (output folder and music preference) to start producing videos, unless they also want standalone TTS.
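Because the config file holds API keys, it is written with mode 600 (owner read/write only). A minimal sketch of how such a write can be done safely on a POSIX system; `write_private_config` and the example config keys are hypothetical, and the skill's own onboarding code may differ.

```python
import json
import os
from pathlib import Path


def write_private_config(path: Path, config: dict) -> None:
    """Write config as JSON that only the owner can read or write (mode 600)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create with 0o600 from the start so the file is never briefly world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=2)
    # Tighten permissions in case the file already existed with a looser mode.
    os.chmod(path, 0o600)
```

A call such as `write_private_config(Path.home() / ".explainer-video" / "config.json", {...})` would produce the file described above.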
Once onboarded, each new video walks through 5 phases:
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ Phase 1 │ → │ Phase 2 │ → │ Phase 3 │ → │ Phase 4 │ → │ Phase 5 │
│ Intake + │ │ Restyle │ │Storyboard│ │Production│ │ Deliver │
│Preflight │ │(skipped) │ │+ Approval│ │ + Verify │ │ MP4 │
└──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘
Phase 1 — Intake + Preflight (~30s) The skill parses your input (URL, file, description) into a brief, then asks 3 questions in your conversation language:
- Mode: pure-broll-product-demo (default) / hybrid / aroll-only
- Visual identity: your own design.md, or one of the 5 built-in presets
- Distribution channel: X / TikTok / YouTube / multi-channel
Phase 2 — Portrait restyle (skipped in pure-broll) Only runs in hybrid mode. Seedream 4.5 generates 4 portrait variants (new outfit / setting / lighting); you pick one or skip.
Phase 3 — Storyboard + approval (~1–2 min of model work + your review) The skill drafts the storyboard in 9 mandatory steps (Cast → Canon → Echo → Narrative answers → Frame names → Narration cues → Click chain → arc_map → then segments). Runs the pre-render auditor to catch structural issues. Presents the storyboard inline in your conversation language with 3 options: Approve / Suggest changes / Stop.
Phase 4 — Production (~3–5 min)
- Auto-installs any HyperFrames registry blocks the storyboard references
- Generates the composition HTML from storyboard + chosen style preset
- Renders at 60fps --quality high
- Generates Volcengine music if enabled
- Mixes (sidechain ducking under any voice)
- Runs post-render verify.py; audio level adjustments are auto-applied
Phase 5 — Deliver Saves the MP4 to your configured output folder. Reports duration, size, path. Optionally uploads to Lark with explicit confirmation.
| Time | What's happening |
|---|---|
| 0:00 | You: "Make an explainer video for [URL]" |
| 0:30 | Skill presents preflight questions; you pick mode / style / channel |
| 1:00 | Skill drafts storyboard, runs auditor, presents inline |
| 2:00 | You hit Approve |
| 3:30 | HyperFrames render complete (1440×1440 @ 60fps) |
| 4:30 | Music generated + mixed |
| 5:00 | Final MP4 in ~/Downloads/ |
~/.codex/skills/explainer-video/ or ~/.claude/skills/explainer-video/
├── SKILL.md # 5-phase orchestration spec
├── references/ # 15 reference docs (load on demand)
│ ├── narrative-arc.md # 8 storyline patterns + 5-beat arc (READ FIRST)
│ ├── motion-house-style.md # easings, fps, duration windows
│ ├── storyboard-format.md # full storyboard.json schema
│ ├── style-presets.md # when-to-use guide for the 5 presets
│ ├── channel-aspect-ratios.md # platform × aspect × duration matrix
│ ├── hyperframes-catalog.md # curated registry block subset
│ ├── caption-components.md # registry caption picker
│ ├── screen-script-format.md # in-device screen HTML scripting
│ ├── meta-output-beat.md # opt-in pattern for video products
│ ├── agent-list.md # known AI coding agent brand info
│ ├── seedance-api.md # A-roll API (hybrid mode only)
│ ├── seedream-api.md # Portrait restyle API
│ ├── tts-api.md # Standalone narration providers + timing
│ ├── api-proxy.md # Provider profile / gateway routing
│ ├── duration-planning.md # Content-density duration planning
│ ├── taste-guide.md # OpenAI / Anthropic-inspired restraint
│ ├── volcengine-music-api.md # Music generation API
│ ├── production-techniques.md # Compose / slice / concat / mix
│ └── ...
├── scripts/ # 16 helpers
│ ├── verify.py # unified validator + auto-fix loop
│ ├── audit_storyboard.py # storyboard auditor
│ ├── compose_and_render.py # Phase 4 orchestrator
│ ├── generate_tts.py # standalone TTS generation + normalization
│ ├── plan_duration.py # content-density duration recommendation
│ ├── synthesize_screen_ui.py # LLM-synth in-device screens
│ ├── fetch_registry.py # HyperFrames registry cache
│ └── ...
├── templates/ # reusable recipe scaffolds
│ ├── openai-product-demo.json # canonical pure-broll recipe
│ └── agent-chip-row.html # named-agent opening pattern
├── assets/
│ ├── style-presets/ # 5 built-in design.md preset files
│ ├── hyperframes-template.html # composition scaffold
│ └── macos-window-chrome.html # reusable macOS window UI
└── credentials.template.md # onboarding template (fill + email)
- HyperFrames — the renderer. HTML compositions with GSAP timelines compile to deterministic MP4 via headless Chrome. Free, local, registry of 80+ blocks.
- Seedance 2.0 (BytePlus ModelArk) — A-roll digital human generation (hybrid mode only)
- Seedream 4.5 (BytePlus) — Portrait restyle (hybrid mode only)
- Volcengine music API — AI background music
- ffmpeg — mixing, level adjustment, slicing, post-render verification
- Node 18+ and npm (HyperFrames renderer)
- Python 3.11+ (audit + verify scripts)
- ffmpeg (compose, mux, audio level adjustments)
- Chrome / Chromium (headless render — managed by HyperFrames automatically)
Optional (only for specific features):
- BytePlus ModelArk + IAM keys: hybrid / aroll-only modes
- Volcengine music keys: AI background music
- lark-cli: Lark/Feishu doc ingestion or upload
- Anthropic API key: synthesize_screen_ui.py for in-device UI synthesis (has no-LLM fallback)
Python packages installed as needed:
pip install --user requests Pillow volcengine anthropic librosa
The skill runs scripts/verify.py automatically at two points:
- Pre-render — after composition generation, before the (slow) render. Catches storyboard issues + asset issues + lint errors. Auto-fix repairs safe issues (cap camera scales, deconflict tracks, etc.) and re-runs.
- Post-render — after the MP4 is produced. Pixel-based overflow detection, render-spec match (resolution/fps/duration vs declared), audio levels in mode-target range, no clipping. Severe findings mean the MP4 is not delivery-ready unless explicitly forced as a draft.
Severe issues block delivery; warnings surface in the report but don't auto-block.
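The render-spec match and the severity gate can be sketched as two small functions. This assumes the MP4 has been probed with something like `ffprobe -v error -select_streams v:0 -show_entries stream=width,height,r_frame_rate:format=duration -of json` and the JSON parsed into `probe`. The function names, the `declared` keys, the tolerance, and which mismatch counts as severe are all assumptions for illustration, not the behavior of verify.py.

```python
from fractions import Fraction


def check_render_spec(
    probe: dict, declared: dict, duration_tolerance: float = 0.1
) -> list[dict]:
    """Compare probed resolution, fps, and duration against the declared spec."""
    stream = probe["streams"][0]
    findings = []
    if (stream["width"], stream["height"]) != (declared["width"], declared["height"]):
        findings.append({"severity": "severe", "message": "resolution mismatch"})
    # ffprobe reports frame rate as a ratio string such as "60/1".
    if Fraction(stream["r_frame_rate"]) != declared["fps"]:
        findings.append({"severity": "severe", "message": "fps mismatch"})
    drift = abs(float(probe["format"]["duration"]) - declared["duration"])
    if drift > duration_tolerance:
        # Treating duration drift as a warning is an assumption of this sketch.
        findings.append({"severity": "warning", "message": "duration drift"})
    return findings


def is_delivery_ready(findings: list[dict], force_draft: bool = False) -> bool:
    """Severe findings block delivery unless the output is forced as a draft."""
    has_severe = any(f["severity"] == "severe" for f in findings)
    return force_draft or not has_severe
```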
| Layer | How |
|---|---|
| Brand | Pick a built-in preset, or paste your own design.md |
| Aspect ratio | Phase 1 question — picks defaults per channel |
| Mode | pure-broll-product-demo / aroll-broll-hybrid / aroll-only |
| Storyboard | Drafted by the model; you approve / change / stop |
| Length | Recommended per content profile + channel sweet spot |
| Recipe | Drop a new templates/<name>.json to add a recipe; auditor enforces the schema |
| Design preset | Add assets/style-presets/<name>/design.md |
| Known agent | Append to references/agent-list.md (brand color + logo hint) |
- CHANGELOG.md: full per-PR history of changes
- SMOKE_TEST.md: per-feature test commands
MIT
