439 changes: 439 additions & 0 deletions .agents/skills/create-news-video/SKILL.md

Large diffs are not rendered by default.

165 changes: 113 additions & 52 deletions .claude/skills/create-news-video/SKILL.md
@@ -48,7 +48,7 @@ Single argument: a news article URL (starts with `http://` or `https://`) OR a p

### Step 4: Generate script.json

Following the schema in `docs/superpowers/specs/2026-04-29-auto-news-video-design.md` Section 4. Key rules:
Following the schema in `src/render/script-schema.ts` (Zod discriminated union, 6 templates). Key rules:

**Script content (Vietnamese):**
- Total voiceText: ~150–200 words → ~55–65s spoken at speed 1.0
@@ -129,46 +129,49 @@ RIGHT (natural):
**Hook (most important — gets first 3 seconds of viewer attention):**
- Must contain a claim, statistic, or curious question
- NEVER generic ("Hôm nay chúng ta sẽ nói về..." is wrong)
- ALWAYS include at least 1 effect: `flash-white-3f` or `particle-burst`
- When source has og:image, set `bgSrc: "$source.image"` and pick a `kenBurns` effect
- When no image, omit `bgSrc` — pipeline uses gradient fallback

**Visual rules:**
- For image scenes: `background.src = "$source.image"` (literal — CLI substitutes)
- Vary `kenBurns` across scenes (don't use `zoom-in` for every scene)
- Vary text `animation` (don't use `slide-up` for every line)
- Each line ≤ 25 characters
- Each scene 1-3 lines
**TemplateData rules (6 available templates):**

| Template | When to pick | Required fields |
|---|---|---|
| `hook` | First scene (3-5s) | `headline` (max 40), `subhead?` (max 40), `bgSrc?`, `kenBurns?` |
| `comparison` | "X vs Y" / "exceeds" / "compared to" | `left: {label, value, color}`, `right: {label, value, color, winner?}` |
| `stat-hero` | Key number / % stat | `value` (max 20), `label` (max 40), `context?` (max 50) |
| `feature-list` | Listing features (1-4 bullets) | `title` (max 40), `bullets[]` (max 50 each), `icon?` |
| `callout` | Statement / warning / quote | `statement` (max 80), `tag?` (max 20) |
| `outro` | Last scene (3-5s) | `ctaTop` (max 30), `channelName` (max 30), `source` (max 40) |

- Pick templates based on content signal, not arbitrarily — each story beat dictates its template
- Vary `kenBurns` across hook scenes (values: `zoom-in`, `zoom-out`, `pan-left`, `pan-right`; default `zoom-in`)
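
The template table above can be read as a discriminated union keyed on `template`. The following is a minimal sketch in plain TypeScript, not the actual Zod schema in `src/render/script-schema.ts` (which is not shown here). Field names come from the table; the value types (`string` for `value`, `color`, and `icon`, `boolean` for `winner`) and the helper `primaryText` are assumptions made for illustration.

```typescript
// Hypothetical plain-TypeScript mirror of the template table.
// The real schema is a Zod discriminated union and may differ in detail.
type KenBurns = "zoom-in" | "zoom-out" | "pan-left" | "pan-right";

// Assumed shape: the table lists label, value, color (and winner? on the right side).
interface ComparisonSide {
  label: string;
  value: string;
  color: string;
}

type TemplateData =
  | { template: "hook"; headline: string; subhead?: string; bgSrc?: string; kenBurns?: KenBurns }
  | { template: "comparison"; left: ComparisonSide; right: ComparisonSide & { winner?: boolean } }
  | { template: "stat-hero"; value: string; label: string; context?: string }
  | { template: "feature-list"; title: string; bullets: string[]; icon?: string }
  | { template: "callout"; statement: string; tag?: string }
  | { template: "outro"; ctaTop: string; channelName: string; source: string };

// Illustrative helper: returns the main on-screen text of a scene.
// The switch narrows on the `template` tag, so each branch only sees its own fields.
function primaryText(data: TemplateData): string {
  switch (data.template) {
    case "hook":
      return data.headline;
    case "comparison":
      return `${data.left.label} vs ${data.right.label}`;
    case "stat-hero":
      return data.value;
    case "feature-list":
      return data.title;
    case "callout":
      return data.statement;
    case "outro":
      return data.channelName;
  }
}
```

Because the union is tagged, a scene that sets `template: "stat-hero"` but supplies `bullets` fails type checking, which is the same class of mistake the table is meant to prevent.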

**Outro (always fixed format):**
```json
{
"id": "outro",
"type": "outro",
"voiceText": "Theo dõi Công nghệ 24h để xem bản tin mới mỗi ngày.",
"visual": {
"background": { "type": "gradient", "preset": "outro-purple" },
"text": {
"position": "center",
"style": "outro-card",
"lines": [
{ "content": "Xem bản tin mới mỗi ngày", "emphasis": "primary", "animation": "fade-in" },
{ "content": "Công nghệ 24h", "emphasis": "channel", "animation": "scale-pop" },
{ "content": "Nguồn: <DOMAIN>", "emphasis": "muted", "animation": "fade-in-late" }
]
}
"templateData": {
"template": "outro",
"ctaTop": "Xem bản tin mới mỗi ngày",
"channelName": "Công nghệ 24h",
"source": "<DOMAIN>"
}
}
```
Replace `<DOMAIN>` with the actual domain string. Note: outro line 1 is shortened to fit 25-char schema rule (full CTA "Theo dõi để xem bản tin mới mỗi ngày" is 36 chars).
Replace `<DOMAIN>` with the actual domain string (e.g. `"vnexpress.net"`). `ctaTop` max 30 chars — shorten the full CTA if needed.

### Step 5: Self-validate before writing

Check:
- Total word count ~150-200
- Every line.content ≤ 25 chars
- 5-8 scenes total
- scenes[0].type === "hook"
- last scene type === "outro"
- All enum values valid (see spec Section 4.2)
- Total voiceText words: ~150-200
- 5-8 scenes total (1 hook + 3-6 body + 1 outro)
- scenes[0].type === "hook", scenes[last].type === "outro"
- Every templateData has required fields for its template (see table above)
- voiceText: numbers spelled phonetically, no emoji, no URLs, no markdown
- `voice.provider`: "lucylab" or "elevenlabs"
- Hook with og:image → set `bgSrc: "$source.image"`; no image → omit `bgSrc`

If anything is invalid, fix it yourself silently, using up to 2 self-correction passes. After that, write the file anyway: the CLI's Zod validation will produce a precise error message that the user can act on.
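
The structural checks in this step can be sketched as a small function. This is an illustration only, not the CLI's Zod validation: `SceneLike`, `countWords`, and `selfValidate` are hypothetical names, and the hard 150 to 200 word bounds are an assumption, since the rule above is stated as an approximate target.

```typescript
// Hypothetical minimal scene shape; the real script.json carries more fields.
interface SceneLike {
  id: string;
  type: "hook" | "body" | "outro";
  voiceText: string;
}

// Counts whitespace-separated words in a voiceText string.
function countWords(text: string): number {
  return text.trim().split(/\s+/).filter((w) => w.length > 0).length;
}

// Returns a list of human-readable problems; an empty list means the checks passed.
function selfValidate(scenes: SceneLike[]): string[] {
  const problems: string[] = [];

  if (scenes.length < 5 || scenes.length > 8) {
    problems.push(`expected 5-8 scenes, got ${scenes.length}`);
  }
  if (scenes[0]?.type !== "hook") {
    problems.push("first scene must be a hook");
  }
  if (scenes[scenes.length - 1]?.type !== "outro") {
    problems.push("last scene must be an outro");
  }

  // Assumption: treat the ~150-200 word target as hard bounds for this sketch.
  const words = scenes.reduce((sum, s) => sum + countWords(s.voiceText), 0);
  if (words < 150 || words > 200) {
    problems.push(`expected 150-200 voiceText words, got ${words}`);
  }

  // voiceText must not contain URLs.
  for (const scene of scenes) {
    if (/https?:\/\//.test(scene.voiceText)) {
      problems.push(`scene ${scene.id} contains a URL in voiceText`);
    }
  }
  return problems;
}
```

Returning a list rather than throwing on the first failure fits the "up to 2 self-correction passes" rule, because every problem can be fixed in one pass.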

@@ -205,7 +208,7 @@ Tổng thời lượng: XX.Xs

User: `/create-news-video https://vnexpress.net/iphone-17-200mp`

Generated `script.json` (excerpt):
Generated `script.json`:
```json
{
"version": "1.0",
@@ -223,20 +226,54 @@ Generated `script.json` (excerpt):
{
"id": "hook", "type": "hook",
"voiceText": "Apple vừa ra mắt iPhone 17 với camera hai trăm megapixel.",
"visual": {
"background": { "type": "image", "src": "$source.image", "kenBurns": "zoom-in" },
"overlay": { "darkness": 0.4 },
"text": {
"position": "center", "style": "hook-large",
"lines": [
{ "content": "iPhone 17", "emphasis": "primary", "animation": "scale-pop" },
{ "content": "Camera 200MP!", "emphasis": "accent", "animation": "slide-up-bounce" }
]
},
"effects": ["flash-white-3f", "particle-burst"]
"templateData": {
"template": "hook",
"headline": "iPhone 17",
"subhead": "Camera 200MP!",
"bgSrc": "$source.image",
"kenBurns": "zoom-in"
},
"sfx": { "name": "cinematic/impact", "volume": 0.5 }
},
{
"id": "body-1", "type": "body",
"voiceText": "Cảm biến hoàn toàn mới cho zoom quang học gấp mười lần, vượt mọi đối thủ Android.",
"templateData": {
"template": "stat-hero",
"value": "200MP",
"label": "Cảm biến mới",
"context": "Zoom quang học 10x"
}
},
{
"id": "body-2", "type": "body",
"voiceText": "Pin năm nghìn miliampe giờ, tăng ba mươi phần trăm so với đời cũ. Sạc nhanh sáu mươi lăm watt.",
"templateData": {
"template": "feature-list",
"title": "Nâng cấp lớn",
"bullets": ["Pin 5000mAh", "Tăng 30%", "Sạc nhanh 65W"],
"icon": "spark"
}
},
{
"id": "body-3", "type": "body",
"voiceText": "Giá khởi điểm hai mươi mốt triệu đồng, dự kiến mở bán tại Việt Nam vào tháng sau.",
"templateData": {
"template": "callout",
"statement": "Giá từ 21 triệu đồng, mở bán tháng 5.",
"tag": "Giá bán"
}
},
{
"id": "outro", "type": "outro",
"voiceText": "Theo dõi Công nghệ 24h để xem bản tin mới mỗi ngày.",
"templateData": {
"template": "outro",
"ctaTop": "Theo dõi ngay",
"channelName": "Công nghệ 24h",
"source": "vnexpress.net"
}
}
/* ... 3 body scenes + outro ... */
]
}
```
@@ -245,35 +282,59 @@ Generated `script.json` (excerpt):

User: `/create-news-video news/agi-update.txt`

Generated `script.json` (excerpt):
Generated `script.json`:
```json
{
"version": "1.0",
"metadata": {
"title": "OpenAI công bố mô hình mới với khả năng lập luận",
"source": { "url": "local", "domain": "local", "image": null },
"channel": "Công nghệ 24h"
},
"voice": { "provider": "lucylab", "voiceId": "${VIETNAMESE_VOICEID}", "speed": 1.0 },
"scenes": [
{
"id": "hook", "type": "hook",
"voiceText": "OpenAI vừa công bố mô hình mới có khả năng lập luận như con người.",
"visual": {
"background": { "type": "gradient", "preset": "news-dark" },
"text": {
"position": "center", "style": "hook-large",
"lines": [
{ "content": "Mô hình mới", "emphasis": "primary", "animation": "scale-pop" },
{ "content": "Lập luận!", "emphasis": "accent", "animation": "slide-up-bounce" }
]
},
"effects": ["flash-white-3f"]
"templateData": {
"template": "hook",
"headline": "Mô hình mới",
"subhead": "Lập luận như người"
}
},
{
"id": "body-1", "type": "body",
"voiceText": "Mô hình đạt chín mươi hai phẩy bảy phần trăm trên benchmark, vượt xa phiên bản cũ.",
"templateData": {
"template": "stat-hero",
"value": "92.7%",
"label": "Benchmark",
"context": "Vượt phiên bản cũ 75.1%"
}
},
{
"id": "body-2", "type": "body",
"voiceText": "Hệ thống có thể tự suy luận đa bước, kiểm tra logic và sửa sai trước khi trả lời.",
"templateData": {
"template": "feature-list",
"title": "Khả năng mới",
"bullets": ["Suy luận đa bước", "Tự kiểm tra logic", "Tự sửa lỗi"]
}
},
{
"id": "outro", "type": "outro",
"voiceText": "Theo dõi Công nghệ 24h để xem bản tin mới mỗi ngày.",
"templateData": {
"template": "outro",
"ctaTop": "Xem bản tin mới mỗi ngày",
"channelName": "Công nghệ 24h",
"source": "local"
}
}
/* ... outro line 3 = "Nguồn: local" ... */
]
}
```
Note: when source has no image, every scene uses `background.type = "gradient"` (no image fallback at composer level needed).
Note: when source has no image, omit `bgSrc` from the hook — the pipeline uses a gradient fallback automatically.
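
The `bgSrc` rule can be sketched as a single resolution step. This is a hypothetical illustration of the behaviour described in the notes, not the pipeline's actual code: the `Background` type and `resolveHookBackground` function are assumed names, and the exact fallback logic in the CLI is not shown in this diff.

```typescript
// Hypothetical result type: either a concrete image or the gradient fallback.
type Background = { kind: "image"; src: string } | { kind: "gradient" };

// Resolves the hook background from templateData.bgSrc and metadata.source.image.
// Assumption: "$source.image" is a literal placeholder substituted with the source's og:image.
function resolveHookBackground(
  bgSrc: string | undefined,
  sourceImage: string | null,
): Background {
  if (bgSrc === undefined) {
    return { kind: "gradient" }; // bgSrc omitted: gradient fallback
  }
  const src = bgSrc === "$source.image" ? sourceImage : bgSrc;
  // If the placeholder cannot be resolved (no og:image), fall back to the gradient as well.
  return src ? { kind: "image", src } : { kind: "gradient" };
}
```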

## Sound Effects (SFX)

8 changes: 8 additions & 0 deletions .dockerignore
@@ -0,0 +1,8 @@
node_modules
dist
output
.git
.env
.env.local
.DS_Store
npm-debug.log*
13 changes: 12 additions & 1 deletion .env.example
@@ -14,7 +14,7 @@ TTS_PROVIDER=lucylab
# OPTION 1: LucyLab.io (https://lucylab.io)
# ════════════════════════════════════════════════════════════════════════════
# Required when TTS_PROVIDER=lucylab
VIETNAMESE_API_KEY=sk_live_xxxxxxxxxxxxxxxxxxxx
VIETNAMESE_API_KEY=your_lucylab_api_key_here
VIETNAMESE_VOICEID=22charvoiceiduuidhere

# Optional overrides
@@ -41,6 +41,7 @@ ELEVENLABS_ENDPOINT=https://api.elevenlabs.io/v1
# ════════════════════════════════════════════════════════════════════════════
# Customize the TikTok-style profile card that appears at the end of every video.
# All fields optional — defaults work out of the box.
TIKTOK_ENABLED=true
TIKTOK_DISPLAY_NAME=Quẹp Làm IT
TIKTOK_HANDLE=@haiquep
TIKTOK_FOLLOWERS=11.5k followers
@@ -53,3 +54,13 @@ TIKTOK_FOLLOWERS=11.5k followers
# ── Pipeline tuning ─────────────────────────────────────────────────────────
# TTS_CONCURRENCY: 1 for LucyLab (API limit), can increase for ElevenLabs
TTS_CONCURRENCY=1

# ── LLM Provider ────────────────────────────────────────────────────────────
# Choose ONE: "anthropic", "openai", or "deepseek"
# - anthropic : Claude (haiku for cost, sonnet for quality)
# - openai : GPT-4o / GPT-4.1
# - deepseek : OpenAI-compatible, set LLM_ENDPOINT
LLM_PROVIDER=anthropic
LLM_API_KEY=sk-ant-...
LLM_MODEL=claude-haiku-4-5-20251001
# LLM_ENDPOINT=https://api.deepseek.com/v1 # only needed for deepseek
17 changes: 17 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,17 @@
# Changelog

## [2.0.1] - 2026-05-20

### Added
- Web UI dashboard accessible at `http://localhost:4317` — paste a news URL and generate videos from the browser
- Real-time job progress via Server-Sent Events (SSE) — see pipeline stages live as they run
- LLM provider abstraction supporting Anthropic, OpenAI-compatible, and DeepSeek backends via `LLM_PROVIDER` env var
- Article content web fetcher with HTML extraction and og:image detection
- TikTok UI settings panel — configure avatar, handle, and follower count; settings persist across restarts
- Output listing API with artifact badges (script, video, voice, text) and download links

### Changed
- Pipeline now respects `TIKTOK_ENABLED` toggle — disables TikTok card rendering and avatar fetching when off
- Config supports `LLM_PROVIDER`, `LLM_API_KEY`, `LLM_MODEL`, and `LLM_ENDPOINT` environment variables
- Script schema upgraded to discriminated union templates (6 types: hook, comparison, stat-hero, feature-list, callout, outro)
- HTML composer conditionally renders TikTok handle and outro card based on settings
19 changes: 19 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,19 @@
# AutoCreateVideo

## Skill routing

When the user's request matches an available skill, invoke it via the Skill tool. When in doubt, invoke the skill.

Key routing rules:
- Product ideas/brainstorming → invoke /office-hours
- Strategy/scope → invoke /plan-ceo-review
- Architecture → invoke /plan-eng-review
- Design system/plan review → invoke /design-consultation or /plan-design-review
- Full review pipeline → invoke /autoplan
- Bugs/errors → invoke /investigate
- QA/testing site behavior → invoke /qa or /qa-only
- Code review/diff check → invoke /review
- Visual polish → invoke /design-review
- Ship/deploy/PR → invoke /ship or /land-and-deploy
- Save progress → invoke /context-save
- Resume context → invoke /context-restore
32 changes: 32 additions & 0 deletions CONTEXT.md
@@ -0,0 +1,32 @@
# CONTEXT — Auto News Video

## Glossary

**script.json** — The contract between Claude Code (skill) and the Node CLI (pipeline). Claude writes it, the CLI validates it with Zod, then renders a video from it. Contains metadata, voice config, and an array of scenes.

**templateData** — The content payload Claude provides per scene, discriminated by `template` field. Claude picks the template type (creative decision) and fills in the fields (content). The CLI reads `templateData` to compose HTML. NOT the same as `visual` (a stale term from the April 2026 design spec that never shipped in this form).

**Scene** — One segment of the video. Has `id`, `type` (hook|body|outro), `voiceText` (Vietnamese, TTS-safe), `templateData` (the chosen template + content), and optional `sfx` override.

**Skill** — A Claude Code slash command (`.claude/skills/create-news-video/SKILL.md`) that orchestrates: fetch content → analyze → write script.json → run pipeline. The "creative" half of the architecture.

**Pipeline** — The deterministic Node/TS half (`src/pipeline.ts`): validate script.json → TTS per scene → concat voice with SFX → compose HTML → render with HyperFrames → output video.mp4. Same input always produces identical frames.

**voiceText** — Per-scene Vietnamese text for TTS. Dual role: (1) fed verbatim to LucyLab/ElevenLabs for speech synthesis — numbers MUST be spelled out phonetically ("năm phần trăm" not "5%"), and (2) scanned by the 3-tier SFX picker for semantic keywords to auto-select sound effects. This coupling is intentional — news writing naturally uses emotional language that maps to SFX categories.

**SFX picker** — 3-tier per-scene sound effect selection: (1) explicit `scene.sfx` override, (2) semantic keyword match on `voiceText` (Vietnamese + English), (3) template default category. Within a category, files are picked deterministically by hashing `scene.id`. Anti-repetition window (last 2 scenes) prevents back-to-back duplicates.
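
The three tiers can be sketched as follows. This is a hedged illustration of the selection order described above, not the project's picker: the catalogue, keyword list, template defaults, and the FNV-1a-style hash are all assumptions, since the source only says files are picked "by hashing `scene.id`".

```typescript
// Hypothetical scene shape for the picker.
interface SfxScene {
  id: string;
  voiceText: string;
  template: string;
  sfx?: { name: string };
}

// Assumed catalogue, keywords, and defaults; the real ones are not shown in this document.
const CATALOGUE: Record<string, string[]> = {
  cinematic: ["cinematic/impact", "cinematic/rise", "cinematic/boom"],
  alert: ["alert/ping", "alert/buzz", "alert/chime"],
  ui: ["ui/pop", "ui/click", "ui/swipe"],
};
const KEYWORDS: Array<[string, string]> = [
  ["cảnh báo", "alert"],
  ["warning", "alert"],
  ["ra mắt", "cinematic"],
];
const TEMPLATE_DEFAULT: Record<string, string> = { hook: "cinematic", callout: "alert" };

// FNV-1a-style 32-bit hash over UTF-16 code units (an assumed choice of hash).
function hashId(id: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < id.length; i++) {
    h ^= id.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// `recent` holds the SFX names chosen for earlier scenes, oldest first.
function pickSfx(scene: SfxScene, recent: string[]): string {
  // Tier 1: an explicit override always wins.
  if (scene.sfx) return scene.sfx.name;

  // Tier 2: semantic keyword match on voiceText; tier 3: template default category.
  const text = scene.voiceText.toLowerCase();
  const hit = KEYWORDS.find(([keyword]) => text.includes(keyword));
  const category = hit ? hit[1] : TEMPLATE_DEFAULT[scene.template] ?? "ui";
  const files = CATALOGUE[category] ?? [];
  if (files.length === 0) return "";

  // Deterministic start index from the scene id, then skip anything used in the last 2 scenes.
  const start = hashId(scene.id) % files.length;
  const window = recent.slice(-2);
  for (let step = 0; step < files.length; step++) {
    const candidate = files[(start + step) % files.length];
    if (candidate !== undefined && !window.includes(candidate)) return candidate;
  }
  return files[start] ?? "";
}
```

Hashing the id instead of calling a random generator keeps the pick reproducible across runs, which matches the pipeline's "same input, identical output" property.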

**Channel** — The brand identity: "Công nghệ 24h". Appears on the outro card and can be customized via `metadata.channel`.

**Doc maintenance** — SKILL.md is the authoritative document (it's what Claude reads). The design spec (`docs/superpowers/specs/`) is a pre-implementation artifact and may drift. Code is the implementation but the skill file defines the contract Claude follows. When they diverge, SKILL.md wins — update it first, then align code to match.

**Template selection** — Claude picks templates per scene based on content signals (the "When it's picked" column in README), not randomly. Hook always first, outro always last. Body templates match the story beat: a stat → `stat-hero`, a comparison → `comparison`, a list → `feature-list`, a warning → `callout`. Following content signals naturally produces variety — no mechanical "don't repeat" rule needed.

**Template count** — 6 templates are implemented (hook, comparison, stat-hero, feature-list, callout, outro). README lists 6 more (quote-card, icon-grid, timeline, big-text, chart-bars, kinetic-quote) — these are documented aspirations, planned for future implementation. SKILL.md should only reference the 6 that actually render.

**Dashboard** — The web UI + HTTP server at `localhost:4317` (`src/server.ts` + `src/ui/`). Browses outputs, triggers video generation from an article URL, streams job progress via SSE. The third architectural component alongside Skill and Pipeline.

**Job** — An async process kicked off by the dashboard. Has an id, status (`running` | `success` | `failed`), logs, and an SSE event stream. Only one job runs at a time (V1). A pipeline job runs the full generate+render flow; a generate job produces script.json via LLM first, then chains into pipeline.
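
The single-job constraint can be sketched with a small in-memory runner. This is an assumption-heavy illustration, not the code in `src/server.ts`: the `JobRunner` class, the `job-N` id format, and the method names are hypothetical, and the SSE event stream is left out.

```typescript
type JobStatus = "running" | "success" | "failed";

// Hypothetical in-memory job record; the real Job also exposes an SSE event stream.
interface Job {
  id: string;
  kind: "pipeline" | "generate";
  status: JobStatus;
  logs: string[];
}

class JobRunner {
  private current: Job | null = null;
  private counter = 0;

  // Starts a job, refusing if another one is still running (V1: one job at a time).
  start(kind: Job["kind"]): Job {
    if (this.current !== null && this.current.status === "running") {
      throw new Error(`job ${this.current.id} is still running`);
    }
    this.counter += 1;
    this.current = { id: `job-${this.counter}`, kind, status: "running", logs: [] };
    return this.current;
  }

  // Appends a log line to the current job, if there is one.
  log(line: string): void {
    this.current?.logs.push(line);
  }

  // Marks the current job as finished so that a new one may start.
  finish(ok: boolean): void {
    if (this.current !== null) {
      this.current.status = ok ? "success" : "failed";
    }
  }
}
```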

**Generate** — The LLM-powered step that turns an article URL into `script.json`. The creative half (formerly exclusive to the Skill slash command), now callable from the dashboard via `POST /api/generate`. The server reads SKILL.md as the system prompt and provides a `web_fetch` tool so the LLM can fetch the article. Supports Anthropic, OpenAI, and DeepSeek providers via `LLM_PROVIDER` env var.

36 changes: 36 additions & 0 deletions Dockerfile
@@ -0,0 +1,36 @@
# ── Build stage ──────────────────────────────────────────────────────────
FROM node:26-bookworm-slim AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
RUN npm prune --omit=dev

# ── Runtime stage ─────────────────────────────────────────────────────────
FROM node:26-bookworm-slim

RUN apt-get update \
&& apt-get install -y --no-install-recommends ca-certificates ffmpeg \
&& rm -rf /var/lib/apt/lists/*

WORKDIR /app

COPY --from=build /app/package*.json ./
COPY --from=build /app/node_modules ./node_modules
COPY --from=build /app/dist ./dist
COPY --from=build /app/assets ./assets
COPY --from=build /app/src ./src

RUN groupadd -r appuser && useradd -r -g appuser -d /app appuser \
&& chown -R appuser:appuser /app
USER appuser

ENV HOST=0.0.0.0 \
PORT=4317 \
PUBLIC_BASE_PATH=/news-video-creating \
PUBLIC_DEMO_MODE=1

EXPOSE 4317

CMD ["node", "dist/server.js"]
2 changes: 1 addition & 1 deletion README.md
@@ -261,7 +261,7 @@ Open `.env.local` and pick **one of two providers**:

```env
TTS_PROVIDER=lucylab
VIETNAMESE_API_KEY=sk_live_xxxxxxxxxxxxxxxxxxxx
VIETNAMESE_API_KEY=your_lucylab_api_key_here
VIETNAMESE_VOICEID=22charvoiceiduuidhere
```
