From dcf72d8fcf8dfc6249e8a294052df728f7d76326 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Miguel=20=C3=81ngel?= <miguel.sierra@heygen.com>
Date: Fri, 15 May 2026 13:16:35 -0700
Subject: [PATCH 1/3] feat(skills): add pr-to-hyperframes visual walkthrough skill
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

New skill that generates short video walkthroughs of PR changes and embeds them in the PR description. Detects visual/UI changes in the diff and composes a before/after or feature walkthrough using HyperFrames.

Includes ambient suggestion — when creating a PR with visual changes, the skill proactively offers to generate a walkthrough video so reviewers can see changes in motion instead of reading diffs.
---
 CLAUDE.md                         |   2 +-
 skills/pr-to-hyperframes/SKILL.md | 188 ++++++++++++++++++++++++++++++
 2 files changed, 189 insertions(+), 1 deletion(-)
 create mode 100644 skills/pr-to-hyperframes/SKILL.md

diff --git a/CLAUDE.md b/CLAUDE.md
index 4c8b6990a..b8fa6f199 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -67,4 +67,4 @@ will not match CI. Use it only for local-only experimentation.
 ## Skills
 
-Composition authoring (not repo development) is guided by skills installed via `npx skills add heygen-com/hyperframes`. See `skills/` for source. Invoke `/hyperframes`, `/hyperframes-cli`, `/hyperframes-registry`, `/tailwind`, or `/gsap` when authoring compositions. Use `/tailwind` for projects created with `hyperframes init --tailwind` so agents follow the pinned Tailwind v4 browser-runtime contract instead of Studio's Tailwind v3 setup. Use `/animejs`, `/css-animations`, `/lottie`, `/three`, or `/waapi` when a composition uses those first-party runtime adapters. Invoke `/hyperframes-media` for asset preprocessing (TTS narration, audio/video transcription, background removal for transparent overlays) — these commands have their own skill so the CLI skill stays focused on the dev loop.
When a user provides a website URL and wants a video, invoke `/website-to-hyperframes` — it runs the full 7-step capture-to-video pipeline. +Composition authoring (not repo development) is guided by skills installed via `npx skills add heygen-com/hyperframes`. See `skills/` for source. Invoke `/hyperframes`, `/hyperframes-cli`, `/hyperframes-registry`, `/tailwind`, or `/gsap` when authoring compositions. Use `/tailwind` for projects created with `hyperframes init --tailwind` so agents follow the pinned Tailwind v4 browser-runtime contract instead of Studio's Tailwind v3 setup. Use `/animejs`, `/css-animations`, `/lottie`, `/three`, or `/waapi` when a composition uses those first-party runtime adapters. Invoke `/hyperframes-media` for asset preprocessing (TTS narration, audio/video transcription, background removal for transparent overlays) — these commands have their own skill so the CLI skill stays focused on the dev loop. When a user provides a website URL and wants a video, invoke `/website-to-hyperframes` — it runs the full 7-step capture-to-video pipeline. When creating a PR with visual/UI changes, invoke `/pr-to-hyperframes` to generate a short walkthrough video and embed it in the PR description — reviewers see the changes in motion instead of reading diffs. diff --git a/skills/pr-to-hyperframes/SKILL.md b/skills/pr-to-hyperframes/SKILL.md new file mode 100644 index 000000000..dbb67b7d8 --- /dev/null +++ b/skills/pr-to-hyperframes/SKILL.md @@ -0,0 +1,188 @@ +--- +name: pr-to-hyperframes +description: | + Generate a short visual walkthrough video for a pull request and embed it in the PR description. Use when: (1) the user is about to create or has just created a PR with visual/UI changes, (2) the user asks for a PR demo video, walkthrough, or visual summary, (3) you detect the current branch has changes to UI components, styles, layouts, or frontend code and a PR is being created. 
Triggers on: "create a PR video", "add a walkthrough", "make a demo for this PR", "record the changes", or when `gh pr create` is about to run on a branch with visual diffs. +--- + +# PR to HyperFrames + +Generate a short video walkthrough of a pull request's visual changes and embed it in the PR body. Reviewers see the changes in motion instead of reading diffs — faster reviews, fewer misunderstandings. + +## When to use + +**Explicit invocation:** + +- User says "make a PR video", "add a walkthrough video to my PR", "record a demo of these changes" + +**Ambient suggestion (proactive):** + +- You're about to run `gh pr create` or the user asks you to open a PR +- The branch diff touches visual files (see detection rules below) +- Suggest: _"This PR has visual changes — want me to generate a quick HyperFrames walkthrough video to embed in the description?"_ +- If the user declines, proceed with the normal PR. Never push. + +## Detection rules + +A diff counts as "visual" if it touches any of: + +- `*.tsx`, `*.jsx`, `*.vue`, `*.svelte` files that contain JSX/template markup (not pure logic files) +- `*.css`, `*.scss`, `*.less`, `*.module.css`, `*.styled.*` +- `*.html` files +- Image assets (`*.png`, `*.jpg`, `*.svg`, `*.gif`, `*.webp`) +- Tailwind config, theme files, design tokens +- Storybook stories (`*.stories.*`) +- Component library files + +**Skip suggestion** if the diff is purely: + +- Backend/API changes, migrations, configs +- Test files only +- Documentation only +- Dependency bumps + +--- + +## Workflow + +### Step 1: Analyze the diff + +```bash +git diff main...HEAD --stat +git diff main...HEAD --name-only +``` + +Identify: + +1. Which files changed and what kind of changes (new component, restyled existing, layout shift, new page) +2. The narrative: what's the story of this PR in 10-15 seconds? +3. Key visual moments worth highlighting + +Read the changed files to understand the actual UI changes. Don't guess from filenames. 
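The detection rules above can be sketched as a small filename classifier. This is an illustrative helper, not part of the skill: the function names are hypothetical, the pattern lists are simplified, and a faithful version would also inspect `*.tsx`/`*.jsx` contents for JSX/template markup rather than trusting extensions alone.

```javascript
// Sketch of the detection rules. Hypothetical helpers, not part of the skill.
// Note: the real rule also requires markup inside component files; this
// version classifies by filename only.
const VISUAL_PATTERNS = [
  /\.(tsx|jsx|vue|svelte)$/,     // component/template files (contents not checked here)
  /\.(css|scss|less)$/,          // stylesheets, including *.module.css
  /\.styled\.[^.]+$/,            // *.styled.* files
  /\.html$/,
  /\.(png|jpe?g|svg|gif|webp)$/, // image assets
  /tailwind\.config\./,          // Tailwind config / theme files
  /\.stories\.[^.]+$/,           // Storybook stories
];

const NON_VISUAL_ONLY = [
  /\.(test|spec)\.[^.]+$/,       // test files
  /\.md$/,                       // documentation
  /(^|\/)(package-lock\.json|yarn\.lock|bun\.lockb)$/, // dependency bumps
];

// A diff counts as "visual" if any changed file matches a visual pattern.
function isVisualDiff(changedFiles) {
  return changedFiles.some((f) => VISUAL_PATTERNS.some((p) => p.test(f)));
}

// Suggest a walkthrough only when the diff is visual and is not purely
// tests, docs, or dependency churn.
function shouldSuggestWalkthrough(changedFiles) {
  if (!isVisualDiff(changedFiles)) return false;
  return !changedFiles.every((f) => NON_VISUAL_ONLY.some((p) => p.test(f)));
}
```

In ambient mode, `shouldSuggestWalkthrough` gates the proactive offer: a branch that only touches tests, docs, or lockfiles stays silent even if a file extension looks visual.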
+
+### Step 2: Capture before/after states
+
+If the project has a dev server or Storybook:
+
+**Before state** — capture from `main`:
+
+```bash
+git stash # if needed
+git checkout main
+# start dev server, capture screenshots/recordings of affected pages
+npx hyperframes browser capture --url <dev-url> --output before/
+git checkout - # back to feature branch
+git stash pop # if needed
+```
+
+**After state** — capture from the feature branch:
+
+```bash
+# start dev server, capture screenshots/recordings of affected pages
+npx hyperframes browser capture --url <dev-url> --output after/
+```
+
+If no dev server is available, compose the video from the diff itself — show code snippets, annotated screenshots, or architectural diagrams. A code-walkthrough video still beats a wall of diff text.
+
+### Step 3: Compose the video
+
+Initialize and build the walkthrough composition:
+
+```bash
+npx hyperframes init pr-walkthrough --non-interactive
+```
+
+**Read:** The `hyperframes` skill (load it for composition rules).
+
+Build a composition that tells the PR story. Typical structure:
+
+| Beat | Duration | Content |
+| ------- | -------- | --------------------------------------------------- |
+| Title | 2-3s | PR title + one-liner description |
+| Context | 2-3s | What area of the app changed (screenshot/highlight) |
+| Before | 3-4s | Previous state (if available) |
+| After | 3-4s | New state with annotations pointing out changes |
+| Summary | 2s | Key takeaway + PR number |
+
+Adapt the structure to what makes sense. A CSS-only fix might just be a before/after split. A new feature might need a full walkthrough. A layout change might use an animated overlay.
+
+**Design guidelines:**
+
+- Keep it under 20 seconds. Reviewers are busy.
+- Use the project's brand colors if `design.md` exists, otherwise use a clean neutral palette.
+- Annotate changes — arrows, highlights, zoom-ins on the specific things that changed.
+- End card should read: `PR #<number> — <title>`
+
+### Step 4: Render
+
+```bash
+cd pr-walkthrough
+npx hyperframes lint
+npx hyperframes render -o ../pr-walkthrough.mp4
+```
+
+### Step 5: Upload and embed
+
+Upload the video and embed it in the PR body:
+
+```bash
+# Upload to a public host or use GitHub's drag-drop
+# Then add to PR body:
+gh pr edit <number> --body "$(gh pr view <number> --json body -q .body)
+
+## Visual Walkthrough
+
+https://user-images.githubusercontent.com/<video-url>
+
+<sub>Walkthrough generated with [HyperFrames](https://github.com/nichochar/hyperframes-oss) — write HTML, render video.</sub>
+"
+```
+
+If creating a new PR, include the video section in the initial `gh pr create --body`.
+
+The attribution line is a single `<sub>` tag at the end of the walkthrough section. It links to the repo — useful for reviewers who want to make their own walkthrough videos.
+
+---
+
+## Composition tips
+
+### Before/after split
+
+For style changes, use a vertical or horizontal split with a wipe transition:
+
+- Left/top = before, right/bottom = after
+- Animate a divider line sweeping across to reveal the change
+- Label each side clearly
+
+### Feature walkthrough
+
+For new features, simulate user interaction:
+
+- Show the page loading
+- Highlight the new element with a pulse or glow
+- Show the interaction flow (click → result)
+- Use cursor animation to guide the eye
+
+### Code-only fallback
+
+When no UI can be captured:
+
+- Show the key files changed (syntax-highlighted code blocks)
+- Highlight the specific lines that changed (green for additions, red for removals)
+- Zoom into the important parts
+- Add brief text annotations explaining the change
+
+---
+
+## Examples
+
+**Simple CSS fix:**
+
+> 5-second video: before screenshot → wipe transition → after screenshot → "Fixed padding on mobile nav — PR #142"
+
+**New component:**
+
+> 12-second video: title card → component in isolation → component in context → interaction demo → end card
+**Refactor with visual changes:** + +> 15-second video: title card → 3 before/after pairs cycling through affected pages → summary of what changed → end card From 35869e1f9da413a05bb93b5cd9e9f544c884a576 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Miguel=20=C3=81ngel?= <miguel.sierra@heygen.com> Date: Fri, 15 May 2026 13:55:58 -0700 Subject: [PATCH 2/3] feat(skills): add pr-to-hyperframes video walkthrough skill MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Full pipeline for generating narrated video walkthroughs of PRs: - SKILL.md with 5-step workflow (analyze → narrate → TTS → manifest → render) - build.mjs: manifest-driven HTML composition with syntax-highlighted diffs, code slides, captions from whisper transcripts, and GSAP timeline - render.sh: full pipeline (assets → whisper → build → hyperframes → ffmpeg) - generate-audio.sh: per-segment Gemini TTS with validation and silence trimming - make-video.sh: fallback static-slide assembler Branding is auto-detected from the repo (package.json, design.md, git remote, logo files) — the only HyperFrames mention is a subtle attribution line in the outro and PR body. This makes the skill generic across any project. Includes ambient suggestion mode: when creating a PR with visual changes, the skill proactively offers to generate a walkthrough video. 
--- skills/pr-to-hyperframes/.gitignore | 2 + skills/pr-to-hyperframes/SKILL.md | 461 +++++++--- .../scripts/generate-audio.sh | 263 ++++++ .../pr-to-hyperframes/scripts/make-video.sh | 83 ++ skills/pr-to-hyperframes/video/.gitignore | 6 + skills/pr-to-hyperframes/video/build.mjs | 838 ++++++++++++++++++ .../pr-to-hyperframes/video/hyperframes.json | 9 + skills/pr-to-hyperframes/video/meta.json | 4 + skills/pr-to-hyperframes/video/render.sh | 155 ++++ 9 files changed, 1722 insertions(+), 99 deletions(-) create mode 100644 skills/pr-to-hyperframes/.gitignore create mode 100755 skills/pr-to-hyperframes/scripts/generate-audio.sh create mode 100755 skills/pr-to-hyperframes/scripts/make-video.sh create mode 100644 skills/pr-to-hyperframes/video/.gitignore create mode 100644 skills/pr-to-hyperframes/video/build.mjs create mode 100644 skills/pr-to-hyperframes/video/hyperframes.json create mode 100644 skills/pr-to-hyperframes/video/meta.json create mode 100755 skills/pr-to-hyperframes/video/render.sh diff --git a/skills/pr-to-hyperframes/.gitignore b/skills/pr-to-hyperframes/.gitignore new file mode 100644 index 000000000..6c7148c93 --- /dev/null +++ b/skills/pr-to-hyperframes/.gitignore @@ -0,0 +1,2 @@ +tmp/ +out/ diff --git a/skills/pr-to-hyperframes/SKILL.md b/skills/pr-to-hyperframes/SKILL.md index dbb67b7d8..bf4fc4769 100644 --- a/skills/pr-to-hyperframes/SKILL.md +++ b/skills/pr-to-hyperframes/SKILL.md @@ -1,31 +1,78 @@ --- name: pr-to-hyperframes description: | - Generate a short visual walkthrough video for a pull request and embed it in the PR description. Use when: (1) the user is about to create or has just created a PR with visual/UI changes, (2) the user asks for a PR demo video, walkthrough, or visual summary, (3) you detect the current branch has changes to UI components, styles, layouts, or frontend code and a PR is being created. 
Triggers on: "create a PR video", "add a walkthrough", "make a demo for this PR", "record the changes", or when `gh pr create` is about to run on a branch with visual diffs. + Create a narrated video walkthrough of a pull request with code slides, diff visualization, and audio narration. Pulls branding from the repo automatically. Use when: (1) the user asks for a PR walkthrough, PR video, or demo video, (2) you're about to create a PR with visual/UI changes and want to suggest a video, (3) the user says "make a PR video", "add a walkthrough", "record a demo for this PR". Triggers on: PR creation with visual diffs, explicit walkthrough requests, or when `gh pr create` is about to run on a branch with UI changes. --- -# PR to HyperFrames +# PR walkthrough video -Generate a short video walkthrough of a pull request's visual changes and embed it in the PR body. Reviewers see the changes in motion instead of reading diffs — faster reviews, fewer misunderstandings. +Create a narrated walkthrough video for a pull request. This provides the same benefit as a Loom video from the PR author — walking through the code changes, explaining what was done and why, so reviewers understand the PR quickly. -## When to use +**Input:** A GitHub pull request URL, PR number, or the current branch (auto-detects the PR). -**Explicit invocation:** +**Output:** An MP4 video at 1280x720 (30 fps) with audio narration, whisper-timed captions, and branded intro/outro slides, saved to `out/pr-<number>-walkthrough.mp4`. -- User says "make a PR video", "add a walkthrough video to my PR", "record a demo of these changes" +All intermediate files (audio, manifest, scripts) go in `tmp/pr-<number>/` relative to this skill directory. This directory is gitignored. Only the final `.mp4` lives at `out/`. -**Ambient suggestion (proactive):** +Run commands that reference `./scripts` or `./video` from this skill directory. 
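The accepted inputs (PR URL, bare number, or current branch) can be normalized with a small parser. `parsePrRef` is a hypothetical helper shown for illustration; the current-branch auto-detect path would shell out to `gh pr view --json number` and is not shown here.

```javascript
// Sketch: normalize the PR reference the skill accepts as input.
// Returns { repo, number } for URLs, { repo: null, number } for bare numbers,
// and null when the caller should fall back to branch auto-detection.
function parsePrRef(input) {
  const url = input.match(/github\.com\/([^/]+)\/([^/]+)\/pull\/(\d+)/);
  if (url) return { repo: `${url[1]}/${url[2]}`, number: Number(url[3]) };
  const bare = input.match(/^#?(\d+)$/);
  if (bare) return { repo: null, number: Number(bare[1]) };
  return null; // not a URL or number: auto-detect from the current branch
}
```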
-- You're about to run `gh pr create` or the user asks you to open a PR -- The branch diff touches visual files (see detection rules below) -- Suggest: _"This PR has visual changes — want me to generate a quick HyperFrames walkthrough video to embed in the description?"_ -- If the user declines, proceed with the normal PR. Never push. +## Branding -## Detection rules +**The skill auto-detects branding from the repo.** It never hardcodes project-specific colors, logos, or names. At the start of every run, resolve branding: -A diff counts as "visual" if it touches any of: +1. **Project name** — read `package.json` → `name` field (strip `@scope/` prefix). Fallback: git remote repo name. Fallback: directory name. +2. **Colors** — read `design.md` or `DESIGN.md` if it exists (check both casings). Extract primary color, background color, accent color. Fallback: neutral palette (`#09090b` text on `#ffffff` background, `#3b82f6` accent). +3. **Fonts** — from `design.md` if present. Fallback: `"Geist"` for body, `"Geist Mono"` for code. +4. **Logo** — look for `logo.svg` or `logo.png` in repo root, `public/`, `assets/`, `.github/`. If found, use it in intro/outro. If not found, use the project name as text. +5. **Repo identifier** — parse `git remote get-url origin` for the `org/repo` slug (e.g., `acme/widget`). 
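The fallback chain above can be sketched as a pure function. `resolveBranding` is a hypothetical helper that takes pre-read inputs (the `package.json` name, colors extracted from `design.md`, the git remote URL) instead of touching the filesystem; caption colors and `design.md` font overrides are omitted for brevity.

```javascript
// Sketch of branding resolution. Inputs are pre-read by the caller;
// any missing input falls through to the next step in the chain.
const NEUTRAL = { text: "#09090b", background: "#ffffff", accent: "#3b82f6" };

function resolveBranding({ pkgName, remoteUrl, dirName, designColors, logoPath }) {
  // 5. Repo identifier: parse the org/repo slug out of the remote URL.
  const slug = remoteUrl?.match(/[:/]([^/:]+)\/([^/]+?)(?:\.git)?$/);
  // 1. Project name: package.json name (minus @scope/), else repo name, else dir.
  const name = pkgName?.replace(/^@[^/]+\//, "") ?? slug?.[2] ?? dirName;
  return {
    name,
    org: slug?.[1] ?? null,
    repo: slug ? `${slug[1]}/${slug[2]}` : null,
    logo: logoPath ?? null,                       // 4. text fallback when null
    colors: designColors ?? NEUTRAL,              // 2. design.md, else neutral palette
    fonts: { body: "Geist", mono: "Geist Mono" }, // 3. defaults
  };
}
```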
-- `*.tsx`, `*.jsx`, `*.vue`, `*.svelte` files that contain JSX/template markup (not pure logic files) +Pass these values to `build.mjs` via a `branding` key in the manifest: + +```json +{ + "branding": { + "name": "widget", + "org": "acme", + "repo": "acme/widget", + "logo": null, + "colors": { + "text": "#09090b", + "background": "#ffffff", + "accent": "#3b82f6", + "caption": "#ffd800", + "captionBg": "#09090b" + }, + "fonts": { + "body": "Geist", + "mono": "Geist Mono" + } + } +} +``` + +The **outro slide** shows the project logo/name and a subtle attribution line: + +``` +[Project Logo or Name] +PR Walkthrough · #NNN +Made with HyperFrames +``` + +The **footer bar** shows the project mark + name on the left, and `PR #NNN` on the right. The **PR body** attribution reads: + +```html +<sub>Walkthrough by [HyperFrames](https://hyperframes.dev) — write HTML, render video.</sub> +``` + +This is the only HyperFrames mention. Everything else is the repo's own branding. + +## When to suggest (ambient mode) + +When you're about to run `gh pr create` or the user asks you to open a PR, check if the branch diff touches visual files: + +**Visual file patterns:** + +- `*.tsx`, `*.jsx`, `*.vue`, `*.svelte` with JSX/template markup - `*.css`, `*.scss`, `*.less`, `*.module.css`, `*.styled.*` - `*.html` files - Image assets (`*.png`, `*.jpg`, `*.svg`, `*.gif`, `*.webp`) @@ -33,156 +80,372 @@ A diff counts as "visual" if it touches any of: - Storybook stories (`*.stories.*`) - Component library files -**Skip suggestion** if the diff is purely: +**Skip suggestion** if the diff is purely backend, tests, docs, or dependency bumps. -- Backend/API changes, migrations, configs -- Test files only -- Documentation only -- Dependency bumps +If visual changes are detected, suggest: _"This PR has visual changes — want me to generate a quick walkthrough video to embed in the description?"_ ---- +If the user declines, proceed with the normal PR. Never push. 
+ +## Philosophy + +**This is a walkthrough from the author's perspective.** The goal is the same as if the PR author sat down with a reviewer and walked them through the changes — showing specific code, explaining what changed and why, in an order that builds understanding. + +This means: + +- **The narration drives everything.** Write the walkthrough narration first, as a continuous explanation of the PR. Then figure out what should be on screen at each moment. +- **Show the code.** The default visual is a code diff or source file. Text slides are the exception (intro, brief transitions, outro), not the rule. +- **Walk through changes in a logical order**, not necessarily file order or commit order — always anchored to concrete code. +- **Explain the "why", not just the "what".** The code on screen shows what changed. The narration adds the reasoning. ## Workflow -### Step 1: Analyze the diff +### Step 1: Understand the PR + +Read the PR commits, diff, and description. Understand the narrative arc: + +- What problem does this solve? +- What's the approach? +- What are the key mechanisms? ```bash -git diff main...HEAD --stat -git diff main...HEAD --name-only +gh pr view <number> --json title,body,commits +git log main..HEAD --oneline +git diff main..HEAD --stat ``` -Identify: +**Skip generated files.** When reading the diff, ignore auto-generated files: -1. Which files changed and what kind of changes (new component, restyled existing, layout shift, new page) -2. The narrative: what's the story of this PR in 10-15 seconds? -3. Key visual moments worth highlighting +- Lockfiles (`yarn.lock`, `package-lock.json`, `bun.lockb`) +- Generated docs, API reports, changelogs +- Build output, bundled assets, source maps +- Snapshots, schema dumps -Read the changed files to understand the actual UI changes. Don't guess from filenames. +If unsure whether a file is generated, check for a "DO NOT EDIT" header. Filter these out when picking which files to feature. 
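The generated-file heuristics above can be sketched as follows. `isGeneratedFile` is a hypothetical helper, and `firstLines` stands for the first few lines of the file, which the caller reads separately.

```javascript
// Sketch of the generated-file filter: known lockfile names, build-output
// suffixes, then a "DO NOT EDIT" style header check as the tiebreaker.
const GENERATED_NAMES = /(^|\/)(yarn\.lock|package-lock\.json|bun\.lockb)$/;
const GENERATED_SUFFIXES = /\.(min\.js|map|snap)$/; // bundles, source maps, snapshots

function isGeneratedFile(path, firstLines = "") {
  if (GENERATED_NAMES.test(path) || GENERATED_SUFFIXES.test(path)) return true;
  // When in doubt, look for a generated-code marker near the top of the file.
  return /do not edit|@generated|autogenerated/i.test(firstLines);
}
```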
-### Step 2: Capture before/after states +**Resolve branding.** Read `package.json`, `design.md`, check for logos, parse git remote. Build the `branding` object for the manifest. -If the project has a dev server or Storybook: +### Step 2: Write the narration -**Before state** — capture from `main`: +Write the narration as continuous text, broken into logical segments. Each segment is a beat of the walkthrough. Save this as `tmp/pr-<number>/SCRIPT.md`. + +The narration should read like the author explaining the PR to a colleague: "So here's what we're doing... The core problem was X... The approach I took was Y... If you look at this function here..." + +Structure: intro → context/problem → code walkthrough → summary. See **Script structure** below. + +Avoid redundancy between intro and first content segment. + +### Step 3: Generate audio and timestamps + +Generate per-segment audio clips with one TTS call per segment: ```bash -git stash # if needed -git checkout main -# start dev server, capture screenshots/recordings of affected pages -npx hyperframes browser capture --url <dev-url> --output before/ -git checkout - # back to feature branch -git stash pop # if needed +./scripts/generate-audio.sh narration.json tmp/pr-<number>/ ``` -**After state** — capture from the feature branch: +**API key:** Sourced from `.env` file (`GEMINI_API_KEY`). -```bash -# start dev server, capture screenshots/recordings of affected pages -npx hyperframes browser capture --url <dev-url> --output after/ +#### Narration JSON format + +```json +{ + "style": "Read the following walkthrough narration in a calm, steady, professional tone. Speak at a measured pace as if the author of a pull request were walking a colleague through the code changes.", + "voice": "Iapetus", + "slides": [ + "This pull request adds group-aware binding resolution...", + "The core problem was that arrow bindings broke when...", + "If you look at the getBindingTarget method..." 
+ ] +} ``` -If no dev server is available, compose the video from the diff itself — show code snippets, annotated screenshots, or architectural diagrams. A code-walkthrough video still beats a wall of diff text. +- **`style`** — Voice persona and pacing instructions. Keep it short and specific. +- **`voice`** — Gemini voice name (default: `Iapetus`). +- **`slides`** — Array of narration text, one entry per segment. + +#### How it works + +1. For each segment, the script builds a prompt: style preamble + segment text. +2. One API call to `gemini-2.5-pro-tts` per segment generates a WAV clip directly. +3. Each clip is validated (duration sanity check vs word count) and retried automatically if the output is bad. +4. Leading/trailing silence is trimmed from each clip. + +**Output:** Per-segment audio clips (`audio-00.wav`, ...) and a `durations.json` file mapping each audio filename to its duration in seconds. + +**Dependencies:** ffmpeg / ffprobe. No Python packages required beyond the standard library. + +**Do NOT use** `[pause long]` or `[pause medium]` markup tags — the model may read them aloud literally. + +### Step 4: Write the manifest + +The manifest is a JSON file that describes every slide in the video. It bridges the narration/audio step and the hyperframes renderer. + +Read the `durations.json` from step 3 to get the duration (in seconds) for each audio clip. 
Then write a `manifest.json` alongside the audio files: + +```json +{ + "pr": 142, + "branding": { + "name": "widget", + "org": "acme", + "repo": "acme/widget", + "logo": null, + "colors": { + "text": "#09090b", + "background": "#ffffff", + "accent": "#3b82f6", + "caption": "#ffd800", + "captionBg": "#09090b" + }, + "fonts": { "body": "Geist", "mono": "Geist Mono" } + }, + "slides": [ + { + "type": "intro", + "title": "Fix canvas z-index layering #142", + "date": "May 15, 2026", + "audio": "audio-00.wav", + "durationInSeconds": 3.2 + }, + { + "type": "diff", + "filename": "packages/editor/editor.css", + "language": "css", + "diff": "@@ -12,7 +12,7 @@\n --z-canvas: 100;\n- --z-canvas-front: 600;\n+ --z-canvas-front: 250;", + "audio": "audio-01.wav", + "durationInSeconds": 25.8 + }, + { + "type": "code", + "filename": "packages/editor/src/Editor.ts", + "language": "typescript", + "code": "function getZIndex() {\n return 250\n}", + "audio": "audio-02.wav", + "durationInSeconds": 13.5 + }, + { + "type": "text", + "title": "Summary", + "subtitle": "Moved canvas-in-front from z-index 600 to 250.", + "audio": "audio-07.wav", + "durationInSeconds": 7.4 + }, + { + "type": "outro", + "durationInSeconds": 3 + } + ] +} +``` + +#### Slide types + +| Type | Required fields | Description | +| --------- | ------------------------------------------------------------ | ------------------------------- | +| `intro` | `title`, `date`, `audio`, `durationInSeconds` | Project name + title + date | +| `diff` | `filename`, `language`, `diff`, `audio`, `durationInSeconds` | Syntax-highlighted unified diff | +| `code` | `filename`, `language`, `code`, `audio`, `durationInSeconds` | Syntax-highlighted source code | +| `text` | `title`, `audio`, `durationInSeconds` | Title + optional `subtitle` | +| `list` | `title`, `items`, `audio`, `durationInSeconds` | Title + numbered items | +| `image` | `src`, `audio`, `durationInSeconds` | Pre-rendered image (fallback) | +| `segment` | `title`, 
`durationInSeconds` | Silent title card between parts | +| `outro` | `durationInSeconds` | Project branding + attribution | + +#### Animated scroll with `focus` + +For longer diffs or code (more than ~30 lines), the renderer keeps the font at a readable 16px and uses an animated viewport that scrolls between focus points. Add a `focus` array to `diff` or `code` slides: + +```json +{ + "type": "diff", + "filename": "src/lib/Editor.ts", + "language": "typescript", + "diff": "... 60-line diff ...", + "focus": [ + { "line": 3, "at": 0 }, + { "line": 25, "at": 0.4 }, + { "line": 50, "at": 0.8 } + ], + "audio": "audio-03.wav", + "durationInSeconds": 30 +} +``` + +- **`line`** — The line number (0-indexed) to center on screen. +- **`at`** — When to arrive at this position, as a fraction of the slide's duration (0 = start, 1 = end). -### Step 3: Compose the video +**When to use focus:** Any diff or code slide with more than ~30 lines. +**When to omit focus:** Short diffs (<=30 lines) fit on screen and don't need scrolling. -Initialize and build the walkthrough composition: +#### Writing diff fields + +For `diff` slides, paste the **unified diff** for the relevant hunk(s) — the output of `git diff` for that section, including the `@@` hunk header and `+`/`-`/` ` line prefixes. The renderer parses these to apply green/red backgrounds. ```bash -npx hyperframes init pr-walkthrough --non-interactive +git diff main..HEAD -- path/to/file.ts ``` -**Read:** The `hyperframes` skill (load it for composition rules). +Include only the relevant hunks. Strip the `diff --git` and `---`/`+++` header lines — start from `@@`. + +#### Segment title slides -Build a composition that tells the PR story. Typical structure: +Insert a **`segment` slide** before each content segment to introduce it — except before the intro and context segments. Each segment slide is **3 seconds of silence** with the segment title centered. 
-| Beat | Duration | Content | -| ------- | -------- | --------------------------------------------------- | -| Title | 2-3s | PR title + one-liner description | -| Context | 2-3s | What area of the app changed (screenshot/highlight) | -| Before | 3-4s | Previous state (if available) | -| After | 3-4s | New state with annotations pointing out changes | -| Summary | 2s | Key takeaway + PR number | +```json +{ + "type": "segment", + "title": "State machine refactor", + "durationInSeconds": 3 +} +``` -Adapt the structure to what makes sense. A CSS-only fix might just be a before/after split. A new feature might need a full walkthrough. A layout change might use an animated overlay. +#### Segment title labels on code/diff slides -**Design guidelines:** +Add a `title` field to `code` and `diff` slides to show a small label in the top-left corner identifying the current segment. Use the same title as the preceding `segment` slide. -- Keep it under 20 seconds. Reviewers are busy. -- Use the project's brand colors if `design.md` exists, otherwise use a clean neutral palette. -- Annotate changes — arrows, highlights, zoom-ins on the specific things that changed. -- End card should read: `PR #<number> — <title>` +### Step 5: Render the video -### Step 4: Render +Run the `render.sh` script: ```bash -cd pr-walkthrough -npx hyperframes lint -npx hyperframes render -o ../pr-walkthrough.mp4 +./video/render.sh \ + tmp/pr-<number>/manifest.json \ + out/pr-<number>-walkthrough.mp4 ``` -### Step 5: Upload and embed +The script: -Upload the video and embed it in the PR body: +1. Copies referenced audio/image files into `video/assets/`. +2. Runs whisper transcription on each audio file → `video/transcripts/audio-NN.json` (idempotent). +3. Runs `build.mjs <manifest>` to generate `video/index.html` — a hyperframes composition with timed clips, GSAP timeline for transitions and code-focus pans, and captions derived from whisper transcripts. +4. 
Lints and renders 1920x1080 frames via `npx hyperframes render`. +5. Downscales to 1280x720 / 30fps and recompresses with ffmpeg (CRF 26 + AAC 96k). + +**Dependencies:** Node.js 22+, ffmpeg, Python 3. `hyperframes` is invoked via `npx --yes`. + +### Step 6: Embed in PR + +After rendering, embed the video in the PR body: ```bash -# Upload to a public host or use GitHub's drag-drop -# Then add to PR body: +# Add to existing PR: gh pr edit <number> --body "$(gh pr view <number> --json body -q .body) ## Visual Walkthrough -https://user-images.githubusercontent.com/<video-url> +<video src=\"out/pr-<number>-walkthrough.mp4\"></video> -<sub>Walkthrough generated with [HyperFrames](https://github.com/nichochar/hyperframes-oss) — write HTML, render video.</sub> +<sub>Walkthrough by [HyperFrames](https://hyperframes.dev) — write HTML, render video.</sub> " ``` -If creating a new PR, include the video section in the initial `gh pr create --body`. +Or include the video section in the initial `gh pr create --body` when creating a new PR. -The attribution line is a single `<sub>` tag at the end of the walkthrough section. It links to the repo — useful for reviewers who want to make their own walkthrough videos. +#### Caption sync via whisper ---- +Captions appear as colored text on a solid dark pill at the bottom. Start/end times come from word-level whisper transcripts grouped into 5-7 word chunks, breaking on natural pauses (>450ms gaps). Whisper may transcribe brand names phonetically — acceptable for captions. -## Composition tips +#### File size knobs -### Before/after split +Default targets ~30-60 MB for an 8-minute video. To tune: -For style changes, use a vertical or horizontal split with a wipe transition: +- `--crf <n>` in the ffmpeg step: 22 is near-lossless, 26 is default, 30+ is smaller. 
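The caption grouping described above can be sketched as follows. `groupCaptions` is a hypothetical stand-in for logic that lives in `build.mjs`, and the `{ word, start, end }` word shape (times in seconds) is an assumption about the whisper transcript format; the 5-word lower bound is not enforced in this sketch.

```javascript
// Sketch: group word-level whisper entries into caption chunks, breaking on
// natural pauses (> 450 ms gap) or at a 7-word cap.
function groupCaptions(words, { maxWords = 7, pauseGap = 0.45 } = {}) {
  const chunks = [];
  let current = [];
  for (const w of words) {
    const prev = current[current.length - 1];
    const longPause = prev && w.start - prev.end > pauseGap;
    if (current.length && (longPause || current.length >= maxWords)) {
      chunks.push(current);
      current = [];
    }
    current.push(w);
  }
  if (current.length) chunks.push(current);
  // Each chunk becomes one caption with the timing of its first/last word.
  return chunks.map((c) => ({
    text: c.map((x) => x.word).join(" "),
    start: c[0].start,
    end: c[c.length - 1].end,
  }));
}
```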
-- Left/top = before, right/bottom = after -- Animate a divider line sweeping across to reveal the change -- Label each side clearly +## File organization -### Feature walkthrough +``` +pr-to-hyperframes/ +├── SKILL.md # This file +├── scripts/ # CLI tools (checked in) +│ ├── generate-audio.sh # narration.json → per-slide WAVs + durations.json +│ └── make-video.sh # Static slide + audio assembly fallback +├── video/ # Hyperframes project (checked in) +│ ├── hyperframes.json # hyperframes config +│ ├── meta.json # project meta +│ ├── build.mjs # manifest.json → index.html composition +│ ├── render.sh # manifest.json → 720p MP4 (full pipeline) +│ ├── assets/ # Auto-populated at render time (gitignored) +│ ├── transcripts/ # Whisper word-level JSON (gitignored, cached) +│ └── renders/ # Intermediate 1080p renders (gitignored) +├── out/ # Final outputs (gitignored) +│ └── pr-XXXX-walkthrough.mp4 +└── tmp/ # Intermediate files (gitignored) + └── pr-XXXX/ + ├── SCRIPT.md # Narration script + ├── narration.json # Input to generate-audio.sh + ├── durations.json # Audio durations + ├── manifest.json # Input to render.sh + └── audio-XX.wav # Per-segment audio clips +``` -For new features, simulate user interaction: +## API configuration -- Show the page loading -- Highlight the new element with a pulse or glow -- Show the interaction flow (click → result) -- Use cursor animation to guide the eye +- **Gemini API key:** Stored as `GEMINI_API_KEY` in the project root `.env`. +- **TTS model:** `gemini-2.5-pro-tts` +- **TTS voice:** `Iapetus` (default) -### Code-only fallback +## Script structure -When no UI can be captured: +The walkthrough follows a consistent narrative arc. 8-12 segments total, with the vast majority showing code. 
-- Show the key files changed (syntax-highlighted code blocks) -- Highlight the specific lines that changed (green for additions, red for removals) -- Zoom into the important parts -- Add brief text annotations explaining the change +### Intro (1 segment) ---- +The intro card: project logo/name + PR title + date. The narration should be a single sentence framing what the PR does at a high level. + +Manifest slide type: `intro`. + +### Context (0-1 segments) + +Brief orientation before diving into code. What was the situation before this PR? What problem motivated the work? + +- Be concrete: "Arrow bindings broke when the target was inside a group" not "There were issues with bindings" +- Name the area of the codebase affected + +If context can be explained while showing the first piece of code, skip the standalone context segment. + +Manifest slide type: `text` or `diff`. + +### Code walkthrough (6-10 segments) + +The bulk of the video. Walk through actual code changes, showing diffs and files while explaining what was done and why. + +**Every segment should show code.** Use `diff` slides for changes and `code` slides for unchanged reference code. + +- **Name files and functions.** Every segment should reference at least one specific file or function. +- **Show the diff.** Use `git diff main..HEAD -- path/to/file` and extract relevant hunks. +- **Order by understanding, not by file.** Present changes in the order that builds comprehension. +- **Explain the "why", not just the "what".** +- **Skip boilerplate, but mention it.** "There are also some type exports added in `index.ts`." +- **Group related small changes.** If three files got the same one-line fix, one segment covers all three. + +### Summary (1 segment) + +Brief recap of what the PR accomplished. A sentence or two summarizing the change, mentioning known limitations or follow-up work. + +Manifest slide type: `text`. 
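Putting the slide types together, a minimal `manifest.json` sketch using the fields that `video/build.mjs` reads (`pr`, `branding`, and per-slide `type`, `title`, `filename`, `language`, `diff`, `focus`, `durationInSeconds`, `audio`; the PR number, file names, and diff content here are purely illustrative):

```json
{
  "pr": 1234,
  "branding": {
    "name": "Project",
    "repo": "org/project",
    "colors": { "accent": "#3b82f6" }
  },
  "slides": [
    { "type": "intro", "title": "Fix arrow bindings #1234", "date": "2026-05-15",
      "durationInSeconds": 6.5, "audio": "audio-00.wav" },
    { "type": "diff", "title": "Guard group ancestors", "filename": "src/BindingUtil.ts",
      "language": "ts",
      "diff": "@@ -41,3 +41,4 @@\n-const target = shape\n+const target = resolveAncestor(shape)",
      "focus": [{ "line": 0, "at": 0 }],
      "durationInSeconds": 22.1, "audio": "audio-01.wav" },
    { "type": "text", "title": "Bindings now survive grouping",
      "durationInSeconds": 8.0, "audio": "audio-02.wav" },
    { "type": "outro", "durationInSeconds": 3 }
  ]
}
```

Each narrated slide's `durationInSeconds` should come from `durations.json`, optionally with a small amount of padding for the fade transitions.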
-## Examples +### Outro (1 segment, silent) -**Simple CSS fix:** +The project logo/name, a subtle "Made with HyperFrames" line, 3 seconds of silence. -> 5-second video: before screenshot → wipe transition → after screenshot → "Fixed padding on mobile nav — PR #142" +Manifest slide type: `outro` with `durationInSeconds: 3`. -**New component:** +## Narration writing tips -> 12-second video: title card → component in isolation → component in context → interaction demo → end card +- **Be specific about code.** Say "In `BindingUtil.ts`, the `onAfterChange` handler now checks for group ancestors" — not "The binding system was updated." +- **Each segment = one change or closely related group.** +- **Write as the author.** "So the main thing here is..." or "The tricky part was..." are fine. +- **Avoid redundancy** between intro and first content segment. +- **Mention files that aren't shown.** If a PR touches 15 files but only 6 are interesting, briefly acknowledge the others. +- Aim for **5-7 minutes** total narration. 
-**Refactor with visual changes:** +## Checklist -> 15-second video: title card → 3 before/after pairs cycling through affected pages → summary of what changed → end card +- [ ] Resolve repo branding (name, colors, fonts, logo) +- [ ] Read all PR commits and understand the full diff +- [ ] Write narration in SCRIPT.md (8-12 segments) +- [ ] Generate per-segment audio (Iapetus voice) +- [ ] Read durations.json to get per-segment durations +- [ ] Write manifest.json with slide types, diffs/code, audio refs, and branding +- [ ] Render video with render.sh +- [ ] Verify final output: 1280x720 / 30 fps, audio synced, captions readable, outro present +- [ ] Embed video in PR body with HyperFrames attribution diff --git a/skills/pr-to-hyperframes/scripts/generate-audio.sh b/skills/pr-to-hyperframes/scripts/generate-audio.sh new file mode 100755 index 000000000..1f479045f --- /dev/null +++ b/skills/pr-to-hyperframes/scripts/generate-audio.sh @@ -0,0 +1,263 @@ +#!/bin/bash +# generate-audio.sh — Generate walkthrough narration audio from a JSON script. +# +# Generates one TTS call per segment, producing individual WAV clips directly. +# No chunking, alignment, or splitting needed. +# +# Usage: +# ./generate-audio.sh <script.json> [output-dir] +# +# Input JSON format: +# { +# "style": "Read in a calm, steady, professional tone...", +# "voice": "Iapetus", (optional, default: Iapetus) +# "slides": [ +# "Intro narration text...", +# "Problem slide narration...", +# "Approach narration...", +# ... +# ] +# } +# +# Output: +# <output-dir>/audio-00.wav, audio-01.wav, ... +# <output-dir>/durations.json +# +# Dependencies: +# ffmpeg / ffprobe +# +# Environment: +# GEMINI_API_KEY — required. Auto-sourced from .env if not set. 
+# +set -euo pipefail + +SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)" + +# --- Args --- +SCRIPT_JSON="${1:?Usage: generate-audio.sh <script.json> [output-dir]}" +OUTPUT_DIR="${2:-.}" + +# Resolve relative paths +[[ "$SCRIPT_JSON" != /* ]] && SCRIPT_JSON="$(pwd)/$SCRIPT_JSON" +[[ "$OUTPUT_DIR" != /* ]] && OUTPUT_DIR="$(pwd)/$OUTPUT_DIR" + +if [ ! -f "$SCRIPT_JSON" ]; then + echo "Error: ${SCRIPT_JSON} not found" + exit 1 +fi + +mkdir -p "$OUTPUT_DIR" + +PYTHON="python3" + +# --- API key --- +REPO_ROOT=$(git rev-parse --show-toplevel 2>/dev/null || echo ".") + +if [ -z "${GEMINI_API_KEY:-}" ]; then + if [ -f "${REPO_ROOT}/.env" ]; then + export $(grep '^GEMINI_API_KEY=' "${REPO_ROOT}/.env" | xargs) 2>/dev/null || true + fi +fi +GEMINI_API_KEY="${GEMINI_API_KEY:?Set GEMINI_API_KEY environment variable or add it to .env}" + +# --- Config --- +TTS_MODEL="gemini-2.5-pro-preview-tts" +TTS_ENDPOINT="https://generativelanguage.googleapis.com/v1beta/models/${TTS_MODEL}:generateContent" +SPEED=1.2 + +# --- Run everything in Python for reliability --- +"$PYTHON" - "$SCRIPT_JSON" "$OUTPUT_DIR" "$GEMINI_API_KEY" "$TTS_MODEL" "$TTS_ENDPOINT" "$SPEED" <<'PYTHON_SCRIPT' +import json, sys, os, subprocess, base64, urllib.request, urllib.error, re + +script_json = sys.argv[1] +output_dir = sys.argv[2] +api_key = sys.argv[3] +tts_model = sys.argv[4] +tts_endpoint = sys.argv[5] +speed = float(sys.argv[6]) + +MAX_RETRIES = 2 + +def api_call(endpoint, body_dict): + body = json.dumps(body_dict).encode() + req = urllib.request.Request( + f"{endpoint}?key={api_key}", + data=body, + headers={"Content-Type": "application/json"}, + method="POST", + ) + with urllib.request.urlopen(req) as resp: + return json.loads(resp.read()) + +# --- Load narration --- +with open(script_json) as f: + data = json.load(f) + +voice = data.get("voice", "Iapetus") +slides = data["slides"] +style = data.get("style", + "Read the following in a calm, steady, professional tone. 
" + "Speak at a measured pace.") + +word_count = sum(len(s.split()) for s in slides) +print(f"=== Generating narration audio ===") +print(f" Voice: {voice}") +print(f" Slides: {len(slides)}") +print(f" Words: {word_count}") +print() + +def call_tts(prompt_text): + response = api_call(tts_endpoint, { + "contents": [{"parts": [{"text": prompt_text}]}], + "generationConfig": { + "responseModalities": ["AUDIO"], + "speechConfig": { + "voiceConfig": { + "prebuiltVoiceConfig": { + "voiceName": voice + } + } + } + } + }) + + error_msg = response.get("error", {}).get("message", "") + if error_msg: + raise RuntimeError(f"TTS API error: {error_msg}") + + return base64.b64decode(response["candidates"][0]["content"]["parts"][0]["inlineData"]["data"]) + +def pcm_to_wav(pcm_bytes, out_wav): + pcm_tmp = out_wav + ".pcm" + with open(pcm_tmp, "wb") as f: + f.write(pcm_bytes) + subprocess.run([ + "ffmpeg", "-y", "-f", "s16le", "-ar", "24000", "-ac", "1", + "-i", pcm_tmp, "-af", f"atempo={speed}", "-ar", "48000", out_wav + ], capture_output=True, check=True) + os.remove(pcm_tmp) + +def get_duration(wav_path): + result = subprocess.run( + ["ffprobe", "-v", "error", "-show_entries", "format=duration", "-of", "csv=p=0", wav_path], + capture_output=True, text=True + ) + return float(result.stdout.strip()) + +def validate_duration(wav_path, word_count): + dur = get_duration(wav_path) + expected = word_count / 150 * 60 / speed + lower = expected * 0.3 + upper = expected * 3.0 + if word_count < 15: + return dur < 30, dur + return lower <= dur <= upper, dur + +# --- Generate one TTS call per segment --- +durations = {} + +for i, text in enumerate(slides): + num = f"{i:02d}" + out_path = os.path.join(output_dir, f"audio-{num}.wav") + wc = len(text.split()) + prompt = f"{style}\n\n{text}" + + ok = False + for attempt in range(MAX_RETRIES + 1): + try: + label = f" [{num}] " + ("" if attempt == 0 else f"(retry {attempt}) ") + print(f"{label}Generating ({wc} words)...", end=" ", flush=True) + 
pcm_data = call_tts(prompt) + pcm_to_wav(pcm_data, out_path) + ok, dur = validate_duration(out_path, wc) + if ok: + print(f"{dur:.1f}s") + durations[f"audio-{num}.wav"] = round(dur, 2) + break + else: + expected = wc / 150 * 60 / speed + print(f"{dur:.1f}s (expected ~{expected:.0f}s, retrying)") + except (urllib.error.HTTPError, RuntimeError) as e: + print(f"error: {e}") + if attempt == MAX_RETRIES: + print(f" [error] Segment {i} failed after {MAX_RETRIES + 1} attempts") + sys.exit(1) + + if not ok: + dur = get_duration(out_path) + durations[f"audio-{num}.wav"] = round(dur, 2) + print(f" [warn] Segment {i} audio may be unreliable ({dur:.1f}s for {wc} words)") + +# --- Trim silence from each clip --- +MAX_SILENCE = 0.15 +SILENCE_THRESHOLD = "-40dB" +print() +print("=== Trimming silence ===") + +for i in range(len(slides)): + num = f"{i:02d}" + clip_path = os.path.join(output_dir, f"audio-{num}.wav") + + detect = subprocess.run([ + "ffmpeg", "-i", clip_path, "-af", + f"silencedetect=noise={SILENCE_THRESHOLD}:d=0.1", + "-f", "null", "-" + ], capture_output=True, text=True) + stderr = detect.stderr + + clip_dur = get_duration(clip_path) + + silence_starts = re.findall(r'silence_start: ([\d.]+)', stderr) + silence_ends = re.findall(r'silence_end: ([\d.]+)', stderr) + + trim_start = 0.0 + if silence_starts and float(silence_starts[0]) < 0.05: + if silence_ends: + leading_silence = float(silence_ends[0]) + if leading_silence > MAX_SILENCE: + trim_start = leading_silence - MAX_SILENCE + + trim_end = clip_dur + is_last = (i == len(slides) - 1) + if not is_last and silence_starts: + last_silence_start = float(silence_starts[-1]) + last_silence_is_trailing = True + for se in silence_ends: + se_val = float(se) + if se_val > last_silence_start and se_val < clip_dur - 0.05: + last_silence_is_trailing = False + break + if last_silence_is_trailing and last_silence_start > 0.05: + trailing_silence = clip_dur - last_silence_start + if trailing_silence > MAX_SILENCE: + trim_end = 
last_silence_start + MAX_SILENCE + + if trim_start > 0 or trim_end < clip_dur: + trimmed_path = clip_path + ".tmp.wav" + subprocess.run([ + "ffmpeg", "-y", "-i", clip_path, + "-ss", str(trim_start), "-to", str(trim_end), + "-c", "copy", trimmed_path + ], capture_output=True) + os.replace(trimmed_path, clip_path) + new_dur = trim_end - trim_start + durations[f"audio-{num}.wav"] = round(new_dur, 2) + print(f" audio-{num}.wav: {clip_dur:.1f}s -> {new_dur:.1f}s (trimmed {clip_dur - new_dur:.1f}s)") + else: + print(f" audio-{num}.wav: {clip_dur:.1f}s (no trim needed)") + +# --- Write durations.json --- +durations_path = os.path.join(output_dir, "durations.json") +with open(durations_path, "w") as f: + json.dump(durations, f, indent=2) + +total_dur = sum(durations.values()) +print(f"\n Wrote durations.json ({len(durations)} entries, {total_dur:.1f}s total)") + +print() +print("=== Done ===") +PYTHON_SCRIPT + +echo "" +echo "Output:" +ls -la "${OUTPUT_DIR}"/audio-*.wav 2>/dev/null || echo " (no files generated)" diff --git a/skills/pr-to-hyperframes/scripts/make-video.sh b/skills/pr-to-hyperframes/scripts/make-video.sh new file mode 100755 index 000000000..12b4a3bd9 --- /dev/null +++ b/skills/pr-to-hyperframes/scripts/make-video.sh @@ -0,0 +1,83 @@ +#!/bin/bash +# make-video.sh — Assemble walkthrough slides + audio into a final MP4. +# Fallback for when hyperframes render is not available. +# +# Usage: +# ./make-video.sh <slide-dir> <output.mp4> [outro-duration] +# +# Expects in <slide-dir>: +# slide-00.png, slide-01.png, ... (one per segment, including outro) +# audio-00.wav, audio-01.wav, ... (one per narrated segment) +# +# The last slide PNG without a matching audio WAV is the silent outro. 
+# +set -euo pipefail + +SLIDE_DIR="${1:?Usage: make-video.sh <slide-dir> <output.mp4> [outro-duration]}" +OUTPUT="${2:?Usage: make-video.sh <slide-dir> <output.mp4> [outro-duration]}" +OUTRO_DUR="${3:-3}" + +# Resolve relative paths +[[ "$SLIDE_DIR" != /* ]] && SLIDE_DIR="$(pwd)/$SLIDE_DIR" +[[ "$OUTPUT" != /* ]] && OUTPUT="$(pwd)/$OUTPUT" + +mkdir -p "$(dirname "$OUTPUT")" + +TMPDIR_WORK=$(mktemp -d) +trap "rm -rf $TMPDIR_WORK" EXIT + +echo "=== Assembling video ===" +echo " Slides: $SLIDE_DIR" +echo " Output: $OUTPUT" + +SLIDE_COUNT=$(ls "$SLIDE_DIR"/slide-*.png 2>/dev/null | wc -l | tr -d ' ') +AUDIO_COUNT=$(ls "$SLIDE_DIR"/audio-*.wav 2>/dev/null | wc -l | tr -d ' ') + +echo " Found $SLIDE_COUNT slides, $AUDIO_COUNT audio clips" +echo " Last slide (no audio) = outro (${OUTRO_DUR}s)" + +CONCAT_LIST="$TMPDIR_WORK/concat.txt" +> "$CONCAT_LIST" + +for i in $(seq 0 $((SLIDE_COUNT - 1))); do + NUM=$(printf "%02d" $i) + SLIDE="$SLIDE_DIR/slide-${NUM}.png" + AUDIO="$SLIDE_DIR/audio-${NUM}.wav" + SEGMENT="$TMPDIR_WORK/segment-${NUM}.mp4" + + if [ -f "$AUDIO" ]; then + DUR=$(ffprobe -v error -show_entries format=duration -of csv=p=0 "$AUDIO") + echo " segment-${NUM}: slide + audio (${DUR}s)" + + ffmpeg -y -loop 1 -i "$SLIDE" -i "$AUDIO" \ + -c:v libx264 -tune stillimage -pix_fmt yuv420p \ + -vf "scale=1600:900:force_original_aspect_ratio=decrease,pad=1600:900:(ow-iw)/2:(oh-ih)/2" \ + -c:a aac -b:a 192k -ar 48000 \ + -shortest -movflags +faststart \ + "$SEGMENT" 2>/dev/null + else + echo " segment-${NUM}: silent outro (${OUTRO_DUR}s)" + ffmpeg -y -loop 1 -i "$SLIDE" -f lavfi -i anullsrc=r=48000:cl=mono \ + -c:v libx264 -tune stillimage -pix_fmt yuv420p \ + -vf "scale=1600:900:force_original_aspect_ratio=decrease,pad=1600:900:(ow-iw)/2:(oh-ih)/2" \ + -c:a aac -b:a 192k -ar 48000 \ + -t "$OUTRO_DUR" -movflags +faststart \ + "$SEGMENT" 2>/dev/null + fi + + echo "file '$SEGMENT'" >> "$CONCAT_LIST" +done + +echo "" +echo " Concatenating ${SLIDE_COUNT} segments..." 
+ffmpeg -y -f concat -safe 0 -i "$CONCAT_LIST" \ + -c copy -movflags +faststart \ + "$OUTPUT" 2>/dev/null + +FINAL_DUR=$(ffprobe -v error -show_entries format=duration -of csv=p=0 "$OUTPUT") +FINAL_SIZE=$(ls -lh "$OUTPUT" | awk '{print $5}') +echo "" +echo "=== Done ===" +echo " Output: $OUTPUT" +echo " Duration: ${FINAL_DUR}s" +echo " Size: $FINAL_SIZE" diff --git a/skills/pr-to-hyperframes/video/.gitignore b/skills/pr-to-hyperframes/video/.gitignore new file mode 100644 index 000000000..4f4dd1832 --- /dev/null +++ b/skills/pr-to-hyperframes/video/.gitignore @@ -0,0 +1,6 @@ +# Generated by render.sh +index.html +assets/ +transcripts/ +renders/ +node_modules/ diff --git a/skills/pr-to-hyperframes/video/build.mjs b/skills/pr-to-hyperframes/video/build.mjs new file mode 100644 index 000000000..c8f07aa69 --- /dev/null +++ b/skills/pr-to-hyperframes/video/build.mjs @@ -0,0 +1,838 @@ +// build.mjs — Generate index.html for a PR walkthrough video from a manifest +// JSON file. Reads slide definitions and branding config, then emits one HTML +// composition with timed clips + a single GSAP timeline driving slide +// transitions, code-focus pans, and captions sourced from whisper word-level +// transcripts of each audio file. 
+// +// Usage: +// node build.mjs <path-to-manifest.json> +// +// Expects: +// - audio-NN.wav files alongside the manifest (referenced by slide.audio) +// - copies of those files in ./assets/audio-NN.wav (done by render.sh) +// - whisper transcripts in ./transcripts/audio-NN.json (done by render.sh) +// - manifest.branding for project-specific colors, fonts, and name +// +// Output: ./index.html (the hyperframes composition) + +import fs from "node:fs"; +import path from "node:path"; +import url from "node:url"; + +const __dirname = path.dirname(url.fileURLToPath(import.meta.url)); + +// --- Args -------------------------------------------------------------------- + +const manifestPath = process.argv[2]; +if (!manifestPath) { + console.error("Usage: node build.mjs <path-to-manifest.json>"); + process.exit(1); +} +const manifestAbs = path.resolve(manifestPath); +if (!fs.existsSync(manifestAbs)) { + console.error(`Manifest not found: ${manifestAbs}`); + process.exit(1); +} + +const manifest = JSON.parse(fs.readFileSync(manifestAbs, "utf8")); + +// --- Branding ---------------------------------------------------------------- + +const DEFAULT_BRANDING = { + name: "Project", + org: "", + repo: "", + logo: null, + colors: { + text: "#09090b", + background: "#ffffff", + accent: "#3b82f6", + caption: "#ffd800", + captionBg: "#09090b", + }, + fonts: { + body: "Geist", + mono: "Geist Mono", + }, +}; + +const brand = { ...DEFAULT_BRANDING, ...manifest.branding }; +brand.colors = { ...DEFAULT_BRANDING.colors, ...(manifest.branding?.colors || {}) }; +brand.fonts = { ...DEFAULT_BRANDING.fonts, ...(manifest.branding?.fonts || {}) }; + +const repoSlug = brand.repo || `${brand.org}/${brand.name}`; + +// --- Whisper transcripts ----------------------------------------------------- + +const TRANSCRIPTS_DIR = path.join(__dirname, "transcripts"); +const transcripts = new Map(); +if (fs.existsSync(TRANSCRIPTS_DIR)) { + for (const f of fs.readdirSync(TRANSCRIPTS_DIR)) { + if 
(!f.endsWith(".json")) continue; + const audioName = f.replace(/\.json$/, ".wav"); + transcripts.set(audioName, JSON.parse(fs.readFileSync(path.join(TRANSCRIPTS_DIR, f), "utf8"))); + } +} + +function chunkTranscript(words, { maxWords = 7, gapThreshold = 0.45 } = {}) { + const chunks = []; + let current = []; + for (const w of words) { + if (current.length === 0) { + current.push(w); + continue; + } + const prev = current[current.length - 1]; + const gap = w.start - prev.end; + if (gap > gapThreshold || current.length >= maxWords) { + chunks.push(current); + current = [w]; + } else { + current.push(w); + } + } + if (current.length) chunks.push(current); + return chunks.map((group) => ({ + text: group.map((g) => g.text).join(" "), + start: group[0].start, + end: group[group.length - 1].end, + })); +} + +function makeCaptions(audioFile, audioStart) { + const words = transcripts.get(audioFile); + if (!words) return []; + const chunks = chunkTranscript(words); + return chunks.map((c) => ({ + text: c.text, + start: audioStart + c.start, + duration: Math.max(0.4, c.end - c.start), + })); +} + +// --- Cumulative timing ------------------------------------------------------- + +let cursor = 0; +const timed = manifest.slides.map((slide, i) => { + const start = cursor; + const duration = slide.durationInSeconds; + cursor += duration; + return { slide, start, duration, i }; +}); +const totalDuration = cursor; + +// --- HTML escape ------------------------------------------------------------- + +function esc(s) { + return String(s).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;"); +} + +// --- Light syntax highlighting ----------------------------------------------- + +const KEYWORDS = new Set([ + "abstract", + "as", + "async", + "await", + "boolean", + "break", + "case", + "catch", + "class", + "const", + "constructor", + "continue", + "default", + "delete", + "do", + "else", + "enum", + "export", + "extends", + "false", + "finally", + "for", + "from", + "function", + 
"get", + "if", + "implements", + "import", + "in", + "instanceof", + "interface", + "is", + "let", + "new", + "null", + "number", + "of", + "override", + "private", + "protected", + "public", + "readonly", + "return", + "set", + "static", + "string", + "super", + "switch", + "this", + "throw", + "true", + "try", + "type", + "typeof", + "undefined", + "void", + "while", + "yield", + "any", + "never", + "unknown", +]); + +function highlightLine(line) { + const re = + /(\/\/.*$)|(\/\*[\s\S]*?\*\/)|('(?:\\.|[^'\\])*')|("(?:\\.|[^"\\])*")|(`(?:\\.|[^`\\])*`)|(\b\d+(?:\.\d+)?\b)|(\b[A-Za-z_$][\w$]*\b)|(@\w+)/g; + let out = ""; + let last = 0; + for (const m of line.matchAll(re)) { + out += esc(line.slice(last, m.index)); + const [tok, comment, block, sq, dq, bt, num, ident, decorator] = m; + if (comment || block) out += `<span class="t-c">${esc(tok)}</span>`; + else if (sq || dq || bt) out += `<span class="t-s">${esc(tok)}</span>`; + else if (num) out += `<span class="t-n">${esc(tok)}</span>`; + else if (decorator) out += `<span class="t-d">${esc(tok)}</span>`; + else if (ident) { + if (KEYWORDS.has(ident)) out += `<span class="t-k">${esc(ident)}</span>`; + else if (/^[A-Z]/.test(ident)) out += `<span class="t-t">${esc(ident)}</span>`; + else out += esc(ident); + } + last = m.index + tok.length; + } + out += esc(line.slice(last)); + return out || " "; +} + +// --- Logo -------------------------------------------------------------------- + +function renderLogo() { + if (brand.logo) { + const ext = path.extname(brand.logo).toLowerCase(); + if (ext === ".svg") { + const svgPath = path.resolve(brand.logo); + if (fs.existsSync(svgPath)) { + return fs.readFileSync(svgPath, "utf8"); + } + } + return `<img src="assets/${path.basename(brand.logo)}" class="project-logo" alt="${esc(brand.name)}" />`; + } + return `<span class="project-name-text">${esc(brand.name)}</span>`; +} + +// --- Slide renderers --------------------------------------------------------- + +function 
slideAttrs({ start, duration, i }, extra = "") { + const initialStyle = i === 0 ? ` style="opacity: 1"` : ""; + return `class="clip slide" data-start="${start}" data-duration="${duration}" data-track-index="2" id="slide-${i}"${initialStyle} ${extra}`; +} + +function renderIntro({ slide, start, duration, i }) { + const title = slide.title || `PR #${manifest.pr}`; + const cleanTitle = title.replace(/\s*#\d+\s*$/, "").trim(); + const words = cleanTitle.split(/\s+/); + const highlightIndex = Math.max(0, words.length - 2); + const highlighted = words + .map((w, n) => (n === highlightIndex ? `<span class="special">${esc(w)}</span>` : esc(w))) + .join(" "); + + return ` +<div ${slideAttrs({ start, duration, i })}> + <div class="slide-bg"></div> + <div class="slide-stage stage--intro"> + <div class="eyebrow"> + <span class="pill">Pull Request</span> + <span>${esc(repoSlug)} · #${manifest.pr}</span> + </div> + <h1 class="title-xl">${highlighted}</h1> + ${slide.subtitle ? `<p class="subtitle-lg">${esc(slide.subtitle)}</p>` : ""} + <div class="meta-row"> + ${slide.date ? 
`<span>${esc(slide.date)}</span><span class="dot"></span>` : ""} + <span>Walkthrough</span> + </div> + </div> +</div>`; +} + +function renderSegment({ slide, start, duration, i }) { + return ` +<div ${slideAttrs({ start, duration, i })}> + <div class="slide-bg"></div> + <div class="slide-stage stage--segment"> + <div class="seg-rule"></div> + <h2 class="title-segment">${esc(slide.title || "")}</h2> + <div class="seg-rule"></div> + </div> +</div>`; +} + +function renderCode({ slide, start, duration, i }) { + const lines = (slide.code || "").split("\n"); + const focus = slide.focus || [{ line: 0, at: 0 }]; + const codeLines = lines + .map( + (l, n) => + `<div class="cl" data-line="${n}"><span class="ln">${String(n + 1).padStart(2, " ")}</span><span class="lc">${highlightLine(l)}</span></div>`, + ) + .join(""); + const focusJson = JSON.stringify(focus); + return ` +<div ${slideAttrs({ start, duration, i }, `data-focus='${focusJson}' data-kind="code"`)}> + <div class="slide-bg"></div> + <div class="slide-stage stage--code"> + <div class="file-bar"> + <span class="lang-badge">${esc(slide.language || "ts")}</span> + <span class="file-name">${esc(slide.filename || "")}</span> + <span class="slide-title">${esc(slide.title || "")}</span> + </div> + <div class="code-viewport"> + <div class="code-scroller" id="code-scroll-${i}"> + ${codeLines} + </div> + <div class="code-fade code-fade--top"></div> + <div class="code-fade code-fade--bottom"></div> + </div> + </div> +</div>`; +} + +function renderDiffLines(diff) { + const lines = diff.split("\n"); + return lines + .map((l) => { + let cls = "dl"; + let mark = ""; + if (l.startsWith("@@")) { + cls += " dl-hunk"; + mark = "⋯"; + } else if (l.startsWith("+++") || l.startsWith("---")) { + cls += " dl-meta"; + } else if (l.startsWith("+")) { + cls += " dl-add"; + mark = "+"; + } else if (l.startsWith("-")) { + cls += " dl-del"; + mark = "−"; + } else { + mark = " "; + } + const body = l.startsWith("+") || l.startsWith("-") ? 
l.slice(1) : l; + return `<div class="${cls}"><span class="dm">${esc(mark)}</span><span class="dc">${highlightLine(body)}</span></div>`; + }) + .join(""); +} + +function renderDiff({ slide, start, duration, i }) { + return ` +<div ${slideAttrs({ start, duration, i }, `data-kind="diff"`)}> + <div class="slide-bg"></div> + <div class="slide-stage stage--code"> + <div class="file-bar"> + <span class="lang-badge">${esc(slide.language || "ts")}</span> + <span class="file-name">${esc(slide.filename || "")}</span> + <span class="slide-title">${esc(slide.title || "")}</span> + </div> + <div class="code-viewport"> + <div class="code-scroller" id="code-scroll-${i}"> + ${renderDiffLines(slide.diff || "")} + </div> + <div class="code-fade code-fade--top"></div> + <div class="code-fade code-fade--bottom"></div> + </div> + </div> +</div>`; +} + +function renderText({ slide, start, duration, i }) { + return ` +<div ${slideAttrs({ start, duration, i })}> + <div class="slide-bg"></div> + <div class="slide-stage stage--intro"> + <div class="eyebrow"> + <span class="pill">Summary</span> + <span>${esc(repoSlug)} · #${manifest.pr}</span> + </div> + <h1 class="title-xl">${esc(slide.title || "")}</h1> + ${slide.subtitle ? 
`<p class="subtitle-lg">${esc(slide.subtitle)}</p>` : ""} + </div> +</div>`; +} + +function renderList({ slide, start, duration, i }) { + const items = (slide.items || []) + .map( + (it, n) => + `<li class="list-item"><span class="list-num">${n + 1}.</span><span>${esc(it)}</span></li>`, + ) + .join(""); + return ` +<div ${slideAttrs({ start, duration, i })}> + <div class="slide-bg"></div> + <div class="slide-stage stage--list"> + <h2 class="title-list">${esc(slide.title || "")}</h2> + <ol class="list-items">${items}</ol> + </div> +</div>`; +} + +function renderImage({ slide, start, duration, i }) { + return ` +<div ${slideAttrs({ start, duration, i })}> + <div class="slide-bg"></div> + <div class="slide-stage stage--image"> + <img class="image-fill" src="assets/${esc(slide.src || "")}" alt=""/> + </div> +</div>`; +} + +function renderOutro({ start, duration, i }) { + return ` +<div ${slideAttrs({ start, duration, i })}> + <div class="slide-bg"></div> + <div class="slide-stage stage--outro"> + <div class="brand-big">${renderLogo()}</div> + <div class="outro-meta">PR Walkthrough · #${manifest.pr}</div> + <div class="outro-attribution">Made with HyperFrames</div> + </div> +</div>`; +} + +const RENDERERS = { + intro: renderIntro, + segment: renderSegment, + code: renderCode, + diff: renderDiff, + text: renderText, + list: renderList, + image: renderImage, + outro: renderOutro, +}; + +const slidesHtml = timed + .map((t) => { + const r = RENDERERS[t.slide.type]; + if (!r) throw new Error(`Unknown slide type: ${t.slide.type}`); + return r(t); + }) + .join(""); + +// --- Audio elements ---------------------------------------------------------- + +const audioHtml = timed + .filter(({ slide }) => slide.audio) + .map( + ({ slide, start, i }) => + `<audio class="clip" data-start="${start}" data-duration="${slide.durationInSeconds}" data-track-index="100" data-volume="1" src="assets/${slide.audio}" id="audio-${i}"></audio>`, + ) + .join("\n"); + +// --- Captions 
--------------------------------------------------------------- + +const allCaptions = []; +for (const { slide, start } of timed) { + if (!slide.audio) continue; + const caps = makeCaptions(slide.audio, start); + allCaptions.push(...caps); +} + +const CAPTION_GAP = 0.002; +const captionsHtml = allCaptions + .map((c, k) => { + const dur = Math.max(0.05, c.duration - CAPTION_GAP); + return `<div class="clip caption" data-start="${c.start.toFixed(3)}" data-duration="${dur.toFixed(3)}" data-track-index="${60 + (k % 4)}" id="cap-${k}">${esc(c.text)}</div>`; + }) + .join("\n"); + +// --- Timeline JS ------------------------------------------------------------- + +const timelineJs = []; + +for (const { slide, start, duration, i } of timed) { + const fadeIn = 0.4; + const fadeOut = 0.4; + if (i === 0) { + timelineJs.push(`tl.set("#slide-${i}", { opacity: 1 }, ${start});`); + } else { + timelineJs.push( + `tl.fromTo("#slide-${i}", { opacity: 0 }, { opacity: 1, duration: ${fadeIn}, ease: "power2.out" }, ${start});`, + ); + } + timelineJs.push( + `tl.to("#slide-${i}", { opacity: 0, duration: ${fadeOut}, ease: "power2.in" }, ${start + duration - fadeOut});`, + ); + timelineJs.push(`tl.set("#slide-${i}", { opacity: 0 }, ${start + duration});`); + + if ((slide.type === "code" || slide.type === "diff") && slide.focus && slide.focus.length) { + const lineHeight = 36; + const focus = slide.focus; + const targets = focus.map((f) => ({ + t: start + (f.at || 0) * duration, + y: -Math.max(0, f.line - 4) * lineHeight, + })); + timelineJs.push(`tl.set("#code-scroll-${i}", { y: ${targets[0].y} }, ${start});`); + for (let k = 1; k < targets.length; k++) { + const prev = targets[k - 1]; + const cur = targets[k]; + const dur = Math.max(0.5, cur.t - prev.t); + timelineJs.push( + `tl.to("#code-scroll-${i}", { y: ${cur.y}, duration: ${dur}, ease: "power1.inOut" }, ${prev.t});`, + ); + } + } +} + +// --- Font imports ------------------------------------------------------------ + +const 
fontFamilies = [brand.fonts.body, brand.fonts.mono].filter(Boolean); +const fontImport = fontFamilies + .map((f) => { + const encoded = f.replace(/\s+/g, "+"); + return `${encoded}:wght@400;500;600;700`; + }) + .join("&family="); + +// --- Final HTML -------------------------------------------------------------- + +const html = `<!doctype html> +<html lang="en"> + <head> + <meta charset="UTF-8" /> + <meta name="viewport" content="width=1920, height=1080" /> + <script src="https://cdn.jsdelivr.net/npm/gsap@3.14.2/dist/gsap.min.js"></script> + <link rel="preconnect" href="https://fonts.googleapis.com"> + <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin> + <link href="https://fonts.googleapis.com/css2?family=${fontImport}&display=swap" rel="stylesheet"> + <style> + * { margin: 0; padding: 0; box-sizing: border-box; } + html, body { + width: 1920px; height: 1080px; overflow: hidden; + background: ${brand.colors.background}; + font-family: "${brand.fonts.body}", -apple-system, BlinkMacSystemFont, "Inter", "Segoe UI", sans-serif; + font-feature-settings: "ss01", "ss02"; + color: ${brand.colors.text}; + -webkit-font-smoothing: antialiased; + } + + .slide { position: absolute; inset: 0; opacity: 0; } + .slide-bg { position: absolute; inset: 0; background: ${brand.colors.background}; } + .slide-stage { position: absolute; inset: 0; } + + /* Hero / intro / text */ + .stage--intro { + display: flex; flex-direction: column; + justify-content: center; padding: 0 160px; + } + .eyebrow { + display: inline-flex; align-items: center; gap: 16px; + font-size: 22px; font-weight: 500; + color: #71717a; margin-bottom: 40px; + } + .eyebrow .pill { + background: ${brand.colors.accent}1a; + border: 1px solid ${brand.colors.accent}59; + padding: 6px 14px; border-radius: 6px; + color: ${brand.colors.accent}; letter-spacing: 0.04em; + font-weight: 600; font-size: 18px; + } + .title-xl { + font-size: 132px; font-weight: 700; line-height: 1.02; + letter-spacing: -0.035em; 
max-width: 1600px; + color: ${brand.colors.text}; + } + .special { + background: ${brand.colors.accent}47; + padding: 0 0.08em; + border-radius: 6px; + box-decoration-break: clone; + -webkit-box-decoration-break: clone; + } + .subtitle-lg { + margin-top: 48px; font-size: 34px; font-weight: 400; + color: #52525b; max-width: 1500px; line-height: 1.4; + letter-spacing: -0.005em; + } + .meta-row { + margin-top: 72px; display: flex; align-items: center; gap: 28px; + font-size: 22px; color: #71717a; + font-family: "${brand.fonts.mono}", ui-monospace, SFMono-Regular, "SF Mono", Menlo, monospace; + letter-spacing: 0.02em; + } + .meta-row .dot { width: 4px; height: 4px; background: #d4d4d8; border-radius: 50%; } + + /* Segment slide */ + .stage--segment { + display: flex; flex-direction: column; + justify-content: center; align-items: center; + padding: 0 160px; gap: 56px; + } + .seg-rule { + width: 96px; height: 2px; + background: ${brand.colors.accent}; + border-radius: 1px; + } + .title-segment { + font-size: 88px; font-weight: 600; letter-spacing: -0.03em; + text-align: center; max-width: 1600px; line-height: 1.08; + color: ${brand.colors.text}; + } + + /* Code / diff slide */ + .stage--code { + display: flex; flex-direction: column; + padding: 72px 96px 120px; + } + .file-bar { + display: flex; align-items: center; gap: 20px; + font-size: 22px; color: #71717a; margin-bottom: 28px; + padding-bottom: 24px; + border-bottom: 1px solid #e4e4e7; + } + .lang-badge { + background: #f4f4f5; + border: 1px solid #e4e4e7; + color: #52525b; font-weight: 600; + padding: 5px 10px; border-radius: 4px; + font-size: 14px; text-transform: uppercase; letter-spacing: 0.10em; + font-family: "${brand.fonts.mono}", ui-monospace, SFMono-Regular, "SF Mono", Menlo, monospace; + } + .file-name { + font-family: "${brand.fonts.mono}", ui-monospace, SFMono-Regular, "SF Mono", Menlo, monospace; + color: ${brand.colors.text}; font-size: 22px; font-weight: 500; + } + .slide-title { + margin-left: auto; 
color: #71717a; + font-size: 22px; font-weight: 500; + letter-spacing: -0.005em; + } + .code-viewport { + position: relative; flex: 1; + overflow: hidden; + border-radius: 12px; + background: #f6f8fa; + border: 1px solid #d0d7de; + } + .code-scroller { + will-change: transform; + padding: 28px 0; + font-family: "${brand.fonts.mono}", ui-monospace, SFMono-Regular, "SF Mono", Menlo, monospace; + font-size: 22px; line-height: 36px; + color: #24292f; font-weight: 500; + } + .code-fade { position: absolute; left: 0; right: 0; height: 64px; pointer-events: none; } + .code-fade--top { top: 0; background: linear-gradient(180deg, #f6f8fa 0%, rgba(246,248,250,0) 100%); } + .code-fade--bottom { bottom: 0; background: linear-gradient(0deg, #f6f8fa 0%, rgba(246,248,250,0) 100%); } + + .cl { display: flex; padding: 0 32px; white-space: pre; } + .cl .ln { color: #afb8c1; width: 56px; flex-shrink: 0; text-align: right; padding-right: 24px; user-select: none; } + .cl .lc { flex: 1; } + + .dl { display: flex; padding: 0 32px; white-space: pre; } + .dl .dm { width: 28px; flex-shrink: 0; color: #afb8c1; text-align: center; user-select: none; font-weight: 700; } + .dl .dc { flex: 1; } + .dl-add { background: #dafbe1; } + .dl-add .dm { color: #1a7f37; } + .dl-add .dc { color: #1f2328; } + .dl-del { background: #ffebe9; } + .dl-del .dm { color: #cf222e; } + .dl-del .dc { color: #1f2328; } + .dl-hunk { color: #57606a; background: #ddf4ff; } + .dl-meta { color: #6e7781; opacity: 0.7; } + + /* GitHub Light syntax tokens */ + .t-c { color: #6e7781; font-style: italic; } + .t-s { color: #0a3069; } + .t-n { color: #0550ae; } + .t-k { color: #cf222e; } + .t-t { color: #1f883d; } + .t-d { color: #8250df; } + + /* List slide */ + .stage--list { + display: flex; flex-direction: column; + justify-content: center; align-items: center; + padding: 0 160px; gap: 64px; + } + .title-list { + font-size: 72px; font-weight: 600; letter-spacing: -0.025em; + color: ${brand.colors.text}; + } + .list-items { + 
list-style: none; display: flex; flex-direction: column; + gap: 28px; font-size: 44px; color: ${brand.colors.text}; + } + .list-item { display: flex; gap: 24px; align-items: baseline; } + .list-num { color: ${brand.colors.accent}; font-weight: 700; min-width: 64px; text-align: right; } + + /* Image slide */ + .stage--image { + display: flex; align-items: center; justify-content: center; + padding: 64px 96px; + } + .image-fill { + width: 100%; height: 100%; + object-fit: contain; + } + + /* Outro */ + .stage--outro { + display: flex; flex-direction: column; + justify-content: center; align-items: center; gap: 56px; + } + .brand-big { + display: flex; align-items: center; justify-content: center; + color: ${brand.colors.text}; + } + .brand-big .project-logo { max-width: 720px; max-height: 200px; } + .project-name-text { + font-size: 120px; font-weight: 700; letter-spacing: -0.03em; + } + .outro-meta { + font-size: 18px; color: #71717a; + letter-spacing: 0.16em; text-transform: uppercase; + font-family: "${brand.fonts.mono}", ui-monospace, SFMono-Regular, "SF Mono", Menlo, monospace; + } + .outro-attribution { + font-size: 14px; color: #a1a1aa; + letter-spacing: 0.12em; text-transform: uppercase; + font-family: "${brand.fonts.mono}", ui-monospace, SFMono-Regular, "SF Mono", Menlo, monospace; + } + + /* Footer */ + .footer-bar { + position: absolute; bottom: 32px; left: 96px; right: 96px; + display: flex; + justify-content: space-between; + align-items: center; + font-size: 14px; color: #a1a1aa; + letter-spacing: 0.10em; text-transform: uppercase; + font-family: "${brand.fonts.mono}", ui-monospace, SFMono-Regular, "SF Mono", Menlo, monospace; + z-index: 10; + } + .brand { + display: flex; align-items: center; gap: 12px; + color: ${brand.colors.text}; font-weight: 600; + font-family: "${brand.fonts.body}", sans-serif; + letter-spacing: -0.01em; text-transform: none; + font-size: 18px; + } + .footer-meta { color: #a1a1aa; } + + /* Captions */ + .caption-stage { + 
position: absolute; + bottom: 32px; + left: 50%; + transform: translateX(-50%); + width: 100%; + max-width: 1700px; + z-index: 9; + text-align: center; + pointer-events: none; + } + .caption { + position: absolute; + bottom: 0; + left: 50%; + transform: translateX(-50%); + width: max-content; + max-width: 1700px; + background: ${brand.colors.captionBg}; + color: ${brand.colors.caption}; + padding: 14px 24px; + border-radius: 10px; + font-family: "${brand.fonts.body}", sans-serif; + font-size: 44px; + font-weight: 700; + letter-spacing: -0.02em; + text-transform: none; + text-align: center; + line-height: 1.15; + white-space: normal; + text-wrap: balance; + -webkit-text-wrap: balance; + } + + /* Pie progress indicator */ + .progress-pie { + position: absolute; + top: 40px; right: 96px; + width: 22px; height: 22px; + z-index: 11; + opacity: 0.85; + } + .progress-pie svg { width: 100%; height: 100%; display: block; } + </style> + </head> + <body> + <div id="root" + data-composition-id="main" + data-start="0" + data-duration="${totalDuration}" + data-width="1920" + data-height="1080"> + +${slidesHtml} + + <div class="caption-stage clip" data-start="0" data-duration="${totalDuration}" data-track-index="49" id="caption-stage"> +${captionsHtml} + </div> + + <div class="footer-bar clip" data-start="0" data-duration="${totalDuration}" data-track-index="50" id="footer"> + <div class="brand"><span>${esc(brand.name)}</span></div> + <div class="footer-meta">PR #${manifest.pr}</div> + </div> + + <div class="progress-pie clip" data-start="0" data-duration="${totalDuration}" data-track-index="51" id="pie"> + <svg viewBox="0 0 64 64"> + <circle cx="32" cy="32" r="24" fill="none" stroke="#e4e4e7" stroke-width="8"/> + <circle id="pie-fill" cx="32" cy="32" r="24" fill="none" stroke="${brand.colors.accent}" stroke-width="8" + stroke-dasharray="150.796" stroke-dashoffset="150.796" stroke-linecap="butt" + transform="rotate(-90 32 32)"/> + </svg> + </div> + +${audioHtml} + + </div> + + 
<script> + window.__timelines = window.__timelines || {}; + const tl = gsap.timeline({ paused: true }); + +${timelineJs.map((s) => "\t\t\t" + s).join("\n")} + + // Pie indicator + tl.fromTo("#pie-fill", + { attr: { "stroke-dashoffset": 150.796 } }, + { attr: { "stroke-dashoffset": 0 }, duration: ${totalDuration}, ease: "none" }, + 0); + + tl.set("#footer", { opacity: 1 }, 0); + tl.to("#footer", { opacity: 0, duration: 0.6 }, ${totalDuration - 0.6}); + + window.__timelines["main"] = tl; + </script> + </body> +</html> +`; + +fs.writeFileSync(path.join(__dirname, "index.html"), html); + +console.log(`Wrote ${path.relative(process.cwd(), path.join(__dirname, "index.html"))}`); +console.log(` ${timed.length} slides, ${totalDuration.toFixed(2)}s total`); +console.log( + ` ${timed.filter((t) => t.slide.audio).length} audio tracks, ${allCaptions.length} captions`, +); +console.log(` Branding: ${brand.name} (${repoSlug})`); diff --git a/skills/pr-to-hyperframes/video/hyperframes.json b/skills/pr-to-hyperframes/video/hyperframes.json new file mode 100644 index 000000000..5fb1d6d87 --- /dev/null +++ b/skills/pr-to-hyperframes/video/hyperframes.json @@ -0,0 +1,9 @@ +{ + "$schema": "https://hyperframes.heygen.com/schema/hyperframes.json", + "registry": "https://raw.githubusercontent.com/heygen-com/hyperframes/main/registry", + "paths": { + "blocks": "compositions", + "components": "compositions/components", + "assets": "assets" + } +} diff --git a/skills/pr-to-hyperframes/video/meta.json b/skills/pr-to-hyperframes/video/meta.json new file mode 100644 index 000000000..e59f77d99 --- /dev/null +++ b/skills/pr-to-hyperframes/video/meta.json @@ -0,0 +1,4 @@ +{ + "id": "pr-walkthrough", + "name": "pr-walkthrough" +} diff --git a/skills/pr-to-hyperframes/video/render.sh b/skills/pr-to-hyperframes/video/render.sh new file mode 100755 index 000000000..ec5482912 --- /dev/null +++ b/skills/pr-to-hyperframes/video/render.sh @@ -0,0 +1,155 @@ +#!/bin/bash +# render.sh — Render a 
pr-walkthrough video from a manifest JSON file using +# hyperframes. The pipeline: +# 1. Copy referenced audio/image files into ./assets/ +# 2. Run whisper transcription on each audio file → ./transcripts/ +# 3. Run build.mjs to generate index.html +# 4. Lint + render via npx hyperframes (1080p/30fps) +# 5. Downscale + recompress to 1280×720 with ffmpeg → final MP4 +# +# Usage: +# ./render.sh <manifest.json> <output.mp4> +# +set -euo pipefail + +MANIFEST="${1:?Usage: render.sh <manifest.json> <output.mp4>}" +OUTPUT="${2:?Usage: render.sh <manifest.json> <output.mp4>}" + +# Resolve relative paths +[[ "$MANIFEST" != /* ]] && MANIFEST="$(pwd)/$MANIFEST" +[[ "$OUTPUT" != /* ]] && OUTPUT="$(pwd)/$OUTPUT" + +if [ ! -f "$MANIFEST" ]; then + echo "Error: Manifest not found: $MANIFEST" >&2 + exit 1 +fi + +SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)" +ASSETS_DIR="$SCRIPT_DIR/assets" +TRANSCRIPTS_DIR="$SCRIPT_DIR/transcripts" +RENDERS_DIR="$SCRIPT_DIR/renders" +MANIFEST_DIR="$(dirname "$MANIFEST")" + +mkdir -p "$ASSETS_DIR" "$TRANSCRIPTS_DIR" "$RENDERS_DIR" +mkdir -p "$(dirname "$OUTPUT")" + +echo "=== Rendering walkthrough video ===" +echo " Manifest: $MANIFEST" +echo " Output: $OUTPUT" +echo "" + +# --- 1. Extract referenced audio/image filenames from the manifest --------- +REFERENCED_FILES=$(python3 -c " +import json, sys +with open(sys.argv[1]) as f: + m = json.load(f) +files = set() +for s in m['slides']: + if 'audio' in s and s['audio']: files.add(s['audio']) + if 'src' in s and s['src']: files.add(s['src']) +# Copy logo if specified in branding +b = m.get('branding', {}) +if b.get('logo'): + files.add(b['logo']) +print('\n'.join(sorted(files))) +" "$MANIFEST") + +# --- 2. Copy referenced files into ./assets/ ------------------------------- +echo " [1/5] Copying assets..." 
+rm -rf "$ASSETS_DIR" +mkdir -p "$ASSETS_DIR" +for FILE in $REFERENCED_FILES; do + SRC="$MANIFEST_DIR/$FILE" + if [ -f "$SRC" ]; then + cp "$SRC" "$ASSETS_DIR/$FILE" + else + echo " Warning: Referenced file not found: $SRC" >&2 + fi +done + +# --- 3. Whisper transcribe each audio file (idempotent per file) ---------- +echo " [2/5] Transcribing audio (whisper)..." +for WAV in "$ASSETS_DIR"/*.wav; do + [ -f "$WAV" ] || continue + BASE=$(basename "$WAV" .wav) + OUT_JSON="$TRANSCRIPTS_DIR/$BASE.json" + if [ -f "$OUT_JSON" ] && [ "$OUT_JSON" -nt "$WAV" ]; then + continue + fi + echo " transcribing $BASE..." + (cd "$SCRIPT_DIR" && npx --yes hyperframes transcribe "assets/$BASE.wav" --json >/dev/null) + if [ -f "$SCRIPT_DIR/transcript.json" ]; then + mv "$SCRIPT_DIR/transcript.json" "$OUT_JSON" + fi +done + +# --- 4. Generate index.html ------------------------------------------------ +echo " [3/5] Building composition..." +(cd "$SCRIPT_DIR" && node build.mjs "$MANIFEST") + +# --- 5. Lint (warn but don't fail) ------------------------------------------ +(cd "$SCRIPT_DIR" && npx --yes hyperframes lint) || { + echo " Warning: lint reported issues (continuing)" >&2 +} + +# --- 6. Render at 1080p/30fps with hyperframes ----------------------------- +echo " [4/5] Rendering 1080p frames..." +RENDER_NAME="walkthrough-$$" +TEMP_RENDER="$RENDERS_DIR/$RENDER_NAME.mp4" +rm -f "$TEMP_RENDER" +(cd "$SCRIPT_DIR" && npx --yes hyperframes render \ + -q draft --crf 30 \ + -o "renders/$RENDER_NAME.mp4") >/dev/null 2>&1 & +RENDER_PID=$! 
+ +PREV_SIZE=-1 +STABLE=0 +while kill -0 "$RENDER_PID" 2>/dev/null; do + if [ -f "$TEMP_RENDER" ]; then + SIZE=$(stat -f '%z' "$TEMP_RENDER" 2>/dev/null || stat -c '%s' "$TEMP_RENDER" 2>/dev/null || echo 0) + if [ "$SIZE" -gt 0 ] && [ "$SIZE" -eq "$PREV_SIZE" ]; then + STABLE=$((STABLE + 1)) + if [ "$STABLE" -ge 3 ]; then break; fi + else + STABLE=0 + fi + PREV_SIZE=$SIZE + fi + sleep 2 +done + +pkill -P "$RENDER_PID" 2>/dev/null || true +kill "$RENDER_PID" 2>/dev/null || true +wait "$RENDER_PID" 2>/dev/null || true + +if [ ! -f "$TEMP_RENDER" ]; then + echo "Error: hyperframes did not produce $TEMP_RENDER" >&2 + exit 1 +fi + +# --- 7. Downscale 1080p → 720p, recompress for smaller file --------------- +echo " [5/5] Downscaling to 720p / 30fps..." +ffmpeg -y -i "$TEMP_RENDER" \ + -vf "scale=1280:720:flags=lanczos,fps=30" \ + -c:v libx264 -preset slow -crf 26 -pix_fmt yuv420p \ + -c:a aac -b:a 96k -ar 48000 \ + -movflags +faststart \ + "$OUTPUT" 2>/dev/null + +rm -f "$TEMP_RENDER" + +# --- Report ------------------------------------------------------------------ +if [ -f "$OUTPUT" ]; then + FINAL_DUR=$(ffprobe -v error -show_entries format=duration -of csv=p=0 "$OUTPUT" 2>/dev/null || echo "?") + FINAL_SIZE=$(ls -lh "$OUTPUT" | awk '{print $5}') + echo "" + echo "=== Done ===" + echo " Output: $OUTPUT" + echo " Resolution: 1280×720" + echo " FPS: 30" + echo " Duration: ${FINAL_DUR}s" + echo " Size: $FINAL_SIZE" +else + echo "Error: render failed — output file not created" >&2 + exit 1 +fi From 6180030e568f6ef0576b680d62e64abdd158d3cd Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Miguel=20=C3=81ngel?= <miguel.sierra@heygen.com> Date: Fri, 15 May 2026 14:03:38 -0700 Subject: [PATCH 3/3] =?UTF-8?q?fix(skills):=20address=20test=20feedback=20?= =?UTF-8?q?=E2=80=94=20clarify=20manifest=20schema,=20branding,=20duration?= =?UTF-8?q?=20heuristic?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Based on testing the skill with subagents on 
HyperFrames PR #874 and Pacific PR #27684: - Emphasize that manifest schema must match build.mjs exactly (agents were inventing their own slide structures) - Add duration estimation formula: ~2.5 words/second - Specify which design.md tokens to extract for branding - Add GitHub org avatar as fallback logo source - Adjust narration length guidance for small vs large PRs --- skills/pr-to-hyperframes/SKILL.md | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/skills/pr-to-hyperframes/SKILL.md b/skills/pr-to-hyperframes/SKILL.md index bf4fc4769..b668af8ca 100644 --- a/skills/pr-to-hyperframes/SKILL.md +++ b/skills/pr-to-hyperframes/SKILL.md @@ -21,9 +21,9 @@ Run commands that reference `./scripts` or `./video` from this skill directory. **The skill auto-detects branding from the repo.** It never hardcodes project-specific colors, logos, or names. At the start of every run, resolve branding: 1. **Project name** — read `package.json` → `name` field (strip `@scope/` prefix). Fallback: git remote repo name. Fallback: directory name. -2. **Colors** — read `design.md` or `DESIGN.md` if it exists (check both casings). Extract primary color, background color, accent color. Fallback: neutral palette (`#09090b` text on `#ffffff` background, `#3b82f6` accent). -3. **Fonts** — from `design.md` if present. Fallback: `"Geist"` for body, `"Geist Mono"` for code. -4. **Logo** — look for `logo.svg` or `logo.png` in repo root, `public/`, `assets/`, `.github/`. If found, use it in intro/outro. If not found, use the project name as text. +2. **Colors** — read `design.md` or `DESIGN.md` if it exists (check both casings). Extract these specific tokens: `text` (body text color), `background` (page/slide background), `accent` (primary brand color for highlights/pills/progress). Map the closest values you find — design files vary in format. Fallback: neutral palette (`#09090b` text on `#ffffff` background, `#3b82f6` accent). +3. 
**Fonts** — from `design.md` if present. Extract the body/display font and the monospace/code font. Fallback: `"Geist"` for body, `"Geist Mono"` for code. +4. **Logo** — look for `logo.svg` or `logo.png` in repo root, `public/`, `assets/`, `.github/`. If not found, try `gh api orgs/<org> --jq .avatar_url` to get the org's GitHub avatar. If nothing found, use the project name as text. 5. **Repo identifier** — parse `git remote get-url origin` for the `org/repo` slug (e.g., `acme/widget`). Pass these values to `build.mjs` via a `branding` key in the manifest: @@ -179,6 +179,8 @@ Generate per-segment audio clips with one TTS call per segment: The manifest is a JSON file that describes every slide in the video. It bridges the narration/audio step and the hyperframes renderer. +**The manifest schema below is the exact format `build.mjs` expects.** Do not invent your own slide structure, nest content in sub-objects, or rename fields. Copy the schema exactly — `build.mjs` reads `slide.type`, `slide.title`, `slide.diff`, `slide.code`, `slide.filename`, `slide.language`, `slide.audio`, `slide.durationInSeconds`, `slide.focus`, `slide.items`, `slide.src`, `slide.subtitle`, and `slide.date` as top-level fields on each slide object. + Read the `durations.json` from step 3 to get the duration (in seconds) for each audio clip. Then write a `manifest.json` alongside the audio files: ```json @@ -436,7 +438,8 @@ Manifest slide type: `outro` with `durationInSeconds: 3`. - **Write as the author.** "So the main thing here is..." or "The tricky part was..." are fine. - **Avoid redundancy** between intro and first content segment. - **Mention files that aren't shown.** If a PR touches 15 files but only 6 are interesting, briefly acknowledge the others. -- Aim for **5-7 minutes** total narration. +- **Duration estimation:** professional narration pace is ~2.5 words/second. Count the words in each segment's narration text and divide by 2.5 to get `durationInSeconds`. 
Add 1-2 seconds for visual-only moments (intro reveal, diff highlight pause). A 50-word segment is 50 / 2.5 = 20 seconds of speech, so ≈ 22 seconds with padding.
+- Aim for **5-7 minutes** total narration for large PRs, **1-3 minutes** for small fixes.
 
 ## Checklist
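The duration heuristic above (narration length / 2.5 words per second, plus a second or two of visual padding) can be sketched as a small helper. This is an illustrative sketch only; `estimate_duration_seconds` is a hypothetical name and is not part of the skill's scripts:

```python
# Hypothetical helper illustrating the duration heuristic described above:
# ~2.5 words/second of narration, plus ~2s of padding for visual-only moments.
def estimate_duration_seconds(narration: str, padding: float = 2.0) -> float:
    """Estimate a slide's durationInSeconds from its narration text."""
    words = len(narration.split())          # whitespace-delimited word count
    return round(words / 2.5 + padding, 2)  # speech time + visual padding
```

A 50-word narration yields 50 / 2.5 + 2 = 22 seconds, matching the guidance above.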