Add Gemini Veo video generation

> Migrated from cameronsjo/cadence-palette#1

## Summary

  Add Gemini Veo video generation support to cadence-palette alongside the existing image generation pipeline. Veo 3.1 is GA with native audio, 720p/1080p/4K resolution, and 4-8 second clips, which unlocks short-form mascot animations, product demos, and promotional clips from structured prompt files.

  ## Motivation

  - The current pipeline (`/generate-prompt` → `/generate-image`) only supports static images via `gemini-3-pro-image-preview`
  - Gemini's Veo 3.1 (`veo-3.1-generate-preview`) uses the same `google-genai` SDK and `GOOGLE_API_KEY`, so infrastructure overlap is high
  - Video generation is an **expensive cost tier** ($0.15-$0.60/sec, ~$3.20 per 8-sec clip at 1080p) — the existing pre-flight validation and cost confirmation patterns in `/generate-image` are directly reusable
  - First real use case already exists: Code Puppy "bark like a chicken" mascot animation prompt (`prompts/bark-like-a-chicken-video.md`)

  ## Proposed Changes

  ### 1. New skill: `generate-video-prompt` (parallel to `generate-prompt`)

  Structured video prompt file format with frontmatter:

  ```yaml
  ---
  name: bark-like-a-chicken-video
  model: veo-3.1-generate-preview
  aspect_ratio: '16:9'        # 16:9 or 9:16 only (no 1:1)
  resolution: 1080p            # 720p, 1080p, 4k
  duration: 8                  # 4, 6, or 8 seconds
  cost_estimate: $3.20
  style: pixel-art-animation
  last_generated: null
  last_updated: '2026-03-26T20:00:00Z'
  ---
  ```

  Body sections differ from image prompts — temporal, not spatial:

  | Image Prompt | Video Prompt |
  | --- | --- |
  | Subject (position, materials) | Subject + Action (what happens over time) |
  | Environment (static setting) | Environment + Camera (movement, shot type) |
  | Secondary Elements | Motion Events (sequence of beats) |
  | Lighting | Lighting + Color Grading |
  | Style | Style + Sound Design (Veo 3+ native audio) |
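The constraints noted in the frontmatter comments (aspect ratio, resolution, duration) lend themselves to a small pre-flight check. A minimal sketch, assuming the frontmatter has already been parsed into a dict; the function name, the choice of required fields, and the error wording are hypothetical and not part of the existing skills:

```python
# Allowed values taken from the frontmatter comments in this issue.
ALLOWED_ASPECT_RATIOS = ("16:9", "9:16")
ALLOWED_RESOLUTIONS = ("720p", "1080p", "4k")
ALLOWED_DURATIONS = (4, 6, 8)
# Assumption: these are the fields the generator cannot run without.
REQUIRED_FIELDS = ("name", "model", "aspect_ratio", "resolution", "duration")


def validate_video_frontmatter(meta: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = [
        f"missing required field: {key}"
        for key in REQUIRED_FIELDS
        if key not in meta
    ]
    if "aspect_ratio" in meta and meta["aspect_ratio"] not in ALLOWED_ASPECT_RATIOS:
        problems.append(f"aspect_ratio must be one of {ALLOWED_ASPECT_RATIOS}")
    resolution = meta.get("resolution")
    if resolution is not None and str(resolution).lower() not in ALLOWED_RESOLUTIONS:
        problems.append(f"resolution must be one of {ALLOWED_RESOLUTIONS}")
    duration = meta.get("duration")
    if duration is not None and duration not in ALLOWED_DURATIONS:
        problems.append(f"duration must be one of {ALLOWED_DURATIONS} seconds")
    return problems
```

Returning a list rather than raising on the first problem lets the skill report every issue in a prompt file at once, before any billed call is made.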

  ### 2. New skill: `generate-video` (parallel to `generate-image`)

  Reuses the same infrastructure gate pattern:
  - Check `GOOGLE_API_KEY`
  - Check `google-genai` SDK installed
  - Read + validate prompt files
  - Cost confirmation gate (even more critical at $3.20/clip vs $0.13/image)
  - Generate via `client.models.generate_videos()` with polling loop
  - Save to `generated/{name}-{timestamp}.mp4`
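The first two gate checks and the output filename could look roughly like this. This is a sketch under assumptions: the helper names are hypothetical, the SDK check is injectable so it can be exercised without the package installed, and the timestamp format is a guess because the issue does not specify one.

```python
import importlib.util
import os
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, Mapping, Optional


def sdk_available() -> bool:
    """Report whether the google-genai package is importable."""
    try:
        return importlib.util.find_spec("google.genai") is not None
    except ImportError:
        # find_spec raises when the parent `google` package is absent.
        return False


def preflight_errors(
    env: Mapping[str, str] = os.environ,
    has_sdk: Callable[[], bool] = sdk_available,
) -> list:
    """Collect infrastructure problems; an empty list means the gate passes."""
    errors = []
    if not env.get("GOOGLE_API_KEY"):
        errors.append("GOOGLE_API_KEY is not set")
    if not has_sdk():
        errors.append("google-genai is not installed (pip install google-genai)")
    return errors


def output_path(
    name: str,
    now: Optional[datetime] = None,
    root: Path = Path("generated"),
) -> Path:
    """Build generated/{name}-{timestamp}.mp4 with a UTC timestamp."""
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    # Assumption: compact ISO-8601-style stamp; the real format is undecided.
    return root / f"{name}-{moment.strftime('%Y%m%dT%H%M%SZ')}.mp4"
```

For example, `output_path("bark-like-a-chicken-video")` yields a path under `generated/` that sorts chronologically by filename.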

  Key API differences from image generation:
  - Async operation: returns an operation object, and the caller must poll `operation.done`
  - No free tier — billing required from first call
  - Limited aspect ratios — only 16:9 and 9:16
  - Fixed 24fps
  - Native audio on Veo 3+ — prompts can describe soundscapes
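Because generation is asynchronous, the skill needs a bounded polling loop. The sketch below is deliberately SDK-agnostic: `poll_until_done` is a hypothetical helper that takes any object with a `done` attribute plus a `refresh` callable. With `google-genai`, the refresh step would presumably be `client.operations.get(operation)`; check that against the SDK docs before relying on it. The 10-second interval and 10-minute timeout are assumptions chosen to sit above the latency range quoted in this issue.

```python
import time
from typing import Any, Callable


def poll_until_done(
    operation: Any,
    refresh: Callable[[Any], Any],
    *,
    interval_s: float = 10.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Any:
    """Re-fetch `operation` until it reports done, or raise TimeoutError."""
    deadline = clock() + timeout_s
    while not operation.done:
        if clock() >= deadline:
            raise TimeoutError(f"video generation not done after {timeout_s}s")
        sleep(interval_s)
        # The refreshed object replaces the stale one on every pass.
        operation = refresh(operation)
    return operation
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting, which covers both the fast-completion and timeout cases.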

  ### 3. New spec: `specs/video-prompt-engineering.md`

  Video-specific prompt engineering guide covering:
  - Temporal structure (setup → action → button)
  - Camera vocabulary (tracking shot, dolly, static hold, etc.)
  - Audio cues for Veo 3+ (dialogue in quotes, SFX descriptions, ambient sound)
  - Duration planning (what fits in 4s vs 8s)
  - Scene extension chaining for longer narratives (up to ~1 minute)

  ### 4. Updates to existing skills

  - `gemini-image-gen/SKILL.md`: add a "See also: video generation" cross-reference
  - `setup-generation/SKILL.md`: Veo uses the same API key, but note the billing requirement (no free tier for video)

  ## Veo API Reference

  | Property | Value |
  | --- | --- |
  | Model | `veo-3.1-generate-preview` |
  | SDK | `google-genai` (same as image) |
  | Durations | 4, 6, 8 seconds |
  | Resolutions | 720p, 1080p, 4K (3.1 only) |
  | Aspect ratios | 16:9, 9:16 |
  | Cost (1080p, 8s) | ~$3.20 |
  | Cost (720p fast, 8s) | ~$1.20 |
  | Audio | Native on Veo 3+ |
  | Latency | 11s–6min |

  ## Cost Tier Implications

  Video is the most expensive generation tier in the palette:

  | Tier | Cost | Gate |
  | --- | --- | --- |
  | Image 1K | ~$0.04 | Validation only |
  | Image 2K | ~$0.13 | Validation only |
  | Image 4K | ~$0.24 | Validation + confirmation |
  | Video 720p 8s | ~$1.20 | Validation + explicit cost confirmation |
  | Video 1080p 8s | ~$3.20 | Validation + explicit cost confirmation |
  | Video 4K 8s | ~$4.80 | Validation + explicit cost confirmation + "are you sure?" |
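The video rows above imply per-second rates of $0.15 (720p), $0.40 (1080p), and $0.60 (4K), obtained by dividing each 8-second figure by 8. A rough estimator for the confirmation gate, assuming cost scales linearly with duration; the names are hypothetical and the rates are derived from this table, not read from a pricing API:

```python
# Per-second rates derived from the 8-second costs in the tier table.
RATE_PER_SECOND = {"720p": 0.15, "1080p": 0.40, "4k": 0.60}


def estimate_cost(resolution: str, duration_s: int) -> float:
    """Estimate clip cost in USD, assuming linear per-second pricing."""
    return round(RATE_PER_SECOND[resolution.lower()] * duration_s, 2)


def confirmation_steps(resolution: str) -> list:
    """List the gates a video request must pass, per the tier table."""
    steps = ["validation", "explicit cost confirmation"]
    if resolution.lower() == "4k":
        # The 4K tier adds a second "are you sure?" prompt.
        steps.append("are you sure?")
    return steps
```

Under these assumptions, `estimate_cost("720p", 4)` gives the $0.60 figure used for the cheapest test clip in the test plan.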

  ## Out of Scope

  - Image-to-video (passing a generated image as first frame) — future enhancement
  - Scene extension chaining (building 1-minute narratives) — future enhancement
  - Style reference images — Veo 3.1 supports this but adds complexity
  - Vertex AI endpoint support — Gemini API only for now

  ## Test Plan

  - Write a video prompt using `/generate-video-prompt`
  - Validate frontmatter parsing handles video-specific fields (`model`, `duration`, `cost_estimate`)
  - Confirm infrastructure gate detects missing `GOOGLE_API_KEY`
  - Confirm cost confirmation gate fires before generation
  - Generate a test video at 720p/4s (cheapest option: ~$0.60) to validate the pipeline
  - Verify `.mp4` output saves correctly with timestamp filename
  - Verify polling loop handles both fast completion and timeout gracefully
