Migrated from cameronsjo/cadence-palette#1
Summary
Add Gemini Veo video generation support to cadence-palette alongside the existing image generation pipeline. Veo 3.1 is available through the Gemini API (currently under the preview model ID veo-3.1-generate-preview) with native audio, 720p/1080p/4K resolution, and 4-8 second clips. That unlocks short-form mascot animations, product demos, and promotional clips from structured prompt files.
Motivation
- The current pipeline (/generate-prompt → /generate-image) only supports static images via gemini-3-pro-image-preview
- Gemini's Veo 3.1 (veo-3.1-generate-preview) uses the same google-genai SDK and GOOGLE_API_KEY, so infrastructure overlap is high
- Video generation is an expensive cost tier ($0.15-$0.60/sec, ~$3.20 per 8-sec clip at 1080p); the existing pre-flight validation and cost confirmation patterns in /generate-image are directly reusable
- First real use case already exists: Code Puppy "bark like a chicken" mascot animation prompt (prompts/bark-like-a-chicken-video.md)
Proposed Changes
1. New skill: generate-video-prompt (parallel to generate-prompt)
Structured video prompt file format with frontmatter:
---
name: bark-like-a-chicken-video
model: veo-3.1-generate-preview
aspect_ratio: '16:9' # 16:9 or 9:16 only (no 1:1)
resolution: 1080p # 720p, 1080p, 4k
duration: 8 # 4, 6, or 8 seconds
cost_estimate: $3.20
style: pixel-art-animation
last_generated: null
last_updated: '2026-03-26T20:00:00Z'
---
Body sections differ from image prompts — temporal, not spatial:
┌───────────────────────────────┬────────────────────────────────────────────┐
│ Image Prompt │ Video Prompt │
├───────────────────────────────┼────────────────────────────────────────────┤
│ Subject (position, materials) │ Subject + Action (what happens over time) │
├───────────────────────────────┼────────────────────────────────────────────┤
│ Environment (static setting) │ Environment + Camera (movement, shot type) │
├───────────────────────────────┼────────────────────────────────────────────┤
│ Secondary Elements │ Motion Events (sequence of beats) │
├───────────────────────────────┼────────────────────────────────────────────┤
│ Lighting │ Lighting + Color Grading │
├───────────────────────────────┼────────────────────────────────────────────┤
│ Style │ Style + Sound Design (Veo 3+ native audio) │
└───────────────────────────────┴────────────────────────────────────────────┘
2. New skill: generate-video (parallel to generate-image)
Reuses the same infrastructure gate pattern:
- Check GOOGLE_API_KEY
- Check google-genai SDK installed
- Read + validate prompt files
- Cost confirmation gate (even more critical at $3.20/clip vs $0.13/image)
- Generate via client.models.generate_videos() with polling loop
- Save to generated/{name}-{timestamp}.mp4
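The first two gate checks and the output path could look roughly like the sketch below. The function names (check_infrastructure, sdk_installed, output_path) are hypothetical, and the compact UTC timestamp format is an assumption, since the proposal only specifies {name}-{timestamp}.mp4.

```python
import importlib.util
import os
from datetime import datetime, timezone
from pathlib import Path


def sdk_installed(module: str = "google.genai") -> bool:
    """Report whether the module can be located, without importing it fully."""
    try:
        return importlib.util.find_spec(module) is not None
    except (ImportError, ValueError):
        # find_spec raises if the parent package (e.g. "google") is missing.
        return False


def check_infrastructure(env=None) -> list[str]:
    """Return human-readable problems; an empty list means the gate passes."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("GOOGLE_API_KEY"):
        problems.append("GOOGLE_API_KEY is not set")
    if not sdk_installed():
        problems.append("google-genai SDK is not installed")
    return problems


def output_path(name: str, now=None, root: Path = Path("generated")) -> Path:
    """Build generated/{name}-{timestamp}.mp4 (timestamp format is an assumption)."""
    now = now or datetime.now(timezone.utc)
    return root / f"{name}-{now.strftime('%Y%m%dT%H%M%SZ')}.mp4"
```

Returning a list of problems rather than raising on the first one lets the skill report every missing prerequisite in a single pass before any billable call is made.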
Key API differences from image generation:
- Async operation — returns an operation object, must poll operation.done
- No free tier — billing required from first call
- Limited aspect ratios — only 16:9 and 9:16
- Fixed 24fps
- Native audio on Veo 3+ — prompts can describe soundscapes
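To make the async difference concrete, here is a hedged sketch of the polling loop with a timeout. wait_for_operation and generate_video are hypothetical helper names, and the 10-second poll interval and 600-second timeout are assumptions chosen to sit above the 6-minute upper latency quoted in this issue. The SDK calls in generate_video follow the google-genai Python SDK's Veo usage as I understand it and should be checked against the current documentation before use.

```python
import time


def wait_for_operation(operation, refresh, *, poll_seconds=10.0,
                       timeout_seconds=600.0, sleep=time.sleep,
                       clock=time.monotonic):
    """Poll until operation.done is true; raise TimeoutError past the deadline."""
    deadline = clock() + timeout_seconds
    while not operation.done:
        if clock() >= deadline:
            raise TimeoutError(f"video not ready after {timeout_seconds}s")
        sleep(poll_seconds)
        operation = refresh(operation)  # fetch the latest operation state
    return operation


def generate_video(prompt, out_path, *, model="veo-3.1-generate-preview",
                   aspect_ratio="16:9", resolution="1080p"):
    """Start a Veo job, wait for it, and save the first clip to out_path."""
    # Imported lazily so the module still loads when the SDK is absent.
    from google import genai
    from google.genai import types

    client = genai.Client()  # picks up the API key from the environment
    operation = client.models.generate_videos(
        model=model,
        prompt=prompt,
        config=types.GenerateVideosConfig(
            aspect_ratio=aspect_ratio,
            resolution=resolution,
        ),
    )
    operation = wait_for_operation(operation, client.operations.get)
    video = operation.response.generated_videos[0]
    client.files.download(file=video.video)
    video.video.save(str(out_path))
    return out_path
```

Passing sleep and clock in as parameters keeps the loop testable without real waiting, which is useful for the "fast completion and timeout" case in the test plan.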
3. New spec: specs/video-prompt-engineering.md
Video-specific prompt engineering guide covering:
- Temporal structure (setup → action → button)
- Camera vocabulary (tracking shot, dolly, static hold, etc.)
- Audio cues for Veo 3+ (dialogue in quotes, SFX descriptions, ambient sound)
- Duration planning (what fits in 4s vs 8s)
- Scene extension chaining for longer narratives (up to ~1 minute; guidance only, since implementing it is listed under Out of Scope)
4. Updates to existing skills
- gemini-image-gen/SKILL.md — add "See also: video generation" cross-reference
- setup-generation/SKILL.md: note that Veo uses the same API key but requires billing (there is no free tier for video)
Veo API Reference
┌──────────────────────┬──────────────────────────────┐
│ Property │ Value │
├──────────────────────┼──────────────────────────────┤
│ Model │ veo-3.1-generate-preview │
├──────────────────────┼──────────────────────────────┤
│ SDK │ google-genai (same as image) │
├──────────────────────┼──────────────────────────────┤
│ Durations │ 4, 6, 8 seconds │
├──────────────────────┼──────────────────────────────┤
│ Resolutions │ 720p, 1080p, 4K (3.1 only) │
├──────────────────────┼──────────────────────────────┤
│ Aspect ratios │ 16:9, 9:16 │
├──────────────────────┼──────────────────────────────┤
│ Cost (1080p, 8s) │ ~$3.20 │
├──────────────────────┼──────────────────────────────┤
│ Cost (720p fast, 8s) │ ~$1.20 │
├──────────────────────┼──────────────────────────────┤
│ Audio │ Native on Veo 3+ │
├──────────────────────┼──────────────────────────────┤
│ Latency │ 11s–6min │
└──────────────────────┴──────────────────────────────┘
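The constraints in this table translate fairly directly into a pre-flight validator. The sketch below is one possible shape, assuming the frontmatter has already been parsed into a dict; validate_video_fields is a hypothetical name, and the "veo-" and "veo-3.1" prefix checks are an assumption about how model IDs are spelled.

```python
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16"}
ALLOWED_RESOLUTIONS = {"720p", "1080p", "4k"}
ALLOWED_DURATIONS = {4, 6, 8}


def validate_video_fields(fields: dict) -> list[str]:
    """Return validation errors for a parsed video prompt; empty means valid."""
    errors = []
    model = str(fields.get("model") or "")
    # Compare case-insensitively: the frontmatter writes "4k", the table "4K".
    resolution = str(fields.get("resolution") or "").lower()

    if not model.startswith("veo-"):
        errors.append(f"model {model!r} is not a Veo model")
    if fields.get("aspect_ratio") not in ALLOWED_ASPECT_RATIOS:
        errors.append("aspect_ratio must be 16:9 or 9:16")
    if resolution not in ALLOWED_RESOLUTIONS:
        errors.append("resolution must be 720p, 1080p, or 4k")
    elif resolution == "4k" and not model.startswith("veo-3.1"):
        errors.append("4k resolution requires a Veo 3.1 model")
    if fields.get("duration") not in ALLOWED_DURATIONS:
        errors.append("duration must be 4, 6, or 8 seconds")
    return errors
```

Running this before the cost confirmation means an invalid prompt file never reaches the point where the user is asked to approve spend.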
Cost Tier Implications
Video is the most expensive generation tier in the palette:
┌────────────────┬────────┬───────────────────────────────────────────────────────────┐
│ Tier │ Cost │ Gate │
├────────────────┼────────┼───────────────────────────────────────────────────────────┤
│ Image 1K │ ~$0.04 │ Validation only │
├────────────────┼────────┼───────────────────────────────────────────────────────────┤
│ Image 2K │ ~$0.13 │ Validation only │
├────────────────┼────────┼───────────────────────────────────────────────────────────┤
│ Image 4K │ ~$0.24 │ Validation + confirmation │
├────────────────┼────────┼───────────────────────────────────────────────────────────┤
│ Video 720p 8s │ ~$1.20 │ Validation + explicit cost confirmation │
├────────────────┼────────┼───────────────────────────────────────────────────────────┤
│ Video 1080p 8s │ ~$3.20 │ Validation + explicit cost confirmation │
├────────────────┼────────┼───────────────────────────────────────────────────────────┤
│ Video 4K 8s │ ~$4.80 │ Validation + explicit cost confirmation + "are you sure?" │
└────────────────┴────────┴───────────────────────────────────────────────────────────┘
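The video rows above imply per-second rates of $0.15 (720p, the fast variant per the API reference table), $0.40 (1080p), and $0.60 (4K), each being the 8-second cost divided by 8. A sketch of the estimate and the tiered confirmation follows; the function names are hypothetical, the prompt wording is illustrative, and the rates are derived from this issue's figures rather than from a pricing API.

```python
from decimal import Decimal

# Per-second rates derived from the 8-second costs in the table above.
RATE_PER_SECOND = {
    "720p": Decimal("0.15"),
    "1080p": Decimal("0.40"),
    "4k": Decimal("0.60"),
}


def estimate_cost(resolution: str, duration_seconds: int) -> Decimal:
    """Estimated cost in USD for one clip."""
    return RATE_PER_SECOND[resolution.lower()] * duration_seconds


def confirmations_required(resolution: str) -> list[str]:
    """Gate steps for a video tier, mirroring the Gate column."""
    steps = ["validation", "explicit cost confirmation"]
    if resolution.lower() == "4k":
        steps.append("are you sure?")
    return steps


def confirm_generation(resolution: str, duration_seconds: int, ask=input) -> bool:
    """Ask for cost confirmation; 4K gets a second prompt."""
    cost = estimate_cost(resolution, duration_seconds)
    question = (f"Estimated cost ${cost:.2f} for {duration_seconds}s "
                f"at {resolution}. Proceed? [y/N] ")
    if ask(question).strip().lower() != "y":
        return False
    if "are you sure?" in confirmations_required(resolution):
        return ask("4K is the most expensive tier. Are you sure? [y/N] ").strip().lower() == "y"
    return True
```

Decimal is used instead of float so that displayed estimates such as $3.20 are exact, and defaulting to "no" on anything other than an explicit "y" keeps accidental keypresses from triggering a billable call.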
Out of Scope
- Image-to-video (passing a generated image as first frame) — future enhancement
- Scene extension chaining (building 1-minute narratives) — future enhancement
- Style reference images — Veo 3.1 supports this but adds complexity
- Vertex AI endpoint support — Gemini API only for now
Test Plan
- Write a video prompt using /generate-video-prompt
- Validate frontmatter parsing handles video-specific fields (model, duration, cost_estimate)
- Confirm infrastructure gate detects missing GOOGLE_API_KEY
- Confirm cost confirmation gate fires before generation
- Generate a test video at 720p/4s (cheapest option: ~$0.60) to validate the pipeline
- Verify .mp4 output saves correctly with timestamp filename
- Verify polling loop handles both fast completion and timeout gracefully