Highlight Extractor

A local CLI that turns a source video into a single concatenated highlight reel in the source's original orientation (16:9). Pick the moments manually (type timestamp ranges) or automatically (transcribe the audio, then let Gemini rank the most engaging moments).

How it works (staged pipeline)

Each stage is a single-task module: it reads one input artifact and writes one output artifact, knowing nothing about the other stages. The pivot artifact is segments.json — an ordered list of Segment objects. Whether those segments are typed by a human or ranked by AI, everything downstream is identical.
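As a rough illustration of the pivot artifact, the sketch below shows one way a Segment and its segments.json round trip could look. The field names (start, end, label) and the helper names write_segments and read_segments are assumptions made for this example; the real models.py may differ.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class Segment:
    # Hypothetical fields, chosen for illustration only.
    start: float      # seconds from the start of the source video
    end: float        # seconds from the start of the source video
    label: str = ""   # optional description, e.g. a label from auto mode


def write_segments(path: Path, segments: list[Segment]) -> None:
    """Serialize segments as a JSON list; list order is the reel order."""
    payload = [asdict(s) for s in segments]
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")


def read_segments(path: Path) -> list[Segment]:
    """Load segments back in the same order they were written."""
    items = json.loads(path.read_text(encoding="utf-8"))
    return [Segment(**item) for item in items]
```

Because downstream stages only ever see this file, a manual run and an auto run are indistinguishable to clip and concat.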

Stage	Status	What it does
`probe`	✅	video → metadata (duration, resolution, fps)
`extract_audio`	✅	video → 16kHz mono wav
`transcribe`	✅	wav → transcript.json (Groq Whisper)
`audio_energy`	🔲 stub	wav → peaks.json
`build_candidates`	🔲 stub	merge signals → candidates.json
`select`	✅	→ segments.json (manual ranges or Gemini auto)
`clip`	✅	video + segments.json → individual clips
`concat`	✅	clips → reel.mp4

scene_detect, subtitles, vertical_reframe, external_context, music are future layers and are not present yet.
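To make the "one input artifact, one output artifact" idea concrete, here is a hedged sketch of how the extract_audio stage might build its ffmpeg command for a 16kHz mono wav. The function name extract_audio_cmd is hypothetical and the project's actual arguments may differ; the ffmpeg options themselves (-vn, -ac, -ar, -c:a) are standard.

```python
from pathlib import Path


def extract_audio_cmd(video: Path, wav: Path) -> list[str]:
    """Build an ffmpeg argument list: video in, 16kHz mono PCM wav out."""
    return [
        "ffmpeg", "-y",          # overwrite the output if it already exists
        "-i", str(video),        # the single input artifact
        "-vn",                   # drop the video stream
        "-ac", "1",              # downmix to mono
        "-ar", "16000",          # resample to 16 kHz
        "-c:a", "pcm_s16le",     # 16-bit PCM, the usual wav payload
        str(wav),                # the single output artifact
    ]
```

The list can be passed directly to subprocess.run(cmd, check=True), which avoids shell quoting problems with paths that contain spaces.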

Requirements

Python 3.11+
ffmpeg and ffprobe on your PATH (checked at startup with a clear error if missing).
- Windows: winget install Gyan.FFmpeg, or a build from https://www.gyan.dev/ffmpeg/builds/ with its bin/ folder on PATH.
Software encoding only (libx264). No GPU encoders.
API keys (auto mode only): a Groq key for transcription and a Gemini key for selection. Manual mode needs neither.
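The startup check for ffmpeg and ffprobe could be sketched as follows. This is an assumption about the approach, not the project's actual code; missing_tools and require_tools are hypothetical names, and the which parameter exists only so the lookup can be swapped out in tests.

```python
import shutil


def missing_tools(tools=("ffmpeg", "ffprobe"), which=shutil.which) -> list[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def require_tools() -> None:
    """Exit early with a readable message instead of failing mid-render."""
    missing = missing_tools()
    if missing:
        raise SystemExit(
            f"Not found on PATH: {', '.join(missing)}. "
            "Install ffmpeg and make sure its bin/ folder is on PATH."
        )
```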

Setup

cd highlight-extractor
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt

For auto mode, copy .env.example to .env and fill in your keys:

GROQ_API_KEY=...
GEMINI_API_KEY=...

.env is gitignored — keys never get committed.
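Since only auto mode needs the keys, a check along these lines could run before any API call. This is a minimal sketch under the assumption that the keys are already in the process environment (however .env is loaded); missing_keys is a hypothetical helper.

```python
import os

REQUIRED_KEYS = ("GROQ_API_KEY", "GEMINI_API_KEY")


def missing_keys(mode: str, env=None) -> list[str]:
    """Return the API keys that are required for this mode but not set."""
    env = os.environ if env is None else env
    if mode != "auto":
        return []  # manual mode needs neither key
    return [key for key in REQUIRED_KEYS if not env.get(key)]
```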

Run

python cli.py "C:\path\to\source.mp4"

You can also omit the path and you'll be prompted for it. The first question is what to make:

Highlight reel — cut clips (manual or AI), optionally subtitled (below).
Subtitle a whole video — generate learning subtitles over the entire video. Outputs a .ass sidecar by default (load it in VLC/mpv, no re-encode, toggleable) or burns it in. Romaji-only is fully offline/free.

For a highlight reel, the remaining questions are:

Source video path (or pass it as the CLI arg above)
Selection mode: manual or auto
- (manual) Ranges, e.g. 00:30-00:45, 01:10-01:25 or one per line. Accepts MM:SS-MM:SS and HH:MM:SS-HH:MM:SS.
- (auto) How many highlights, an optional steer for the AI (e.g. "focus on funny moments"), and a max seconds per highlight (0 = no cap).
Padding seconds added around each cut (default 0.5)
Subtitles? yes/no — burn Japanese-learning subtitles onto the reel. If yes, you're also asked whether the source already shows its own subtitles; if so, ours move to the top and show only Japanese + romaji (so they don't collide with the source's bottom subs).
Output filename (default reel.mp4, written next to the source)
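The manual range syntax described above (comma-separated or one per line, MM:SS or HH:MM:SS) can be parsed roughly like this. It is a sketch, not the tool's parser; parse_timestamp and parse_ranges are hypothetical names, and the real CLI may accept or reject inputs differently.

```python
import re


def parse_timestamp(text: str) -> float:
    """Convert MM:SS or HH:MM:SS to seconds."""
    parts = text.strip().split(":")
    if len(parts) not in (2, 3) or not all(p.isdigit() for p in parts):
        raise ValueError(f"bad timestamp: {text!r}")
    seconds = 0
    for part in parts:
        seconds = seconds * 60 + int(part)
    return float(seconds)


def parse_ranges(text: str) -> list[tuple[float, float]]:
    """Parse 'start-end' ranges separated by commas or newlines."""
    ranges = []
    for chunk in re.split(r"[,\n]", text):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate trailing commas and blank lines
        start_text, sep, end_text = chunk.partition("-")
        if not sep:
            raise ValueError(f"expected start-end, got: {chunk!r}")
        ranges.append((parse_timestamp(start_text), parse_timestamp(end_text)))
    return ranges
```

For example, parse_ranges("00:30-00:45, 01:10-01:25") yields [(30.0, 45.0), (70.0, 85.0)].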

Subtitles (Japanese-learning aid)

Built for studying Japanese. Romaji is always shown, spaced by word, not syllable (kudasai, not ku da sai), so you can hear where words begin and end. You choose which other lines appear:

Romaji + English
Japanese + romaji + English
Japanese + romaji
Romaji only

The English line is only meaningful for Japanese audio. For audio in other languages, the Japanese line is a translation (so you learn how to say it), shown with romaji.

How it's produced:

Romaji is generated offline (Janome word-segmentation + pykakasi), so the romaji line is free, instant, unlimited, and never depends on the network — ideal for subtitling whole episodes. Romaji-only on Japanese audio needs no API at all.
Translation uses an LLM with fallback: Gemini first, then Groq (llama-3.3-70b-versatile) if Gemini is overloaded — so a Gemini 503 spike won't stop you. Both use keys you already have.
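The fallback behaviour for translation can be sketched generically as below. The provider callables stand in for the Gemini and Groq clients and are not real SDK calls; ProviderUnavailable and translate_with_fallback are hypothetical names introduced for this example.

```python
from typing import Callable, Sequence


class ProviderUnavailable(Exception):
    """Raised by a provider wrapper when the service is overloaded (e.g. HTTP 503)."""


def translate_with_fallback(
    text: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each provider in order; return (provider_name, translation)."""
    errors = []
    for name, call in providers:
        try:
            return name, call(text)
        except ProviderUnavailable as exc:
            errors.append(f"{name}: {exc}")  # remember why, then try the next one
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Only the "unavailable" error triggers the fallback here; other exceptions propagate, so a genuine bug in a request is not silently masked by the second provider.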

For clean output, use a raw source without burned-in subtitles; for sources that already show subs, answer "yes" to the top-position prompt so ours don't collide.

In auto mode the tool extracts audio, transcribes it, and asks Gemini to pick the highlights — then echoes the resolved config and the final segment list (with the AI's labels) and asks you to confirm before rendering.

Flags

Flag	Default	Meaning
`--padding`	`0.5`	Seconds added around each cut
`--max-clip`	`0`	Auto mode: cap each highlight's length in seconds (`0` = no cap)
`--output`	`reel.mp4`	Output filename
`--keep-temp`	off	Preserve `workdir/<task_id>/` for debugging
`--fresh`	off	Auto mode: ignore the cached transcript and re-transcribe
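The flags in the table map naturally onto argparse. The sketch below mirrors the documented names and defaults; the positional source argument being optional follows the Run section, and everything else about how cli.py really parses arguments is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="cli.py")
    # Optional positional: when omitted, the CLI prompts for the path.
    parser.add_argument("source", nargs="?", help="path to the source video")
    parser.add_argument("--padding", type=float, default=0.5,
                        help="seconds added around each cut")
    parser.add_argument("--max-clip", type=float, default=0.0,
                        help="auto mode: cap per highlight in seconds (0 = no cap)")
    parser.add_argument("--output", default="reel.mp4", help="output filename")
    parser.add_argument("--keep-temp", action="store_true",
                        help="preserve workdir/<task_id>/ for debugging")
    parser.add_argument("--fresh", action="store_true",
                        help="auto mode: ignore the cached transcript")
    return parser
```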

Transcript caching (auto mode)

The first auto run on a video transcribes it and caches the result under cache/ (gitignored), keyed on the file's path + size + modified-time. Later runs on the same video reuse the transcript — so you can re-run with a different highlight count, steer, or --max-clip without spending Groq quota or waiting on transcription again. Edit/replace the video and the key changes automatically; pass --fresh to force a re-transcribe.
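A cache key built from path, size, and modified-time might look like the following. The hashing scheme and the name transcript_cache_key are assumptions for illustration; the actual cache.py may derive its key differently.

```python
import hashlib
from pathlib import Path


def transcript_cache_key(video: Path) -> str:
    """Derive a key that changes whenever the file is moved, edited, or replaced."""
    stat = video.stat()
    raw = f"{video.resolve()}|{stat.st_size}|{stat.st_mtime_ns}"
    # Hashing keeps the key short and safe to use as a filename.
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]
```

This avoids reading the whole video to hash its contents, which would be slow for long sources, at the cost of a re-transcribe when a file is merely moved or touched.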

Rendering correctness

Frame-accurate cuts: every segment is re-encoded (never -c copy). Stream copy can only cut on keyframes, which can shift clip boundaries by several seconds.
Glitch-free concat: every clip is normalized to identical parameters (libx264 / yuv420p, source resolution + fps, AAC 44.1kHz stereo), then joined with the concat demuxer (-c copy, safe because the clips already match).
Validation: ranges are clamped to [0, duration], empty ranges dropped, and after padding, overlapping/adjacent segments are merged.
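The validation rules above can be sketched as a single pass: clamp, drop empties, pad, then merge. The function name normalize_segments is hypothetical, and sorting by start time before merging is an assumption of this sketch; the tool may preserve the order in which ranges were entered.

```python
def normalize_segments(ranges, duration, padding=0.5):
    """Clamp to [0, duration], drop empty ranges, pad, then merge overlaps."""
    padded = []
    for start, end in ranges:
        start = max(0.0, min(start, duration))
        end = max(0.0, min(end, duration))
        if end <= start:
            continue  # empty (or inverted) after clamping
        # Padding is also kept inside the video's bounds.
        padded.append((max(0.0, start - padding), min(duration, end + padding)))

    padded.sort()  # assumption: chronological order
    merged = []
    for start, end in padded:
        if merged and start <= merged[-1][1]:
            # Overlapping or touching the previous segment: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Merging after padding matters: two ranges that are half a second apart before padding would otherwise produce clips that repeat the same frames at the join.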

Working directory

Each run uses workdir/<task_id>/ for intermediate clips and segments.json. It is removed at the end unless you pass --keep-temp. The folder is gitignored.
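The per-run directory lifecycle fits a context manager well. This is a sketch under the assumption that task_id is a short random identifier; run_workdir is a hypothetical name and the real orchestration in cli.py may differ.

```python
import shutil
import uuid
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def run_workdir(root: Path, keep_temp: bool = False):
    """Create root/<task_id>/ for one run and remove it afterwards."""
    task_id = uuid.uuid4().hex[:8]  # assumed id format, for illustration
    path = root / task_id
    path.mkdir(parents=True)
    try:
        yield path
    finally:
        # Cleanup also runs when rendering fails, unless --keep-temp was given.
        if not keep_temp:
            shutil.rmtree(path, ignore_errors=True)
```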

Layout

highlight-extractor/
  cli.py                 # interactive config + orchestration
  config.py              # Config dataclass
  models.py              # Segment dataclass + segments.json I/O
  media/
    ffmpeg.py            # subprocess wrappers, presence check, probe
  pipeline/
    probe.py             # ✅
    extract_audio.py     # ✅
    transcribe.py        # ✅
    audio_energy.py      # 🔲 stub
    build_candidates.py  # 🔲 stub
    select.py            # ✅ manual ranges or Gemini auto
    clip.py              # ✅
    concat.py            # ✅
  workdir/               # per-run temp (gitignored)
