video2ai

Turn any video into AI-ready structured content. Extract frames, transcribe audio, auto-detect key moments — all running locally on Apple Silicon.

What it does

Drop a video in. Get back:

Timestamped transcript (Whisper, local)
Key frames auto-selected per transcript segment
Visual theme clusters for instant filtering
Self-contained HTML export with embedded images + transcript

How it works

Video
  |
  +-- ffmpeg --> frames (1/sec)
  |
  +-- Whisper --> transcript segments with timestamps
  |
  +-- Apple Vision (Neural Engine) --> 768-dim embedding per frame
       |
       +-- cosine distance --> visual state change detection --> key frame suggestions
       |
       +-- k-means clustering --> visual theme groups (person, screen, product, etc.)

Zero Python ML overhead. Frame embeddings run on the Neural Engine via VNGenerateImageFeaturePrintRequest — near-zero CPU/RAM. No PyTorch, no CLIP, no transformers.

The workflow

Upload or paste a URL (YouTube, Vimeo, etc. via yt-dlp)
Pipeline runs: probe → extract frames → transcribe → embed → suggest key frames
Review page: transcript sidebar + frame grid per segment
Visual theme filter: cluster thumbnails at top — click to suppress a theme (e.g. "talking head") and keep another (e.g. "product shots")
Export: download self-contained HTML with key frames + transcript

Install

# Clone
git clone <repo-url> && cd capabilities

# Install
pip install -e .

# Dependencies
brew install ffmpeg
pip install openai-whisper pyobjc-framework-Vision yt-dlp

Run

# Web UI (recommended)
video2ai-web

# CLI
video2ai path/to/video.mp4 -o output/

Requirements

macOS (Apple Vision framework)
Python 3.12+
ffmpeg
Whisper (for transcription)
Ollama (optional, for LLM summaries)

Architecture

Module	Role
`probe.py`	ffprobe wrapper — duration, resolution, codecs
`frames.py`	ffmpeg frame extraction at configurable intervals
`transcribe.py`	Whisper audio transcription
`clip_match.py`	Apple Vision embeddings, change detection, k-means clustering
`vision.py`	Apple Vision OCR + image classification
`llm.py`	Ollama LLM analysis (optional)
`web.py`	Flask web UI — upload, process, review, export
`embed.py`	ffmpeg metadata embedding into video files

Built with Claude Code.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
video2ai		video2ai
.gitignore		.gitignore
README.md		README.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

video2ai

What it does

How it works

The workflow

Install

Run

Requirements

Architecture

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

video2ai

What it does

How it works

The workflow

Install

Run

Requirements

Architecture

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages