I originally wrote convertvideos.sh back in 2017 to recover disk space on my home file server. At the time I was using a OnePlus phone that produced really large video recordings. As I took regular backups onto my laptop and started running out of space, I got curious whether I could compress the videos without visible loss in quality. I could, and as I remember it I reclaimed 30-50% of the disk space.
The original bash script was pretty simple, a thin loop around ffmpeg: walk a folder, look at each video's resolution, and re-encode anything above a "too high" bitrate down to a fixed target. That script worked fine for years, but the more I used it, the more I noticed it was making the same mistake everywhere: it treated a 1080p screen recording the same as a 1080p sports clip. One has almost no motion and could compress brutally hard; the other has fast cuts and grain and falls apart at low bitrates. The fixed-bitrate ladder either wasted bits on the simple content or starved the complex content. There was no in-between.
I recently sat down to rewrite the script and asked myself: if I were building this as a small platform rather than a one-off bash script, what would I do differently? The project in this folder is what I came up with.
The new version probes each video, optionally classifies its content with a small local Vision-Language Model, looks up the right encode profile in a policy file, encodes with ffmpeg in CRF mode, and verifies the result with VMAF before deciding whether to keep it. The original convertvideos.sh is still here as a thin wrapper, so the 2017 invocation still works.
Author: Chanaveer Kadapatti
License: Apache 2.0 (see LICENSE)
┌────────┐   ┌─────────┐   ┌──────────────┐   ┌────────┐   ┌────────┐   ┌────────┐   ┌────────┐
│ probe  │ → │ triage  │ → │ classify     │ → │ policy │ → │ encode │ → │ verify │ → │ commit │
│ ffprobe│   │ skip if │   │ VLM (opt'l): │   │ json   │   │ ffmpeg │   │ VMAF   │   │in-place│
│        │   │ already │   │ Qwen2.5-VL,  │   │ lookup │   │ CRF    │   │ ≥ floor│   │ or     │
│        │   │ low bps │   │ MiniCPM-V,…  │   │        │   │ + retry│   │ ?      │   │ sibling│
└────────┘   └─────────┘   └──────────────┘   └────────┘   └────────┘   └────────┘   └────────┘
The way I think about it: the classifier proposes a profile, and VMAF disposes. The LLM looks at sampled frames and suggests something like "this is a screencast, compress it hard." The encoder takes that suggestion and runs the encode at the CRF the policy assigns to that profile. VMAF then checks the output against the source. If the perceptual quality comes in below the floor I set for that profile, the pipeline retries at a lower CRF (that is, higher quality). If it still can't meet the floor after a couple of retries, it abandons the encode and leaves the original alone.
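The propose-then-verify loop described above can be sketched roughly as follows. This is an illustration under assumptions, not the project's actual code: the function name, the `encode` and `measure_vmaf` callables, the CRF step of 2, and the retry count of 2 are all made up for the example.

```python
from typing import Callable, Optional


def encode_until_acceptable(
    start_crf: int,
    vmaf_floor: float,
    encode: Callable[[int], str],
    measure_vmaf: Callable[[str], float],
    max_retries: int = 2,
    crf_step: int = 2,
) -> Optional[tuple[str, int, float]]:
    """Try the proposed CRF first; on a VMAF miss, retry at a lower CRF.

    Returns (output_path, crf, vmaf) for the first encode that meets the
    floor, or None if every attempt falls short (the original is kept).
    """
    crf = start_crf
    for _attempt in range(max_retries + 1):
        output = encode(crf)            # hypothetical: runs ffmpeg, returns a path
        score = measure_vmaf(output)    # hypothetical: VMAF against the source
        if score >= vmaf_floor:
            return output, crf, score
        crf -= crf_step                 # lower CRF = more bits = higher quality
    return None


# Stand-in callables so the sketch runs without ffmpeg installed.
fake_scores = {28: 88.0, 26: 91.5, 24: 94.2}
result = encode_until_acceptable(
    start_crf=28,
    vmaf_floor=93.0,
    encode=lambda crf: f"out-crf{crf}.mp4",
    measure_vmaf=lambda path: fake_scores[int(path[7:-4])],
)
print(result)  # ('out-crf24.mp4', 24, 94.2)
```

Passing the encoder and the scorer in as callables keeps the retry logic testable on its own, which is handy when tuning floors without re-encoding anything.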
A few of the design decisions I felt strongly about:
Don't pay the LLM cost when you don't have to. Even a small local model isn't free to run, and the inference latency adds up across a few hundred videos. So before I invoke the VLM, the pipeline runs a cheap heuristic: if the source's bitrate is already low for its resolution, I skip it entirely. There's nothing meaningful to gain from re-encoding a phone clip that's already at 4 Mbps for 1080p. The LLM only earns its keep on the long-tail content where the encode profile actually matters.
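A triage check of this kind might look like the sketch below. The threshold table and the function name are assumptions for illustration; only the 1080p / 4 Mbps pairing comes from the paragraph above.

```python
# Hypothetical "already efficient" ceilings in bits per second, keyed by
# frame height. Only the 1080p / 4 Mbps pairing comes from the README text.
EFFICIENT_BPS_BY_HEIGHT = {
    720: 2_500_000,
    1080: 4_000_000,
    2160: 12_000_000,
}


def is_already_efficient(height: int, bitrate_bps: int) -> bool:
    """True when the bitrate is at or below the ceiling of the smallest
    listed resolution tier that covers the given frame height."""
    for tier_height in sorted(EFFICIENT_BPS_BY_HEIGHT):
        if height <= tier_height:
            return bitrate_bps <= EFFICIENT_BPS_BY_HEIGHT[tier_height]
    # Taller than every tier: fall back to the largest tier's ceiling.
    return bitrate_bps <= EFFICIENT_BPS_BY_HEIGHT[max(EFFICIENT_BPS_BY_HEIGHT)]


print(is_already_efficient(1080, 4_000_000))   # True: skip, nothing to gain
print(is_already_efficient(1080, 20_000_000))  # False: worth classifying and encoding
```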
Keep the policy in a file, not in code. This is the part I most wanted to get right. The encoding rules — what CRF to use for a screencast, what audio bitrate for cinematic content, what VMAF floor to require — all live in policy.json. If I want to retune the screencast profile from CRF 28 to 27, that's a one-line edit to a config file, not a code change. To me, that's the difference between a script and a platform.
Make every decision auditable. Each video produces a row in manifest.csv capturing the classification, the rationale the LLM gave, the parameters used, the VMAF score, and the final decision. When I'm tuning the policy, the manifest is what tells me whether a change actually paid off — average savings per profile, average VMAF, how often quality fell below the floor. Without it I'd just be guessing.
Default to safety, not in-place writes. The 2017 version overwrote the source file as soon as the encode finished. I've lost footage that way before, on encodes that turned out to have artifacts I didn't notice until much later. The new version writes outputs as *-compressed.mp4 next to the source by default. If you want it to replace the original, you have to pass --in-place, and even then the replacement only happens after VMAF confirms the encode is good.
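As a rough sketch of that rule: the sibling name is derived from the source name, and the source is only overwritten when both conditions hold. The function names are hypothetical, and the in-place branch assumes the source is already an .mp4 so the encode can take over its name unchanged.

```python
import os
import tempfile
from pathlib import Path


def sibling_output_path(source: Path) -> Path:
    """Build the '<name>-compressed.mp4' path next to the source file."""
    return source.with_name(f"{source.stem}-compressed.mp4")


def commit(source: Path, encoded: Path, in_place: bool, vmaf_passed: bool) -> Path:
    """Return the path that holds the final encode.

    The source is only overwritten when in-place mode was requested AND
    verification passed. Assumption: the source is already an .mp4.
    """
    if in_place and vmaf_passed:
        os.replace(encoded, source)  # atomic when both paths share a filesystem
        return source
    return encoded


# Demo in a throwaway directory, with byte strings standing in for video data.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "clip.mp4"
    src.write_bytes(b"original footage")
    out = sibling_output_path(src)
    out.write_bytes(b"smaller encode")

    kept = commit(src, out, in_place=False, vmaf_passed=True)
    sibling_name = kept.name
    source_untouched = src.read_bytes() == b"original footage"
    print(sibling_name)  # clip-compressed.mp4
```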
Resume should be free. I designed this assuming I'd run it overnight on large libraries. If the machine reboots, or I Ctrl+C, or one specific video fails to encode for some reason, the next run should pick up exactly where the previous one left off. The manifest is keyed on resolved file paths, and anything already marked done is automatically skipped on the next run. Per-video errors are caught and logged but don't kill the loop.
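The resume behaviour can be illustrated with a small sketch. The column names `path` and `status`, and both function names, are assumptions; the real manifest has more columns.

```python
import csv
import io
from pathlib import Path
from typing import Callable, Iterable


def done_paths(manifest_file: Iterable[str]) -> set[str]:
    """Resolved paths of every manifest row already marked 'done'."""
    return {row["path"] for row in csv.DictReader(manifest_file)
            if row["status"] == "done"}


def run_pending(videos: list[Path], done: set[str],
                process: Callable[[Path], None]) -> dict[str, str]:
    """Process videos not yet done; one failure never stops the loop."""
    outcomes: dict[str, str] = {}
    for video in videos:
        key = str(video.resolve())
        if key in done:
            continue                      # resume: finished on an earlier run
        try:
            process(video)
            outcomes[key] = "done"
        except Exception as exc:          # isolate per-video errors
            outcomes[key] = f"failed: {exc}"
    return outcomes


# Demo with an in-memory manifest and a processor that fails on one file.
a, b, c = Path("a.mp4"), Path("b.mp4"), Path("c.mp4")
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["path", "status"])
writer.writerow([str(a.resolve()), "done"])
buffer.seek(0)


def fake_process(video: Path) -> None:
    if video.name == "b.mp4":
        raise RuntimeError("encode error")


outcomes = run_pending([a, b, c], done_paths(buffer), fake_process)
print({Path(k).name: v for k, v in outcomes.items()})
```

In the demo, a.mp4 is skipped because the manifest already lists it as done, b.mp4 is recorded as failed, and c.mp4 still gets processed.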
The LLM should be optional. I wanted this to run on a laptop without Ollama installed, or on a server without GPU access. Without --llm, the pipeline falls back to a sensible default profile and still works fine. The LLM is value-add, not a hard dependency. I think this is the right shape for any AI-augmented tool — the AI improves the result, but the system has to function without it.
- Inputs: anything ffmpeg can read: mp4, mov, mkv, avi, webm, m4v, mpg, mpeg, wmv, flv.
- Output: mp4 with H.264 by default, or H.265 if you switch the codec in config.json.
- Three ways to run it:
  - Point it at a directory and let it process everything: --dir /path.
  - Have it scan a directory and write a plan CSV, edit the plan to skip files or override CRF, then run from the CSV. This is the mode I'd recommend for large libraries, since I always want a chance to see what's going to be processed before any encoding starts.
  - A local web UI: --web.
- Optional VLM: any vision-capable Ollama model. I've been using qwen2.5vl:3b because it's small enough to be fast on CPU. If you have a GPU you can swap in qwen2.5vl:7b, minicpm-v:8b, or llava:7b for better classification accuracy.
- Long-running stability: designed for overnight runs. Per-video errors are isolated so the loop keeps going through individual failures, and the manifest is updated atomically after every video.
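One common way to get an atomic manifest update is to write a temporary file in the same directory and rename it over the old one. The sketch below shows that pattern; it is an assumption about how such an update could be done, not a copy of the project's manifest.py, and the function name and columns are hypothetical.

```python
import csv
import os
import tempfile
from pathlib import Path


def write_manifest_atomically(manifest: Path, rows: list[dict],
                              fieldnames: list[str]) -> None:
    """Write all rows to a temp file beside the manifest, then rename it
    into place so a reader never sees a half-written file."""
    fd, tmp_name = tempfile.mkstemp(dir=manifest.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
            handle.flush()
            os.fsync(handle.fileno())   # push the bytes to disk before the rename
        os.replace(tmp_name, manifest)  # atomic swap on the same filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)         # never leave a stray temp file behind
        raise


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    manifest = Path(tmp) / "manifest.csv"
    write_manifest_atomically(
        manifest,
        [{"path": "/videos/a.mp4", "status": "done"}],
        ["path", "status"],
    )
    written = manifest.read_text(encoding="utf-8").splitlines()
    leftovers = [p.name for p in Path(tmp).iterdir() if p.name != "manifest.csv"]
    print(written)
```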
I deliberately scoped some things out of this rewrite:
- HDR / Dolby Vision metadata preservation. The encode path is SDR-safe. HDR sources will still encode, but their dynamic-range metadata may not survive. I wouldn't run this on master files for that reason.
- Chapter and subtitle preservation. The encode pulls only the first video and audio streams. Adding chapter and subtitle copy is straightforward, but I haven't needed it.
- GPU encode (NVENC, AMF, VideoToolbox). It's CPU-only for now. Adding GPU encode would basically be one switch in the policy file plus a different -c:v argument.
- Concurrent jobs from the web UI. I made the UI single-job on purpose: running two encodes at once on the same local machine just makes both slower, and a UI that lets you do it is a footgun.
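For orientation, a CPU encode that keeps only the first video and audio streams could be assembled along these lines. The function name and the default preset and audio bitrate are assumptions; the ffmpeg options themselves (-map, -c:v, -crf, -preset, -c:a, -b:a) are standard.

```python
def build_encode_command(source: str, output: str, crf: int,
                         preset: str = "medium",
                         audio_bitrate: str = "128k",
                         codec: str = "libx264") -> list[str]:
    """Assemble an ffmpeg CRF encode as an argument list (no shell quoting)."""
    return [
        "ffmpeg", "-hide_banner",
        "-i", source,
        "-map", "0:v:0",      # first video stream only
        "-map", "0:a:0?",     # first audio stream, if the source has one
        "-c:v", codec,        # libx264 for H.264, libx265 for H.265
        "-crf", str(crf),
        "-preset", preset,
        "-c:a", "aac",
        "-b:a", audio_bitrate,
        output,
    ]


cmd = build_encode_command("clip.mov", "clip-compressed.mp4", crf=28,
                           audio_bitrate="64k")
print(" ".join(cmd))
```

Building the command as a list and handing it to subprocess.run avoids shell quoting problems with file names that contain spaces.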
ffmpeg has to be on PATH and built with libvmaf (VMAF lives inside ffmpeg as a filter):
- macOS with Homebrew: brew install ffmpeg ships with libvmaf.
- Ubuntu / Debian 22.04+: sudo apt install ffmpeg.
- Windows: I use the gyan.dev builds; BtbN also works. Both ship libvmaf-enabled. Add the bin\ folder to PATH.
You can confirm libvmaf is in your build with:
ffmpeg -filters 2>&1 | grep -i vmaf
You should see a line for libvmaf VV->V Calculate the VMAF ....
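If you would rather do the same check from Python (for example as a startup guard), a sketch like the one below works. Both function names are hypothetical, and the parser assumes each row of the filter list has the filter name as its second whitespace-separated field.

```python
import shutil
import subprocess


def lists_libvmaf(filters_output: str) -> bool:
    """True if any row of `ffmpeg -filters` output names the libvmaf filter.

    Assumption: rows look like '<flags> <name> <io> <description>', so the
    filter name is the second whitespace-separated field.
    """
    for line in filters_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "libvmaf":
            return True
    return False


def ffmpeg_has_libvmaf() -> bool:
    """Run the local ffmpeg, if there is one, and check its filter list."""
    if shutil.which("ffmpeg") is None:
        return False
    proc = subprocess.run(
        ["ffmpeg", "-hide_banner", "-filters"],
        capture_output=True, text=True, check=False,
    )
    return lists_libvmaf(proc.stdout + proc.stderr)


# Parser demo on sample rows, so it runs even without ffmpeg installed.
print(lists_libvmaf(" ... libvmaf VV->V Calculate the VMAF ..."))  # True
print(lists_libvmaf(" ... scale V->V Scale the input video size."))  # False
```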
Python 3.10 or later, then:
git clone <this-repo> && cd "Video compress project"
python3 -m venv .venv
source .venv/bin/activate # Windows (Git Bash): source .venv/Scripts/activate
pip install -r requirements.txt
I kept the dependency list deliberately small: requests, fastapi, uvicorn, jinja2. JSON parsing is stdlib, so there's no pyyaml or anything else that needs system libraries.
Skip this if you'll always run without --llm. Otherwise:
# Install: https://ollama.com/download
ollama pull qwen2.5vl:3b
ollama serve
I chose Ollama specifically because it gives me a clean local HTTP API and handles model loading and unloading in the background, which matters when the machine has limited RAM. The daemon usually starts automatically; you can confirm it's up with:
curl http://localhost:11434/api/tags
cp config.example.json config.json
Open config.json and tune what you want: model name, output directory, log level, VMAF override. The defaults are sensible enough that you can probably leave it alone for a first run.
./convertvideos.sh --dir ~/Videos/raw
# with the LLM:
./convertvideos.sh --dir ~/Videos/raw --llm
The plan-CSV mode is the one I use most often, because I always want a chance to look at the list before the encoder starts touching anything. Generate the plan:
./convertvideos.sh --scan-to-csv plan.csv --dir ~/Videos/raw
Open plan.csv in a spreadsheet or text editor. For each row you can:
- Set action to skip for files you want left alone.
- Set crf_override to force a specific CRF (bypasses the LLM and the policy).
- Set audio_bitrate_override, e.g. 64k for lecture-style audio.
Then run from the edited plan:
./convertvideos.sh --csv plan.csv --llm
./convertvideos.sh --web
Then open http://127.0.0.1:8765. The UI exposes the same three modes (directory, CSV, scan-to-CSV) plus a live log and a manifest table that refreshes while the job runs.
I usually run this with nohup and tail the log:
nohup ./convertvideos.sh --dir ~/Videos/raw --llm --in-place > run.log 2>&1 &
tail -f run.log
Ctrl+C, or kill <pid> (SIGTERM), exits cleanly after the current video finishes. Re-running the same command picks up exactly where it left off, because the manifest already records what's been done.
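"Exit cleanly after the current video" is usually done by having the signal handler set a flag that the loop checks between videos. The sketch below shows that pattern; the class and function names are hypothetical and the real cli.py may be organised differently.

```python
import signal


class StopFlag:
    """Set by a signal handler; checked by the loop between videos."""

    def __init__(self) -> None:
        self.requested = False

    def request(self, signum=None, frame=None) -> None:
        self.requested = True


def install_handlers(flag: StopFlag) -> None:
    """Route Ctrl+C (SIGINT) and kill (SIGTERM) to the flag.
    Must be called from the main thread."""
    signal.signal(signal.SIGINT, flag.request)
    signal.signal(signal.SIGTERM, flag.request)


def run_loop(videos, process, flag: StopFlag) -> list:
    """Finish the video in progress, then stop before starting the next."""
    finished = []
    for video in videos:
        if flag.requested:
            break
        process(video)
        finished.append(video)
    return finished


# Demo: simulate the signal arriving while b.mp4 is being processed.
flag = StopFlag()


def fake_process(video: str) -> None:
    if video == "b.mp4":
        flag.request()


finished = run_loop(["a.mp4", "b.mp4", "c.mp4"], fake_process, flag)
print(finished)  # ['a.mp4', 'b.mp4']
```

In a real run you would call install_handlers(flag) once at startup, before entering the loop.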
| Flag | Default | What it does |
|---|---|---|
| --dir PATH | — | Process every video in PATH (recursive by default) |
| --csv PATH | — | Process files listed in a plan CSV |
| --scan-to-csv OUT | — | Walk --dir and write a plan CSV (no encoding) |
| --web | — | Start the local web UI |
| --llm | off | Enable VLM classification |
| --llm-model NAME | from config | Override Ollama model |
| --llm-endpoint URL | from config | Override Ollama endpoint |
| --in-place | off | Replace original after VMAF passes (else write *-compressed.mp4 beside it) |
| --no-vmaf | — | Skip VMAF (faster, but no quality safety net; not recommended) |
| --output-dir PATH | beside source | Centralized output directory |
| --config PATH | config.json | Config file path |
| --policy PATH | policy.json | Policy file path |
| --manifest PATH | from config | Manifest file path |
| --retry-failed | off | Re-attempt files previously marked failed |
| --no-recursive | — | With --dir, do not recurse |
| --web-host HOST | 127.0.0.1 | UI bind address |
| --web-port PORT | 8765 | UI bind port |
| --log-level LEVEL | INFO | DEBUG / INFO / WARNING / ERROR |
The legacy invocation ./convertvideos.sh files.csv (a bare list of paths) still works — if you pass a single argument that ends in .csv, the wrapper auto-routes it to --csv. I wanted the original 2017 contract to keep working, since I have other scripts that call it that way.
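The routing rule is simple enough to state in a few lines. The real wrapper is a bash script, so the Python below is only an illustration of the rule, with a hypothetical function name.

```python
def route_legacy_args(argv: list[str]) -> list[str]:
    """Rewrite the 2017-style call 'convertvideos.sh files.csv' to '--csv files.csv'.

    Anything else (flags, several arguments, non-CSV paths) passes through.
    """
    if len(argv) == 1 and not argv[0].startswith("-") and argv[0].endswith(".csv"):
        return ["--csv", argv[0]]
    return argv


print(route_legacy_args(["files.csv"]))              # ['--csv', 'files.csv']
print(route_legacy_args(["--dir", "/videos/raw"]))   # ['--dir', '/videos/raw']
```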
policy.json is the file I'd point you at first if you want to understand the project. It's the encoding policy expressed as data: each content type has a CRF, an audio bitrate, an encoder preset, and a VMAF floor. There are also small modifiers that nudge CRF up or down based on motion and detail signals from the classifier.
If you want to experiment, what I do is copy the file, change a number, run on a small known set, and compare manifests. JSON doesn't have comments, so I've put explanatory text under _comment and description keys (the resolver only reads specific keys, so those are harmless).
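To make "policy as data" concrete, here is a sketch of a resolver over a tiny policy table. The key names (`profiles`, `modifiers`, `default_profile`, `crf_delta`) and every number except the screencast CRF of 28 are invented for the example; check the real policy.json for the actual layout.

```python
import json

POLICY_TEXT = """
{
  "_comment": "Hypothetical layout and values, for illustration only.",
  "default_profile": "general",
  "profiles": {
    "screencast": {"crf": 28, "audio_bitrate": "64k", "preset": "slow", "vmaf_floor": 90},
    "general": {"crf": 23, "audio_bitrate": "128k", "preset": "medium", "vmaf_floor": 93}
  },
  "modifiers": {
    "high_motion": {"crf_delta": -2},
    "low_detail": {"crf_delta": 1}
  }
}
"""


def resolve_profile(policy: dict, content_type: str,
                    signals: tuple[str, ...] = ()) -> dict:
    """Look up the profile for a content type, then apply CRF modifiers.

    Unknown content types fall back to the default profile. Keys the
    resolver does not read, such as '_comment', are simply ignored.
    """
    profiles = policy["profiles"]
    base = profiles.get(content_type, profiles[policy["default_profile"]])
    resolved = dict(base)  # copy so the policy table itself is never mutated
    for name in signals:
        modifier = policy["modifiers"].get(name)
        if modifier is not None:
            resolved["crf"] += modifier["crf_delta"]
    return resolved


policy = json.loads(POLICY_TEXT)
print(resolve_profile(policy, "screencast"))
print(resolve_profile(policy, "sports", ("high_motion",)))
```

In this sketch the unknown type "sports" falls back to the general profile and the high_motion modifier lowers its CRF from 23 to 21.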
config.json holds the per-machine knobs: which Ollama model to use, output directory, VMAF floor override, log level, in-place behavior. Copy config.example.json to config.json and edit.
I split these from policy.json on purpose. The policy is the encoding contract — it's the artifact, version-controlled, the same on every machine. The config is per-machine preferences — different on my laptop than on the server. Mixing them tends to cause grief later.
manifest.csv is the source of truth for what's been processed. Every input gets one row, with status, classification, encode parameters, VMAF score, savings percentage, and final decision (committed, sibling, skipped-already-efficient, skipped-by-user, skipped-no-savings, abandoned).
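The decision labels could be derived from the verification results roughly as shown below. The labels come from the list above, but the order of the checks and the mapping of `committed` to in-place replacement and `sibling` to a side-by-side output are my reading of the text, not confirmed behaviour; the function name is hypothetical.

```python
def final_decision(original_bytes: int, compressed_bytes: int,
                   vmaf: float, vmaf_floor: float, in_place: bool) -> str:
    """Map one encode's results to a manifest decision label."""
    if vmaf < vmaf_floor:
        return "abandoned"            # quality miss: keep the original
    if compressed_bytes >= original_bytes:
        return "skipped-no-savings"   # encode is not smaller than the source
    return "committed" if in_place else "sibling"


print(final_decision(1000, 600, vmaf=95.0, vmaf_floor=93.0, in_place=False))   # sibling
print(final_decision(1000, 1200, vmaf=95.0, vmaf_floor=93.0, in_place=False))  # skipped-no-savings
```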
A few patterns I use:
- To resume a run: just re-run the same command. Anything done is skipped automatically.
- To retry failed files: add --retry-failed.
- To force a re-process of a specific file: delete its row from manifest.csv.
- For a quick savings report: open it in a spreadsheet. Sum original_size_bytes and compressed_size_bytes and you have your number.
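If a spreadsheet is not handy, the same savings number can be computed with the standard library. The two column names are the ones mentioned above; the function name is hypothetical, and skipping rows with an empty compressed size is an assumption about how skipped files are recorded.

```python
import csv
import io


def savings_percent(manifest_file) -> float:
    """Overall percentage saved across rows that have a compressed size."""
    original = 0
    compressed = 0
    for row in csv.DictReader(manifest_file):
        if not row.get("compressed_size_bytes"):
            continue  # assumption: skipped files have no compressed size
        original += int(row["original_size_bytes"])
        compressed += int(row["compressed_size_bytes"])
    if original == 0:
        return 0.0
    return 100.0 * (original - compressed) / original


sample = io.StringIO(
    "path,original_size_bytes,compressed_size_bytes\n"
    "/videos/a.mp4,1000,600\n"
    "/videos/b.mp4,2000,900\n"
    "/videos/c.mp4,500,\n"
)
print(savings_percent(sample))  # 50.0
```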
The UI has three tabs: run from a directory, run from a CSV, or scan a directory and save a plan CSV. Below the tabs are the same options you'd pass on the command line — LLM toggle (with an endpoint check button so you can confirm Ollama is reachable before starting), VMAF toggle, in-place toggle, retry-failed toggle, and an output directory override. Once a job starts there's a live log and a manifest table that both refresh every 1.5 seconds.
I made it single-job on purpose. Two encodes fighting over CPU on a local machine just makes both slower, and a UI that lets you queue up parallel jobs is a footgun I didn't want to build.
A few things I've actually run into:
Required tool not found on PATH: ffmpeg — ffmpeg isn't on PATH. Install it (see Setup above) and reopen the terminal.
libvmaf errors — your ffmpeg build doesn't include libvmaf. On Ubuntu 22.04+ apt install ffmpeg has it; on Windows, switch to a gyan.dev or BtbN build. As a workaround you can run with --no-vmaf, but you lose the quality safety net, so I wouldn't recommend it for unattended runs.
LLM endpoint ... is not reachable — Ollama isn't running. Start it with ollama serve (or via the desktop app on macOS/Windows). You can verify with curl http://localhost:11434/api/tags.
LLM classification returns unknown for everything — usually means the model wasn't pulled, or the model name in config.json doesn't match what ollama list shows. Update llm.model in config.json to match exactly.
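Both of the checks above (is Ollama reachable, and is the configured model actually pulled) can be scripted against the same /api/tags endpoint. The sketch assumes the response is a JSON object with a `models` list whose entries carry a `name` field; the function names are hypothetical.

```python
import json
import urllib.request


def fetch_tags(endpoint: str = "http://localhost:11434", timeout: float = 5.0) -> dict:
    """GET <endpoint>/api/tags. Raises URLError if Ollama is not reachable."""
    with urllib.request.urlopen(f"{endpoint}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


def installed_models(tags_payload: dict) -> set[str]:
    """Model names from an /api/tags payload, as 'ollama list' would show them."""
    return {entry["name"] for entry in tags_payload.get("models", [])}


def model_is_pulled(model: str, tags_payload: dict) -> bool:
    """Exact-match check, since the configured name has to match exactly."""
    return model in installed_models(tags_payload)


# Demo on a sample payload, so it runs without a live Ollama daemon.
sample_payload = {"models": [{"name": "qwen2.5vl:3b"}, {"name": "llava:7b"}]}
print(model_is_pulled("qwen2.5vl:3b", sample_payload))  # True
print(model_is_pulled("qwen2.5vl", sample_payload))     # False: tag is missing
```

Against a live daemon you would call model_is_pulled(name, fetch_tags()) and treat a URLError as "Ollama isn't running".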
Output is bigger than the source — this happens with already-efficient sources, like phone clips at low bitrate. The pipeline detects it and skips committing the encode when behavior.delete_smaller is true (the default in config.example.json).
Stuck mid-encode after Ctrl+C — the partial output file is cleaned up automatically. If a *-compressed.mp4 is still sitting next to the source after the run exits, it's safe to delete by hand.
I want to inspect a single file's classification without encoding it — quickest path is to use --scan-to-csv, then run with --llm and action=skip set on every row except the file you want.
Manifest grows huge — there's no built-in compaction yet. I just filter the CSV by date or split per project directory when it gets unwieldy.
JSON parse error on policy.json or config.json — usually a trailing comma. JSON is stricter than YAML about that. Open the file in any JSON-aware editor and the bad spot will show up immediately.
.
├── convertvideos.sh # bash wrapper (preserves the original 2017 entrypoint)
├── policy.json # encode policy table — the artifact worth reading
├── config.example.json # runtime preferences — copy to config.json
├── requirements.txt
├── README.md
├── LICENSE
├── videocompress/ # one file per pipeline stage
│ ├── __init__.py
│ ├── cli.py # argparse, run loop, signals
│ ├── discover.py # directory walk + plan-CSV parsing
│ ├── manifest.py # resume state, CSV I/O
│ ├── pipeline.py # per-video orchestration
│ ├── probe.py # ffprobe + frame sampling
│ ├── classify.py # VLM client (optional)
│ ├── policy.py # policy resolver
│ ├── encode.py # ffmpeg encode
│ ├── verify.py # VMAF
│ └── web.py # FastAPI app
└── web/
└── templates/
└── index.html # single-page UI
I split the code one file per pipeline stage on purpose — probe, classify, policy, encode, verify. Each file maps to a step in the pipeline diagram above. I'd rather have ten short files I can open and immediately understand than three long ones that hide what each part is doing.
Apache 2.0. See LICENSE.