Chanaveer/Intelligent_Video_Compression

Video Compress

I originally wrote convertvideos.sh back in 2017 to recover disk space on my home file server. I was using a OnePlus phone at the time, and it produced really large video recordings. I took regular backups onto my laptop, and when I started running out of space I wondered whether I could compress the videos without visible quality loss. I could: as I remember it, I reclaimed 30-50% of the disk space.

The original Bash script was pretty simple, a loop around ffmpeg: walk a folder, look at each video's resolution, and re-encode anything above a "too high" bitrate down to a fixed target. That script worked fine for years, but the more I used it, the more I noticed it was making the same mistake everywhere: it treated a 1080p screen recording the same as a 1080p sports clip. One has almost no motion and can be compressed brutally hard; the other has fast cuts and grain and falls apart at low bitrates. The fixed-bitrate ladder either wasted bits on the simple content or starved the complex content. There was no in-between.

I recently sat down to rewrite the script and asked myself: if I were building this as a small platform rather than a one-off bash script, what would I do differently? The project in this folder is what I came up with.

The new version probes each video, optionally classifies its content with a small local Vision-Language Model, looks up the right encode profile in a policy file, encodes with ffmpeg in CRF mode, and verifies the result with VMAF before deciding whether to keep it. The original convertvideos.sh is still here as a thin wrapper, so the 2017 invocation still works.

Author: Chanaveer Kadapatti
License: Apache 2.0 (see LICENSE)


How the pipeline works

 ┌────────┐   ┌─────────┐   ┌──────────────┐   ┌────────┐   ┌────────┐   ┌────────┐   ┌────────┐
 │ probe  │ → │ triage  │ → │ classify     │ → │ policy │ → │ encode │ → │ verify │ → │ commit │
 │ ffprobe│   │ skip if │   │ VLM (opt'l): │   │ json   │   │ ffmpeg │   │ VMAF   │   │in-place│
 │        │   │ already │   │ Qwen2.5-VL,  │   │ lookup │   │ CRF    │   │ ≥ floor│   │   or   │
 │        │   │ low bps │   │ MiniCPM-V,…  │   │        │   │ + retry│   │ ?      │   │ sibling│
 └────────┘   └─────────┘   └──────────────┘   └────────┘   └────────┘   └────────┘   └────────┘

The way I think about it: the classifier proposes a profile, and VMAF disposes. The LLM looks at sampled frames and suggests something like "this is a screencast, compress it hard." The encoder takes that suggestion and runs the encode at the CRF that profile prescribes. VMAF then checks the output against the source. If the perceptual quality comes in below the floor I set for that profile, the pipeline retries at a lower CRF (that is, higher quality and a larger file). If it still can't meet the floor after a couple of retries, it abandons the encode and leaves the original alone.
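
The propose-then-verify loop can be sketched in a few lines. This is a minimal illustration, not the code in pipeline.py: the `encode` and `measure_vmaf` callables, the step of 2 CRF points per retry, and the limit of two retries are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class EncodeResult:
    crf: int
    vmaf: float


def encode_with_retries(
    encode: Callable[[int], None],      # hypothetical: runs ffmpeg at the given CRF
    measure_vmaf: Callable[[], float],  # hypothetical: scores the latest output vs. the source
    start_crf: int,
    vmaf_floor: float,
    crf_step: int = 2,                  # assumed step size per retry
    max_retries: int = 2,               # assumed meaning of "a couple of retries"
) -> Optional[EncodeResult]:
    """Return the first result that meets the floor, or None to abandon."""
    crf = start_crf
    for _attempt in range(max_retries + 1):
        encode(crf)
        score = measure_vmaf()
        if score >= vmaf_floor:
            return EncodeResult(crf=crf, vmaf=score)
        crf -= crf_step  # lower CRF = more bits = higher quality
    return None  # caller leaves the original untouched
```

Returning None rather than raising keeps "quality floor not reachable" a normal outcome that the caller can record in the manifest, separate from real errors.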

Why I built it this way

A few of the design decisions I felt strongly about:

Don't pay the LLM cost when you don't have to. Even a small local model isn't free to run, and the inference latency adds up across a few hundred videos. So before I invoke the VLM, the pipeline runs a cheap heuristic: if the source's bitrate is already low for its resolution, I skip it entirely. There's nothing meaningful to gain from re-encoding a phone clip that's already at 4 Mbps for 1080p. The LLM only earns its keep on the long-tail content where the encode profile actually matters.
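
One way such a triage check might look is a bitrate ceiling per resolution tier. This is a sketch under stated assumptions: the table name and the 720p and 2160p numbers are placeholders, and only the 1080p figure echoes the 4 Mbps example above.

```python
# Hypothetical "already efficient" ceilings in bits per second, keyed by frame height.
# Only the 1080p figure echoes the example in the text; the others are placeholders.
EFFICIENT_BPS_BY_HEIGHT = {
    720: 2_500_000,
    1080: 4_000_000,
    2160: 12_000_000,
}


def is_already_efficient(height: int, bitrate_bps: int) -> bool:
    """True when the source bitrate is at or below the ceiling for its resolution tier."""
    # Pick the smallest tier that is at least as tall as the video;
    # anything taller than the largest tier uses the largest tier's ceiling.
    tiers = sorted(EFFICIENT_BPS_BY_HEIGHT)
    tier = next((t for t in tiers if height <= t), tiers[-1])
    return bitrate_bps <= EFFICIENT_BPS_BY_HEIGHT[tier]
```

Because the check needs only the height and bitrate that ffprobe already reported, it costs nothing compared with sampling frames and calling a model.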

Keep the policy in a file, not in code. This is the part I most wanted to get right. The encoding rules — what CRF to use for a screencast, what audio bitrate for cinematic content, what VMAF floor to require — all live in policy.json. If I want to retune the screencast profile from CRF 28 to 27, that's a one-line edit to a config file, not a code change. To me, that's the difference between a script and a platform.

Make every decision auditable. Each video produces a row in manifest.csv capturing the classification, the rationale the LLM gave, the parameters used, the VMAF score, and the final decision. When I'm tuning the policy, the manifest is what tells me whether a change actually paid off — average savings per profile, average VMAF, how often quality fell below the floor. Without it I'd just be guessing.

Default to safety, not in-place writes. The 2017 version overwrote the source file as soon as the encode finished. I've lost footage that way before, on encodes that turned out to have artifacts I didn't notice until much later. The new version writes outputs as *-compressed.mp4 next to the source by default. If you want it to replace the original, you have to pass --in-place, and even then the replacement only happens after VMAF confirms the encode is good.
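
A rough sketch of the sibling-by-default rule follows. The function names are hypothetical, and the handling of a non-mp4 source under --in-place (the result is renamed to .mp4 and the old file removed) is an assumption, since the text does not say what the real tool does in that case.

```python
import os
from pathlib import Path


def sibling_output_path(source: Path) -> Path:
    """clip.mov -> clip-compressed.mp4, in the same directory as the source."""
    return source.with_name(f"{source.stem}-compressed.mp4")


def commit(source: Path, encoded: Path, vmaf_ok: bool, in_place: bool) -> Path:
    """Return the path that holds the kept result."""
    if not vmaf_ok:
        encoded.unlink(missing_ok=True)   # discard the failed encode, keep the original
        return source
    if not in_place:
        return encoded                    # default: leave both files side by side
    # Assumed in-place behavior: the container is now mp4, so the final name ends in .mp4.
    final = source.with_suffix(".mp4")
    os.replace(encoded, final)            # atomic rename on the same filesystem
    if final != source:
        source.unlink()                   # e.g. clip.mov is superseded by clip.mp4
    return final
```

The important ordering is that the original is only touched inside the branch that has already seen a passing VMAF result.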

Resume should be free. I designed this assuming I'd run it overnight on large libraries. If the machine reboots, or I Ctrl+C, or one specific video fails to encode for some reason, the next run should pick up exactly where the previous one left off. The manifest is keyed on resolved file paths, and anything already marked done is automatically skipped on the next run. Per-video errors are caught and logged but don't kill the loop.
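
The resume behavior could be sketched like this. The `path` column name and the `done` status value are assumptions for illustration; the real manifest schema may differ.

```python
import csv
from pathlib import Path
from typing import Callable, Iterable, Iterator, Set


def load_done(manifest_path: Path) -> Set[str]:
    """Collect resolved paths of rows whose status is 'done'."""
    if not manifest_path.exists():
        return set()
    with manifest_path.open(newline="", encoding="utf-8") as fh:
        return {row["path"] for row in csv.DictReader(fh) if row.get("status") == "done"}


def pending(videos: Iterable[Path], done: Set[str]) -> Iterator[Path]:
    """Yield only videos whose resolved path is not already recorded as done."""
    for video in videos:
        if str(video.resolve()) not in done:
            yield video


def run(videos: Iterable[Path], process: Callable[[Path], None],
        done: Set[str], log: Callable[[str], None] = print) -> None:
    """Process pending videos; one failure is logged and the loop moves on."""
    for video in pending(videos, done):
        try:
            process(video)
        except Exception as exc:  # isolate per-video failures
            log(f"failed: {video}: {exc}")
```

Keying on the resolved path means the same file is recognized whether it was reached through a relative path, an absolute path, or a symlink.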

The LLM should be optional. I wanted this to run on a laptop without Ollama installed, or on a server without GPU access. Without --llm, the pipeline falls back to a sensible default profile and still works fine. The LLM is value-add, not a hard dependency. I think this is the right shape for any AI-augmented tool — the AI improves the result, but the system has to function without it.
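
In code, "optional" can be as small as a classifier argument that is allowed to be absent. The sketch below is illustrative only: the profile name `default` is hypothetical, and degrading to the default when the model call fails is an assumption about desirable behavior, not a description of what the project does.

```python
from typing import Callable, Optional

DEFAULT_PROFILE = "default"  # hypothetical profile name


def choose_profile(
    video_path: str,
    classifier: Optional[Callable[[str], str]] = None,  # None when --llm is not passed
) -> str:
    """Use the classifier when available; otherwise fall back to the default profile."""
    if classifier is None:
        return DEFAULT_PROFILE
    try:
        label = classifier(video_path)
    except Exception:
        # Assumed behavior: a failing model call degrades to the default profile.
        return DEFAULT_PROFILE
    return label or DEFAULT_PROFILE
```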


What it supports

  • Inputs: anything ffmpeg can read — mp4, mov, mkv, avi, webm, m4v, mpg, mpeg, wmv, flv.
  • Output: mp4 with H.264 by default, or H.265 if you switch the codec in config.json.
  • Three ways to run it:
    1. Point it at a directory and let it process everything: --dir /path.
    2. Have it scan a directory and write a plan CSV, edit the plan to skip files or override CRF, then run from the CSV. This is the mode I'd recommend for large libraries — I always want a chance to see what's going to be processed before any encoding starts.
    3. A local web UI: --web.
  • Optional VLM: any vision-capable Ollama model. I've been using qwen2.5vl:3b because it's small enough to be fast on CPU. If you have a GPU you can swap in qwen2.5vl:7b, minicpm-v:8b, or llava:7b for better classification accuracy.
  • Long-running stability: designed for overnight runs. Per-video errors are isolated so the loop keeps going through individual failures, and the manifest is updated atomically after every video.
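
An atomic manifest update is commonly done by writing a temporary file and renaming it over the old one. The sketch below shows that general technique; whether manifest.py works exactly this way is an assumption, and the function name is made up for the example.

```python
import csv
import os
import tempfile
from pathlib import Path
from typing import Dict, List


def write_manifest_atomically(path: Path, rows: List[Dict[str, str]], fields: List[str]) -> None:
    """Write the full manifest to a temp file, then rename it over the old one."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=fields)
            writer.writeheader()
            writer.writerows(rows)
            fh.flush()
            os.fsync(fh.fileno())   # make sure the bytes hit disk before the rename
        os.replace(tmp_name, path)  # atomic when both paths are on the same filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)     # never leave a half-written temp file behind
        raise
```

With this pattern a reboot or Ctrl+C during the write leaves either the old manifest or the new one on disk, never a truncated file.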

What it doesn't do (yet)

I deliberately scoped some things out of this rewrite:

  • HDR / Dolby Vision metadata preservation. The encode path is SDR-safe. HDR sources will still encode, but their dynamic-range metadata may not survive. I wouldn't run this on master files for that reason.
  • Chapter and subtitle preservation. The encode pulls only the first video and audio streams. Adding chapter and subtitle copy is straightforward, but I haven't needed it.
  • GPU encode (NVENC, AMF, VideoToolbox). It's CPU-only for now. Adding GPU encode would basically be one switch in the policy file plus a different -c:v argument.
  • Concurrent jobs from the web UI. I made the UI single-job on purpose — running two encodes at once on the same local machine just makes both slower, and a UI that lets you do it is a footgun.
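
To make the scope above concrete, here is a sketch of what a CPU-only CRF encode of just the first video and audio streams could look like as an ffmpeg argument list. The function name and defaults are assumptions; the flags themselves are standard ffmpeg options, but the exact command encode.py builds is not shown here.

```python
from typing import List

ENCODERS = {"h264": "libx264", "h265": "libx265"}  # ffmpeg's software encoders


def build_encode_cmd(src: str, dst: str, crf: int, codec: str = "h264",
                     preset: str = "medium", audio_bitrate: str = "128k") -> List[str]:
    """Build an ffmpeg argv for a CRF encode of the first video and audio streams."""
    return [
        "ffmpeg", "-hide_banner", "-n",      # -n: never overwrite an existing output
        "-i", src,
        "-map", "0:v:0",                     # first video stream only
        "-map", "0:a:0?",                    # first audio stream, if there is one
        "-c:v", ENCODERS[codec],
        "-crf", str(crf),
        "-preset", preset,
        "-c:a", "aac", "-b:a", audio_bitrate,
        dst,
    ]
```

Seen this way, GPU encode would mostly mean swapping the value after -c:v (and the quality flags that encoder understands), while subtitle or chapter copy would mean extra -map entries.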

Setup

ffmpeg with libvmaf

ffmpeg has to be on PATH and built with libvmaf (VMAF lives inside ffmpeg as a filter):

  • macOS with Homebrew: brew install ffmpeg ships with libvmaf.
  • Ubuntu / Debian 22.04+: sudo apt install ffmpeg.
  • Windows: I use the gyan.dev builds; BtbN also works. Both ship libvmaf-enabled. Add the bin\ folder to PATH.

You can confirm libvmaf is in your build with:

ffmpeg -filters 2>&1 | grep -i vmaf

You should see a line for libvmaf VV->V Calculate the VMAF ....
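
The same check can be done from Python, which is handy as a startup guard. This is a sketch with hypothetical function names; it assumes the usual `ffmpeg -filters` row layout of flags, filter name, I/O signature, then description.

```python
import subprocess


def filters_include_libvmaf(filters_output: str) -> bool:
    """True if any filter row names libvmaf as its filter."""
    for line in filters_output.splitlines():
        fields = line.split()
        # Filter rows look like: "<flags> <name> <io> <description...>"
        if len(fields) >= 2 and fields[1] == "libvmaf":
            return True
    return False


def ffmpeg_has_libvmaf(ffmpeg: str = "ffmpeg") -> bool:
    """Run `ffmpeg -filters` and scan its output."""
    proc = subprocess.run(
        [ffmpeg, "-hide_banner", "-filters"],
        capture_output=True, text=True, check=True,
    )
    return filters_include_libvmaf(proc.stdout)
```

Matching on the name column rather than grepping for "vmaf" anywhere avoids a false positive from a description that merely mentions VMAF.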

Python and dependencies

Python 3.10 or later, then:

git clone <this-repo> && cd Intelligent_Video_Compression
python3 -m venv .venv
source .venv/bin/activate          # Windows (Git Bash): source .venv/Scripts/activate
pip install -r requirements.txt

I kept the dependency list deliberately small — requests, fastapi, uvicorn, jinja2. JSON parsing is stdlib, so there's no pyyaml or anything else that needs system libraries.

(Optional) Ollama for the VLM

Skip this if you'll always run without --llm. Otherwise:

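# Install: https://ollama.com/download
ollama serve                 # only needed if the daemon isn't already running; blocks this terminal
ollama pull qwen2.5vl:3b     # needs the server up; use a second terminal if you ran `ollama serve`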

I chose Ollama specifically because it gives me a clean local HTTP API and handles model loading and unloading in the background, which matters when the machine has limited RAM. The daemon usually starts automatically — you can confirm it's up with:

curl http://localhost:11434/api/tags
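
For a sense of what the classifier sends over that HTTP API, here is a sketch of a request to Ollama's /api/generate endpoint, which accepts base64-encoded images alongside the prompt. The prompt text and function names are hypothetical; the project's real prompt and response parsing are not shown here.

```python
import base64
import json
import urllib.request
from typing import List

OLLAMA_URL = "http://localhost:11434"  # endpoint shown above


def build_classify_request(model: str, frames: List[bytes]) -> dict:
    """Build a non-streaming /api/generate payload with base64-encoded frames."""
    return {
        "model": model,
        # Hypothetical prompt; the project's real prompt is not shown in this README.
        "prompt": "Classify this video's content type in one word.",
        "images": [base64.b64encode(f).decode("ascii") for f in frames],
        "stream": False,
    }


def classify(model: str, frames: List[bytes], timeout: float = 120.0) -> str:
    """POST the payload to Ollama and return the model's text response."""
    body = json.dumps(build_classify_request(model, frames)).encode("utf-8")
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"].strip()
```

Using only the standard library here is a choice for the example; the project lists requests as a dependency, so its client may well use that instead.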

Configure

cp config.example.json config.json

Open config.json and tune what you want — model name, output directory, log level, VMAF override. The defaults are sensible enough that you can probably leave it alone for a first run.


Running it

Process a whole directory

./convertvideos.sh --dir ~/Videos/raw
# with the LLM:
./convertvideos.sh --dir ~/Videos/raw --llm

Plan first, edit, then run

This is the mode I use most often, because I always want a chance to look at the list before the encoder starts touching anything. Generate the plan:

./convertvideos.sh --scan-to-csv plan.csv --dir ~/Videos/raw

Open plan.csv in a spreadsheet or text editor. For each row you can:

  • Set action to skip for files you want left alone.
  • Set crf_override to force a specific CRF (bypasses the LLM and the policy).
  • Set audio_bitrate_override, e.g. 64k for lecture-style audio.

Then run from the edited plan:

./convertvideos.sh --csv plan.csv --llm
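
For large libraries, the plan edits described above can also be scripted instead of done by hand. This sketch uses the `action`, `crf_override`, and `audio_bitrate_override` columns named above; the `path` column name and the example rules are assumptions.

```python
import csv
from pathlib import Path
from typing import Callable, Dict


def edit_plan(src: Path, dst: Path, edit_row: Callable[[Dict[str, str]], None]) -> int:
    """Apply edit_row to every plan row and write the result to dst. Returns the row count."""
    with src.open(newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        fields = list(reader.fieldnames or [])
        rows = list(reader)
    for row in rows:
        edit_row(row)
    with dst.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


def lecture_rules(row: Dict[str, str]) -> None:
    """Example rule set; the 'path' column name is an assumption."""
    name = row["path"].lower()
    if "/keep/" in name:
        row["action"] = "skip"                  # leave these files alone
    elif "lecture" in name:
        row["audio_bitrate_override"] = "64k"   # speech tolerates a low audio bitrate
```

Writing to a separate output file keeps the scanned plan intact, so a bad rule can be fixed and re-applied without rescanning.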

Web UI

./convertvideos.sh --web

Then open http://127.0.0.1:8765. The UI exposes the same three modes (directory, CSV, scan-to-CSV) plus a live log and a manifest table that refreshes while the job runs.

Long overnight runs

I usually run this with nohup and tail the log:

nohup ./convertvideos.sh --dir ~/Videos/raw --llm --in-place > run.log 2>&1 &
tail -f run.log

Ctrl+C — or kill <pid> (SIGTERM) — exits cleanly after the current video finishes. Re-running the same command picks up exactly where it left off, because the manifest already records what's been done.
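
"Exit after the current video" is usually implemented as a flag that the signal handler sets and the loop checks between items. The class and function names below are hypothetical; this shows the general pattern, not the code in cli.py.

```python
import signal
from typing import Callable, Iterable


class StopFlag:
    """Set by SIGINT/SIGTERM; checked between videos, never mid-encode."""

    def __init__(self) -> None:
        self.requested = False

    def handle(self, signum, frame) -> None:
        self.requested = True

    def install(self) -> None:
        signal.signal(signal.SIGINT, self.handle)
        signal.signal(signal.SIGTERM, self.handle)


def run_until_stopped(videos: Iterable[str], process: Callable[[str], None], stop: StopFlag) -> int:
    """Process videos in order; return how many completed before a stop was requested."""
    completed = 0
    for video in videos:
        if stop.requested:
            break
        process(video)   # the current video always runs to completion
        completed += 1
    return completed
```

One caveat with this pattern: a Ctrl+C in a terminal is delivered to the whole foreground process group, so the ffmpeg child would also need shielding, for example by launching it with `subprocess.Popen(..., start_new_session=True)`.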


CLI reference

| Flag | Default | What it does |
|---|---|---|
| --dir PATH | | Process every video in PATH (recursive by default) |
| --csv PATH | | Process files listed in a plan CSV |
| --scan-to-csv OUT | | Walk --dir and write a plan CSV (no encoding) |
| --web | | Start the local web UI |
| --llm | off | Enable VLM classification |
| --llm-model NAME | from config | Override Ollama model |
| --llm-endpoint URL | from config | Override Ollama endpoint |
| --in-place | off | Replace original after VMAF passes (else write *-compressed.mp4 beside it) |
| --no-vmaf | | Skip VMAF (faster, but no quality safety net; not recommended) |
| --output-dir PATH | beside source | Centralized output directory |
| --config PATH | config.json | Config file path |
| --policy PATH | policy.json | Policy file path |
| --manifest PATH | from config | Manifest file path |
| --retry-failed | off | Re-attempt files previously marked failed |
| --no-recursive | | With --dir, do not recurse |
| --web-host HOST | 127.0.0.1 | UI bind address |
| --web-port PORT | 8765 | UI bind port |
| --log-level LEVEL | INFO | DEBUG / INFO / WARNING / ERROR |

The legacy invocation ./convertvideos.sh files.csv (a bare list of paths) still works — if you pass a single argument that ends in .csv, the wrapper auto-routes it to --csv. I wanted the original 2017 contract to keep working, since I have other scripts that call it that way.
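
The routing rule is small enough to state as code. The real check lives in the bash wrapper; this Python version is only an equivalent sketch, and the function name is made up.

```python
from typing import List


def route_legacy_args(argv: List[str]) -> List[str]:
    """Rewrite `files.csv` to `--csv files.csv`; leave every other invocation untouched."""
    if len(argv) == 1 and not argv[0].startswith("-") and argv[0].endswith(".csv"):
        return ["--csv", argv[0]]
    return argv
```

Restricting the rewrite to exactly one non-flag argument keeps it from interfering with any invocation that already uses the new flags.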


Configuration

policy.json — the encoding contract

This is the file I'd point you at first if you want to understand the project. It's the encoding policy expressed as data: each content type has a CRF, an audio bitrate, an encoder preset, and a VMAF floor. There are also small modifiers that nudge CRF up or down based on motion and detail signals from the classifier.

If you want to experiment, what I do is copy the file, change a number, run on a small known set, and compare manifests. JSON doesn't have comments, so I've put explanatory text under _comment and description keys (the resolver only reads specific keys, so those are harmless).
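
To illustrate the idea of policy as data, here is a sketch of a resolver over a made-up policy. The JSON shape, key names, and most numbers are assumptions; only the kinds of fields (CRF, audio bitrate, preset, VMAF floor, modifiers, _comment keys) and the screencast CRF of 28 come from the text above. The real policy.json is the authority.

```python
import json
from typing import Dict

# Hypothetical policy shape; see the note above about what is and is not assumed.
EXAMPLE_POLICY = json.loads("""
{
  "_comment": "ignored by the resolver",
  "profiles": {
    "screencast": {"crf": 28, "audio_bitrate": "96k", "preset": "medium", "vmaf_floor": 93},
    "default":    {"crf": 23, "audio_bitrate": "128k", "preset": "medium", "vmaf_floor": 93}
  },
  "modifiers": {
    "motion": {"high": -2, "low": 1},
    "detail": {"high": -1, "low": 0}
  }
}
""")


def resolve(policy: Dict, content_type: str, motion: str = "", detail: str = "") -> Dict:
    """Look up a profile, then nudge its CRF by the motion and detail modifiers."""
    profiles = policy["profiles"]
    params = dict(profiles.get(content_type, profiles["default"]))  # copy, don't mutate policy
    mods = policy.get("modifiers", {})
    params["crf"] += mods.get("motion", {}).get(motion, 0)
    params["crf"] += mods.get("detail", {}).get(detail, 0)
    return params
```

Because the resolver reads only the keys it knows about, extra keys such as _comment pass through without effect, which is what makes in-file annotations safe.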

config.example.json → config.json: runtime preferences

Per-machine knobs: which Ollama model to use, output directory, VMAF floor override, log level, in-place behavior. Copy config.example.json to config.json and edit.

I split these from policy.json on purpose. The policy is the encoding contract — it's the artifact, version-controlled, the same on every machine. The config is per-machine preferences — different on my laptop than on the server. Mixing them tends to cause grief later.


Manifest and resume

manifest.csv is the source of truth for what's been processed. Every input gets one row, with status, classification, encode parameters, VMAF score, savings percentage, and final decision (committed, sibling, skipped-already-efficient, skipped-by-user, skipped-no-savings, abandoned).

A few patterns I use:

  • To resume a run: just re-run the same command. Anything done is skipped automatically.
  • To retry failed files: add --retry-failed.
  • To force a re-process of a specific file: delete its row from manifest.csv.
  • For a quick savings report: open it in a spreadsheet. Sum original_size_bytes and compressed_size_bytes and you have your number.
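
If a spreadsheet is not handy, the same report takes a few lines of Python. The size columns are the ones named above; the `classification` column name is an assumption about the manifest schema.

```python
import csv
from collections import defaultdict
from pathlib import Path
from typing import Dict


def savings_by_profile(manifest_path: Path) -> Dict[str, float]:
    """Percent of bytes saved per profile, over rows that have both size columns filled."""
    original = defaultdict(int)
    compressed = defaultdict(int)
    with manifest_path.open(newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            if not row.get("original_size_bytes") or not row.get("compressed_size_bytes"):
                continue  # rows without an output size (e.g. skipped files) are ignored
            key = row.get("classification") or "unknown"   # column name is an assumption
            original[key] += int(row["original_size_bytes"])
            compressed[key] += int(row["compressed_size_bytes"])
    return {
        key: round(100.0 * (original[key] - compressed[key]) / original[key], 1)
        for key in original
        if original[key] > 0
    }
```

Summing bytes before dividing weights each profile by data volume, so one tiny clip with a dramatic ratio does not skew the figure.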

Web UI

The UI has three tabs: run from a directory, run from a CSV, or scan a directory and save a plan CSV. Below the tabs are the same options you'd pass on the command line — LLM toggle (with an endpoint check button so you can confirm Ollama is reachable before starting), VMAF toggle, in-place toggle, retry-failed toggle, and an output directory override. Once a job starts there's a live log and a manifest table that both refresh every 1.5 seconds.

I made it single-job on purpose. Two encodes fighting over CPU on a local machine just makes both slower, and a UI that lets you queue up parallel jobs is a footgun I didn't want to build.


Troubleshooting

A few things I've actually run into:

Required tool not found on PATH: ffmpeg — ffmpeg isn't on PATH. Install it (see Setup above) and reopen the terminal.

libvmaf errors — your ffmpeg build doesn't include libvmaf. On Ubuntu 22.04+ apt install ffmpeg has it; on Windows, switch to a gyan.dev or BtbN build. As a workaround you can run with --no-vmaf, but you lose the quality safety net, so I wouldn't recommend it for unattended runs.

LLM endpoint ... is not reachable — Ollama isn't running. Start it with ollama serve (or via the desktop app on macOS/Windows). You can verify with curl http://localhost:11434/api/tags.

LLM classification returns unknown for everything — usually means the model wasn't pulled, or the model name in config.json doesn't match what ollama list shows. Update llm.model in config.json to match exactly.

Output is bigger than the source — this happens with already-efficient sources, like phone clips at low bitrate. The pipeline detects it and skips committing the encode when behavior.delete_smaller is true (the default in config.example.json).

Stuck mid-encode after Ctrl+C — the partial output file is cleaned up automatically. If a *-compressed.mp4 is still sitting next to the source after the run exits, it's safe to delete by hand.

I want to inspect a single file's classification without encoding it — quickest path is to use --scan-to-csv, then run with --llm and action=skip set on every row except the file you want.

Manifest grows huge — there's no built-in compaction yet. I just filter the CSV by date or split per project directory when it gets unwieldy.

JSON parse error on policy.json or config.json — usually a trailing comma. JSON is stricter than YAML about that. Open the file in any JSON-aware editor and the bad spot will show up immediately.


Project layout

.
├── convertvideos.sh             # bash wrapper (preserves the original 2017 entrypoint)
├── policy.json                  # encode policy table — the artifact worth reading
├── config.example.json          # runtime preferences — copy to config.json
├── requirements.txt
├── README.md
├── LICENSE
├── videocompress/               # one file per pipeline stage
│   ├── __init__.py
│   ├── cli.py                   # argparse, run loop, signals
│   ├── discover.py              # directory walk + plan-CSV parsing
│   ├── manifest.py              # resume state, CSV I/O
│   ├── pipeline.py              # per-video orchestration
│   ├── probe.py                 # ffprobe + frame sampling
│   ├── classify.py              # VLM client (optional)
│   ├── policy.py                # policy resolver
│   ├── encode.py                # ffmpeg encode
│   ├── verify.py                # VMAF
│   └── web.py                   # FastAPI app
└── web/
    └── templates/
        └── index.html           # single-page UI

I split the code one file per pipeline stage on purpose — probe, classify, policy, encode, verify. Each file maps to a step in the pipeline diagram above. I'd rather have ten short files I can open and immediately understand than three long ones that hide what each part is doing.


License

Apache 2.0. See LICENSE.

About

Python project to review your saved video recordings & intelligently compress them to save disk space