Extract PGS (Blu-ray) subtitles from MKV files and convert them to SRT using PaddleOCR and Ollama VLMs.
PgsToSrtPlus decodes PGS subtitle bitmaps, preprocesses them, and splits them into individual text lines. It then runs a two-stage OCR pipeline followed by an italic-detection pass:
- PaddleOCR performs a fast first-pass recognition on every line. Lines above a confidence threshold are accepted as-is.
- Lines below the threshold fall back to a Vision Language Model (via Ollama) for a more accurate read.
- A second VLM pass performs italic detection by comparing the original subtitle bitmap against a synthetically rendered upright reference image, classifying each token as italic or roman.
The result is an SRT file with accurate text and `<i>` markup.
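
The routing between the two OCR stages and the final markup step can be sketched as follows. This is only an illustration of the idea, not the project's actual code: `recognize_line`, `apply_italics`, and the `fast_ocr` / `vlm_ocr` callables are hypothetical names standing in for PaddleOCR and the Ollama VLM, and the sketch assumes space-separated tokens (as in English).

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Callable, List, Tuple

VERIFY_THRESHOLD = 0.97  # mirrors the documented --verify-threshold default


@dataclass
class LineResult:
    text: str
    source: str  # "paddle" if the first pass was accepted, "vlm" otherwise


def recognize_line(
    image: bytes,
    fast_ocr: Callable[[bytes], Tuple[str, float]],  # stand-in for PaddleOCR
    vlm_ocr: Callable[[bytes], str],                 # stand-in for the Ollama VLM
    threshold: float = VERIFY_THRESHOLD,
) -> LineResult:
    """Keep the fast OCR result when it is confident enough, else ask the VLM."""
    text, confidence = fast_ocr(image)
    if confidence >= threshold:
        return LineResult(text, "paddle")
    return LineResult(vlm_ocr(image), "vlm")


def apply_italics(tokens: List[Tuple[str, bool]]) -> str:
    """Wrap each run of consecutive italic tokens in one <i>...</i> pair."""
    pieces = []
    for is_italic, group in groupby(tokens, key=lambda token: token[1]):
        words = " ".join(word for word, _ in group)
        pieces.append(f"<i>{words}</i>" if is_italic else words)
    return " ".join(pieces)


if __name__ == "__main__":
    # Hypothetical stubs: a low-confidence first pass triggers the fallback.
    result = recognize_line(
        b"",
        fast_ocr=lambda img: ("He1lo there", 0.62),
        vlm_ocr=lambda img: "Hello there",
    )
    print(result)
    print(apply_italics([("The", False), ("Odyssey", True), ("is", False), ("long", False)]))
```

Grouping consecutive italic tokens keeps the markup compact: a fully italic line gets a single `<i>` pair rather than one per word.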
PaddleOCR generally outperforms Tesseract in accuracy, particularly on complex layouts, low-quality images, and scene text.
PaddleOCR also has a larger character dictionary than Tesseract, so it correctly recognizes non-ASCII characters that sometimes appear in subtitles, such as music notes (♪ ♫ ♬).

Supported languages:

- English (`en`), default
- Japanese (`ja`)

Other languages may work but will use a generic fallback prompt. Language-specific OCR prompts, fonts, and post-processing steps are configurable per language.

Requirements:

- Docker
- Ollama, running separately and accessible from the Docker container (used for low-confidence OCR fallback and italic detection). Default model: `qwen3-vl:32b-instruct`

Pull the image:

    # CPU
    docker pull ebette1/pgs-to-srt-plus:latest

    # GPU (NVIDIA)
    docker pull ebette1/pgs-to-srt-plus-gpu:latest

Run:

    docker run --rm --add-host=host.docker.internal:host-gateway \
      -v /path/to/media:/media \
      ebette1/pgs-to-srt-plus:latest \
      "/media/movie.mkv" \
      --ollama http://host.docker.internal:11434

For GPU acceleration (requires NVIDIA Container Toolkit):

    docker run --rm --gpus all --add-host=host.docker.internal:host-gateway \
      -v /path/to/media:/media \
      ebette1/pgs-to-srt-plus-gpu:latest \
      "/media/movie.mkv" \
      --ollama http://host.docker.internal:11434

The SRT file is written next to the input file. Use `-o /path` with a bind mount to write elsewhere.

| Option | Default | Description |
|---|---|---|
| `--ollama` | `http://127.0.0.1:11434` | Ollama endpoint URL |
| `--language`, `-l` | `en` | Subtitle language (`en`, `ja`) |
| `--track` | auto-detect | PGS track index |
| `-o`, `--output` | same as input | Output directory |
| `--model` | `qwen3-vl:32b-instruct` | Ollama VLM model |
| `--device` | `cpu` | PaddleOCR device (`cpu`, `gpu`) |
| `--verify-threshold` | `0.97` | PaddleOCR confidence below which to fall back to VLM |
| `--paddle-model` | `PP-OCRv5_server_rec` | PaddleOCR recognition model |
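
When converting many files, it can help to assemble the `docker run` invocation programmatically. The sketch below is one possible approach rather than part of the project: `build_command` is a hypothetical helper that combines the documented CPU image and flags, and it assumes the media directory is mounted at `/media` as in the examples above.

```python
from pathlib import PurePosixPath
from typing import List, Optional

IMAGE = "ebette1/pgs-to-srt-plus:latest"  # CPU image from the pull instructions


def build_command(
    media_dir: str,
    filename: str,
    ollama_url: str = "http://host.docker.internal:11434",
    language: Optional[str] = None,
    verify_threshold: Optional[float] = None,
) -> List[str]:
    """Return the docker argument list for converting one file in media_dir."""
    cmd = [
        "docker", "run", "--rm",
        "--add-host=host.docker.internal:host-gateway",
        "-v", f"{media_dir}:/media",
        IMAGE,
        str(PurePosixPath("/media") / filename),  # path as seen inside the container
        "--ollama", ollama_url,
    ]
    # Optional flags are appended only when set, so the tool's defaults apply otherwise.
    if language is not None:
        cmd += ["--language", language]
    if verify_threshold is not None:
        cmd += ["--verify-threshold", str(verify_threshold)]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it; to execute, pass the list to
    # subprocess.run(cmd, check=True).
    print(" ".join(build_command("/path/to/media", "movie.mkv", language="ja")))
```

Passing the arguments as a list (rather than a single shell string) avoids quoting problems with file names that contain spaces.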

Credits:

- Tentacule/PgsToSrt: the original inspiration for this project
- SubtitleEdit / libse: PGS parsing and Matroska container support

Docker images:

- ebette1/pgs-to-srt-plus (CPU)
- ebette1/pgs-to-srt-plus-gpu (NVIDIA GPU)