
eebette/PgsToSrtPlus

PgsToSrtPlus

Extract PGS (Blu-ray) subtitles from MKV files and convert them to SRT using PaddleOCR and Ollama VLMs.

How It Works

PgsToSrtPlus decodes PGS subtitle bitmaps, preprocesses and splits them into individual text lines, then runs a two-stage OCR pipeline followed by an italic-detection pass:

  1. PaddleOCR performs a fast first-pass recognition on every line. Lines above a confidence threshold are accepted as-is.
  2. Lines below the threshold fall back to a Vision Language Model (via Ollama) for a more accurate read.
  3. A second VLM pass performs italic detection by comparing the original subtitle bitmap against a synthetically rendered upright reference image, classifying each token as italic or roman.
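
The confidence-based routing in steps 1 and 2 can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `recognize_line`, `LineResult`, and the two callables are hypothetical names, and treating a confidence exactly equal to the threshold as accepted is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Default of the --verify-threshold option.
VERIFY_THRESHOLD = 0.97


@dataclass
class LineResult:
    text: str
    source: str  # "paddle" if the first pass was accepted, "vlm" otherwise


def recognize_line(
    image: bytes,
    fast_ocr: Callable[[bytes], Tuple[str, float]],  # hypothetical: returns (text, confidence)
    vlm_ocr: Callable[[bytes], str],                 # hypothetical: returns text only
    threshold: float = VERIFY_THRESHOLD,
) -> LineResult:
    """Accept the fast OCR result when it is confident enough, else ask the VLM."""
    text, confidence = fast_ocr(image)
    if confidence >= threshold:  # assumption: equality counts as "confident enough"
        return LineResult(text, "paddle")
    # Low confidence: fall back to the slower, more accurate model.
    return LineResult(vlm_ocr(image), "vlm")
```

Passing the two recognizers in as callables keeps the routing logic independent of PaddleOCR and Ollama, so it can be exercised with simple stubs.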

The result is an SRT file with accurate text and <i> markup.
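
As an illustration of how per-token italic labels could end up as `<i>` markup in an SRT cue, here is a small sketch. The function names are hypothetical, and joining tokens with single spaces is an assumption that suits English but not Japanese; the tool's real rendering logic may differ.

```python
from itertools import groupby
from typing import List, Tuple

Token = Tuple[str, bool]  # (text, is_italic)


def format_timestamp(ms: int) -> str:
    """Format milliseconds as the SRT timestamp HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02}:{minutes:02}:{seconds:02},{millis:03}"


def render_line(tokens: List[Token]) -> str:
    """Merge consecutive italic tokens into a single <i>...</i> run."""
    parts = []
    for italic, run in groupby(tokens, key=lambda t: t[1]):
        text = " ".join(word for word, _ in run)
        parts.append(f"<i>{text}</i>" if italic else text)
    return " ".join(parts)


def render_cue(index: int, start_ms: int, end_ms: int, lines: List[List[Token]]) -> str:
    """Build one SRT cue: index, time range, then one row per subtitle line."""
    header = f"{index}\n{format_timestamp(start_ms)} --> {format_timestamp(end_ms)}"
    body = "\n".join(render_line(line) for line in lines)
    return f"{header}\n{body}\n"
```

Grouping adjacent italic tokens avoids emitting a separate tag pair around every word.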

Why PaddleOCR?

PaddleOCR is generally more accurate than Tesseract, particularly on complex layouts, low-quality images, and scene text.

PaddleOCR also has a larger character dictionary than Tesseract and correctly recognizes non-ASCII characters that sometimes appear in subtitles, such as music notes (♪ ♫ ♬).

Supported Languages

  • English (en) — default
  • Japanese (ja)

Other languages may work but will use a generic fallback prompt. Language-specific OCR prompts, fonts, and post-processing steps are configurable per language.

Requirements

  • Docker
  • Ollama running separately and accessible from the Docker container (used for low-confidence OCR fallback and italic detection). Default model: qwen3-vl:32b-instruct
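
Before starting a long conversion, it can be useful to confirm that Ollama is reachable and that the model has been pulled. The sketch below queries Ollama's `/api/tags` endpoint, which lists locally available models. The helper names are hypothetical and this check is not part of PgsToSrtPlus itself.

```python
import json
from typing import Set
from urllib.request import urlopen


def installed_models(tags_payload: dict) -> Set[str]:
    """Extract model names from the JSON body returned by GET /api/tags."""
    return {m["name"] for m in tags_payload.get("models", [])}


def fetch_tags(base_url: str, timeout: float = 5.0) -> dict:
    """Query the Ollama server; raises URLError if it is unreachable."""
    with urlopen(f"{base_url.rstrip('/')}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


def model_is_available(base_url: str, model: str) -> bool:
    """True if the named model is listed by the Ollama server at base_url."""
    return model in installed_models(fetch_tags(base_url))
```

For example, `model_is_available("http://127.0.0.1:11434", "qwen3-vl:32b-instruct")` checks the default endpoint and model. When running the check from inside the container, use the URL the container would use, such as `http://host.docker.internal:11434`.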

Quick Start

Pull the image:

# CPU
docker pull ebette1/pgs-to-srt-plus:latest

# GPU (NVIDIA)
docker pull ebette1/pgs-to-srt-plus-gpu:latest

Run:

docker run --rm --add-host=host.docker.internal:host-gateway \
  -v /path/to/media:/media \
  ebette1/pgs-to-srt-plus:latest \
  "/media/movie.mkv" \
  --ollama http://host.docker.internal:11434

For GPU acceleration (requires NVIDIA Container Toolkit):

docker run --rm --gpus all --add-host=host.docker.internal:host-gateway \
  -v /path/to/media:/media \
  ebette1/pgs-to-srt-plus-gpu:latest \
  "/media/movie.mkv" \
  --ollama http://host.docker.internal:11434

By default, the SRT file is written next to the input file. To write it elsewhere, pass -o /path, where /path is a directory inside a bind mount.
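
To convert every MKV file in a folder, the CPU command above can be wrapped in a small script. This is a sketch under assumptions: `build_command` and `convert_all` are hypothetical helpers, the files are assumed to sit directly in the mounted directory, and Docker is assumed to be on the PATH.

```python
import subprocess
from pathlib import Path
from typing import List

IMAGE = "ebette1/pgs-to-srt-plus:latest"  # CPU image from the Quick Start


def build_command(
    media_dir: str,
    mkv_name: str,
    ollama_url: str = "http://host.docker.internal:11434",
    language: str = "en",
) -> List[str]:
    """Build the docker run argument list for one file under media_dir."""
    return [
        "docker", "run", "--rm",
        "--add-host=host.docker.internal:host-gateway",
        "-v", f"{media_dir}:/media",
        IMAGE,
        f"/media/{mkv_name}",
        "--ollama", ollama_url,
        "--language", language,
    ]


def convert_all(media_dir: str) -> None:
    """Run the converter once per .mkv file, stopping at the first failure."""
    for mkv in sorted(Path(media_dir).glob("*.mkv")):
        subprocess.run(build_command(media_dir, mkv.name), check=True)
```

Passing the arguments as a list rather than a shell string means file names containing spaces need no extra quoting.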

Options

Option              Default                  Description
--ollama            http://127.0.0.1:11434   Ollama endpoint URL
--language, -l      en                       Subtitle language (en, ja)
--track             auto-detect              PGS track index
-o, --output        same as input            Output directory
--model             qwen3-vl:32b-instruct    Ollama VLM model
--device            cpu                      PaddleOCR device (cpu, gpu)
--verify-threshold  0.97                     PaddleOCR confidence below which to fall back to VLM
--paddle-model      PP-OCRv5_server_rec      PaddleOCR recognition model

Acknowledgments

Docker Images
