Convert PDF documents to audio using fully local, offline TTS. No cloud, no API keys.
Pipeline: PDF/URL → text extraction (PyMuPDF/pdftotext/trafilatura) → speech synthesis (Kokoro TTS) → opus audio (ffmpeg)
Smart text cleaning:
- Footnotes stripped by font-size detection
- References/Bibliography section removed
- Inline citations such as [1] and (Smith et al., 2023) removed
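The three cleaning rules above can be sketched in plain Python. This is an illustration under assumptions, not the project's actual code: the regular expressions, the `ratio` threshold, and all function names are hypothetical. The `(text, size)` pairs are assumed to come from an extractor that reports font sizes, for example the spans returned by PyMuPDF's `page.get_text("dict")`.

```python
import re
from collections import Counter

# Numeric citations such as [1], [2, 3] or [4-6].
NUMERIC_CITE = re.compile(r"\s*\[\d+(?:\s*[-,–]\s*\d+)*\]")
# Author-year citations such as (Smith et al., 2023).
AUTHOR_YEAR_CITE = re.compile(r"\s*\([^()]*\b(?:19|20)\d{2}[a-z]?\)")
# A line consisting only of a references-style heading.
REFS_HEADING = re.compile(
    r"^\s*(references|bibliography)\s*$", re.IGNORECASE | re.MULTILINE
)


def strip_citations(text: str) -> str:
    """Remove inline numeric and author-year citations."""
    text = NUMERIC_CITE.sub("", text)
    return AUTHOR_YEAR_CITE.sub("", text)


def drop_references(text: str) -> str:
    """Cut everything from the last references-style heading onward."""
    matches = list(REFS_HEADING.finditer(text))
    return text[: matches[-1].start()].rstrip() if matches else text


def drop_small_spans(spans, ratio=0.85):
    """Keep spans whose font size is at least ratio * the dominant size.

    `spans` is an iterable of (text, font_size) pairs. The dominant size
    is the one covering the most characters, taken here as body text.
    """
    spans = list(spans)
    weights = Counter()
    for text, size in spans:
        weights[round(size, 1)] += len(text)
    if not weights:
        return []
    body_size = weights.most_common(1)[0][0]
    return [text for text, size in spans if size >= ratio * body_size]
```

Note that the author-year pattern also removes any parenthetical ending in a year, such as "(founded in 1999)", so a real implementation would likely need a stricter rule.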
brew tap mathiasconradt/pdf2audio https://github.com/mathiasconradt/pdf2audio
brew install pdf2audio
Then run from anywhere:
pdf2audio
pdf2audio --file=~/Downloads/paper.pdf --open
On first run, Kokoro downloads model weights (~300MB) from Hugging Face.
- macOS (uses `open` to play result)
- uv: Python package manager
- ffmpeg: audio encoding
- poppler: robust text extraction for printed web PDFs
1. Install system deps
brew install ffmpeg poppler uv
2. Clone / download this repo
git clone <repo-url>
cd pdf2audio
3. Create venv and install Python deps
uv venv
uv sync
First run downloads Kokoro model weights (~300MB) from Hugging Face automatically.
Interactive file browser (no argument):
./pdf2audio.sh
Navigate with ↑↓, Enter to open folder or select PDF, Tab to toggle opening audio after conversion, Esc to quit.
Type letters/digits/_/- to filter the file list — Esc clears filter, then quits.
Folders shown as [name] at top, PDF files below. [..] to go up.
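The listing and filter behaviour described above might look roughly like this in Python. It is a sketch only: the function names are hypothetical, and case-insensitive substring matching, alphabetical ordering, and placing `[..]` first are assumptions rather than documented behaviour.

```python
import re
import tempfile
from pathlib import Path

# Characters accepted as filter input: letters, digits, "_" and "-".
FILTER_CHARS = re.compile(r"[A-Za-z0-9_-]")


def is_filter_key(ch: str) -> bool:
    """Return True if a typed character should extend the filter."""
    return bool(FILTER_CHARS.fullmatch(ch))


def list_entries(directory, query=""):
    """Return display rows: [..], then [folders], then PDF files."""
    directory = Path(directory)
    needle = query.lower()
    children = [p for p in directory.iterdir() if needle in p.name.lower()]
    folders = sorted((p.name for p in children if p.is_dir()), key=str.lower)
    pdfs = sorted(
        (p.name for p in children if p.is_file() and p.suffix.lower() == ".pdf"),
        key=str.lower,
    )
    return ["[..]"] + [f"[{name}]" for name in folders] + pdfs


if __name__ == "__main__":
    # Small demo in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "papers").mkdir()
        (Path(tmp) / "paper.pdf").write_bytes(b"")
        print(list_entries(tmp))
        print(list_entries(tmp, "papers"))
```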
Direct path:
./pdf2audio.sh --file=~/Downloads/paper.pdf
./pdf2audio.sh --file="~/Downloads/My Papers/paper.pdf" --open
URL input:
./pdf2audio.sh --file=https://example.com/paper.pdf
./pdf2audio.sh --file=https://example.com/blog/article
PDF URLs are downloaded to a private temp file before extraction. HTML URLs are
converted with deterministic readability extraction (trafilatura, with a basic
HTML fallback) to avoid navigation, footer, and ad-like page chrome where possible.
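As a rough illustration of the URL handling described above, the sketch below shows one way to write downloaded bytes to a private temp file and to fall back to a basic HTML-to-text pass when trafilatura is unavailable or returns nothing. All function names, the `%PDF-` sniffing, and the list of skipped tags are assumptions, not the project's implementation. `tempfile.mkstemp` and `trafilatura.extract` are real calls.

```python
import os
import tempfile
from html.parser import HTMLParser


def is_pdf(data: bytes, content_type: str = "") -> bool:
    """Guess whether a response body is a PDF."""
    return data.startswith(b"%PDF-") or "application/pdf" in content_type.lower()


def save_private_temp(data: bytes, suffix: str = ".pdf") -> str:
    """Write bytes to a temp file readable only by the current user."""
    fd, path = tempfile.mkstemp(suffix=suffix)  # owner-only permissions
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path


class _TextOnly(HTMLParser):
    """Collect text while skipping common page-chrome elements."""

    SKIP = {"script", "style", "nav", "header", "footer", "aside"}

    def __init__(self):
        super().__init__()
        self.depth = 0  # nesting depth inside skipped elements
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth and data.strip():
            self.parts.append(data.strip())


def basic_html_to_text(html: str) -> str:
    """Fallback extraction using only the standard library."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def html_to_text(html: str) -> str:
    """Prefer trafilatura; use the basic fallback if it yields nothing."""
    try:
        import trafilatura
        extracted = trafilatura.extract(html)
    except ImportError:
        extracted = None
    return extracted or basic_html_to_text(html)
```

The caller is responsible for deleting the temp file once extraction has finished.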
Options:
| Option | Description |
|---|---|
| `--file=PATH` | Path to input PDF or http(s) URL (omit to use file browser) |
| `--open` | Open output audio file after conversion (default on) |
| `--help` | Show usage info |
Output .opus saved next to source PDF.
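For a local file, "next to the source PDF" amounts to swapping the extension. A minimal sketch, with a hypothetical function name and the assumption that a leading `~` is expanded by the tool itself (a quoted `"~/..."` argument is not expanded by the shell):

```python
from pathlib import Path


def output_path(source: str) -> Path:
    """Return the .opus path beside a local source PDF."""
    return Path(source).expanduser().with_suffix(".opus")


if __name__ == "__main__":
    print(output_path("/tmp/My Papers/paper.pdf"))
```

Where the output lands for URL input is not stated here, so this sketch covers local paths only.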
Edit the top of pdf2audio.sh to change voice or speed:
VOICE="af_heart" # TTS voice
SPEED=1.0 # 1.0 = normal, 1.5 = faster
Available voices (Kokoro 82M model):
| Code | Description |
|---|---|
| `af_heart` | American English, female (default) |
| `am_adam` | American English, male |
| `bf_emma` | British English, female |
| `bm_george` | British English, male |
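The four codes in the table appear to follow a pattern: the first letter marks the accent, the second the gender, and the part after the underscore is the voice name. The helper below decodes that pattern. It is inferred from the table only, uses hypothetical names, and does not cover other prefixes Kokoro may ship.

```python
LANGUAGES = {"a": "American English", "b": "British English"}
GENDERS = {"f": "female", "m": "male"}


def describe_voice(code: str) -> str:
    """Decode a voice code such as 'af_heart' into a description."""
    prefix, _, name = code.partition("_")
    if len(prefix) != 2 or not name:
        raise ValueError(f"unexpected voice code: {code!r}")
    try:
        return f"{LANGUAGES[prefix[0]]}, {GENDERS[prefix[1]]}"
    except KeyError:
        raise ValueError(f"unknown voice prefix: {prefix!r}") from None


if __name__ == "__main__":
    for voice in ("af_heart", "am_adam", "bf_emma", "bm_george"):
        print(voice, "->", describe_voice(voice))
```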
- Processing time scales with PDF length — expect ~1 min per 10 pages on CPU (no GPU needed)
- Image-only / scanned PDFs will fail — text layer required
- JS-rendered, paywalled, or heavily interactive pages may not extract cleanly from URL input
- Output is ~24kbps opus, highly compressed (~11MB per hour of audio)
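The size and time figures can be checked with simple arithmetic. The sketch below also builds an ffmpeg command line for a 24 kbps Opus encode. The function names and the exact flags are assumptions (the project's real invocation may differ), though `-c:a libopus` and `-b:a` are standard ffmpeg options.

```python
def opus_command(wav_path: str, opus_path: str, bitrate_kbps: int = 24) -> list:
    """Build an ffmpeg argument list for encoding WAV to Opus."""
    return [
        "ffmpeg", "-y",              # overwrite output without asking
        "-i", wav_path,              # input audio
        "-c:a", "libopus",           # Opus encoder
        "-b:a", f"{bitrate_kbps}k",  # target audio bitrate
        opus_path,
    ]


def estimated_size_mb(duration_s: float, bitrate_kbps: float = 24) -> float:
    """File size in MB: bitrate (bits/s) / 8 * seconds / 1e6."""
    return bitrate_kbps * 1000 / 8 * duration_s / 1_000_000


def estimated_minutes(pages: int, pages_per_minute: float = 10) -> float:
    """Rough CPU processing time at about 10 pages per minute."""
    return pages / pages_per_minute


if __name__ == "__main__":
    print(" ".join(opus_command("speech.wav", "speech.opus")))
    print(f"{estimated_size_mb(3600):.1f} MB per hour of audio")
    print(f"{estimated_minutes(40):.0f} min for a 40-page PDF")
```

At 24 kbps the encoder writes about 3 KB per second, so one hour of audio comes to roughly 10.8 MB before container overhead.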

