pdf2audio

Convert PDF documents to audio using fully local, offline TTS. No cloud, no API keys.

Pipeline: PDF/URL → text extraction (PyMuPDF/pdftotext/trafilatura) → speech synthesis (Kokoro TTS) → opus audio (ffmpeg)

Smart text cleaning:

Footnotes stripped by font-size detection
References/Bibliography section removed
Inline citations [1], (Smith et al., 2023) removed
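
The three cleaning rules above can be illustrated with a minimal Python sketch. This is not the code from pdf_extract.py: the function names, the 0.85 size ratio, and the regular expressions are assumptions chosen for illustration. `drop_small_text` expects the dictionary shape that PyMuPDF's `page.get_text("dict")` returns (blocks, lines, and spans carrying `text` and `size`), so it can be tried on plain dictionaries without opening a PDF.

```python
import re
from collections import Counter

# Numeric citations such as [1], [2, 3] or [4-6] (hypothetical pattern).
NUMERIC_CITE = re.compile(r"\s*\[\d+(?:\s*[-–,]\s*\d+)*\]")
# Author-year citations such as (Smith et al., 2023) or (Lee, 2020; Roe, 2019).
AUTHOR_YEAR_CITE = re.compile(
    r"\s*\([A-Z][^()\d]*\d{4}[a-z]?(?:;\s*[A-Z][^()\d]*\d{4}[a-z]?)*\)"
)
# A line consisting only of "References" or "Bibliography", optionally numbered.
REFERENCES_HEADING = re.compile(
    r"^[ \t]*(?:\d+\.?[ \t]+)?(?:references|bibliography)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def drop_small_text(page_dict, ratio=0.85):
    """Keep only spans whose font size is close to the dominant body size."""
    lines = [
        line
        for block in page_dict.get("blocks", [])
        for line in block.get("lines", [])  # image blocks carry no "lines"
    ]
    # Assume the body size is the one that carries the most characters.
    chars_per_size = Counter()
    for line in lines:
        for span in line["spans"]:
            chars_per_size[round(span["size"], 1)] += len(span["text"])
    if not chars_per_size:
        return ""
    body_size = chars_per_size.most_common(1)[0][0]
    kept_lines = []
    for line in lines:
        kept = "".join(
            span["text"]
            for span in line["spans"]
            if span["size"] >= ratio * body_size
        )
        if kept.strip():
            kept_lines.append(kept.strip())
    return "\n".join(kept_lines)


def drop_references(text):
    """Cut the text at the last References/Bibliography heading, if any."""
    matches = list(REFERENCES_HEADING.finditer(text))
    if not matches:
        return text
    return text[: matches[-1].start()].rstrip()


def strip_citations(text):
    """Remove numeric and author-year citations along with leading spaces."""
    text = NUMERIC_CITE.sub("", text)
    return AUTHOR_YEAR_CITE.sub("", text)
```

The patterns are heuristics: a parenthetical such as "(In 2023)" would also be removed, and a paper whose body text is not the most common font size would need a different way of picking the body size.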

Install via Homebrew

brew tap mathiasconradt/pdf2audio https://github.com/mathiasconradt/pdf2audio
brew install pdf2audio

Then run from anywhere:

pdf2audio
pdf2audio --file=~/Downloads/paper.pdf --open

On first run, Kokoro downloads model weights (~300MB) from Hugging Face.

Manual Install

Requirements

macOS (uses open to play result)
uv — Python package manager
ffmpeg — audio encoding
poppler — robust text extraction for printed web PDFs

Installation steps

1. Install system deps

brew install ffmpeg poppler uv

2. Clone / download this repo

git clone <repo-url>
cd pdf2audio

3. Create venv and install Python deps

uv venv
uv sync

First run downloads Kokoro model weights (~300MB) from Hugging Face automatically.

Usage

Interactive file browser (no argument):

./pdf2audio.sh

Navigate with ↑↓, Enter to open folder or select PDF, Tab to toggle opening audio after conversion, Esc to quit.
Type letters, digits, _ or - to filter the file list. While a filter is active, Esc clears it first; press Esc again to quit.
Folders shown as [name] at top, PDF files below. [..] to go up.
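
The listing and filter rules can be sketched in a few lines of Python. This is an illustration only, not the code in pdf_browser.py: the names `visible_entries` and `is_filter_char` are hypothetical, and the sketch assumes the filter is a case-insensitive substring match applied to both folders and files.

```python
def is_filter_char(ch):
    """True for the single characters the filter accepts: letters, digits, _ and -."""
    return len(ch) == 1 and ch.isascii() and (ch.isalnum() or ch in "_-")


def visible_entries(folders, files, query=""):
    """Return the rows to display: [..], then matching folders, then matching PDFs."""
    q = query.lower()
    shown_folders = sorted(
        (name for name in folders if q in name.lower()), key=str.lower
    )
    shown_pdfs = sorted(
        (
            name
            for name in files
            if name.lower().endswith(".pdf") and q in name.lower()
        ),
        key=str.lower,
    )
    return ["[..]"] + [f"[{name}]" for name in shown_folders] + shown_pdfs
```

With an empty query every folder and PDF is shown; non-PDF files are never listed.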

Direct path:

./pdf2audio.sh --file=~/Downloads/paper.pdf
./pdf2audio.sh --file="~/Downloads/My Papers/paper.pdf" --open

URL input:

./pdf2audio.sh --file=https://example.com/paper.pdf
./pdf2audio.sh --file=https://example.com/blog/article

PDF URLs are downloaded to a private temp file before extraction. HTML URLs are converted with deterministic readability extraction (trafilatura, with a basic HTML fallback) to avoid navigation, footer, and ad-like page chrome where possible.
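
To make the two URL paths concrete, here is a minimal standard-library sketch of the decision (PDF or HTML) and of what a basic HTML fallback could look like. It is not the code in url_extract.py and does not use trafilatura; `looks_like_pdf`, `basic_html_to_text`, and the tag lists are assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Elements treated as page chrome and skipped entirely (assumed list).
SKIPPED_TAGS = {"script", "style", "nav", "header", "footer", "aside", "form"}
# Elements that start or end a line of text (assumed list).
BLOCK_TAGS = {"p", "div", "li", "br", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


def looks_like_pdf(url, content_type=""):
    """Guess whether a URL points at a PDF from its Content-Type or path."""
    if "application/pdf" in content_type.lower():
        return True
    return urlparse(url).path.lower().endswith(".pdf")


class _TextCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS:
            if self.skip_depth:
                self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def basic_html_to_text(html):
    """Return visible text, one block per line, without skipped elements."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.chunks).splitlines())
    return "\n".join(line for line in lines if line)
```

A fallback like this only removes chrome that is marked up with semantic tags; navigation built from plain `div` elements would still come through, which is one reason a readability extractor is tried first.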

Options:

Option	Description
`--file=PATH`	Path to input PDF or `http(s)` URL (omit to use file browser)
`--open`	Open output audio file after conversion (default on)
`--help`	Show usage info

The output .opus file is saved next to the source PDF.
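
As a sketch of how the output location and the encoding step could fit together, the following Python derives the .opus path from the input path and builds an ffmpeg command line. The helper names and the intermediate WAV file are assumptions, not taken from pdf2audio.sh; the 24k bitrate mirrors the figure given under Notes, and `libopus` must be available in the installed ffmpeg build.

```python
from pathlib import Path


def opus_path_for(source):
    """Return the .opus path next to the source file, expanding a leading ~."""
    return Path(source).expanduser().with_suffix(".opus")


def ffmpeg_opus_command(wav_path, out_path, bitrate="24k"):
    """Build (but do not run) an ffmpeg command that encodes WAV to Opus."""
    return [
        "ffmpeg",
        "-y",                # overwrite an existing output file
        "-i", str(wav_path),
        "-c:a", "libopus",   # Opus encoder
        "-b:a", bitrate,     # target audio bitrate
        str(out_path),
    ]
```

The command list can be passed to `subprocess.run(cmd, check=True)`; building it as a list rather than a string keeps paths with spaces, such as "My Papers", intact.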

Configuration

Edit the top of pdf2audio.sh to change voice or speed:

VOICE="af_heart"   # TTS voice
SPEED=1.0          # 1.0 = normal, 1.5 = faster

Available voices (Kokoro 82M model):

Code	Description
`af_heart`	American English, female (default)
`am_adam`	American English, male
`bf_emma`	British English, female
`bm_george`	British English, male
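
The voice codes in the table appear to follow a pattern: the first letter gives the accent, the second the gender, and the part after the underscore is the voice name. The Python sketch below decodes that pattern for the four listed voices. `describe_voice` is a hypothetical helper, and the mapping covers only the prefixes shown in the table, not every voice Kokoro ships.

```python
LANGUAGES = {"a": "American English", "b": "British English"}
GENDERS = {"f": "female", "m": "male"}


def describe_voice(code):
    """Turn a code such as 'bm_george' into 'British English, male'."""
    prefix, _, name = code.partition("_")
    if len(prefix) != 2 or not name:
        raise ValueError(f"unexpected voice code: {code!r}")
    lang, gender = prefix
    if lang not in LANGUAGES or gender not in GENDERS:
        raise ValueError(f"unknown voice prefix: {prefix!r}")
    return f"{LANGUAGES[lang]}, {GENDERS[gender]}"
```

A check like this can catch a mistyped VOICE value before the model is loaded.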

Notes

Processing time scales with PDF length — expect ~1 min per 10 pages on CPU (no GPU needed)
Image-only / scanned PDFs will fail — text layer required
JS-rendered, paywalled, or heavily interactive pages may not extract cleanly from URL input
Output is ~24kbps Opus, highly compressed (~11MB per hour of audio)
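
A back-of-envelope check of these figures: at a constant bitrate, file size is bitrate times duration divided by 8, so 24 kbps works out to roughly 10.8 MB per hour of audio. The small Python sketch below does that arithmetic and scales the processing-time estimate; both function names are hypothetical, and real files vary somewhat because Opus bitrate is not perfectly constant.

```python
def opus_size_mb(duration_seconds, bitrate_kbps=24):
    """Approximate file size in megabytes for a given duration and bitrate."""
    bits = bitrate_kbps * 1000 * duration_seconds
    return bits / 8 / 1_000_000


def cpu_minutes(pages, minutes_per_10_pages=1.0):
    """Rough conversion time on CPU, using the ~1 min per 10 pages estimate."""
    return pages / 10 * minutes_per_10_pages
```

For example, `opus_size_mb(3600)` gives the size of one hour of audio, and `cpu_minutes(35)` estimates the time for a 35-page paper.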
