pdf2audio

Convert PDF documents to audio using fully local, offline TTS. No cloud, no API keys.

Pipeline: PDF/URL → text extraction (PyMuPDF/pdftotext/trafilatura) → speech synthesis (Kokoro TTS) → opus audio (ffmpeg)

Smart text cleaning:

  • Footnotes stripped by font-size detection
  • References/Bibliography section removed
  • Inline citations [1], (Smith et al., 2023) removed
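The three cleaning steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the `(text, font_size)` span format, the 0.85 size ratio, and the regular expressions are all assumptions chosen for the example, and the author-year pattern is a heuristic that can also catch unrelated parentheticals ending in a year.

```python
import re
from collections import Counter

# Hypothetical input: (text, font_size) pairs in reading order, for example
# collected from the spans that a PDF library reports for each page.
Span = tuple[str, float]


def drop_small_spans(spans: list[Span], ratio: float = 0.85) -> str:
    """Keep only spans whose font size is close to the dominant body size."""
    if not spans:
        return ""
    # Weight each size by character count so that body text dominates.
    weights = Counter()
    for text, size in spans:
        weights[round(size, 1)] += len(text)
    body_size = weights.most_common(1)[0][0]
    return " ".join(text for text, size in spans if size >= body_size * ratio)


HEADING = re.compile(r"^[ \t]*(references|bibliography)[ \t]*$",
                     re.IGNORECASE | re.MULTILINE)
NUMERIC = re.compile(r"\s*\[\d+(?:\s*[,\-]\s*\d+)*\]")        # [1], [2, 3], [4-6]
AUTHOR_YEAR = re.compile(r"\s*\([A-Z][^()]*?,?\s*\d{4}[a-z]?\)")  # (Smith et al., 2023)


def cut_references(text: str) -> str:
    """Drop everything from the last References/Bibliography heading onward."""
    matches = list(HEADING.finditer(text))
    return text[:matches[-1].start()].rstrip() if matches else text


def strip_citations(text: str) -> str:
    """Remove numeric and author-year inline citations."""
    return AUTHOR_YEAR.sub("", NUMERIC.sub("", text))
```

For example, `strip_citations("Transformers scale well [1], as shown (Smith et al., 2023).")` returns `"Transformers scale well, as shown."`.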

Install via Homebrew

brew tap mathiasconradt/pdf2audio https://github.com/mathiasconradt/pdf2audio
brew install pdf2audio

Then run from anywhere:

pdf2audio
pdf2audio --file=~/Downloads/paper.pdf --open

On first run, Kokoro downloads model weights (~300MB) from Hugging Face.


Manual Install

Requirements

  • macOS (uses the open command to play the result)
  • uv — Python package manager
  • ffmpeg — audio encoding
  • poppler — robust text extraction for printed web PDFs

Manual Install

1. Install system deps

brew install ffmpeg poppler uv

2. Clone / download this repo

git clone <repo-url>
cd pdf2audio

3. Create venv and install Python deps

uv venv
uv sync

First run downloads Kokoro model weights (~300MB) from Hugging Face automatically.

Usage

Interactive file browser (no argument):

./pdf2audio.sh

Navigate with ↑/↓. Press Enter to open a folder or select a PDF, Tab to toggle whether the audio opens after conversion, and Esc to quit.
Type letters, digits, _ or - to filter the file list; while a filter is active, the first Esc clears it and the next Esc quits.
Folders are shown as [name] at the top, with PDF files below. Select [..] to go up one level.
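The listing and filter rules can be pictured with a small Python sketch. It is only an illustration of the behaviour described here: the function names are hypothetical, and case-insensitive substring matching, alphabetical sorting, and keeping [..] visible while filtering are assumptions.

```python
def accepts_filter_char(ch: str) -> bool:
    """Letters, digits, underscore and hyphen extend the filter."""
    return len(ch) == 1 and ch.isascii() and (ch.isalnum() or ch in "_-")


def browser_entries(folders: list[str], files: list[str], query: str = "") -> list[str]:
    """Build the display list: [..] first, then [folders], then PDF files."""
    needle = query.lower()

    def matches(name: str) -> bool:
        # Assumption: the filter is a case-insensitive substring match.
        return needle in name.lower()

    shown_folders = sorted((f for f in folders if matches(f)), key=str.lower)
    shown_pdfs = sorted(
        (f for f in files if f.lower().endswith(".pdf") and matches(f)),
        key=str.lower,
    )
    return ["[..]"] + [f"[{name}]" for name in shown_folders] + shown_pdfs
```

With folders `papers` and `Archive` and files `b.PDF`, `a.pdf` and `notes.txt`, an empty filter yields `[..]`, `[Archive]`, `[papers]`, `a.pdf`, `b.PDF`.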

Direct path:

./pdf2audio.sh --file=~/Downloads/paper.pdf
./pdf2audio.sh --file="~/Downloads/My Papers/paper.pdf" --open

URL input:

./pdf2audio.sh --file=https://example.com/paper.pdf
./pdf2audio.sh --file=https://example.com/blog/article

PDF URLs are downloaded to a private temp file before extraction. HTML URLs are converted with deterministic readability extraction (trafilatura, with a basic HTML fallback) to avoid navigation, footer, and ad-like page chrome where possible.
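A rough Python sketch of that branching, using only the standard library, is shown below. It is not the project's code: the function names, the Content-Type and magic-byte check, and the 30 second timeout are assumptions for illustration. The readability step is left as a comment because how pdf2audio invokes trafilatura is not documented here.

```python
import os
import tempfile
import urllib.request


def looks_like_pdf(content_type: str, head: bytes) -> bool:
    """Treat the response as a PDF if the header or the magic bytes say so."""
    return "application/pdf" in content_type.lower() or head.startswith(b"%PDF-")


def fetch(url: str, timeout: float = 30.0) -> tuple[str, str]:
    """Return ("pdf", temp_path) or ("html", page_source)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        content_type = resp.headers.get("Content-Type", "")
        charset = resp.headers.get_content_charset() or "utf-8"
        body = resp.read()
    if looks_like_pdf(content_type, body[:8]):
        # mkstemp creates a file that only the current user can read or write.
        fd, path = tempfile.mkstemp(suffix=".pdf")
        with os.fdopen(fd, "wb") as fh:
            fh.write(body)
        return "pdf", path
    # An HTML page would next go through readability extraction
    # (trafilatura in this project) before being handed to TTS.
    return "html", body.decode(charset, errors="replace")
```

Checking the first bytes as well as the header covers servers that serve PDFs as application/octet-stream. The caller is responsible for deleting the temp file once extraction is done.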

Options:

Option        Description
--file=PATH   Path to input PDF or http(s) URL (omit to use file browser)
--open        Open output audio file after conversion (default on)
--help        Show usage info

The output .opus file is saved next to the source PDF.
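As an illustration of that naming rule, the output path can be derived from the input path by swapping the extension. The helper name is hypothetical, and the sketch covers local files only; where the output lands for URL input is not specified here.

```python
from pathlib import Path


def output_path_for(source: str) -> Path:
    """Same folder and base name as the source PDF, with an .opus extension."""
    return Path(source).expanduser().with_suffix(".opus")
```

For example, `output_path_for("/tmp/My Papers/paper.pdf")` gives `/tmp/My Papers/paper.opus`.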

Configuration

Edit the top of pdf2audio.sh to change voice or speed:

VOICE="af_heart"   # TTS voice
SPEED=1.0          # 1.0 = normal, 1.5 = faster

Available voices (Kokoro 82M model):

Code        Description
af_heart    American English, female (default)
am_adam     American English, male
bf_emma     British English, female
bm_george   British English, male
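The codes follow a visible pattern: the first letter marks the accent, the second the gender, and the rest is the voice name. The Python sketch below decodes that pattern. The helper is hypothetical and the mapping is inferred only from the four codes listed here, so other prefixes are rejected rather than guessed.

```python
LANGUAGES = {"a": "American English", "b": "British English"}
GENDERS = {"f": "female", "m": "male"}


def describe_voice(code: str) -> tuple[str, str, str]:
    """Split a voice code such as "bf_emma" into (language, gender, name)."""
    prefix, _, name = code.partition("_")
    if len(prefix) != 2 or not name:
        raise ValueError(f"unexpected voice code: {code!r}")
    lang, gender = prefix
    if lang not in LANGUAGES or gender not in GENDERS:
        raise ValueError(f"unknown prefix in voice code: {code!r}")
    return LANGUAGES[lang], GENDERS[gender], name
```

For example, `describe_voice("bf_emma")` returns `("British English", "female", "emma")`.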

Notes

  • Processing time scales with PDF length — expect ~1 min per 10 pages on CPU (no GPU needed)
  • Image-only / scanned PDFs will fail — text layer required
  • JS-rendered, paywalled, or heavily interactive pages may not extract cleanly from URL input
  • Output is ~24kbps opus, highly compressed (roughly 11MB per hour of audio at that bitrate)
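File size follows directly from bitrate and duration, so it can be estimated before converting. The Python sketch below shows that arithmetic along with one plausible ffmpeg invocation for Opus encoding. The function names and the intermediate WAV file are assumptions; the flags the script really passes to ffmpeg may differ.

```python
def opus_size_mb(bitrate_kbps: float, seconds: float) -> float:
    """Approximate file size in megabytes (1 MB = 1,000,000 bytes)."""
    total_bits = bitrate_kbps * 1000 * seconds
    return total_bits / 8 / 1_000_000


def ffmpeg_opus_command(wav_path: str, opus_path: str, bitrate_kbps: int = 24) -> list[str]:
    """Build an ffmpeg argument list that encodes a WAV file to Opus."""
    return [
        "ffmpeg", "-y",            # overwrite the output if it exists
        "-i", wav_path,            # input audio
        "-c:a", "libopus",         # Opus encoder
        "-b:a", f"{bitrate_kbps}k",
        opus_path,
    ]


print(opus_size_mb(24, 3600))  # 10.8
```

The returned list can be passed to `subprocess.run(..., check=True)` on a machine where ffmpeg is installed. Container overhead is ignored, so real files come out slightly larger than the estimate.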
