laurenceputra/pdf-downsampler

pdf-downsample

A simple tool to shrink PDFs by downsampling images, using Docker by default.

Quick start (recommended)

  1. Install (no git clone needed):
curl -fsSL https://github.com/laurenceputra/pdf-downsampler/raw/main/scripts/install.sh | bash

You can override the source with --source-url or --source-path if needed.

  2. Open a new terminal or run:
source "$HOME/.bash_aliases"
  3. Run it:
pdf-downsample --input input.pdf --output output.pdf --medium screen

Short flags and defaults (medium defaults to screen):

pdf-downsample -i input.pdf -o output.pdf

If you don’t have Docker

This tool uses Docker by default to avoid system-level installs. If you don’t have Docker, install it first (then rerun the commands above).

Native Linux dependencies (no Docker)

If you prefer to run from a local repo checkout, install Ghostscript and qpdf:

  • Debian/Ubuntu: sudo apt-get install ghostscript qpdf
  • Fedora: sudo dnf install ghostscript qpdf
  • Arch: sudo pacman -S ghostscript qpdf
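
Before running natively, it can help to confirm the dependencies are actually on your PATH. The following is a minimal sketch; `check_deps` is a hypothetical helper and not part of pdf-downsample, and it assumes Ghostscript is installed under its usual Linux executable name `gs`.

```shell
# Hypothetical helper (not shipped with pdf-downsample):
# report every command in the argument list that is missing from PATH.
check_deps() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      printf 'missing dependency: %s\n' "$cmd" >&2
      missing=1
    fi
  done
  # Exit status 0 when everything was found, 1 otherwise.
  return "$missing"
}
```

Calling `check_deps gs qpdf` after the install step should print nothing and return 0; `check_deps docker` performs the same check for the default Docker-based setup.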

Batch processing a folder

pdf-downsample-dir --input-dir ./in --output-dir ./out --medium screen --parallel 4

Recursive + filters:

pdf-downsample-dir --input-dir ./in --output-dir ./out --recursive --include '*.pdf' --exclude '*draft*'

When using --recursive, the output directory mirrors the input subdirectory structure to avoid filename collisions. Include/exclude patterns match either the basename (always) or the input-relative path (when --recursive), so patterns like invoice*.pdf still match nested files while path-aware patterns like 2024/*.pdf are also supported.
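
The mirroring and matching rules described above can be sketched in plain shell. This is an illustration only, not the tool's actual implementation: `mirror_output_path`, `matches_pattern`, and `should_process` are hypothetical names, and shell `case` globs let `*` match `/`, which may differ from how pdf-downsample-dir treats path separators.

```shell
# Hypothetical helpers illustrating the documented behavior.

# Map an input file to its mirrored location under the output directory.
# $1 = input dir, $2 = output dir, $3 = input file located under $1
mirror_output_path() {
  printf '%s/%s\n' "$2" "${3#"$1"/}"
}

# Succeed when the glob matches the basename or the input-relative path.
# $1 = input-relative path, $2 = glob pattern
matches_pattern() {
  case "${1##*/}" in
    $2) return 0 ;;   # basename match, e.g. invoice*.pdf
  esac
  case "$1" in
    $2) return 0 ;;   # path-aware match, e.g. 2024/*.pdf
  esac
  return 1
}

# Apply the exclude pattern first, then require the include pattern.
# $1 = input-relative path, $2 = include glob, $3 = exclude glob (may be empty)
should_process() {
  if [ -n "${3:-}" ] && matches_pattern "$1" "$3"; then
    return 1
  fi
  matches_pattern "$1" "$2"
}
```

Under these assumptions, `should_process 2024/invoice-01.pdf 'invoice*.pdf' '*draft*'` succeeds through the basename match, while `should_process 2024/draft-report.pdf '*.pdf' '*draft*'` fails because the exclude pattern matches first.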

Options (most common)

Medium presets (DPI, low to high):

  • web: 96
  • screen: 120
  • ebook: 160
  • print: 300
  • press: 600

Flags:

  • --dpi <number>: override the preset
  • --jpeg-quality <1-100>: JPEG quality for re-encoding (default 85)
  • --threshold <float>: downsample threshold (lower values downsample more images; 1.0 is the most aggressive)
  • --skip-existing: skip if output exists (takes precedence over --no-overwrite)
  • --overwrite/--no-overwrite: allow or prevent overwriting (default: overwrite)
  • --suffix <text>: suffix applied to the output filename when the output already exists (e.g. -downsampled)
  • --progress: show progress steps without full verbose output
  • --report: emit a summary report (sizes, pages, elapsed time)
  • --keep-temp: keep intermediate files in a tmp/run-<timestamp> folder for debugging
  • --verify-pages/--no-verify-pages: verify output page count matches input (default: verify)
  • --verbose: show Ghostscript commands and retries
  • --gs-timeout <seconds>: Ghostscript timeout (default 600, 0 disables)
  • --qpdf-timeout <seconds>: qpdf timeout (default 600, 0 disables)
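
As a rough illustration of how the presets, --dpi, and --threshold interact, here is a minimal sketch. The function names `resolve_dpi` and `exceeds_threshold` are hypothetical. The threshold rule shown assumes the flag behaves like Ghostscript's image downsample threshold, where an image is resampled only when its resolution exceeds the target DPI multiplied by the threshold; the README does not confirm that mapping.

```shell
# Hypothetical helpers; the preset values are the ones listed above.

# Print the effective DPI: an explicit override wins, otherwise the preset.
# $1 = medium (defaults to screen), $2 = optional --dpi override
resolve_dpi() {
  if [ -n "${2:-}" ]; then
    printf '%s\n' "$2"
    return 0
  fi
  case "${1:-screen}" in
    web)    echo 96 ;;
    screen) echo 120 ;;
    ebook)  echo 160 ;;
    print)  echo 300 ;;
    press)  echo 600 ;;
    *)      printf 'unknown medium: %s\n' "$1" >&2; return 1 ;;
  esac
}

# Succeed when an image would be downsampled under the assumed rule:
# image_dpi > target_dpi * threshold.
# $1 = image DPI, $2 = target DPI, $3 = threshold
exceeds_threshold() {
  awk -v r="$1" -v t="$2" -v k="$3" 'BEGIN { exit !(r > t * k) }'
}
```

With these assumptions, a 150 DPI image and the screen preset (120) is downsampled at --threshold 1.0 (150 > 120) but left alone at 1.5 (150 is not greater than 180), which is why a threshold of 1.0 is the more aggressive choice.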

Power users (run from repo)

If you prefer not to install:

bin/pdf-downsample --input input.pdf --output output.pdf --medium screen
scripts/run_docker_dir.sh --input-dir ./in --output-dir ./out --medium screen

Operational notes

  • Performance: large, image-heavy PDFs can take minutes to process depending on CPU and disk speed.
  • Resources: ensure Docker has enough memory (at least 2GB recommended) for large PDFs.
  • Safety: malformed or encrypted PDFs will fail with a non-zero exit code and a clear error message (encrypted PDFs are not supported).
  • Timeouts: Ghostscript and qpdf default timeouts are 600s; increase them for very large inputs.
  • Skip-existing behavior: existence of the output is checked once, before processing starts; if another process deletes the output after that check, the run may still skip the file.
  • Temporary files: without --keep-temp, intermediate files are created in a temporary directory and cleaned up automatically.
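
The timeout semantics noted above (a limit in seconds, with 0 disabling it) can be sketched with the coreutils `timeout` command. `run_with_timeout` is a hypothetical wrapper for illustration; how pdf-downsample enforces its timeouts internally is not documented here.

```shell
# Hypothetical wrapper: run a command with an optional time limit.
# $1 = limit in seconds (0 disables the limit), remaining args = command
run_with_timeout() {
  seconds=$1
  shift
  if [ "$seconds" -eq 0 ]; then
    # 0 means "no limit", mirroring the documented flag behavior.
    "$@"
  else
    # coreutils timeout exits with status 124 when the limit is hit.
    timeout "$seconds" "$@"
  fi
}
```

For example, `run_with_timeout 1 sleep 5` returns status 124, which a calling script can treat as a signal to retry with a larger --gs-timeout or --qpdf-timeout value.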

Troubleshooting

  • Command not found: run source ~/.bash_aliases or open a new terminal.
  • Docker not installed: install Docker, then try again.
  • Input file not found: use absolute paths or run from the directory with the PDF.
  • Output looks unchanged: try --threshold 1.0 or a lower --dpi value.
  • Output size warning: if the output is larger than 80% of the input size, the tool warns but still succeeds; try a lower --dpi or --threshold value.
  • Short flags: -i/-o/-m are supported for input/output/medium.
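
The 80% size check mentioned above is easy to reproduce yourself when scripting around the tool. This is a minimal sketch; `output_too_large` is a hypothetical helper, and the tool's own check may be computed differently.

```shell
# Hypothetical helper: succeed when the output is more than 80% of the input size.
# $1 = input file, $2 = output file
output_too_large() {
  in_bytes=$(wc -c < "$1")
  out_bytes=$(wc -c < "$2")
  # Integer form of out/in > 0.80, avoiding floating point:
  # out * 100 > in * 80
  [ $((out_bytes * 100)) -gt $((in_bytes * 80)) ]
}
```

A script could run `output_too_large input.pdf output.pdf` after a conversion and retry with a lower --dpi when it succeeds.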
