Reproducible benchmark for diffusion posterior sampling (DPS) algorithms on canonical inverse problems (denoising, deconvolution, imputation, and reconstruction from partial Fourier measurements), with model-based baselines and end-to-end scripts for training denoisers, parameter search, posterior sampling, and evaluation.
Why a container? This project depends on custom CUDA/C++ sampling operators from logsumexpv2 and Triton-compiled Python kernels. Compiling these reliably across different CUDA/PyTorch/toolchain versions is fragile. The published Docker image pins the exact PyTorch/CUDA stack and ships with the compiled extensions, so you can run the benchmark without wrestling with build environments.
- Requirements
  - Linux x86_64 with an NVIDIA driver (CUDA 12.x runtime compatible) and the NVIDIA Container Toolkit (nvidia-docker2).
  - Disk space: a full run can consume ~171 GB.
- Pull the image
# Public image on GitHub Container Registry
docker pull ghcr.io/zacmar/logsumexpv2:v2.1
- GPU smoke test
docker run --rm --gpus all ghcr.io/zacmar/logsumexpv2:v2.1 python - <<'PY'
import torch as th
print("CUDA available:", th.cuda.is_available())
if th.cuda.is_available():
    print("Device:", th.cuda.get_device_name(0))
PY
- Run the pipeline (minimal example)
# Create a host directory for large artifacts
mkdir -p "$HOME/dps-benchmark-data"
# Run inside the container with your repo mounted
# (uses /data for outputs via EXPERIMENTS_ROOT)
docker run --rm -it \
  --gpus all \
  --shm-size=8g \
  -u "$(id -u):$(id -g)" \
  -v "$PWD":/workspace \
  -v "$HOME/dps-benchmark-data":/data \
  -w /workspace \
  -e EXPERIMENTS_ROOT=/data \
  ghcr.io/zacmar/logsumexpv2:v2.1 \
  python generate-datasets.py identity student 1
Then continue with the stages below, using the same docker run pattern and replacing only the last line (the Python command).
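The "same docker run pattern, different last line" idea can be captured in a small helper. This is a minimal sketch, not part of the repository: the function name `docker_command` and the constant `IMAGE` are hypothetical, and the flags simply mirror the command shown above. It only builds and prints the command, it does not launch the container.

```python
import os
import shlex

# Image tag as given in this README.
IMAGE = "ghcr.io/zacmar/logsumexpv2:v2.1"


def docker_command(stage_args, repo_dir, data_dir, uid, gid):
    """Build the docker argv for one pipeline stage (hypothetical helper).

    Only the trailing Python command changes between stages; all other
    flags are copied from the minimal example in this README.
    """
    return [
        "docker", "run", "--rm", "-it",
        "--gpus", "all",
        "--shm-size=8g",
        "-u", f"{uid}:{gid}",
        "-v", f"{repo_dir}:/workspace",
        "-v", f"{data_dir}:/data",
        "-w", "/workspace",
        "-e", "EXPERIMENTS_ROOT=/data",
        IMAGE,
        "python", *stage_args,
    ]


# Example: the training launch from this README, wrapped in the container call.
example = docker_command(
    ["train.py", "bernoulli-laplace", "0.1", "1"],
    repo_dir=os.getcwd(),
    data_dir=os.path.expanduser("~/dps-benchmark-data"),
    uid=os.getuid(),
    gid=os.getgid(),
)
print(shlex.join(example))
```

The printed line can be pasted into a shell; passing the list to `subprocess.run` would also work, provided a TTY is available for `-it`.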
- GPU: Some sampling routines are CUDA-accelerated and require an NVIDIA GPU.
- Container: We publish a Docker image specifically to avoid the pain of compiling the CUDA/C++ sampling operators from logsumexpv2 and setting up Triton across diverse environments. If you prefer a local install, you’ll need a matching PyTorch/CUDA toolchain and a working CUDA build environment.
- Storage: Set EXPERIMENTS_ROOT to a directory with sufficient space. A full run requires about 171 GB.
export EXPERIMENTS_ROOT=/path/to/fast/storage
The pipeline assumes that the environment variable EXPERIMENTS_ROOT points to a directory on a storage device with enough free space for a full run.
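Before launching a full run, it can help to check the free space behind EXPERIMENTS_ROOT. The following is a minimal sketch, not part of the repository: `has_enough_space` and `REQUIRED_GB` are hypothetical names, and the 171 GB figure is read as decimal gigabytes, which is an assumption.

```python
import os
import shutil

# Approximate footprint of a full run, from this README (assumed decimal GB).
REQUIRED_GB = 171


def has_enough_space(path, required_gb=REQUIRED_GB):
    """Return True if the filesystem holding `path` has enough free space."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 10**9


root = os.environ.get("EXPERIMENTS_ROOT")
if root and os.path.isdir(root):
    status = "ok" if has_enough_space(root) else "insufficient free space"
    print(f"EXPERIMENTS_ROOT={root}: {status}")
else:
    print("EXPERIMENTS_ROOT is unset or not a directory")
```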
The pipeline is compartmentalized into stages; each stage typically has an associated Python file with an argument parser. Where applicable, you can specify:
- Forward operator: identity, convolution, sample, fourier
- Jump distribution (and parameters): gauss, laplace, student, bernoulli-laplace
  - Bernoulli-Laplace has 2 parameters; the others have 1.
- DPS algorithm and denoiser: DPS algorithms {cdps, diffpir, dpnp} with denoiser {learned, gibbs}.
This enables straightforward parallelization across compute nodes.
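Because every stage is driven by command-line arguments, a job list for a cluster scheduler can be generated by enumerating the options above. The sketch below does this for the posterior-sampling stage, following the argument order of the example launch in this README. The parameter values in `DISTRIBUTIONS` are placeholders, the function name is hypothetical, and whether every combination is supported by the scripts is an assumption.

```python
from itertools import product

OPERATORS = ["identity", "convolution", "sample", "fourier"]
# Placeholder parameter values (assumption): one parameter per distribution,
# two for bernoulli-laplace.
DISTRIBUTIONS = {
    "gauss": ["1"],
    "laplace": ["1"],
    "student": ["1"],
    "bernoulli-laplace": ["0.1", "1"],
}
ALGORITHMS = ["cdps", "diffpir", "dpnp"]
DENOISERS = ["learned", "gibbs"]


def posterior_sampling_jobs():
    """Enumerate one command line per (operator, distribution, algorithm, denoiser)."""
    jobs = []
    for operator, (dist, params), algorithm, denoiser in product(
        OPERATORS, DISTRIBUTIONS.items(), ALGORITHMS, DENOISERS
    ):
        jobs.append(
            ["python", "posterior-sampling.py", operator, dist, *params,
             algorithm, denoiser]
        )
    return jobs


jobs = posterior_sampling_jobs()
print(len(jobs))  # 4 operators * 4 distributions * 3 algorithms * 2 denoisers = 96
print(" ".join(jobs[0]))
```

Each entry is independent of the others, so the list can be split across compute nodes in any order.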
Synthesize training/validation/test signals with the specified jump distribution. For test signals, simulate the measurement process and draw gold‑standard posterior samples via Gibbs methods.
python generate-datasets.py identity student 1
The training signals are used to train standard noise-conditional score networks. An example launch looks like
python train.py bernoulli-laplace 0.1 1
The outputs of this stage are the learned denoisers for the specified jump distributions.
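The link between a noise-conditional score network and a denoiser is Tweedie's formula. The notation below is an assumption made for illustration and is not taken from the repository: $y = x + \sigma n$ with $n \sim \mathcal{N}(0, I)$, $p_\sigma$ the density of $y$, and $s_\theta(y, \sigma)$ a network that approximates the score $\nabla_y \log p_\sigma(y)$.

```latex
\mathbb{E}[x \mid y] = y + \sigma^2 \nabla_y \log p_\sigma(y)
\qquad\Longrightarrow\qquad
D_\theta(y, \sigma) = y + \sigma^2 s_\theta(y, \sigma)
```

Under this reading, a trained score network directly yields an approximation $D_\theta$ of the MMSE denoiser at each noise level. How train.py parameterizes its networks is not stated here.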
Use the validation set to select parameters for model‑based methods and DPS algorithms. Parameter grids (defined in the scripts) are tuned for the standard distributions in generate-datasets.py and may need adjustment for exotic cases. We also compute model‑based estimates on the test data at the chosen parameters.
python grid-search.py convolution student 1
Outputs: validation MSEs over the grids (for model‑based and DPS algorithms) and the corresponding test‑set estimates for model‑based methods.
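The selection rule behind this stage can be sketched as follows: evaluate the validation MSE at every grid point and keep the minimizer. The function `select_best`, the grid values, and the toy objective are all hypothetical; the grids and metrics actually used are defined in the scripts.

```python
def select_best(grid, validation_mse):
    """Return the grid point with the lowest validation MSE and all scores."""
    scores = {param: validation_mse(param) for param in grid}
    best = min(scores, key=scores.get)
    return best, scores


# Toy stand-in for a validation run (assumption): MSE is smallest near 0.3.
def toy_validation_mse(param):
    return (param - 0.3) ** 2 + 0.05


best_param, all_scores = select_best([0.01, 0.1, 0.3, 1.0], toy_validation_mse)
print(best_param)
```

Keeping the full score table, not only the minimizer, makes it easy to see whether the optimum sits on the edge of the grid, which would suggest that the grid needs to be extended.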
Run the DPS algorithms on the test data using the optimal parameters inferred from validation. Launches are parameterized by forward operator, jump distribution, DPS algorithm, and denoiser.
python posterior-sampling.py identity laplace 1 diffpir gibbs
Outputs: posterior samples produced by the DPS algorithms.
With gold‑standard posterior samples (Gibbs), DPS samples, and model‑based MMSE estimates on disk, evaluate and produce the figures/tables.
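As a rough illustration of how such a comparison could work, the sketch below averages posterior samples into an MMSE estimate and reports the MSE ratio between DPS and gold-standard estimates in decibels. It rests on an assumed definition of the gap; the postprocessing modules may compute it differently. All function names are hypothetical.

```python
import math


def posterior_mean(samples):
    """Elementwise average of posterior samples (equal-length lists)."""
    count = len(samples)
    return [sum(values) / count for values in zip(*samples)]


def mse(estimate, truth):
    """Mean squared error between an estimate and the ground-truth signal."""
    return sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth)


def mmse_gap_db(dps_samples, gibbs_samples, truth):
    """MSE of the DPS posterior mean relative to the Gibbs one, in dB.

    Assumed definition for illustration; requires a nonzero reference MSE.
    """
    mse_dps = mse(posterior_mean(dps_samples), truth)
    mse_ref = mse(posterior_mean(gibbs_samples), truth)
    return 10.0 * math.log10(mse_dps / mse_ref)


# Tiny hand-made example: two samples of a length-2 signal per method.
truth = [0.0, 0.0]
gibbs = [[1.0, 1.0], [-1.0, 1.0]]  # mean [0, 1], MSE 0.5
dps = [[2.0, 2.0], [0.0, 2.0]]     # mean [1, 2], MSE 2.5
gap = mmse_gap_db(dps, gibbs, truth)
print(round(gap, 2))
```

A gap of 0 dB would mean the DPS posterior mean is as accurate as the gold-standard one; positive values indicate a loss.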
Main tables:
python -m postprocessing.mmse-gap-latex-table
Data for the main figure(s):
python -m postprocessing.posterior-figure-data
If you use this benchmark in your research, please cite:
@misc{zach2025statisticalbenchmarkdiffusionposterior,
title={A Statistical Benchmark for Diffusion Posterior Sampling Algorithms},
author={Zach, Martin and Haouchat, Youssef and Unser, Michael},
year={2025},
eprint={2509.12821},
archivePrefix={arXiv},
primaryClass={eess.SP},
url={https://arxiv.org/abs/2509.12821},
}