Write once, run anywhere.
ezpz makes distributed PyTorch launches portable across any supported
hardware {NVIDIA, AMD, Intel, MPS, CPU} with zero code changes.
- **Multi-hardware**: automatic device detection and backend selection (CUDA/NCCL, XPU/CCL, MPS, CPU/Gloo)
- **Zero code changes**: the same script runs on a laptop, a single GPU, or a thousand-node supercomputer
- **HPC integration**: native PBS and SLURM support with automatic hostfile discovery and rank assignment
- **Metric tracking**: built-in `History` class for recording, plotting, and saving training metrics
- **CLI tools**: `ezpz launch`, `ezpz test`, `ezpz doctor` for launching jobs, smoke-testing, and diagnostics
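To make the metric-tracking idea concrete, here is a stdlib-only sketch of what a per-step history accumulator does. `MiniHistory` is a hypothetical stand-in written for illustration; it is not ezpz's `History` implementation, whose actual API may differ.

```python
from collections import defaultdict


class MiniHistory:
    """Toy stand-in for ezpz's History: accumulate per-step metrics by key."""

    def __init__(self):
        self.metrics = defaultdict(list)

    def update(self, record: dict) -> dict:
        """Append each metric value to its running list; return the record."""
        for key, value in record.items():
            self.metrics[key].append(value)
        return record


history = MiniHistory()
for step in range(3):
    history.update({"step": step, "loss": 1.0 / (step + 1)})

print(history.metrics["loss"])
```

Keeping every value (rather than only the latest) is what makes after-the-fact plotting and saving possible, which is the role `History` plays in ezpz.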
```bash
uv pip install git+https://github.com/saforem2/ezpz
```

```python
import torch
import ezpz

rank = ezpz.setup_torch()  # auto-detects device + backend
device = ezpz.get_torch_device()
model = torch.nn.Linear(128, 10).to(device)
model = ezpz.wrap_model(model)  # DDP by default
```

```bash
# Same command everywhere: Mac laptop, NVIDIA cluster, Intel Aurora
ezpz launch python3 train.py  # launch distributed training
ezpz test                     # smoke-test your setup
ezpz doctor                   # diagnose environment issues
```

Full documentation is available at ezpz.cool.
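A common use of the rank returned by `ezpz.setup_torch()` is to gate side effects, such as logging, to the lead rank. A minimal stdlib-only sketch, assuming a launcher that exports a per-process `RANK` environment variable (as torchrun-style launchers do; whether your launcher sets this exact variable is an assumption):

```python
import os

# Assumed: the launcher exports RANK per process (torchrun-style convention);
# fall back to 0 for plain single-process runs.
rank = int(os.environ.get("RANK", "0"))


def log(msg: str) -> None:
    """Print only from rank 0 so N processes don't emit N duplicate lines."""
    if rank == 0:
        print(msg)


log(f"hello from rank {rank}")
```

In real ezpz code you would use the rank value returned by `ezpz.setup_torch()` directly rather than re-reading the environment.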