NVIDIA GPU onboarding and validation tool for second-hand cards (900–4000 series, 2–24 GB VRAM).
Runs a structured test suite against every detected GPU and logs results to a cumulative CSV you can open in Excel, LibreOffice Calc, or pandas.
| Test | What it checks |
|---|---|
| Static info | Name, VRAM, VBIOS, compute capability, PCIe slot width vs card max |
| Baseline | Idle temperature, fan speed, power draw, ECC error counters |
| VRAM fill | Fills available VRAM with a known pattern and verifies read-back (catches bad memory cells) |
| Stress test | Matrix-multiply loop for N seconds; monitors for thermal throttling |
| Benchmarks | VRAM bandwidth (GB/s), FP32/FP16 TFLOPS, PCIe H↔D bandwidth — compared against reference specs |
- Ubuntu 22.04 LTS (or similar)
- NVIDIA drivers installed (
nvidia-smiin PATH) - Python 3.9+
uvfor environment setup
# Install uv if needed
curl -LsSf https://astral.sh/uv/install.sh | sh
# Create venv and install dependencies (torch ~2 GB first download)
./setup.shsetup.sh auto-detects your CUDA version and picks the matching PyTorch wheel index.
./run.sh # test all GPUs, 60 s stress
./run.sh --stress 120 # longer stress test
./run.sh --no-stress # skip stress (quick ID + VRAM test only)
./run.sh --no-memtest # skip VRAM fill test
./run.sh --no-bench # skip bandwidth/TFLOPS benchmarks
./run.sh --gpu 0 # test one GPU by index
./run.sh --report # also save gpu_report_<ts>.json/.txtOr directly:
python gpu_onboard.py --helpResults are appended to gpu_onboard_log.csv after every run. Use gpu_log.py to query it:
python gpu_log.py runs # list all test runs
python gpu_log.py runs --failed # only failed runs
python gpu_log.py runs --arch turing --vram >=8192
python gpu_log.py cards # one row per unique GPU (latest result)
python gpu_log.py cards --passed
python gpu_log.py report 20260416_005218 # reproduce full report from CSV
python gpu_log.py history GPU-34c746a6 # all runs for one card
python gpu_log.py delete run 20260416_005218 # remove a run
python gpu_log.py delete gpu GPU-34c746a6 # remove all rows for a card
python gpu_log.py clean # keep only latest result per GPUgpu_onboard.py # main test script
gpu_log.py # log query/management tool
setup.sh # first-time venv + dependency setup
run.sh # convenience wrapper (activates venv, runs gpu_onboard.py)
pyproject.toml # project metadata and dependencies
MIT — see LICENSE.