This adds a GPU comparison benchmark script that runs the same per-sample training experiment on both an A100 and an H200, recording peak GPU memory, forward/backward times, and OOM status for 10 large-grid Materials Project samples under f32 and bf16-mixed precision.
The 10 task IDs are Materials Project entries with relatively large charge-density grids, spanning 3.4 M – 46.7 M voxels across a variety of shapes and aspect ratios.
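The voxel counts quoted here and in the tables below are simply the product of the three grid dimensions, e.g. for the smallest and largest grids:

```python
def voxels(shape):
    """Total voxel count of a charge-density grid: product of its dimensions."""
    n = 1
    for d in shape:
        n *= d
    return n

print(round(voxels((56, 56, 1080)) / 1e6, 1))   # smallest grid: 3.4 M voxels
print(round(voxels((360, 360, 360)) / 1e6, 1))  # largest grid: 46.7 M voxels
```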
Benchmark results
Model: `resunet.ResUNet3D`, `n_channels=32`, `n_residual_blocks=1`, `kernel_size=5`, `depth=2`, `batch_size=1`, single GPU, 3 epochs per experiment.
f32
| Task ID | Grid shape | Voxels | A100 status | A100 peak (GB) | A100 epoch (s) | H200 status | H200 peak (GB) | H200 epoch (s) | Speedup (H200/A100) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| mp-1890579 | 56 × 56 × 1080 | 3.4 M | ✅ | 10.50 | 1.65 | ✅ | 10.80 | 1.13 | 1.46× |
| mp-1849767 | 60 × 60 × 1120 | 4.0 M | ✅ | 12.42 | 1.95 | ✅ | 12.80 | 1.28 | 1.52× |
| mp-1851604 | 60 × 60 × 1120 | 4.0 M | ✅ | 12.42 | 1.93 | ✅ | 12.80 | 1.29 | 1.50× |
| mp-1862536 | 80 × 80 × 1024 | 6.6 M | ✅ | 19.89 | 3.17 | ✅ | 20.56 | 2.19 | 1.45× |
| mp-1847208 | 1120 × 84 × 84 | 7.9 M | ✅ | 23.89 | 3.87 | ✅ | 24.73 | 2.46 | 1.57× |
| mp-1936557 | 80 × 756 × 216 | 13.1 M | ✅ | 39.09 | 6.47 | ✅ | 40.53 | 3.94 | 1.64× |
| mp-1850168 | 972 × 240 × 128 | 29.9 M | ❌ OOM | — | — | ✅ | 91.97 | 8.67 | — |
| mp-1887804 | 320 × 320 × 320 | 32.8 M | ❌ OOM | — | — | ✅ | 100.55 | 111.10 | — |
| mp-1889246 | 540 × 144 × 432 | 33.6 M | ❌ OOM | — | — | ✅ | 103.25 | 136.31 | — |
| mp-1871122 | 360 × 360 × 360 | 46.7 M | ❌ OOM | — | — | ✅ | 131.65 | 303.15 | — |
bf16-mixed
| Task ID | Grid shape | Voxels | A100 status | A100 peak (GB) | A100 epoch (s) | H200 status | H200 peak (GB) | H200 epoch (s) | Speedup (H200/A100) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| mp-1890579 | 56 × 56 × 1080 | 3.4 M | ✅ | 5.50 | 1.07 | ✅ | 5.50 | 0.70 | 1.53× |
| mp-1849767 | 60 × 60 × 1120 | 4.0 M | ✅ | 6.50 | 1.27 | ✅ | 6.50 | 0.78 | 1.63× |
| mp-1851604 | 60 × 60 × 1120 | 4.0 M | ✅ | 6.50 | 1.25 | ✅ | 6.50 | 0.78 | 1.60× |
| mp-1862536 | 80 × 80 × 1024 | 6.6 M | ✅ | 10.40 | 1.94 | ✅ | 10.40 | 1.36 | 1.43× |
| mp-1847208 | 1120 × 84 × 84 | 7.9 M | ✅ | 12.49 | 2.46 | ✅ | 12.49 | 1.51 | 1.63× |
| mp-1936557 | 80 × 756 × 216 | 13.1 M | ✅ | 20.44 | 3.73 | ✅ | 20.44 | 2.64 | 1.41× |
| mp-1850168 | 972 × 240 × 128 | 29.9 M | ✅ | 46.28 | 8.77 | ✅ | 46.28 | 5.12 | 1.71× |
| mp-1887804 | 320 × 320 × 320 | 32.8 M | ✅ | 50.59 | 169.91 | ✅ | 50.59 | 109.60 | 1.55× |
| mp-1889246 | 540 × 144 × 432 | 33.6 M | ✅ | 51.95 | 208.94 | ✅ | 51.95 | 135.30 | 1.54× |
| mp-1871122 | 360 × 360 × 360 | 46.7 M | ❌ OOM | — | — | ✅ | 71.90 | 188.83 | — |
Summary
| Precision | A100 ✅ | A100 ❌ OOM | H200 ✅ | H200 ❌ OOM |
| --- | --- | --- | --- | --- |
| f32 | 6 / 10 | 4 / 10 | 10 / 10 | 0 / 10 |
| bf16-mixed | 9 / 10 | 1 / 10 | 10 / 10 | 0 / 10 |
Key findings
- H200 handles all 10 task IDs under both precisions — all A100 OOMs are resolved by the larger VRAM (139.8 GB).
- H200 is consistently 1.4–1.7× faster per epoch across all grid sizes and precisions.
Files
`scripts/benchmark_gpus.py` -- Reads two JSON result files (one per GPU) produced by
The `status`, `peak_gb`, and `epoch_s` helpers don't close over any loop variable: they're pure functions that get re-created on every iteration of the outer `for prec` and inner `for tid` loops. Move them outside the loops (e.g. as module-level helpers, or before the `lines = []` block).
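A sketch of the suggested refactor. The helper bodies below are illustrative guesses at what the script's cell-formatting helpers do (the record field names are assumptions); the point is only their placement at module level, outside the per-precision / per-task loops:

```python
# Module-level helpers: pure functions of one result record, so there is no
# need to re-create them inside the `for prec` / `for tid` loops.
def status(rec):
    """Render a result record's OOM flag as a table cell (assumed field names)."""
    return "❌ OOM" if rec.get("oom") else "✅"

def peak_gb(rec):
    """Peak GPU memory cell; em-dash placeholder when the run OOMed."""
    return "—" if rec.get("oom") else f"{rec['peak_gb']:.2f}"

def epoch_s(rec):
    """Per-epoch time cell; em-dash placeholder when the run OOMed."""
    return "—" if rec.get("oom") else f"{rec['epoch_s']:.2f}"

print(status({"oom": True}), peak_gb({"peak_gb": 10.5}), epoch_s({"epoch_s": 1.65}))
```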
The label column uses the hardcoded strings `"GPU 1 (reference)"` / `"GPU 2"`, while the per-precision tables use `gpu1_name` / `gpu2_name`. Consider using the actual GPU names here as well for consistency, e.g. `f"| {gpu1_name} (reference) | ... |"`.
Nit: ratio uses lowercase `x` instead of `×`

`scripts/benchmark_gpus.py:34`: `return f"{a/b:.2f}x"`

The PR description uses the proper multiplication sign `×`. Using `×` here would make the markdown output match. Minor, but worth keeping consistent.
Suggestion: No summary table in output
The PR description includes a useful summary table (OOM counts per GPU/precision). The script doesn't emit one — consider adding it after the per-precision tables so the generated .md file is self-contained.
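One way the script could emit such a summary table after the per-precision tables (a sketch only; the `results` layout, with boolean `gpu1_oom` / `gpu2_oom` fields per record, is an assumption about the script's data, not its actual structure):

```python
def summary_table(results, gpu1_name, gpu2_name):
    """Build a markdown OOM-count summary, one row per precision.

    `results` maps precision -> list of per-task records carrying boolean
    "gpu1_oom" / "gpu2_oom" fields (assumed layout for illustration).
    """
    lines = [
        f"| Precision | {gpu1_name} ✅ | {gpu1_name} ❌ OOM | {gpu2_name} ✅ | {gpu2_name} ❌ OOM |",
        "| --- | --- | --- | --- | --- |",
    ]
    for prec, recs in results.items():
        n = len(recs)
        oom1 = sum(r["gpu1_oom"] for r in recs)
        oom2 = sum(r["gpu2_oom"] for r in recs)
        lines.append(
            f"| {prec} | {n - oom1} / {n} | {oom1} / {n} | {n - oom2} / {n} | {oom2} / {n} |"
        )
    return "\n".join(lines)
```

Appending this to the generated `.md` file would make it self-contained, matching the PR description.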
Summary: One real bug (stdout never written), one style issue (nested helper functions), and two minor consistency points. The core logic — key-based lookup, ratio computation, grid shape display — looks correct.