
Benchmark A100 vs H200 GPUs#120

Open

hanaol wants to merge 2 commits into hanaol/mixed-precision from hanaol/benchmark-a100-vs-h200

Conversation

hanaol (Collaborator) commented Apr 13, 2026

Summary

This adds a GPU comparison benchmark script that runs the same per-sample training experiment on both an A100 and an H200, recording peak GPU memory, forward/backward times, and OOM status for 10 large-grid Materials Project samples under f32 and bf16-mixed precision.

The 10 task IDs are Materials Project entries with relatively large charge-density grids, spanning 3.4 M – 46.7 M voxels across a variety of shapes and aspect ratios.

Benchmark results

Model: resunet.ResUNet3D, n_channels=32, n_residual_blocks=1, kernel_size=5, depth=2, batch_size=1, single GPU, 3 epochs per experiment.

f32

| Task ID | Grid shape | Voxels | A100 status | A100 peak (GB) | A100 epoch (s) | H200 status | H200 peak (GB) | H200 epoch (s) | Speedup (H200/A100) |
|---|---|---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| mp-1890579 | 56 × 56 × 1080 | 3.4 M | ✅ | 10.50 | 1.65 | ✅ | 10.80 | 1.13 | 1.46× |
| mp-1849767 | 60 × 60 × 1120 | 4.0 M | ✅ | 12.42 | 1.95 | ✅ | 12.80 | 1.28 | 1.52× |
| mp-1851604 | 60 × 60 × 1120 | 4.0 M | ✅ | 12.42 | 1.93 | ✅ | 12.80 | 1.29 | 1.50× |
| mp-1862536 | 80 × 80 × 1024 | 6.6 M | ✅ | 19.89 | 3.17 | ✅ | 20.56 | 2.19 | 1.45× |
| mp-1847208 | 1120 × 84 × 84 | 7.9 M | ✅ | 23.89 | 3.87 | ✅ | 24.73 | 2.46 | 1.57× |
| mp-1936557 | 80 × 756 × 216 | 13.1 M | ✅ | 39.09 | 6.47 | ✅ | 40.53 | 3.94 | 1.64× |
| mp-1850168 | 972 × 240 × 128 | 29.9 M | ❌ OOM | — | — | ✅ | 91.97 | 8.67 | — |
| mp-1887804 | 320 × 320 × 320 | 32.8 M | ❌ OOM | — | — | ✅ | 100.55 | 111.10 | — |
| mp-1889246 | 540 × 144 × 432 | 33.6 M | ❌ OOM | — | — | ✅ | 103.25 | 136.31 | — |
| mp-1871122 | 360 × 360 × 360 | 46.7 M | ❌ OOM | — | — | ✅ | 131.65 | 303.15 | — |

bf16-mixed

| Task ID | Grid shape | Voxels | A100 status | A100 peak (GB) | A100 epoch (s) | H200 status | H200 peak (GB) | H200 epoch (s) | Speedup (H200/A100) |
|---|---|---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| mp-1890579 | 56 × 56 × 1080 | 3.4 M | ✅ | 5.50 | 1.07 | ✅ | 5.50 | 0.70 | 1.53× |
| mp-1849767 | 60 × 60 × 1120 | 4.0 M | ✅ | 6.50 | 1.27 | ✅ | 6.50 | 0.78 | 1.63× |
| mp-1851604 | 60 × 60 × 1120 | 4.0 M | ✅ | 6.50 | 1.25 | ✅ | 6.50 | 0.78 | 1.60× |
| mp-1862536 | 80 × 80 × 1024 | 6.6 M | ✅ | 10.40 | 1.94 | ✅ | 10.40 | 1.36 | 1.43× |
| mp-1847208 | 1120 × 84 × 84 | 7.9 M | ✅ | 12.49 | 2.46 | ✅ | 12.49 | 1.51 | 1.63× |
| mp-1936557 | 80 × 756 × 216 | 13.1 M | ✅ | 20.44 | 3.73 | ✅ | 20.44 | 2.64 | 1.41× |
| mp-1850168 | 972 × 240 × 128 | 29.9 M | ✅ | 46.28 | 8.77 | ✅ | 46.28 | 5.12 | 1.71× |
| mp-1887804 | 320 × 320 × 320 | 32.8 M | ✅ | 50.59 | 169.91 | ✅ | 50.59 | 109.60 | 1.55× |
| mp-1889246 | 540 × 144 × 432 | 33.6 M | ✅ | 51.95 | 208.94 | ✅ | 51.95 | 135.30 | 1.54× |
| mp-1871122 | 360 × 360 × 360 | 46.7 M | ❌ OOM | — | — | ✅ | 71.90 | 188.83 | — |

Summary

| Precision | A100 ✅ | A100 ❌ OOM | H200 ✅ | H200 ❌ OOM |
|---|:---:|:---:|:---:|:---:|
| f32 | 6 / 10 | 4 / 10 | 10 / 10 | 0 / 10 |
| bf16-mixed | 9 / 10 | 1 / 10 | 10 / 10 | 0 / 10 |

Key findings

  • H200 handles all 10 task IDs under both precisions — all A100 OOMs are resolved by the larger VRAM (139.8 GB).
  • H200 is consistently 1.4–1.7× faster per epoch across all grid sizes and precisions.

Files

  • scripts/benchmark_gpus.py -- Reads two JSON result files (one per GPU) produced by
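A minimal sketch of how such a script might index the two JSON result files by (task ID, precision) for pairwise lookup. The field names (`task_id`, `precision`) are assumptions about the result schema, not the script's actual keys:

```python
import json
from pathlib import Path


def load_index(path: Path) -> dict:
    """Index one GPU's results by (task_id, precision) for pairwise lookup.

    Assumes the file holds a JSON list of per-experiment result dicts.
    """
    results = json.loads(path.read_text())
    return {(r["task_id"], r["precision"]): r for r in results}
```

With two such indices, iterating over the union of keys makes it easy to render a row even when one GPU has no result (e.g. it OOMed before logging).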

hanaol (Collaborator, Author) commented Apr 13, 2026

@claude review this PR

claude bot commented Apr 13, 2026

Claude finished @hanaol's task in 1m 26s


Code Review: scripts/benchmark_gpus.py

  • Gather context
  • Read changed files
  • Review code and provide feedback

Overall this is a clean, well-structured comparison script. A few issues to flag:


Bug: when --out is omitted, the output is silently discarded

scripts/benchmark_gpus.py:135-140

```python
output = "\n".join(lines)

if args.out:
    args.out.parent.mkdir(parents=True, exist_ok=True)
    args.out.write_text(output)
# no else: print(output)
```

The module docstring says the default is stdout, but when --out is not passed the output is computed and thrown away — nothing is printed.


Minor: Helper functions redefined on every loop iteration

scripts/benchmark_gpus.py:104-117

```python
for tid in all_task_ids:
    key = (tid, prec)
    r1 = idx1.get(key)
    r2 = idx2.get(key)

    def status(r): ...
    def peak_gb(r): ...
    def epoch_s(r): ...
```

status, peak_gb, and epoch_s don't close over any loop variable — they're pure helpers that get re-created on every iteration of the outer for prec and inner for tid loops. Move them outside the loops (e.g. as module-level helpers or before the lines = [] block).
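A sketch of the helpers hoisted to module level. The result-dict keys ("oom", "peak_mem_gb", "epoch_time_s") are assumptions about the schema, not the script's actual field names:

```python
# Module-level helpers: defined once, used by every row of every table.
def status(r):
    """Render a result dict as a status cell; None means no result recorded."""
    if r is None:
        return "—"
    return "❌ OOM" if r.get("oom") else "✅"


def peak_gb(r):
    """Peak GPU memory cell, blank when the run OOMed or is missing."""
    return f"{r['peak_mem_gb']:.2f}" if r and not r.get("oom") else "—"


def epoch_s(r):
    """Per-epoch time cell, blank when the run OOMed or is missing."""
    return f"{r['epoch_time_s']:.2f}" if r and not r.get("oom") else "—"
```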


Minor: Metadata table uses generic GPU labels

scripts/benchmark_gpus.py:78-81

```python
a("| GPU | Model | VRAM |")
a("|-----|-------|:----:|")
a(f"| GPU 1 (reference) | {gpu1_name} | {gpu1_info['total_mem_gb']} GB |")
a(f"| GPU 2 | {gpu2_name} | {gpu2_info['total_mem_gb']} GB |")
```

The label column uses the hardcoded strings "GPU 1 (reference)" / "GPU 2" while the per-precision tables use gpu1_name / gpu2_name. Consider using the actual GPU names here as well for consistency (e.g. f"| {gpu1_name} (reference) | ... |").


Nit: Ratio uses lowercase x instead of ×

scripts/benchmark_gpus.py:34

```python
return f"{a / b:.2f}x"
```

The PR description uses the proper multiplication sign ×. Using × here would make the markdown output match. Minor, but worth keeping consistent.
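A sketch of the ratio helper with the suggested sign, plus guards for the cases where a ratio can't be computed (the `ratio` name and the None/zero handling are illustrative assumptions, not the script's exact code):

```python
def ratio(a, b):
    """Format a/b as a speedup cell, e.g. '1.46×'; blank when undefined."""
    if a is None or b is None or b == 0:
        return "—"
    return f"{a / b:.2f}×"
```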


Suggestion: No summary table in output

The PR description includes a useful summary table (OOM counts per GPU/precision). The script doesn't emit one — consider adding it after the per-precision tables so the generated .md file is self-contained.
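One way to sketch such a summary table; the input shape (a dict mapping precision to the two GPUs' result lists) and the "oom" key are assumptions for illustration, not the script's actual data layout:

```python
def summary_table(results_by_precision):
    """Emit a markdown table of ✅ / ❌ OOM counts per GPU and precision.

    `results_by_precision` maps precision -> (gpu1_results, gpu2_results),
    where each side is a list of result dicts with a boolean "oom" field.
    """
    lines = [
        "| Precision | GPU 1 ✅ | GPU 1 ❌ OOM | GPU 2 ✅ | GPU 2 ❌ OOM |",
        "|---|:---:|:---:|:---:|:---:|",
    ]
    for prec, (rs1, rs2) in results_by_precision.items():
        n = len(rs1)
        oom1 = sum(1 for r in rs1 if r.get("oom"))
        oom2 = sum(1 for r in rs2 if r.get("oom"))
        lines.append(
            f"| {prec} | {n - oom1} / {n} | {oom1} / {n} | "
            f"{n - oom2} / {n} | {oom2} / {n} |"
        )
    return "\n".join(lines)
```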


Summary: One real bug (stdout never written), one style issue (nested helper functions), and two minor consistency points. The core logic — key-based lookup, ratio computation, grid shape display — looks correct.
