Feat: make TF32 and cuDNN benchmarking opt-in by carshadi · Pull Request #18 · AllenNeuralDynamics/aind-torch-utils

carshadi · 2026-05-28T22:13:28Z

This PR updates the CUDA backend configuration so that performance-oriented PyTorch backend flags are controlled explicitly by InferenceConfig and default to False.

Previously, GpuWorker always enabled torch.backends.cudnn.benchmark, and use_tf32=True also forced both matmul TF32 and cuDNN TF32 on. This made backend behavior less explicit and could override PyTorch/session defaults. The new behavior sets only the intended backend options directly from config:

torch.backends.cuda.matmul.allow_tf32 = self.cfg.use_tf32
torch.backends.cudnn.benchmark = self.cfg.cudnn_benchmark
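The two lines above can be sketched as a standalone helper. This is a minimal illustration, not the code in GpuWorker: `InferenceConfig` here is a hypothetical stand-in showing only the two fields this PR touches, and `apply_backend_flags` is a hypothetical name. The `backends` parameter defaults to `torch.backends` but can be swapped for a fake object, so the logic can be checked without a GPU.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class InferenceConfig:
    """Hypothetical stand-in for the real InferenceConfig (only the two
    fields changed by this PR are shown)."""
    use_tf32: bool = False         # opt-in: TF32 for CUDA matmuls
    cudnn_benchmark: bool = False  # opt-in: cuDNN auto-tuner


def apply_backend_flags(cfg: InferenceConfig, backends: Optional[Any] = None) -> None:
    """Set only the backend flags that the config controls.

    `backends` defaults to `torch.backends`; it is injectable so the
    behavior can be exercised without CUDA (hypothetical helper).
    """
    if backends is None:
        import torch  # lazy import: the config itself needs no torch
        backends = torch.backends
    backends.cuda.matmul.allow_tf32 = cfg.use_tf32
    backends.cudnn.benchmark = cfg.cudnn_benchmark
    # backends.cudnn.allow_tf32 is deliberately left untouched, so it keeps
    # PyTorch's default (True) unless the user changes it in their script.
```

With the defaults, both flags end up False; setting `use_tf32=True` enables TF32 only for matmuls, not for cuDNN convolutions.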

Changes

- Changed InferenceConfig.use_tf32 default from True to False
- Added InferenceConfig.cudnn_benchmark, defaulting to False
- Stopped modifying torch.backends.cudnn.allow_tf32, which defaults to True in PyTorch. Users who want it disabled can set it to False in their inference script.
- Replaced CLI opt-out flag --no-tf32 with opt-in --tf32
- Added opt-in CLI flag --cudnn-benchmark
- Updated CLI docs to reflect the new flags
- Added config tests covering the new defaults and override behavior
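The CLI's actual argument parser isn't shown in this PR, so the following is a sketch assuming a plain `argparse` front end; `build_parser`, `config_from_args`, and the stand-in `InferenceConfig` are hypothetical names. It shows the opt-in semantics: both flags use `store_true`, so omitting them leaves both settings at False.

```python
import argparse
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InferenceConfig:
    # Hypothetical stand-in: only the two fields changed by this PR.
    use_tf32: bool = False
    cudnn_benchmark: bool = False


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="inference (sketch)")
    # Opt-in flags: absent -> False, present -> True.
    parser.add_argument(
        "--tf32", action="store_true",
        help="Enable TF32 for CUDA matmuls (Ampere or newer GPUs).",
    )
    parser.add_argument(
        "--cudnn-benchmark", action="store_true",
        help="Enable the cuDNN auto-tuner (torch.backends.cudnn.benchmark).",
    )
    return parser


def config_from_args(argv: Optional[List[str]] = None) -> InferenceConfig:
    args = build_parser().parse_args(argv)
    # argparse maps "--cudnn-benchmark" to the attribute cudnn_benchmark.
    return InferenceConfig(use_tf32=args.tf32, cudnn_benchmark=args.cudnn_benchmark)
```

Because `torch.backends.cudnn.allow_tf32` is no longer touched and stays at PyTorch's default of True, a user who wants cuDNN convolutions in full FP32 would add `torch.backends.cudnn.allow_tf32 = False` to their own inference script.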

Rationale

TF32 matmul and cuDNN benchmarking can improve throughput in certain cases, but the benefit depends on the hardware and the data size. TF32 is only supported on Ampere and newer GPUs, whereas the default CodeOcean flex instance uses an older T4 (Turing). cuDNN benchmarking can also hurt performance on smaller volumes, where the auto-tuner overhead outweighs any speedup.

- TF32 is only useful on newer hardware
- cudnn.benchmark=True can cause a performance hit for smaller volumes while the auto-tuner runs
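Since TF32 tensor cores arrived with Ampere (compute capability 8.0) and a T4 reports 7.5, a script could check the device before opting in with `--tf32`. The helper names below are hypothetical and not part of aind-torch-utils; `torch.cuda.is_available` and `torch.cuda.get_device_capability` are standard PyTorch calls.

```python
from typing import Optional, Tuple

# TF32 tensor cores were introduced with Ampere (compute capability 8.0).
AMPERE_CAPABILITY: Tuple[int, int] = (8, 0)


def supports_tf32(capability: Tuple[int, int]) -> bool:
    """True if a GPU with this (major, minor) compute capability has TF32."""
    return tuple(capability) >= AMPERE_CAPABILITY


def current_device_supports_tf32(device: Optional[int] = None) -> bool:
    """Check the active (or given) CUDA device; False when CUDA is absent."""
    import torch  # lazy import: supports_tf32 above is plain Python
    if not torch.cuda.is_available():
        return False
    return supports_tf32(torch.cuda.get_device_capability(device))
```

On a T4, `current_device_supports_tf32()` would return False, which matches the reasoning for leaving TF32 off by default.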

camilolaiton

Looks good to me! Super useful! Probably not here in this repo, but we should discuss quantization of our models (yours, Anna's, mine, etc.) and whether it requires any changes here (which I doubt).

carshadi · 2026-06-10T18:02:11Z

> Looks good to me! Super useful! Probably not here in this repo, but we should discuss quantization of our models (yours, Anna's, mine, etc.) and whether it requires any changes here (which I doubt).

Yeah, I totally agree re: quantization.

Feat: change TF32 and cudnn benchmark to False by default

d609f15

carshadi marked this pull request as ready for review June 10, 2026 12:02

carshadi requested a review from camilolaiton June 10, 2026 12:02

camilolaiton approved these changes Jun 10, 2026

carshadi merged commit 569edeb into main Jun 10, 2026
3 checks passed

carshadi deleted the feat-change-cuda-defaults branch June 10, 2026 18:02

carshadi merged 1 commit into main from feat-change-cuda-defaults
