Feat: make TF32 and cuDNN benchmarking opt-in #18

Merged: carshadi merged 1 commit into main from feat-change-cuda-defaults on Jun 10, 2026

@carshadi (Member) commented May 28, 2026

This PR updates CUDA backend configuration so performance-oriented PyTorch backend flags are controlled explicitly by InferenceConfig and default to False.

Previously, GpuWorker always enabled torch.backends.cudnn.benchmark, and use_tf32=True also forced both matmul TF32 and cuDNN TF32 on. This made backend behavior less explicit and could override PyTorch/session defaults. The new behavior sets only the intended backend options directly from config:

torch.backends.cuda.matmul.allow_tf32 = self.cfg.use_tf32
torch.backends.cudnn.benchmark = self.cfg.cudnn_benchmark
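
As a rough illustration of the two assignments above, the sketch below applies a config to a backends object and leaves `cudnn.allow_tf32` untouched. The `InferenceConfig` here is a reduced stand-in with only the two fields named in this PR, and `apply_backend_flags` and `make_fake_backends` are hypothetical helpers, not code from the repository. A stand-in namespace replaces `torch.backends` so the example runs without a GPU or a PyTorch install.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class InferenceConfig:
    """Reduced stand-in: only the two fields discussed in this PR."""
    use_tf32: bool = False
    cudnn_benchmark: bool = False


def make_fake_backends() -> SimpleNamespace:
    """Hypothetical stand-in for torch.backends.

    cudnn.allow_tf32 starts as True to mirror the PyTorch default
    mentioned in the PR description.
    """
    return SimpleNamespace(
        cuda=SimpleNamespace(matmul=SimpleNamespace(allow_tf32=False)),
        cudnn=SimpleNamespace(benchmark=False, allow_tf32=True),
    )


def apply_backend_flags(cfg: InferenceConfig, backends) -> None:
    """Set only the two flags the config controls (hypothetical helper)."""
    backends.cuda.matmul.allow_tf32 = cfg.use_tf32
    backends.cudnn.benchmark = cfg.cudnn_benchmark
    # backends.cudnn.allow_tf32 is deliberately not modified.


backends = make_fake_backends()
apply_backend_flags(InferenceConfig(use_tf32=True), backends)
print(backends.cuda.matmul.allow_tf32, backends.cudnn.benchmark)
```

With PyTorch installed, the same helper could be called as `apply_backend_flags(cfg, torch.backends)`. A user who also wants cuDNN TF32 off would set `torch.backends.cudnn.allow_tf32 = False` in their own inference script.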

Changes

  • Changed InferenceConfig.use_tf32 default from True to False
  • Added InferenceConfig.cudnn_benchmark, defaulting to False
  • Stopped modifying torch.backends.cudnn.allow_tf32, which defaults to True in PyTorch. This can be set to False by the user in their inference script if desired.
  • Replaced CLI opt-out flag --no-tf32 with opt-in --tf32
  • Added opt-in CLI flag --cudnn-benchmark
  • Updated CLI docs to reflect the new flags
  • Added config tests covering the new defaults and override behavior
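
The opt-in flags listed above could be wired up along these lines. This is a minimal sketch that assumes an `argparse`-based CLI; the actual parser in the repository may be built differently, and `build_parser` is a hypothetical name. Only the flag names `--tf32` and `--cudnn-benchmark` come from this PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing the two opt-in flags from this PR."""
    parser = argparse.ArgumentParser(description="Inference CLI (sketch)")
    # store_true makes each flag default to False, so both are opt-in.
    parser.add_argument(
        "--tf32",
        action="store_true",
        help="Enable TF32 matmul (Ampere and newer GPUs).",
    )
    parser.add_argument(
        "--cudnn-benchmark",
        action="store_true",
        help="Enable the cuDNN auto-tuner.",
    )
    return parser


# No flags: both options stay off.
defaults = build_parser().parse_args([])
print(defaults.tf32, defaults.cudnn_benchmark)

# Opting in to both.
opted_in = build_parser().parse_args(["--tf32", "--cudnn-benchmark"])
print(opted_in.tf32, opted_in.cudnn_benchmark)
```

`argparse` maps `--cudnn-benchmark` to the attribute `cudnn_benchmark`, which matches the config field name, so the parsed values could be passed to `InferenceConfig` as `use_tf32=args.tf32` and `cudnn_benchmark=args.cudnn_benchmark`.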

Rationale

TF32 matmul and cuDNN benchmarking can improve throughput in certain cases, but the benefit depends on the hardware and the data size. TF32 is only supported on Ampere and newer GPUs, whereas the default CodeOcean flex instance uses an older T4 (Turing). cuDNN benchmarking can also hurt performance for smaller volumes, where the auto-tuner overhead outweighs any gain.

  • TF32 only helps on Ampere and newer GPUs
  • cudnn.benchmark=True can slow down smaller volumes while the auto-tuner runs
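
One way a user might decide whether passing `--tf32` is worthwhile is to check the GPU's compute capability: Ampere starts at 8.0, while the T4 (Turing) reports 7.5. The sketch below is not part of this PR, and both function names are hypothetical. The PyTorch import is deferred so the pure check runs without PyTorch installed.

```python
def supports_tf32(capability: tuple[int, int]) -> bool:
    """Return True for compute capability 8.0 (Ampere) or newer."""
    major, _minor = capability
    return major >= 8


def current_gpu_supports_tf32() -> bool:
    """Check the active CUDA device (hypothetical helper; needs PyTorch)."""
    import torch  # deferred so the rest of the sketch runs without PyTorch

    if not torch.cuda.is_available():
        return False
    return supports_tf32(torch.cuda.get_device_capability())


print(supports_tf32((7, 5)))  # T4 (Turing)
print(supports_tf32((8, 0)))  # A100 (Ampere)
```

The benefit of `cudnn.benchmark` has no comparable static check, since it depends on input shapes and run length; timing a representative volume with and without `--cudnn-benchmark` is the more reliable test.
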
@carshadi carshadi marked this pull request as ready for review June 10, 2026 12:02
@carshadi carshadi requested a review from camilolaiton June 10, 2026 12:02

@camilolaiton (Collaborator) left a comment

Looks good to me! Super useful! Probably not here in this repo, but we should discuss quantization of our models (yours, Anna's, mine, etc.) and whether it requires any changes here (which I doubt).

@carshadi (Member, Author)

> Looks good to me! Super useful! Probably not here in this repo, but we should discuss quantization of our models (yours, Anna's, mine, etc.) and whether it requires any changes here (which I doubt).

Yeah, I totally agree re: quantization.

@carshadi carshadi merged commit 569edeb into main Jun 10, 2026
3 checks passed
@carshadi carshadi deleted the feat-change-cuda-defaults branch June 10, 2026 18:02