Feature: batchnorm gpu test by harz05 · Pull Request #14 · ML4EP/SOFIE

harz05 · 2026-03-20T23:23:12Z

Closes #4

This implements the 3 GPU/Alpaka methods for ROperator_BatchNormalization and adds a test in the Alpaka CUDA test suite.

`ROperator_BatchNormalization.hxx`

Generate_GPU_Kernel_ALPAKA : elementwise kernel struct. Initialize() already pre-expands per-channel scale/bias/mean tensors to the full [N,C,H,W] shape and folds 1/sqrt(var + eps) into scale, so the per-element formula reduces to y[i] = (x[i] - mean[i]) * scale[i] + bias[i]. The kernel follows the same pattern as the other operators.
Generate_GPU_Kernel_Definitions_ALPAKA : declares the kernel instance used at launch.
Generate_GPU_ALPAKA : sets up work division via alpaka::getValidWorkDiv and dispatches the kernel with alpaka::exec. Uses the deviceBuf_<name> buffers for all weight tensors, consistent with how RModel_ALPAKA.cxx allocates them.
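
For readers unfamiliar with the folding step, here is a minimal CPU-side sketch in plain Python of what the pre-expansion and the per-element formula amount to. It is not the SOFIE code: the helper names (`expand_per_channel`, `fold_batchnorm`, `batchnorm_elementwise`) are hypothetical, the tensor is treated as a flat list in NCHW order, and `eps = 0.0` is chosen only so that the printed numbers come out round.

```python
import math

def expand_per_channel(values, n, c, h, w):
    """Broadcast a length-C list to a flat list of length N*C*H*W (NCHW order)."""
    assert len(values) == c
    return [values[ch] for _ in range(n) for ch in range(c) for _ in range(h * w)]

def fold_batchnorm(scale, bias, mean, var, eps, shape):
    """Fold 1/sqrt(var + eps) into scale, then expand all three tensors to full shape."""
    n, c, h, w = shape
    folded = [s / math.sqrt(v + eps) for s, v in zip(scale, var)]
    return (expand_per_channel(folded, n, c, h, w),
            expand_per_channel(bias, n, c, h, w),
            expand_per_channel(mean, n, c, h, w))

def batchnorm_elementwise(x, scale, bias, mean):
    """Per-element formula: y[i] = (x[i] - mean[i]) * scale[i] + bias[i]."""
    return [(xi - m) * s + b for xi, s, b, m in zip(x, scale, bias, mean)]

# Illustrative values only: 2 channels, 2 elements per channel, eps = 0.0.
shape = (1, 2, 1, 2)
scale, bias, mean = fold_batchnorm([1.0, 2.0], [0.0, 0.5], [0.5, 3.0],
                                   [1.0, 4.0], 0.0, shape)
x = [0.5, 1.5, 3.0, 5.0]
y = batchnorm_elementwise(x, scale, bias, mean)
print(y)  # [0.0, 1.0, 0.5, 2.5]
```

Because every operand already has the full output shape, the per-element step needs no channel index arithmetic, which is what lets the GPU kernel be a plain elementwise loop.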

Files for testing:

BatchNormModelGenerator.py : creates BatchNorm.onnx from nn.BatchNorm2d(2) in eval mode with fixed weights. Input shape (1, 2, 2, 2), scale [1, 2], bias [0, 0.5], running mean [0.5, 3], running var [1, 4]. Also prints the reference floats used in the .ref.hxx.
input_models/BatchNorm.onnx : the exported ONNX model (opset 13).
input_models/references/BatchNorm.ref.hxx : 8 reference output values computed by PyTorch.
TestCustomModelsFromONNXForAlpakaCuda.cxx : added SofieAlpakaTest.BatchNorm which loads the model on GPU, runs inference, and checks each output element against the reference within tolerance.
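
As a rough way to sanity-check the reference values without PyTorch, the eval-mode formula can be evaluated directly from the parameters listed above. The sketch below is only an illustration: the input tensor is a hypothetical placeholder (the input values are not listed here), `EPS = 1e-5` is assumed because it is the `nn.BatchNorm2d` default, and the helper names and the tolerance are made up for this example rather than taken from the test suite.

```python
import math

# Parameters from the model description above.
SCALE = [1.0, 2.0]
BIAS = [0.0, 0.5]
MEAN = [0.5, 3.0]
VAR = [1.0, 4.0]
EPS = 1e-5  # assumption: nn.BatchNorm2d default epsilon

def batchnorm2d_eval(x, shape):
    """Eval-mode batch norm on a flat NCHW list using the running statistics."""
    n, c, h, w = shape
    assert len(x) == n * c * h * w
    out = []
    for i, xi in enumerate(x):
        ch = (i // (h * w)) % c  # channel index of flat element i
        out.append((xi - MEAN[ch]) / math.sqrt(VAR[ch] + EPS) * SCALE[ch] + BIAS[ch])
    return out

def all_close(actual, expected, tol=1e-5):
    """Elementwise absolute-tolerance comparison (tolerance is illustrative)."""
    return len(actual) == len(expected) and all(
        abs(a - e) <= tol for a, e in zip(actual, expected))

# Hypothetical input; substitute the input actually used by the generator script.
x = [float(i) for i in range(8)]
y = batchnorm2d_eval(x, (1, 2, 2, 2))
print(len(y), all_close(y, y))
```

Running the same comparison against the 8 values in the .ref.hxx, with the real input, would confirm that the stored floats match the formula.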

Testing:

Tested on Google Colab (NVIDIA T4, CUDA 12.x). The new test passes, and all 26 tests in the suite pass with no regressions.

sanjibansg

Hi @harz05,
Thanks for the PR. A question regarding the onnx file for testing.

sanjibansg · 2026-03-21T13:57:10Z

src/SOFIE_core/test/BatchNormModelGenerator.py

+"""Generate BatchNorm.onnx and print reference output values for BatchNorm.ref.hxx.
+
+Model: nn.BatchNorm2d(2) in eval mode, input shape (1, 2, 2, 2).
+"""
+


How will this file be executed? Also, since you provide the .onnx file anyway, how will this script be useful?

harz05 · Mar 21, 2026

Thank you for reviewing.
This file is standalone; I included it for reproducibility, so that the model and reference values can be regenerated and verified independently. Running `python3 BatchNormModelGenerator.py` produces the .onnx file and prints the reference values.

harz05 added 2 commits March 21, 2026 03:25

feat: BatchNormalization GPU alpaka kernel and test

a03ddd3

fix: pass .dat filename to BatchNorm session constructor

f9c9141

sanjibansg reviewed Mar 21, 2026

harz05 wants to merge 2 commits into ML4EP:gpu/alpaka from harz05:feat/batchnorm-gpu-test
