For both CPU and CUDA.

See https://github.com/microsoft/onnxruntime-training-examples/pull/190#issuecomment-2197924047 and https://github.com/microsoft/onnxruntime-training-examples/issues/189