Use torch.accelerator API in Imagenet example by eromomon · Pull Request #13 · dvrogozh/examples

eromomon · 2025-05-12T23:27:08Z

Refactor the Imagenet example to use the torch.accelerator API. This API abstracts some accelerator-specific details out of user scripts, which makes the code more adaptable to different hardware accelerators.
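As a rough illustration of what this abstraction can look like in a script, the sketch below picks a device type through torch.accelerator and falls back to CPU. The helper name `detect_device_type` is hypothetical and not taken from this PR. The guarded import is only there so the snippet also runs where PyTorch is missing or too old to ship `torch.accelerator`.

```python
try:
    import torch
except ImportError:  # assumption: let the sketch run even without PyTorch installed
    torch = None


def detect_device_type() -> str:
    """Return the accelerator type (e.g. "cuda", "xpu", "mps"), or "cpu" if none is usable."""
    # getattr() covers both "torch is missing" and "torch predates torch.accelerator".
    accel = getattr(torch, "accelerator", None)
    if accel is not None and accel.is_available():
        # current_accelerator() returns a torch.device such as device(type='cuda').
        return accel.current_accelerator().type
    return "cpu"


device_type = detect_device_type()
print(f"Device to use: {device_type}")
```

With a real model, the resulting type could then be passed to `torch.device(...)` and `model.to(...)` without branching on a specific backend.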

* Add support for Intel GPU to MNIST example
* Add support for Intel GPU to MNIST Forward-Forward example
* Add support for Intel GPU to MNIST using RNN example and update README with optional arguments
* Refactor argument parsing in MNIST examples. There is no need to use `default=False` with `store_true`

Signed-off-by: jafraustro <jaime.fraustro.valdez@intel.com>

* Add support for Intel GPU to Basic VAE example and update README with optional arguments
* Remove `default=False` from `store_true` arguments
* Fix typo in Readme

Signed-off-by: jafraustro <jaime.fraustro.valdez@intel.com>
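The remark about `default=False` being unnecessary with `store_true` can be checked with a small standalone parser. This is a minimal sketch; the flag names `--accel` and `--dry-run` are chosen for illustration and are not claimed to match the examples' exact arguments.

```python
import argparse

parser = argparse.ArgumentParser(description="store_true default demo")
# No explicit default: argparse already sets a store_true flag to False when absent.
parser.add_argument("--accel", action="store_true", help="use an accelerator if available")
# Redundant spelling of the same behaviour, shown only for comparison.
parser.add_argument("--dry-run", action="store_true", default=False, help="run a single pass")

args_off = parser.parse_args([])
args_on = parser.parse_args(["--accel"])

print(args_off.accel, args_off.dry_run)  # False False
print(args_on.accel)                     # True
```

Both flags behave identically when omitted, so dropping `default=False` does not change behaviour.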

dvrogozh · 2025-05-12T23:44:46Z

-If running on CUDA, you should always use the NCCL backend for multi-processing distributed training since it currently provides the best distributed training performance.
-
-For XPU multiprocessing is not supported as of PyTorch 2.6.
+You should always use the NCCL backend for multi-processing distributed training since it currently provides the best distributed training performance.


Revert (we previously adjusted this for XPU).

dvrogozh · 2025-05-12T23:48:36Z

-        device = torch.device("mps")
-        model = model.to(device)
+
+    elif args.gpu is not None and device.type=='cuda':


I don't think either of these two `if` paths needs to be CUDA-specific. You can make this generic.

Solved, else block sets the model to the generic device

dvrogozh · 2025-05-12T23:49:06Z

-    else:
-        device = torch.device("cpu")
-
-    print (f"Device to use: ", {device.type})


Please preserve the printout of the detected device type.

This is printed in the main function; line 116 prints the device to use.

dvrogozh · 2025-05-12T23:52:13Z

-                torch.cuda.set_device(args.gpu)
-                model.cuda(args.gpu)
+                torch.accelerator.set_device_index(args.gpu)
+                model.to(device)


That's not equivalent to the previous code. It should be model.to(args.gpu) if that works; if it does not, you need to query the current device from torch.accelerator.

To be honest, I suggest reverting this spot and using CUDA-specific calls. That's acceptable considering that all of this is protected by if device.type == 'cuda' on line 174. If you want to convert to the new API, then we probably need to introduce XCCL support. We can do that, but I think it is better to defer it to another PR.

Reverted to code with cuda

dvrogozh · 2025-05-12T23:57:58Z

            if args.gpu is None:
                checkpoint = torch.load(args.resume)
-            elif torch.cuda.is_available():
+            elif device.type=='cuda':


And if the device type is not cuda, don't load at all :). This does not make sense. I think you can generalize this:

else: loc = f'{device.type}:{args.gpu}'

I believe this should work for XPU and other devices as well.
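The generalization suggested here could be sketched as a small helper that builds the `map_location` value for any device type. The function name `checkpoint_map_location` is hypothetical and not part of the PR; it only mirrors the `args.gpu is None` branch and the device-type-plus-index string from the comment above.

```python
from typing import Optional


def checkpoint_map_location(device_type: str, gpu: Optional[int]) -> Optional[str]:
    """Build a map_location string such as "cuda:0" or "xpu:1".

    Returns None when no GPU index was requested, which leaves
    torch.load() to use its default placement.
    """
    if gpu is None:
        return None
    return f"{device_type}:{gpu}"


print(checkpoint_map_location("cuda", 0))    # cuda:0
print(checkpoint_map_location("xpu", 1))     # xpu:1
print(checkpoint_map_location("cpu", None))  # None

# Intended use in the script (not executed here):
#   loc = checkpoint_map_location(device.type, args.gpu)
#   checkpoint = torch.load(args.resume, map_location=loc)
```

Because the device type comes from the detected device rather than a hard-coded "cuda", the same branch would cover XPU and other backends.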

eromomon · 2025-05-19T22:32:00Z

@dvrogozh, could you please help review the latest changes?

dvrogozh

I suggest you open a PR directly against the upstream examples. This looks good enough.

dvrogozh · 2025-05-19T22:44:32Z

@@ -1,2 +1,2 @@
 torch
-torchvision==0.20.0
+torchvision


torch>=2.6 on the line above.

Signed-off-by: eromomon <edgar.romo.montiel@intel.com>

eromomon · 2025-07-03T22:43:06Z

Solved in PR pytorch#1349

jafraustro and others added 5 commits February 21, 2025 09:42

Add support for Intel GPU to Basic VAE example

4a2e3e3

* Add support for Intel GPU to Basic VAE example and update README with optional arguments
* Remove `default=False` from `store_true` arguments
* Fix typo in Readme

Add support for Intel GPU to Siamese Network example

8212991

Add support for Intel GPU to Fast Neural Style example

dcaff04

Add support for Intel GPU to GAT example

78c48ab

Signed-off-by: jafraustro <jaime.fraustro.valdez@intel.com>

dvrogozh requested changes May 12, 2025

eromomon requested a review from dvrogozh May 13, 2025 23:48

dvrogozh reviewed May 19, 2025

eromomon force-pushed the eromomon/accel-imagenet branch from 9d86fba to 8bab510 Compare May 19, 2025 23:26

Add Accelerator Api to Imagenet Example

27a4fd9

Signed-off-by: eromomon <edgar.romo.montiel@intel.com>

eromomon force-pushed the eromomon/accel-imagenet branch from 8bab510 to 27a4fd9 Compare May 19, 2025 23:28

dvrogozh force-pushed the main branch from 35c0da0 to 6f61614 Compare June 24, 2025 18:54

eromomon closed this Jul 3, 2025

3 participants