Use torch.accelerator API in Imagenet example #13

Closed
eromomon wants to merge 6 commits into dvrogozh:main from eromomon:eromomon/accel-imagenet

Conversation

@eromomon

Refactor the Imagenet example to use the torch.accelerator API. This API abstracts some of the accelerator-specific details in user scripts, which makes the code more adaptable to different hardware accelerators.
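
As a rough illustration of the kind of abstraction described above, the sketch below picks a device through torch.accelerator and falls back to CPU. The helper name `pick_device` and the small `Linear` model are hypothetical and are not taken from the PR; the `getattr` guard is an assumption to keep the snippet usable on PyTorch builds older than 2.6, where the module is absent.

```python
import torch


def pick_device() -> torch.device:
    """Return the current accelerator device, or CPU when none is usable."""
    # torch.accelerator is only present in newer PyTorch releases (2.6+),
    # so look it up defensively instead of assuming it exists.
    accel = getattr(torch, "accelerator", None)
    if accel is not None and accel.is_available():
        # current_accelerator() reports the backend type (cuda, xpu, mps, ...).
        return accel.current_accelerator()
    return torch.device("cpu")


device = pick_device()
print(f"Device to use: {device.type}")

# Hypothetical stand-in for the real model: the same .to(device) call works
# regardless of which backend was detected.
model = torch.nn.Linear(4, 2).to(device)
```

The point of this shape is that the rest of the script only refers to `device`, so no branch needs to name a specific backend.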

jafraustro and others added 5 commits February 21, 2025 09:42
* Add support for Intel GPU to MNIST example
* Add support for Intel GPU to MNIST Forward-Forward example
* Add support for Intel GPU to MNIST using RNN example and update README with optional arguments
* Refactor argument parsing in MNIST examples. There is no need to use `default=False` with `store_true`

Signed-off-by: jafraustro <jaime.fraustro.valdez@intel.com>
* Add support for Intel GPU to Basic VAE example and update README with optional arguments
* Remove `default=False` from `store_true` arguments
* Fix typo in Readme
Signed-off-by: jafraustro <jaime.fraustro.valdez@intel.com>
Comment thread imagenet/README.md Outdated
-If running on CUDA, you should always use the NCCL backend for multi-processing distributed training since it currently provides the best distributed training performance.
-
-For XPU multiprocessing is not supported as of PyTorch 2.6.
+You should always use the NCCL backend for multi-processing distributed training since it currently provides the best distributed training performance.
Owner

Revert (we previously adjusted this for XPU).

Author

Solved

Comment thread imagenet/main.py Outdated
device = torch.device("mps")
model = model.to(device)

elif args.gpu is not None and device.type=='cuda':
Owner

Neither of these two if paths needs to be CUDA-specific, I think. You can make this generic.

Author

Solved: the else block now moves the model to the generic device.

Comment thread imagenet/main.py Outdated
else:
device = torch.device("cpu")

print (f"Device to use: ", {device.type})
Owner

Please preserve the printout of the detected device type.

Author

This is printed in the main function: line 116 prints the device to use.

Comment thread imagenet/main.py Outdated
-torch.cuda.set_device(args.gpu)
-model.cuda(args.gpu)
+torch.accelerator.set_device_index(args.gpu)
+model.to(device)
Owner

That's not equivalent to the previous code. It should be model.to(args.gpu) if that works; if it does not, you need to query the current device from torch.accelerator.

To be honest, I suggest reverting this change and using CUDA-specific calls here. That is acceptable considering that all of this is guarded by if device.type == 'cuda' on line 174. If you want to convert to the new API, then we probably need to introduce XCCL support. We can do that, but it is better deferred to another PR, I think.

Author

Reverted to the CUDA-specific code.
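
For reference, the non-CUDA alternative raised in this thread (select a device index, then move the model to that specific device rather than to an unindexed one) might look like the sketch below. It was not adopted in this PR. The helper name `device_for_index` is hypothetical, and the snippet assumes a PyTorch build that provides torch.accelerator.set_device_index, the call shown in the diff above.

```python
import torch


def device_for_index(index: int) -> torch.device:
    """Select accelerator `index` and return a device that names it explicitly."""
    accel = getattr(torch, "accelerator", None)
    if accel is None or not accel.is_available():
        # No accelerator backend: the index is irrelevant, stay on CPU.
        return torch.device("cpu")
    # Make `index` the current device for the detected backend.
    accel.set_device_index(index)
    # Build an indexed device such as cuda:0 or xpu:0, so that a later
    # model.to(...) targets the same device that model.cuda(index) would.
    return torch.device(accel.current_accelerator().type, index)


target = device_for_index(0)
model = torch.nn.Linear(4, 2).to(target)  # hypothetical stand-in model
```

Returning an indexed device keeps the placement explicit, which is the difference from `model.to(device)` with a bare device type that the review comment points out.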

Comment thread imagenet/main.py Outdated
 if args.gpu is None:
     checkpoint = torch.load(args.resume)
-elif torch.cuda.is_available():
+elif device.type=='cuda':
Owner

And if the device type is not cuda, the checkpoint is not loaded at all :). This does not make sense. I think you can generalize this:

else:
    loc = f'{device.type}:{args.gpu}'

I believe this should work for XPU and other devices as well.

Author

solved
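
The generalization suggested in this thread can be sketched as a small helper that builds the map_location string from the device type instead of hard-coding cuda. The function name `checkpoint_map_location` is hypothetical and does not come from the PR; it only illustrates the string construction.

```python
from typing import Optional


def checkpoint_map_location(device_type: str, gpu: Optional[int]) -> Optional[str]:
    """Return a map_location string such as 'xpu:1', or None when no GPU index is set."""
    if gpu is None:
        # No explicit device index: let torch.load use its default placement.
        return None
    # Works for any backend name, e.g. 'cuda:0' or 'xpu:1'.
    return f"{device_type}:{gpu}"


print(checkpoint_map_location("xpu", 1))
print(checkpoint_map_location("cuda", None))
```

In the script this would presumably be used as torch.load(args.resume, map_location=checkpoint_map_location(device.type, args.gpu)), so a single code path covers every device type.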

@eromomon eromomon requested a review from dvrogozh May 13, 2025 23:48
@eromomon
Author

@dvrogozh, could you please help review the latest changes?

Owner

@dvrogozh dvrogozh left a comment

I suggest opening a PR directly against the upstream examples. This looks good enough.

Comment thread imagenet/requirements.txt
@@ -1,2 +1,2 @@
 torch
-torchvision==0.20.0
+torchvision
Owner

torch>=2.6 on the line above.

@eromomon eromomon force-pushed the eromomon/accel-imagenet branch from 9d86fba to 8bab510 Compare May 19, 2025 23:26
Signed-off-by: eromomon <edgar.romo.montiel@intel.com>
@eromomon
Author

eromomon commented Jul 3, 2025

Solved in PR pytorch#1349

@eromomon eromomon closed this Jul 3, 2025