Use torch.accelerator API in Imagenet example #13
* Add support for Intel GPU to MNIST example
* Add support for Intel GPU to MNIST Forward-Forward example
* Add support for Intel GPU to MNIST using RNN example and update README with optional arguments
* Refactor argument parsing in MNIST examples. There is no need to use `default=False` with `store_true`

Signed-off-by: jafraustro <jaime.fraustro.valdez@intel.com>

* Add support for Intel GPU to Basic VAE example and update README with optional arguments
* Remove `default=False` from `store_true` arguments
* Fix typo in Readme

Signed-off-by: jafraustro <jaime.fraustro.valdez@intel.com>
If running on CUDA, you should always use the NCCL backend for multi-processing distributed training since it currently provides the best distributed training performance.

For XPU multiprocessing is not supported as of PyTorch 2.6.
You should always use the NCCL backend for multi-processing distributed training since it currently provides the best distributed training performance.
Revert (we previously adjusted this for XPU).
device = torch.device("mps")
model = model.to(device)
elif args.gpu is not None and device.type=='cuda':
I think neither of these 2 `if` paths needs to be CUDA-specific. You can make this generic.
Solved, else block sets the model to the generic device
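A minimal sketch of what a generic device selection could look like, assuming PyTorch 2.6 or newer where `torch.accelerator` is available. The helper name `detect_device_type` is hypothetical and not taken from the PR; it falls back to `"cpu"` when no accelerator (or no PyTorch install) is found.

```python
def detect_device_type():
    """Return the accelerator type (e.g. 'cuda', 'xpu', 'mps'), or 'cpu'.

    Hypothetical helper for illustration; not the code from the PR.
    """
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so only the CPU can be reported.
        return "cpu"
    # torch.accelerator was introduced in PyTorch 2.6; older builds lack it.
    accel = getattr(torch, "accelerator", None)
    if accel is not None and accel.is_available():
        # current_accelerator() returns a torch.device; .type is its name.
        return accel.current_accelerator().type
    return "cpu"


device_type = detect_device_type()
print(f"Device to use: {device_type}")
```

With the type string in hand, `torch.device(device_type)` can be passed to `model.to(...)` without a backend-specific branch.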
else:
    device = torch.device("cpu")
print (f"Device to use: ", {device.type})
Please preserve the printout of the detected device type.
This is printed in the main function; line 116 prints the device to use.
- torch.cuda.set_device(args.gpu)
- model.cuda(args.gpu)
+ torch.accelerator.set_device_index(args.gpu)
+ model.to(device)
That's not equivalent to the previous code. It should be model.to(args.gpu) if that works, or you need to query the current device from torch.accelerator if it does not.
To be honest, I suggest reverting this place and using CUDA-specific calls. That's acceptable considering that all of this is protected by if device.type == 'cuda' on line 174. And if you want to convert to the new API, then we probably need to introduce XCCL support. We can do that, but I think it's better to defer it to another PR.
  if args.gpu is None:
      checkpoint = torch.load(args.resume)
- elif torch.cuda.is_available():
+ elif device.type=='cuda':
And if the device type is not cuda, the checkpoint isn't loaded at all :). This does not make sense. I think you can generalize this:
else:
    loc = f'{device.type}:{args.gpu}'
I believe this should work for XPU and other devices as well.
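The suggestion above can be sketched as a small pure-Python helper. The name `checkpoint_map_location` is hypothetical and only illustrates how the location string might be built from the device type and the `--gpu` index; it is not the code that landed in the PR.

```python
def checkpoint_map_location(device_type, gpu_index):
    """Build a map_location string such as 'cuda:0' or 'xpu:1'.

    Returns None when no index was requested, so the checkpoint is
    loaded without remapping. Hypothetical helper for illustration.
    """
    if gpu_index is None:
        return None
    return f"{device_type}:{gpu_index}"


print(checkpoint_map_location("cuda", 0))
print(checkpoint_map_location("xpu", 1))
print(checkpoint_map_location("cuda", None))
```

The result would then be passed along the lines of `torch.load(args.resume, map_location=loc)`, which removes the need for a CUDA-only `elif` branch.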
@dvrogozh, could you please help review the latest changes?
dvrogozh left a comment
I suggest you open a PR directly against the upstream examples. This looks good enough.
@@ -1,2 +1,2 @@
  torch
- torchvision==0.20.0
+ torchvision
9d86fba to 8bab510
Signed-off-by: eromomon <edgar.romo.montiel@intel.com>
8bab510 to 27a4fd9
Solved in PR pytorch#1349
Refactor the Imagenet example to use the torch.accelerator API. The torch.accelerator API abstracts some of the accelerator specifics in user scripts, which makes the code more adaptable to different hardware accelerators.