Fix torch `autocast` deprecation warning in gradient checkpointing by Copilot · Pull Request #1010 · sillsdev/silnlp

Copilot · 2026-04-24T18:17:48Z

Confirmed torch 2.6.0 patches the torch.load weights_only=True RCE vulnerability
Confirmed torch 2.6.0+cu121 does not exist; switched CUDA backend from cu121 → cu124
Updated pyproject.toml: torch ^2.5 → ^2.6, source URL cu121 → cu124
Updated poetry.lock:
- torch: 2.5.1+cu121 → 2.6.0+cu124
- All NVIDIA CUDA packages: 12.1.x → 12.4.x versions
- nvidia-cusparselt-cu12: new dependency 0.6.2 (required by torch 2.6.0)
- triton: 3.1.0 → 3.2.0
- sympy: 1.13.3 → 1.13.1 (pinned exactly by torch 2.6.0+cu124)
- Updated content-hash to match new pyproject.toml
Added explanatory comment for use_reentrant=False in gradient_checkpointing_kwargs
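
For context on the last item, here is a minimal sketch of where `use_reentrant=False` is typically set. It assumes training is configured through Hugging Face `TrainingArguments`-style keyword arguments. The helper name `checkpointing_kwargs` is hypothetical and not taken from this PR.

```python
from typing import Any, Dict


def checkpointing_kwargs(use_reentrant: bool = False) -> Dict[str, Any]:
    """Build keyword arguments that turn on gradient checkpointing.

    use_reentrant=False selects the non-reentrant implementation in
    torch.utils.checkpoint.
    """
    return {
        "gradient_checkpointing": True,
        "gradient_checkpointing_kwargs": {"use_reentrant": use_reentrant},
    }


# Assumed usage (requires transformers, which is not imported here):
#   TrainingArguments(output_dir="out", **checkpointing_kwargs())
kwargs = checkpointing_kwargs()
```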

benjaminking

This is probably a good change to make, but when I tested it with use_reentrant=False I still got the autocast warning. This time it came from a different line:

/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:1399: FutureWarning:
`torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.
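
One stopgap for a warning raised inside library code is a targeted filter from the standard `warnings` module. The sketch below assumes the message begins exactly as quoted above; the names `AUTOCAST_MESSAGE`, `ignore_autocast_deprecation`, and `count_emitted` are illustrative only. It hides the warning without changing what torch calls internally.

```python
import warnings

# Start of the warning text quoted above. filterwarnings() treats it as a
# regular expression matched against the beginning of the message.
AUTOCAST_MESSAGE = r"`torch\.cpu\.amp\.autocast\(args\.\.\.\)` is deprecated"


def ignore_autocast_deprecation() -> None:
    """Hide only the torch.cpu.amp.autocast FutureWarning."""
    warnings.filterwarnings("ignore", message=AUTOCAST_MESSAGE, category=FutureWarning)


def count_emitted(messages):
    """Return how many of the given FutureWarning messages survive the filter."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        ignore_autocast_deprecation()
        for text in messages:
            warnings.warn(text, FutureWarning)
    return len(caught)
```

Matching on the message rather than on all `FutureWarning`s keeps unrelated deprecation notices visible.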

@benjaminking reviewed 1 file and all commit messages, and made 1 comment.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on copilot[bot]).

benjaminking · 2026-04-27T13:09:28Z

This warning was being produced by Torch code, and it looks like PyTorch's policy is to stop patching a release once the next minor version is out. So the way to fix it was to upgrade Torch from 2.4 to 2.6, which also meant moving the PyTorch wheel from CUDA 12.1 to 12.4. I'm sure we'll eventually want to upgrade the PyTorch and CUDA versions, but we should discuss whether now is the right time.

On jobs_backlog, the installed CUDA version is 12.4, and on cheetah_47gb, it is 13.0. Weirdly though, this branch works fine in an interactive session on jobs_backlog, but fails with a missing library file (libcudnn.so.9) when a task is sent to jobs_backlog.
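
A quick way to compare the two environments is to ask the dynamic loader directly whether the library can be opened. This is only a rough diagnostic sketch: `can_load` is a hypothetical helper, and it reflects the default loader search path, which may differ from how torch locates its own bundled libraries.

```python
import ctypes


def can_load(library_name: str) -> bool:
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # libcudnn.so.9 is the file named in the failure above.
    print("libcudnn.so.9 loadable:", can_load("libcudnn.so.9"))
```

Running the same script in the interactive session and inside a queued task would show whether the two see different library paths.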

mshannon-sil · 2026-04-27T15:47:56Z

Strange. Thinking through this, we know that the silnlp container has pip requirements already installed based on the poetry lock file at the time the docker image was made. When running remote execution with this image, ClearML looks at the poetry.lock file and does the installation process again, though most of the time everything is already installed so it takes little time. When running an interactive session, this installation does not happen and you're just using the already installed packages unless you manually install more packages. I'm guessing this difference is the cause of any different results you're seeing. I'd think that if you create a new docker image with the updated requirements, this bug would go away, but I'm not 100% positive.
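
One way to test this hypothesis is to compare what is actually installed in each environment against the pins in the lock file. The sketch below uses only `importlib.metadata`. The helper names are hypothetical, and the expected versions are the ones listed in the PR description.

```python
from importlib import metadata
from typing import Callable, Dict, Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(
    expected: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Optional[str]]:
    """Map each package whose installed version differs from its pin to what was found."""
    mismatches: Dict[str, Optional[str]] = {}
    for name, pin in expected.items():
        found = lookup(name)
        if found != pin:
            mismatches[name] = found
    return mismatches


# Pins taken from the poetry.lock changes described in this PR.
EXPECTED = {
    "torch": "2.6.0+cu124",
    "triton": "3.2.0",
    "sympy": "1.13.1",
    "nvidia-cusparselt-cu12": "0.6.2",
}

if __name__ == "__main__":
    print(find_mismatches(EXPECTED))
```

An empty result in the interactive session but not in the remote task (or the reverse) would point at the reinstall step as the difference.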

benjaminking

I figured out what was going on and put in a temporary fix for it. I'll include the long version below, but the short version is that, in the long term, we need to have the Nvidia CUDA packages installed on the Docker image to avoid this issue we were seeing. The temporary fix is to have Poetry reinstall all of the packages in the venv.

And now for the long version. Our current Docker image supports CUDA 12.4, which is the version this PR upgrades to. The cuDNN package is already present on the Docker image, while other Nvidia packages need to be installed by Poetry. Poetry has a setting to use packages already installed on the Docker image if it can. But there is a phenomenon called "shadowing" where the package on the image is hidden if any package from that namespace is installed in the venv with Poetry.

Essentially, our two options with the Nvidia packages are to either have all of them pre-installed on the Docker image or install all of them with Poetry. I've temporarily implemented the latter, but updating the Docker image is the long-term fix. I have verified that this change successfully removes the autocast warning and that the experiment pipeline runs successfully.
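
The shadowing effect can be reproduced in isolation with plain Python imports. This is a sketch under the assumption that the mechanism is ordinary package shadowing on `sys.path`; it does not inspect the real Nvidia wheels. `demo_vendor`, `cublas_like`, and `cudnn_like` are made-up stand-ins for a shared top-level package and its per-library subpackages.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path
from typing import Dict, List

PACKAGE = "demo_vendor"  # hypothetical stand-in for a shared top-level package


def make_package(site_dir: Path, submodule: str) -> None:
    """Create <site_dir>/demo_vendor/ with an __init__.py and one submodule."""
    package_dir = site_dir / PACKAGE
    package_dir.mkdir(parents=True)
    (package_dir / "__init__.py").write_text("")
    (package_dir / f"{submodule}.py").write_text("")


def visible_submodules(search_path: List[Path], names: List[str]) -> Dict[str, bool]:
    """Report which demo_vendor.<name> modules resolve with search_path in front."""
    saved_path = sys.path[:]
    sys.path[:0] = [str(p) for p in search_path]
    importlib.invalidate_caches()
    try:
        return {n: importlib.util.find_spec(f"{PACKAGE}.{n}") is not None for n in names}
    finally:
        # Restore the interpreter state so the demo leaves no trace.
        sys.path[:] = saved_path
        for mod in [m for m in sys.modules if m == PACKAGE or m.startswith(PACKAGE + ".")]:
            del sys.modules[mod]


def demo() -> Dict[str, bool]:
    with tempfile.TemporaryDirectory() as tmp:
        venv_dir = Path(tmp) / "venv"
        image_dir = Path(tmp) / "image"
        make_package(venv_dir, "cublas_like")  # installed into the venv
        make_package(image_dir, "cudnn_like")  # preinstalled on the image
        # The venv comes first on the path, as it would in a real venv.
        return visible_submodules([venv_dir, image_dir], ["cublas_like", "cudnn_like"])
```

In this toy setup, the first `demo_vendor` directory found on the path wins, so the submodule that exists only in the later directory is no longer importable. That mirrors the two options above: keep every subpackage in one location or the other.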

@benjaminking made 1 comment.
Reviewable status: 0 of 3 files reviewed, all discussions resolved (waiting on mshannon-sil).

mshannon-sil · 2026-05-05T22:36:30Z

That makes sense. I'd vote for going ahead and updating the docker container to have the nvidia packages preinstalled. Maybe this is something I should do, since I can test that the new dockerfile creates an image that runs successfully on my local GPU, before pushing a new version of the silnlp image publicly.

benjaminking · 2026-05-07T19:21:32Z

We plan to tie this upgrade to the upgrade of the Python and Ubuntu versions.

Initial plan

c4a11ff

Copilot AI assigned Copilot and benjaminking Apr 24, 2026

Copilot started work on behalf of benjaminking April 24, 2026 18:17 View session

Copilot AI linked an issue Apr 24, 2026 that may be closed by this pull request

Torch autocast deprecation warning #1009

Open

Fix torch autocast deprecation warning by using use_reentrant=False

bf46032

Agent-Logs-Url: https://github.com/sillsdev/silnlp/sessions/718c44c7-fd32-4f0e-923f-1c0164875e59 Co-authored-by: benjaminking <1214233+benjaminking@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Fix Torch autocast deprecation warning during training~~ Fix torch autocast deprecation warning in gradient checkpointing Apr 24, 2026

Copilot finished work on behalf of benjaminking April 24, 2026 18:21

Copilot AI requested a review from benjaminking April 24, 2026 18:21

benjaminking reviewed Apr 24, 2026

Copilot started work on behalf of benjaminking April 24, 2026 19:40 View session

Suppress torch.utils.checkpoint FutureWarning about deprecated torch.cpu.amp.autocast

a42702f

Agent-Logs-Url: https://github.com/sillsdev/silnlp/sessions/4592cc6c-17bc-498d-9320-6da2e1b8729b Co-authored-by: benjaminking <1214233+benjaminking@users.noreply.github.com>

Copilot finished work on behalf of benjaminking April 24, 2026 19:44

Copilot AI requested a review from benjaminking April 24, 2026 19:44

Copilot started work on behalf of benjaminking April 24, 2026 19:59 View session

Copilot AI and others added 2 commits April 24, 2026 20:16

Upgrade torch to 2.5.1+cu121 to fix autocast deprecation warning at the source

04cc832

Agent-Logs-Url: https://github.com/sillsdev/silnlp/sessions/3eeb9ac4-ca51-42c0-a741-15f1ccea7d80 Co-authored-by: benjaminking <1214233+benjaminking@users.noreply.github.com>

Upgrade torch to 2.6.0+cu124 to patch torch.load RCE vulnerability (CVE)

4d5e274

Agent-Logs-Url: https://github.com/sillsdev/silnlp/sessions/3eeb9ac4-ca51-42c0-a741-15f1ccea7d80 Co-authored-by: benjaminking <1214233+benjaminking@users.noreply.github.com>

Copilot finished work on behalf of benjaminking April 24, 2026 20:28

benjaminking requested a review from mshannon-sil April 27, 2026 13:09

Create proper poetry.lock

c4bf59f

benjaminking reviewed May 5, 2026

Copilot wants to merge 6 commits into master from copilot/fix-autocast-deprecation-warning

Copilot AI commented Apr 24, 2026 · edited by ddaspit
