Modernize distributed/rpc/pipeline #1385

Merged: msaroufim merged 3 commits into main from update_rpc on Aug 25, 2025

@msaroufim msaroufim commented Aug 25, 2025

Replaced PyTorch's deprecated DistributedOptimizer with manual optimizer management using torch.compile to fix the TorchScript deprecation warning. Optimizers are now created on each remote worker, and step() and zero_grad() are called manually through RPC. I didn't know how to fix the remaining warnings (ProcessGroupGloo and NetworkX) from user code. Impressed that Claude figured this out.

Output is now clean

(create) ➜  pipeline git:(main) ✗ python main.py
[Gloo] Rank 0 is connected to 2 peer ranks. Expected number of connected peer ranks is : 2
[Gloo] Rank 2 is connected to 2 peer ranks. Expected number of connected peer ranks is : 2
[Gloo] Rank 1 is connected to 2 peer ranks. Expected number of connected peer ranks is : 2
Processing batch 0
Processing batch 1
Processing batch 2
number of splits = 1, execution time = 24.637782335281372
[Gloo] Rank 0 is connected to 2 peer ranks. Expected number of connected peer ranks is : 2
[Gloo] Rank 1 is connected to 2 peer ranks. Expected number of connected peer ranks is : 2
[Gloo] Rank 2 is connected to 2 peer ranks. Expected number of connected peer ranks is : 2
Processing batch 0
Processing batch 1
Processing batch 2
number of splits = 2, execution time = 19.14631748199463
[Gloo] Rank 0 is connected to 2 peer ranks. Expected number of connected peer ranks is : 2
[Gloo] Rank 2 is connected to 2 peer ranks. Expected number of connected peer ranks is : 2
[Gloo] Rank 1 is connected to 2 peer ranks. Expected number of connected peer ranks is : 2
Processing batch 0
Processing batch 1
Processing batch 2
number of splits = 4, execution time = 15.70963716506958
[Gloo] Rank 1 is connected to 2 peer ranks. Expected number of connected peer ranks is : 2
[Gloo] Rank 0 is connected to 2 peer ranks. Expected number of connected peer ranks is : 2
[Gloo] Rank 2 is connected to 2 peer ranks. Expected number of connected peer ranks is : 2
Processing batch 0
Processing batch 1
Processing batch 2
number of splits = 8, execution time = 11.398766994476318

@meta-cla meta-cla Bot added the cla signed label Aug 25, 2025
netlify Bot commented Aug 25, 2025

Deploy Preview for pytorch-examples-preview canceled.

Name Link
🔨 Latest commit f979909
🔍 Latest deploy log https://app.netlify.com/projects/pytorch-examples-preview/deploys/68acb887f1608700082c9ceb

Mark Saroufim added 2 commits August 25, 2025 12:22
@msaroufim msaroufim changed the title Modernize RPC example Modernize distributed/rpc/pipeline Aug 25, 2025
@msaroufim msaroufim merged commit 746c0a2 into main Aug 25, 2025
8 checks passed