Using DTensor to handle local num_heads change while TP is applied by wwwjn · Pull Request #3465 · pytorch/tutorials

wwwjn · 2025-07-16T02:37:07Z

Fixes #ISSUE_NUMBER. This PR is to make the TP tutorial up-to-date with DTensor changes.

Description

After the recent DTensor enhancements, we are now able to use DTensor to handle the change in num_heads, instead of manually adjusting the tensor shape, while TP is applied.
Corresponding changes in pytorch/examples: pytorch/examples#1373
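
For context, the manual shape handling that this change removes can be sketched in plain Python. This is only an illustration of the bookkeeping, not code from the tutorial: the sizes and the helper names `local_num_heads` and `view_is_valid` are hypothetical. It assumes the output features of ``wq`` are split evenly across the TP ranks.

```python
def local_num_heads(n_heads: int, tp_degree: int) -> int:
    """Heads held by each rank when wq/wk/wv are sharded column-wise."""
    if n_heads % tp_degree != 0:
        raise ValueError("n_heads must be divisible by the TP degree")
    return n_heads // tp_degree


def view_is_valid(numel: int, shape: tuple) -> bool:
    """A view is only legal if the target shape has the same element count."""
    total = 1
    for dim in shape:
        total *= dim
    return total == numel


# Hypothetical sizes, chosen only for illustration.
bs, seqlen, n_heads, head_dim, tp_degree = 2, 16, 32, 128, 8

n_local = local_num_heads(n_heads, tp_degree)

# Column-wise sharding splits the last (feature) dimension of the wq output,
# so each rank holds only 1/tp_degree of the n_heads * head_dim features.
local_numel = bs * seqlen * (n_heads * head_dim // tp_degree)

# Viewing the local shard with the global num_heads does not match in size...
global_view_ok = view_is_valid(local_numel, (bs, seqlen, n_heads, head_dim))
# ...while viewing it with the local num_heads does.
local_view_ok = view_is_valid(local_numel, (bs, seqlen, n_local, head_dim))
```

With a plain local tensor, the model code has to substitute the local head count itself. Keeping the activation as a DTensor is what lets the global ``num_heads`` stay in the model code.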

Checklist

The issue that is being fixed is referred in the description (see above "Fixes #ISSUE_NUMBER")
Only one issue is addressed in this pull request
Labels from the issue that this PR is fixing are added to this pull request
No unnecessary issues are included into this pull request.

pytorch-bot · 2025-07-16T02:37:10Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/3465

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

wwwjn · 2025-07-16T02:39:14Z

cc @tianyu-l

tianyu-l · 2025-07-16T03:36:13Z

 If there are any more tensor operations (such as view operations) between the column-wise linear and the row-wise linear, we would need to adjust the relevant shape related ops to sharded shape.

-For the Llama model, in the attention layer there are couple of view operations that are shape related. In particular, column-wise parallel for ``wq``/ ``wk``/ ``wv`` linear layers, the activation tensor is sharded on the ``num_heads`` dimension, so we would need to adjust the ``num_heads`` to local ``num_heads``.
+For the Llama model, in the attention layer, there are several view operations related to shape. Specifically, for column-wise parallelism in the ``wq``/``wk``/``wv`` linear layers, the activation tensor is sharded on the ``num_heads`` dimension. To manage the difference between global and local ``num_heads``, we should set ``use_local_output=False`` to ensure the output is a DTensor. Unlike a regular tensor, a DTensor is aware of the parallelism plans and will automatically handle changes in the ``num_heads`` dimension.


I think we should be able to use DTensor (i.e., set use_local_output=False) everywhere.
Maybe it's OK to keep a mixed usage of use_local_output so people are aware of this flexibility, but we should mention it here.


wwwjn added 3 commits July 15, 2025 11:10

fsdp1 -> fsdp2

55493b2

change num_heads in tutorial

90c66f8

rewrite

630e1d2

meta-cla Bot added the cla signed label Jul 16, 2025

wwwjn changed the title ~~Using DTensor to handel local num_heads change while TP is applied~~ Using DTensor to handle local num_heads change while TP is applied Jul 16, 2025

Merge branch 'main' into tp_tutorial_2

6dd3297

tianyu-l approved these changes Jul 16, 2025

View reviewed changes

svekars approved these changes Jul 16, 2025

View reviewed changes

Merge branch 'main' into tp_tutorial_2

31df79b

svekars merged commit b78fc75 into pytorch:main Jul 18, 2025
21 checks passed

mikaylagawarecki pushed a commit that referenced this pull request Jul 23, 2025

Using DTensor to handle local num_heads change while TP is applied (#…

5d79795

…3465) * fsdp1 -> fsdp2 * change num_heads in tutorial --------- Co-authored-by: Svetlana Karslioglu <svekars@meta.com>

mikaylagawarecki pushed a commit that referenced this pull request Jul 23, 2025

Using DTensor to handle local num_heads change while TP is applied (#…

73a4da5

…3465) * fsdp1 -> fsdp2 * change num_heads in tutorial --------- Co-authored-by: Svetlana Karslioglu <svekars@meta.com>

svekars merged 5 commits into pytorch:main from wwwjn:tp_tutorial_2
