Remove upstreamed sharding rule overrides for constant_pad_nd, native_layer_norm, and native_layer_norm_backward #373
Draft
pianpwk wants to merge 1 commit into meta-pytorch:main from
Conversation
…_layer_norm, and native_layer_norm_backward

These three rules were carried as local overrides in autoparallel while upstream PyTorch lacked proper handling:

- constant_pad_nd: non-replicate strategy filtering on padded dims (upstreamed in pytorch/pytorch#175656)
- native_layer_norm forward: correct per-output shapes and contiguous strides (upstreamed in pytorch/pytorch#175652)
- native_layer_norm backward: contiguous stride handling for grad_input (upstreamed in a companion PR to pytorch/pytorch)

With all three fixes now in upstream PyTorch, the overrides can be removed and autoparallel defers to the upstream register_op_strategy implementations.

Authored with Claude.
Contributor
@pianpwk the layernorm implementation in PyTorch actually makes some limiting assumptions, like the number of sharding placements for inputs / weights / etc. all being the same. So they would need to be rewritten, I believe.
pianpwk added a commit to pytorch/pytorch that referenced this pull request on Apr 8, 2026
…m, RMSNorm FW/BW

Removes op strategies for layernorm and RMS norm FWD/BWD, since they don't compose well with AutoParallel, in favor of single-dim strategies. I think this should fix meta-pytorch/autoparallel#142, and maybe allow us to delete the overrides in meta-pytorch/autoparallel#399 and meta-pytorch/autoparallel#373.

[ghstack-poisoned]
pianpwk added a commit to pytorch/pytorch that referenced this pull request on Apr 8, 2026
Removes op strategies for layernorm and RMS norm FWD/BWD, since they don't compose well with AutoParallel, in favor of single-dim strategies. I think this should fix meta-pytorch/autoparallel#142, and maybe allow us to delete the overrides in meta-pytorch/autoparallel#399 and meta-pytorch/autoparallel#373.

[ghstack-poisoned]
pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request on Apr 9, 2026
Removes op strategies for layernorm and RMS norm FWD/BWD, since they don't compose well with AutoParallel, in favor of single-dim strategies. I think this should fix meta-pytorch/autoparallel#142, and maybe allow us to delete the overrides in meta-pytorch/autoparallel#399 and meta-pytorch/autoparallel#373.

Pull Request resolved: #179173
Approved by: https://github.com/zpcore