Remove upstreamed sharding rule overrides for constant_pad_nd, native_layer_norm, and native_layer_norm_backward#373

Draft
pianpwk wants to merge 1 commit into meta-pytorch:main from pianpwk:remove-upstreamed-sharding-rules

Conversation

@pianpwk
Contributor

@pianpwk pianpwk commented Mar 18, 2026

These three rules were carried as local overrides in autoparallel while upstream PyTorch lacked proper handling:

- constant_pad_nd: non-replicate strategy filtering on padded dims (upstreamed in pytorch/pytorch#175656)
- native_layer_norm forward: correct per-output shapes and contiguous strides (upstreamed in pytorch/pytorch#175652)
- native_layer_norm backward: contiguous stride handling for grad_input (upstreamed in a companion PR to pytorch/pytorch)

With all three fixes now in upstream PyTorch, the overrides can be removed and autoparallel defers to the upstream register_op_strategy implementations.

Authored with Claude.

…_layer_norm, and native_layer_norm_backward

These three rules were carried as local overrides in autoparallel while
upstream PyTorch lacked proper handling:

- constant_pad_nd: non-replicate strategy filtering on padded dims
  (upstreamed in pytorch/pytorch#175656)
- native_layer_norm forward: correct per-output shapes and contiguous
  strides (upstreamed in pytorch/pytorch#175652)
- native_layer_norm backward: contiguous stride handling for grad_input
  (upstreamed in a companion PR to pytorch/pytorch)

With all three fixes now in upstream PyTorch, the overrides can be
removed and autoparallel defers to the upstream register_op_strategy
implementations.

Authored with Claude.
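To make the constant_pad_nd item concrete, here is a hedged sketch of the filtering idea in plain Python. The `Shard`/`Replicate` classes and the pad-spec handling below are simplified stand-ins, not DTensor's real types or autoparallel's actual override: the point is only that a sharding strategy on a dim being padded must be dropped, since padding a sharded dim would need communication.

```python
# Toy model of "non-replicate strategy filtering on padded dims".
# Shard/Replicate here are hypothetical simplified placements.
from dataclasses import dataclass

@dataclass(frozen=True)
class Shard:
    dim: int

@dataclass(frozen=True)
class Replicate:
    pass

def padded_dims(pad, ndim):
    """constant_pad_nd's `pad` lists (left, right) pairs starting from the
    last dim; a dim counts as padded if either side of its pair is nonzero."""
    dims = set()
    for i in range(len(pad) // 2):
        if pad[2 * i] != 0 or pad[2 * i + 1] != 0:
            dims.add(ndim - 1 - i)
    return dims

def filter_strategies(strategies, pad, ndim):
    """Keep only placements that do not shard a padded dim."""
    banned = padded_dims(pad, ndim)
    return [
        p for p in strategies
        if not (isinstance(p, Shard) and p.dim in banned)
    ]

# For a 3-D tensor padded only on the last dim (pad=(1, 1)),
# Shard(2) is filtered out; Replicate, Shard(0), Shard(1) survive.
strats = [Replicate(), Shard(0), Shard(1), Shard(2)]
kept = filter_strategies(strats, pad=(1, 1), ndim=3)
```

With the upstream fix in pytorch/pytorch#175656, this filtering happens in PyTorch's own strategy registration, so the local copy is redundant.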
@meta-cla bot added the CLA Signed label (this label is managed by the Meta Open Source bot) on Mar 18, 2026
@fmassa
Contributor

fmassa commented Mar 19, 2026

@pianpwk the layernorm implementation in PyTorch actually makes some limiting assumptions, like that the numbers of sharding placements for the inputs, weights, etc. are all the same. So they would need to be rewritten, I believe.
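One way to picture the limitation fmassa describes, purely illustrative and not PyTorch code: a rule that pairs per-operand placement lists by position breaks down when the operands expose different numbers of candidate placements, e.g. a 2-D input with three shardings versus a 1-D weight with only two.

```python
# Hypothetical placement lists: a 2-D input can be replicated or sharded on
# either dim, while a 1-D weight has fewer options.
input_placements = ["Replicate", "Shard(0)", "Shard(1)"]
weight_placements = ["Replicate", "Shard(0)"]

# Pairing by position silently stops at the shorter list -- the input's
# Shard(1) candidate is never considered.
zipped = list(zip(input_placements, weight_placements))
assert len(zipped) == 2

# A rewrite would have to enumerate valid combinations explicitly instead
# of assuming the per-operand lists line up one-to-one.
```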

pianpwk added a commit to pytorch/pytorch that referenced this pull request Apr 8, 2026
…m, RMSNorm FW/BW"

Removes the op strategies for layer norm and RMS norm forward/backward, since they don't compose well with AutoParallel, in favor of single-dim strategies.

I think this should fix meta-pytorch/autoparallel#142, and may allow us to delete the overrides in meta-pytorch/autoparallel#399 and meta-pytorch/autoparallel#373
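As a rough intuition for the single-dim approach mentioned above (a toy model, not the real DTensor implementation): instead of one monolithic rule enumerating joint placements for all operands, each input dim gets its own small rule describing how sharding on it propagates, and full strategies are composed dim by dim. For layer norm over the last dim, sharding a non-normalized dim passes through, while sharding the normalized dim is disallowed because it would split the reduction.

```python
# Toy single-dim rule for layer norm on a (batch, seq, hidden) input that
# normalizes over the last dim. Names and structure are assumptions for
# illustration only.
NORMALIZED_DIM = 2

def single_dim_rule(input_dim):
    """Return the output dim that Shard(input_dim) maps to, or None if
    sharding that dim is not allowed (it would split the reduction)."""
    if input_dim == NORMALIZED_DIM:
        return None
    return input_dim

# Batch and sequence dims may stay sharded; the hidden dim may not.
allowed = [d for d in range(3) if single_dim_rule(d) is not None]
```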
pianpwk added a commit to pytorch/pytorch that referenced this pull request Apr 8, 2026
pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request Apr 9, 2026
Pull Request resolved: #179173
Approved by: https://github.com/zpcore