Remove upstreamed sharding rule overrides for constant_pad_nd, native_layer_norm, and native_layer_norm_backward #373
Draft
pianpwk wants to merge 1 commit into meta-pytorch:main from
Conversation
…_layer_norm, and native_layer_norm_backward

These three rules were carried as local overrides in autoparallel while upstream PyTorch lacked proper handling:

- constant_pad_nd: non-replicate strategy filtering on padded dims (upstreamed in pytorch/pytorch#175656)
- native_layer_norm forward: correct per-output shapes and contiguous strides (upstreamed in pytorch/pytorch#175652)
- native_layer_norm backward: contiguous stride handling for grad_input (upstreamed in a companion PR to pytorch/pytorch)

With all three fixes now in upstream PyTorch, the overrides can be removed and autoparallel defers to the upstream register_op_strategy implementations.

Authored with Claude.
Contributor
@pianpwk the layernorm implementation in PyTorch actually makes some limiting assumptions, like the number of sharding placements for inputs / weights / etc. all being the same. So they would need to be rewritten, I believe.
pianpwk added a commit to pytorch/pytorch that referenced this pull request on Apr 8, 2026
…m, RMSNorm FW/BW

Removes op strategies for layernorm and RMS norm FWD/BWD, since they don't compose well with AutoParallel, in favor of single-dim strategies. I think this should fix meta-pytorch/autoparallel#142, and maybe allow us to delete the overrides in meta-pytorch/autoparallel#399 and meta-pytorch/autoparallel#373.

[ghstack-poisoned]
pianpwk added a commit to pytorch/pytorch that referenced this pull request on Apr 8, 2026
Removes op strategies for layernorm and RMS norm FWD/BWD, since they don't compose well with AutoParallel, in favor of single-dim strategies. I think this should fix meta-pytorch/autoparallel#142, and maybe allow us to delete the overrides in meta-pytorch/autoparallel#399 and meta-pytorch/autoparallel#373.

[ghstack-poisoned]
pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request on Apr 9, 2026
Removes op strategies for layernorm and RMS norm FWD/BWD, since they don't compose well with AutoParallel, in favor of single-dim strategies. I think this should fix meta-pytorch/autoparallel#142, and maybe allow us to delete the overrides in meta-pytorch/autoparallel#399 and meta-pytorch/autoparallel#373.

Pull Request resolved: #179173
Approved by: https://github.com/zpcore