Add --muon-coefficient-type argument for Muon optimizer by mchrzanowski · Pull Request #3927 · NVIDIA/Megatron-LM

mchrzanowski · 2026-03-18T16:26:52Z

Allow users to select the Newton-Schulz polynomial coefficient set (e.g. simple, quintic, polar_express, aol) via the new --muon-coefficient-type CLI flag. Supported types are discovered dynamically from the installed emerging_optimizers package so that upstream additions are picked up automatically without code changes.

Allow users to select the Newton-Schulz polynomial coefficient set (e.g. simple, quintic, polar_express, aol) via the new --muon-coefficient-type CLI flag. Supported types are discovered dynamically from the installed emerging_optimizers package so that upstream additions are picked up automatically without code changes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

copy-pr-bot · 2026-03-18T16:26:56Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions · 2026-03-18T16:27:05Z

This PR has been automatically converted to draft because all PRs must start as drafts.

When you are ready for review, click Ready for Review to begin the review process. This will:

Add the oncall reviewer (optional reviewer)
Add required review teams based on your changes

See the contribution guide for more details.

Derive supported coefficient types from the public NSCoeffT Literal type via typing.get_args() rather than reading keys from the private _COEFFICIENT_SETS dict. Tests likewise avoid importing _COEFFICIENT_SETS. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

emerging_optimizers must be installed to use Muon, so there is no need for a hardcoded fallback list. get_supported_coefficient_types() now asserts the package is present and reads NSCoeffT directly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace hardcoded _NS_STEPS_FOR_COEFF_TYPE mapping with dynamic discovery via get_supported_coefficient_types() (backed by NSCoeffT). Since get_coefficient_iterator cycles/repeats coefficients, a single default step count works for all types. Remove redundant duplicate test. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

mkhona-nvidia · 2026-03-18T23:04:03Z

megatron/core/optimizer/optimizer_config.py

    muon_fp32_matmul_prec: str = "medium"
    """The precision to use for the fp32 matmul. Defaults to "medium"."""

+    muon_coefficient_type: str = "quintic"


Again, would recommend pinging downstream users (MBridge, etc) to add this to the list of Muon args

deepakn94 · 2026-03-19T04:42:32Z

megatron/core/optimizer/muon.py

@@ -288,6 +311,7 @@ def lion_init_state_fn(opt, config=None):
        "use_nesterov": config.muon_use_nesterov,


BTW, should we check for certain versions of emerging_optimizers? I believe this got re-named to nesterov in the latest release.

I feel it is better to keep the code only support one version. And I think having that version be the tagged in pyproject is reasonable. Optionally we can add a global check.
We will bump main to support v0.2.0 soon(after dev refactor+bump and main2dev sync, both are finalizing).

skyw · 2026-03-19T16:19:12Z

megatron/core/optimizer/muon.py

+def get_supported_coefficient_types() -> tuple[str, ...]:
+    """Return the coefficient types supported by the installed emerging_optimizers.
+
+    Reads the members of the ``NSCoeffT`` Literal type so that new types


Suggestion: Don't make this dependency. Couple of reasons:

It usually needs a good reason for runtime to depend on static type checking code. NSCoeffT is not supposed to be used in runtime by design.

NSCoeffT is not guarantee to provide all the supported coefficient type. It is used for static type checking. and there are options to expand like use union, e.g. Union[NSCoeffT, ExpCoeffT, "blah"] as long as it can be justified instead of adding every new type into the type.

Although I don't incline to change, it is not beneficial for megatron to make this dependency.

i mean do you have a better idea other than hard-coding the acceptable strings right now? that seems so hacky; you cant even see the list of supported coefficient types because it's in another repo

It would depend on the goal, automatically picking up new types? or verify megatron use the right coefficient type string to call emerging_optimizers, or both.

For the former, it needs emerging_optimizer to commit to interface stability (which it hasn't although we try to avoid breaking changes) and provide a blessed list for public consumption. For the latter, Megatron can also use type check in theory (although reality is hard). There are many alternatives, depend on the level of protection you want to achieve.

i was thinking more of the former; agreed that type checking sounds hard and not worth the effort

i think im going to leave this in there for now, but we can refactor once emerging optimizers decided what it wants to do

megatron/core/optimizer/muon.py

The Lion class moved in emerging_optimizers 0.2. Gate the import behind an explicit version check so users get a clear error instead of a silent ImportError on older versions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

mchrzanowski requested review from a team as code owners March 18, 2026 16:26

svcnvidia-nemo-ci marked this pull request as draft March 18, 2026 16:27

root and others added 4 commits March 18, 2026 09:34

Run autoformatter on test file

17597ac

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

mchrzanowski marked this pull request as ready for review March 18, 2026 23:02

Merge branch 'main' into add-muon-coefficient-type

8a025d3

mchrzanowski requested a review from mkhona-nvidia March 18, 2026 23:02

svcnvidia-nemo-ci requested a review from a team March 18, 2026 23:02

svcnvidia-nemo-ci added the complexity: low label Mar 18, 2026

mkhona-nvidia approved these changes Mar 18, 2026

View reviewed changes

chtruong814 approved these changes Mar 19, 2026

View reviewed changes

svcnvidia-nemo-ci added the Final Review PR is in the "final review" stage label Mar 19, 2026

deepakn94 reviewed Mar 19, 2026

View reviewed changes

cuichenx approved these changes Mar 19, 2026

View reviewed changes

skyw reviewed Mar 19, 2026

View reviewed changes

Merge branch 'main' into add-muon-coefficient-type

678c832

ericharper reviewed Mar 19, 2026

View reviewed changes

megatron/core/optimizer/muon.py Show resolved Hide resolved

root and others added 2 commits March 19, 2026 15:29

Narrow broad Exception catch to PackageNotFoundError

02bad44

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

mchrzanowski enabled auto-merge March 19, 2026 23:52

mchrzanowski requested review from deepakn94 and ericharper March 21, 2026 09:27

deepakn94 approved these changes Mar 21, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add --muon-coefficient-type argument for Muon optimizer#3927

Add --muon-coefficient-type argument for Muon optimizer#3927
mchrzanowski wants to merge 9 commits intoNVIDIA:mainfrom
mchrzanowski:add-muon-coefficient-type

mchrzanowski commented Mar 18, 2026

Uh oh!

copy-pr-bot bot commented Mar 18, 2026

Uh oh!

github-actions bot commented Mar 18, 2026

Uh oh!

mkhona-nvidia Mar 18, 2026

Uh oh!

deepakn94 Mar 19, 2026

Uh oh!

FDecaYed Mar 19, 2026

Uh oh!

skyw Mar 19, 2026 •

edited

Loading

Uh oh!

mchrzanowski Mar 19, 2026

Uh oh!

skyw Mar 19, 2026

Uh oh!

mchrzanowski Mar 19, 2026

Uh oh!

mchrzanowski Mar 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

9 participants

		@@ -288,6 +311,7 @@ def lion_init_state_fn(opt, config=None):
		"use_nesterov": config.muon_use_nesterov,

Conversation

mchrzanowski commented Mar 18, 2026

Uh oh!

copy-pr-bot bot commented Mar 18, 2026

Uh oh!

github-actions bot commented Mar 18, 2026

Uh oh!

mkhona-nvidia Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

deepakn94 Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

FDecaYed Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

skyw Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mchrzanowski Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

skyw Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

mchrzanowski Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

mchrzanowski Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

9 participants

skyw Mar 19, 2026 •

edited

Loading