Skip to content

Add --muon-coefficient-type argument for Muon optimizer#3927

Open
mchrzanowski wants to merge 9 commits intoNVIDIA:mainfrom
mchrzanowski:add-muon-coefficient-type
Open

Add --muon-coefficient-type argument for Muon optimizer#3927
mchrzanowski wants to merge 9 commits intoNVIDIA:mainfrom
mchrzanowski:add-muon-coefficient-type

Conversation

@mchrzanowski
Copy link
Contributor

Allow users to select the Newton-Schulz polynomial coefficient set (e.g. simple, quintic, polar_express, aol) via the new --muon-coefficient-type CLI flag. Supported types are discovered dynamically from the installed emerging_optimizers package so that upstream additions are picked up automatically without code changes.

Allow users to select the Newton-Schulz polynomial coefficient set
(e.g. simple, quintic, polar_express, aol) via the new
--muon-coefficient-type CLI flag. Supported types are discovered
dynamically from the installed emerging_optimizers package so that
upstream additions are picked up automatically without code changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mchrzanowski mchrzanowski requested review from a team as code owners March 18, 2026 16:26
@copy-pr-bot
Copy link

copy-pr-bot bot commented Mar 18, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@svcnvidia-nemo-ci svcnvidia-nemo-ci marked this pull request as draft March 18, 2026 16:27
@github-actions
Copy link
Contributor

This PR has been automatically converted to draft because all PRs must start as drafts.

When you are ready for review, click Ready for Review to begin the review process. This will:

  1. Add the oncall reviewer (optional reviewer)
  2. Add required review teams based on your changes

See the contribution guide for more details.

root and others added 4 commits March 18, 2026 09:34
Derive supported coefficient types from the public NSCoeffT Literal
type via typing.get_args() rather than reading keys from the private
_COEFFICIENT_SETS dict. Tests likewise avoid importing _COEFFICIENT_SETS.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
emerging_optimizers must be installed to use Muon, so there is no
need for a hardcoded fallback list. get_supported_coefficient_types()
now asserts the package is present and reads NSCoeffT directly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace hardcoded _NS_STEPS_FOR_COEFF_TYPE mapping with dynamic discovery
via get_supported_coefficient_types() (backed by NSCoeffT). Since
get_coefficient_iterator cycles/repeats coefficients, a single default
step count works for all types. Remove redundant duplicate test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mchrzanowski mchrzanowski marked this pull request as ready for review March 18, 2026 23:02
muon_fp32_matmul_prec: str = "medium"
"""The precision to use for the fp32 matmul. Defaults to "medium"."""

muon_coefficient_type: str = "quintic"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, would recommend pinging downstream users (MBridge, etc) to add this to the list of Muon args

@svcnvidia-nemo-ci svcnvidia-nemo-ci added the Final Review PR is in the "final review" stage label Mar 19, 2026
@@ -288,6 +311,7 @@ def lion_init_state_fn(opt, config=None):
"use_nesterov": config.muon_use_nesterov,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, should we check for certain versions of emerging_optimizers? I believe this got re-named to nesterov in the latest release.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel it is better to keep the code only support one version. And I think having that version be the tagged in pyproject is reasonable. Optionally we can add a global check.
We will bump main to support v0.2.0 soon(after dev refactor+bump and main2dev sync, both are finalizing).

def get_supported_coefficient_types() -> tuple[str, ...]:
"""Return the coefficient types supported by the installed emerging_optimizers.

Reads the members of the ``NSCoeffT`` Literal type so that new types
Copy link
Contributor

@skyw skyw Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Don't make this dependency. Couple of reasons:

  • It usually needs a good reason for runtime to depend on static type checking code. NSCoeffT is not supposed to be used in runtime by design.
  • NSCoeffT is not guarantee to provide all the supported coefficient type. It is used for static type checking. and there are options to expand like use union, e.g. Union[NSCoeffT, ExpCoeffT, "blah"] as long as it can be justified instead of adding every new type into the type.

Although I don't incline to change, it is not beneficial for megatron to make this dependency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i mean do you have a better idea other than hard-coding the acceptable strings right now? that seems so hacky; you cant even see the list of supported coefficient types because it's in another repo

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would depend on the goal, automatically picking up new types? or verify megatron use the right coefficient type string to call emerging_optimizers, or both.

For the former, it needs emerging_optimizer to commit to interface stability (which it hasn't although we try to avoid breaking changes) and provide a blessed list for public consumption. For the latter, Megatron can also use type check in theory (although reality is hard). There are many alternatives, depend on the level of protection you want to achieve.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was thinking more of the former; agreed that type checking sounds hard and not worth the effort

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think im going to leave this in there for now, but we can refactor once emerging optimizers decided what it wants to do

root and others added 2 commits March 19, 2026 15:29
The Lion class moved in emerging_optimizers 0.2. Gate the import
behind an explicit version check so users get a clear error instead
of a silent ImportError on older versions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

complexity: low Final Review PR is in the "final review" stage

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants