Add --muon-coefficient-type argument for Muon optimizer#3927
Add --muon-coefficient-type argument for Muon optimizer#3927mchrzanowski wants to merge 9 commits intoNVIDIA:mainfrom
Conversation
Allow users to select the Newton-Schulz polynomial coefficient set (e.g. simple, quintic, polar_express, aol) via the new --muon-coefficient-type CLI flag. Supported types are discovered dynamically from the installed emerging_optimizers package so that upstream additions are picked up automatically without code changes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
This PR has been automatically converted to draft because all PRs must start as drafts. When you are ready for review, click Ready for Review to begin the review process. This will:
See the contribution guide for more details. |
Derive supported coefficient types from the public NSCoeffT Literal type via typing.get_args() rather than reading keys from the private _COEFFICIENT_SETS dict. Tests likewise avoid importing _COEFFICIENT_SETS. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
emerging_optimizers must be installed to use Muon, so there is no need for a hardcoded fallback list. get_supported_coefficient_types() now asserts the package is present and reads NSCoeffT directly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace hardcoded _NS_STEPS_FOR_COEFF_TYPE mapping with dynamic discovery via get_supported_coefficient_types() (backed by NSCoeffT). Since get_coefficient_iterator cycles/repeats coefficients, a single default step count works for all types. Remove redundant duplicate test. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| muon_fp32_matmul_prec: str = "medium" | ||
| """The precision to use for the fp32 matmul. Defaults to "medium".""" | ||
|
|
||
| muon_coefficient_type: str = "quintic" |
There was a problem hiding this comment.
Again, would recommend pinging downstream users (MBridge, etc) to add this to the list of Muon args
| @@ -288,6 +311,7 @@ def lion_init_state_fn(opt, config=None): | |||
| "use_nesterov": config.muon_use_nesterov, | |||
There was a problem hiding this comment.
BTW, should we check for certain versions of emerging_optimizers? I believe this got re-named to nesterov in the latest release.
There was a problem hiding this comment.
I feel it is better to keep the code only support one version. And I think having that version be the tagged in pyproject is reasonable. Optionally we can add a global check.
We will bump main to support v0.2.0 soon(after dev refactor+bump and main2dev sync, both are finalizing).
| def get_supported_coefficient_types() -> tuple[str, ...]: | ||
| """Return the coefficient types supported by the installed emerging_optimizers. | ||
|
|
||
| Reads the members of the ``NSCoeffT`` Literal type so that new types |
There was a problem hiding this comment.
Suggestion: Don't make this dependency. Couple of reasons:
- It usually needs a good reason for runtime to depend on static type checking code. NSCoeffT is not supposed to be used in runtime by design.
NSCoeffTis not guarantee to provide all the supported coefficient type. It is used for static type checking. and there are options to expand like use union, e.g.Union[NSCoeffT, ExpCoeffT, "blah"]as long as it can be justified instead of adding every new type into the type.
Although I don't incline to change, it is not beneficial for megatron to make this dependency.
There was a problem hiding this comment.
i mean do you have a better idea other than hard-coding the acceptable strings right now? that seems so hacky; you cant even see the list of supported coefficient types because it's in another repo
There was a problem hiding this comment.
It would depend on the goal, automatically picking up new types? or verify megatron use the right coefficient type string to call emerging_optimizers, or both.
For the former, it needs emerging_optimizer to commit to interface stability (which it hasn't although we try to avoid breaking changes) and provide a blessed list for public consumption. For the latter, Megatron can also use type check in theory (although reality is hard). There are many alternatives, depend on the level of protection you want to achieve.
There was a problem hiding this comment.
i was thinking more of the former; agreed that type checking sounds hard and not worth the effort
There was a problem hiding this comment.
i think im going to leave this in there for now, but we can refactor once emerging optimizers decided what it wants to do
The Lion class moved in emerging_optimizers 0.2. Gate the import behind an explicit version check so users get a clear error instead of a silent ImportError on older versions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Allow users to select the Newton-Schulz polynomial coefficient set (e.g. simple, quintic, polar_express, aol) via the new --muon-coefficient-type CLI flag. Supported types are discovered dynamically from the installed emerging_optimizers package so that upstream additions are picked up automatically without code changes.