Skip to content

Support CANS orthogonalization in Muon.#140

Open
mihara-bot wants to merge 6 commits intoNVIDIA-NeMo:mainfrom
mihara-bot:feat/muon-cans
Open

Support CANS orthogonalization in Muon.#140
mihara-bot wants to merge 6 commits intoNVIDIA-NeMo:mainfrom
mihara-bot:feat/muon-cans

Conversation

@mihara-bot
Copy link

This adds coefficient_type=\"cans\" Newton-Schulz coefficients (and tests) so the optimizer can match CANS-based Muon implementations.

CANS (http://arxiv.org/abs/2506.10935) is an algorithm very similar to PolarExpress, which can also accelerate LLM pre-training.

This adds `coefficient_type=\"cans\"` Newton-Schulz coefficients (and tests) so the optimizer can match CANS-based Muon implementations.

Made-with: Cursor
@copy-pr-bot
Copy link

copy-pr-bot bot commented Mar 20, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Copy link
Author

@mihara-bot mihara-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the appropriate coefficients after human check.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 20, 2026

Greptile Summary

This PR adds "cans" as a new built-in coefficient type for Newton-Schulz orthogonalization across the Muon, PolarGrad, and Scion optimizers, enabling users to match CANS-based Muon implementations from the arxiv paper. The CANS coefficient set uses a Remez + adaptive interval approach and, like "polar_express", uses repeat_last iteration mode (the final coefficient is repeated for any steps beyond the 5-entry list).

Key changes:

  • muon_utils.py: Adds 5-entry CANS coefficient set, extends NSCoeffT, and updates the iter_mode selection to include "cans" in the repeat_last branch.
  • Optimizer docstrings (muon.py, polargrad.py, scion.py): Updated to list "cans" as a valid coefficient_type option.
  • tests/test_muon_utils.py: Two new reference-matching tests verify the per-step math for 5-step CANS and the repeat_last semantics for 9-step CANS. The 9-step reference correctly extends the 5-element coefficient list by 4 repeats to reach 9 total entries, consistent with the repeat_last implementation.

The functional implementation is correct. The one new concern identified is that the "custom" coefficient type has no way to opt into repeat_last mode — a user providing custom CANS-style coefficients would silently get cycle behavior instead. Several documentation gaps and test coverage improvements were flagged in prior review threads and remain open.

Confidence Score: 3/5

  • Functionally correct implementation, but open documentation gaps and missing orthogonality quality test warrant follow-up before merging.
  • The core CANS integration (coefficients, repeat_last mode wiring, NSCoeffT type) is correct and the new tests verify both 5-step and 9-step behavior against a reference implementation. However, several issues from prior review threads remain unresolved: "aol" is still absent from optimizer class docstrings, no SVD orthogonality quality benchmark exists for CANS, and the coefficient generation script is not present in the repository. A new concern — that "custom" mode cannot opt into repeat_last — also warrants attention.
  • emerging_optimizers/orthogonalized_optimizers/muon_utils.py (iter_mode not configurable for custom type); tests/test_muon_utils.py (no SVD quality benchmark for CANS); all three optimizer files for the "aol" docstring omission.

Important Files Changed

Filename Overview
emerging_optimizers/orthogonalized_optimizers/muon_utils.py CANS coefficients added correctly to _COEFFICIENT_SETS; iter_mode logic updated to use repeat_last for CANS; "cans" documented in newton_schulz docstring. Minor issue: "custom" type still hardcodes iter_mode to "cycle" with no override, limiting users who want CANS-like repeat-last behavior with custom coefficients.
tests/test_muon_utils.py Two new tests added: test_cans_close_to_reference (5-step, verifies per-step math) and test_get_cans_9steps_close_to_reference (9-step, correctly extends the 5-element list by 4 repeats to total 9 entries, matching repeat_last semantics). Both tests are structurally correct. Missing orthogonalization quality benchmark (e.g. CANS vs quintic on ill-conditioned matrix) noted in prior threads.
emerging_optimizers/orthogonalized_optimizers/muon.py Docstring updated to list "cans" as a valid coefficient_type; "aol" remains unlisted (pre-existing omission noted in prior threads). No functional changes.
emerging_optimizers/orthogonalized_optimizers/polargrad.py Docstring updated to list "cans"; "aol" still omitted (same as muon.py). No functional changes.
emerging_optimizers/orthogonalized_optimizers/scion.py Docstring updated to list "cans"; "aol" still omitted (same as muon.py). No functional changes.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["newton_schulz(x, steps, coefficient_type)"] --> B{coefficient_type}
    B -->|"simple / quintic / aol / polar_express / cans"| C["coefficient_sets = _COEFFICIENT_SETS[coefficient_type]"]
    B -->|"custom"| D{custom_coefficient_sets provided?}
    D -->|No| E["raise ValueError"]
    D -->|Yes| F["coefficient_sets = custom_coefficient_sets"]
    C --> G{coefficient_type in polar_express, cans?}
    F --> H["iter_mode = 'cycle' (hardcoded — no override)"]
    G -->|Yes| I["iter_mode = 'repeat_last'"]
    G -->|No| J["iter_mode = 'cycle'"]
    I --> K["get_coefficient_iterator(steps, coefficient_sets, repeat_last)"]
    J --> K
    H --> K
    K --> L["for a, b, c in coeff_iter: X = ns_step(X, a, b, c)"]
    L --> M["return orthogonalized X"]

    style H fill:#ffe0b2,stroke:#e65100
    style I fill:#c8e6c9,stroke:#2e7d32
Loading

Last reviewed commit: "Merge branch 'main' ..."

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Xinlin Zhuang <1147220090@qq.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Xinlin Zhuang <1147220090@qq.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Xinlin Zhuang <1147220090@qq.com>
Copy link
Contributor

@skyw skyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor changes needed. otherwise LGTM.

Instruction of DCO is in CONTRIBUTING.md

Comment on lines +194 to +196
iter_mode: CoeffIterMode = (
"repeat_last" if coefficient_type in ("polar_express", "cans") else "cycle"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 iter_mode not configurable for "custom" coefficient type

With this PR, repeat_last is now the correct mode for two built-in types ("polar_express" and "cans"). However, the iter_mode for "custom" is still hardcoded to "cycle" via the else branch:

iter_mode: CoeffIterMode = (
    "repeat_last" if coefficient_type in ("polar_express", "cans") else "cycle"
)

A user who provides custom coefficients designed for CANS-style repeat_last behavior (e.g., a longer custom Remez set with a stable last step) has no way to opt into repeat_last. They will silently get cycle instead, which wraps back to the first coefficient — giving completely wrong results for their use case. Now that repeat_last is an established, supported pattern, exposing it for "custom" would make the API consistent:

def newton_schulz(
    ...
    custom_coefficient_sets: list[tuple[float, float, float]] | None = None,
    custom_iter_mode: CoeffIterMode = "cycle",  # new parameter
    ...
):
    ...
    if coefficient_type == "custom":
        ...
        iter_mode = custom_iter_mode
    else:
        iter_mode = "repeat_last" if coefficient_type in ("polar_express", "cans") else "cycle"

Alternatively, at minimum, add a note to the docstring warning that "custom" always cycles and does not support repeat_last.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants