Support CANS orthogonalization in Muon. #140
mihara-bot wants to merge 6 commits into NVIDIA-NeMo:main
This adds `coefficient_type="cans"` Newton-Schulz coefficients (and tests) so the optimizer can match CANS-based Muon implementations.
Made-with: Cursor
mihara-bot left a comment:
I added the appropriate coefficients after a human check.
Greptile Summary
This PR adds `coefficient_type="cans"` Newton-Schulz coefficients, with tests.
The functional implementation is correct. The one new concern identified is that the `iter_mode` for the `"custom"` coefficient type remains hardcoded to `"cycle"`.
Confidence Score: 3/5
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["newton_schulz(x, steps, coefficient_type)"] --> B{coefficient_type}
B -->|"simple / quintic / aol / polar_express / cans"| C["coefficient_sets = _COEFFICIENT_SETS[coefficient_type]"]
B -->|"custom"| D{custom_coefficient_sets provided?}
D -->|No| E["raise ValueError"]
D -->|Yes| F["coefficient_sets = custom_coefficient_sets"]
C --> G{coefficient_type in polar_express, cans?}
F --> H["iter_mode = 'cycle' (hardcoded — no override)"]
G -->|Yes| I["iter_mode = 'repeat_last'"]
G -->|No| J["iter_mode = 'cycle'"]
I --> K["get_coefficient_iterator(steps, coefficient_sets, repeat_last)"]
J --> K
H --> K
K --> L["for a, b, c in coeff_iter: X = ns_step(X, a, b, c)"]
L --> M["return orthogonalized X"]
style H fill:#ffe0b2,stroke:#e65100
style I fill:#c8e6c9,stroke:#2e7d32
Last reviewed commit: "Merge branch 'main' ..."
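The two iteration modes named in the flowchart can be illustrated with a minimal sketch. The helper name `iter_coefficients` and the placeholder coefficient values are hypothetical; this is not the library's `get_coefficient_iterator`, whose implementation is not shown in this PR page. It only assumes that `"cycle"` wraps around to the first coefficient set and that `"repeat_last"` keeps reusing the final set once the list is exhausted.

```python
from itertools import cycle, islice
from typing import Iterator, List, Literal, Tuple

Coeffs = Tuple[float, float, float]
IterMode = Literal["cycle", "repeat_last"]


def iter_coefficients(
    steps: int, coefficient_sets: List[Coeffs], mode: IterMode
) -> Iterator[Coeffs]:
    """Yield exactly `steps` coefficient triples (hypothetical helper)."""
    if not coefficient_sets:
        raise ValueError("coefficient_sets must not be empty")
    if mode == "cycle":
        # Wrap around to the first set after the last one is used.
        yield from islice(cycle(coefficient_sets), steps)
    elif mode == "repeat_last":
        # Walk the list once, then keep yielding the final set.
        last = len(coefficient_sets) - 1
        for i in range(steps):
            yield coefficient_sets[min(i, last)]
    else:
        raise ValueError(f"unknown iteration mode: {mode!r}")


# Placeholder triples chosen only to make the ordering visible;
# they are NOT real Newton-Schulz coefficients.
SETS: List[Coeffs] = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]

cycled = [a for a, _, _ in iter_coefficients(5, SETS, "cycle")]
repeated = [a for a, _, _ in iter_coefficients(5, SETS, "repeat_last")]
```

With three sets and five steps, the two modes agree for the first three steps and diverge afterwards, which is why the mode matters whenever `steps` exceeds the number of coefficient sets.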
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: Xinlin Zhuang <1147220090@qq.com>
    iter_mode: CoeffIterMode = (
        "repeat_last" if coefficient_type in ("polar_express", "cans") else "cycle"
    )
iter_mode not configurable for "custom" coefficient type
With this PR, repeat_last is now the correct mode for two built-in types ("polar_express" and "cans"). However, the iter_mode for "custom" is still hardcoded to "cycle" via the else branch:
    iter_mode: CoeffIterMode = (
        "repeat_last" if coefficient_type in ("polar_express", "cans") else "cycle"
    )

A user who provides custom coefficients designed for CANS-style repeat_last behavior (e.g., a longer custom Remez set with a stable last step) has no way to opt into repeat_last. They will silently get cycle instead, which wraps back to the first coefficient, giving completely wrong results for their use case. Now that repeat_last is an established, supported pattern, exposing it for "custom" would make the API consistent:
    def newton_schulz(
        ...
        custom_coefficient_sets: list[tuple[float, float, float]] | None = None,
        custom_iter_mode: CoeffIterMode = "cycle",  # new parameter
        ...
    ):
        ...
        if coefficient_type == "custom":
            ...
            iter_mode = custom_iter_mode
        else:
            iter_mode = "repeat_last" if coefficient_type in ("polar_express", "cans") else "cycle"

Alternatively, at minimum, add a note to the docstring warning that "custom" always cycles and does not support repeat_last.
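For reference, the mode-selection rule proposed in this comment can be isolated into a small, testable function. This is only a sketch: `resolve_iter_mode` is a hypothetical helper that does not exist in the PR, and `custom_iter_mode` is the parameter suggested above, not an existing argument of `newton_schulz`.

```python
from typing import Literal

IterMode = Literal["cycle", "repeat_last"]

# Built-in types that should keep reusing their final coefficient set,
# as stated in the review comment.
_REPEAT_LAST_TYPES = ("polar_express", "cans")


def resolve_iter_mode(
    coefficient_type: str, custom_iter_mode: IterMode = "cycle"
) -> IterMode:
    """Pick the iteration mode for a coefficient type (hypothetical helper)."""
    if coefficient_type == "custom":
        if custom_iter_mode not in ("cycle", "repeat_last"):
            raise ValueError(f"unknown custom_iter_mode: {custom_iter_mode!r}")
        # Proposed behavior: the caller decides; the default stays "cycle"
        # so existing callers see no change.
        return custom_iter_mode
    return "repeat_last" if coefficient_type in _REPEAT_LAST_TYPES else "cycle"
```

Defaulting `custom_iter_mode` to `"cycle"` keeps the current behavior for existing callers, so the change would be opt-in.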
CANS (http://arxiv.org/abs/2506.10935) is an algorithm very similar to PolarExpress, and it can also accelerate LLM pre-training.
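To see why the choice of coefficients matters, it helps to look at what one step does to a single singular value. Assuming a step of the form X ← aX + b(XXᵀ)X + c(XXᵀ)²X (the usual Muon-style form; the exact `ns_step` in this repository is not shown here), each singular value s is mapped to a·s + b·s³ + c·s⁵. The sketch below iterates that scalar map with the classic cubic Newton-Schulz coefficients (1.5, -0.5, 0), which are used purely for illustration and are not the CANS coefficients from the paper. The function names are hypothetical.

```python
from typing import List, Tuple

Coeffs = Tuple[float, float, float]


def apply_ns_polynomial(s: float, a: float, b: float, c: float) -> float:
    """Scalar effect of one step on a singular value s."""
    return a * s + b * s**3 + c * s**5


def iterate_singular_value(
    s: float, coefficient_sets: List[Coeffs], steps: int
) -> float:
    """Apply `steps` updates, reusing the last set once the list runs out."""
    last = len(coefficient_sets) - 1
    for i in range(steps):
        a, b, c = coefficient_sets[min(i, last)]
        s = apply_ns_polynomial(s, a, b, c)
    return s


# Classic cubic Newton-Schulz coefficients (illustrative, not CANS).
CLASSIC: List[Coeffs] = [(1.5, -0.5, 0.0)]

# A small singular value is pushed toward 1 as the steps accumulate.
s_final = iterate_singular_value(0.1, CLASSIC, steps=12)
```

Orthogonalization corresponds to driving every singular value to 1; per-step coefficient schedules aim to get there in fewer steps than a fixed polynomial.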