Question about interchangeability #43
Open
Description
First, I would like to express my gratitude to the researchers who transparently shared their outstanding work.
I have a question about the interchangeability discussed in Section 3.5.1 of the technical report. The report states that the best results were observed when Muon was used as the optimizer in both pre-training and SFT. Was the same pattern observed in the RL stage that follows SFT?