Hi, thanks for the great work!
I have a few quick questions about the training dynamics of SIM-CoT. First, is the auxiliary decoder jointly trained with the base model, or is it kept fixed during training? It seems gradients flow through it, but I want to confirm whether its parameters are updated.
Second, I have a question about the choice of the number of implicit steps K and its relation to step-level alignment. In practice, different QA samples (and datasets) may require different numbers of reasoning steps; for example, harder math problems typically involve more reasoning steps. In this case, for each latent token z, which step should I use to supervise it? I have the same question about the curriculum strategy: when K is gradually increased during training, how is consistent alignment between latent tokens and reasoning steps maintained?
Thanks again for the insightful work!