Questions on SIM-CoT training dynamics (decoder training, K selection, and step alignment)

Hi, thanks for the great work!

I have a few quick questions about the training dynamics of SIM-CoT. First, is the auxiliary decoder trained jointly with the base model, or is it kept frozen during training? It seems that gradients flow through it, but I would like to confirm whether its parameters are also updated.
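To make the distinction concrete, here is a toy scalar sketch of the two options. It is hypothetical and not taken from the SIM-CoT code: the names `train_step`, `w_base`, `w_dec` and `freeze_decoder` are made up for illustration. In both cases the gradient flows *through* the decoder back to the base weight; the only difference is whether the decoder weight itself is updated.

```python
def train_step(w_base, w_dec, x, target, lr=0.1, freeze_decoder=True):
    """One SGD step on a toy model (hypothetical, for illustration only).

    z = w_base * x      -> "latent" produced by the base model
    y = w_dec * z       -> output of the auxiliary decoder
    loss = (y - target) ** 2
    """
    z = w_base * x
    y = w_dec * z
    dy = 2.0 * (y - target)   # dL/dy
    g_dec = dy * z            # dL/dw_dec (only used if the decoder is trained)
    dz = dy * w_dec           # gradient flows through the decoder to the latent
    g_base = dz * x           # ...and on to the base model's weight
    new_base = w_base - lr * g_base
    # Frozen decoder: its weight is left unchanged even though it passed gradient back.
    new_dec = w_dec if freeze_decoder else w_dec - lr * g_dec
    return new_base, new_dec


print(train_step(1.0, 2.0, x=1.0, target=0.0, lr=0.125, freeze_decoder=True))   # (0.0, 2.0)
print(train_step(1.0, 2.0, x=1.0, target=0.0, lr=0.125, freeze_decoder=False))  # (0.0, 1.5)
```

In both runs the base weight receives the same update; only the decoder weight differs, which is exactly what the question is trying to pin down.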

Second, regarding the choice of the number of implicit steps `K` and its relation to step-level alignment: in practice, different QA samples (and datasets) may require different numbers of reasoning steps; for example, harder math problems typically involve longer reasoning chains. In this case, which step should I use to supervise each latent token `z`? I have the same question about the curriculum strategy: when `K` is gradually increased during training, how is consistent alignment between latent tokens and reasoning steps maintained?
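To illustrate the ambiguity, here is one possible alignment policy, written as a hypothetical sketch (it is not claimed to be what SIM-CoT does; `align_latents_to_steps` is an invented name): latent `z_k` is supervised by step `k`, any leftover steps are merged into the last latent's target, and surplus latents get no step-level loss.

```python
from typing import List, Optional


def align_latents_to_steps(steps: List[str], K: int) -> List[Optional[str]]:
    """Hypothetical policy mapping K latent tokens to reasoning-step targets.

    - z_k (0-indexed) is supervised by step k when that step exists.
    - If there are more steps than K, the remaining steps are merged into
      the target of the last latent.
    - If there are fewer steps than K, the extra latents get None
      (i.e. no step-level loss is applied to them).
    """
    if K <= 0:
        raise ValueError("K must be positive")
    targets: List[Optional[str]] = [steps[k] if k < len(steps) else None for k in range(K)]
    if len(steps) > K:
        targets[K - 1] = " ".join(steps[K - 1:])
    return targets


# Curriculum where K grows from 1 to 3 on a sample with three steps:
for K in (1, 2, 3):
    print(K, align_latents_to_steps(["s1", "s2", "s3"], K))
# 1 ['s1 s2 s3']
# 2 ['s1', 's2 s3']
# 3 ['s1', 's2', 's3']
```

Under this particular policy, the target of a latent changes exactly when it stops being the last latent (e.g. `z_1` goes from `'s1 s2 s3'` to `'s1'`), which is the kind of inconsistency the curriculum question is about.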

Thanks again for the insightful work!

Questions on SIM-CoT training dynamics (decoder training, K selection, and step alignment) #12