Questions on SIM-CoT training dynamics (decoder training, K selection, and step alignment) #12

@wnn2000

Hi, thanks for the great work!

I have a few quick questions about the training dynamics of SIM-CoT. First, is the auxiliary decoder jointly trained with the base model, or is it kept fixed during training? It seems gradients flow through it, but I want to confirm whether its parameters are updated.

Second, I have a question about the choice of the number of implicit steps K and its relation to step-level alignment. In practice, different QA samples (and datasets) may require different numbers of reasoning steps; for example, harder math problems typically involve more reasoning steps. In that case, for each latent token z, which step should I use to supervise it? I have the same question about the curriculum strategy: when K is gradually increased during training, how is consistent alignment between latent tokens and reasoning steps maintained?
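
To make the alignment question concrete, here is a minimal sketch of one scheme that could be imagined. It is purely an assumption for illustration and is not taken from the SIM-CoT paper or code: latent token i is supervised by reasoning step i, surplus steps are merged into the target of the last latent, and latents without a matching step get an empty target. The function name `align_steps_to_latents` and the merge/pad rule are hypothetical.

```python
from typing import List


def align_steps_to_latents(steps: List[str], k: int) -> List[str]:
    """Hypothetical alignment of reasoning steps to K latent tokens.

    Assumption (not the authors' confirmed method):
      - latent i is supervised by step i for i < K - 1;
      - the last latent is supervised by all remaining steps, joined;
      - if there are fewer steps than latents, the extra latents get
        an empty target, i.e. no step-level supervision.
    """
    if k <= 0:
        raise ValueError("k must be positive")

    # One-to-one targets for the first K - 1 latents (or fewer if steps run out).
    targets = steps[: k - 1]

    # Everything that is left over is merged into a single target.
    targets.append(" ".join(steps[k - 1:]))

    # Pad with empty targets when the sample has fewer steps than latents.
    targets += [""] * (k - len(targets))
    return targets


if __name__ == "__main__":
    sample = ["3 * 4 = 12", "12 + 5 = 17", "17 / 2 = 8.5"]
    for k in (1, 2, 3, 4):
        print(k, align_steps_to_latents(sample, k))
```

Under this assumed scheme, raising K during a curriculum changes which step the later latents are supervised by, which is the consistency concern raised above.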

Thanks again for the insightful work!
