RFC: Deprecate allow_training_without_logprobs option #528

@corbt

Description

Summary

We're considering removing the allow_training_without_logprobs option from ART. This RFC is to gather community feedback before making this change.

Background

The allow_training_without_logprobs option lets users train on trajectories without generation logprobs from the model. However, this approach has several drawbacks:

  1. Importance sampling requires logprobs for stable training: In our experiments, and across the wider RL community, generation logprobs are essential for importance sampling, which is critical for stable training. Training without them yields less reliable results.

  2. Code complexity: Maintaining this alternative path adds complexity to the codebase and makes it harder to reason about the training flow.

  3. Subtle bugs: The additional code path creates opportunities for subtle bugs. For example, in PR #527 we discovered tool-call tokenization issues that were partly attributable to the complexity this mode introduces.
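For context on point 1: importance sampling corrects for the gap between the policy that generated a trajectory and the policy currently being updated, and that correction is computed directly from the generation logprobs. A minimal sketch of the standard PPO-style per-token ratio and its clipped variant (hypothetical helper for illustration, not ART's actual API):

```python
import math

def importance_weights(behavior_logprobs, current_logprobs, clip_eps=0.2):
    """Per-token importance ratios pi_current / pi_behavior.

    behavior_logprobs: logprobs recorded at generation time (what this
    option allows users to skip); current_logprobs: logprobs of the same
    tokens under the policy being trained. Returns (ratio, clipped_ratio)
    pairs; the clipped value is the PPO-style stabilized weight.
    """
    weights = []
    for lp_old, lp_new in zip(behavior_logprobs, current_logprobs):
        ratio = math.exp(lp_new - lp_old)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        weights.append((ratio, clipped))
    return weights
```

Without the generation-time logprobs, there is no `lp_old` to form the ratio against, so the correction (and its stabilizing clip) cannot be applied.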

Proposal

Remove the allow_training_without_logprobs option entirely, simplifying the codebase and ensuring all users benefit from the more robust training path that uses logprobs.
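If the option is removed, one conventional way to fail fast for existing configs is an explicit guard that rejects the flag with an actionable message. A hypothetical sketch (function name and config shape assumed for illustration, not ART's actual code):

```python
def validate_train_config(config: dict) -> None:
    # Hypothetical guard: reject the removed option with a clear pointer
    # to why it was dropped, rather than silently ignoring it.
    if config.get("allow_training_without_logprobs"):
        raise ValueError(
            "allow_training_without_logprobs has been removed; "
            "training now always requires generation logprobs. "
            "See RFC #528 for details."
        )
```

Failing loudly here is preferable to ignoring the key, since users relying on the old behavior would otherwise discover the change only through degraded training results.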

Request for Feedback

Is anyone in the community actively using allow_training_without_logprobs with good results?

If you're using this option and it's working well for your use case, please let us know:

  • What is your use case?
  • Why do you need to train without logprobs?
  • What results are you seeing?

If we don't hear from users who depend on this feature, we plan to remove it in an upcoming release.


Related: #527
