Summary
We're considering removing the `allow_training_without_logprobs` option from ART. This RFC gathers community feedback before we make that change.
Background
The `allow_training_without_logprobs` option allows training to proceed without generation logprobs from the model. However, this approach has several drawbacks:
- Importance sampling requires logprobs for stable training: in our experiments and across the wider RL community, generation-time logprobs are essential for importance sampling, which is critical for stable training. Training without them leads to less reliable results (see the sketch after this list).
- Code complexity: maintaining this alternative path adds complexity to the codebase and makes the training flow harder to reason about.
- Subtle bugs: the additional code path creates opportunities for subtle bugs. For example, in PR #527 we discovered tool-call tokenization issues that were partially enabled by this mode's complexity.
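
To make the importance sampling point concrete, here is a minimal sketch of a PPO-style clipped objective that depends on generation-time logprobs. It is illustrative only, not ART's actual training code; every name in it (`importance_weighted_loss`, `gen_logprobs`, `clip_eps`, and so on) is hypothetical:

```python
import torch

def importance_weighted_loss(
    new_logprobs: torch.Tensor,  # logprobs of the sampled tokens under the current policy
    gen_logprobs: torch.Tensor,  # logprobs recorded at generation time (what this option skips)
    advantages: torch.Tensor,    # per-token advantage estimates
    clip_eps: float = 0.2,
) -> torch.Tensor:
    # Importance sampling ratio between the current policy and the
    # policy that actually generated the tokens.
    ratio = torch.exp(new_logprobs - gen_logprobs)

    # PPO-style clipping keeps updates near the generating policy. This
    # correction is only meaningful when gen_logprobs are the real
    # sampling-time values; a proxy silently degrades it.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```

Without recorded `gen_logprobs`, the ratio has no trustworthy denominator, so the clipping step has nothing reliable to anchor to; this is the instability described above.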
Proposal
Remove the `allow_training_without_logprobs` option entirely, simplifying the codebase and ensuring all users benefit from the more robust training path that uses logprobs.
Request for Feedback
Is anyone in the community actively using `allow_training_without_logprobs` with good results?
If you're using this option and it's working well for your use case, please let us know:
- What is your use case?
- Why do you need to train without logprobs?
- What results are you seeing?
If we don't hear from users who depend on this feature, we plan to remove it in an upcoming release.
Related: #527