[Question] Details about DAPO training (entropy behavior, clip_high, dynamic sampling, loss design) #11
Hi, thanks again for your great work!
I have a question regarding the DAPO training described in the paper.
In the paper, it is mentioned that using DAPO leads to a noticeable entropy decrease during training. However, from my understanding, one of the key motivations behind DAPO is to prevent entropy from collapsing too quickly and maintain better exploration.
While going through the released training scripts, I did not find implementations corresponding to some components that seem important for DAPO, such as:
- `clip_high` in the policy update
- dynamic filtering of samples
- specific mechanisms to stabilize entropy
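For concreteness, by "dynamic filtering" I mean something along the lines of the dynamic sampling described in the DAPO paper, which drops prompts whose sampled completions all receive the same reward (so the group advantage is zero everywhere). A minimal sketch of what I was looking for in the scripts (the helper name and batch layout are hypothetical, not from this repo):

```python
def keep_prompt(group_rewards):
    """DAPO-style dynamic sampling: drop a prompt when every sampled
    completion got the same reward, since it carries no advantage signal."""
    return len(set(group_rewards)) > 1

# hypothetical batch: prompt id -> rewards of its sampled completions
batch = {"p1": [1.0, 0.0, 1.0], "p2": [1.0, 1.0, 1.0]}
kept = {p: r for p, r in batch.items() if keep_prompt(r)}  # only "p1" survives
```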
So I would like to ask for more detailed clarification on the DAPO setup used in the paper:
1. Was `clip_high` used in your implementation? If so, what value/range was used?
2. Was any form of dynamic sampling applied during training?
3. What loss formulation was used?
   - Is it token-level mean loss (`token_mean`) or sequence-level?
   - Any modification compared to standard PPO-style objectives?
4. Was length normalization or length penalty applied? If yes, how was it incorporated into the reward or loss?
5. How do you interpret the entropy drop observed in your experiments? Is it expected behavior under your configuration, or controlled in some way?
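To make questions 1 and 3 unambiguous, here is a sketch of the decoupled clipping plus token-level mean I have in mind (assumptions: the `clip_low=0.2`, `clip_high=0.28` defaults are the values reported in the DAPO paper, not necessarily what this repo used, and the function signature is mine):

```python
import torch

def dapo_token_loss(log_probs, old_log_probs, advantages, loss_mask,
                    clip_low=0.2, clip_high=0.28):
    """PPO-style surrogate with a decoupled ("Clip-Higher") clip range and a
    token-level mean: average over all unmasked tokens in the batch, rather
    than averaging per sequence first."""
    ratio = torch.exp(log_probs - old_log_probs)
    clipped = torch.clamp(ratio, 1.0 - clip_low, 1.0 + clip_high)
    surrogate = torch.minimum(ratio * advantages, clipped * advantages)
    return -(surrogate * loss_mask).sum() / loss_mask.sum()
```

With `clip_high > clip_low`, positive-advantage tokens can be up-weighted further before clipping kicks in, which is the mechanism the paper credits with slowing entropy collapse, hence my surprise at the reported entropy decrease.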
Understanding these details would be very helpful for reproducing the results and better understanding the role of DAPO in your framework.
Thanks a lot for your time and help!