
docs: clarify PPO entropy metrics in PPO trainer docs#5289

Open
biefan wants to merge 1 commit into huggingface:main from biefan:docs/clarify-ppo-entropy-metrics-2023

Conversation


@biefan biefan commented Mar 14, 2026

Summary

Clarify the difference between `objective/entropy` and `policy/entropy_avg` in the PPO trainer docs.

What changed

  • Updated the `objective/entropy` description to match the rollout-time computation, `(-logprobs).sum(1).mean()`.
  • Updated the `policy/entropy_avg` description to match the optimization-time entropy computed from the logits.
  • Added a short note explaining why these two metrics are expected to differ.
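To make the distinction concrete, here is a minimal PyTorch sketch of the two quantities. This is illustrative only, not TRL's actual implementation; the tensor shapes and variable names are assumptions for the example.

```python
import torch

torch.manual_seed(0)

# Toy shapes: batch of 2 sequences, 5 response tokens, vocab of 10.
logits = torch.randn(2, 5, 10)

# Optimization-time entropy (policy/entropy_avg-style): the full categorical
# entropy of the token distribution, computed directly from the logits and
# averaged over all positions.
logprobs_full = torch.log_softmax(logits, dim=-1)
probs = logprobs_full.exp()
entropy_from_logits = (-probs * logprobs_full).sum(-1).mean()

# Rollout-time proxy (objective/entropy-style): the negative log-prob of the
# tokens actually sampled, summed over the response length, then averaged
# over the batch.
sampled = torch.distributions.Categorical(logits=logits).sample()
logprobs = logprobs_full.gather(-1, sampled.unsqueeze(-1)).squeeze(-1)
entropy_proxy = (-logprobs).sum(1).mean()

print(float(entropy_from_logits), float(entropy_proxy))
```

The proxy is a single-sample Monte Carlo estimate of the entropy *summed* over response tokens, while the logits-based metric is the exact per-token entropy *averaged* over positions, so the two numbers are expected to differ in both scale and variance even on the same batch.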

Why

Issue #2023 points out that the two entropy metrics have very similar wording, which makes them hard to interpret when debugging PPO runs.

Fixes #2023


Note

Low Risk
Low-risk, documentation-only change that updates metric wording and adds a brief clarification note; no runtime or API behavior is modified.

Overview
Clarifies the PPO trainer metric docs by rewriting the descriptions of `objective/entropy` (a rollout-time proxy computed from `-logprobs`) and `policy/entropy_avg` (the optimization-time categorical entropy computed from the logits).

Adds an explicit note explaining that these metrics are measured at different phases (rollouts vs. PPO optimization) and therefore are expected to differ.

Written by Cursor Bugbot for commit 8367f1e.


Development

Successfully merging this pull request may close these issues.

Clarification of 2 Entropies in PPOv2Trainer Documentation
