Allow reward functions to log extra columns and scalar metrics #5233
manueldeprada wants to merge 13 commits into huggingface:main from
Conversation
Thanks @manueldeprada! I understand the need: there's currently no simple way to log extra metrics from within a reward function without modifying the codebase. The closest workaround today is to use separate reward functions. That said, I'm not fully sold on the approach. A reward function should ideally be passive: it takes inputs, returns a scalar, and has no side effects. Passing in logging callables breaks that. I don't have a better alternative in mind though, and from the user's perspective this is probably the most intuitive and simple solution, so I'd consider merging it. Open to thoughts from others: @albertvillanova @AmineDiro.
@codex review
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 71270089a6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Thanks for considering it @qgallouedec!! I agree, it's not elegant, but it solves the need with minimal changes. Let me know if you need anything else from my side. In the meantime, it's working well for me! P.S. On a related note, I've been having some issues with Trackio's local DB–Hub dataset synchronization. For some runs, the data never gets written to the Hub dataset, so I end up losing it. I'm still investigating and not sure yet whether it's something on my side, with Trackio, or with TRL. I thought I'd mention it here in case anyone has noticed something similar.
Answering on trackio: yes, it's something that we encountered as well, but it should be fixed if you upgrade trackio. We've been working on trackio robustness.
That worked! The latest trackio requires the latest huggingface-hub, which requires transformers 5.0, but that works fine with source TRL as long as I don't use judges. After updating all of these, trackio seems to persist data correctly. Awesome!! Another bit you may be interested in: I sometimes get 429 rate-limited for a few minutes while watching https://manueldeprada-thinking.hf.space It's not a big issue, but I guess you might want to check that the default simple trackio experience is as smooth as possible. PS2: cloned TRL works well with sdpa-powered FA4 with latest vllm 😄
TLDR
Adds `log_extra` and `log_metric` hooks to reward functions in GRPO and RLOO so custom reward functions can log extra completion columns and scalar metrics without accessing private trainer state.
While working with custom reward functions (e.g. for calibration training), I found myself using `log_completions=True`, which logs samples to trackio (super useful!). However, I also needed to log additional columns to the completions table (like extracted answers and gold labels) and scalar metrics (like accuracy and ECE) directly from within the reward function. Currently there's no clean way to do this in GRPO without accessing private state, to the best of my knowledge.
This PR adds two public methods to `_BaseTrainer` and passes them to reward functions via `reward_kwargs`:

- `log_extra(column, values)`: adds extra columns to the completions table (parquet + wandb/trackio)
- `log_metric(name, value)`: logs scalar metrics through the existing `_metrics` mechanism, so they show up as plots alongside KL, entropy, etc.

Both are fully backwards compatible, as existing reward functions will just absorb them into `**kwargs`. It is not super orthodox to pass functions like this, but it is simple and adds very little code. I also added them to RLOO, as it is similar, for completeness.

Example usage
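The original example code was not captured in this extract; here is a minimal sketch of what a reward function using these hooks could look like. The function name, answer-extraction logic, and data columns are illustrative, not from the PR — only the `log_extra`/`log_metric` hook names and the `reward_kwargs` delivery mechanism come from the description above:

```python
# Hypothetical reward function using the log_extra / log_metric hooks
# proposed in this PR. Both callables arrive via reward_kwargs, so older
# reward functions that only declare **kwargs keep working unchanged.
def accuracy_reward(completions, gold, log_extra=None, log_metric=None, **kwargs):
    # Toy extraction: take the last whitespace-separated token as the answer.
    extracted = [c.strip().split()[-1] if c.strip() else "" for c in completions]
    rewards = [float(e == g) for e, g in zip(extracted, gold)]
    if log_extra is not None:
        # Extra columns alongside prompt/completion in the completions table.
        log_extra("extracted_answer", extracted)
        log_extra("gold_label", gold)
    if log_metric is not None:
        # Scalar metric plotted alongside KL, entropy, etc.
        log_metric("accuracy", sum(rewards) / len(rewards))
    return rewards
```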
Docs, tests, and signatures were Clauded, so lmk if you’d be interested in merging it and I can put more effort into polishing the PR.
Shoutout to the Bern guys, Andy, @lewtun, Joel and @lvwerra. Hope everything is going well; it's also fun to find myself using TRL from the other side 🙂
Note
Medium Risk
Touches core GRPO/RLOO training logging paths and distributed gather/metric aggregation, so incorrect ordering or buffering could misattribute logged values across ranks; changes are additive and covered by new tests.
Overview
Enables custom reward functions in `GRPOTrainer` and `RLOOTrainer` to emit extra per-completion columns and custom scalar metrics via new `log_extra`/`log_metric` callbacks passed through `reward_kwargs`. Trainers now buffer and flush these logs safely in distributed runs (sorted-key gathering/averaging), include extras in the completions parquet/table output, and add tests + docs examples demonstrating the new hooks.
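The "sorted-key gathering/averaging" point can be sketched as follows. This is a simplified illustration, not the PR's actual implementation: the `MetricBuffer` class and the `all_gather` parameter are invented for this example, standing in for the trainer's `_metrics` buffer and a `torch.distributed`-style collective:

```python
# Illustrative sketch of deterministic metric buffering across ranks.
# Iterating keys in sorted order means every rank issues collectives in
# the same sequence, which is required when all_gather is a real
# collective op -- mismatched call order across ranks would deadlock or
# misattribute values.
from collections import defaultdict

class MetricBuffer:
    def __init__(self):
        self._buf = defaultdict(list)

    def log_metric(self, name, value):
        # Buffer values until the next logging step.
        self._buf[name].append(float(value))

    def flush(self, all_gather=lambda vals: vals):
        # Sorted keys -> identical iteration order on every rank.
        out = {}
        for name in sorted(self._buf):
            gathered = all_gather(self._buf[name])
            out[name] = sum(gathered) / len(gathered)
        self._buf.clear()
        return out
```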
Written by Cursor Bugbot for commit 9bbcb82. This will update automatically on new commits. Configure here.