Move `rollout_func` from `_generate_single_turn` to `_generate` by qgallouedec · Pull Request #5232 · huggingface/trl

qgallouedec · 2026-03-06T17:01:23Z

Context

Part of the series to fix the re-tokenization bug in GRPO multi-turn tool calling (see #5224).

When the model generates a completion in a tool-calling loop, the decoded text is re-tokenized via `apply_chat_template`, which can produce different token IDs due to BPE merge ambiguities. To fix this, we need a token-in / token-out pipeline: tokenize once, then pass raw token IDs through every subsequent generation call, never decoding and re-tokenizing.
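The mismatch can be reproduced without a real tokenizer. The following is a minimal sketch that uses a hypothetical three-token vocabulary and greedy longest-match encoding as a stand-in for BPE; it is not how `apply_chat_template` is implemented, and the names `VOCAB`, `encode`, and `decode` are illustrative only.

```python
# Toy vocabulary (hypothetical): "ab" exists as a merged token alongside "a" and "b".
VOCAB = {"a": 0, "b": 1, "ab": 2}
ID_TO_TOKEN = {token_id: token for token, token_id in VOCAB.items()}


def encode(text: str) -> list[int]:
    """Greedy longest-match encoding over the toy vocabulary."""
    ids = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate piece first, then shrink.
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                pos = end
                break
        else:
            raise ValueError(f"cannot tokenize {text[pos:]!r}")
    return ids


def decode(ids: list[int]) -> str:
    """Concatenate the token strings back into text."""
    return "".join(ID_TO_TOKEN[i] for i in ids)


prompt_ids = encode("b")  # tokenized once, up front
sampled_ids = [0, 1]      # the model happened to sample "a" then "b"

# Decode-then-re-encode path: the text is identical, the token IDs are not.
retokenized_ids = encode(decode(sampled_ids))

# Token-in / token-out path: concatenate IDs and never go back through text.
conversation_ids = prompt_ids + sampled_ids
```

In this toy setup `retokenized_ids` is `[2]` while the model actually sampled `[0, 1]`, so any per-token quantity recorded at sampling time would no longer line up with the re-tokenized sequence. Concatenating IDs avoids the round trip entirely.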

This PR is a preparatory refactor that moves `rollout_func` handling out of `_generate_single_turn` and into `_generate`.

Changes

Move `rollout_func` dispatch from `_generate_single_turn` to `_generate`: the `rollout_func` code path (including vLLM weight sync, key validation, and extra-field extraction) now lives in `_generate`, before `_generate_single_turn` is called.
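As a rough illustration of the resulting control flow, here is a minimal sketch built around a hypothetical `ToyTrainer`; it is not the code in `trl/trainer/grpo_trainer.py`. The required keys and the error message are inferred from the diff excerpt quoted in the review on this page. The `rollout_func(prompts)` signature, the returned tuple, and the fake backend are simplifying assumptions, and vLLM weight sync is omitted.

```python
from typing import Callable, Optional

# Keys inferred from the diff excerpt; treated here as the required set.
REQUIRED_KEYS = {"prompt_ids", "completion_ids", "logprobs"}


class ToyTrainer:
    """Hypothetical stand-in; only the dispatch shape mirrors the PR description."""

    def __init__(self, rollout_func: Optional[Callable] = None):
        self.rollout_func = rollout_func

    def _generate_single_turn(self, prompts):
        # Backend generation only. A real trainer would call vLLM or transformers here.
        prompt_ids = [[1, 2] for _ in prompts]
        completion_ids = [[3] for _ in prompts]
        return prompt_ids, completion_ids, None, {}

    def _generate(self, prompts):
        if self.rollout_func is not None:
            # Custom rollout path, handled before any backend dispatch.
            output = self.rollout_func(prompts)
            missing_keys = REQUIRED_KEYS.difference(output)
            if missing_keys:
                missing_keys_list = sorted(missing_keys)
                raise ValueError(f"rollout_func must return keys {missing_keys_list} in its output dict.")
            extra_fields = {k: v for k, v in output.items() if k not in REQUIRED_KEYS}
            prompt_ids, completion_ids, logprobs = (
                output["prompt_ids"],
                output["completion_ids"],
                output["logprobs"],
            )
        else:
            prompt_ids, completion_ids, logprobs, extra_fields = self._generate_single_turn(prompts)
        return prompt_ids, completion_ids, logprobs, extra_fields
```

With this shape, `_generate_single_turn` never needs to know whether a custom rollout exists, which is the separation the PR is after.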

Why

`_generate_single_turn` currently mixes two concerns: custom rollout dispatch and generation backend dispatch (vLLM / transformers_paged / regular). Moving `rollout_func` up to `_generate` separates these responsibilities and makes `_generate_single_turn` purely about generation.

This separation is needed for the next PR, which will introduce a centralized tokenization step in `_generate_single_turn`. By having `rollout_func` handled at the `_generate` level, the tokenization refactor in `_generate_single_turn` won't interfere with custom rollout logic (which manages its own tokenization).

Backward compatibility

`rollout_func` is called with the same arguments and its output is handled identically for the single-turn case.

Note: in the multi-turn tool-calling path (`_tool_call_loop`), `_generate_single_turn` is called for re-generation after tool execution. Previously, this would hit the `rollout_func` early return, which was already incorrect: `rollout_func` generates from scratch and has no awareness of the tool-augmented conversation. After this PR, `_tool_call_loop` correctly bypasses `rollout_func` and goes straight to the generation backends. In practice, combining `rollout_func` with tools was not a supported use case.
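The routing change for the tool-calling path can be pictured with a small call log. This is a sketch under stated assumptions: `ToyTrainer`, the `num_tool_turns` parameter, and the string completions are hypothetical, and the real `_tool_call_loop` does far more than re-generate.

```python
class ToyTrainer:
    """Hypothetical stand-in that records which path served each generation call."""

    def __init__(self, rollout_func=None):
        self.rollout_func = rollout_func
        self.calls = []

    def _generate_single_turn(self, prompts):
        # Backend generation only; no rollout_func check lives here any more.
        self.calls.append("backend")
        return [f"{p} -> backend completion" for p in prompts]

    def _tool_call_loop(self, prompts, num_tool_turns):
        # Re-generation after tool execution goes straight to the backend.
        completions = []
        for _ in range(num_tool_turns):
            completions = self._generate_single_turn(prompts)
        return completions

    def _generate(self, prompts, num_tool_turns=0):
        if self.rollout_func is not None:
            self.calls.append("rollout_func")
            completions = self.rollout_func(prompts)
        else:
            completions = self._generate_single_turn(prompts)
        if num_tool_turns:
            completions = self._tool_call_loop(prompts, num_tool_turns)
        return completions


trainer = ToyTrainer(rollout_func=lambda prompts: [f"{p} -> custom rollout" for p in prompts])
final = trainer._generate(["question"], num_tool_turns=2)
```

After this run `trainer.calls` is `["rollout_func", "backend", "backend"]`: the custom rollout serves only the first turn, and every post-tool turn is served by the backend.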

Note

Medium Risk
Touches GRPO generation control flow and changes when custom rollouts are invoked, which can affect tool-calling/multi-turn behavior and vLLM sync semantics; covered by updated unit tests but still impacts a core training path.

Overview
Moves custom `rollout_func` handling (vLLM weight sync, required key validation, and extra-field extraction) out of `_generate_single_turn` and into `_generate`, making `_generate_single_turn` only responsible for backend generation.

Updates rollout-dispatch unit tests to call `_generate` and expands the mocked trainer state/accelerator fields to match the new `_generate` execution path (including mapping `env_mask` to the returned tool mask).

Written by Cursor Bugbot for commit 0558dc9. This will update automatically on new commits.


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f3f6a5df71

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

trl/trainer/grpo_trainer.py


qgallouedec · 2026-03-07T00:58:06Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d138be76d3


chatgpt-codex-connector · 2026-03-07T01:02:03Z

trl/trainer/grpo_trainer.py

+                missing_keys_list = sorted(missing_keys)
+                raise ValueError(f"rollout_func must return keys {missing_keys_list} in its output dict.")
+            extra_fields = {k: v for k, v in output.items() if k not in required_keys}
+            prompt_ids, completion_ids, logprobs = output["prompt_ids"], output["completion_ids"], output["logprobs"]


Reject `rollout_func` with tools on non-vLLM backends

After moving `rollout_func` handling into `_generate`, the first turn can return non-None `logprobs` from `rollout_func`, but post-tool turns in `_tool_call_loop` now always use `_generate_single_turn`, which returns `post_tool_logprobs=None` for regular and paged Transformers generation. In that case, `_tool_call_loop` still takes the `if logprobs is not None` branch and later indexes `post_tool_logprobs[idx]`, causing a runtime crash when a tool call is present. This affects runs that set both `rollout_func` and `tools` without vLLM, so the combination should be blocked or `post_tool_logprobs` should be normalized before use.
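The two remedies named in this suggestion could look roughly like the sketch below. Both helpers (`check_rollout_tools_compat` and `normalize_post_tool_logprobs`) are hypothetical names invented for illustration; this page does not show how, or whether, the merged code addresses the case.

```python
from typing import Callable, Optional, Sequence


def check_rollout_tools_compat(
    rollout_func: Optional[Callable],
    tools: Optional[Sequence[Callable]],
    use_vllm: bool,
) -> None:
    """Option 1: fail early instead of crashing later when indexing missing logprobs."""
    if rollout_func is not None and tools and not use_vllm:
        raise ValueError(
            "rollout_func combined with tools requires a backend that returns logprobs "
            "(vLLM in this sketch)."
        )


def normalize_post_tool_logprobs(post_tool_logprobs, num_samples: int):
    """Option 2: replace a missing logprobs result with per-sample None placeholders."""
    if post_tool_logprobs is None:
        return [None] * num_samples
    return post_tool_logprobs
```

The first option rejects the configuration at setup time; the second keeps `post_tool_logprobs[idx]` indexable, but callers would still need to handle the `None` entries.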


albertvillanova

Thanks.


HuggingFaceDocBuilderDev · 2026-03-09T17:47:21Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

qgallouedec and others added 10 commits March 5, 2026 19:10

support prompts or token IDs in VLLMClient and update API request handling

f10285e

test

7d2bb67

consistency

3b356ac

fix

82c4508

another fix

3ea2fcf

fix docstring

445f4ba

Add support for multi-modal inputs in VLLMClient and vllm_serve

8c6c88d

Merge branch 'main' into vllm-accept-token-ids

f617b2d

Merge branch 'main' into vllm-accept-token-ids

eaffd67

Move rollout_func from _generate_single_turn to _generate`

f3f6a5d

qgallouedec changed the title ~~Move rollout_func from _generate_single_turn to _generate`~~ Move rollout_func from _generate_single_turn to _generate Mar 6, 2026

qgallouedec mentioned this pull request Mar 6, 2026

Add support for raw ids in prompts in vLLM client and server #5225

Merged

chatgpt-codex-connector bot reviewed Mar 6, 2026

trl/trainer/grpo_trainer.py (resolved)

qgallouedec mentioned this pull request Mar 6, 2026

Add VLM support when passing raw token IDs to vLLM client #5227

Merged

qgallouedec and others added 9 commits March 6, 2026 17:09

fix style

d417543

support multi-image

4b927d6

style

029fc1f

Merge branch 'vllm-accept-token-ids' into vllm-support-image-with-raw-token

20b4039

Merge branch 'vllm-support-image-with-raw-token' into move-rollout-func

b8e3912

Fix handling of images in OnlineDPOTrainer to ensure proper structure for None values

07181cb

Merge branch 'main' into vllm-accept-token-ids

6ff1e56

Merge branch 'vllm-accept-token-ids' into vllm-support-image-with-raw-token

9f340e4

Merge branch 'vllm-support-image-with-raw-token' into move-rollout-func

d138be7

chatgpt-codex-connector bot reviewed Mar 7, 2026


qgallouedec mentioned this pull request Mar 7, 2026

[GRPO/RLOO] Tokenize before vLLM generation call #5238

Merged

qgallouedec requested review from AmineDiro and albertvillanova March 7, 2026 03:07

This was referenced Mar 7, 2026

[GRPO/RLOO] Unify tokenization across all generation backends in _generate_single_turn #5239

Merged

[GRPO/RLOO] Extract tokenize prompts from _generate_single_turn #5240

Merged

This was referenced Mar 7, 2026

[GRPO] Fix re-tokenization bug in tool-calling loop by concatenating token IDs #5242

Open

Re-tokenization bug in GRPO multi-turn tool calling #5224

Open

albertvillanova approved these changes Mar 9, 2026


qgallouedec and others added 3 commits March 9, 2026 17:23

revert doc modif

f033e63

Merge branch 'vllm-accept-token-ids' into vllm-support-image-with-raw-token

5a1f609

Merge branch 'vllm-support-image-with-raw-token' into move-rollout-func

1eb3540

qgallouedec and others added 4 commits March 9, 2026 11:52

Merge branch 'main' into vllm-support-image-with-raw-token

d3f7971

simplify multimodal

319d52a

Merge branch 'main' into vllm-support-image-with-raw-token

d5e1906

Merge branch 'vllm-support-image-with-raw-token' into move-rollout-func

4ccadcf

Base automatically changed from vllm-support-image-with-raw-token to main March 9, 2026 23:12

Merge branch 'main' into move-rollout-func

0558dc9

qgallouedec merged commit f3b3705 into main Mar 10, 2026
14 checks passed

qgallouedec deleted the move-rollout-func branch March 10, 2026 00:03

Move `rollout_func` from `_generate_single_turn` to `_generate` #5232
qgallouedec merged 27 commits into main from move-rollout-func


3 participants
