Skip to content

Move rollout_func from _generate_single_turn to _generate#5232

Merged
qgallouedec merged 27 commits intomainfrom
move-rollout-func
Mar 10, 2026
Merged

Move rollout_func from _generate_single_turn to _generate#5232
qgallouedec merged 27 commits intomainfrom
move-rollout-func

Conversation

@qgallouedec
Copy link
Member

@qgallouedec qgallouedec commented Mar 6, 2026

Context

Context

Part of the series to fix the re-tokenization bug in GRPO multi-turn tool calling (see #5224).

When the model generates a completion in a tool-calling loop, the decoded text is re-tokenized via apply_chat_template, which can produce different token IDs due to BPE merge ambiguities. To fix this, we need a token-in / token-out pipeline: tokenize once, then pass raw token IDs through every subsequent generation call — never decoding and re-tokenizing.

This PR is a preparatory refactor that moves rollout_func handling out of _generate_single_turn and into _generate.

Changes

  • Move rollout_func dispatch from _generate_single_turn to _generate: The rollout_func code path (including vLLM weight sync, key validation, and extra fields extraction) now lives in _generate, before _generate_single_turn is called.

Why

_generate_single_turn currently mixes two concerns: custom rollout dispatch and generation backend dispatch (vLLM / transformers_paged / regular). Moving rollout_func up to _generate separates these responsibilities and makes _generate_single_turn purely about generation.

This separation is needed for the next PR, which will introduce a centralized tokenization step in _generate_single_turn. By having rollout_func handled at the _generate level, the tokenization refactor in _generate_single_turn won't interfere with custom rollout logic (which manages its own tokenization).

Backward compatibility

rollout_func is called with the same arguments and its output is handled identically for the single-turn case.

Note: in the multi-turn tool-calling path (_tool_call_loop), _generate_single_turn is called for re-generation after tool execution. Previously, this would hit the rollout_func early return, which was already incorrect — rollout_func generates from scratch and has no awareness of the tool-augmented conversation. After this PR, _tool_call_loop correctly bypasses rollout_func and goes straight to the generation backends. In practice, combining rollout_func with tools was not a supported use case.


Note

Medium Risk
Touches GRPO generation control flow and changes when custom rollouts are invoked, which can affect tool-calling/multi-turn behavior and vLLM sync semantics; covered by updated unit tests but still impacts a core training path.

Overview
Moves custom rollout_func handling (vLLM weight sync, required key validation, and extra-field extraction) out of _generate_single_turn and into _generate, making _generate_single_turn only responsible for backend generation.

Updates rollout-dispatch unit tests to call _generate and expands the mocked trainer state/accelerator fields to match the new _generate execution path (including mapping env_mask to the returned tool mask).

Written by Cursor Bugbot for commit 0558dc9. This will update automatically on new commits. Configure here.

@qgallouedec qgallouedec changed the title Move rollout_func from _generate_single_turn to _generate` Move rollout_func from _generate_single_turn to _generate Mar 6, 2026
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f3f6a5df71

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@qgallouedec
Copy link
Member Author

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d138be76d3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

missing_keys_list = sorted(missing_keys)
raise ValueError(f"rollout_func must return keys {missing_keys_list} in its output dict.")
extra_fields = {k: v for k, v in output.items() if k not in required_keys}
prompt_ids, completion_ids, logprobs = output["prompt_ids"], output["completion_ids"], output["logprobs"]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject rollout_func with tools on non-vLLM backends

After moving rollout_func handling into _generate, the first turn can return non-None logprobs from rollout_func, but post-tool turns in _tool_call_loop now always use _generate_single_turn, which returns post_tool_logprobs=None for regular and paged Transformers generation. In that case, _tool_call_loop still takes the if logprobs is not None branch and later indexes post_tool_logprobs[idx], causing a runtime crash when a tool call is present. This affects runs that set both rollout_func and tools without vLLM, so the combination should be blocked or post_tool_logprobs should be normalized before use.

Useful? React with 👍 / 👎.

Copy link
Member

@albertvillanova albertvillanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Base automatically changed from vllm-support-image-with-raw-token to main March 9, 2026 23:12
@qgallouedec qgallouedec merged commit f3b3705 into main Mar 10, 2026
14 checks passed
@qgallouedec qgallouedec deleted the move-rollout-func branch March 10, 2026 00:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants