Move rollout_func from _generate_single_turn to _generate#5232
Move rollout_func from _generate_single_turn to _generate#5232qgallouedec merged 27 commits intomainfrom
rollout_func from _generate_single_turn to _generate#5232Conversation
rollout_func from _generate_single_turn to _generate`rollout_func from _generate_single_turn to _generate
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f3f6a5df71
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d138be76d3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| missing_keys_list = sorted(missing_keys) | ||
| raise ValueError(f"rollout_func must return keys {missing_keys_list} in its output dict.") | ||
| extra_fields = {k: v for k, v in output.items() if k not in required_keys} | ||
| prompt_ids, completion_ids, logprobs = output["prompt_ids"], output["completion_ids"], output["logprobs"] |
There was a problem hiding this comment.
Reject rollout_func with tools on non-vLLM backends
After moving rollout_func handling into _generate, the first turn can return non-None logprobs from rollout_func, but post-tool turns in _tool_call_loop now always use _generate_single_turn, which returns post_tool_logprobs=None for regular and paged Transformers generation. In that case, _tool_call_loop still takes the if logprobs is not None branch and later indexes post_tool_logprobs[idx], causing a runtime crash when a tool call is present. This affects runs that set both rollout_func and tools without vLLM, so the combination should be blocked or post_tool_logprobs should be normalized before use.
Useful? React with 👍 / 👎.
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
Context
Context
Part of the series to fix the re-tokenization bug in GRPO multi-turn tool calling (see #5224).
When the model generates a completion in a tool-calling loop, the decoded text is re-tokenized via
apply_chat_template, which can produce different token IDs due to BPE merge ambiguities. To fix this, we need a token-in / token-out pipeline: tokenize once, then pass raw token IDs through every subsequent generation call — never decoding and re-tokenizing.promptsin vLLM client and server #5225_generate_single_turn#5239_generate_single_turn#5240This PR is a preparatory refactor that moves
rollout_funchandling out of_generate_single_turnand into_generate.Changes
rollout_funcdispatch from_generate_single_turnto_generate: Therollout_funccode path (including vLLM weight sync, key validation, and extra fields extraction) now lives in_generate, before_generate_single_turnis called.Why
_generate_single_turncurrently mixes two concerns: custom rollout dispatch and generation backend dispatch (vLLM / transformers_paged / regular). Movingrollout_funcup to_generateseparates these responsibilities and makes_generate_single_turnpurely about generation.This separation is needed for the next PR, which will introduce a centralized tokenization step in
_generate_single_turn. By havingrollout_funchandled at the_generatelevel, the tokenization refactor in_generate_single_turnwon't interfere with custom rollout logic (which manages its own tokenization).Backward compatibility
rollout_funcis called with the same arguments and its output is handled identically for the single-turn case.Note: in the multi-turn tool-calling path (
_tool_call_loop),_generate_single_turnis called for re-generation after tool execution. Previously, this would hit therollout_funcearly return, which was already incorrect —rollout_funcgenerates from scratch and has no awareness of the tool-augmented conversation. After this PR,_tool_call_loopcorrectly bypassesrollout_funcand goes straight to the generation backends. In practice, combiningrollout_funcwithtoolswas not a supported use case.Note
Medium Risk
Touches GRPO generation control flow and changes when custom rollouts are invoked, which can affect tool-calling/multi-turn behavior and vLLM sync semantics; covered by updated unit tests but still impacts a core training path.
Overview
Moves custom
rollout_funchandling (vLLM weight sync, required key validation, and extra-field extraction) out of_generate_single_turnand into_generate, making_generate_single_turnonly responsible for backend generation.Updates rollout-dispatch unit tests to call
_generateand expands the mocked trainer state/accelerator fields to match the new_generateexecution path (including mappingenv_maskto the returned tool mask).Written by Cursor Bugbot for commit 0558dc9. This will update automatically on new commits. Configure here.