
[GRPO/RLOO] Unify tokenization across all generation backends in _generate_single_turn#5239

Merged
qgallouedec merged 58 commits into main from unify-tokenization-generate on Mar 10, 2026

Conversation

@qgallouedec
Member

@qgallouedec qgallouedec commented Mar 7, 2026

Context

Part of the series to fix the re-tokenization bug in GRPO multi-turn tool calling (see #5224).

When the model generates a completion in a tool-calling loop, the decoded text is re-tokenized via apply_chat_template, which can produce different token IDs due to BPE merge ambiguities. To fix this, we need a token-in / token-out pipeline: tokenize once, then pass raw token IDs through every subsequent generation call — never decoding and re-tokenizing.
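The re-tokenization hazard described above can be shown with a toy greedy tokenizer (an illustrative sketch, not TRL or any real tokenizer): two different ID sequences can decode to the same text, so decode-then-re-encode does not round-trip.

```python
# Toy illustration of BPE merge ambiguity: a greedy tokenizer prefers the
# merged token "ab", so re-encoding the decoded text changes the IDs.
VOCAB = {"ab": 2, "a": 0, "b": 1}
ID_TO_TOKEN = {v: k for k, v in VOCAB.items()}

def decode(ids):
    return "".join(ID_TO_TOKEN[i] for i in ids)

def encode(text):
    """Greedy longest-match tokenization."""
    ids, i = [], 0
    while i < len(text):
        for length in (2, 1):  # try the merged token first
            piece = text[i : i + length]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                i += length
                break
    return ids

model_output = [0, 1]                  # model emitted "a" then "b"
round_trip = encode(decode(model_output))
print(model_output, "->", round_trip)  # [0, 1] -> [2]: the IDs changed
```

A token-in / token-out pipeline avoids this entirely by never re-entering `encode` with decoded text.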

The previous PR moved tokenization out of VLLMGeneration.generate so that vLLM always receives raw token IDs. However, the paged and regular generation backends still performed their own independent tokenization. This PR unifies tokenization across all three backends (vLLM, paged, regular) by tokenizing once at the top of _generate_single_turn.

Changes

  • GRPO _generate_single_turn(): Move tokenization to the top of the method, before the backend dispatch. For conversational prompts, use apply_chat_template(tokenize=True, return_dict=True) and extract images + multimodal fields. For text prompts, use processing_class(text=prompts). All three backends (vLLM, paged, regular) now consume the same prompt_ids list. Remove the per-backend tokenization that previously existed in the paged and regular paths.
  • RLOO _generate_single_turn(): Same unification applied.
  • Regular generation path: Build left-padded tensors from the shared prompt_ids using pad(). Convert multimodal fields (numpy arrays from apply_chat_template) to tensors before passing to model.generate().
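The left-padding step in the regular path can be sketched in plain Python (illustrative only; TRL's actual `pad()` helper operates on tensors and has a different signature):

```python
def left_pad(sequences, pad_id=0):
    """Left-pad variable-length ID lists into a rectangular batch,
    returning (padded_ids, attention_mask)."""
    max_len = max(len(s) for s in sequences)
    padded, mask = [], []
    for s in sequences:
        n_pad = max_len - len(s)
        padded.append([pad_id] * n_pad + list(s))
        mask.append([0] * n_pad + [1] * len(s))
    return padded, mask

prompt_ids = [[5, 6, 7], [8, 9]]  # one shared tokenization result
padded_ids, attention_mask = left_pad(prompt_ids)
print(padded_ids)       # [[5, 6, 7], [0, 8, 9]]
print(attention_mask)   # [[1, 1, 1], [0, 1, 1]]
```

Left padding matters here because decoder-only models generate from the right edge of the batch, so real tokens must be flush right.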

Why

Previously, each generation backend tokenized prompts independently:

  • vLLM path: tokenized in _generate_single_turn (added in PR4)
  • Paged path: called apply_chat_template or processing_class internally
  • Regular path: called apply_chat_template(return_tensors="pt", padding=True) or processing_class(return_tensors="pt") (we convert to tensor manually now)

This unification allows the next PR to move tokenization out of _generate_single_turn entirely and into the trainers' generate method.

Backward compatibility

No user-facing API changes. The only behavioral difference is that the regular and paged paths now use the same tokenization call as the vLLM path, which should produce identical results.


Note

Medium Risk
Changes core generation input preparation across three backends and introduces new tensor conversion/padding behavior (including multimodal fields), which could subtly alter generation outputs or break on processor edge cases.

Overview
Unifies _generate_single_turn in GRPOTrainer and RLOOTrainer to tokenize prompts once and feed the same prompt_ids into all generation backends (vLLM, paged generate_batch, and standard model.generate), removing backend-specific tokenization.

For conversational/VLM inputs, it now uses apply_chat_template(..., padding=True) then unpads via attention_mask, carries through multimodal fields (e.g. pixel_values, image_grid_thw), and in the standard generation path manually left-pads prompt_ids and converts multimodal outputs to tensors (adding numpy as a dependency) before calling generate.
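The "unpad via attention_mask" step mentioned above can be sketched as follows (a minimal illustration with plain lists; the real code operates on the batched output of `apply_chat_template`):

```python
# Recover per-example prompt_ids from a padded batch using the attention mask.
input_ids = [[0, 5, 6, 7], [0, 0, 8, 9]]        # left-padded batch
attention_mask = [[0, 1, 1, 1], [0, 0, 1, 1]]

prompt_ids = [
    [tok for tok, m in zip(row, mask) if m]      # keep only real tokens
    for row, mask in zip(input_ids, attention_mask)
]
print(prompt_ids)  # [[5, 6, 7], [8, 9]]
```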

Written by Cursor Bugbot for commit c098b90.

@qgallouedec
Member Author

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3ab04b05d6


generate_inputs = {"input_ids": padded_ids, "attention_mask": attention_mask}
# For VLMs, include multimodal fields as tensors (pixel_values, image_grid_thw, etc.)
for k, v in multimodal_fields.items():
    generate_inputs[k] = torch.tensor(np.array(v)) if not isinstance(v, torch.Tensor) else v


P1: Pad sequence-length multimodal fields before tensor conversion

The regular-generation path now converts every multimodal_fields entry with torch.tensor(np.array(v)), but apply_chat_template(..., tokenize=True, return_dict=True) can return per-token lists (for example token_type_ids/mm_token_type_ids) whose lengths differ across prompts in a batch. In that case np.array(v) is ragged and raises ValueError, so generation fails before model.generate; previously this was avoided because return_tensors="pt", padding=True produced padded tensors automatically. This needs sequence-aware padding for those fields rather than direct array conversion.
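One way to address this, sketched below under the assumption that the ragged fields are per-token integer lists such as `token_type_ids` (the helper name is illustrative, not TRL code), is to right-pad each example to the batch maximum before building the array:

```python
import numpy as np

def pad_ragged(field, pad_value=0):
    """Right-pad per-example lists of differing lengths into one
    rectangular array, so np.array() never sees a ragged input."""
    rows = [np.asarray(v) for v in field]
    max_len = max(len(r) for r in rows)
    out = np.full((len(rows), max_len), pad_value, dtype=rows[0].dtype)
    for i, r in enumerate(rows):
        out[i, : len(r)] = r
    return out

token_type_ids = [[0, 0, 1], [0, 1]]          # ragged across the batch
print(pad_ragged(token_type_ids).tolist())    # [[0, 0, 1], [0, 1, 0]]
```

Whether right or left padding is correct for a given field depends on how the model consumes it, so the pad side would need to match the padding applied to `input_ids`.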


Member

@albertvillanova albertvillanova left a comment


Thanks. Just a question below.

qgallouedec and others added 4 commits March 10, 2026 09:06
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
qgallouedec and others added 7 commits March 10, 2026 18:14
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Base automatically changed from vllm-generate-with-token-ids to main March 10, 2026 18:48

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


@qgallouedec qgallouedec merged commit d5aaaea into main Mar 10, 2026
14 checks passed
@qgallouedec qgallouedec deleted the unify-tokenization-generate branch March 10, 2026 20:20