[GRPO] Fix re-tokenization bug in tool-calling loop by concatenating token IDs#5242
qgallouedec wants to merge 101 commits into main

Context
Part of the series to fix the re-tokenization bug in GRPO multi-turn tool calling (see #5224).
closes #5224
closes #5144
When the model generates a completion in a tool-calling loop, the decoded text is re-tokenized via `apply_chat_template`, which can produce different token IDs due to BPE merge ambiguities. To fix this, we need a token-in / token-out pipeline: tokenize once, then pass raw token IDs through every subsequent generation call, never decoding and re-tokenizing.

Earlier PRs in the series:

- `prompts` in vLLM client and server (#5225)
- `rollout_func` from `_generate_single_turn` to `_generate` (#5232)
- `_generate_single_turn` (#5239)
- `_generate_single_turn` (#5240)

This is the final PR in the series. It eliminates the re-tokenization in the tool-calling loop, the actual source of the bug.
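To see why decode-then-re-tokenize is unsafe, here is a toy, self-contained illustration (not TRL's actual tokenizer): with a greedy longest-match vocabulary, re-tokenizing the concatenated text can merge across the segment boundary and change the token sequence, while direct token concatenation cannot.

```python
# Toy illustration of the re-tokenization bug (NOT TRL's tokenizer):
# greedy longest-match tokenization is not compositional, so
# decode-then-re-tokenize can change tokens at segment boundaries.
VOCAB = sorted(["ab", "bc", "a", "b", "c"], key=len, reverse=True)

def tokenize(text: str) -> list[str]:
    """Greedy longest-match tokenizer over VOCAB."""
    tokens, i = [], 0
    while i < len(text):
        for tok in VOCAB:
            if text.startswith(tok, i):
                tokens.append(tok)
                i += len(tok)
                break
        else:
            raise ValueError(f"untokenizable text at position {i}")
    return tokens

completion = tokenize("a")    # model output, kept as tokens: ["a"]
tool_result = tokenize("bc")  # freshly tokenized tool text: ["bc"]

# Token-in / token-out: concatenate the token sequences directly.
concat = completion + tool_result        # ["a", "bc"]

# Buggy path: decode to text, then re-tokenize the whole string.
retokenized = tokenize("".join(concat))  # ["ab", "c"]

assert concat != retokenized  # the merge crossed the segment boundary
```

Real BPE tokenizers exhibit the same non-compositionality around merge boundaries, which is why token IDs must be preserved rather than round-tripped through text.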
Changes
- New `_get_tool_suffix_ids(tool_messages)` method: tokenizes only the tool result portion by diffing a minimal dummy conversation (2 messages vs 3 messages). This avoids re-tokenizing the full conversation history.
- `_tool_call_loop`: instead of re-tokenizing `prompt + completion + tool_results` via `apply_chat_template`, builds the token sequence by concatenation: `prompt_ids + completion_ids + tool_suffix_ids`. The original prompt and completion token IDs are preserved exactly as they were; only the new tool result tokens are freshly tokenized.
- Removes the `_tokenize_prompts` call in the tool loop.

The bug and the fix
Previously, after a tool call:
- `prompt + assistant + tool_results` was re-tokenized via `apply_chat_template`

Now:

- `prompt_ids` and `completion_ids` are kept as-is (never decoded and re-tokenized)
- the full sequence is built as `prompt_ids + completion_ids + suffix_ids`

Backward compatibility
No user-facing API changes.
`_get_tool_suffix_ids` and `_tool_call_loop` are internal methods.

Note
Medium Risk
Touches GRPO/RLOO generation paths and tool-calling control flow; mistakes could change generated token sequences or break tool-loop behavior, impacting training stability.
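As a rough sketch of the control flow at stake (all names here are illustrative stand-ins, not TRL's actual API; the real logic lives in `_tool_call_loop`), the fixed loop keeps token IDs end to end:

```python
# Hedged sketch of a token-in / token-out tool-calling loop. The
# callables (generate, parse_tool_calls, run_tools, get_tool_suffix_ids)
# are hypothetical stand-ins for the trainer's internals.
def tool_call_loop(generate, parse_tool_calls, run_tools,
                   get_tool_suffix_ids, prompt_ids, max_turns=4):
    sequence = list(prompt_ids)  # prompt token IDs, tokenized exactly once
    for _ in range(max_turns):
        completion_ids = generate(sequence)  # token IDs in, token IDs out
        sequence += completion_ids           # kept exactly as generated
        tool_messages = run_tools(parse_tool_calls(completion_ids))
        if not tool_messages:
            break  # no tool call: the loop ends
        # Only the new tool-result text is tokenized; nothing already in
        # `sequence` is ever decoded and re-tokenized.
        sequence += get_tool_suffix_ids(tool_messages)
    return sequence
```

Because `sequence` only ever grows by appending token IDs, the completion tokens used for training are bit-for-bit the ones the model generated.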
Overview
Fixes GRPO multi-turn tool calling to be token-in/token-out. The tool loop no longer decodes and re-tokenizes `prompt + completion + tool_results`; it now concatenates existing `prompt_ids`/`completion_ids` with freshly tokenized tool-result suffix IDs to preserve the exact original completion tokens.

Adds internal helper `_get_tool_suffix_ids()` to tokenize only the tool-result formatting, updates `_tool_call_loop()` to carry `images`/`multimodal_fields` through regeneration without re-running `_tokenize_prompts`, and simplifies `_generate_single_turn()` return values (dropping redundant `prompt_ids` returns) in both `grpo_trainer.py` and `rloo_trainer.py`.

Written by Cursor Bugbot for commit 10708ca.
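The suffix-diffing idea the overview describes can be sketched as follows, using a stand-in tokenizer with a trivial chat template (the dummy messages, the template, and the helper name are illustrative assumptions, not TRL's implementation):

```python
# Sketch of isolating tool-result tokens by diffing two renderings of a
# minimal dummy conversation (illustrative, not TRL's implementation).
class ToyTokenizer:
    """Stand-in chat tokenizer: renders messages, one token ID per char."""
    def apply_chat_template(self, messages, tokenize=True):
        text = "".join(f"<{m['role']}>{m['content']}</{m['role']}>"
                       for m in messages)
        return [ord(ch) for ch in text] if tokenize else text

def get_tool_suffix_ids(tokenizer, tool_messages):
    # Minimal dummy conversation standing in for prompt + assistant turn.
    base = [{"role": "user", "content": "x"},
            {"role": "assistant", "content": "y"}]
    base_ids = tokenizer.apply_chat_template(base)
    full_ids = tokenizer.apply_chat_template(base + tool_messages)
    # Under a template where the base render is a strict prefix of the
    # full render, the token-ID diff is exactly the tool-result suffix.
    assert full_ids[:len(base_ids)] == base_ids
    return full_ids[len(base_ids):]

suffix = get_tool_suffix_ids(ToyTokenizer(),
                             [{"role": "tool", "content": "42"}])
# Under this toy template, suffix decodes to "<tool>42</tool>".
```

The key property being relied on is that adding a trailing message only appends tokens under the chat template, so a fixed 2-message dummy prefix can be subtracted off without touching the real conversation history.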