Add VLM support when passing raw token IDs to vLLM client #5227
qgallouedec merged 20 commits into main
Conversation
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4b927d63bf
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
@codex review

Cursor Bugbot has reviewed your changes and found 1 potential issue.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 07181cbafc
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Context
Part of the series to fix the re-tokenization bug in GRPO multi-turn tool calling (see #5224).
When the model generates a completion in a tool-calling loop, the decoded text is re-tokenized via
`apply_chat_template`, which can produce different token IDs due to BPE merge ambiguities. To fix this, we need a token-in / token-out pipeline: tokenize once, then pass raw token IDs through every subsequent generation call — never decoding and re-tokenizing.

Related PRs in the series:
- `prompts` in vLLM client and server #5225
- `rollout_func` from `_generate_single_turn` to `_generate` #5232
- `_generate_single_turn` #5239
- `_generate_single_turn` #5240

Changes
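As a toy illustration of the bug being fixed (not TRL's or any real model's tokenizer), the sketch below shows how two different token-ID sequences can decode to the same string, so a decode-then-re-encode round trip cannot recover the IDs the model actually generated:

```python
# Toy vocabulary with overlapping merges: ["fo", "o"] and ["foo"] both
# decode to the string "foo". Real BPE tokenizers have the same ambiguity.
VOCAB = {0: "fo", 1: "o", 2: "foo"}
MERGE = {piece: tid for tid, piece in VOCAB.items()}

def decode(ids):
    return "".join(VOCAB[i] for i in ids)

def encode(text):
    # Greedy longest-match encoding, as tokenizers effectively do.
    ids, i = [], 0
    while i < len(text):
        for length in range(len(text) - i, 0, -1):
            piece = text[i:i + length]
            if piece in MERGE:
                ids.append(MERGE[piece])
                i += length
                break
        else:
            raise ValueError("unencodable text")
    return ids

model_output = [0, 1]            # the model generated ["fo", "o"]
text = decode(model_output)      # "foo"
retokenized = encode(text)       # [2] — different IDs for the same text
assert retokenized != model_output
```

This is why the series passes raw token IDs end to end instead of round-tripping through text.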
- `vllm_serve.py`: When `prompts` contains token ID lists, images are now attached as `multi_modal_data` in vLLM's `TokensPrompt` format, matching the existing behavior for text prompts.
- A new `TestVLLMClientServerVLM` test class verifies generation with token IDs + images using `Qwen/Qwen2.5-VL-3B-Instruct`.

Backward compatibility
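A minimal sketch of the payload shape described above. `build_tokens_prompt` is a hypothetical helper, not TRL code; the `prompt_token_ids` and `multi_modal_data` field names follow vLLM's documented `TokensPrompt` format:

```python
def build_tokens_prompt(token_ids, images=None):
    """Build a TokensPrompt-shaped dict: raw token IDs, with images
    attached under multi_modal_data when provided (hypothetical helper)."""
    prompt = {"prompt_token_ids": token_ids}
    if images:  # per-prompt image list, e.g. [img] or None
        prompt["multi_modal_data"] = {"image": images}
    return prompt

# Text-free, token-in prompt with one attached image:
p = build_tokens_prompt([101, 2023, 102], images=["<PIL image>"])
assert p["prompt_token_ids"] == [101, 2023, 102]
assert p["multi_modal_data"] == {"image": ["<PIL image>"]}
```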
Fully backward compatible. No changes to existing call sites or API signatures.
Tests
Note
Medium Risk
Changes the `/generate` request/response contract for `images` and alters server-side prompt construction; mis-shaped image payloads could break existing callers or produce incorrect multimodal inputs.

Overview
Adds VLM support for token-in/token-out generation by allowing `VLLMClient.generate()` and the `trl vllm-serve` `/generate/` endpoint to accept per-prompt image batches (now `images: list[list[...] | None]`) and attach them as `multi_modal_data` even when prompts are raw token ID lists.

Updates `OnlineDPOTrainer`'s vLLM server path to wrap each image into a singleton list (or `None`) to match the new API, and adds a new slow vision-gated integration test (`TestVLLMClientServerVLM`) validating token-ID generation with multiple/mixed images using `Qwen/Qwen2.5-VL-3B-Instruct`.

Written by Cursor Bugbot for commit d5e1906.
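The singleton-wrapping adaptation described above can be sketched as follows. `wrap_images` is a hypothetical helper name, not the trainer's actual code:

```python
def wrap_images(images):
    """Adapt one-image-per-prompt data to the per-prompt-batch API:
    wrap each image in a singleton list, preserving None entries."""
    return [[img] if img is not None else None for img in images]

# Mixed batch: prompts with and without an attached image.
assert wrap_images(["imgA", None, "imgB"]) == [["imgA"], None, ["imgB"]]
```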