
Add VLM support when passing raw token IDs to vLLM client#5227

Merged
qgallouedec merged 20 commits into main from vllm-support-image-with-raw-token on Mar 9, 2026
Conversation

@qgallouedec (Member) commented Mar 5, 2026

Context

Part of the series to fix the re-tokenization bug in GRPO multi-turn tool calling (see #5224).

When the model generates a completion in a tool-calling loop, the decoded text is re-tokenized via apply_chat_template, which can produce different token IDs due to BPE merge ambiguities. To fix this, we need a token-in / token-out pipeline: tokenize once, then pass raw token IDs through every subsequent generation call — never decoding and re-tokenizing.
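The failure mode above can be shown without any model at all. The toy tokenizer below (illustrative only, not TRL or vLLM code) has a merged token "ab", so two different token sequences decode to the same text; a decode-then-re-encode round trip silently changes the IDs, which is exactly why the pipeline must stay token-in / token-out:

```python
# Toy illustration of BPE merge ambiguity: decoding and re-tokenizing
# can change token IDs even though the text is identical.

class ToyBPE:
    # vocab: 0 -> "a", 1 -> "b", 2 -> "ab" (a merged token)
    vocab = {0: "a", 1: "b", 2: "ab"}

    def decode(self, ids):
        return "".join(self.vocab[i] for i in ids)

    def encode(self, text):
        # greedy longest-match: always prefers the merged "ab" token
        ids, i = [], 0
        while i < len(text):
            if text[i : i + 2] == "ab":
                ids.append(2)
                i += 2
            else:
                ids.append({"a": 0, "b": 1}[text[i]])
                i += 1
        return ids

tok = ToyBPE()
generated = [0, 1]  # model emitted "a" and "b" as separate tokens
round_trip = tok.encode(tok.decode(generated))
print(generated, round_trip)  # [0, 1] vs [2]: same text, different token IDs
```

Passing `generated` forward unchanged avoids the divergence; re-encoding produces `[2]` and the model's KV history no longer matches its own output.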

Changes

  • vllm_serve.py: When prompts contains token ID lists, images are now attached as multi_modal_data in vLLM's TokensPrompt format, matching the existing behavior for text prompts.
  • Tests: Add TestVLLMClientServerVLM test class that verifies generation with token IDs + images using Qwen/Qwen2.5-VL-3B-Instruct.
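A minimal sketch of the server-side change described above, using plain dicts in the shape of vLLM's TokensPrompt (the function name and argument shapes here are assumptions for illustration, not the actual vllm_serve.py code):

```python
# Sketch: build vLLM TokensPrompt-shaped dicts from raw token IDs,
# attaching per-prompt images as multi_modal_data when present.

def build_prompts(prompts, images=None):
    """prompts: list of token-ID lists; images: per-prompt list of images, or None."""
    built = []
    for i, token_ids in enumerate(prompts):
        prompt = {"prompt_token_ids": token_ids}  # TokensPrompt's required key
        if images is not None and images[i] is not None:
            # vLLM groups multimodal inputs by modality; images go under "image"
            prompt["multi_modal_data"] = {"image": images[i]}
        built.append(prompt)
    return built

# Mixed batch: first prompt has one image, second is text-only.
print(build_prompts([[1, 2, 3], [4, 5]], images=[["<PIL image>"], None]))
```

Prompts without images simply omit the `multi_modal_data` key, which keeps the text-only path unchanged.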

Backward compatibility

Fully backward compatible. No changes to existing call sites or API signatures.

Tests

$ pytest -v tests/test_vllm_client_server.py
========================================================== test session starts ===========================================================
platform linux -- Python 3.13.11, pytest-9.0.2, pluggy-1.6.0 -- /fsx/qgallouedec/miniconda3/envs/trl/bin/python3.13
cachedir: .pytest_cache
rootdir: /fsx/qgallouedec/trl
configfile: pyproject.toml
plugins: rerunfailures-15.1, anyio-4.12.1, xdist-3.8.0, datadir-1.8.0, cov-7.0.0
collected 42 items                                                                                                                       

tests/test_vllm_client_server.py::TestChunkList::test_even_split PASSED                                                            [  2%]
tests/test_vllm_client_server.py::TestChunkList::test_uneven_split PASSED                                                          [  4%]
tests/test_vllm_client_server.py::TestChunkList::test_more_chunks_than_elements PASSED                                             [  7%]
tests/test_vllm_client_server.py::TestChunkList::test_n_equals_len PASSED                                                          [  9%]
tests/test_vllm_client_server.py::TestChunkList::test_n_is_1 PASSED                                                                [ 11%]
tests/test_vllm_client_server.py::TestChunkList::test_single_element_list PASSED                                                   [ 14%]
tests/test_vllm_client_server.py::TestChunkList::test_any_dtype PASSED                                                             [ 16%]
tests/test_vllm_client_server.py::TestExtractLogprobs::test_extract_logprobs_sorts_by_rank_and_replaces_nan PASSED                 [ 19%]
tests/test_vllm_client_server.py::TestExtractLogprobs::test_extract_logprobs_returns_none_token_ids_when_logprobs_missing PASSED   [ 21%]
tests/test_vllm_client_server.py::TestVLLMClientServer::test_generate PASSED                                                       [ 23%]
tests/test_vllm_client_server.py::TestVLLMClientServer::test_chat PASSED                                                           [ 26%]
tests/test_vllm_client_server.py::TestVLLMClientServer::test_chat_with_tools PASSED                                                [ 28%]
tests/test_vllm_client_server.py::TestVLLMClientServer::test_generate_with_token_ids PASSED                                        [ 30%]
tests/test_vllm_client_server.py::TestVLLMClientServer::test_generate_with_params PASSED                                           [ 33%]
tests/test_vllm_client_server.py::TestVLLMClientServer::test_update_model_params PASSED                                            [ 35%]
tests/test_vllm_client_server.py::TestVLLMClientServer::test_reset_prefix_cache PASSED                                             [ 38%]
tests/test_vllm_client_server.py::TestVLLMClientServer::test_logprobs_match_with_non_default_sampling XFAIL (Importing `bitsan...) [ 40%]
tests/test_vllm_client_server.py::TestVLLMClientServerBaseURL::test_generate PASSED                                                [ 42%]
tests/test_vllm_client_server.py::TestVLLMClientServerBaseURL::test_chat PASSED                                                    [ 45%]
tests/test_vllm_client_server.py::TestVLLMClientServerBaseURL::test_chat_with_tools PASSED                                         [ 47%]
tests/test_vllm_client_server.py::TestVLLMClientServerBaseURL::test_generate_with_token_ids PASSED                                 [ 50%]
tests/test_vllm_client_server.py::TestVLLMClientServerBaseURL::test_generate_with_params PASSED                                    [ 52%]
tests/test_vllm_client_server.py::TestVLLMClientServerBaseURL::test_update_model_params PASSED                                     [ 54%]
tests/test_vllm_client_server.py::TestVLLMClientServerBaseURL::test_reset_prefix_cache PASSED                                      [ 57%]
tests/test_vllm_client_server.py::TestVLLMClientServerTP::test_generate PASSED                                                     [ 59%]
tests/test_vllm_client_server.py::TestVLLMClientServerTP::test_chat PASSED                                                         [ 61%]
tests/test_vllm_client_server.py::TestVLLMClientServerTP::test_chat_with_tools PASSED                                              [ 64%]
tests/test_vllm_client_server.py::TestVLLMClientServerTP::test_generate_with_token_ids PASSED                                      [ 66%]
tests/test_vllm_client_server.py::TestVLLMClientServerTP::test_generate_with_params PASSED                                         [ 69%]
tests/test_vllm_client_server.py::TestVLLMClientServerTP::test_update_model_params PASSED                                          [ 71%]
tests/test_vllm_client_server.py::TestVLLMClientServerTP::test_reset_prefix_cache PASSED                                           [ 73%]
tests/test_vllm_client_server.py::TestVLLMClientServerDP::test_generate SKIPPED (Skipping DP server test for vLLM>=0.14.0 (PR ...) [ 76%]
tests/test_vllm_client_server.py::TestVLLMClientServerDP::test_chat SKIPPED (Skipping DP server test for vLLM>=0.14.0 (PR vllm...) [ 78%]
tests/test_vllm_client_server.py::TestVLLMClientServerDP::test_chat_with_tools SKIPPED (Skipping DP server test for vLLM>=0.14...) [ 80%]
tests/test_vllm_client_server.py::TestVLLMClientServerDP::test_generate_with_token_ids SKIPPED (Skipping DP server test for vL...) [ 83%]
tests/test_vllm_client_server.py::TestVLLMClientServerDP::test_generate_with_params SKIPPED (Skipping DP server test for vLLM>...) [ 85%]
tests/test_vllm_client_server.py::TestVLLMClientServerDP::test_update_model_params SKIPPED (Skipping DP server test for vLLM>=...) [ 88%]
tests/test_vllm_client_server.py::TestVLLMClientServerDP::test_reset_prefix_cache SKIPPED (Skipping DP server test for vLLM>=0...) [ 90%]
tests/test_vllm_client_server.py::TestVLLMClientServerDeviceParameter::test_init_communicator_with_device_int PASSED               [ 92%]
tests/test_vllm_client_server.py::TestVLLMClientServerDeviceParameter::test_init_communicator_with_device_string PASSED            [ 95%]
tests/test_vllm_client_server.py::TestVLLMClientServerDeviceParameter::test_init_communicator_with_torch_device PASSED             [ 97%]
tests/test_vllm_client_server.py::TestVLLMClientServerVLM::test_generate_with_token_ids_and_image PASSED                           [100%]

========================================== 34 passed, 7 skipped, 1 xfailed in 533.61s (0:08:53) ==========================================

Note

Medium Risk
Changes the /generate/ request/response contract for images and alters server-side prompt construction; mis-shaped image payloads could break existing callers or produce incorrect multimodal inputs.
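The shape constraint behind that risk can be stated as a guard: images must be absent, or align one-to-one with prompts, with each entry either a per-prompt list of images or None. The check below is a hypothetical illustration of that contract, not code from the PR:

```python
# Hypothetical validation of the images payload shape implied by the risk note.

def check_images(prompts, images):
    """Raise if `images` does not match the per-prompt batch contract."""
    if images is None:
        return  # text-only request: nothing to validate
    if len(images) != len(prompts):
        raise ValueError(
            f"got {len(images)} image entries for {len(prompts)} prompts"
        )
    for i, entry in enumerate(images):
        # each entry is a list of images for that prompt, or None
        if entry is not None and not isinstance(entry, list):
            raise TypeError(f"images[{i}] must be a list of images or None")
```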

Overview
Adds VLM support for token-in/token-out generation by allowing VLLMClient.generate() and the trl vllm-serve /generate/ endpoint to accept per-prompt image batches (now images: list[list[...] | None]) and attach them as multi_modal_data even when prompts are raw token ID lists.

Updates OnlineDPOTrainer’s vLLM server path to wrap each image into a singleton list (or None) to match the new API, and adds a new slow vision-gated integration test (TestVLLMClientServerVLM) validating token-id generation with multiple/mixed images using Qwen/Qwen2.5-VL-3B-Instruct.
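The OnlineDPOTrainer adaptation amounts to a one-line shape conversion: the old API took at most one image per prompt, the new one takes a list per prompt. A sketch (function name assumed for illustration):

```python
# Sketch: adapt a flat one-image-per-prompt list to the new per-prompt
# batch shape images: list[list[Image] | None].

def wrap_images(images):
    """Wrap each single image into a one-element list, keeping None as None."""
    return [None if img is None else [img] for img in images]

print(wrap_images(["img_a", None, "img_b"]))  # [['img_a'], None, ['img_b']]
```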

Written by Cursor Bugbot for commit d5e1906.

@qgallouedec qgallouedec changed the base branch from main to vllm-accept-token-ids March 5, 2026 20:59
@qgallouedec (Member Author)

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4b927d63bf

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@qgallouedec (Member Author)

@codex review

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 07181cbafc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@albertvillanova (Member) left a comment

Thanks.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Base automatically changed from vllm-accept-token-ids to main March 9, 2026 17:51
@qgallouedec qgallouedec merged commit a262d9f into main Mar 9, 2026
15 of 16 checks passed
@qgallouedec qgallouedec deleted the vllm-support-image-with-raw-token branch March 9, 2026 23:12
