Add VLM support when passing raw token IDs to vLLM client #5227
qgallouedec merged 20 commits into main
Conversation
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4b927d63bf
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
@codex review

Cursor Bugbot has reviewed your changes and found 1 potential issue.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 07181cbafc
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Context
Part of the series to fix the re-tokenization bug in GRPO multi-turn tool calling (see #5224).
When the model generates a completion in a tool-calling loop, the decoded text is re-tokenized via
`apply_chat_template`, which can produce different token IDs due to BPE merge ambiguities. To fix this, we need a token-in / token-out pipeline: tokenize once, then pass raw token IDs through every subsequent generation call — never decoding and re-tokenizing.

Related PRs in the series:
- `prompts` in vLLM client and server #5225
- `rollout_func` from `_generate_single_turn` to `_generate` #5232
- `_generate_single_turn` #5239
- `_generate_single_turn` #5240

Changes
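As a toy illustration of the bug being fixed (not TRL's or any real model's tokenizer), the sketch below shows how two different token-ID sequences can decode to the same string, so a decode-then-re-encode round trip cannot recover the IDs the model actually generated:

```python
# Toy vocabulary with overlapping merges: ["fo", "o"] and ["foo"] both
# decode to the string "foo". Real BPE tokenizers have the same ambiguity.
VOCAB = {0: "fo", 1: "o", 2: "foo"}
MERGE = {piece: tid for tid, piece in VOCAB.items()}

def decode(ids):
    return "".join(VOCAB[i] for i in ids)

def encode(text):
    # Greedy longest-match encoding, as tokenizers effectively do.
    ids, i = [], 0
    while i < len(text):
        for length in range(len(text) - i, 0, -1):
            piece = text[i:i + length]
            if piece in MERGE:
                ids.append(MERGE[piece])
                i += length
                break
        else:
            raise ValueError("unencodable text")
    return ids

model_output = [0, 1]            # the model generated ["fo", "o"]
text = decode(model_output)      # "foo"
retokenized = encode(text)       # [2] — different IDs for the same text
assert retokenized != model_output
```

This is why the series passes raw token IDs end to end instead of round-tripping through text.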
- `vllm_serve.py`: When `prompts` contains token ID lists, images are now attached as `multi_modal_data` in vLLM's `TokensPrompt` format, matching the existing behavior for text prompts.
- A new `TestVLLMClientServerVLM` test class verifies generation with token IDs + images using `Qwen/Qwen2.5-VL-3B-Instruct`.

Backward compatibility
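A minimal sketch of the payload shape described above. `build_tokens_prompt` is a hypothetical helper, not TRL code; the `prompt_token_ids` and `multi_modal_data` field names follow vLLM's documented `TokensPrompt` format:

```python
def build_tokens_prompt(token_ids, images=None):
    """Build a TokensPrompt-shaped dict: raw token IDs, with images
    attached under multi_modal_data when provided (hypothetical helper)."""
    prompt = {"prompt_token_ids": token_ids}
    if images:  # per-prompt image list, e.g. [img] or None
        prompt["multi_modal_data"] = {"image": images}
    return prompt

# Text-free, token-in prompt with one attached image:
p = build_tokens_prompt([101, 2023, 102], images=["<PIL image>"])
assert p["prompt_token_ids"] == [101, 2023, 102]
assert p["multi_modal_data"] == {"image": ["<PIL image>"]}
```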
Fully backward compatible. No changes to existing call sites or API signatures.
Tests
Note
Medium Risk
Changes the `/generate` request/response contract for `images` and alters server-side prompt construction; mis-shaped image payloads could break existing callers or produce incorrect multimodal inputs.

Overview
Adds VLM support for token-in/token-out generation by allowing `VLLMClient.generate()` and the `trl vllm-serve` `/generate/` endpoint to accept per-prompt image batches (now `images: list[list[...] | None]`) and attach them as `multi_modal_data` even when prompts are raw token ID lists.

Updates `OnlineDPOTrainer`'s vLLM server path to wrap each image into a singleton list (or `None`) to match the new API, and adds a new slow vision-gated integration test (`TestVLLMClientServerVLM`) validating token-ID generation with multiple/mixed images using `Qwen/Qwen2.5-VL-3B-Instruct`.

Written by Cursor Bugbot for commit d5e1906.
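The singleton-wrapping adaptation described above can be sketched as follows. `wrap_images` is a hypothetical helper name, not the trainer's actual code:

```python
def wrap_images(images):
    """Adapt one-image-per-prompt data to the per-prompt-batch API:
    wrap each image in a singleton list, preserving None entries."""
    return [[img] if img is not None else None for img in images]

# Mixed batch: prompts with and without an attached image.
assert wrap_images(["imgA", None, "imgB"]) == [["imgA"], None, ["imgB"]]
```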