Add support for raw ids in prompts in vLLM client and server (#5225)
qgallouedec merged 10 commits into main.

Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7d2bb6727b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@codex review
💡 Codex Review
Reviewed commit: 3ea2fcff50
Title changed: "prompt_token_ids support to vLLM client and server" → "prompts in vLLM client and server"
albertvillanova left a comment:
Thanks. Just some minor comments below.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Context
Part of the series to fix the re-tokenization bug in GRPO multi-turn tool calling (see #5224).
When the model generates a completion in a tool-calling loop, the decoded text is re-tokenized via apply_chat_template, which can produce different token IDs due to BPE merge ambiguities. To fix this, we need a token-in / token-out pipeline: tokenize once, then pass raw token IDs through every subsequent generation call, never decoding and re-tokenizing. Related PRs in the series: rollout_func from _generate_single_turn to _generate (#5232), _generate_single_turn (#5239), _generate_single_turn (#5240).
To fix that, we need the ability to pass pre-tokenized prompts directly through the client/server pipeline. This PR adds that capability without changing any existing behavior.
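The merge ambiguity above can be shown with a toy greedy tokenizer (purely illustrative — VOCAB, encode, and decode are invented for this sketch, not TRL or vLLM code): decoding IDs to text and re-encoding can merge tokens across the old boundary, so the IDs change.

```python
# Toy BPE-like tokenizer: greedy longest-match encoding, so a token
# boundary present in the original IDs can be absorbed on re-encode.
VOCAB = {"foo": 1, " bar": 2, "foo bar": 3}
ID_TO_TEXT = {v: k for k, v in VOCAB.items()}

def decode(ids):
    return "".join(ID_TO_TEXT[i] for i in ids)

def encode(text):
    ids = []
    while text:
        # prefer the longest matching vocab entry, like a BPE merge
        tok = max((t for t in VOCAB if text.startswith(t)), key=len)
        ids.append(VOCAB[tok])
        text = text[len(tok):]
    return ids

original = [1, 2]                      # model emitted "foo" + " bar"
round_trip = encode(decode(original))  # decode -> "foo bar" -> re-encode
print(original, round_trip)            # [1, 2] vs [3]: the IDs changed
```

This drift is why the pipeline must carry raw token IDs end to end instead of round-tripping through text.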
Changes
- VLLMClient.generate(): add support for the prompts parameter being token IDs. Existing callers passing strings in prompts are unaffected.
- vllm_serve.py: GenerateRequest now accepts prompts as either strings or token IDs.
- Add test_generate_with_token_ids across all test classes to cover the new code path.
Backward compatibility
Fully backward compatible.
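The string-vs-token branching the server needs can be sketched as a small dispatcher (hypothetical helper — split_prompts is not the PR's actual function, and the real request schema may differ):

```python
def split_prompts(prompts):
    """Route a homogeneous batch of prompts to the right vLLM input:
    plain strings go to the text path, lists of ints to prompt_token_ids."""
    if all(isinstance(p, str) for p in prompts):
        return {"prompts": prompts}
    if all(isinstance(p, list) and all(isinstance(t, int) for t in p)
           for p in prompts):
        # token-ID path: per the PR description, image support is
        # disabled here, since there is no text to pair images with
        return {"prompt_token_ids": prompts}
    raise ValueError("prompts must be all strings or all token-ID lists")

print(split_prompts([[101, 7592], [101, 2088]]))
```

Rejecting mixed batches outright keeps the contract simple: a caller can never silently get half a batch re-tokenized.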
Tests
Note
Medium Risk
Changes the request/dispatch logic for the /generate/ endpoint and the server-mode generation path, so regressions could surface in prompt handling (especially around the string-vs-token branching and multimodal/image prompts).

Overview
Adds a new token-in path for vLLM generation by allowing VLLMClient.generate() and the /generate/ API (vllm_serve.py) to accept prompts as list[list[int]] in addition to strings; the server now detects token IDs and forwards them to vLLM via prompt_token_ids (disabling image support for that path).
Updates server-mode VLLMGeneration.generate() to pre-tokenize non-chat prompts with processing_class and call vllm_client.generate() with token IDs, and adds test_generate_with_token_ids coverage across the vLLM client/server test variants.
Written by Cursor Bugbot for commit f033e63. This will update automatically on new commits.
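The end-to-end flow the overview describes — tokenize once, then stay in token space — reduces to this shape (stand-in functions only: a whitespace "tokenizer" and an echo "generator" replace processing_class and vllm_client.generate()):

```python
def tokenize(text, vocab):
    # stand-in for processing_class: whitespace split with a growing vocab
    return [vocab.setdefault(w, len(vocab)) for w in text.split()]

def generate(prompt_token_ids):
    # stand-in for vllm_client.generate(): token-in / token-out,
    # appending "completion" IDs without ever decoding to text
    return prompt_token_ids + [999]

vocab = {}
ids = tokenize("solve the task", vocab)  # tokenize exactly once
out = generate(ids)                      # raw IDs pass straight through
assert out[:len(ids)] == ids             # no re-tokenization drift possible
```

Because the prompt IDs are forwarded verbatim, the BPE-merge hazard described in the Context section cannot occur anywhere downstream of the first tokenization.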