
Add support for raw ids in prompts in vLLM client and server #5225

Merged
qgallouedec merged 10 commits into main from vllm-accept-token-ids
Mar 9, 2026

Conversation


@qgallouedec qgallouedec commented Mar 5, 2026

Context

Part of the series to fix the re-tokenization bug in GRPO multi-turn tool calling (see #5224).

When the model generates a completion in a tool-calling loop, the decoded text is re-tokenized via apply_chat_template, which can produce different token IDs due to BPE merge ambiguities. To fix this, we need a token-in / token-out pipeline: tokenize once, then pass raw token IDs through every subsequent generation call — never decoding and re-tokenizing.
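
The mismatch is easy to reproduce with a toy tokenizer. This is a minimal sketch, assuming a hypothetical three-token vocabulary; real BPE tokenizers behave the same way at merge boundaries:

```python
# Toy tokenizer reproducing the merge ambiguity (hypothetical 3-token
# vocabulary; real BPE tokenizers show the same effect at merge boundaries).
VOCAB = {0: "a", 1: "b", 2: "ab"}

def decode(ids):
    return "".join(VOCAB[i] for i in ids)

def encode(text):
    # Greedy encoder: always prefers the longest merge "ab".
    ids, i = [], 0
    while i < len(text):
        if text[i : i + 2] == "ab":
            ids.append(2)
            i += 2
        else:
            ids.append({"a": 0, "b": 1}[text[i]])
            i += 1
    return ids

generated = [0, 1]                        # the model emitted "a" then "b"
assert decode(generated) == "ab"
assert encode(decode(generated)) == [2]   # decode + re-encode changed the IDs
```

Because [0, 1] and [2] decode to the same text, any pipeline that round-trips through text silently rewrites the model's own token history, which is exactly what a token-in / token-out pipeline avoids.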

This PR adds the prerequisite capability: pre-tokenized prompts can now be passed directly through the client/server pipeline, without changing any existing behavior.

Changes

  • VLLMClient.generate(): the prompts parameter now also accepts token IDs (list[list[int]]). Existing callers passing strings are unaffected.
  • vllm_serve.py: GenerateRequest now accepts prompts as either strings or token IDs.
  • Tests: Add test_generate_with_token_ids across all test classes to cover the new code path.
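
For illustration, the string-vs-token branching behind these changes might look like the sketch below. The function name is hypothetical, not the actual trl implementation; the `prompt_token_ids` dict is the shape vLLM accepts for pre-tokenized input:

```python
from typing import Union

def build_vllm_inputs(prompts: Union[list[str], list[list[int]]]):
    # Hypothetical sketch of the string-vs-token branching, not trl's code.
    if prompts and isinstance(prompts[0], list):
        # Pre-tokenized path: forward the raw IDs, never re-tokenize.
        return [{"prompt_token_ids": ids} for ids in prompts]
    # Text path: existing behavior, unchanged.
    return list(prompts)

# Token prompts are wrapped for vLLM; string prompts pass through untouched.
build_vllm_inputs([[101, 2023, 102]])  # [{"prompt_token_ids": [101, 2023, 102]}]
build_vllm_inputs(["Hello, world"])    # ["Hello, world"]
```

Detecting the prompt type once at the boundary keeps both code paths explicit, so existing string callers hit the same branch as before.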

Backward compatibility

Fully backward compatible.

Tests

$ pytest -v tests/test_vllm_client_server.py
========================================================== test session starts ===========================================================
platform linux -- Python 3.13.11, pytest-9.0.2, pluggy-1.6.0 -- /fsx/qgallouedec/miniconda3/envs/trl/bin/python3.13
cachedir: .pytest_cache
rootdir: /fsx/qgallouedec/trl
configfile: pyproject.toml
plugins: rerunfailures-15.1, anyio-4.12.1, xdist-3.8.0, datadir-1.8.0, cov-7.0.0
collected 41 items                                                                                       

tests/test_vllm_client_server.py::TestChunkList::test_even_split PASSED                                                            [  2%]
tests/test_vllm_client_server.py::TestChunkList::test_uneven_split PASSED                                                          [  4%]
tests/test_vllm_client_server.py::TestChunkList::test_more_chunks_than_elements PASSED                                             [  7%]
tests/test_vllm_client_server.py::TestChunkList::test_n_equals_len PASSED                                                          [  9%]
tests/test_vllm_client_server.py::TestChunkList::test_n_is_1 PASSED                                                                [ 12%]
tests/test_vllm_client_server.py::TestChunkList::test_single_element_list PASSED                                                   [ 14%]
tests/test_vllm_client_server.py::TestChunkList::test_any_dtype PASSED                                                             [ 17%]
tests/test_vllm_client_server.py::TestExtractLogprobs::test_extract_logprobs_sorts_by_rank_and_replaces_nan PASSED                 [ 19%]
tests/test_vllm_client_server.py::TestExtractLogprobs::test_extract_logprobs_returns_none_token_ids_when_logprobs_missing PASSED   [ 21%]
tests/test_vllm_client_server.py::TestVLLMClientServer::test_generate PASSED                                                       [ 24%]
tests/test_vllm_client_server.py::TestVLLMClientServer::test_chat PASSED                                                           [ 26%]
tests/test_vllm_client_server.py::TestVLLMClientServer::test_chat_with_tools PASSED                                                [ 29%]
tests/test_vllm_client_server.py::TestVLLMClientServer::test_generate_with_token_ids PASSED                                        [ 31%]
tests/test_vllm_client_server.py::TestVLLMClientServer::test_generate_with_params PASSED                                           [ 34%]
tests/test_vllm_client_server.py::TestVLLMClientServer::test_update_model_params PASSED                                            [ 36%]
tests/test_vllm_client_server.py::TestVLLMClientServer::test_reset_prefix_cache PASSED                                             [ 39%]
tests/test_vllm_client_server.py::TestVLLMClientServer::test_logprobs_match_with_non_default_sampling XFAIL                        [ 41%]
tests/test_vllm_client_server.py::TestVLLMClientServerBaseURL::test_generate PASSED                                                [ 43%]
tests/test_vllm_client_server.py::TestVLLMClientServerBaseURL::test_chat PASSED                                                    [ 46%]
tests/test_vllm_client_server.py::TestVLLMClientServerBaseURL::test_chat_with_tools PASSED                                         [ 48%]
tests/test_vllm_client_server.py::TestVLLMClientServerBaseURL::test_generate_with_token_ids PASSED                                 [ 51%]
tests/test_vllm_client_server.py::TestVLLMClientServerBaseURL::test_generate_with_params PASSED                                    [ 53%]
tests/test_vllm_client_server.py::TestVLLMClientServerBaseURL::test_update_model_params PASSED                                     [ 56%]
tests/test_vllm_client_server.py::TestVLLMClientServerBaseURL::test_reset_prefix_cache PASSED                                      [ 58%]
tests/test_vllm_client_server.py::TestVLLMClientServerTP::test_generate PASSED                                                     [ 60%]
tests/test_vllm_client_server.py::TestVLLMClientServerTP::test_chat PASSED                                                         [ 63%]
tests/test_vllm_client_server.py::TestVLLMClientServerTP::test_chat_with_tools PASSED                                              [ 65%]
tests/test_vllm_client_server.py::TestVLLMClientServerTP::test_generate_with_token_ids PASSED                                      [ 68%]
tests/test_vllm_client_server.py::TestVLLMClientServerTP::test_generate_with_params PASSED                                         [ 70%]
tests/test_vllm_client_server.py::TestVLLMClientServerTP::test_update_model_params PASSED                                          [ 73%]
tests/test_vllm_client_server.py::TestVLLMClientServerTP::test_reset_prefix_cache PASSED                                           [ 75%]
tests/test_vllm_client_server.py::TestVLLMClientServerDP::test_generate SKIPPED (Skipping DP server test for vLLM>=0.14.0 (PR ...)                          [ 78%]
tests/test_vllm_client_server.py::TestVLLMClientServerDP::test_chat SKIPPED (Skipping DP server test for vLLM>=0.14.0 (PR vllm...)                          [ 80%]
tests/test_vllm_client_server.py::TestVLLMClientServerDP::test_chat_with_tools SKIPPED (Skipping DP server test for vLLM>=0.14...)                          [ 82%]
tests/test_vllm_client_server.py::TestVLLMClientServerDP::test_generate_with_token_ids SKIPPED                                     [ 85%]
tests/test_vllm_client_server.py::TestVLLMClientServerDP::test_generate_with_params SKIPPED                                        [ 87%]
tests/test_vllm_client_server.py::TestVLLMClientServerDP::test_update_model_params SKIPPED                                         [ 90%]
tests/test_vllm_client_server.py::TestVLLMClientServerDP::test_reset_prefix_cache SKIPPED                                          [ 92%]
tests/test_vllm_client_server.py::TestVLLMClientServerDeviceParameter::test_init_communicator_with_device_int PASSED               [ 95%]
tests/test_vllm_client_server.py::TestVLLMClientServerDeviceParameter::test_init_communicator_with_device_string PASSED            [ 97%]
tests/test_vllm_client_server.py::TestVLLMClientServerDeviceParameter::test_init_communicator_with_torch_device PASSED             [100%]

========================================== 33 passed, 7 skipped, 1 xfailed in 433.87s (0:07:13) ==========================================



Note

Medium Risk
Changes the request/dispatch logic for the /generate/ endpoint and the server-mode generation path, so regressions could surface in prompt handling (especially around the string-vs-token branching and multimodal/image prompts).

Overview
Adds a new token-in path for vLLM generation by allowing VLLMClient.generate() and the /generate/ API (vllm_serve.py) to accept prompts as list[list[int]] in addition to strings; the server now detects token IDs and forwards them to vLLM via prompt_token_ids (disabling image support for that path).

Updates server-mode VLLMGeneration.generate() to pre-tokenize non-chat prompts with processing_class and call vllm_client.generate() with token IDs, and adds test_generate_with_token_ids coverage across the vLLM client/server test variants.

Written by Cursor Bugbot for commit f033e63.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7d2bb6727b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@qgallouedec qgallouedec marked this pull request as draft March 5, 2026 19:42
@qgallouedec qgallouedec marked this pull request as ready for review March 5, 2026 20:11
@qgallouedec
Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: 3ea2fcff50

Member

@albertvillanova albertvillanova left a comment

Thanks. Just some minor comments below.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qgallouedec qgallouedec merged commit 9db3688 into main Mar 9, 2026
14 checks passed
@qgallouedec qgallouedec deleted the vllm-accept-token-ids branch March 9, 2026 17:51
4 participants