fix: OpenAI response decoding defaults, multimodal prompt handling, and tokenizer_name override #307
Conversation
Code Review
This pull request introduces a tokenizer_name configuration option to allow overriding the tokenizer source, which is particularly useful when the serving model uses a local path or a quantized checkpoint. It also enhances the robustness of OpenAI response parsing by adding default values for fields frequently omitted by inference servers and updates the load generator to handle multimodal prompt data. A suggestion was made to improve the tokenizer existence check by verifying local file system paths before querying the Hugging Face Hub, ensuring local tokenizers are correctly identified without triggering API errors.
```python
tokenizer_source = config.model_params.tokenizer_name or model_name
tokenizer_name = (
    tokenizer_source if _check_tokenizer_exists(tokenizer_source) else None
)
```
The current implementation of _check_tokenizer_exists relies on huggingface_hub.model_info, which is designed for repository IDs and will raise an exception (and log a warning) when provided with a local file system path. Since tokenizer_name is explicitly documented in the schema to support local paths, the logic should check for local existence first. This avoids incorrect warnings and ensures local tokenizers are correctly identified and passed to the metrics aggregator.
Suggested change:

```diff
 tokenizer_source = config.model_params.tokenizer_name or model_name
 tokenizer_name = (
-    tokenizer_source if _check_tokenizer_exists(tokenizer_source) else None
+    tokenizer_source
+    if Path(tokenizer_source).exists() or _check_tokenizer_exists(tokenizer_source)
+    else None
 )
```
Three independent issues blocked the Qwen3-VL-235B-A22B benchmark
against a real vLLM/Dynamo server. They are bundled here because
they all surface on the same flow (Shopify multimodal dataset →
OpenAI chat completions on vLLM):
1. **OpenAI response decoding too strict (`openai/types.py`).**
The msgspec structs that decode `/v1/chat/completions` responses
declared `content`, `refusal`, `finish_reason`, `created`,
`model`, `usage`, and `system_fingerprint` as required fields
with no defaults. vLLM (and the OpenAI spec itself) treats these
as nullable/optional — vLLM omits `usage` on non-final SSE
chunks, rarely sets `system_fingerprint`, and may omit `content`
on tool-only responses. Decoding any such response raised a
msgspec validation error before the adapter could return a
QueryResult. Added safe defaults so the decoder accepts whatever
the server legitimately sends (see the first sketch after this list).
2. **Multimodal prompt crashes metrics tracking (`session.py`).**
`PhaseIssuer.issue()` constructs a `PromptData(text=..., ...)`
for the ISSUED event. `PromptData.text` is typed `str | None`,
but the Shopify multimodal formatter sets `prompt` to OpenAI
vision content parts (`[{"type": "text", ...},
{"type": "image_url", ...}]`). msgspec rejected the list and the
issuer raised, killing the run before any sample was sent. The
HTTP adapter handles list-shaped prompts correctly — only the
metrics tracking path needed fixing. Type-guard the assignment
so non-strings fall through to None and ISL is computed from
`token_ids` (which the multimodal datasets already pre-tokenize);
see the second sketch after this list.
3. **Tokenizer cannot be loaded from container-local model paths
(`schema.py`, `execute.py`).** When the model is served from a
local cache path (e.g. `/root/.cache/huggingface/hub/models--…`)
instead of a HF repo ID, `_check_tokenizer_exists(model_name)`
fails its `model_info()` probe and the aggregator runs without
ISL/OSL/TPOT metrics. Added a `tokenizer_name` override field
to `ModelParams` so the tokenizer source can be set explicitly
while the served model name remains a path. The example YAML
configs document the new field with a commented hint; the third
sketch after this list shows the override.
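
A minimal sketch of fix 1, assuming msgspec structs shaped like the OpenAI chat-completions schema; the field names and nesting here are illustrative rather than the exact definitions in `openai/types.py`:

```python
import msgspec


class Usage(msgspec.Struct):
    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0


class Message(msgspec.Struct):
    role: str = "assistant"
    content: str | None = None   # omitted on tool-only responses
    refusal: str | None = None


class Choice(msgspec.Struct):
    index: int = 0
    message: Message | None = None
    finish_reason: str | None = None  # null until the final chunk


class ChatCompletionResponse(msgspec.Struct):
    id: str = ""
    object: str = "chat.completion"
    created: int = 0
    model: str = ""
    choices: list[Choice] = []              # msgspec copies mutable defaults per instance
    usage: Usage | None = None              # vLLM omits this on non-final SSE chunks
    system_fingerprint: str | None = None   # rarely populated by vLLM


# A minimal vLLM-style payload with most optional fields absent now decodes
# instead of raising a validation error.
decoder = msgspec.json.Decoder(ChatCompletionResponse)
resp = decoder.decode(b'{"id": "chatcmpl-1", "choices": [{"index": 0}]}')
assert resp.usage is None and resp.system_fingerprint is None
```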
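
A minimal sketch of fix 2; `PromptData` follows the description above, but the builder function and its arguments are simplified assumptions rather than the real `PhaseIssuer.issue()` code:

```python
import msgspec


class PromptData(msgspec.Struct):
    """Cut-down stand-in for the metrics-tracking struct described above."""
    text: str | None = None
    token_ids: list[int] | None = None


def build_prompt_data(prompt: str | list[dict], token_ids: list[int]) -> PromptData:
    # Multimodal formatters emit OpenAI vision content parts as list[dict];
    # only plain strings are recorded as text, anything else falls through to
    # None so input length comes from the pre-tokenized token_ids instead.
    return PromptData(
        text=prompt if isinstance(prompt, str) else None,
        token_ids=token_ids,
    )


# Both prompt shapes construct cleanly.
build_prompt_data("Describe the product.", token_ids=[101, 102, 103])
build_prompt_data(
    [
        {"type": "text", "text": "Describe the product."},
        {"type": "image_url", "image_url": {"url": "https://example.com/img.png"}},
    ],
    token_ids=[101, 102, 103],
)
```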
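
A minimal sketch of fix 3, assuming a msgspec-style `ModelParams` and a Hub probe similar to `_check_tokenizer_exists` in `execute.py`; the local-path check mirrors the review suggestion above, and the values in the usage example are placeholders:

```python
from pathlib import Path

import msgspec


class ModelParams(msgspec.Struct):
    name: str                          # served model name; may be a container-local path
    tokenizer_name: str | None = None  # optional tokenizer source (HF repo ID or local path)


def _check_tokenizer_exists(repo_id: str) -> bool:
    # Stand-in for the Hub probe in execute.py (huggingface_hub.model_info).
    try:
        from huggingface_hub import model_info

        model_info(repo_id)
        return True
    except Exception:
        return False


def resolve_tokenizer_name(params: ModelParams) -> str | None:
    # Prefer the explicit override, then fall back to the served model name;
    # check the local filesystem before probing the Hub so local checkpoints
    # don't trigger a failing model_info() call.
    source = params.tokenizer_name or params.name
    if Path(source).exists() or _check_tokenizer_exists(source):
        return source
    return None


# Served from a local snapshot, tokenizer pulled from the upstream repo ID.
params = ModelParams(
    name="/root/.cache/huggingface/hub/models--...",  # placeholder local path
    tokenizer_name="Qwen/Qwen3-VL-235B-A22B-Instruct",
)
tokenizer_name = resolve_tokenizer_name(params)
```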
Validated all three changes interactively with msgspec/pydantic:
- `ChatCompletionResponse` decodes minimal vLLM responses (most
fields missing) and full responses with usage.
- `PromptData(text=…)` accepts both string prompts and multimodal
`list[dict]` prompts.
- `ModelParams` round-trips through YAML with and without
`tokenizer_name`, and both Q3VL example configs load cleanly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Fixes three issues encountered when running the Qwen3-VL-235B-A22B benchmark against vLLM/Dynamo with the Shopify multimodal dataset:
- `openai/types.py` — vLLM omits optional response fields (`content`, `refusal`, `finish_reason`, `usage`, `system_fingerprint`, etc.) that our `msgspec` structs required to be present. Added safe defaults so decoding succeeds regardless of what the server populates.
- `session.py` — The `ShopifyMultimodalFormatter` produces `prompt` as `list[dict]` (OpenAI vision content parts), which the HTTP adapter handles correctly. However, `PhaseIssuer`'s metrics tracking assigned it directly to `PromptData.text` (typed `str | None`), causing a decode failure. Added a type guard to coerce non-string prompts to `None`.
- `config/schema.py` + `execute.py` — When the model is served from a container-local path (e.g. an NVFP4 snapshot under `/root/.cache/huggingface/hub/...`), tokenizer loading fails because the path isn't a valid HF repo ID. Added `model_params.tokenizer_name` to decouple the tokenizer source from the serving model name.

Testing
Validated end-to-end on GB300 (offline scenario, 1000 samples): all samples completed successfully with no decode errors and full latency/throughput metrics generated.
Note: The example YAMLs assume the server is started with `--served-model-name` matching `model_params.name`. When serving from a local NVFP4 checkpoint (the typical NVIDIA submission setup), set `model_params.name` to the actual served path, `tokenizer_name` to the upstream HF repo ID (e.g. `Qwen/Qwen3-VL-235B-A22B-Instruct`), and `endpoint_config.endpoints` to the node IP.