Summary
Currently, DFlash speculative decoding does not support returning logprobs in the API response. When `logprobs=true` is requested via the OpenAI-compatible API (`/v1/chat/completions`), the request either errors out or the response contains `null` logprobs.
Use Case
Log probabilities are essential for:
- Confidence estimation — filtering or flagging low-confidence generations
- Calibration & evaluation — measuring model uncertainty on benchmarks
- Routing & cascading — deciding whether to escalate to a larger model based on token-level confidence
- RLHF / reward modeling — computing per-token rewards requires logprobs from the policy model
Expected Behavior
A request that sets

```json
{
  "logprobs": true,
  "top_logprobs": 5
}
```

should return per-token log probabilities in the response, consistent with the standard SGLang / vLLM behavior for non-speculative decoding.
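For illustration, a sketch of the request payload and the response shape this issue asks for, following the OpenAI chat-completions logprobs format (model name, messages, and the numeric logprob values below are placeholders, not real output):

```python
import json

# Hypothetical request payload for an OpenAI-compatible
# /v1/chat/completions endpoint; model and messages are placeholders.
payload = {
    "model": "my-model",
    "messages": [{"role": "user", "content": "Hello"}],
    "logprobs": True,
    "top_logprobs": 5,
}

# Expected shape of choices[0].logprobs in the response, per the OpenAI
# chat-completions format: one entry per generated token, each carrying
# its own logprob plus the top-N alternatives at that position.
example_choice_logprobs = {
    "content": [
        {
            "token": "Hi",
            "logprob": -0.31,          # placeholder value
            "top_logprobs": [
                {"token": "Hi", "logprob": -0.31},
                {"token": "Hello", "logprob": -1.42},
                # ... up to top_logprobs entries
            ],
        }
    ]
}

print(json.dumps(payload))
```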
Notes
- SGLang's native autoregressive decoding and MTP speculative decoding both support logprobs.
- For speculative decoding, logprobs from the target model's verification step would be the correct values to return (since those are the "true" probabilities that determine acceptance).
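The point in the second note can be sketched in a few lines. This is a toy illustration of the principle, not DFlash internals: the target model scores every draft position in one verification pass, and the logprob reported for each accepted token should come from the target model's distribution (the one that drove acceptance), not from the draft model. All logits and token ids below are made-up examples:

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

# target_logits[i][v]: the target model's verification logit for vocab
# id v at draft position i (toy numbers, tiny 3-token vocabulary).
target_logits = [
    [2.0, 0.5, -1.0],  # position 0
    [0.1, 1.5, 0.3],   # position 1
]
accepted_tokens = [0, 1]  # vocab ids accepted at each position

# Per-token logprobs taken from the target model's distributions —
# these are the values the API response should surface.
token_logprobs = [
    log_softmax(target_logits[i])[tok]
    for i, tok in enumerate(accepted_tokens)
]
print(token_logprobs)
```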