
Feature request: support logprobs output #39

@dddgogogo

Description

Summary

Currently, DFlash speculative decoding does not support returning logprobs in the API response. When logprobs=True is requested via the OpenAI-compatible API (/v1/chat/completions), the response either errors out or returns null for logprobs.

Use Case

Log probabilities are essential for:

  • Confidence estimation — filtering or flagging low-confidence generations
  • Calibration & evaluation — measuring model uncertainty on benchmarks
  • Routing & cascading — deciding whether to escalate to a larger model based on token-level confidence
  • RLHF / reward modeling — computing per-token rewards requires logprobs from the policy model
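As a minimal sketch of the confidence-estimation use case: once per-token logprobs are available in the response, a caller could filter generations by their mean token logprob. The helper below is hypothetical (not part of SGLang); it only assumes the response carries a list of per-token log probabilities.

```python
def flag_low_confidence(token_logprobs, threshold=-1.0):
    """Flag a generation whose mean per-token logprob falls below `threshold`.

    `token_logprobs` is the list of per-token log probabilities a
    logprobs-enabled response would carry (illustrative shape only;
    the real response schema mirrors the OpenAI API).
    """
    mean_lp = sum(token_logprobs) / len(token_logprobs)
    return mean_lp < threshold

# Confident generation: probabilities near 1, so logprobs near 0.
print(flag_low_confidence([-0.05, -0.10, -0.02]))  # → False
# Low-confidence generation: mean logprob well below the threshold.
print(flag_low_confidence([-2.3, -1.9, -2.7]))     # → True
```

The threshold here is arbitrary; in practice it would be tuned per task, or escalation could key off the single lowest-confidence token instead of the mean.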

Expected Behavior

{
  "logprobs": true,
  "top_logprobs": 5
}

should return per-token log probabilities in the response, consistent with the standard SGLang / vLLM behavior for non-speculative decoding.
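For concreteness, a request carrying those fields might be assembled like this (model name and endpoint are placeholders; the payload shape follows the OpenAI Chat Completions API that SGLang mirrors):

```python
import json

# Hypothetical request body for a locally running SGLang server's
# /v1/chat/completions endpoint. "my-model" is a placeholder.
payload = {
    "model": "my-model",
    "messages": [{"role": "user", "content": "Hello"}],
    "logprobs": True,      # request per-token log probabilities
    "top_logprobs": 5,     # also return the 5 most likely alternatives per token
}
body = json.dumps(payload)
print(body)
```

The serialized body can then be POSTed with any HTTP client; with speculative decoding enabled, the expectation is that the `logprobs` field of each choice is populated rather than null.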

Notes

  • SGLang's native autoregressive decoding and MTP speculative decoding both support logprobs.
  • For speculative decoding, logprobs from the target model's verification step would be the correct values to return (since those are the "true" probabilities that determine acceptance).
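A toy sketch of that second point, with made-up names (not SGLang internals): at each draft position, the logprob to report is the target model's log-softmax score for the token the verification step accepted.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def verification_logprobs(target_logits, accepted_tokens):
    """Per-token logprobs from the target model's verification step.

    `target_logits[i]` is the target model's logit vector at draft
    position i (toy vocabulary here); `accepted_tokens[i]` is the token
    id accepted at that position. Both names are illustrative.
    """
    return [log_softmax(logits)[tok]
            for logits, tok in zip(target_logits, accepted_tokens)]

# Toy example with a 3-token vocabulary and two accepted positions:
logits = [[2.0, 0.5, -1.0], [0.0, 3.0, 0.0]]
print(verification_logprobs(logits, [0, 1]))
```

Because these are exactly the probabilities used in the accept/reject decision, returning them keeps the reported logprobs consistent with what non-speculative decoding would have produced for the same tokens.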
