Summary
Currently, DFlash speculative decoding does not support returning logprobs in the API response. When `logprobs=true` is requested via the OpenAI-compatible API (`/v1/chat/completions`), the request either errors out or the response contains `null` logprobs.
Use Case
Log probabilities are essential for:
- Confidence estimation — filtering or flagging low-confidence generations
- Calibration & evaluation — measuring model uncertainty on benchmarks
- Routing & cascading — deciding whether to escalate to a larger model based on token-level confidence
- RLHF / reward modeling — computing per-token rewards requires logprobs from the policy model
Expected Behavior
A request that sets

```json
{
  "logprobs": true,
  "top_logprobs": 5
}
```

should return per-token log probabilities in the response, consistent with the standard SGLang / vLLM behavior for non-speculative decoding.
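For illustration, a sketch of the request payload and the response shape this issue asks for, following the OpenAI chat-completions logprobs format (model name, messages, and the numeric logprob values below are placeholders, not real output):

```python
import json

# Hypothetical request payload for an OpenAI-compatible
# /v1/chat/completions endpoint; model and messages are placeholders.
payload = {
    "model": "my-model",
    "messages": [{"role": "user", "content": "Hello"}],
    "logprobs": True,
    "top_logprobs": 5,
}

# Expected shape of choices[0].logprobs in the response, per the OpenAI
# chat-completions format: one entry per generated token, each carrying
# its own logprob plus the top-N alternatives at that position.
example_choice_logprobs = {
    "content": [
        {
            "token": "Hi",
            "logprob": -0.31,          # placeholder value
            "top_logprobs": [
                {"token": "Hi", "logprob": -0.31},
                {"token": "Hello", "logprob": -1.42},
                # ... up to top_logprobs entries
            ],
        }
    ]
}

print(json.dumps(payload))
```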
Notes
- SGLang's native autoregressive decoding and MTP speculative decoding both support logprobs.
- For speculative decoding, logprobs from the target model's verification step would be the correct values to return (since those are the "true" probabilities that determine acceptance).
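The point in the second note can be sketched in a few lines. This is a toy illustration of the principle, not DFlash internals: the target model scores every draft position in one verification pass, and the logprob reported for each accepted token should come from the target model's distribution (the one that drove acceptance), not from the draft model. All logits and token ids below are made-up examples:

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

# target_logits[i][v]: the target model's verification logit for vocab
# id v at draft position i (toy numbers, tiny 3-token vocabulary).
target_logits = [
    [2.0, 0.5, -1.0],  # position 0
    [0.1, 1.5, 0.3],   # position 1
]
accepted_tokens = [0, 1]  # vocab ids accepted at each position

# Per-token logprobs taken from the target model's distributions —
# these are the values the API response should surface.
token_logprobs = [
    log_softmax(target_logits[i])[tok]
    for i, tok in enumerate(accepted_tokens)
]
print(token_logprobs)
```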