Fix constrained-decode latency: bounded top-K, skip unused logSumExp #538

Merged
FuJacob merged 1 commit into main from fix/llm-generation-slow-regression on Jun 2, 2026

Commits

Commits on Jun 2, 2026