fix: skip firstTokenLogProbThreshold when promptTokens are set #438
alan890104 wants to merge 2 commits into argmaxinc:main
Conversation
When promptTokens are provided, the decoder's KV cache state is shifted by the prompt context, causing the first content token's logprob to drop below firstTokenLogProbThreshold (-1.5). This immediately aborts the decoding loop, producing empty transcription results.

This threshold is a WhisperKit-specific quality gate not present in OpenAI's original Whisper or whisper.cpp. The original Whisper relies on avgLogprob (computed over the full segment) for quality filtering, which remains active and serves as a safety net.

The issue is intermittent and particularly affects distilled/turbo model variants (e.g. large-v3-turbo), where the reduced decoder capacity is more sensitive to prompt conditioning.

Fixes argmaxinc#372
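The failure mode described above can be sketched with a small Python model. This is illustrative only: the function and constant names below are stand-ins for WhisperKit's Swift internals, not the actual implementation.

```python
# Illustrative model of the first-token early abort (not actual WhisperKit code).
FIRST_TOKEN_LOGPROB_THRESHOLD = -1.5  # WhisperKit default, per the PR description

def decode(first_token_logprob, prompt_tokens=None):
    """Return decoded tokens, or [] if the quality gate aborts the loop."""
    # Before the fix, the gate fires regardless of prompt conditioning.
    if first_token_logprob < FIRST_TOKEN_LOGPROB_THRESHOLD:
        return []  # loop aborts immediately -> empty transcription
    return ["token"]  # stand-in for a normal decode

# Without a prompt, the first token typically scores well above the gate...
assert decode(-0.087) == ["token"]
# ...but prompt conditioning can push it below -1.5, yielding an empty result.
assert decode(-1.8, prompt_tokens=[50361]) == []
```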
…shold

Regression test for argmaxinc#372. Measured on tiny model + jfk.wav:

- Without prompt tokens: firstToken logprob ≈ -0.087
- With CJK prompt tokens: firstToken logprob ≈ -0.578

Prompt tokens shift the first content token logprob ~6.6x lower. On turbo models (fewer decoder layers), this shift is amplified enough to breach the default threshold (-1.5). We use -0.5 here to reliably reproduce the issue on tiny, simulating the larger shift on turbo.

Without the fix: test fails (empty transcription). With the fix: test passes (normal transcription).
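The "~6.6x" figure quoted in the commit message follows directly from the two measurements:

```python
# Logprob shift measured in the regression test (tiny model + jfk.wav).
without_prompt = -0.087  # first-token logprob, no prompt tokens
with_prompt = -0.578     # first-token logprob, CJK prompt tokens

ratio = with_prompt / without_prompt
print(round(ratio, 1))  # ≈ 6.6, i.e. the first-token logprob is ~6.6x lower
```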
01914ba to 6dafa8a
Pull request overview
This PR addresses intermittent empty transcription results when DecodingOptions.promptTokens is used by disabling WhisperKit’s firstTokenLogProbThreshold early-abort check in that mode (root cause of #372).
Changes:
- Skip the `firstTokenLogProbThreshold` check when `promptTokens` are provided.
- Add a regression unit test ensuring transcription is non-empty with `promptTokens` even under a strict `firstTokenLogProbThreshold`.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `Sources/WhisperKit/Core/TextDecoder.swift` | Disables the first-token logprob early-abort when `options.promptTokens` is non-nil to prevent empty outputs. |
| `Tests/WhisperKitTests/UnitTests.swift` | Adds regression coverage for prompting with a strict first-token threshold (Issue #372). |
```diff
 isFirstTokenLogProbTooLow =
-    if isFirstToken, let firstTokenLogProbThreshold = options.firstTokenLogProbThreshold, nextTokenLogProb < firstTokenLogProbThreshold {
+    if isFirstToken, options.promptTokens == nil, let firstTokenLogProbThreshold = options.firstTokenLogProbThreshold, nextTokenLogProb < firstTokenLogProbThreshold {
         true
     } else {
```
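The effect of the one-line change can be modeled in Python (a sketch only; the parameter names mirror the Swift identifiers but this is not WhisperKit code):

```python
# Python stand-in for the Swift condition, before and after the patch.
def is_first_token_logprob_too_low(is_first_token, next_token_logprob,
                                   first_token_logprob_threshold, prompt_tokens,
                                   patched=True):
    if not is_first_token or first_token_logprob_threshold is None:
        return False
    if patched and prompt_tokens is not None:
        # New guard: skip the check entirely when prompt tokens are set.
        return False
    return next_token_logprob < first_token_logprob_threshold

# A prompt-conditioned first token below the -1.5 default:
args = dict(is_first_token=True, next_token_logprob=-1.8,
            first_token_logprob_threshold=-1.5, prompt_tokens=[50361])
assert is_first_token_logprob_too_low(patched=False, **args) is True   # old: aborts
assert is_first_token_logprob_too_low(patched=True, **args) is False   # new: decodes on
```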
This change makes firstTokenLogProbThreshold effectively a no-op whenever options.promptTokens is non-nil. That’s a behavior change for an existing public option, so it should be documented (e.g., in DecodingOptions docs / README) to avoid confusing callers who set a strict threshold expecting it to be enforced.
Problem

Using `promptTokens` in `DecodingOptions` causes intermittent empty transcription results, particularly with distilled/turbo model variants (e.g. `large-v3-turbo`). This is the root cause of #372.

Root Cause Analysis

When `promptTokens` are provided, the decoder input sequence becomes:

The prompt tokens shift the decoder's KV cache state, causing the first content token's logprob to occasionally drop below `firstTokenLogProbThreshold` (default: `-1.5`). When this happens:

- `isFirstTokenLogProbTooLow` is set to `true`
- `avgLogProb` computes to `0.000` (no real tokens to average over)
- `DecodingFallback` triggers with reason `"firstTokenLogProbThreshold"`

Why this only affects WhisperKit
`firstTokenLogProbThreshold` is a WhisperKit-specific quality gate — it does not exist in:

- OpenAI's original Whisper (`whisper/decoding.py`) — only uses `logprob_threshold` (avg over full segment), `no_speech_threshold`, and `compression_ratio_threshold`
- whisper.cpp

The original Whisper design lets the decoder run to completion and evaluates quality over the entire segment via `avgLogprob`. This is robust to prompt-induced shifts in the first token's distribution because subsequent tokens compensate. WhisperKit's early abort on the first token prevents this self-correction.

Why it's intermittent

The first token logprob depends on the interaction between prompt token content and audio content. For certain audio segments, the prompt conditioning pushes the first token just below `-1.5`; for others it stays above. This creates a non-deterministic failure pattern.

Why turbo models are more affected

Distilled/turbo variants (e.g. `large-v3-v20240930`) have fewer decoder layers, making them more sensitive to changes in the conditioning context. The reduced decoder capacity has less room to absorb the distributional shift from prompt tokens.

Fix
Skip the `firstTokenLogProbThreshold` check when `promptTokens` are present (`options.promptTokens == nil` guard). This is a one-line change in `TextDecoder.swift`.

Safety: The existing `logProbThreshold` (avg over full segment, default `-1.0`) and `compressionRatioThreshold` remain active as quality gates, matching the original Whisper behavior. Truly bad segments will still be caught and retried via temperature fallback.

Related