fix: skip firstTokenLogProbThreshold when promptTokens are set#438

Open
alan890104 wants to merge 2 commits into argmaxinc:main from alan890104:fix/skip-first-token-threshold-with-prompt-tokens

Conversation

@alan890104

Problem

Using promptTokens in DecodingOptions causes intermittent empty transcription results, particularly with distilled/turbo model variants (e.g. large-v3-turbo). This is the root cause of #372.

Root Cause Analysis

When promptTokens are provided, the decoder input sequence becomes:

[<|startofprev|>] + [prompt_tokens...] + [<|sot|>] + [lang] + [task] + [notimestamps]
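
The assembly of this sequence can be sketched as follows (the function name and token-ID parameters are illustrative assumptions, not WhisperKit's actual API):

```swift
// Illustrative sketch of decoder input assembly when promptTokens are set.
// The function name and token IDs are assumptions, not WhisperKit's API.
func buildDecoderInput(promptTokens: [Int]?,
                       startOfPrevToken: Int,
                       sotToken: Int,
                       langToken: Int,
                       taskToken: Int,
                       noTimestampsToken: Int) -> [Int] {
    var input: [Int] = []
    if let prompt = promptTokens {
        // The prompt context is prepended before <|sot|>, so every prompt
        // token occupies a KV cache slot ahead of the first content token.
        input.append(startOfPrevToken)
        input.append(contentsOf: prompt)
    }
    input.append(contentsOf: [sotToken, langToken, taskToken, noTimestampsToken])
    return input
}
```

With no prompt, the input is just the usual `[<|sot|>, lang, task, notimestamps]` prefix; with a prompt, every prompt token shifts the decoder state seen by the first content token.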

The prompt tokens shift the decoder's KV cache state, causing the first content token's logprob to occasionally drop below firstTokenLogProbThreshold (default: -1.5). When this happens:

  1. isFirstTokenLogProbTooLow is set to true
  2. The decoding loop immediately breaks — producing zero content tokens
  3. avgLogProb computes to 0.000 (no real tokens to average over)
  4. DecodingFallback triggers with reason "firstTokenLogProbThreshold"
  5. Temperature is increased and decoding retries, but the first token may still fail the threshold
  6. After exhausting all fallback attempts, the final result is empty
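
The 0.000 average in step 3 falls straight out of the averaging arithmetic: when the loop aborts before emitting any content token, there is nothing to average. A minimal sketch:

```swift
// Minimal sketch of why an aborted decode reports avgLogProb = 0.000:
// averaging over an empty token list degenerates to zero, which then
// looks "fine" to any gate that compares against a negative threshold.
func averageLogProb(_ tokenLogProbs: [Float]) -> Float {
    guard !tokenLogProbs.isEmpty else { return 0.0 }  // no real tokens to average
    return tokenLogProbs.reduce(0, +) / Float(tokenLogProbs.count)
}
```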

Why this only affects WhisperKit

firstTokenLogProbThreshold is a WhisperKit-specific quality gate — it does not exist in:

  • OpenAI's original Whisper (whisper/decoding.py) — only uses logprob_threshold (avg over full segment), no_speech_threshold, and compression_ratio_threshold
  • whisper.cpp — faithfully ports the original three thresholds without adding a first-token check

The original Whisper design lets the decoder run to completion and evaluates quality over the entire segment via avgLogprob. This is robust to prompt-induced shifts in the first token's distribution because subsequent tokens compensate. WhisperKit's early abort on the first token prevents this self-correction.

Why it's intermittent

The first token logprob depends on the interaction between prompt token content and audio content. For certain audio segments, the prompt conditioning pushes the first token just below -1.5; for others it stays above. This creates a non-deterministic failure pattern.

Why turbo models are more affected

Distilled/turbo variants (e.g. large-v3-v20240930) have fewer decoder layers, making them more sensitive to changes in the conditioning context. The reduced decoder capacity has less room to absorb the distributional shift from prompt tokens.

Fix

Skip the firstTokenLogProbThreshold check when promptTokens are present (guarded by options.promptTokens == nil). This is a one-line change in TextDecoder.swift.

Safety: The existing logProbThreshold (avg over full segment, default -1.0) and compressionRatioThreshold remain active as quality gates, matching the original Whisper behavior. Truly bad segments will still be caught and retried via temperature fallback.
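
These remaining gates can be sketched as follows (names are illustrative; the -1.0 default matches the text above, while 2.4 is an assumed compression ratio default, as in original Whisper):

```swift
// Sketch of the quality gates that stay active after the fix. Names are
// illustrative; -1.0 matches the logProbThreshold default cited above,
// and 2.4 is an assumed compressionRatioThreshold default.
func needsFallback(avgLogProb: Float,
                   compressionRatio: Float,
                   logProbThreshold: Float = -1.0,
                   compressionRatioThreshold: Float = 2.4) -> Bool {
    // Retry at a higher temperature if the segment is low-confidence on
    // average, or suspiciously repetitive (high compression ratio).
    return avgLogProb < logProbThreshold || compressionRatio > compressionRatioThreshold
}
```

Unlike the first-token check, both gates evaluate the completed segment, so a weak first token can be redeemed by the tokens that follow it.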

Related

When promptTokens are provided, the decoder's KV cache state is shifted
by the prompt context, causing the first content token's logprob to drop
below firstTokenLogProbThreshold (-1.5). This immediately aborts the
decoding loop, producing empty transcription results.

This threshold is a WhisperKit-specific quality gate not present in
OpenAI's original Whisper or whisper.cpp. The original Whisper relies on
avgLogprob (computed over the full segment) for quality filtering, which
remains active and serves as a safety net.

The issue is intermittent and particularly affects distilled/turbo model
variants (e.g. large-v3-turbo) where the reduced decoder capacity is
more sensitive to prompt conditioning.

Fixes argmaxinc#372

Regression test for argmaxinc#372. Measured on tiny model + jfk.wav:
  - Without prompt tokens: firstToken logprob ≈ -0.087
  - With CJK prompt tokens: firstToken logprob ≈ -0.578

Prompt tokens shift the first content token logprob ~6.6x lower. On
turbo models (fewer decoder layers), this shift is amplified enough to
breach the default threshold (-1.5). We use -0.5 here to reliably
reproduce the issue on tiny, simulating the larger shift on turbo.

Without the fix: test fails (empty transcription).
With the fix: test passes (normal transcription).
@alan890104 alan890104 force-pushed the fix/skip-first-token-threshold-with-prompt-tokens branch from 01914ba to 6dafa8a Compare March 11, 2026 04:03
@ZachNagengast ZachNagengast requested a review from Copilot March 12, 2026 19:44

Copilot AI left a comment


Pull request overview

This PR addresses intermittent empty transcription results when DecodingOptions.promptTokens is used by disabling WhisperKit’s firstTokenLogProbThreshold early-abort check in that mode (root cause of #372).

Changes:

  • Skip the firstTokenLogProbThreshold check when promptTokens are provided.
  • Add a regression unit test ensuring transcription is non-empty with promptTokens even under a strict firstTokenLogProbThreshold.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

  • Sources/WhisperKit/Core/TextDecoder.swift: Disables the first-token logprob early-abort when options.promptTokens is non-nil to prevent empty outputs.
  • Tests/WhisperKitTests/UnitTests.swift: Adds regression coverage for prompting with a strict first-token threshold (Issue #372).


Comment on lines 852 to 855 (Sources/WhisperKit/Core/TextDecoder.swift):

      isFirstTokenLogProbTooLow =
  -       if isFirstToken, let firstTokenLogProbThreshold = options.firstTokenLogProbThreshold, nextTokenLogProb < firstTokenLogProbThreshold {
  +       if isFirstToken, options.promptTokens == nil, let firstTokenLogProbThreshold = options.firstTokenLogProbThreshold, nextTokenLogProb < firstTokenLogProbThreshold {
          true
      } else {

Copilot AI Mar 12, 2026


This change makes firstTokenLogProbThreshold effectively a no-op whenever options.promptTokens is non-nil. That’s a behavior change for an existing public option, so it should be documented (e.g., in DecodingOptions docs / README) to avoid confusing callers who set a strict threshold expecting it to be enforced.



Development

Successfully merging this pull request may close these issues.

Using promptTokens causes the Transcription to return empty result.

2 participants