
fix: suppress false-positive warnings when loading whisper audio encoder #13281

Open
octo-patch wants to merge 1 commit into Comfy-Org:master from octo-patch:fix/issue-13276-whisper-encoder-warnings

@octo-patch

Fixes #13276

Problem

When loading a standard full whisper checkpoint (which contains both encoder and decoder weights) via AudioEncoderLoader, two classes of spurious warnings appear in the console:

missing audio encoder: ['feature_extractor.mel_spectrogram.spectrogram.window', 'feature_extractor.mel_spectrogram.mel_scale.fb']
unexpected audio encoder: ['decoder.embed_positions.weight', 'decoder.embed_tokens.weight', ...]

These warnings mislead users into thinking the model loaded incorrectly.

Root causes:

  1. Unexpected decoder keys: full whisper checkpoints contain decoder.* weights alongside encoder.* weights. Because WhisperLargeV3 is encoder-only, every decoder.* key is flagged as unexpected. The encoder never uses these weights, so they should be discarded silently.

  2. Missing mel-spectrogram buffers: torchaudio.transforms.MelSpectrogram registers a Hann window (spectrogram.window) and a mel filterbank (mel_scale.fb) as PyTorch buffers. Standard whisper checkpoints do not store these constants because they are computed deterministically from the model config at init time. load_state_dict(strict=False) reports them in missing_keys, but torchaudio has already initialised them correctly, so the warning is misleading.
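Both warning classes come from the key comparison that load_state_dict(strict=False) performs between the model and the checkpoint. The sketch below is a simplified pure-Python model of that comparison, not PyTorch's implementation. It assumes the model expects encoder weights plus the two torchaudio buffers, and that the checkpoint holds encoder and decoder weights only; "encoder.conv1.weight" is a hypothetical key name used for illustration.

```python
from typing import Iterable, List, Tuple


def diff_keys(model_keys: Iterable[str], checkpoint_keys: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Simplified model of strict=False key matching (not PyTorch's code).

    Returns (missing, unexpected): keys the model defines but the checkpoint
    lacks, and keys the checkpoint has but the model does not define.
    """
    model_set = set(model_keys)
    ckpt_set = set(checkpoint_keys)
    missing = sorted(model_set - ckpt_set)
    unexpected = sorted(ckpt_set - model_set)
    return missing, unexpected


# Encoder-only model that also owns two torchaudio-computed buffers.
MODEL_KEYS = [
    "encoder.conv1.weight",  # hypothetical encoder weight name
    "feature_extractor.mel_spectrogram.spectrogram.window",
    "feature_extractor.mel_spectrogram.mel_scale.fb",
]

# Full whisper checkpoint: encoder + decoder weights, no mel buffers.
CHECKPOINT_KEYS = [
    "encoder.conv1.weight",
    "decoder.embed_positions.weight",
    "decoder.embed_tokens.weight",
]

missing, unexpected = diff_keys(MODEL_KEYS, CHECKPOINT_KEYS)
print("missing audio encoder:", missing)
print("unexpected audio encoder:", unexpected)
```

Root cause 1 shows up in the unexpected list, root cause 2 in the missing list.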

Solution

  • Strip decoder.* keys from the state-dict before passing it to load_state_dict, eliminating the "unexpected" warnings for whisper models.
  • After loading, suppress warnings only for the two known torchaudio-computed buffers (feature_extractor.mel_spectrogram.spectrogram.window and feature_extractor.mel_spectrogram.mel_scale.fb); any other genuinely missing keys are still warned about.
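A minimal sketch of the two steps described in the bullets above, assuming plain dicts stand in for the state-dict and a standard logging.Logger is used. The helper names strip_decoder_keys and report_missing are hypothetical; the PR's actual diff to comfy/audio_encoders/audio_encoders.py is not reproduced on this page, and the sketch does not model the whisper-only branch the patch applies the filter in.

```python
import logging
from typing import Dict, Iterable, List

# Buffers that torchaudio's MelSpectrogram computes at construction time;
# key names taken from the warning text in this PR.
TORCHAUDIO_COMPUTED_BUFFERS = frozenset({
    "feature_extractor.mel_spectrogram.spectrogram.window",
    "feature_extractor.mel_spectrogram.mel_scale.fb",
})


def strip_decoder_keys(state_dict: Dict[str, object]) -> Dict[str, object]:
    """Hypothetical helper: drop every decoder.* entry before loading."""
    return {k: v for k, v in state_dict.items() if not k.startswith("decoder.")}


def report_missing(missing: Iterable[str], logger: logging.Logger) -> List[str]:
    """Hypothetical helper: warn about missing keys except the known buffers.

    Returns the keys that were actually reported.
    """
    real_missing = [k for k in missing if k not in TORCHAUDIO_COMPUTED_BUFFERS]
    if real_missing:
        logger.warning("missing audio encoder: %s", real_missing)
    return real_missing


log = logging.getLogger("audio_encoder_sketch")
checkpoint = {
    "encoder.conv1.weight": "w0",  # hypothetical key; strings stand in for tensors
    "decoder.embed_positions.weight": "w1",
    "decoder.embed_tokens.weight": "w2",
}
print(sorted(strip_decoder_keys(checkpoint)))  # ['encoder.conv1.weight']
report_missing(sorted(TORCHAUDIO_COMPUTED_BUFFERS), log)  # logs nothing
```

Filtering decoder keys before loading, rather than muting the "unexpected" warning wholesale, keeps any other unexpected key visible; likewise the missing-key filter is an explicit allow-list, so an unrelated missing weight still produces a warning.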

Testing

Verified by inspection: the decoder key filter targets only the whisper branch, and the buffer exclusion set contains only torchaudio-managed names that cannot appear in wav2vec2 checkpoints, so wav2vec2 loading is unaffected.
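The testing note above relies on inspection. One way to pin the behaviour down is a small unittest sketch that captures log output with assertLogs. Everything here is a hypothetical stand-in rather than ComfyUI code: warn_missing mirrors the described filtering, the logger name is invented, and "feature_projection.projection.weight" is only an example of a wav2vec2-style key.

```python
import logging
import unittest

# Hypothetical stand-in for the patched missing-key warning logic.
IGNORED = frozenset({
    "feature_extractor.mel_spectrogram.spectrogram.window",
    "feature_extractor.mel_spectrogram.mel_scale.fb",
})
log = logging.getLogger("audio_encoder_test_sketch")


def warn_missing(missing):
    """Warn about missing keys that are not torchaudio-computed buffers."""
    real = [k for k in missing if k not in IGNORED]
    if real:
        log.warning("missing audio encoder: %s", real)
    return real


class WarnMissingTest(unittest.TestCase):
    def test_whisper_buffers_are_silent(self):
        # Only the two known buffers are missing: nothing should be reported.
        self.assertEqual(warn_missing(sorted(IGNORED)), [])

    def test_other_missing_keys_still_warn(self):
        # A genuinely missing (hypothetical wav2vec2-style) key must still warn.
        with self.assertLogs(log, level="WARNING") as cm:
            warn_missing(["feature_projection.projection.weight"])
        self.assertIn("feature_projection.projection.weight", cm.output[0])
```

Saved to a file, this can be run with python -m unittest followed by the module name.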

When a full whisper checkpoint (encoder + decoder) is loaded via
AudioEncoderLoader, two classes of spurious warnings were emitted:

1. 'unexpected audio encoder' for every decoder.* key - the decoder is
   not part of WhisperLargeV3, so these keys are always present in full
   whisper checkpoints and should be silently discarded.

2. 'missing audio encoder' for feature_extractor.mel_spectrogram buffers
   (window and mel_scale.fb) - these are torchaudio buffers computed
   deterministically from config at init time; they are never stored in
   standard whisper checkpoints but are always correctly initialised.

Fix: strip decoder keys from the state-dict before loading, and suppress
warnings for the two known torchaudio-computed buffer keys.

Fixes Comfy-Org#13276
@coderabbitai

coderabbitai bot commented Apr 4, 2026

No actionable comments were generated in the recent review. 🎉

📥 Commits

Reviewing files that changed from the base of the PR and between f21f6b2 and 3d51d63.

📒 Files selected for processing (1)
  • comfy/audio_encoders/audio_encoders.py

📝 Walkthrough

The audio encoder checkpoint loading logic in comfy/audio_encoders/audio_encoders.py has been updated to handle Whisper3-style state dictionaries more carefully. When loading weights, the code now filters out all parameters with keys starting with decoder. to prevent decoder weights from being treated as encoder-only checkpoint parameters. Additionally, the missing-parameter warning system has been refined to suppress warnings for two specific torchaudio-derived buffer keys (feature_extractor.mel_spectrogram.spectrogram.window and feature_extractor.mel_spectrogram.mel_scale.fb) while still logging warnings for other missing keys. Unexpected keys continue to be logged normally.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Title check (✅ Passed): The title accurately describes the main change: suppressing false-positive warnings when loading whisper audio encoders.
  • Description check (✅ Passed): The description is directly related to the changeset, explaining the problem and solution with clear context about decoder keys and torchaudio buffers.
  • Linked Issues check (✅ Passed): The PR addresses issue #13276 by filtering decoder keys and suppressing warnings for torchaudio buffers, directly fixing both the unexpected and missing encoder warnings reported.
  • Out of Scope Changes check (✅ Passed): All changes are scoped to fixing false-positive warnings in the audio encoder loader; no unrelated modifications are present.


Development

Successfully merging this pull request may close these issues.

missing audio encoder: ['feature_extractor...] unexpected audio encoder: ['decoder.embed...]
