fix: suppress false-positive warnings when loading whisper audio encoder #13281
octo-patch wants to merge 1 commit into Comfy-Org:master
When a full whisper checkpoint (encoder + decoder) is loaded via AudioEncoderLoader, two classes of spurious warnings were emitted:

1. 'unexpected audio encoder' for every decoder.* key - the decoder is not part of WhisperLargeV3, so these keys are always present in full whisper checkpoints and should be silently discarded.
2. 'missing audio encoder' for feature_extractor.mel_spectrogram buffers (window and mel_scale.fb) - these are torchaudio buffers computed deterministically from config at init time; they are never stored in standard whisper checkpoints but are always correctly initialised.

Fix: strip decoder keys from the state-dict before loading, and suppress warnings for the two known torchaudio-computed buffer keys.

Fixes Comfy-Org#13276
Fixes #13276
Problem
When loading a standard full whisper checkpoint (which contains both encoder and decoder weights) via AudioEncoderLoader, two classes of spurious warnings appear in the console: 'unexpected audio encoder' for every decoder.* key, and 'missing audio encoder' for the mel-spectrogram buffers.

These warnings mislead users into thinking the model loaded incorrectly.
Root causes:
1. Unexpected decoder keys: Full whisper checkpoints contain decoder.* weights alongside encoder.* weights. Since WhisperLargeV3 is encoder-only, all decoder.* keys are flagged as unexpected. They were never needed and should be silently discarded.
2. Missing mel-spectrogram buffers: torchaudio.transforms.MelSpectrogram registers a Hann window (spectrogram.window) and a mel filterbank (mel_scale.fb) as PyTorch buffers. Standard whisper checkpoints do not store these constants because they are deterministically computed from the model config at init time. load_state_dict(strict=False) flags them as missing, but they are always correctly initialised by torchaudio, so the warning is misleading.

Solution
- Strip decoder.* keys from the state-dict before passing it to load_state_dict, eliminating the "unexpected" warnings for whisper models.
- Suppress the "missing" warning for the two known torchaudio-computed buffers (feature_extractor.mel_spectrogram.spectrogram.window and feature_extractor.mel_spectrogram.mel_scale.fb); any other genuinely missing keys are still warned about.

Testing
Verified by inspection: the decoder key filter targets only the whisper branch, and the buffer exclusion set contains only torchaudio-managed names that cannot appear in wav2vec2 checkpoints, so wav2vec2 loading is unaffected.
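
For readers without the diff open, the two filtering steps from the Solution section can be sketched in plain Python. This is an illustration of the idea only, not the code from this PR: the function names, the constant name, and the example checkpoint keys (encoder.conv1.weight, decoder.token_embedding.weight, encoder.ln_post.weight) are assumptions, and plain dicts and lists stand in for the real state-dict and for the missing-key list that load_state_dict(strict=False) reports.

```python
# Hypothetical names throughout; this is not the code from the PR diff.
# The two buffer names are the ones listed in the PR description.
KNOWN_COMPUTED_BUFFERS = frozenset({
    "feature_extractor.mel_spectrogram.spectrogram.window",
    "feature_extractor.mel_spectrogram.mel_scale.fb",
})


def strip_decoder_keys(state_dict):
    """Return a copy of state_dict without any decoder.* entries."""
    return {k: v for k, v in state_dict.items() if not k.startswith("decoder.")}


def keys_worth_warning_about(missing_keys):
    """Drop the torchaudio-computed buffers; keep every other missing key."""
    return [k for k in missing_keys if k not in KNOWN_COMPUTED_BUFFERS]


# Stand-in for a full whisper checkpoint (values would be tensors in practice;
# the key names here are illustrative).
checkpoint = {
    "encoder.conv1.weight": 0,
    "decoder.token_embedding.weight": 1,
}
filtered = strip_decoder_keys(checkpoint)

# Stand-in for the missing keys reported by load_state_dict(strict=False).
reported_missing = [
    "feature_extractor.mel_spectrogram.spectrogram.window",
    "feature_extractor.mel_spectrogram.mel_scale.fb",
    "encoder.ln_post.weight",
]
to_warn = keys_worth_warning_about(reported_missing)
```

In this sketch the decoder entry is dropped before loading, and only the illustrative encoder.ln_post.weight key would still produce a warning, which matches the intent that genuinely missing keys remain visible.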