Fix #1284: preserve FP8 format for layers specified in ignore_layers#1511
Open
LuciferDono wants to merge 1 commit into intel:main from
Conversation
…yers

- Fix `get_fp_layer_names()` to detect FP8Linear layers via `INNER_SUPPORTED_LAYER_TYPES` (supersedes intel#1283 detection gap)
- Add `ignore_layers` parameter to `convert_module_to_hp_if_necessary()` to skip converting specified layers from FP8 to BF16
- Thread resolved `ignore_layers` through all call sites in `BaseCompressor`
- Add unit tests for both detection and skip-on-convert behavior

Signed-off-by: LuciferDono <pranavj821@gmail.com>
Contributor
@LuciferDono Have you ever validated it with vLLM?
Description
For FP8 models (e.g. Qwen3-8B-FP8, DeepSeek-V3), layers specified in `ignore_layers` are currently dequantized to BF16 even though the user explicitly wants them kept unchanged. This PR fixes two issues:

1. **Detection gap** (supersedes "`ignore_layers` does not take effect on FP8 model" #1283): `get_fp_layer_names()` only checked `SUPPORTED_LAYER_TYPES` (nn.Linear, Conv1D), missing FP8-specific layer types like `FP8Linear`. It now also checks `INNER_SUPPORTED_LAYER_TYPES`.

2. **Preservation**: `convert_module_to_hp_if_necessary()` blindly converted all quantized layers to high precision. It now accepts an `ignore_layers` parameter to skip specified layers, keeping them in their original FP8 format.

Type of Change
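The detection fix can be illustrated with a minimal, self-contained sketch. This is not the actual auto-round implementation: the stand-in `Linear`/`FP8Linear` classes, the `named_modules` iterable, and the substring-matching of patterns are assumptions made for illustration; only the function name and the two layer-type tuples come from the PR.

```python
# Stand-in layer classes (the real ones are torch.nn.Linear and
# auto-round's FP8 linear layer type).
class Linear:
    pass

class FP8Linear:
    pass

SUPPORTED_LAYER_TYPES = (Linear,)          # previously the only tuple checked
INNER_SUPPORTED_LAYER_TYPES = (FP8Linear,)  # now also checked (the fix)

def get_fp_layer_names(named_modules, fp_layers):
    """Sketch of the fixed detection: match layers of either tuple
    whose name contains one of the comma-separated ignore patterns."""
    patterns = [p.strip() for p in fp_layers.split(",") if p.strip()]
    return [
        name
        for name, module in named_modules
        if isinstance(module, SUPPORTED_LAYER_TYPES + INNER_SUPPORTED_LAYER_TYPES)
        and any(p in name for p in patterns)
    ]

# Before the fix, an FP8 layer would never be matched, so its name
# never reached the ignore list:
modules = [("model.lm_head", FP8Linear()), ("model.layers.0.mlp", Linear())]
print(get_fp_layer_names(modules, "lm_head"))  # → ['model.lm_head']
```

With only `SUPPORTED_LAYER_TYPES` checked, `model.lm_head` (an FP8 layer) would fall through the `isinstance` filter, which is exactly the gap #1283 reported.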
Related Issues
Fixes #1284
Supersedes the detection fix from #1283
Changes
- `auto_round/compressors/utils.py` — `get_fp_layer_names()` now detects FP8Linear via `INNER_SUPPORTED_LAYER_TYPES`
- `auto_round/utils/weight_handler.py` — `convert_module_to_hp_if_necessary()` gains a backward-compatible `ignore_layers` parameter (default `None`)
- `auto_round/compressors/base.py` — Resolved ignore layer names are stored after `configure_layer_config()` and threaded through all model-level conversion call sites
- `test/test_cpu/utils/test_fp8_ignore_layers.py` — 6 unit tests covering detection, skip-on-convert, and end-to-end flow with mock FP8Linear layers

Checklist Before Submitting
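The skip-on-convert change can be sketched as follows. This is a hypothetical simplification, not the real `weight_handler.py` code: the `MockFP8Layer` class and the `dtype` string attribute are invented for illustration; only the function name, the `ignore_layers` parameter, and its `None` default come from the PR.

```python
class MockFP8Layer:
    """Toy stand-in for an FP8 quantized layer."""
    def __init__(self):
        self.dtype = "fp8"

def convert_module_to_hp_if_necessary(module, name, ignore_layers=None):
    """Sketch of the preservation fix: layers named in ignore_layers
    are returned untouched instead of being dequantized.

    ignore_layers defaults to None, so existing call sites that do not
    pass it keep the old convert-everything behavior.
    """
    if ignore_layers is not None and name in ignore_layers:
        return module  # skip: keep the layer in its original FP8 format
    module.dtype = "bf16"  # stand-in for the real dequantization to high precision
    return module

layer = MockFP8Layer()
convert_module_to_hp_if_necessary(layer, "model.lm_head", ignore_layers={"model.lm_head"})
print(layer.dtype)  # → fp8  (preserved, not dequantized to bf16)
```

The key design point is that the ignore list is resolved once (in `BaseCompressor`, after `configure_layer_config()`) and then threaded through every conversion call site, so the skip decision is made consistently wherever dequantization could occur.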