
Fix #1284: preserve FP8 format for layers specified in ignore_layers #1511

Open
LuciferDono wants to merge 1 commit into intel:main from LuciferDono:LuciferDono/fp8-ignore-layers

Conversation

@LuciferDono

Description

For FP8 models (e.g. Qwen3-8B-FP8, DeepSeek-V3), layers specified in ignore_layers are currently dequantized to BF16 even though the user explicitly wants them kept unchanged. This PR fixes two issues:

  1. Detection gap (supersedes #1283, "ignore_layers does not take effect on FP8 model"): get_fp_layer_names() only checked SUPPORTED_LAYER_TYPES (nn.Linear, Conv1D), missing FP8-specific layer types such as FP8Linear. It now also checks INNER_SUPPORTED_LAYER_TYPES.

  2. Preservation: convert_module_to_hp_if_necessary() blindly converted all quantized layers to high precision. It now accepts an ignore_layers parameter and skips the specified layers, keeping them in their original FP8 format.
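The preservation fix can be sketched as follows. This is a standalone illustration only: the real convert_module_to_hp_if_necessary() in auto_round walks actual torch modules, whereas here layers are modeled as a plain {name: dtype} dict, and all names are stand-ins.

```python
# Illustrative sketch of the skip-on-convert behavior (not auto_round's
# real implementation): layers are modeled as a {name: dtype_str} dict.

def convert_module_to_hp_if_necessary(layer_dtypes, ignore_layers=None):
    """Convert quantized layers to high precision, except those named in
    ignore_layers. Backward compatible: the default of None converts
    everything, matching the old behavior."""
    ignore = set(ignore_layers or ())
    result = {}
    for name, dtype in layer_dtypes.items():
        if name in ignore:
            result[name] = dtype       # keep the original FP8 format
        elif dtype == "fp8":
            result[name] = "bf16"      # dequantize to high precision
        else:
            result[name] = dtype       # already high precision
    return result


layers = {"model.lm_head": "fp8", "model.layers.0.mlp": "fp8"}
out = convert_module_to_hp_if_necessary(layers, ignore_layers=["model.lm_head"])
# model.lm_head stays "fp8"; model.layers.0.mlp becomes "bf16"
```

Passing ignore_layers=None (or omitting it) reproduces the pre-PR behavior, which is why the parameter can be threaded through existing call sites without breaking them.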

Type of Change

  • Bug fix
  • New feature

Related Issues

Fixes #1284
Supersedes the detection fix from #1283

Changes

  • auto_round/compressors/utils.py — get_fp_layer_names() now detects FP8Linear via INNER_SUPPORTED_LAYER_TYPES
  • auto_round/utils/weight_handler.py — convert_module_to_hp_if_necessary() gains a backward-compatible ignore_layers parameter (default None)
  • auto_round/compressors/base.py — Resolved ignore layer names stored after configure_layer_config() and threaded through all model-level conversion call sites
  • test/test_cpu/utils/test_fp8_ignore_layers.py — 6 unit tests covering detection, skip-on-convert, and end-to-end flow with mock FP8Linear layers
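The detection side of the change can be illustrated with mock layer classes. The real tuples live in auto_round and contain nn.Linear, Conv1D, and FP8Linear; everything below is a self-contained stand-in, and the matching-by-substring logic is a simplification.

```python
# Mock stand-ins for auto_round's layer-type tuples; the names mirror the
# PR, but the classes themselves are placeholders.

class Linear:        # stands in for nn.Linear
    pass

class FP8Linear:     # stands in for the FP8-specific layer type
    pass

SUPPORTED_LAYER_TYPES = (Linear,)
INNER_SUPPORTED_LAYER_TYPES = (FP8Linear,)


def get_fp_layer_names(named_modules, fp_layers):
    """Return the module names matching fp_layers that resolve to a
    supported layer type. Before the fix, only SUPPORTED_LAYER_TYPES was
    checked, so FP8Linear layers in an FP8 checkpoint never matched."""
    supported = SUPPORTED_LAYER_TYPES + INNER_SUPPORTED_LAYER_TYPES
    return [
        name
        for name, module in named_modules.items()
        if isinstance(module, supported)
        and any(pattern in name for pattern in fp_layers)
    ]


modules = {"lm_head": FP8Linear(), "layers.0.attn": Linear()}
# With the fix, the FP8 lm_head is detected: get_fp_layer_names(modules,
# ["lm_head"]) returns ["lm_head"]; previously it returned [].
```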

Checklist Before Submitting

  • My code has been tested locally.
  • Documentation has been updated as needed.
  • New or updated tests are included where applicable.

…yers

- Fix get_fp_layer_names() to detect FP8Linear layers via
  INNER_SUPPORTED_LAYER_TYPES (supersedes intel#1283 detection gap)
- Add ignore_layers parameter to convert_module_to_hp_if_necessary()
  to skip converting specified layers from FP8 to BF16
- Thread resolved ignore_layers through all call sites in BaseCompressor
- Add unit tests for both detection and skip-on-convert behavior

Signed-off-by: LuciferDono <pranavj821@gmail.com>
@xin3he
Contributor

xin3he commented Mar 9, 2026

@LuciferDono Have you validated this with vLLM?
The FP8 quantization_config would be overwritten by AutoRound — will that cause a failure to load FP8 tensors?

xin3he self-requested a review March 9, 2026 02:11


Development

Successfully merging this pull request may close these issues.

[feat] For FP8 models, keep layers specified in ignore_layers in their original FP8 format
