
Fix #1284: preserve FP8 format for layers specified in ignore_layers #1511

Open
LuciferDono wants to merge 1 commit into intel:main from LuciferDono:LuciferDono/fp8-ignore-layers

Conversation

@LuciferDono

Description

For FP8 models (e.g. Qwen3-8B-FP8, DeepSeek-V3), layers specified in ignore_layers are currently dequantized to BF16 even though the user explicitly wants them kept unchanged. This PR fixes two issues:

  1. Detection gap (supersedes #1283, "ignore_layers does not take effect on FP8 model"): get_fp_layer_names() only checked SUPPORTED_LAYER_TYPES (nn.Linear, Conv1D), missing FP8-specific layer types such as FP8Linear. It now also checks INNER_SUPPORTED_LAYER_TYPES.

  2. Preservation: convert_module_to_hp_if_necessary() blindly converted all quantized layers to high precision. It now accepts an ignore_layers parameter and skips the specified layers, keeping them in their original FP8 format.
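The preservation fix can be sketched as follows. This is a standalone illustration only: the real convert_module_to_hp_if_necessary() in auto_round walks actual torch modules, whereas here layers are modeled as a plain {name: dtype} dict, and all names are stand-ins.

```python
# Illustrative sketch of the skip-on-convert behavior (not auto_round's
# real implementation): layers are modeled as a {name: dtype_str} dict.

def convert_module_to_hp_if_necessary(layer_dtypes, ignore_layers=None):
    """Convert quantized layers to high precision, except those named in
    ignore_layers. Backward compatible: the default of None converts
    everything, matching the old behavior."""
    ignore = set(ignore_layers or ())
    result = {}
    for name, dtype in layer_dtypes.items():
        if name in ignore:
            result[name] = dtype       # keep the original FP8 format
        elif dtype == "fp8":
            result[name] = "bf16"      # dequantize to high precision
        else:
            result[name] = dtype       # already high precision
    return result


layers = {"model.lm_head": "fp8", "model.layers.0.mlp": "fp8"}
out = convert_module_to_hp_if_necessary(layers, ignore_layers=["model.lm_head"])
# model.lm_head stays "fp8"; model.layers.0.mlp becomes "bf16"
```

Passing ignore_layers=None (or omitting it) reproduces the pre-PR behavior, which is why the parameter can be threaded through existing call sites without breaking them.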

Type of Change

  • Bug fix
  • New feature

Related Issues

Fixes #1284
Supersedes the detection fix from #1283

Changes

  • auto_round/compressors/utils.py — get_fp_layer_names() now detects FP8Linear via INNER_SUPPORTED_LAYER_TYPES
  • auto_round/utils/weight_handler.py — convert_module_to_hp_if_necessary() gains a backward-compatible ignore_layers parameter (default None)
  • auto_round/compressors/base.py — Resolved ignore layer names stored after configure_layer_config() and threaded through all model-level conversion call sites
  • test/test_cpu/utils/test_fp8_ignore_layers.py — 6 unit tests covering detection, skip-on-convert, and end-to-end flow with mock FP8Linear layers
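The detection side of the change can be illustrated with mock layer classes. The real tuples live in auto_round and contain nn.Linear, Conv1D, and FP8Linear; everything below is a self-contained stand-in, and the matching-by-substring logic is a simplification.

```python
# Mock stand-ins for auto_round's layer-type tuples; the names mirror the
# PR, but the classes themselves are placeholders.

class Linear:        # stands in for nn.Linear
    pass

class FP8Linear:     # stands in for the FP8-specific layer type
    pass

SUPPORTED_LAYER_TYPES = (Linear,)
INNER_SUPPORTED_LAYER_TYPES = (FP8Linear,)


def get_fp_layer_names(named_modules, fp_layers):
    """Return the module names matching fp_layers that resolve to a
    supported layer type. Before the fix, only SUPPORTED_LAYER_TYPES was
    checked, so FP8Linear layers in an FP8 checkpoint never matched."""
    supported = SUPPORTED_LAYER_TYPES + INNER_SUPPORTED_LAYER_TYPES
    return [
        name
        for name, module in named_modules.items()
        if isinstance(module, supported)
        and any(pattern in name for pattern in fp_layers)
    ]


modules = {"lm_head": FP8Linear(), "layers.0.attn": Linear()}
# With the fix, the FP8 lm_head is detected: get_fp_layer_names(modules,
# ["lm_head"]) returns ["lm_head"]; previously it returned [].
```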

Checklist Before Submitting

  • My code has been tested locally.
  • Documentation has been updated as needed.
  • New or updated tests are included where applicable.

…yers

- Fix get_fp_layer_names() to detect FP8Linear layers via
  INNER_SUPPORTED_LAYER_TYPES (supersedes intel#1283 detection gap)
- Add ignore_layers parameter to convert_module_to_hp_if_necessary()
  to skip converting specified layers from FP8 to BF16
- Thread resolved ignore_layers through all call sites in BaseCompressor
- Add unit tests for both detection and skip-on-convert behavior

Signed-off-by: LuciferDono <pranavj821@gmail.com>
@xin3he
Contributor

xin3he commented Mar 9, 2026

@LuciferDono Have you validated this with vLLM?
The FP8 quantization_config would be overwritten by AutoRound — will that cause a failure to load FP8 tensors?

xin3he self-requested a review March 9, 2026 02:11


Development

Successfully merging this pull request may close these issues.

[feat] For FP8 models, keep layers specified in ignore_layers in their original FP8 format
