
[QUARK-480] Phase 2: replace dict-style quant_config access with LayerQuantConfig#274

Open
thpereir wants to merge 1 commit into main from
thpereir/quark_quant_layer_phase2

Conversation


@thpereir thpereir commented Mar 6, 2026


Motivation

Following up the DeepSeek R1 enablement with per-layer quantization support, this converts all other models to use the same structured quantization config instead of a dict.

This PR depends on both #236 and #268

Technical Details

  1. Clean type boundary — LayerQuantConfig is a frozen dataclass; impossible to accidentally mutate or misspell fields.
  2. Single resolution API — resolve(prefix) handles the exclude list, per-layer overrides, pattern matching, and global fallback in one call with documented priority, which is less error-prone than the previous approach of reaching into the dict. What was quant_config.get_layer_quant_config(f"{prefix}.fused_qkv_a_proj") becomes quant_config.resolve(prefix).
  3. Parser registry — @register_quant_parser("quark") means new quantization formats are self-contained modules. In the current approach, adding a new quantization method means editing QuantizationConfig.__init__ directly.

Test Plan

Test Result

Qwen3-30B-A3B-FP8-dynamic

ATOM Server:

python -m atom.entrypoints.openai_server --model /root/models/RedHatAI/Qwen3-30B-A3B-FP8-dynamic/ --trust-remote-code -tp 4 --kv_cache_dtype fp8

ATOM lm-eval GSM8K:

python tests/evals/gsm8k/gsm8k_eval.py --host http://127.0.0.1 --port 8000 --num-questions 1000 --save-results logs
Running GSM8K evaluation: 1000 questions, 5-shot

Results:
Accuracy: 0.858
Invalid responses: 0.000
Total latency: 24.870 s
Questions per second: 40.209
Total output tokens: 119342
Output tokens per second: 4798.671
Results saved to logs

HuggingFace GSM8K:

Accuracy: 0.86

DeepSeek-R1-0528-MoE-MXFP4-Attn-MTP-PTPC-FP8

ATOM Server:

python -m atom.entrypoints.openai_server --model /root/models/amd/DeepSeek-R1-0528-MoE-MXFP4-Attn-MTP-PTPC-FP8/ --trust-remote-code -tp 8 --kv_cache_dtype fp8

ATOM lm-eval GSM8K:

lm_eval --model local-completions   --model_args "model=/root/models/amd/DeepSeek-R1-0528-MoE-MXFP4-Attn-MTP-PTPC-FP8/,base_url=http://0.0.0.0:8000/v1/completions,tokenized_requests=False,tokenizer_backend=None,num_concurrent=32"   --tasks gsm8k   --num_fewshot 5   --batch_size 1

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9386|±  |0.0066|
|     |       |strict-match    |     5|exact_match|↑  |0.9356|±  |0.0068|

Submission Checklist

@thpereir thpereir force-pushed the thpereir/quark_quant_layer_phase2 branch from 66e940b to 873a969 Compare March 10, 2026 16:29
@thpereir thpereir changed the base branch from thpereir/quark_quant_layer to main March 10, 2026 16:30
@thpereir thpereir force-pushed the thpereir/quark_quant_layer_phase2 branch 2 times, most recently from 8801fb5 to 0c9fd54 Compare March 10, 2026 17:06
@thpereir thpereir changed the title Phase 2: replace dict-style quant_config access with LayerQuantSpec a… Phase 2: replace dict-style quant_config access with LayerQuantConfig Mar 10, 2026
@thpereir thpereir marked this pull request as ready for review March 11, 2026 16:35
@thiagocrepaldi

Is it possible to request a regression test for this PR, or for the current main branch?


Copilot AI left a comment


Pull request overview

This PR completes the Phase 2 migration away from dict-style quantization config access by introducing a typed, immutable LayerQuantConfig and routing quant config parsing through a parser registry, then updating models/ops/docs/tests to use the new API.

Changes:

  • Introduce atom/quant_spec.py with LayerQuantConfig (frozen dataclass), ParsedQuantConfig, and a quant-config parser registry (QuarkParser, GenericParser).
  • Refactor QuantizationConfig to parse via the registry and expose typed per-layer resolution via get_layer_quant_config().
  • Update model/ops code and docs/tests to use attribute access (e.g., .quant_dtype) instead of dict keys.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.

| File | Description |
|---|---|
| atom/quant_spec.py | Adds typed quant spec + parser registry and built-in parsers. |
| atom/config.py | Switches quant parsing to registry + typed resolution; removes dict-based parsing paths. |
| atom/models/llama.py | Uses per-layer resolved quant type instead of global dict access. |
| atom/models/deepseek_v2.py | Replaces dict indexing with typed attribute access for resolved specs. |
| atom/models/deepseek_mtp.py | Replaces dict indexing with typed attribute access. |
| atom/model_ops/moe.py | Updates MoE methods to use typed LayerQuantConfig attributes. |
| atom/model_ops/linear.py | Updates LinearBase plumbing to use typed spec attributes. |
| atom/model_ops/layernorm.py | Updates RMSNorm to resolve per-layer spec via prefix. |
| atom/model_ops/activation.py | Updates SiluAndMul to resolve per-layer spec via prefix. |
| tests/test_quant_config.py | Updates/refactors tests for typed API and parser registry. |
| docs/configuration_guide.md | Updates docs to reflect LayerQuantConfig as a frozen dataclass and new parsing path. |
| docs/architecture_guide.md | Removes LayerQuantConfig from atom/config.py ownership list. |
Comments suppressed due to low confidence (1)

docs/configuration_guide.md:139

  • The LayerQuantConfig table documents quant_method as type str with default "", but the implementation is quant_method: str | None = None. Update the docs to reflect that the field may be None and clarify what None vs an empty string means for downstream logic.
| Field | Type | Default | Description |
|---|---|---|---|
| `quant_type` | `QuantType` | `QuantType.No` | Quantization granularity (see below) |
| `quant_dtype` | `torch.dtype` | `torch.bfloat16` | Data type for quantized weights |
| `is_dynamic` | `bool` | `True` | Use dynamic quantization (scales computed at runtime) |
| `quant_method` | `str` | `""` | Quantization method (e.g., `"quark"`, `"compressed-tensors"`) |



haoyangli0109 commented Mar 13, 2026

@thpereir
Thank you very much.
It's great to see that several of the directions we discussed were implemented quickly. If we modify the code for other quantization formats, we should also select the corresponding models for accuracy testing.

@thpereir thpereir force-pushed the thpereir/quark_quant_layer_phase2 branch 5 times, most recently from 0ce71af to 59dcb22 Compare March 16, 2026 23:20
@thpereir
Contributor Author

Rebased again. Fixed a small issue loading gpt-oss-120b. But I found that even the main branch has a big issue running inference for gpt-oss-120b: the output is garbage.

I am investigating and working on a separate fix so as not to pollute this PR.

@thpereir thpereir force-pushed the thpereir/quark_quant_layer_phase2 branch from 59dcb22 to 74ca941 Compare March 17, 2026 00:50
@thpereir
Contributor Author

Rebased again. Posted test results on the top PR description

@thpereir thpereir changed the title Phase 2: replace dict-style quant_config access with LayerQuantConfig [QUARK-480] Phase 2: replace dict-style quant_config access with LayerQuantConfig Mar 17, 2026
@thpereir thpereir force-pushed the thpereir/quark_quant_layer_phase2 branch from 74ca941 to c481f2b Compare March 17, 2026 14:19
atom/config.py Outdated
)
# Use the parser registry to build a structured ParsedQuantConfig
parser = get_quant_parser(self.quant_method)
self._parsed_quant_config = parser.parse(self.hf_quant_config)
Contributor


If we can pass exclude_layer directly into QuantizationConfig and let it own that field, why can’t we do the same for global_spec and layer_pattern_specs?

Compared to accessing self._parsed_quant_config.layer_pattern_specs, wouldn’t it be more reasonable and straightforward to use self.layer_pattern_specs directly?

Returning a tuple feels somewhat inelegant, but introducing an additional _parsed_quant_config object that doesn’t seem strictly necessary also feels a bit odd.

Contributor Author


I will update it to create those fields so access is direct, while keeping _parsed_quant_config as a private implementation detail.

Contributor

@haoyangli0109 haoyangli0109 Mar 19, 2026


Hi, @thpereir
The typed return contract for QuantConfigParser.parse() is more self-documenting than returning a tuple or dict. When adding new parsers (Quark, GenericParser, etc.), it provides a clear "output contract." And if the parsing output ever needs more fields, adding them to ParsedQuantConfig is cleaner than dealing with an ever-growing tuple.

However, as currently implemented there are several issues:

  1. QuantizationConfig is the only consumer. ParsedQuantConfig is never used directly anywhere else. It gets created and immediately proxied through properties — essentially a middleman that adds a layer of indirection for no benefit.
  2. Inconsistent design. exclude_layers is copied out and stored directly on QuantizationConfig, while global_spec and layer_pattern_specs are accessed via _parsed_quant_config. If one field can be stored directly, why not the other two? This inconsistency is confusing for readers.
  3. Encapsulation is broken. remap_layer_name directly mutates self._parsed_quant_config.layer_pattern_specs, which means ParsedQuantConfig provides no real encapsulation or immutability guarantee — it is neither frozen nor read-only.
  4. Extra cognitive load. When reading self._parsed_quant_config.global_spec, you need to understand what _parsed_quant_config is and where it comes from. If it were simply self.global_spec as a direct field, the comprehension cost would be zero.
  5. A dedicated class is not necessary for the parser return value. A Python NamedTuple, or simply having the parser return a (global_spec, layer_pattern_specs, exclude_layers) tuple, would be equally clear for this simple scenario without introducing an extra class.
Suggestion: keep ParsedQuantConfig as the parser return type (it's a fine interface contract), but treat it as a transient object — destructure it in __init__ and discard it:

parser = get_quant_parser(self.quant_method)
parsed = parser.parse(self.hf_quant_config)
self.global_spec = parsed.global_spec
self.layer_pattern_specs = parsed.layer_pattern_specs
self.exclude_layers = list(parsed.exclude_layers)

This way parsers still return a well-typed ParsedQuantConfig, but QuantizationConfig owns all three fields directly — no delegation, no inconsistency, and remap_layer_name just mutates self.layer_pattern_specs without reaching into a private sub-object.
This is just my personal suggestion; I’d like to hear your thoughts.

Contributor Author


Agree with your point: _parsed_quant_config as a stored member was inconsistent and added indirection for no real benefit. Updated to destructure the result of parser.parse() directly into self.global_spec, self.layer_pattern_specs, and self.exclude_layers.
ParsedQuantConfig is now purely a typed return contract for parsers; QuantizationConfig owns all three fields directly and remap_layer_name mutates self.layer_pattern_specs without reaching into a sub-object.

@thpereir thpereir force-pushed the thpereir/quark_quant_layer_phase2 branch 3 times, most recently from 3e92890 to 2aa9bbf Compare March 19, 2026 17:10
… for per-layer quant config

Introduce atom/quant_spec.py with:
- LayerQuantConfig: frozen dataclass with typed attribute access
  (quant_type, quant_dtype, is_dynamic, quant_method) replacing
  the old dict-based LayerQuantConfig(dict) subclass
- ParsedQuantConfig: structured output of HF config parsing
- Parser registry (@register_quant_parser) with QuarkParser and
  GenericParser (fallback for compressed-tensors, GPTQ, AWQ, etc.)

Refactor QuantizationConfig (atom/config.py):
- Internal storage now uses ParsedQuantConfig via parser registry
- get_layer_quant_config(prefix) -> LayerQuantConfig (frozen dataclass)
- global_quant_config property -> LayerQuantConfig
- Convenience properties: quant_type, quant_dtype, is_dynamic
- compute_hash() uses typed internal structures

Migrate all consumers to typed attribute access:
- linear.py: layer_quant_config.quant_type instead of ["quant_type"]
- moe.py: all MoE method classes use LayerQuantConfig type hints
- activation.py, layernorm.py: accept prefix param, use
  get_layer_quant_config() instead of bypassing with global_quant_config
- deepseek_mtp.py, deepseek_v2.py, llama.py: use get_layer_quant_config()

Fix GenericParser exclude-layer key handling (atom/quant_spec.py):
- Different quantizers use different keys for excluded layers:
  compressed-tensors uses "ignore", gpt-oss/HF uses "modules_to_not_convert",
  Quark uses "exclude"
- GenericParser now tries all three keys in priority order so excluded
  layers are never silently treated as quantized

Fix hard-coded quant_config=None across models:
- gpt_oss.py OAIAttention: qkv_proj and o_proj were passing quant_config=None,
  preventing fp8/mxfp4 quantization on attention projections in Quark
  gpt-oss models (e.g. fp8 qkv + mxfp4 MoE); both now receive quant_config
- deepseek_v2.py Indexer: weights_proj passed quant_config=None while sibling
  linears wq_b and wk correctly used quant_config; fixed for consistency
- qwen3_next.py GatedDeltaNet: conv1d ColumnParallelLinear omitted
  quant_config while other linears in the same class passed it; fixed

Fix GenericParser MXFP4 quant_type: default fp4x2 to per_1x32 when no explicit
quant_type is found, fixing gpt-oss-120b AITER kernel dispatch.
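The exclude-key fallback described in the commit message above can be sketched as follows. The key names come from the commit message; the function name and the dict shape are illustrative assumptions.

```python
def extract_exclude_layers(hf_quant_config: dict) -> list[str]:
    """Try the exclude-layer keys used by different quantizers, in priority
    order, so excluded layers are never silently treated as quantized."""
    # compressed-tensors -> "ignore", gpt-oss/HF -> "modules_to_not_convert",
    # Quark -> "exclude"
    for key in ("ignore", "modules_to_not_convert", "exclude"):
        value = hf_quant_config.get(key)
        if value:
            return list(value)
    return []
```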
@thpereir thpereir force-pushed the thpereir/quark_quant_layer_phase2 branch from 2aa9bbf to 4b24376 Compare March 19, 2026 17:27