[QUARK-480] Phase 2: replace dict-style quant_config access with LayerQuantConfig#274
Conversation
Force-pushed from 66e940b to 873a969.
Force-pushed from 8801fb5 to 0c9fd54.
Is it possible to request a regression test for this PR or for the current main branch?
Force-pushed from 56a445b to 3f0ac88.
Pull request overview
This PR completes the Phase 2 migration away from dict-style quantization config access by introducing a typed, immutable LayerQuantConfig and routing quant config parsing through a parser registry, then updating models/ops/docs/tests to use the new API.
Changes:
- Introduce `atom/quant_spec.py` with `LayerQuantConfig` (frozen dataclass), `ParsedQuantConfig`, and a quant-config parser registry (`QuarkParser`, `GenericParser`).
- Refactor `QuantizationConfig` to parse via the registry and expose typed per-layer resolution via `get_layer_quant_config()`.
- Update model/ops code and docs/tests to use attribute access (e.g., `.quant_dtype`) instead of dict keys.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| `atom/quant_spec.py` | Adds typed quant spec + parser registry and built-in parsers. |
| `atom/config.py` | Switches quant parsing to registry + typed resolution; removes dict-based parsing paths. |
| `atom/models/llama.py` | Uses per-layer resolved quant type instead of global dict access. |
| `atom/models/deepseek_v2.py` | Replaces dict indexing with typed attribute access for resolved specs. |
| `atom/models/deepseek_mtp.py` | Replaces dict indexing with typed attribute access. |
| `atom/model_ops/moe.py` | Updates MoE methods to use typed `LayerQuantConfig` attributes. |
| `atom/model_ops/linear.py` | Updates `LinearBase` plumbing to use typed spec attributes. |
| `atom/model_ops/layernorm.py` | Updates `RMSNorm` to resolve per-layer spec via prefix. |
| `atom/model_ops/activation.py` | Updates `SiluAndMul` to resolve per-layer spec via prefix. |
| `tests/test_quant_config.py` | Updates/refactors tests for typed API and parser registry. |
| `docs/configuration_guide.md` | Updates docs to reflect `LayerQuantConfig` as a frozen dataclass and the new parsing path. |
| `docs/architecture_guide.md` | Removes `LayerQuantConfig` from the `atom/config.py` ownership list. |
Comments suppressed due to low confidence (1)
docs/configuration_guide.md:139
- The `LayerQuantConfig` table documents `quant_method` as type `str` with default `""`, but the implementation is `quant_method: str | None = None`. Update the docs to reflect that the field may be `None` and clarify what `None` vs an empty string means for downstream logic.
| Field | Type | Default | Description |
|---|---|---|---|
| `quant_type` | `QuantType` | `QuantType.No` | Quantization granularity (see below) |
| `quant_dtype` | `torch.dtype` | `torch.bfloat16` | Data type for quantized weights |
| `is_dynamic` | `bool` | `True` | Use dynamic quantization (scales computed at runtime) |
| `quant_method` | `str` | `""` | Quantization method (e.g., `"quark"`, `"compressed-tensors"`) |
@thpereir
Force-pushed from 0ce71af to 59dcb22.
Rebased again. Fixed a small issue loading gpt-oss-120b. But I found that even the main branch has a big issue running inference for gpt-oss-120b: the output is garbage. I am investigating and working on a separate fix so as not to pollute this PR.
Force-pushed from 59dcb22 to 74ca941.
Rebased again. Posted test results in the top PR description.
Force-pushed from 74ca941 to c481f2b.
atom/config.py (outdated)

```python
)
# Use the parser registry to build a structured ParsedQuantConfig
parser = get_quant_parser(self.quant_method)
self._parsed_quant_config = parser.parse(self.hf_quant_config)
```
If we can pass exclude_layer directly into QuantizationConfig and let it own that field, why can’t we do the same for global_spec and layer_pattern_specs?
Compared to accessing self._parsed_quant_config.layer_pattern_specs, wouldn’t it be more reasonable and straightforward to use self.layer_pattern_specs directly?
Returning a tuple feels somewhat inelegant, but introducing an additional _parsed_quant_config object that doesn’t seem strictly necessary also feels a bit odd.
I will update it to create the fields directly, so access is direct, while keeping the parsed quant config as a private implementation detail.
Hi, @thpereir
The typed return contract for QuantConfigParser.parse() is more self-documenting than returning a tuple or dict. When adding new parsers (Quark, GenericParser, etc.), it provides a clear "output contract." And if the parsing output ever needs more fields, adding them to ParsedQuantConfig is cleaner than dealing with an ever-growing tuple.
However, as currently implemented there are several issues:
1. **`QuantizationConfig` is the only consumer.** `ParsedQuantConfig` is never used directly anywhere else. It gets created and immediately proxied through properties: essentially a middleman that adds a layer of indirection for no benefit.
2. **Inconsistent design.** `exclude_layers` is copied out and stored directly on `QuantizationConfig`, while `global_spec` and `layer_pattern_specs` are accessed via `_parsed_quant_config`. If one field can be stored directly, why not the other two? This inconsistency is confusing for readers.
3. **Encapsulation is broken.** `remap_layer_name` directly mutates `self._parsed_quant_config.layer_pattern_specs`, which means `ParsedQuantConfig` provides no real encapsulation or immutability guarantee: it is neither frozen nor read-only.
4. **Extra cognitive load.** When reading `self._parsed_quant_config.global_spec`, you need to understand what `_parsed_quant_config` is and where it comes from. If it were simply `self.global_spec` as a direct field, the comprehension cost would be zero.
5. **A dedicated class is not necessary for the parser return value.** A Python `NamedTuple`, or simply having the parser return a `(global_spec, layer_pattern_specs, exclude_layers)` tuple, would be equally clear for this simple scenario without introducing an extra class.
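For reference, the `NamedTuple` alternative mentioned above could look like the following sketch. Field names mirror the PR; the `dict`/`list` types stand in for the real `LayerQuantConfig` values:

```python
# Sketch of a NamedTuple return contract: typed, immutable, and
# destructurable at the call site without a dedicated class.
from typing import NamedTuple

class ParsedQuantConfig(NamedTuple):
    global_spec: dict           # LayerQuantConfig in the real code
    layer_pattern_specs: dict   # pattern -> LayerQuantConfig
    exclude_layers: list        # layer-name patterns to skip

parsed = ParsedQuantConfig(
    global_spec={}, layer_pattern_specs={}, exclude_layers=["lm_head"]
)

# Supports both attribute access and tuple destructuring:
global_spec, layer_pattern_specs, exclude_layers = parsed
```

A `NamedTuple` keeps the "output contract" benefit while staying trivially unpackable, which fits the destructure-and-discard pattern suggested below it.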
Suggestion: keep `ParsedQuantConfig` as the parser return type (it's a fine interface contract), but treat it as a transient object, destructuring it in `__init__` and discarding it:

```python
parser = get_quant_parser(self.quant_method)
parsed = parser.parse(self.hf_quant_config)
self.global_spec = parsed.global_spec
self.layer_pattern_specs = parsed.layer_pattern_specs
self.exclude_layers = list(parsed.exclude_layers)
```

This way parsers still return a well-typed `ParsedQuantConfig`, but `QuantizationConfig` owns all three fields directly: no delegation, no inconsistency, and `remap_layer_name` just mutates `self.layer_pattern_specs` without reaching into a private sub-object.
This is just my personal suggestion; I’d like to hear your thoughts.
Agree with your point: `_parsed_quant_config` as a stored member was inconsistent and added indirection for no real benefit. Updated to destructure the result of `parser.parse()` directly into `self.global_spec`, `self.layer_pattern_specs`, and `self.exclude_layers`.
`ParsedQuantConfig` is now purely a typed return contract for parsers; `QuantizationConfig` owns all three fields directly and `remap_layer_name` mutates `self.layer_pattern_specs` without reaching into a sub-object.
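The ownership model agreed on above can be sketched as follows. `QuantizationConfig`, `remap_layer_name`, and the three field names come from the thread; the method bodies are illustrative assumptions:

```python
# Sketch: QuantizationConfig owns the three parsed fields directly, and
# remap_layer_name rewrites keys of the directly-owned dict in place
# (no private sub-object involved). Bodies are illustrative only.
class QuantizationConfig:
    def __init__(self, global_spec, layer_pattern_specs, exclude_layers):
        self.global_spec = global_spec
        self.layer_pattern_specs = dict(layer_pattern_specs)
        self.exclude_layers = list(exclude_layers)

    def remap_layer_name(self, old: str, new: str) -> None:
        # Move the spec registered under `old` to `new`, if present.
        if old in self.layer_pattern_specs:
            self.layer_pattern_specs[new] = self.layer_pattern_specs.pop(old)
```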
Force-pushed from 3e92890 to 2aa9bbf.
… for per-layer quant config

Introduce `atom/quant_spec.py` with:
- `LayerQuantConfig`: frozen dataclass with typed attribute access (`quant_type`, `quant_dtype`, `is_dynamic`, `quant_method`), replacing the old dict-based `LayerQuantConfig(dict)` subclass
- `ParsedQuantConfig`: structured output of HF config parsing
- Parser registry (`@register_quant_parser`) with `QuarkParser` and `GenericParser` (fallback for compressed-tensors, GPTQ, AWQ, etc.)

Refactor `QuantizationConfig` (`atom/config.py`):
- Internal storage now uses `ParsedQuantConfig` via the parser registry
- `get_layer_quant_config(prefix) -> LayerQuantConfig` (frozen dataclass)
- `global_quant_config` property -> `LayerQuantConfig`
- Convenience properties: `quant_type`, `quant_dtype`, `is_dynamic`
- `compute_hash()` uses typed internal structures

Migrate all consumers to typed attribute access:
- `linear.py`: `layer_quant_config.quant_type` instead of `["quant_type"]`
- `moe.py`: all MoE method classes use `LayerQuantConfig` type hints
- `activation.py`, `layernorm.py`: accept a `prefix` param and use `get_layer_quant_config()` instead of bypassing it with `global_quant_config`
- `deepseek_mtp.py`, `deepseek_v2.py`, `llama.py`: use `get_layer_quant_config()`

Fix `GenericParser` exclude-layer key handling (`atom/quant_spec.py`):
- Different quantizers use different keys for excluded layers: compressed-tensors uses `"ignore"`, gpt-oss/HF uses `"modules_to_not_convert"`, Quark uses `"exclude"`
- `GenericParser` now tries all three keys in priority order so excluded layers are never silently treated as quantized

Fix hard-coded `quant_config=None` across models:
- `gpt_oss.py` `OAIAttention`: `qkv_proj` and `o_proj` were passing `quant_config=None`, preventing fp8/mxfp4 quantization on attention projections in Quark gpt-oss models (e.g. fp8 qkv + mxfp4 MoE); both now receive `quant_config`
- `deepseek_v2.py` `Indexer`: `weights_proj` passed `quant_config=None` while the sibling linears `wq_b` and `wk` correctly used `quant_config`; fixed for consistency
- `qwen3_next.py` `GatedDeltaNet`: `conv1d` `ColumnParallelLinear` omitted `quant_config` while other linears in the same class passed it; fixed

Fix `GenericParser` MXFP4 `quant_type`: default `fp4x2` to `per_1x32` when no explicit `quant_type` is found, fixing gpt-oss-120b AITER kernel dispatch.
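The multi-key exclude lookup described in the commit message can be sketched as below. The three key names are stated in the commit; the priority order and the helper itself are assumptions for illustration, not the project's exact code:

```python
# Sketch: different quantizers store excluded layers under different keys
# (compressed-tensors: "ignore", gpt-oss/HF: "modules_to_not_convert",
# Quark: "exclude"). Try them in order; first non-empty key wins, so
# excluded layers are never silently treated as quantized.
_EXCLUDE_KEYS = ("ignore", "modules_to_not_convert", "exclude")

def find_exclude_layers(hf_quant_config: dict) -> list[str]:
    for key in _EXCLUDE_KEYS:
        value = hf_quant_config.get(key)
        if value:
            return list(value)
    return []
```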
Force-pushed from 2aa9bbf to 4b24376.
…cross all models and ops
Motivation
Following up the DeepSeek R1 enablement with per-layer quantization support, this converts all other models to use the same structured quantization config instead of a dict.
This PR depends on both #236 and #268
Technical Details
- `resolve(prefix)` handles the exclude list, per-layer overrides, pattern matching, and the global fallback in one call with documented priority. Less error-prone than the previous approach of accessing the dict. What was `quant_config.get_layer_quant_config(f"{prefix}.fused_qkv_a_proj")` becomes `quant_config.resolve(prefix)`.
- Parsing output is destructured into fields in `QuantizationConfig.__init__` directly.

Test Plan
Test Result
Qwen3-30B-A3B-FP8-dynamic
ATOM Server:
ATOM lm-eval GSM8K:
HuggingFace GSM8K:
DeepSeek-R1-0528-MoE-MXFP4-Attn-MTP-PTPC-FP8
ATOM Server:
ATOM lm-eval GSM8K:
Submission Checklist