Conversation

@binghanc binghanc commented Nov 20, 2025

What does this PR do?

Type of change: new feature

Overview: add support for quantizing newer DeepSeek-R1 checkpoints (MLA NVFP4 quantization)

Usage

torchrun --nproc-per-node=8 ptq.py --mla_quant nvfp4_wq_a_wkv_a_wq_b_wo_fp8_wkv_b --batch_size 4 --model_path $DS_CKPT --config DeepSeek-V3/inference/configs/config_671B.json --quant_cfg NVFP4_DEFAULT_CFG --output_path $AMAX_PATH

Summary by CodeRabbit

Release Notes

  • New Features
    • Added NVFP4 quantization option for MLA model quantization workflow.
    • Expanded quantization configuration choices to include "nvfp4" alongside the existing per_tensor_fp8 option.
    • Introduced new CLI parameter to specify MLA quantization type during post-training quantization.



copy-pr-bot bot commented Nov 20, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

mtq_cfg["quant_cfg"][layer+"_quantizer"] = {"enable": False}

# Disable BMM quantizers
mtq_cfg["quant_cfg"]["*attn.kv_bmm_quantizer*"] = {"enable": False}
Collaborator

why do we need to disable BMM quantizers?

@binghanc binghanc (Author) commented Jan 9, 2026

In my view, we don't quantize wkv_b or the BMM weights in the Turbo checkpoint. Also, the BMM weights are decomposed from wkv_b in TRT-LLM.
If I have missed something, please correct me. @kaiyux

Member

We only need changes for generating "v2.1" for now, no need for "v2.2".

@binghanc binghanc force-pushed the binghanc/dpskr1_nvfp4_v3 branch from 1386863 to f60a5b7 on January 9, 2026 05:32
@binghanc binghanc marked this pull request as ready for review on January 9, 2026 05:33
@binghanc binghanc requested a review from a team as a code owner on January 9, 2026 05:33
@binghanc binghanc requested a review from meenchen on January 9, 2026 05:33
@binghanc binghanc changed the title from "support for newer checkpoints" to "Feat: Support quantization for DeepSeek-R1-0528-NVFP4-Turbo" on Jan 9, 2026
@binghanc binghanc changed the title from "Feat: Support quantization for DeepSeek-R1-0528-NVFP4-Turbo" to "Feat: Support quantization for new DeepSeek-R1 checkpoints" on Jan 9, 2026

codecov bot commented Jan 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.18%. Comparing base (945ee02) to head (a3a2712).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #582   +/-   ##
=======================================
  Coverage   74.18%   74.18%           
=======================================
  Files         192      192           
  Lines       19236    19236           
=======================================
  Hits        14271    14271           
  Misses       4965     4965           


@binghanc binghanc changed the title from "Feat: Support quantization for new DeepSeek-R1 checkpoints" to "Support MLA nvfp4 quant for Deepseek for max perf" on Jan 10, 2026
Signed-off-by: binghanc <176802681+binghanc@users.noreply.github.com>
@binghanc binghanc force-pushed the binghanc/dpskr1_nvfp4_v3 branch from 62585b5 to 99b1b9f on January 13, 2026 01:55

coderabbitai bot commented Jan 13, 2026

Review skipped: automatic incremental reviews are disabled on this repository. A single review can be triggered with the @coderabbitai review command.

📝 Walkthrough

This PR extends MLA quantization support in the PTQ pipeline by introducing a new "nvfp4" quantization option. The changes add an mla_quant parameter to the quantization flow, validate it against allowed values (None, "per_tensor_fp8", "nvfp4"), assign NVFP4 quantizers to specific linear layers when enabled, and expose the option via CLI argument.

Changes

Cohort / File(s): MLA Quantization Configuration (examples/deepseek/ptq.py)
Summary: Added support for new "nvfp4" quantization type with specific layer assignments (wq_a, wkv_a, wq_b, wo) and BMM quantizer disabling. Expanded validation to enforce allowed values: None, "per_tensor_fp8", "nvfp4". Updated ptq function signature to accept mla_quant: str | None = None parameter and added CLI argument --mla_quant for user configuration.
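
A minimal sketch of the flow summarized above, assuming the structure described in the walkthrough; the helper name configure_mla_quant and the exact quantizer-key patterns are illustrative and not copied from examples/deepseek/ptq.py:

def configure_mla_quant(mtq_cfg: dict, mla_quant: str | None = None) -> dict:
    """Adjust a quantization config dict for the requested MLA quantization type."""
    allowed_mla_quant = [None, "per_tensor_fp8", "nvfp4"]
    if mla_quant not in allowed_mla_quant:
        raise ValueError(f"mla_quant must be one of {allowed_mla_quant}, got {mla_quant!r}")

    if mla_quant == "nvfp4":
        # wq_a, wkv_a, wq_b and wo are assigned NVFP4 quantizers; wkv_b stays unquantized.
        for layer in ["wq_a", "wkv_a", "wq_b", "wo"]:
            # In the real script the settings would come from the NVFP4 default config;
            # a bare enable flag stands in for them here as an illustrative placeholder.
            mtq_cfg["quant_cfg"][layer + "_quantizer"] = {"enable": True}
        # BMM quantizers are disabled because the BMM weights are decomposed from
        # wkv_b in TRT-LLM and are not quantized in the Turbo checkpoint.
        mtq_cfg["quant_cfg"]["*attn.kv_bmm_quantizer*"] = {"enable": False}

    return mtq_cfg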

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks (3 passed)

  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title check ✅ Passed: the title clearly describes the main change, adding MLA nvfp4 quantization support for DeepSeek to achieve maximum performance, which matches the core objective of the PR.
  • Docstring Coverage ✅ Passed: docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.




@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
examples/deepseek/ptq.py (2)

314-315: Consider raising ValueError instead of assert for user input validation.

Using assert for input validation can be bypassed with -O (optimized mode) and produces a less user-friendly error message. A ValueError with a descriptive message would be more appropriate for CLI argument validation.

Suggested fix
     allowed_mla_quant = [None, "per_tensor_fp8", "nvfp4"]
-    assert mla_quant in allowed_mla_quant, f"mla_quant must be {allowed_mla_quant}"
+    if mla_quant not in allowed_mla_quant:
+        raise ValueError(f"mla_quant must be one of {allowed_mla_quant}, got {mla_quant!r}")

430-435: Consider using choices for CLI-level validation.

Using argparse's choices parameter would provide immediate feedback to users and generate better --help output.

Suggested improvement
     parser.add_argument(
         "--mla_quant",
         type=str,
         default=None,
+        choices=["per_tensor_fp8", "nvfp4"],
         help="MLA quantization type: None (disable), per_tensor_fp8, nvfp4",
     )

Note: choices doesn't directly support None for omitted arguments, but leaving None out of choices while keeping default=None achieves the desired behavior: when the argument isn't provided, the value is None; when it is provided, it must be one of the choices.
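
A small standalone illustration of that behavior; this is a hedged sketch of an argparse setup, not the actual parser in ptq.py:

import argparse

# Standalone sketch: the parser below only demonstrates the choices / default=None
# interaction described in the note above.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--mla_quant",
    type=str,
    default=None,
    choices=["per_tensor_fp8", "nvfp4"],
    help="MLA quantization type: omit to disable, or pass per_tensor_fp8 / nvfp4",
)

print(parser.parse_args([]).mla_quant)                        # None (argument omitted)
print(parser.parse_args(["--mla_quant", "nvfp4"]).mla_quant)  # nvfp4
# parser.parse_args(["--mla_quant", "bogus"]) exits with an error listing the valid choices.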

📜 Review details

Configuration used: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6ae96b5 and 99b1b9f.

📒 Files selected for processing (1)
  • examples/deepseek/ptq.py
🧰 Additional context used
🧬 Code graph analysis (1)
examples/deepseek/ptq.py (1)
modelopt/torch/quantization/qtensor/nvfp4_tensor.py (1)
  • quantize (143-232)
🔇 Additional comments (4)
examples/deepseek/ptq.py (4)

276-283: LGTM!

The new mla_quant parameter is properly added with a sensible default of None for backward compatibility.


341-347: LGTM!

The formatting improvements with blank lines enhance code readability.


442-442: LGTM!

The new mla_quant argument is correctly propagated to the ptq function.


322-339: The code explicitly disables wkv_b by design, not by oversight.

The implementation intentionally excludes wkv_b from NVFP4 quantization. The separate mla_nvfp4_linear_layers list contains only four layers (wq_a, wkv_a, wq_b, wo), and the comment confirms "wq_a, wkv_a, wq_b, wo use NVFP4 quantization"—wkv_b is not included. The codebase contains no FP8 configuration for wkv_b in any code path. While the PR description naming may suggest otherwise, the actual implementation shows this exclusion is deliberate.

Likely an incorrect or invalid review comment.
