Fix an NVFP4 weight amax attribute issue during export #785
base: main
Conversation
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
Codecov Report ✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##             main     #785      +/-   ##
==========================================
- Coverage   74.66%   74.18%    -0.48%
==========================================
  Files         192      192
  Lines       18975    19236     +261
==========================================
+ Hits        14167    14271     +104
- Misses       4808     4965     +157
```
```python
                QUANTIZATION_W4A8_NVFP4_FP8,
            ]:
                weight = getattr(module, weight_name)
                _ensure_weight_quantizer_calibrated(weight_quantizer, weight)
```
qq: is this specific to NVFP4 only? Do we need this for W4A8_AWQ (int4) as well?
And will we hit a similar issue with FP8 for get_activation_scaling_factor?
So far we have only seen this issue with NVFP4. I think we can cover other cases as needed later; I feel FP8 should be fine.
I think @meenchen's question is legitimate: do you feel we need to do this for all the quant formats, not just NVFP4? And even with this weight calibration, the activation amax is still not present. How will this PR be able to generate a valid HF checkpoint?
I think we can include other cases as needed later. As for "How will this PR be able to generate a valid HF checkpoint?": what do you mean? This patch has been tested by the Google team, who were able to generate the kimi-k2-thinking NVFP4 checkpoint.
My question is: if the weights are not quantized because the expert has never been activated, then even if you quantize the weights, the inputs are not quantized and the input scales are not available. How can the deployment framework deploy this checkpoint without complaining that the input scales are missing?
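For context, the diff hunk above calls a helper named _ensure_weight_quantizer_calibrated. The helper name comes from this PR's diff, but the body below is only a sketch of the idea under discussion (backfill a weight amax when calibration never populated one), not the actual implementation; the quantizer interface shown is a stand-in.

```python
import torch


def _ensure_weight_quantizer_calibrated(weight_quantizer, weight):
    """Sketch: backfill a missing weight amax for an uncalibrated quantizer.

    If an MoE expert received no tokens during calibration, its weight
    quantizer never observed data and carries no amax. Unlike activation
    amax, weight amax does not depend on calibration data, so it can be
    derived from the weight tensor itself. (Assumes the quantizer exposes
    a writable `amax` attribute.)
    """
    if getattr(weight_quantizer, "amax", None) is not None:
        return  # calibration already populated amax; nothing to do
    with torch.no_grad():
        # Per-tensor max of |W|; the real NVFP4 path may use per-block scales.
        weight_quantizer.amax = weight.detach().abs().amax()
```

Note that a fallback like this only fixes the weight side; as the thread points out, activation scales have no data-free fallback, which is why the question about missing input scales at deployment time remains open.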
What does this PR do?
Type of change: Bugfix
Overview: Fix an NVFP4 weight amax attribute issue during export, which surfaces especially when the calibration set is small. Context: sgl-project/sglang#14677 (comment)
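As a rough illustration of the failure mode (hypothetical stand-in objects, not the modelopt API): with a small calibration set, token routing can leave some MoE experts entirely unused, so their weight quantizers never record an amax and export trips over the missing attribute.

```python
import torch


class ToyQuantizer:
    """Hypothetical stand-in for a weight quantizer: amax is recorded
    only when calibration data actually flows through the module."""

    def __init__(self):
        self.amax = None

    def observe(self, x: torch.Tensor):
        seen = x.detach().abs().amax()
        self.amax = seen if self.amax is None else torch.maximum(self.amax, seen)


# Calibration with too few samples: the router sends tokens to expert 0 only.
experts = {0: ToyQuantizer(), 1: ToyQuantizer()}
experts[0].observe(torch.randn(4, 8))

assert experts[0].amax is not None
assert experts[1].amax is None  # export reads amax here and fails
```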
Usage
Testing
Before your PR is "Ready for review"
Additional Information