
Support GPTQ for LLMs INT4 quantization #4100

@Yichangfa

Description

I am trying to quantize Qwen3-VL (a multimodal LLM) to INT4 using AIMET. The target hardware (Snapdragon 8295) restricts me to per-channel quantization. I first tried basic PTQ, but the accuracy was insufficient. I then tried SpinQuant, which improved accuracy but still not enough. The SpinQuant paper states that GPTQ is used for quantization after the rotation is applied. However, AIMET's implementation does not seem to follow this approach. Could AIMET support GPTQ for INT4 quantization of LLMs?
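For reference, here is a minimal NumPy sketch of the step being asked about: per-channel INT4 quantization of a single linear layer, once with plain round-to-nearest (what basic PTQ does) and once with GPTQ-style error compensation. It is not AIMET code and does not use any AIMET API. The function names (`quantize_rtn`, `quantize_gptq`), the symmetric per-output-channel scale, the damping factor, and the random calibration data are all assumptions made for illustration.

```python
import numpy as np


def _per_channel_scale(W, n_bits):
    # One symmetric scale per output channel (row of W).
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(W).max(axis=1, keepdims=True) / qmax
    return np.where(scale == 0, 1.0, scale), qmax


def quantize_rtn(W, n_bits=4):
    """Round-to-nearest baseline. W has shape (out_features, in_features)."""
    scale, qmax = _per_channel_scale(W, n_bits)
    Q = np.clip(np.round(W / scale), -qmax - 1, qmax) * scale
    return Q, scale


def quantize_gptq(W, X, n_bits=4, damp=0.01):
    """GPTQ-style quantization (hypothetical helper, no grouping or act-order).

    W: (out_features, in_features) weights.
    X: (n_samples, in_features) calibration inputs to the layer.
    """
    W = W.astype(np.float64).copy()
    scale, qmax = _per_channel_scale(W, n_bits)

    # Hessian of the layer-wise squared error, with damping for stability.
    H = 2.0 * X.T @ X / X.shape[0]
    H += damp * np.mean(np.diag(H)) * np.eye(H.shape[0])

    # Upper Cholesky factor U of H^-1, so that H^-1 = U^T U.
    Hinv = np.linalg.inv(H)
    Hinv = (Hinv + Hinv.T) / 2.0  # remove numerical asymmetry
    U = np.linalg.cholesky(Hinv).T

    Q = np.zeros_like(W)
    for j in range(W.shape[1]):
        w = W[:, j]
        q = np.clip(np.round(w / scale[:, 0]), -qmax - 1, qmax) * scale[:, 0]
        Q[:, j] = q
        # Push the quantization error of column j onto the remaining columns.
        err = (w - q) / U[j, j]
        W[:, j:] -= np.outer(err, U[j, j:])
    return Q, scale


def output_error(W, Q, X):
    # Mean squared error of the layer output on the calibration inputs.
    return float(np.mean((X @ W.T - X @ Q.T) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Correlated inputs, since GPTQ exploits input correlations.
    X = rng.standard_normal((256, 64)) @ rng.standard_normal((64, 64)) * 0.1
    W = rng.standard_normal((32, 64))
    Q_rtn, _ = quantize_rtn(W)
    Q_gptq, _ = quantize_gptq(W, X)
    print("RTN  output MSE:", output_error(W, Q_rtn, X))
    print("GPTQ output MSE:", output_error(W, Q_gptq, X))
```

Both functions return weights that lie on the same per-channel INT4 grid, so the result stays compatible with per-channel-only hardware. The only difference is that GPTQ chooses the grid points using calibration data rather than rounding each weight independently.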

Labels

question (Question for AIMET team)

