A survey of modern quantization formats (e.g., MXFP8, NVFP4) and inference optimization tools (e.g., TorchAO, GemLite), illustrated through the example of Llama-3.1 inference.
Updated Nov 11, 2025 - Python