Quantization - Reduce memory usage by >40% during inference!

QAI Hub - Qualcomm's open-source library used to quantize models for a specific chipset

Linear Quantization - Maps floating-point values to integers using a scale and a zero point; simple and easy to use
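As a rough illustration, linear quantization can be sketched with the usual mapping q = clamp(round(r / scale + zero_point)) and its inverse r = scale * (q - zero_point). The function names, the hand-picked `scale` and `zero_point`, and the use of plain Python lists instead of tensors are assumptions made for this example, not the contents of linear_quantization.py.

```python
def linear_quantize(values, scale, zero_point, q_min=-128, q_max=127):
    """Quantize floats to the signed 8-bit range: q = clamp(round(r / scale + zero_point))."""
    return [max(q_min, min(q_max, round(v / scale + zero_point))) for v in values]


def linear_dequantize(q_values, scale, zero_point):
    """Map integers back to floats: r = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in q_values]


# Hypothetical example values; scale and zero_point are hand-picked here.
weights = [-1.0, -0.25, 0.0, 0.5, 1.5]
scale, zero_point = 0.01, -20

q = linear_quantize(weights, scale, zero_point)
restored = linear_dequantize(q, scale, zero_point)

# The last value (1.5) falls outside the representable range and is clamped,
# so it does not round-trip exactly.
errors = [abs(r - w) for r, w in zip(restored, weights)]
```

Values that fit inside the range round-trip with an error of at most half a scale step, while clamped values lose more, which is why the scale and zero point are normally derived from the data.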

Asymmetric Quantization - Good for activations that aren't centered at zero, because the scale and zero point are computed from the tensor's actual minimum and maximum
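A minimal sketch of the asymmetric scheme, assuming a signed 8-bit target range and plain Python lists: the scale stretches the observed [min, max] interval over the integer range, and the zero point is the integer that real 0.0 lands on. The function names and sample activations are hypothetical.

```python
def asymmetric_params(values, q_min=-128, q_max=127):
    """Derive scale and zero point from the observed min and max of the data."""
    r_min, r_max = min(values), max(values)
    if r_max == r_min:
        return 1.0, 0  # degenerate range; any scale works
    scale = (r_max - r_min) / (q_max - q_min)
    zero_point = round(q_min - r_min / scale)
    # Keep the zero point inside the integer range.
    zero_point = max(q_min, min(q_max, zero_point))
    return scale, zero_point


def asymmetric_quantize(values, q_min=-128, q_max=127):
    scale, zero_point = asymmetric_params(values, q_min, q_max)
    q = [max(q_min, min(q_max, round(v / scale + zero_point))) for v in values]
    return q, scale, zero_point


# Hypothetical activations skewed toward positive values.
activations = [-0.5, 0.0, 1.0, 2.05]
q, scale, zero_point = asymmetric_quantize(activations)
```

With this input the minimum maps to -128 and the maximum to 127, so the full integer range is used even though the data is not centered at zero.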

Symmetric Quantization - The range is centered at zero (the zero point is fixed at 0), which makes it extremely fast due to the simple math and low memory usage
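The symmetric case can be sketched as follows, assuming the restricted range [-127, 127] and plain Python lists. Only a scale is computed, from the largest absolute value, and no zero point needs to be stored. The function names and sample weights are hypothetical.

```python
def symmetric_scale(values, q_max=127):
    """Scale chosen so that the largest magnitude maps to q_max."""
    r_max = max(abs(v) for v in values)
    return r_max / q_max if r_max > 0 else 1.0


def symmetric_quantize(values, q_max=127):
    scale = symmetric_scale(values, q_max)
    # Zero point is implicitly 0, so quantization is a divide and a round.
    q = [max(-q_max, min(q_max, round(v / scale))) for v in values]
    return q, scale


# Hypothetical weights, roughly centered at zero.
weights = [-2.54, -1.0, 0.0, 0.5, 1.28]
q, scale = symmetric_quantize(weights)
dequantized = [scale * qi for qi in q]
```

Because real 0.0 always maps to integer 0, dequantization is a single multiply per value.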

Per-Channel Quantization - Rather than quantizing the whole tensor with a single scale, we quantize each output channel with its own scale, for higher accuracy with minimal performance drop
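To make the difference from per-tensor quantization concrete, here is a small sketch that assumes symmetric quantization, a weight matrix stored as a list of rows (one row per output channel), and hypothetical function names. The two rows differ in magnitude by a factor of 100.

```python
def quantize_per_channel(weight, q_max=127):
    """Symmetric quantization with one scale per row (output channel)."""
    q_rows, scales = [], []
    for row in weight:
        r_max = max(abs(v) for v in row)
        scale = r_max / q_max if r_max > 0 else 1.0
        scales.append(scale)
        q_rows.append([round(v / scale) for v in row])
    return q_rows, scales


# Hypothetical weights: the second channel is 100x larger than the first.
weight = [
    [0.126, -0.254, 0.0],
    [12.6, -25.4, 0.0],
]
q_rows, scales = quantize_per_channel(weight)

# For comparison: a single per-tensor scale is dominated by the large channel,
# so the small channel collapses to a few integer levels.
tensor_scale = max(abs(v) for row in weight for v in row) / 127
per_tensor_row0 = [round(v / tensor_scale) for v in weight[0]]
```

With per-channel scales both rows use the full integer range, whereas the per-tensor version squeezes the small channel into the values -1, 0, and 1.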

Per-Group Quantization - Quantizes small groups of values, each with its own scale. Best for maintaining precision, improves memory slightly, and is mainly aimed at training at scale, as with LLMs
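A per-group variant might look like the sketch below, which assumes symmetric quantization, a single row of weights as a Python list, and a hypothetical `group_size` parameter that must divide the row length.

```python
def quantize_per_group(row, group_size, q_max=127):
    """Symmetric quantization with one scale per contiguous group of values."""
    if len(row) % group_size != 0:
        raise ValueError("row length must be a multiple of group_size")
    q_values, scales = [], []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        r_max = max(abs(v) for v in group)
        scale = r_max / q_max if r_max > 0 else 1.0
        scales.append(scale)
        q_values.extend(round(v / scale) for v in group)
    return q_values, scales


def dequantize_per_group(q_values, scales, group_size):
    """Each value is rescaled by the scale of the group it belongs to."""
    return [scales[i // group_size] * q for i, q in enumerate(q_values)]


# Hypothetical row whose two halves differ in magnitude by 100x.
row = [0.126, -0.254, 12.6, -25.4]
q_values, scales = quantize_per_group(row, group_size=2)
restored = dequantize_per_group(q_values, scales, group_size=2)
```

One scale is stored per group, so the group size controls how finely the scales follow local changes in magnitude.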

Inference Quantization - Uses quantized weights when computing a layer's activations
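One common way to read this step is a linear layer whose weights are stored as 8-bit integers and dequantized on the fly while the input stays in floating point. That reading, the function name, and the sample numbers below are assumptions for illustration, not a description of inference_quantization.py.

```python
def quantized_linear_forward(x, q_weight, scale, zero_point=0, bias=None):
    """Compute y = W x + b where W is stored as integers.

    x:        list of floats (the input activations)
    q_weight: list of rows of ints, one row per output feature
    """
    out = []
    for i, q_row in enumerate(q_weight):
        # Dequantize each weight, then accumulate the dot product in float.
        acc = sum(scale * (q - zero_point) * xi for q, xi in zip(q_row, x))
        out.append(acc + (bias[i] if bias is not None else 0.0))
    return out


# Hypothetical input and integer weights.
x = [1.0, 2.0]
q_weight = [[10, -20], [127, 0]]
y = quantized_linear_forward(x, q_weight, scale=0.5, bias=[0.0, 1.0])
```

Only the weights are held in low precision here; the activations and the accumulation remain in floating point.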

8-bit Quantizer - Performs an accelerated forward pass in the neural network
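As a hedged sketch of what such a quantizer could look like, the toy layer below keeps integer weights with one scale per output row and applies the scale once per row after an integer-weight dot product. The class name `W8Linear`, its methods, and the per-row symmetric scheme are assumptions, not the contents of 8-bit_quantizer.py. Pure Python does not reproduce any speedup, which in practice depends on optimized low-precision kernels.

```python
class W8Linear:
    """Toy linear layer with int8-range weights and one scale per output row."""

    def __init__(self, in_features, out_features):
        self.q_weight = [[0] * in_features for _ in range(out_features)]
        self.scales = [1.0] * out_features
        self.bias = [0.0] * out_features

    def quantize(self, weight):
        """Fill the integer weights and scales from a float weight matrix."""
        for i, row in enumerate(weight):
            r_max = max(abs(v) for v in row)
            scale = r_max / 127 if r_max > 0 else 1.0
            self.scales[i] = scale
            self.q_weight[i] = [round(v / scale) for v in row]

    def forward(self, x):
        # Dot product against the integer weights, then one multiply by the row scale.
        return [
            s * sum(q * xi for q, xi in zip(row, x)) + b
            for row, s, b in zip(self.q_weight, self.scales, self.bias)
        ]


# Hypothetical float weights for a 2-in, 2-out layer.
layer = W8Linear(in_features=2, out_features=2)
layer.quantize([[1.27, -0.64], [0.0, 2.54]])
y = layer.forward([1.0, 1.0])
```

Applying the scale after the dot product, rather than to every weight, is what keeps the inner loop on integer weights.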

Quantize Layers - Replaces the linear layers in a model with quantized layers
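The replacement step can be sketched as a recursive walk over a model's children that swaps every linear layer for a quantized one, optionally skipping some layers by name. The toy `Linear`, `QuantLinear`, and `Block` classes and the excluded name `lm_head` are hypothetical stand-ins for a real framework's modules.

```python
class Linear:
    """Toy float linear layer: just holds a weight matrix."""

    def __init__(self, weight):
        self.weight = weight


class QuantLinear:
    """Toy quantized layer: symmetric int8-range weights, one scale per row."""

    def __init__(self, weight):
        self.scales, self.q_weight = [], []
        for row in weight:
            r_max = max(abs(v) for v in row)
            scale = r_max / 127 if r_max > 0 else 1.0
            self.scales.append(scale)
            self.q_weight.append([round(v / scale) for v in row])


class Block:
    """Toy container: children are stored in a name -> module dict."""

    def __init__(self, **children):
        self.children = children


def replace_linear(module, exclude=()):
    """Swap every Linear child for a QuantLinear, recursing into nested Blocks."""
    for name, child in list(module.children.items()):
        if isinstance(child, Linear) and name not in exclude:
            module.children[name] = QuantLinear(child.weight)
        elif isinstance(child, Block):
            replace_linear(child, exclude)


# Hypothetical model: one nested projection plus an output head left in float.
model = Block(
    attn=Block(q_proj=Linear([[0.25, -1.0]])),
    lm_head=Linear([[1.0, 2.0]]),
)
replace_linear(model, exclude=("lm_head",))
```

Excluding selected layers by name is a common way to keep precision-sensitive layers, such as an output head, in floating point.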

Quantizing Models - Testing the Quantizer on an open-source LLM and an object detection model shows significant memory reduction with the same performance

Folders and files:
Outputs/TinyLlama_v1.1
8-bit_quantizer.py
README.md
asymmetric_quantization.py
inference_quantization.py
linear_quantization.py
per-channel_quantization.py
per-group_quantization.py
qai_hub.py
quantize_layers.py
quantizing_model.py
symmetric_quantization.py
