itsananytripathi/QuantForge-AI

QuantForge AI

QuantForge AI is an interactive platform for benchmarking, comparing, and analyzing LLM quantization techniques including GPTQ, AWQ, QLoRA, and custom quantization pipelines.

Overview

QuantForge AI provides a comprehensive suite of tools for evaluating and comparing different quantization methods on large language models. The platform enables researchers and engineers to:

  • Benchmark multiple quantization techniques (GPTQ, AWQ, QLoRA)
  • Compare model performance across different bit-widths
  • Visualize results with interactive dashboards
  • Export benchmark results for analysis
  • Prototype and evaluate custom quantization methods

Features

  • GPTQ Benchmarking - Evaluate GPTQ quantization performance
  • AWQ Benchmarking - Test AWQ quantization methods
  • QLoRA Benchmarking - Assess QLoRA fine-tuning on quantized base models
  • Quantization Comparison Dashboard - Interactive comparison of methods
  • HuggingFace Model Loader - Seamless integration with HF models
  • Interactive Streamlit UI - User-friendly web interface
  • Benchmark Visualization - Plotly-based charts and graphs
  • Result Export - Export results in JSON format
  • Performance Analytics - Detailed performance metrics
  • Custom Quantization Research - Implement novel quantization methods

Architecture

QuantForge-AI/
├── app.py                      # Streamlit dashboard
├── src/
│   ├── model_loader.py         # HuggingFace model integration
│   ├── benchmark.py            # Benchmark engine
│   ├── comparison.py           # Quantization comparison engine
│   ├── visualization.py        # Plotly visualization
│   └── quantforge_quant.py     # Custom ALAQ quantization
├── results/                    # Benchmark results storage
├── assets/                     # Static assets
├── docs/                       # Documentation
└── requirements.txt            # Python dependencies

Installation

# Clone the repository
git clone https://github.com/itsananytripathi/QuantForge-AI.git
cd QuantForge-AI

# Install dependencies
pip install -r requirements.txt

Quick Start

# Launch the Streamlit dashboard
streamlit run app.py

Streamlit Dashboard

The dashboard includes the following tabs:

  1. Dashboard - Overview of selected model and quantization metrics
  2. Model Loader - Load and configure models from HuggingFace
  3. Quantization - Apply quantization methods to loaded models
  4. Benchmark - Run benchmark tests on quantized models
  5. Comparison - Compare multiple quantization methods side-by-side
  6. Reports - View and export benchmark results

Dashboard Metrics

  • Selected Model
  • Quantization Method
  • VRAM Usage
  • Inference Speed
  • Accuracy
  • Latency
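
As a rough illustration of how the latency and inference speed figures above could be collected, here is a minimal timing sketch. The `generate` callable and `fake_generate` stand-in are hypothetical and are not part of QuantForge AI; a real run would wrap an actual model call.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency(generate: Callable[[], int], warmup: int = 2, runs: int = 5) -> Dict[str, float]:
    """Time a callable that runs one generation and returns its token count."""
    for _ in range(warmup):
        generate()  # warm-up runs are not timed (caches, lazy initialization)

    latencies = []
    tokens = 0
    for _ in range(runs):
        start = time.perf_counter()
        tokens += generate()
        latencies.append(time.perf_counter() - start)

    total = sum(latencies)
    return {
        "mean_latency_s": statistics.mean(latencies),
        "median_latency_s": statistics.median(latencies),
        "tokens_per_s": tokens / total if total > 0 else float("inf"),
    }


def fake_generate() -> int:
    # Hypothetical stand-in for a model call; pretends to emit 32 tokens.
    time.sleep(0.01)
    return 32


stats = measure_latency(fake_generate)
print(stats)
```

When timing GPU inference with PyTorch, call `torch.cuda.synchronize()` before reading the clock, since CUDA kernels launch asynchronously.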

Benchmark Engine

The benchmark engine supports evaluation on multiple datasets:

  • WikiText2
  • C4
  • PTB
  • Pile
  • Custom datasets

Metrics collected:

  • Perplexity (PPL)
  • Inference latency
  • Memory usage
  • Throughput
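
Perplexity is the exponential of the mean negative log-likelihood per token. The sketch below computes it from a list of per-token log-probabilities; how the benchmark engine obtains those values is not shown here, and the function name is an assumption for illustration.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity from per-token natural-log probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# A model that assigns probability 0.25 to every token has a perplexity of about 4.
uniform = [math.log(0.25)] * 10
print(perplexity(uniform))
```

Lower perplexity is better, so a quantized model is typically judged by how little its perplexity rises over the full-precision baseline.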

Quantization Comparison

Compare the following quantization methods:

  • GPTQ - Post-training weight quantization using approximate second-order information
  • AWQ - Activation-aware weight quantization
  • QLoRA - LoRA fine-tuning on top of a 4-bit quantized base model
  • ALAQ - Adaptive Layer-Aware Quantization (custom)

Comparison metrics:

  • Model Size
  • VRAM Usage
  • Inference Speed
  • Latency
  • Accuracy
  • Perplexity
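
For the model size metric, a back-of-the-envelope estimate is parameters × bits per weight ÷ 8. The sketch below assumes roughly 8 billion parameters for an 8B-class model and ignores quantization metadata (scales, zero points), activations, and the KV cache, so measured sizes and VRAM usage will be somewhat higher.

```python
def weight_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumed parameter count for illustration: 8e9.
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {weight_size_gb(8e9, bits):.1f} GB")
# 16-bit: 16.0 GB
#  8-bit: 8.0 GB
#  4-bit: 4.0 GB
```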

Results are stored in JSON format in the results/ directory.
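
A minimal sketch of writing and reading such JSON results follows. The `BenchmarkResult` schema, the file naming, and the helper names are assumptions made for illustration; the actual format used by the project may differ, and the numbers are placeholders rather than measured results.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Dict, List


@dataclass
class BenchmarkResult:
    # Hypothetical schema; field names are illustrative only.
    model: str
    method: str
    bits: int
    perplexity: float
    latency_ms: float


def save_result(result: BenchmarkResult, results_dir: Path) -> Path:
    """Write one result as pretty-printed JSON and return the file path."""
    results_dir.mkdir(parents=True, exist_ok=True)
    name = f"{result.model.replace('/', '_')}_{result.method}.json"
    path = results_dir / name
    path.write_text(json.dumps(asdict(result), indent=2))
    return path


def load_results(results_dir: Path) -> List[Dict]:
    """Load every JSON result in the directory, sorted by file name."""
    return [json.loads(p.read_text()) for p in sorted(results_dir.glob("*.json"))]


# Demo in a temporary directory; in the repository this would be results/.
with tempfile.TemporaryDirectory() as tmp:
    placeholder = BenchmarkResult("example/model", "GPTQ", 4, 0.0, 0.0)  # placeholder values
    print(save_result(placeholder, Path(tmp)).name)
    print(load_results(Path(tmp)))
```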

HuggingFace Integration

Supported Models

  • Meta-Llama-3-8B
  • Mistral-7B
  • Gemma-7B
  • Additional models can be added via configuration

API Functions

from src.model_loader import load_model, load_tokenizer, list_supported_models

# List available models
models = list_supported_models()

# Load a model
model = load_model("meta-llama/Meta-Llama-3-8B")

# Load tokenizer
tokenizer = load_tokenizer("meta-llama/Meta-Llama-3-8B")

Custom Quantization

QuantForge AI includes a novel quantization method:

Adaptive Layer-Aware Quantization (ALAQ)

ALAQ dynamically assigns bit-widths based on layer importance:

  • Important layers → 8-bit quantization
  • Medium layers → 6-bit quantization
  • Less important layers → 4-bit quantization

Layer importance is determined through sensitivity analysis during calibration.
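
The tiering above can be sketched as follows. This is not the code in `quantforge_quant.py`: the sensitivity scores are taken as given inputs, and the rank-based thresholds (bottom third, middle third, top third) and function names are assumptions chosen for illustration.

```python
from typing import Dict


def assign_bit_widths(sensitivity: Dict[str, float],
                      low: float = 1 / 3, high: float = 2 / 3) -> Dict[str, int]:
    """Map each layer to 8, 6, or 4 bits by its rank-normalized sensitivity."""
    ranked = sorted(sensitivity, key=sensitivity.get)  # least sensitive first
    n = len(ranked)
    bits = {}
    for rank, name in enumerate(ranked):
        score = rank / (n - 1) if n > 1 else 1.0  # 0.0 = least, 1.0 = most sensitive
        if score >= high:
            bits[name] = 8
        elif score >= low:
            bits[name] = 6
        else:
            bits[name] = 4
    return bits


def effective_bits(bits: Dict[str, int], param_counts: Dict[str, int]) -> float:
    """Parameter-weighted average bit-width across layers."""
    total = sum(param_counts.values())
    return sum(bits[name] * param_counts[name] for name in bits) / total


# Hypothetical sensitivity scores; real values would come from calibration.
scores = {"layers.0": 0.91, "layers.1": 0.12, "layers.2": 0.47,
          "layers.3": 0.05, "layers.4": 0.63, "layers.5": 0.78}
plan = assign_bit_widths(scores)
print(plan)
print(effective_bits(plan, {name: 1_000 for name in scores}))
```

With equally sized layers, this plan averages 6 bits per weight, which sits between uniform 4-bit and uniform 8-bit quantization in storage cost.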

Roadmap

  • Add support for more quantization methods (SpQR, SmoothQuant)
  • Expand model support (Llama-2, Falcon, Mixtral)
  • Implement distributed benchmarking
  • Add API endpoints for programmatic access
  • Create Docker container for easy deployment
  • Add collaborative benchmark sharing

Author

Anany Tripathi

About

AI-powered LLM Quantization Benchmarking Platform with ALAQ, GPTQ, AWQ and QLoRA comparison dashboard.
