polarquant

Here are 10 public repositories matching this topic:

Ryuketsukami / turboquant-skill

AI agent skill implementing Google's TurboQuant compression algorithm (ICLR 2026) — 6x KV cache memory reduction, 8x speedup, zero accuracy loss. Compatible with Claude Code, Codex CLI, and all Agent Skills-compatible tools.

Updated Mar 28, 2026
Python

Ryuketsukami / turboquant-compression

Near-optimal vector quantization for LLM KV cache compression. Python implementation of TurboQuant (ICLR 2026) — PolarQuant + QJL for 3-bit quantization with minimal accuracy loss and up to 8x memory reduction.

Updated Mar 28, 2026
Python

Whiteflagnorthplatte622 / polarquant-kv

Compress the LLM KV cache with PolarQuant K+V quantization for 73-99% VRAM savings on consumer GPUs with zero token loss.

ai kernel deep-learning inference pytorch vector-quantization model-compression huggingface on-device-ai kv-cache vllm llm-compression kv-cache-compression mlx-lm polarquant

Updated Jul 16, 2026
Python

Smilefounder / TurboMLX

Drop-in KV cache compression for MLX on Apple Silicon. Brings PolarQuant (Google, ICLR 2026) to mlx-lm with first-class Gemma 4 support: MatFormer, dual head_dim, hybrid sliding/global attention, and cross-layer KV sharing. 3-bit quantization gives a 4.8× smaller cache; 4-bit quantization reaches a logit cosine similarity of 0.995.

memory-efficient gemma mlx on-device-ai kv-cache apple-silicon llm kv-cache-compression mlx-lm turboquant polarquant gemma-4

Updated Apr 16, 2026
Python

kpkaranam / OpenQuanta

Universal vector compression — from embeddings to KV cache.

turboquant polarquant

Updated Apr 4, 2026
Rust

consilium-ai / consilium-ai

Local AI agent with 16K context on 8GB RAM. No cloud, no API keys.

apple google web ai model cache opus kv silicon distilled rlm 16k 8gb recurcive qwen3 turboquant polarquant

Updated Mar 29, 2026
Python

NeuraLiying / TurboQuant

Reproduction of TurboQuant.

transformers quantization kv-cache large-language-models llm-inference kv-cache-quantization polarquant

Updated Jul 14, 2026
Python

imprecise-nest694 / consilium-ai

Build a local, offline AI agent for Apple Silicon with 16K context, 8GB RAM support, and no cloud, API keys, or subscriptions.

ai model cache multi-agent opus kv silicon distilled rlm 16k recurcive ai-orchestration qwen3 polarquant artifact-first premortem

Updated Jul 16, 2026
Python

Vincent-PRO-AI / Open_Turboquant

High-performance LLM quantization and KV cache compression engine, featuring PolarQuant, universal model patching, and 4-bit KV caching for extreme memory efficiency.

ai cuda transformers inference pytorch triton quantization inference-optimization huggingface kv-cache llm polarquant

Updated Apr 23, 2026
Python

mew-sh / TurboQuant

A Rust implementation of TurboQuant, Google Research's near-lossless KV cache compression algorithm for large language models (to be presented at ICLR 2026).

rust google research kv language-model turboquant polarquant qjl

Updated Apr 2, 2026
Rust