TurboQuant KV cache compression for MLX with fused Metal kernels. 4.6x compression at 98% FP16 speed.
Near-optimal vector quantization from Google's ICLR 2026 paper — 95% recall, 5x compression, zero preprocessing, pure Python FAISS replacement
First open-source TurboQuant KV cache compression for LLM inference. Drop-in for HuggingFace. pip install turboquant.
Minimal, zero-dependency LLM inference in pure C11. CPU-first with NEON/AVX2 SIMD. Flash MoE (pread + LRU expert cache). TurboQuant 3-bit KV compression (8.9x less memory per session). 20+ GGUF quant formats. Compiles to WASM.
No BS theatrics. Real automated pentesting. Mac only.
Hardware-agnostic machine learning infrastructure for .NET. Implements high-performance neural network layers in C# that are transpiled to run on WebGPU, CUDA, OpenCL, WebGL, CPU, and Wasm via SpawnDev.ILGPU. Optimized for Blazor WebAssembly and native GPU execution.
TurboQuant-style embedding compression for RAG: an SDK using fixed rotations, PolarQuant, and QJL residual sketches for compact storage and fast similarity search
TurboQuant: Native 3-Bit Quantization for Ollama - Achieve 25-28% better compression than Q4_0 while maintaining high-speed CPU inference. Experimentally integrated into Ollama with custom GGML kernels for LLM efficiency.
KV Cache with PagedAttention vs PagedAttention + TurboQuant - experiments across token sizes comparing memory, latency, and accuracy.
TurboQuant (ICLR 2026) ported to Apple Silicon — KV cache compression with MLX Metal kernels + PyTorch CPU
Interactive benchmarking tool for TurboQuant KV cache compression. Supports 2-4 bit quantization with real-time metrics
AI Code Review Memory - learns from your team's bug history and warns when similar patterns appear
Turbo Index
ChatMind: Semantic search for Discord & KakaoTalk chat messages. Search by meaning, not keywords. Powered by TurboQuant compression (ICLR 2026).
AI agent skill implementing Google's TurboQuant compression algorithm (ICLR 2026) — 6x KV cache memory reduction, 8x speedup, zero accuracy loss. Compatible with Claude Code, Codex CLI, and all Agent Skills-compatible tools.
Near-optimal vector quantization for LLM KV cache compression. Python implementation of TurboQuant (ICLR 2026) — PolarQuant + QJL for 3-bit quantization with minimal accuracy loss and up to 8x memory reduction.
CommitMind: Semantic search for Git commit history powered by TurboQuant vector compression (ICLR 2026). Search commits by meaning, not just keywords.
TurboQuant (ICLR 2026) vector quantization for memory/RAG embedding compression | 5-8x compression, 98%+ recall | NumPy only, no GPU
AI-powered log anomaly detection CLI — learns normal patterns, detects anomalies with semantic embeddings, matches past incidents. Powered by TurboQuant 3-bit compression (ICLR 2026).
Near-optimal vector quantization with zero metadata overhead — PyTorch SDK based on Google Research ICLR 2026
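Several of the repositories above advertise 3-bit KV cache quantization with roughly 5-8x memory reduction. As a rough illustration of where that figure comes from (FP16 is 16 bits per value, so 3-bit codes give about 16/3 ≈ 5.3x before scale/zero-point overhead), here is a minimal NumPy sketch of per-row asymmetric 3-bit uniform quantization. This is a hedged toy example, not the actual TurboQuant algorithm (which the papers describe as using random rotations, PolarQuant, and QJL sketches); the function names are invented for illustration.

```python
import numpy as np

def quantize_3bit(x, axis=-1):
    """Per-row asymmetric 3-bit uniform quantization (illustrative only;
    TurboQuant itself adds rotations and residual sketches on top)."""
    lo = x.min(axis=axis, keepdims=True)
    hi = x.max(axis=axis, keepdims=True)
    scale = (hi - lo) / 7.0                       # 2**3 - 1 = 7 quantization steps
    scale = np.where(scale == 0, 1.0, scale)      # guard against constant rows
    codes = np.round((x - lo) / scale).astype(np.uint8)  # integer codes in [0, 7]
    return codes, scale, lo

def dequantize_3bit(codes, scale, lo):
    """Reconstruct an approximation of the original tensor."""
    return codes.astype(np.float32) * scale + lo

# Toy stand-in for one K/V cache block: 4 heads x 128 channels.
kv = np.random.randn(4, 128).astype(np.float32)
codes, scale, lo = quantize_3bit(kv)
kv_hat = dequantize_3bit(codes, scale, lo)

# Rounding error is bounded by half a quantization step per row.
max_err = np.abs(kv - kv_hat).max()
assert max_err <= scale.max() / 2 + 1e-6
```

In a real kernel the 3-bit codes would also be bit-packed (e.g. ten codes per 32-bit word) rather than stored one per `uint8`, which is where the remaining memory savings come from.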