High-performance CPU KV-cache quantization engine for LLM inference (~10× speedup, 4× memory reduction) with Python & PyTorch support.
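The repository itself is not shown here, so as a rough illustration only: a 4× memory reduction is consistent with quantizing an fp32 KV cache to int8. The sketch below (function names and per-channel symmetric scheme are my assumptions, not this project's API) shows the idea.

```python
import numpy as np

# Hypothetical sketch: symmetric per-channel INT8 quantization of a KV-cache
# slice. fp32 -> int8 yields the 4x memory reduction the description cites.
def quantize_kv(kv: np.ndarray):
    # kv: (num_tokens, head_dim) fp32 slice of the key or value cache
    scale = np.abs(kv).max(axis=0, keepdims=True) / 127.0  # per-channel scale
    scale = np.where(scale == 0, 1.0, scale)               # avoid div-by-zero
    q = np.clip(np.round(kv / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize_kv(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

kv = np.random.randn(16, 8).astype(np.float32)
q, s = quantize_kv(kv)
err = np.abs(dequantize_kv(q, s) - kv).max()  # bounded by half a scale step
```

A real engine would additionally fuse dequantization into the SIMD (e.g. AVX2) attention kernels rather than materializing the fp32 tensor, which is where the claimed CPU speedup would come from.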
Topics: python, inference, pytorch, simd, attention, avx2, quantization, kv-cache, llm, cpu-optimization, machine-learning-performance
Updated Apr 25, 2026 · C++