Training-free KV cache compression for LLMs. 10-33x compression via E8 lattice quantization + attention-aware token eviction. One line of code.
compression transformers inference pytorch attention llama quantization memory-efficient mistral vector-quantization kv-cache llm long-context e8-lattice token-eviction
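Neither the package name nor its API appears in this listing, so the sketch below is only a minimal, hypothetical illustration of the two techniques the description names: blockwise nearest-point quantization onto the E8 lattice (the classic Conway-Sloane decoder applied to 8-dimensional slices of the key/value vectors) and attention-aware token eviction that keeps the tokens with the largest accumulated attention mass. All function names, shapes, and parameters here are assumptions, not the repository's actual interface.

```python
import numpy as np

def _nearest_d8(x):
    """Nearest point of D8 (integer vectors whose coordinates sum to an even number)."""
    f = np.round(x)
    if int(f.sum()) % 2 != 0:
        # Wrong parity: re-round the coordinate with the largest rounding error
        # the other way, which costs the least additional distance.
        i = np.argmax(np.abs(x - f))
        f[i] += 1.0 if x[i] >= f[i] else -1.0
    return f

def nearest_e8(x):
    """Nearest point of E8 = D8 ∪ (D8 + 1/2): decode in both cosets, keep the closer."""
    a = _nearest_d8(x)
    b = _nearest_d8(x - 0.5) + 0.5
    return a if np.sum((x - a) ** 2) <= np.sum((x - b) ** 2) else b

def quantize_kv_e8(kv, scale=4.0):
    """Blockwise E8 quantization of a (tokens, head_dim) K or V matrix.

    Assumes head_dim is a multiple of 8; `scale` trades rate for distortion.
    Returns the dequantized matrix (a real codec would store packed lattice indices).
    """
    t, d = kv.shape
    blocks = (kv * scale).reshape(t * d // 8, 8)
    deq = np.stack([nearest_e8(b) for b in blocks])
    return deq.reshape(t, d) / scale

def evict_tokens(attn, keep=256, recent=32):
    """Attention-aware eviction: keep the `keep` tokens with the largest attention
    mass accumulated over all heads and queries, plus a window of recent tokens."""
    scores = attn.sum(axis=tuple(range(attn.ndim - 1)))          # (kv_len,)
    kept = set(np.argsort(scores)[-keep:].tolist())
    kept.update(range(max(0, len(scores) - recent), len(scores)))
    return np.sort(np.fromiter(kept, dtype=np.int64))

# Toy usage on random data (shapes are illustrative only).
rng = np.random.default_rng(0)
k = rng.standard_normal((512, 128)).astype(np.float32)          # (tokens, head_dim)
attn = rng.random((8, 512, 512)).astype(np.float32)             # (heads, q_len, kv_len)
idx = evict_tokens(attn, keep=128, recent=16)                    # surviving token indices
k_small = quantize_kv_e8(k[idx])                                 # evicted + quantized keys
print(k_small.shape, idx.shape)
```

In an actual implementation the memory saving would come from storing compact lattice code indices (and eviction masks) instead of the dequantized floats returned here; this sketch only demonstrates the round-trip math.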
Updated Apr 8, 2026 - Python