⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡
Updated Oct 8, 2024 - Python
Qwen3-8B quantization study across vLLM, TensorRT-LLM, AutoRound, INT8, and MXFP4
Production runbook for Qwen3.5-122B hybrid INT4+FP8 on NVIDIA DGX Spark GB10 — optimization stack, PD firmware wedge diagnosis, bench results
Stable long-context Cogni-Brain agent on DGX Spark: Qwen3.5-122B-A10B INT4 AutoRound, 262K context, ~40 TPS, 100/100 Tool-Eval, vLLM.