autoround

Here are 6 public repositories matching this topic...

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡

Qwen3.5-122B-A10B on DGX Spark: 28.3 → 51 tok/s (+80%)

cuda lossless mtp speedup performance-optimization vllm autoround dgx-spark qwen3-5 sm121 qwen3-5-122b-a10b

vLLM for NVIDIA Volta (sm_70)

cuda pytorch cmp v100 bnb xformers vllm bitsandbytes autoround sm70 sm-70 100-210

Qwen3-8B quantization study across vLLM, TensorRT-LLM, AutoRound, INT8, and MXFP4

quantization int8 vllm smoothquant tensorrt-llm autoround qwen3 mxfp4

Production runbook for Qwen3.5-122B hybrid INT4+FP8 on NVIDIA DGX Spark GB10 — optimization stack, PD firmware wedge diagnosis, bench results

Stable long-context Cogni-Brain agent on DGX Spark: Qwen3.5-122B-A10B INT4 AutoRound, 262K context, ~40 TPS, 100/100 Tool-Eval, vLLM.

benchmark telegram-bot nvidia reasoning tool-use long-context vllm llm-agent local-llm local-ai qwen autoround openai-compatible dgx-spark qwen3-5-122b-a10b
