Sandermage / genesis-vllm-patches
Stars: 74

vLLM patcher for Qwen3.6 on consumer NVIDIA GPUs: Qwen3.6-35B-A3B-FP8 (192 tok/s, +68% over stock), Qwen3.6-27B-int4-AutoRound, and 256K context. 126 patches: TurboQuant k8v4 KV, MTP/DFlash spec-decode, FULL cudagraph, hybrid GDN streaming, structured boot summary, one-command installer, 1958 tests. v7.72.2.

Topics: cuda, nvidia, moe, gdn, ampere, structured-output, long-context, fp8, vllm, llm-inference, qwen, speculative-decoding, tool-calling, qwen3, rtx-3090, runtime-patches, dflash, turboquant, ampere-sm86, rtx-a5000
Updated: May 5, 2026
Language: Python