hec-ovi / vllm-awq4-qwen (30 stars)
vLLM Qwen 3.6-27B (AWQ-INT4) + DFlash speculative decoding on AMD Strix Halo (gfx1151 iGPU, 128 GB UMA, ROCm 7.13). 24.8 t/s single-stream; vision, tool calling, 256K context, OpenAI-compatible API, Docker. Matches DGX Spark FP8+DFlash+MTP at a third of the cost. No CUDA.
Topics: docker, rocm, openai-api, awq, vllm, llm-inference, speculative-decoding, multimodal-llm, qwen3, gfx1151, ryzen-ai-max, dflash, amd-strix-halo, rdna35, 27b
Python · Updated May 2, 2026
aphroditeformal93 / vllm-awq4-qwen (0 stars)
Run Qwen 3.6-27B AWQ-INT4 models with DFlash speculative decoding on AMD Strix Halo hardware, using vLLM for high-throughput inference.
Topics: docker, rocm, openai-api, awq, vllm, llm-inference, speculative-decoding, multimodal-llm, qwen3, gfx1151, ryzen-ai-max, dflash, amd-strix-halo, rdna35, 27b
Python · Updated May 7, 2026
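Both repos advertise an OpenAI-compatible server, so any standard OpenAI client should be able to talk to them. A minimal sketch, assuming vLLM's default port (8000) and its default /v1 route; the model name below is hypothetical, so check each repo's README or query /v1/models for the real one:

```python
# Minimal client sketch for an OpenAI-compatible vLLM server.
# Assumptions (not confirmed by either repo): server on localhost:8000,
# no API key configured, and "Qwen3.6-27B-AWQ" as a placeholder model
# name -- use whatever the server reports via client.models.list().
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's default OpenAI-compatible endpoint
    api_key="not-needed",                 # vLLM ignores the key unless --api-key is set
)

response = client.chat.completions.create(
    model="Qwen3.6-27B-AWQ",  # hypothetical name; list the served models to confirm
    messages=[
        {"role": "user", "content": "Summarize speculative decoding in one sentence."}
    ],
    max_tokens=128,
)
print(response.choices[0].message.content)
```

Speculative decoding (DFlash here) is transparent to the client: accepted draft tokens only raise throughput on the server side, so the request format above is unchanged.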