FastAPI LLM pool for local and OpenAI-compatible remote inference, with scheduling, replicas, metrics, and admin APIs.
Updated May 17, 2026 - Python