Adaptive inference budget controller: caps how many reasoning tokens your LLM spends per query and tracks GPU cost in real time.
Topics: gpu, inference, self-hosted, reasoning, ai-agents, gpu-monitoring, cost-optimization, budget-control, llm, vllm, thinking-tokens
Updated Mar 15, 2026 · Python
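
The core idea behind such a controller can be sketched in a few lines. This is a minimal illustration, not the repository's actual API: the class name `InferenceBudgetController`, its methods, and the per-second cost model are all assumptions for the sake of the example.

```python
class InferenceBudgetController:
    """Hypothetical sketch: cap reasoning tokens per query, accumulate GPU cost.

    Names and the cost model are illustrative assumptions, not this repo's API.
    """

    def __init__(self, max_thinking_tokens: int, gpu_cost_per_second: float):
        self.max_thinking_tokens = max_thinking_tokens
        self.gpu_cost_per_second = gpu_cost_per_second
        self.total_cost = 0.0

    def clamp_budget(self, requested_tokens: int) -> int:
        # Never let a single query reason past the configured token cap.
        return min(requested_tokens, self.max_thinking_tokens)

    def record_query(self, gpu_seconds: float) -> float:
        # Accumulate the dollar cost of GPU time spent on this query.
        cost = gpu_seconds * self.gpu_cost_per_second
        self.total_cost += cost
        return cost
```

In practice the clamped budget would be passed to the inference engine (e.g. as a max-token limit on the reasoning phase), and `gpu_seconds` would come from measured wall-clock time on the GPU worker.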