Adaptive inference budget controller: caps how many reasoning tokens your LLM spends per query and tracks GPU cost in real time.
Topics: gpu, inference, self-hosted, reasoning, ai-agents, gpu-monitoring, cost-optimization, budget-control, llm, vllm, thinking-tokens
Updated Mar 15, 2026 · Python
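
The core idea behind such a controller can be sketched in a few lines. This is a minimal illustration, not the repository's actual API: the class name `InferenceBudgetController`, its methods, and the per-second cost model are all assumptions for the sake of the example.

```python
class InferenceBudgetController:
    """Hypothetical sketch: cap reasoning tokens per query, accumulate GPU cost.

    Names and the cost model are illustrative assumptions, not this repo's API.
    """

    def __init__(self, max_thinking_tokens: int, gpu_cost_per_second: float):
        self.max_thinking_tokens = max_thinking_tokens
        self.gpu_cost_per_second = gpu_cost_per_second
        self.total_cost = 0.0

    def clamp_budget(self, requested_tokens: int) -> int:
        # Never let a single query reason past the configured token cap.
        return min(requested_tokens, self.max_thinking_tokens)

    def record_query(self, gpu_seconds: float) -> float:
        # Accumulate the dollar cost of GPU time spent on this query.
        cost = gpu_seconds * self.gpu_cost_per_second
        self.total_cost += cost
        return cost
```

In practice the clamped budget would be passed to the inference engine (e.g. as a max-token limit on the reasoning phase), and `gpu_seconds` would come from measured wall-clock time on the GPU worker.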