Predict rate-limit cutoffs inside your AI agent. Local MCP server + CLI that watches your flat-rate AI subscription window โ Claude Max, ChatGPT Plus / Pro / Team, GitHub Copilot, Cursor, Mistral Le Chat Pro, Codex Plus โ and tells the agent, before you hit the cap, to
continue,slow_down,switch_model, orwait_for_reset.
Docs: https://felixgeelhaar.github.io/tokenops/ ยท Releases: https://github.com/felixgeelhaar/tokenops/releases
brew install felixgeelhaar/tap/tokenopsOr via Go:
go install github.com/felixgeelhaar/tokenops/cmd/tokenops@latestOr grab a prebuilt binary from the releases page (darwin amd64/arm64, linux amd64/arm64, windows amd64).
tokenops init --detect # sniff installed AI clients, print plan-set commands
tokenops plan set anthropic claude-max-20x # bind whatever fits (paste from --detect output)
tokenops start # daemon, listens 127.0.0.1:7878 + tokenops.localWire MCP into your agent:
{
"mcpServers": {
"tokenops": { "command": "tokenops", "args": ["serve"] }
}
}Ask the agent for any of: tokenops_session_budget, tokenops_burn_rate,
tokenops_dashboard, tokenops_plan_headroom. Or open the browser dashboard
the agent links you to (http://tokenops.local:7878/dashboard?token=โฆ).
| ๐งฎ 13 plan catalog | Claude Max 5x/20x, Claude Pro, Claude Code (Max + Pro), ChatGPT Plus / Pro / Team, GitHub Copilot Individual / Business, Cursor Pro / Business, Mistral Le Chat Pro, Codex Plus โ each with a dated vendor source URL pinned in code |
| ๐ Provider-agnostic | OpenAI, Anthropic, Google Gemini, Mistral through the same proxy and event schema |
| ๐ Interactive dashboard | Vue 3 + D3 dashboard at /dashboard โ cost line, per-model stacked area, tokens-per-bucket, KPI tiles, 15s auto-refresh, provider + model filters that persist across refresh |
| ๐ mDNS-discoverable | Daemon advertises tokenops.local over zeroconf so the dashboard URL is memorable on every host |
| ๐ Dashboard auth | Shared-secret token, auto-minted on first start, accepted via header / query / cookie. tokenops dashboard rotate-token revokes |
| ๐ก Vendor /usage ingestion | Live per-turn JSONL readers for Claude Code (~/.claude/projects/) and Codex CLI (~/.codex/sessions/), plus GitHub Copilot OAuth quota, Cursor cookie scrape, Anthropic cookie scraper (only source of the official Claude Max weekly %). Each source has a tokenops vendor-usage enable <source> wizard with env-var fallback for secrets |
| ๐ฐ Cache-aware pricing | Claude + Codex cache reads bill at ~10% of the new-input rate. For agent-heavy workloads cache reads are >95% of input โ the dashboard Cache hit: XX.X% tile + cost-aware aggregator make the difference between a naive $94k estimate and the real $10k. Per-provider rate cards ship in code |
| ๐งช Per-project / per-session attribution | JSONL pollers stamp agent_id = "claude-code:<project>" and workflow_id = "claude-code:<project>:<session>" (analogous for Codex). group=agent answers "which project burns the most"; coach finds per-session waste |
| ๐ง Prompt coach | tokenops coach prompts heuristic feedback on your real prompting patterns โ length distribution, vague/ack/repeat detection, concrete recommendations. Auto-discovers Claude Code + Codex JSONLs. Prompt text never persisted. Ranked recommendations (v0.18) project tangible savings: turns ร tokens ร dollars ร hours per win |
| ๐ Reply coach | tokenops coach replies detects output-compression patterns (caveman skill, article density, filler density) per session |
| โฑ๏ธ Task boundaries | tokenops task start "fix X" / done / list --metrics โ operator-marked task units persisted to ~/.tokenops/tasks.jsonl. List view rolls up turns / cost / TTFUO / cost-per-turn from the events store within each task window |
| ๐ 8-KPI agent scorecard | FVT / TEU / SAC (wedge) plus CHR / CGR / RGR / TCS / DAR (agent-workflow), all graded AโF against tuneable thresholds. v0.21.1 honest grading: TEU N/A when optimiser isn't wired; autonomous-loop sentinels filtered from CGR; columnโpayload attribution sync so SAC reflects reality |
| ๐ฏ Honest signal quality | Every prediction carries signal_quality.level (low / medium / high) plus a one-line caveat. Heuristic mode is labelled; proxied mode is labelled |
| ๐ค MCP-first | 25 MCP tools agents call directly. Inline SVG sparkline + headroom gauge rendered in markdown so every MCP client shows them today |
| ๐ง Dynamic-cheapest coaching | Coaching pipeline picks the lowest blended-rate model per provider at runtime from the pricing table โ no hardcoded model names |
| ๐พ Local-first, open source | SQLite database, no cloud account, no telemetry. Apache 2.0. Demo-data isolation by default so synthetic seeds never contaminate the real signal |
See docs/architecture-ddd.md for the bounded contexts and layer rules; docs/plan-cost-model.md for the plan catalog model.
init Scaffold config (sqlite + rules on); --detect sniffs installed clients
start Run the daemon (proxy + analytics + bus + dashboard)
serve MCP server over stdio
demo Seed 7d of synthetic events
status Daemon health + blockers[] / next_actions[]
spend [--forecast] Spend / burn / 7d forecast
plan {list|set|headroom|catalog} Subscription plan headroom
provider {list|set|unset} Upstream LLM provider URLs
vendor-usage {status|backfill} Inspect / backfill vendor-side pollers
dashboard rotate-token Mint + persist a fresh dashboard auth token
config show Active configuration (redacted)
audit Query audit log
events Per-kind domain-event counts
rules {analyze|conflicts|...} Rule intelligence
scorecard Wedge KPI scorecard
coverage-debt Risk-weighted coverage debt
eval Optimizer eval harness + gate
replay <id> Replay a session through the optimizer
Every CLI verb has a matching MCP tool (tokenops_<name>).
Default install reports low confidence (MCP pings only). Two zero-network upgrades:
# ~/.config/tokenops/config.yaml
vendor_usage:
claude_code:
enabled: true # reads ~/.claude/stats-cache.json
interval: 60s
anthropic:
enabled: true # calls Anthropic Admin API
admin_key: sk-ant-admin-โฆ # mint in claude.com console
interval: 5mtokenops vendor-usage status shows whether the pollers are emitting; use
tokenops vendor-usage backfill --hours 168 to pull a week of history from
Anthropic Admin in one shot after configuring the key.
The Anthropic Admin API only covers metered API usage. Claude Max plan window state has no documented endpoint and stays heuristic โ the cache reader is the only locally-available Max signal and reports daily granularity with an explicit caveat.
Clients / SDKs / CLIs / MCP hosts
|
v
Local TokenOps daemon (Go)
/ | \
Proxy MCP Dashboard
| server /api/*
v | |
Provider routes Vue+D3
(OpenAI/Anth/Gem/Mistral)
|
v
SQLite event store
|
v
Spend / forecast / coaching
DDD-organised: contexts under internal/contexts/<ctx>/<pkg>, adapters
(cli, mcp, proxy) stay flat. Layering enforced by internal/archlint
(go test ./internal/archlint/...).
cmd/{tokenops,tokenopsd}/ # binaries
internal/
contexts/ # bounded contexts (rules, spend, security, ...)
cli/ # cobra subcommands
mcp/ # MCP tool surface
proxy/ # HTTP server + dashboard
daemon/ # boot sequence
storage/sqlite/ # event store
pkg/eventschema/ # public envelope + payload types
web/docs/ # VitePress docs site
.roady/ # spec-driven planning
When a subsystem is off, the matching routes return 503 with a structured
{error, hint} body instead of 404. tokenops status (and the MCP
tokenops_status tool, and GET /readyz) surface stable identifiers in
blockers[] plus the exact command in next_actions[]:
| Blocker | Fix |
|---|---|
storage_disabled |
tokenops init then restart |
rules_disabled |
tokenops init then restart |
providers_unconfigured |
tokenops provider set โฆ |
tokenops demo writes synthetic PromptEvents tagged source=demo. Every
default rollup filters them out so first-run exploration never contaminates
production numbers. Pass --include-demo (CLI) or include_demo: true (MCP
tool input) to see the synthetic breakdown alongside real traffic.
See CONTRIBUTING.md, CODE_OF_CONDUCT.md,
and SECURITY.md. Plans and tasks live in .roady/ (see
roady).
See CHANGELOG.md โ latest is v0.21.1.
Apache License 2.0. See LICENSE.