Stackbilt-dev · stackbilt-admin · Apr 18, 2026 · Apr 4, 2026 · Apr 17, 2026 · Apr 18, 2026
diff --git a/README.md b/README.md
@@ -42,6 +42,9 @@ Zero LLM calls for file generation. ~20ms for structure, ~2s with oracle prose.
 
 - **OAuth 2.1 with PKCE** — GitHub SSO, Google SSO, and email/password authentication
 - **Backend adapter pattern** — tool catalogs aggregated from multiple service bindings, namespaced to avoid collisions
+- **Per-tier rate limiting** — fixed-window per-tenant limits via `RATELIMIT_KV` (free=20/min, hobby=60, pro=300, enterprise=1000); 429 with `Retry-After` and `X-RateLimit-*` headers
+- **Cost attribution & quota** — every tool call carries a credit cost; quota is reserved via `edge-auth` before dispatch and committed/refunded on outcome; `image_generate` cost scales with `quality_tier` (1×/1×/3×/5×/8× for draft/standard/premium/ultra/ultra_plus)
+- **Scope + tier enforcement** — `tools/list` is filtered by token scopes; `tools/call` requires the `generate` scope for mutating tools; expensive `image_generate` quality tiers (`premium` and above) are gated to Pro+ plans
 - **Security Constitution compliance** — every tool declares a risk level (`READ_ONLY`, `LOCAL_MUTATION`, `EXTERNAL_MUTATION`); structured audit logging with secret redaction; HMAC-signed identity tokens
 - **Coming-soon gate** — `PUBLIC_SIGNUPS_ENABLED` flag to control public access
 - **MCP JSON-RPC over HTTP** — supports both streaming (SSE) and request/response transport
@@ -84,7 +87,8 @@ Deploys to the `mcp.stackbilt.dev` custom domain via Cloudflare Workers.
 | `AUTH_SERVICE` | Service Binding | RPC to `edge-auth` worker (`AuthEntrypoint`) |
 | `STACKBILDER` | Service Binding | Route to `edge-stack-architect-v2` worker |
 | `IMG_FORGE` | Service Binding | Route to `img-forge-mcp` worker |
-| `OAUTH_KV` | KV Namespace | Stores social OAuth state (5-min TTL entries) |
+| `OAUTH_KV` | KV Namespace | Stores social OAuth state (5-min TTL entries) and MCP sessions |
+| `RATELIMIT_KV` | KV Namespace | Per-tenant fixed-window rate-limit counters (60s TTL) |
 | `PLATFORM_EVENTS_QUEUE` | Queue | Audit event pipeline (`stackbilt-user-events`) |
 | `MCP_REGISTRY_AUTH` | Variable | MCP Registry domain verification string (served at `/.well-known/mcp-registry-auth`) |
 

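The fixed-window limiter described in the README changes above can be sketched as a pure decision function. This is a minimal illustration under stated assumptions, not the code in `src/rate-limiter.ts`: the names `evaluateWindow` and `rateLimitHeaders`, the `rl:<tenant>:<windowStart>` key format, and the idea of passing in the counter value already read from `RATELIMIT_KV` are all hypothetical. The tier ceilings, the 60-second window, and the header names come from the diff.

```typescript
// Illustrative sketch only: these names are hypothetical and are not taken
// from src/rate-limiter.ts.
type Tier = "free" | "hobby" | "pro" | "enterprise";

// Per-minute ceilings as listed in the README bullet.
const TIER_LIMITS: Record<Tier, number> = { free: 20, hobby: 60, pro: 300, enterprise: 1000 };
const WINDOW_SECONDS = 60;

interface WindowDecision {
  allowed: boolean;
  key: string;        // counter key for this tenant and window (format is assumed)
  limit: number;
  remaining: number;
  reset: number;      // Unix seconds at which the window rolls over
  retryAfter: number; // seconds to wait; 0 when the request is allowed
}

// Decide whether one more request fits in the current fixed window.
// `countSoFar` is the counter value read from the store before this request.
function evaluateWindow(
  tenantId: string,
  tier: Tier,
  countSoFar: number,
  nowSeconds: number,
): WindowDecision {
  const limit = TIER_LIMITS[tier];
  // Align to the start of the 60-second slot so every tenant shares boundaries.
  const windowStart = nowSeconds - (nowSeconds % WINDOW_SECONDS);
  const reset = windowStart + WINDOW_SECONDS;
  const key = `rl:${tenantId}:${windowStart}`;
  if (countSoFar >= limit) {
    return { allowed: false, key, limit, remaining: 0, reset, retryAfter: reset - nowSeconds };
  }
  return { allowed: true, key, limit, remaining: limit - countSoFar - 1, reset, retryAfter: 0 };
}

// Build the response headers for a decision; Retry-After is added only on a denial.
function rateLimitHeaders(d: WindowDecision): Record<string, string> {
  const headers: Record<string, string> = {
    "X-RateLimit-Limit": String(d.limit),
    "X-RateLimit-Remaining": String(d.remaining),
    "X-RateLimit-Reset": String(d.reset),
  };
  if (!d.allowed) headers["Retry-After"] = String(d.retryAfter);
  return headers;
}
```

In a Worker, the caller would presumably read the counter at `key`, call `evaluateWindow`, and write back `countSoFar + 1` with a 60-second TTL when the request is allowed. A read-then-write against KV is not atomic, so a sketch like this can undercount under concurrency.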
diff --git a/docs/api-reference.md b/docs/api-reference.md
@@ -104,6 +104,8 @@ Returns the aggregated tool catalog from all backend adapters.
 
 Tools are namespaced by product (e.g. `image_generate`, `flow_create`). Each tool includes a JSON Schema for its `inputSchema`.
 
+The catalog is **filtered by token scope**: tokens without the `generate` scope only see tools with risk level `READ_ONLY`. The full catalog is visible only to tokens that hold `generate`.
+
 ### `tools/call`
 
 Invokes a tool on the appropriate backend.
@@ -122,11 +124,15 @@ Invokes a tool on the appropriate backend.
 The gateway:
 1. Validates the tool name exists in the catalog
 2. Looks up the risk level from the route table
-3. Generates a trace ID for audit
-4. Proxies the call to the appropriate backend service binding
-5. Parses the response (JSON or SSE)
-6. Emits a structured audit event (to console + queue)
-7. Returns the tool result
+3. Enforces scope: tools with risk level `LOCAL_MUTATION`, `EXTERNAL_MUTATION`, or `DESTRUCTIVE` require the `generate` scope (rejected with `INVALID_REQUEST` and audit outcome `insufficient_scope`)
+4. Enforces plan restrictions on `image_generate` quality tiers (`premium`, `ultra`, and `ultra_plus` are rejected for free/hobby plans with audit outcome `tier_denied`)
+5. Reserves quota via `AUTH_SERVICE.consumeQuota` for tools with a non-zero cost (cost from `src/cost-attribution.ts`); rejects with `INVALID_PARAMS` and audit outcome `tier_denied` if the quota is exceeded
+6. Generates a trace ID for audit
+7. Proxies the call to the appropriate backend service binding
+8. Settles quota (commit on success, refund on failure) via `commitOrRefundQuota`
+9. Parses the response (JSON or SSE)
+10. Emits a structured audit event (to console + queue)
+11. Returns the tool result, with `X-RateLimit-Limit`, `X-RateLimit-Remaining`, and `X-RateLimit-Reset` headers attached on success
 
 ### `ping`
 
@@ -319,10 +325,48 @@ This replaces cookies in the stateless OAuth flow, keeping the gateway fully sta
 
 ## Scopes
 
-| Scope | Allows |
-|-------|--------|
-| `generate` | Create content — images, architecture flows |
-| `read` | View resources — models, job status, flow details |
+| Scope | Allows | Enforced where |
+|-------|--------|----------------|
+| `generate` | Create content — images, scaffolds, architecture flows | `tools/list` filter (mutation tools hidden without it); `tools/call` for any tool with risk level `LOCAL_MUTATION`, `EXTERNAL_MUTATION`, or `DESTRUCTIVE` |
+| `read` | View resources — models, job status, flow details | All `READ_ONLY` tools always visible |
+
+Both scopes are granted by default to new tokens issued via the gateway's OAuth flow.
+
+---
+
+## Rate Limiting
+
+The gateway enforces a per-tenant fixed-window rate limit on every authenticated MCP request. Limits are tier-driven:
+
+| Tier | Requests / minute |
+|------|-------------------|
+| Free | 20 |
+| Hobby | 60 |
+| Pro | 300 |
+| Enterprise | 1,000 |
+
+When exceeded, the gateway returns `429 Too Many Requests` with:
+
+| Header | Meaning |
+|--------|---------|
+| `Retry-After` | Seconds until the current window resets |
+| `X-RateLimit-Limit` | Tier ceiling (e.g. `20`) |
+| `X-RateLimit-Remaining` | Always `0` on a 429 response |
+| `X-RateLimit-Reset` | Unix timestamp when the window resets |
+
+The same `X-RateLimit-*` headers are attached to successful `tools/call` responses so clients can pace themselves. `initialize`, `tools/list`, `ping`, and notifications currently do **not** echo rate-limit headers on success — those calls still count against the window, just without surfacing the counter to the client.
+
+The window is fixed (aligned to the start of each 60-second slot), not sliding.
+
+---
+
+## Quota & Cost Attribution
+
+Mutating tool calls reserve credits via `AUTH_SERVICE.consumeQuota` before dispatch. The cost table lives in `src/cost-attribution.ts`; `image_generate` cost is `5 × quality multiplier` where multipliers are `draft=1, standard=1, premium=3, ultra=5, ultra_plus=8`. Read-only tools (`*_status`, `*_classify`, `image_list_models`, etc.) are free.
+
+If quota is exceeded, the call is rejected with `INVALID_PARAMS` and the message `Quota exceeded for <tool>`.
+
+For free and hobby tiers, `image_generate` quality tiers above `standard` are rejected at the gateway with `Quality tier "<x>" requires a Pro plan or higher` — these calls do not reach the backend or consume quota.
 
 ---
 
@@ -334,7 +378,7 @@ Standard MCP JSON-RPC error codes:
 |------|---------|
 | `-32600` | Invalid request |
 | `-32601` | Method not found |
-| `-32602` | Invalid params |
+| `-32602` | Invalid params (also used for `Quota exceeded` and `Quality tier requires Pro plan` rejections) |
 | `-32603` | Internal error |
 
 HTTP-level errors:
@@ -343,9 +387,10 @@ HTTP-level errors:
 |--------|---------|
 | `400` | Missing or malformed request |
 | `401` | Invalid or expired token (`invalid_token`) |
-| `403` | Rate limited or payment delinquent (`insufficient_scope`) |
+| `403` | `insufficient_scope` (token lacks a required scope) or auth-service-level denial |
 | `404` | Unknown path |
 | `405` | Method not allowed |
+| `429` | Per-tenant rate limit exceeded (see [Rate Limiting](#rate-limiting)) |
 
 ---
 

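On the client side, the rate-limit headers documented in `docs/api-reference.md` above could be used to decide how long to wait after a `429`. The following is a hedged sketch, not part of the gateway or any Stackbilt SDK: `retryDelayMs` and `HeaderReader` are hypothetical names, and the fallback of one full 60-second window when no header is present is an assumption.

```typescript
// Minimal shape satisfied by the Fetch API `Headers` object.
interface HeaderReader {
  get(name: string): string | null;
}

// Parse a header value as a number; a missing header yields NaN.
function numericHeader(headers: HeaderReader, name: string): number {
  const raw = headers.get(name);
  return raw === null ? NaN : Number(raw);
}

// Return how many milliseconds to wait before retrying a request.
// Non-429 responses need no wait.
function retryDelayMs(status: number, headers: HeaderReader, nowSeconds: number): number {
  if (status !== 429) return 0;

  // Preferred: Retry-After, documented as seconds until the window resets.
  const retryAfter = numericHeader(headers, "Retry-After");
  if (Number.isFinite(retryAfter) && retryAfter >= 0) return retryAfter * 1000;

  // Fallback: X-RateLimit-Reset, documented as a Unix timestamp.
  const reset = numericHeader(headers, "X-RateLimit-Reset");
  if (Number.isFinite(reset)) return Math.max(0, reset - nowSeconds) * 1000;

  // Assumption: with no usable header, wait out one full 60-second window.
  return 60_000;
}
```

Because the window is fixed rather than sliding, waiting until the reset time should restore the full per-minute budget.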
diff --git a/docs/architecture.md b/docs/architecture.md
@@ -166,7 +166,11 @@ enum RiskLevel {
 | `image.list_models` | IMG_FORGE | READ_ONLY |
 | `image.check_job` | IMG_FORGE | READ_ONLY |
 
-Risk levels are used for audit classification, not for authorization enforcement — all authenticated users can call all tools within their quota.
+Risk levels drive both audit classification and authorization:
+
+- **`tools/list` filter** — `READ_ONLY` tools are visible to any authenticated session; tools with any other risk level are hidden from sessions that lack the `generate` scope.
+- **`tools/call` enforcement** — `LOCAL_MUTATION`, `EXTERNAL_MUTATION`, and `DESTRUCTIVE` tools require the `generate` scope and return `INVALID_REQUEST` with audit outcome `insufficient_scope` otherwise.
+- **Tier-restricted quality tiers** — `image_generate` arguments with `quality_tier` of `premium`, `ultra`, or `ultra_plus` require a Pro+ plan; free/hobby calls are rejected at the gateway with audit outcome `tier_denied` (see `enforceTierRestriction` in `src/gateway.ts`).
 
 ## Audit — `audit.ts`
 
@@ -222,10 +226,27 @@ Bearer token extraction and validation for non-OAuth paths:
 
 ### Rate Limiting
 
-Enforced by `AUTH_SERVICE` (delegated to the auth worker). The gateway receives:
+Two independent layers:
+
+1. **Gateway-side, per-tenant fixed-window limiter** (`src/rate-limiter.ts`) — counts every authenticated MCP request against a 60-second window in `RATELIMIT_KV`. Tier-driven ceiling: free=20, hobby=60, pro=300, enterprise=1000 req/min. Exceeding returns `429` with `Retry-After` and `X-RateLimit-*` headers. Window starts are aligned to `now - (now % 60)` so all tenants share the same boundaries.
+2. **Auth-service-side checks** — `AUTH_SERVICE` may still reject upstream with:
+   - `insufficient_scope` (403) — payment delinquent
+   - `invalid_token` (401) — expired or invalid token
+
+The gateway-side limiter fires first (immediately after auth resolution) and short-circuits before any quota reserve or backend dispatch. Calls to read-only and free tools still count against the limiter; `isFreeTool` gates only the `tools/call` quota path.
+
+### Quota & Cost Attribution
+
+`src/cost-attribution.ts` declares per-tool credit costs and an `image_generate` quality multiplier (`draft=1, standard=1, premium=3, ultra=5, ultra_plus=8` × `image_generate.baseCost: 5`). On `tools/call`:
+
+1. Resolve cost via `resolveToolCost(toolName, args)`.
+2. If cost is non-zero, call `AUTH_SERVICE.consumeQuota({tenantId, userId, feature, amount})`. On failure, reject with `INVALID_PARAMS` and audit outcome `tier_denied` (this outcome is overloaded to cover quota exhaustion as well as plan denial; see follow-ups).
+3. Dispatch to the backend.
+4. Settle via `AUTH_SERVICE.commitOrRefundQuota(reservationId, success|failed)`. Settlement is best-effort; if it fails, the reservation auto-expires on the auth side.
+
+The gateway never holds canonical quota state — it is a metering/dispatch layer in front of `edge-auth`.
 
-- `insufficient_scope` (403) — rate limited or payment delinquent
-- `invalid_token` (401) — expired or invalid token
+> **Note:** the `/api/scaffold` REST endpoint (used by the CLI) bypasses both the rate limiter and the quota/cost-attribution path. CLI traffic is unmetered today; only `/mcp` traffic exercises this enforcement layer.
 
 ## Dependencies
 

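The reserve, dispatch, and settle sequence described in `docs/architecture.md` above can be sketched as one wrapper function. This is an illustration under assumptions, not the gateway's implementation: `QuotaClient`, `meteredCall`, and the `{ ok, reservationId }` return shape of `consumeQuota` are hypothetical, since the diff does not show the real `AuthEntrypoint` RPC types.

```typescript
// Hypothetical shapes: the real AuthEntrypoint RPC types are not shown in this diff.
interface QuotaClient {
  consumeQuota(req: {
    tenantId: string;
    userId: string;
    feature: string;
    amount: number;
  }): Promise<{ ok: boolean; reservationId?: string }>;
  commitOrRefundQuota(reservationId: string, outcome: "success" | "failed"): Promise<void>;
}

type CallResult<T> =
  | { status: "ok"; value: T }
  | { status: "quota_exceeded"; message: string };

async function meteredCall<T>(
  auth: QuotaClient,
  who: { tenantId: string; userId: string },
  toolName: string,
  cost: number,
  dispatch: () => Promise<T>,
): Promise<CallResult<T>> {
  let reservationId: string | undefined;

  // Step 2: reserve only when the tool has a non-zero cost.
  if (cost > 0) {
    const reserved = await auth.consumeQuota({ ...who, feature: toolName, amount: cost });
    if (!reserved.ok) {
      return { status: "quota_exceeded", message: `Quota exceeded for ${toolName}` };
    }
    reservationId = reserved.reservationId;
  }

  // Steps 3 and 4: dispatch, then commit on success or refund on failure.
  let outcome: "success" | "failed" = "failed";
  try {
    const value = await dispatch();
    outcome = "success";
    return { status: "ok", value };
  } finally {
    if (reservationId !== undefined) {
      // Best-effort settlement: errors are swallowed because the reservation
      // is described as auto-expiring on the auth side.
      await auth.commitOrRefundQuota(reservationId, outcome).catch(() => undefined);
    }
  }
}
```

Putting settlement in `finally` means a backend error still triggers the refund before the error propagates to the caller.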
diff --git a/docs/user-guide.md b/docs/user-guide.md
@@ -26,7 +26,7 @@ Stackbilt exposes AI tools through the [Model Context Protocol](https://modelcon
 | `flow_advance` | Advance a flow to the next stage | LOCAL_MUTATION |
 | `flow_recover` | Recover a failed flow | LOCAL_MUTATION |
 
-**Free tier**: 50 credits/month. No credit card required. Credits are weighted by operation complexity.
+**Free tier**: 25 credits/month. No credit card required. Credits are weighted by operation complexity. See [§5 Quota & Billing](#5-quota--billing) for the full table.
 
 ---
 
@@ -237,7 +237,7 @@ The client calls `image_generate` with your prompt. img-forge enhances the promp
 }
 ```
 
-**Quality tiers**: `draft` (fastest, SDXL), `standard` (FLUX Klein, default), `premium` (FLUX Dev), `ultra` (Gemini 2.5 Flash), `ultra_plus` (Gemini 3.1 Flash).
+**Quality tiers**: `draft` (fastest, SDXL), `standard` (FLUX Klein, default), `premium` (FLUX Dev), `ultra` (Gemini 2.5 Flash), `ultra_plus` (Gemini 3.1 Flash). See [§5 Quota & Billing](#5-quota--billing) for credit costs and plan availability — `premium` and above require Pro or Enterprise.
 
 ### Classify Intent
 
@@ -318,22 +318,62 @@ Both scopes are granted by default on the free tier.
 
 ## 5. Quota & Billing
 
+### Monthly credit allocation
+
 | Tier | Credits/month | Price |
 |------|--------------|-------|
-| Free | 50 | $0 |
-| Pro | 500 | Coming soon |
-| Enterprise | 2,000 | Coming soon |
+| Free | 25 | $0 |
+| Hobby | 65 | Coming soon |
+| Pro | 580 | Coming soon |
+| Enterprise | 2,320 | Coming soon |
+
+### Per-call credit cost
+
+Most read-only tools (`*_status`, `*_classify`, `*_summary`, `*_quality`, `*_governance`, `*_pages`, `image_list_models`, `image_check_job`) cost **0 credits**. Mutating tools have a base cost:
+
+| Tool | Base cost |
+|------|-----------|
+| `image_generate` | 5 credits × quality multiplier (see below) |
+| `scaffold_create` | 2 credits |
+| `scaffold_publish` | 3 credits |
+| `scaffold_deploy` | 5 credits |
+| `scaffold_import` | 1 credit |
+| `flow_create` | 2 credits |
+| `visual_screenshot` | 1 credit |
+| `visual_analyze` | 2 credits |
+
+### `image_generate` quality multipliers
+
+| Quality tier | Multiplier | Effective cost | Available on |
+|--------------|-----------|----------------|--------------|
+| `draft` | 1× | 5 credits | All tiers |
+| `standard` | 1× | 5 credits | All tiers |
+| `premium` | 3× | 15 credits | Pro + Enterprise only |
+| `ultra` | 5× | 25 credits | Pro + Enterprise only |
+| `ultra_plus` | 8× | 40 credits | Pro + Enterprise only |
 
-Credits are weighted by operation:
+Free and Hobby plans can request `draft` or `standard` only. Calls with higher quality tiers are rejected at the gateway with `Quality tier "<x>" requires a Pro plan or higher`.
 
-| Operation | Credits |
-|-----------|---------|
-| Draft quality | 1x |
-| Standard quality | 2x |
-| Premium quality | 5x |
-| Ultra quality | 10x |
+### How metering works
 
-Your remaining quota is tracked automatically. When you hit the limit, tool calls return a quota error until the next billing cycle.
+1. Before each call with a non-zero credit cost, the gateway reserves credits via `edge-auth`'s `consumeQuota` RPC.
+2. If the reservation succeeds, the tool runs and the reservation is committed (success) or refunded (failure) via `commitOrRefundQuota`.
+3. If the reservation fails (insufficient quota), the call is rejected with `Quota exceeded for <tool>`.
+
+Free-tier quota resets monthly. When you hit the limit, tool calls return a quota error until the next cycle.
+
+### Rate limits
+
+Independent of credit quota, every authenticated MCP request counts against a per-tenant fixed-window rate limit:
+
+| Tier | Requests / minute |
+|------|-------------------|
+| Free | 20 |
+| Hobby | 60 |
+| Pro | 300 |
+| Enterprise | 1,000 |
+
+When a request would exceed the limit, the gateway returns `429 Too Many Requests` with `Retry-After: <seconds>` and `X-RateLimit-Limit` / `X-RateLimit-Remaining` / `X-RateLimit-Reset` headers. The same headers are also attached to successful `tools/call` responses so clients can pace themselves; other MCP methods (`initialize`, `tools/list`, `ping`, notifications) currently do not echo rate-limit headers on success.
 
 ---
 
@@ -359,6 +399,9 @@ Pass `github_token` as a parameter with a GitHub PAT that has `repo` scope. Or a
 ### Quota exceeded
 Check your usage at the beginning of each month. Free tier resets monthly. Upgrade options coming soon.
 
+### Rate limited (HTTP 429)
+You exceeded your tier's per-minute request budget (free=20, hobby=60, pro=300, enterprise=1000). Wait the number of seconds in the `Retry-After` response header and resume. The window is fixed (60s aligned), not sliding.
+
 ---
 
 ## 7. Security

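The credit arithmetic in the `docs/user-guide.md` tables above can be worked through with a small lookup. This is a sketch built only from the documented tables, not the contents of `src/cost-attribution.ts`: the names `toolCost` and `tierAllowed` are hypothetical, and treating any tool missing from the table as free is an assumption.

```typescript
type QualityTier = "draft" | "standard" | "premium" | "ultra" | "ultra_plus";
type Plan = "free" | "hobby" | "pro" | "enterprise";

// Base costs copied from the "Per-call credit cost" table.
const BASE_COST: Record<string, number> = {
  image_generate: 5,
  scaffold_create: 2,
  scaffold_publish: 3,
  scaffold_deploy: 5,
  scaffold_import: 1,
  flow_create: 2,
  visual_screenshot: 1,
  visual_analyze: 2,
};

// Multipliers copied from the "`image_generate` quality multipliers" table.
const QUALITY_MULTIPLIER: Record<QualityTier, number> = {
  draft: 1,
  standard: 1,
  premium: 3,
  ultra: 5,
  ultra_plus: 8,
};

// Quality tiers documented as "Pro + Enterprise only".
const PRO_ONLY = new Set<QualityTier>(["premium", "ultra", "ultra_plus"]);

// Credit cost of one call. `standard` is the documented default quality tier.
function toolCost(toolName: string, qualityTier: QualityTier = "standard"): number {
  // Assumption: tools not listed in the table cost 0 credits.
  const base = BASE_COST[toolName] ?? 0;
  return toolName === "image_generate" ? base * QUALITY_MULTIPLIER[qualityTier] : base;
}

// Whether a plan may request a given image quality tier.
function tierAllowed(plan: Plan, qualityTier: QualityTier): boolean {
  return !PRO_ONLY.has(qualityTier) || plan === "pro" || plan === "enterprise";
}
```

For example, `toolCost("image_generate", "ultra_plus")` is 5 × 8 = 40 credits, while a Free plan's 25 monthly credits cover five `standard` images at 5 credits each.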
diff --git a/src/audit.ts b/src/audit.ts
@@ -12,7 +12,7 @@ export interface AuditArtifact {
   risk_level: RiskLevel | 'UNKNOWN';
   policy_decision: 'ALLOW' | 'DENY';
   redacted_input_summary: string;
-  outcome: 'success' | 'error' | 'backend_error' | 'auth_denied' | 'unknown_tool' | 'invalid_params' | 'tier_denied' | 'insufficient_scope';
+  outcome: 'success' | 'error' | 'backend_error' | 'auth_denied' | 'unknown_tool' | 'invalid_params' | 'tier_denied' | 'insufficient_scope' | 'rate_limited';
   timestamp: string;
   latency_ms?: number;
 }