Merged
6 changes: 5 additions & 1 deletion README.md
@@ -42,6 +42,9 @@ Zero LLM calls for file generation. ~20ms for structure, ~2s with oracle prose.

- **OAuth 2.1 with PKCE** — GitHub SSO, Google SSO, and email/password authentication
- **Backend adapter pattern** — tool catalogs aggregated from multiple service bindings, namespaced to avoid collisions
- **Per-tier rate limiting** — fixed-window per-tenant limits via `RATELIMIT_KV` (free=20/min, hobby=60, pro=300, enterprise=1000); 429 with `Retry-After` and `X-RateLimit-*` headers
- **Cost attribution & quota** — every tool call carries a credit cost; quota is reserved via `edge-auth` before dispatch and committed/refunded on outcome; `image_generate` cost scales with `quality_tier` (1×/1×/3×/5×/8× for draft/standard/premium/ultra/ultra_plus)
- **Scope + tier enforcement** — `tools/list` is filtered by token scopes; `tools/call` requires the `generate` scope for mutating tools; expensive `image_generate` quality tiers (`premium` and above) are gated to Pro+ plans
- **Security Constitution compliance** — every tool declares a risk level (`READ_ONLY`, `LOCAL_MUTATION`, `EXTERNAL_MUTATION`); structured audit logging with secret redaction; HMAC-signed identity tokens
- **Coming-soon gate** — `PUBLIC_SIGNUPS_ENABLED` flag to control public access
- **MCP JSON-RPC over HTTP** — supports both streaming (SSE) and request/response transport
@@ -84,7 +87,8 @@ Deploys to the `mcp.stackbilt.dev` custom domain via Cloudflare Workers.
| `AUTH_SERVICE` | Service Binding | RPC to `edge-auth` worker (`AuthEntrypoint`) |
| `STACKBILDER` | Service Binding | Route to `edge-stack-architect-v2` worker |
| `IMG_FORGE` | Service Binding | Route to `img-forge-mcp` worker |
| `OAUTH_KV` | KV Namespace | Stores social OAuth state (5-min TTL entries) |
| `OAUTH_KV` | KV Namespace | Stores social OAuth state (5-min TTL entries) and MCP sessions |
| `RATELIMIT_KV` | KV Namespace | Per-tenant fixed-window rate-limit counters (60s TTL) |
| `PLATFORM_EVENTS_QUEUE` | Queue | Audit event pipeline (`stackbilt-user-events`) |
| `MCP_REGISTRY_AUTH` | Variable | MCP Registry domain verification string (served at `/.well-known/mcp-registry-auth`) |

67 changes: 56 additions & 11 deletions docs/api-reference.md
@@ -104,6 +104,8 @@ Returns the aggregated tool catalog from all backend adapters.

Tools are namespaced by product (e.g. `image_generate`, `flow_create`). Each tool includes a JSON Schema for its `inputSchema`.

The catalog is **filtered by token scope**: tokens without the `generate` scope only see tools with risk level `READ_ONLY`. The full catalog is visible only to tokens that hold `generate`.
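
The scope filter described above can be sketched roughly as follows. This is an illustration only, not the gateway's code: the `ToolEntry` type and the `filterCatalogByScope` name are hypothetical, and the sample risk levels are taken from the tables elsewhere in these docs.

```typescript
// Hypothetical types for illustration; the gateway's real definitions may differ.
type RiskLevel = "READ_ONLY" | "LOCAL_MUTATION" | "EXTERNAL_MUTATION" | "DESTRUCTIVE";

interface ToolEntry {
  name: string;
  riskLevel: RiskLevel;
}

// Tokens holding `generate` see the full catalog; all others see READ_ONLY tools only.
function filterCatalogByScope(catalog: ToolEntry[], scopes: string[]): ToolEntry[] {
  const hasGenerate = scopes.some((scope) => scope === "generate");
  if (hasGenerate) return catalog;
  return catalog.filter((tool) => tool.riskLevel === "READ_ONLY");
}

// Sample catalog built from tools whose risk levels are listed in the docs.
const sampleCatalog: ToolEntry[] = [
  { name: "flow_advance", riskLevel: "LOCAL_MUTATION" },
  { name: "image_list_models", riskLevel: "READ_ONLY" },
  { name: "flow_recover", riskLevel: "LOCAL_MUTATION" },
  { name: "image_check_job", riskLevel: "READ_ONLY" },
];

// What a read-only token and a full-scope token would each get back.
const readOnlyView = filterCatalogByScope(sampleCatalog, ["read"]).map((t) => t.name);
const fullView = filterCatalogByScope(sampleCatalog, ["read", "generate"]).map((t) => t.name);
```

A token with only `read` gets back the two `READ_ONLY` tools; adding `generate` returns all four entries.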

### `tools/call`

Invokes a tool on the appropriate backend.
@@ -122,11 +124,15 @@ Invokes a tool on the appropriate backend.
The gateway:
1. Validates the tool name exists in the catalog
2. Looks up the risk level from the route table
3. Generates a trace ID for audit
4. Proxies the call to the appropriate backend service binding
5. Parses the response (JSON or SSE)
6. Emits a structured audit event (to console + queue)
7. Returns the tool result
3. Enforces scope: tools with risk level `LOCAL_MUTATION`, `EXTERNAL_MUTATION`, or `DESTRUCTIVE` require the `generate` scope (rejected with `INVALID_REQUEST` and audit outcome `insufficient_scope`)
4. Enforces tier-restricted quality tiers for `image_generate` (`premium`, `ultra`, `ultra_plus` rejected for free/hobby plans with audit outcome `tier_denied`)
5. Reserves quota via `AUTH_SERVICE.consumeQuota` (cost from `src/cost-attribution.ts`); rejects with `INVALID_PARAMS` and outcome `tier_denied` if exceeded
6. Generates a trace ID for audit
7. Proxies the call to the appropriate backend service binding
8. Settles quota (commit on success, refund on failure) via `commitOrRefundQuota`
9. Parses the response (JSON or SSE)
10. Emits a structured audit event (to console + queue)
11. Returns the tool result, with `X-RateLimit-Limit`, `X-RateLimit-Remaining`, and `X-RateLimit-Reset` headers attached on success

### `ping`

@@ -319,10 +325,48 @@ This replaces cookies in the stateless OAuth flow, keeping the gateway fully sta

## Scopes

| Scope | Allows |
|-------|--------|
| `generate` | Create content — images, architecture flows |
| `read` | View resources — models, job status, flow details |
| Scope | Allows | Enforced where |
|-------|--------|----------------|
| `generate` | Create content — images, scaffolds, architecture flows | `tools/list` filter (mutation tools hidden without it); `tools/call` for any tool with risk level `LOCAL_MUTATION`, `EXTERNAL_MUTATION`, or `DESTRUCTIVE` |
| `read` | View resources — models, job status, flow details | All `READ_ONLY` tools always visible |

Both scopes are granted by default to new tokens issued via the gateway's OAuth flow.

---

## Rate Limiting

The gateway enforces a per-tenant fixed-window rate limit on every authenticated MCP request. Limits are tier-driven:

| Tier | Requests / minute |
|------|-------------------|
| Free | 20 |
| Hobby | 60 |
| Pro | 300 |
| Enterprise | 1,000 |

When exceeded, the gateway returns `429 Too Many Requests` with:

| Header | Meaning |
|--------|---------|
| `Retry-After` | Seconds until the current window resets |
| `X-RateLimit-Limit` | Tier ceiling (e.g. `20`) |
| `X-RateLimit-Remaining` | Always `0` on a 429 response |
| `X-RateLimit-Reset` | Unix timestamp when the window resets |

The same `X-RateLimit-*` headers are attached to successful `tools/call` responses so clients can pace themselves. `initialize`, `tools/list`, `ping`, and notifications currently do **not** echo rate-limit headers on success — those calls still count against the window, just without surfacing the counter to the client.

The window is fixed (aligned to the start of each 60-second slot), not sliding.
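
As a rough sketch of the fixed-window behaviour and the header values above (not the gateway's actual limiter): the `checkRateLimit` name, the counter key format, and the in-memory object standing in for `RATELIMIT_KV` are all assumptions made for illustration. The sketch also ignores concurrent updates.

```typescript
type Tier = "free" | "hobby" | "pro" | "enterprise";

// Tier ceilings from the table above (requests per minute).
const TIER_LIMITS: Record<Tier, number> = { free: 20, hobby: 60, pro: 300, enterprise: 1000 };
const WINDOW_SECONDS = 60;

interface RateLimitDecision {
  allowed: boolean;
  limit: number;      // value for X-RateLimit-Limit
  remaining: number;  // value for X-RateLimit-Remaining
  resetAt: number;    // value for X-RateLimit-Reset (Unix seconds)
  retryAfter: number; // value for Retry-After on a 429, otherwise 0
}

// In-memory stand-in for RATELIMIT_KV; the key format is an assumption.
const counters: Record<string, number> = {};

function checkRateLimit(tenantId: string, tier: Tier, nowSeconds: number): RateLimitDecision {
  const limit = TIER_LIMITS[tier];
  // Fixed window: align to the start of the current 60-second slot.
  const windowStart = nowSeconds - (nowSeconds % WINDOW_SECONDS);
  const resetAt = windowStart + WINDOW_SECONDS;
  const key = `${tenantId}:${windowStart}`;
  const used = counters[key] || 0;

  if (used >= limit) {
    // Over the ceiling: the request is not counted and the caller should return 429.
    return { allowed: false, limit, remaining: 0, resetAt, retryAfter: resetAt - nowSeconds };
  }

  counters[key] = used + 1;
  return { allowed: true, limit, remaining: limit - (used + 1), resetAt, retryAfter: 0 };
}
```

Because the window is aligned rather than sliding, a tenant that exhausts its budget at second 125 is allowed again at second 180, regardless of when its first request in the window arrived.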

---

## Quota & Cost Attribution

Mutating tool calls reserve credits via `AUTH_SERVICE.consumeQuota` before dispatch. The cost table lives in `src/cost-attribution.ts`; `image_generate` cost is `5 × quality multiplier` where multipliers are `draft=1, standard=1, premium=3, ultra=5, ultra_plus=8`. Read-only tools (`*_status`, `*_classify`, `image_list_models`, etc.) are free.

If quota is exceeded, the call is rejected with `INVALID_PARAMS` and the message `Quota exceeded for <tool>`.

For free and hobby tiers, `image_generate` quality tiers above `standard` are rejected at the gateway with `Quality tier "<x>" requires a Pro plan or higher` — these calls do not reach the backend or consume quota.
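
The `image_generate` arithmetic above works out as in the sketch below. The constant and function names are hypothetical and are not taken from `src/cost-attribution.ts`; only the base cost and multipliers come from this document.

```typescript
type QualityTier = "draft" | "standard" | "premium" | "ultra" | "ultra_plus";

// Base cost and multipliers as stated in the docs.
const IMAGE_BASE_COST = 5;
const QUALITY_MULTIPLIER: Record<QualityTier, number> = {
  draft: 1,
  standard: 1,
  premium: 3,
  ultra: 5,
  ultra_plus: 8,
};

// Credits charged for one image_generate call; `standard` is the documented default tier.
function imageGenerateCost(quality: QualityTier = "standard"): number {
  return IMAGE_BASE_COST * QUALITY_MULTIPLIER[quality];
}

const ultraPlusCost = imageGenerateCost("ultra_plus"); // 5 × 8 = 40 credits
```

With a 25-credit free allocation, this means five `draft` or `standard` images per month if no other metered tools are used.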

---

@@ -334,7 +378,7 @@ Standard MCP JSON-RPC error codes:
|------|---------|
| `-32600` | Invalid request |
| `-32601` | Method not found |
| `-32602` | Invalid params |
| `-32602` | Invalid params (also used for `Quota exceeded` and `Quality tier requires Pro plan` rejections) |
| `-32603` | Internal error |

HTTP-level errors:
@@ -343,9 +387,10 @@ HTTP-level errors:
|--------|---------|
| `400` | Missing or malformed request |
| `401` | Invalid or expired token (`invalid_token`) |
| `403` | Rate limited or payment delinquent (`insufficient_scope`) |
| `403` | `insufficient_scope` (token lacks a required scope) or auth-service-level denial |
| `404` | Unknown path |
| `405` | Method not allowed |
| `429` | Per-tenant rate limit exceeded (see [Rate Limiting](#rate-limiting)) |

---

29 changes: 25 additions & 4 deletions docs/architecture.md
@@ -166,7 +166,11 @@ enum RiskLevel {
| `image.list_models` | IMG_FORGE | READ_ONLY |
| `image.check_job` | IMG_FORGE | READ_ONLY |

Risk levels are used for audit classification, not for authorization enforcement — all authenticated users can call all tools within their quota.
Risk levels drive both audit classification AND authorization:

- **`tools/list` filter** — `READ_ONLY` tools are visible to any authenticated session; tools with any other risk level are hidden from sessions that lack the `generate` scope.
- **`tools/call` enforcement** — `LOCAL_MUTATION`, `EXTERNAL_MUTATION`, and `DESTRUCTIVE` tools require the `generate` scope and return `INVALID_REQUEST` with audit outcome `insufficient_scope` otherwise.
- **Tier-restricted quality tiers** — `image_generate` arguments with `quality_tier` of `premium`, `ultra`, or `ultra_plus` require a Pro+ plan; free/hobby calls are rejected at the gateway with audit outcome `tier_denied` (see `enforceTierRestriction` in `src/gateway.ts`).
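
The two `tools/call` checks above might be combined along these lines. This is a sketch under stated assumptions, not the contents of `src/gateway.ts`: `authorizeCall`, `CallRequest`, and the return convention (an audit outcome string, or `null` when allowed) are hypothetical.

```typescript
type RiskLevel = "READ_ONLY" | "LOCAL_MUTATION" | "EXTERNAL_MUTATION" | "DESTRUCTIVE";
type Plan = "free" | "hobby" | "pro" | "enterprise";
type Denial = "insufficient_scope" | "tier_denied" | null;

// Hypothetical request shape for illustration.
interface CallRequest {
  toolName: string;
  riskLevel: RiskLevel;
  scopes: string[];
  plan: Plan;
  args: Record<string, unknown>;
}

// Quality tiers that the docs gate to Pro and Enterprise plans.
const PRO_ONLY_QUALITY = ["premium", "ultra", "ultra_plus"];

// Returns the audit outcome for a denied call, or null if the call may proceed.
function authorizeCall(req: CallRequest): Denial {
  // Scope check first: any non-READ_ONLY tool needs the `generate` scope.
  const mutating = req.riskLevel !== "READ_ONLY";
  const hasGenerate = req.scopes.some((scope) => scope === "generate");
  if (mutating && !hasGenerate) return "insufficient_scope";

  // Plan check second: expensive image quality tiers need Pro or Enterprise.
  if (req.toolName === "image_generate") {
    const quality = req.args["quality_tier"];
    const restricted = typeof quality === "string" && PRO_ONLY_QUALITY.indexOf(quality) !== -1;
    const proOrAbove = req.plan === "pro" || req.plan === "enterprise";
    if (restricted && !proOrAbove) return "tier_denied";
  }
  return null;
}
```

Running the scope check before the plan check mirrors the order of the `tools/call` steps in the API reference.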

## Audit — `audit.ts`

@@ -222,10 +226,27 @@ Bearer token extraction and validation for non-OAuth paths:

### Rate Limiting

Enforced by `AUTH_SERVICE` (delegated to the auth worker). The gateway receives:
Two independent layers:

1. **Gateway-side, per-tenant fixed-window limiter** (`src/rate-limiter.ts`) — counts every authenticated MCP request against a 60-second window in `RATELIMIT_KV`. Tier-driven ceiling: free=20, hobby=60, pro=300, enterprise=1000 req/min. Exceeding returns `429` with `Retry-After` and `X-RateLimit-*` headers. Window starts are aligned to `now - (now % 60)` so all tenants share the same boundaries.
2. **Auth-service-side checks** — `AUTH_SERVICE` may still reject upstream with:
- `insufficient_scope` (403) — payment delinquent
- `invalid_token` (401) — expired or invalid token

The gateway-side limiter fires first (immediately after auth resolution) and short-circuits before any quota reserve or backend dispatch. Read-only and free tools both count against the limiter — only the `tools/call` quota path is gated by `isFreeTool`.

### Quota & Cost Attribution

`src/cost-attribution.ts` declares per-tool credit costs and an `image_generate` quality multiplier (`draft=1, standard=1, premium=3, ultra=5, ultra_plus=8` × `image_generate.baseCost: 5`). On `tools/call`:

1. Resolve cost via `resolveToolCost(toolName, args)`.
2. If cost is non-zero, call `AUTH_SERVICE.consumeQuota({tenantId, userId, feature, amount})`. On failure, reject with `INVALID_PARAMS` and audit outcome `tier_denied` (this outcome is overloaded, since plan-gated quality tiers use it too; see follow-ups).
3. Dispatch to the backend.
4. Settle via `AUTH_SERVICE.commitOrRefundQuota(reservationId, success|failed)`. Settlement is best-effort; reservations auto-expire on the auth side if it fails.

The gateway never holds canonical quota state — it is a metering/dispatch layer in front of `edge-auth`.
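
The reserve, dispatch, and settle sequence above could look roughly like this. Everything beyond the two RPC names and their documented arguments is an assumption: the `{ ok, reservationId }` return shape, the `callWithQuota` wrapper, and the in-memory fake are hypothetical, and the calls are synchronous here purely to keep the sketch short.

```typescript
// Assumed client shape; only the method names and arguments come from the docs.
interface QuotaClient {
  consumeQuota(req: { tenantId: string; userId: string; feature: string; amount: number }): {
    ok: boolean;
    reservationId?: string;
  };
  commitOrRefundQuota(reservationId: string, outcome: "success" | "failed"): void;
}

function callWithQuota<T>(
  quota: QuotaClient,
  who: { tenantId: string; userId: string },
  toolName: string,
  cost: number,
  dispatch: () => T,
): T {
  let reservationId: string | undefined;
  if (cost > 0) {
    // Reserve before dispatch; free tools skip the quota path entirely.
    const res = quota.consumeQuota({ tenantId: who.tenantId, userId: who.userId, feature: toolName, amount: cost });
    if (!res.ok || !res.reservationId) throw new Error(`Quota exceeded for ${toolName}`);
    reservationId = res.reservationId;
  }
  let outcome: "success" | "failed" = "failed";
  try {
    const result = dispatch();
    outcome = "success";
    return result;
  } finally {
    if (reservationId) {
      try {
        // Commit on success, refund on failure.
        quota.commitOrRefundQuota(reservationId, outcome);
      } catch (_err) {
        // Best-effort settlement: errors are swallowed, as the docs describe.
      }
    }
  }
}

// Tiny in-memory fake so the flow can be exercised without a real auth service.
function makeInMemoryQuota(initial: number) {
  let balance = initial;
  let nextId = 0;
  const held: Record<string, number> = {};
  const client: QuotaClient = {
    consumeQuota(req) {
      if (req.amount > balance) return { ok: false };
      balance -= req.amount;
      nextId += 1;
      const id = `r${nextId}`;
      held[id] = req.amount;
      return { ok: true, reservationId: id };
    },
    commitOrRefundQuota(id, outcome) {
      if (outcome === "failed") balance += held[id] || 0;
      delete held[id];
    },
  };
  return { client, balance: () => balance };
}
```

Settling in a `finally` block keeps the refund path from being skipped when the backend dispatch throws.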

- `insufficient_scope` (403) — rate limited or payment delinquent
- `invalid_token` (401) — expired or invalid token
> **Note:** the `/api/scaffold` REST endpoint (used by the CLI) bypasses both the rate limiter and the quota/cost-attribution path. CLI traffic is unmetered today; only `/mcp` traffic exercises this enforcement layer.

## Dependencies

69 changes: 56 additions & 13 deletions docs/user-guide.md
@@ -26,7 +26,7 @@ Stackbilt exposes AI tools through the [Model Context Protocol](https://modelcon
| `flow_advance` | Advance a flow to the next stage | LOCAL_MUTATION |
| `flow_recover` | Recover a failed flow | LOCAL_MUTATION |

**Free tier**: 50 credits/month. No credit card required. Credits are weighted by operation complexity.
**Free tier**: 25 credits/month. No credit card required. Credits are weighted by operation complexity. See [§5 Quota & Billing](#5-quota--billing) for the full table.

---

@@ -237,7 +237,7 @@ The client calls `image_generate` with your prompt. img-forge enhances the promp
}
```

**Quality tiers**: `draft` (fastest, SDXL), `standard` (FLUX Klein, default), `premium` (FLUX Dev), `ultra` (Gemini 2.5 Flash), `ultra_plus` (Gemini 3.1 Flash).
**Quality tiers**: `draft` (fastest, SDXL), `standard` (FLUX Klein, default), `premium` (FLUX Dev), `ultra` (Gemini 2.5 Flash), `ultra_plus` (Gemini 3.1 Flash). See [§5 Quota & Billing](#5-quota--billing) for credit costs and plan availability — `premium` and above require Pro or Enterprise.

### Classify Intent

@@ -318,22 +318,62 @@ Both scopes are granted by default on the free tier.

## 5. Quota & Billing

### Monthly credit allocation

| Tier | Credits/month | Price |
|------|--------------|-------|
| Free | 50 | $0 |
| Pro | 500 | Coming soon |
| Enterprise | 2,000 | Coming soon |
| Free | 25 | $0 |
| Hobby | 65 | Coming soon |
| Pro | 580 | Coming soon |
| Enterprise | 2,320 | Coming soon |

### Per-call credit cost

Most read-only tools (`*_status`, `*_classify`, `*_summary`, `*_quality`, `*_governance`, `*_pages`, `image_list_models`, `image_check_job`) cost **0 credits**. Mutating tools have a base cost:

| Tool | Base cost |
|------|-----------|
| `image_generate` | 5 credits × quality multiplier (see below) |
| `scaffold_create` | 2 credits |
| `scaffold_publish` | 3 credits |
| `scaffold_deploy` | 5 credits |
| `scaffold_import` | 1 credit |
| `flow_create` | 2 credits |
| `visual_screenshot` | 1 credit |
| `visual_analyze` | 2 credits |

### `image_generate` quality multipliers

| Quality tier | Multiplier | Effective cost | Available on |
|--------------|-----------|----------------|--------------|
| `draft` | 1× | 5 credits | All tiers |
| `standard` | 1× | 5 credits | All tiers |
| `premium` | 3× | 15 credits | Pro + Enterprise only |
| `ultra` | 5× | 25 credits | Pro + Enterprise only |
| `ultra_plus` | 8× | 40 credits | Pro + Enterprise only |

Credits are weighted by operation:
Free and Hobby plans can request `draft` or `standard` only. Calls with higher quality tiers are rejected at the gateway with `Quality tier "<x>" requires a Pro plan or higher`.

| Operation | Credits |
|-----------|---------|
| Draft quality | 1x |
| Standard quality | 2x |
| Premium quality | 5x |
| Ultra quality | 10x |
### How metering works

Your remaining quota is tracked automatically. When you hit the limit, tool calls return a quota error until the next billing cycle.
1. Before each call, the gateway reserves credits via `edge-auth`'s `consumeQuota` RPC.
2. If the reservation succeeds, the tool runs and the reservation is committed (success) or refunded (failure) via `commitOrRefundQuota`.
3. If the reservation fails (insufficient quota), the call is rejected with `Quota exceeded for <tool>`.

Free-tier quota resets monthly. When you hit the limit, tool calls return a quota error until the next cycle.

### Rate limits

Independent of credit quota, every authenticated MCP request counts against a per-tenant fixed-window rate limit:

| Tier | Requests / minute |
|------|-------------------|
| Free | 20 |
| Hobby | 60 |
| Pro | 300 |
| Enterprise | 1,000 |

When a request would exceed the limit, the gateway returns `429 Too Many Requests` with `Retry-After: <seconds>` and `X-RateLimit-Limit` / `X-RateLimit-Remaining` / `X-RateLimit-Reset` headers. The same headers are also attached to successful `tools/call` responses so clients can pace themselves; other MCP methods (`initialize`, `tools/list`, `ping`, notifications) currently do not echo rate-limit headers on success.

---

@@ -359,6 +399,9 @@ Pass `github_token` as a parameter with a GitHub PAT that has `repo` scope. Or a
### Quota exceeded
Check your usage at the beginning of each month. Free tier resets monthly. Upgrade options coming soon.

### Rate limited (HTTP 429)
You exceeded your tier's per-minute request budget (free=20, hobby=60, pro=300, enterprise=1000). Wait the number of seconds in the `Retry-After` response header and resume. The window is fixed (60s aligned), not sliding.
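
On the client side, the wait can be derived from the response headers. The helper below is a hypothetical sketch, not part of any Stackbilt SDK; it assumes header names have already been lower-cased into a plain object, and it falls back to one full window (60 seconds) when neither header is usable.

```typescript
// Returns how long to wait before retrying, in milliseconds, or null if no wait is needed.
function retryDelayMs(
  status: number,
  headers: Record<string, string>,
  nowSeconds: number,
): number | null {
  if (status !== 429) return null;

  // Prefer Retry-After, which the gateway sends as a number of seconds.
  const retryAfter = Number(headers["retry-after"]);
  if (isFinite(retryAfter) && retryAfter >= 0) return retryAfter * 1000;

  // Otherwise derive the wait from the window reset timestamp (Unix seconds).
  const resetAt = Number(headers["x-ratelimit-reset"]);
  if (isFinite(resetAt)) return Math.max(0, resetAt - nowSeconds) * 1000;

  // Assumed fallback: wait out one whole 60-second window.
  return 60000;
}
```

For example, a 429 carrying `Retry-After: 37` yields a 37,000 ms wait, after which the request can be resent unchanged.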

---

## 7. Security
2 changes: 1 addition & 1 deletion src/audit.ts
@@ -12,7 +12,7 @@ export interface AuditArtifact {
risk_level: RiskLevel | 'UNKNOWN';
policy_decision: 'ALLOW' | 'DENY';
redacted_input_summary: string;
outcome: 'success' | 'error' | 'backend_error' | 'auth_denied' | 'unknown_tool' | 'invalid_params' | 'tier_denied' | 'insufficient_scope';
outcome: 'success' | 'error' | 'backend_error' | 'auth_denied' | 'unknown_tool' | 'invalid_params' | 'tier_denied' | 'insufficient_scope' | 'rate_limited';
timestamp: string;
latency_ms?: number;
}