Commit 51110cd

Sbussiso and claude committed
sentinel: multi-tenant agent — single deployed agent serves all orgs
Previous slice 3 design had one Fly app per org (single MCP key, single
org_id env var, processor filtered foreign-org runs). That doesn't scale:
one Fly deploy per customer is operational suicide. Switching to one
shared agent that handles every org via per-call scoping at the MCP layer.

Design

agent → MCP request:
    Authorization: Bearer <SENTINEL_AGENT_MCP_KEY>
    X-Agent-Org-Override: <run.org_id>

MCP server _resolve_org():
    if bearer == SENTINEL_AGENT_MCP_KEY (constant-time compare):
        → require X-Agent-Org-Override header
        → use that as org_id
        → require org has Pro Plus + Sentinel enabled (defence in depth
          against the dispatcher's gate having lapsed between dispatch
          and pickup)
        → rate-limit on per-org bucket "sentinel-agent:{org_id}" so agent
          traffic for org X can't burn org Y's tool budget
        → audit-log with key_name="<sentinel-agent>" + the override org
    else:
        → existing per-org osc_* key flow (unchanged)

The two auth paths are completely independent: leaking the agent key
gives an attacker the ability to make MCP calls as any org, but doesn't
forge a per-org osc_* key (which is bound to a specific org_id by hash).

Changes

1. backend/app/core/config.py:
   - New env var SENTINEL_AGENT_MCP_KEY (distinct from SENTINEL_AGENT_KEY
     to keep blast radii separate; one is for run-lifecycle callbacks,
     the other is for MCP tool calls).

2. backend/app/mcp/server.py:
   - hmac added to imports + settings imported.
   - _auth() now also pulls the x-agent-org-override header so it's
     available to _resolve_org.
   - _resolve_org() branches: if bearer matches SENTINEL_AGENT_MCP_KEY,
     hand off to _resolve_via_agent_key (new function).
   - _resolve_via_agent_key() validates the override header, enforces
     plan + sentinel-enabled checks, applies a per-override-org rate
     limit bucket, sets the activity tracker context with
     key_name="<sentinel-agent>" so audit attribution is preserved.

The McpApiKey table is unchanged: the agent key isn't an osc_* row, it's
a bearer that lives in env vars on both sides. No new schema, no
encrypted-at-rest column. Honest trust model: the agent has the secret,
therefore it can act as any org.

Verified locally:
- 549/549 backend tests green
- Ruff clean
- _resolve_org dispatches both paths correctly

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
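Review note: the request shape and the branch described in the Design section can be sketched as below. This is a minimal illustration, not the code in this commit: `build_agent_headers` and `pick_auth_path` are hypothetical names, the header keys are assumed to arrive lower-cased, and `PermissionError` stands in for the server's `ToolError`.

```python
import hmac

AGENT_ORG_HEADER = "x-agent-org-override"


def build_agent_headers(agent_key: str, org_id: str) -> dict[str, str]:
    """Headers the agent would attach to one MCP call made for `org_id`."""
    return {
        "authorization": f"Bearer {agent_key}",
        AGENT_ORG_HEADER: org_id,
    }


def pick_auth_path(headers: dict[str, str], configured_agent_key: str) -> str:
    """Return which auth path a request would take: 'agent' or 'per-org'."""
    scheme, _, token = headers.get("authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not token:
        raise PermissionError("Unauthorized: missing Bearer token")

    # An empty configured key disables the agent path entirely.
    is_agent = bool(configured_agent_key) and hmac.compare_digest(
        token.encode(), configured_agent_key.encode()
    )
    if not is_agent:
        # Falls through to the existing per-org osc_* key lookup.
        return "per-org"

    if not headers.get(AGENT_ORG_HEADER, "").strip():
        raise PermissionError("agent key requires X-Agent-Org-Override header")
    return "agent"
```

The point of the sketch is that the org is never inferred from the agent key itself; it is always taken from the override header, which is why the header is mandatory on the agent path.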
1 parent 8f4e4c6 commit 51110cd

2 files changed

Lines changed: 133 additions & 2 deletions

backend/app/core/config.py

Lines changed: 21 additions & 0 deletions
@@ -166,5 +166,26 @@ def is_email_configured(cls) -> bool:
     # operator triggers a drain manually.
     SENTINEL_AGENT_WEBHOOK_URL: str = os.getenv("SENTINEL_AGENT_WEBHOOK_URL", "")
 
+    # Multi-tenant MCP credential for the Sentinel agent. ONE shared
+    # secret used by the single multi-tenant agent across all orgs;
+    # the agent presents this as its Bearer token AND sends an
+    # ``X-Agent-Org-Override`` header per call to declare which org
+    # the request is on behalf of. The MCP server validates the
+    # bearer against this env var and scopes to the override org.
+    #
+    # Compromise blast radius: same as any service-to-service shared
+    # secret; an attacker with this key can act as any org via the
+    # MCP tools. Mitigations: only ever set as a Fly secret, audit-log
+    # every call with the override org_id, rotate when needed.
+    #
+    # Leave blank in environments where the agent isn't deployed;
+    # the MCP server then doesn't accept any request via this auth
+    # path, which is the right closed-by-default behaviour.
+    #
+    # Distinct from SENTINEL_AGENT_KEY (which authenticates agent →
+    # CC callbacks for run lifecycle) so a leak of one doesn't
+    # automatically grant the capability of the other.
+    SENTINEL_AGENT_MCP_KEY: str = os.getenv("SENTINEL_AGENT_MCP_KEY", "")
+
 
 settings = Config()

backend/app/mcp/server.py

Lines changed: 112 additions & 2 deletions
@@ -13,6 +13,7 @@
 import contextvars
 import functools
 import hashlib
+import hmac
 import logging
 import threading
 import time
@@ -29,6 +30,7 @@
 from sqlalchemy.orm import Session
 
 from app.api.hls import _segment_cache
+from app.core.config import settings
 from app.core.database import SessionLocal
 from app.mcp.activity import McpEvent, tracker
 from app.models.models import (
@@ -277,7 +279,17 @@ async def on_call_tool(self, context, call_next):
 # ---------------------------------------------------------------------------
 
 def _resolve_org(headers: dict | None) -> tuple[str, Session]:
-    """Validate the Bearer token, enforce rate limit, return (org_id, db_session)."""
+    """Validate the Bearer token, enforce rate limit, return (org_id, db_session).
+
+    Two auth paths:
+    1. Per-org MCP key (osc_*): bearer matches an McpApiKey row;
+       org_id comes from that row.
+    2. Multi-tenant agent key: bearer matches the
+       ``SENTINEL_AGENT_MCP_KEY`` env var; org_id comes from the
+       ``X-Agent-Org-Override`` header. Used by the SourceBox
+       Sentinel agent to make tool calls on behalf of any org
+       it's processing a pending run for.
+    """
     if not headers:
         raise ToolError("Unauthorized: no headers present")
 
@@ -289,6 +301,16 @@ def _resolve_org(headers: dict | None) -> tuple[str, Session]:
     if not raw_key:
         raise ToolError("Unauthorized: empty Bearer token")
 
+    # ── Path 2: agent multi-tenant key ──────────────────────────────
+    # Constant-time compare (on bytes, since compare_digest rejects
+    # non-ASCII str) so response timing doesn't reveal how much of the
+    # secret matched. An unset (empty) agent key disables this path:
+    # the `agent_key and` guard short-circuits before any comparison.
+    agent_key = settings.SENTINEL_AGENT_MCP_KEY
+    if agent_key and hmac.compare_digest(raw_key.encode(), agent_key.encode()):
+        return _resolve_via_agent_key(headers, agent_key)
+
+    # ── Path 1: per-org osc_* key (existing behaviour) ──────────────
     key_hash = hashlib.sha256(raw_key.encode()).hexdigest()
 
     db = SessionLocal()
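
Review note: two edge cases of `hmac.compare_digest` matter for the agent-key check in this hunk. The sketch below is illustrative only; `bearer_matches` is a hypothetical helper, not a function in `server.py`. It shows that two empty strings compare equal (so an explicit non-empty guard is what keeps an unset key closed) and that `str` arguments containing non-ASCII characters raise `TypeError`, which encoding to bytes avoids.

```python
import hmac


def bearer_matches(presented: str, configured: str) -> bool:
    """Constant-time check of a presented bearer against a configured secret.

    Returns False when no secret is configured, and compares bytes so a
    non-ASCII bearer value is rejected instead of raising TypeError.
    """
    if not configured:
        return False
    return hmac.compare_digest(presented.encode("utf-8"), configured.encode("utf-8"))


def str_compare_raises(presented: str, configured: str) -> bool:
    """Report whether comparing the raw str values raises TypeError."""
    try:
        hmac.compare_digest(presented, configured)
    except TypeError:
        return True
    return False
```

With this shape, `bearer_matches("", "")` is `False` even though `hmac.compare_digest("", "")` is `True`, and a bearer such as `"clé"` is simply a non-match.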
@@ -351,9 +373,97 @@ def _resolve_org(headers: dict | None) -> tuple[str, Session]:
         raise ToolError("Authentication error") from None
 
 
+def _resolve_via_agent_key(headers: dict, _agent_key: str) -> tuple[str, Session]:
+    """Auth path for the multi-tenant Sentinel agent.
+
+    The bearer token has already been verified against
+    ``SENTINEL_AGENT_MCP_KEY`` by the caller. Now:
+
+    1. Read the override org_id from ``X-Agent-Org-Override``.
+    2. Verify the override org has a plan in RATE_LIMITS (Pro or
+       Pro Plus; Sentinel's Pro-Plus-only gate is enforced
+       downstream) AND has Sentinel enabled. Otherwise the agent
+       is acting for an org that shouldn't be served: likely a
+       stale pending run, or impersonation if the secret leaked.
+    3. Apply rate limits scoped to the override org so a runaway
+       agent loop on org X can't burn org Y's tool budget.
+    4. Audit-log via the standard tracker context with key_name
+       set to "<sentinel-agent>" so the audit trail shows the
+       tool call came from the agent (and which org it was for).
+    """
+    override_org = headers.get("x-agent-org-override", "").strip()
+    if not override_org:
+        raise ToolError(
+            "Unauthorized: agent key requires X-Agent-Org-Override header"
+        )
+
+    db = SessionLocal()
+    try:
+        from app.core.plans import resolve_org_plan
+        from app.models.models import SentinelConfig
+
+        plan = resolve_org_plan(db, override_org)
+        # Reuse the same Pro/Pro Plus rate-limit table so the agent
+        # respects per-org plan caps; if the override org isn't a
+        # paying customer, hard-reject (Sentinel is Pro-Plus-only
+        # downstream too, so this is defence in depth).
+        limits = RATE_LIMITS.get(plan)
+        if limits is None:
+            db.close()
+            raise ToolError(
+                f"Agent override target org has no Pro/Pro Plus plan "
+                f"(plan={plan!r})"
+            )
+
+        # Defence in depth: the dispatcher already gates on
+        # sentinel_config.enabled before creating a pending run, but
+        # an operator could disable Sentinel between dispatch and
+        # the agent picking it up. Don't run tool calls for an org
+        # that's currently disabled.
+        cfg = db.query(SentinelConfig).filter_by(org_id=override_org).first()
+        if cfg is None or not cfg.enabled:
+            db.close()
+            raise ToolError("Sentinel disabled for this org")
+
+        # Per-org bucket so agent traffic for org X doesn't throttle
+        # org Y. Distinct from per-osc-key buckets so direct dashboard
+        # MCP usage isn't accidentally affected by agent activity on
+        # the same org either.
+        rate_bucket = f"sentinel-agent:{override_org}"
+        allowed, _remaining, breach = _rate_limiter.check(
+            rate_bucket,
+            minute_limit=limits["minute"],
+            daily_limit=limits["daily"],
+        )
+        if not allowed:
+            db.close()
+            if breach == "minute":
+                raise ToolError(
+                    "Sentinel agent rate limit: too many tool calls in one "
+                    "minute for this org. Tune the per-camera cooldown or "
+                    "narrow the scope."
+                )
+            raise ToolError(
+                "Sentinel agent daily cap reached for this org; check the "
+                "agent's run log for a stuck loop."
+            )
+
+        # Activity-tracker context: key_name is the audit handle that
+        # surfaces in the MCP activity log.
+        _ctx_org_id.set(override_org)
+        _ctx_key_name.set("<sentinel-agent>")
+
+        return override_org, db
+    except ToolError:
+        raise
+    except Exception:
+        db.close()
+        raise ToolError("Authentication error") from None
+
+
 def _auth():
     """Shortcut: get headers, resolve org, return (org_id, db)."""
-    headers = get_http_headers(include={"authorization"})
+    headers = get_http_headers(include={"authorization", "x-agent-org-override"})
     return _resolve_org(headers)
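
Review note: the per-org bucket naming in `_resolve_via_agent_key` is what keeps one org's agent traffic from consuming another org's budget. The sketch below illustrates that isolation with a deliberately simplified limiter. `FixedWindowLimiter` and `agent_bucket` are hypothetical; the real `_rate_limiter.check` in this file also takes a `daily_limit` and returns an `(allowed, remaining, breach)` tuple, and its internal algorithm is not shown in this diff.

```python
import time
from collections.abc import Callable


class FixedWindowLimiter:
    """Tiny in-memory limiter keyed by bucket name (per-minute window only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # bucket name -> (window index, calls seen in that window)
        self._windows: dict[str, tuple[int, int]] = {}

    def check(self, bucket: str, minute_limit: int) -> bool:
        """Record one call against `bucket`; return False once over the limit."""
        window = int(self._clock() // 60)
        start, count = self._windows.get(bucket, (window, 0))
        if start != window:
            # A new minute has started, so the counter resets.
            start, count = window, 0
        if count >= minute_limit:
            self._windows[bucket] = (start, count)
            return False
        self._windows[bucket] = (start, count + 1)
        return True


def agent_bucket(org_id: str) -> str:
    """Bucket name used for agent traffic on behalf of one org."""
    return f"sentinel-agent:{org_id}"
```

Because the bucket key embeds the override org, exhausting `sentinel-agent:org_x` leaves `sentinel-agent:org_y` untouched, and neither collides with buckets keyed on per-org `osc_*` keys.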