feat: external control-plane protocol (key resolve, usage API, tenant BYO keys)#72
feat: external control-plane protocol (key resolve, usage API, tenant BYO keys)#72jmlago wants to merge 1 commit into
Conversation
… BYO keys) Adds an optional "bring your own control plane" integration, entirely off unless CONTROL_PLANE_URL + CONTROL_PLANE_INTERNAL_SECRET are set: - control_plane_client.py (new leaf module): resolve_key with positive/ negative caching, single-flight, and stale-grace served only while the control plane is unreachable; tenant_env (allowlisted, cached, fail-soft to platform keys); env_get — a request-scoped credential chain (ContextVar tenant map, then process env). - Ingress: _caller_auth_async falls through to the control plane only on a pure local miss (local stores stay authoritative; explicit local status=inactive is an operator kill-switch for a tenant slug). Plan rate limits ride the resolve response into the existing _rate_ok meta. x-llm-router-tenant / x-internal-secret are stripped from client input; the tenant header is stamped only from the authenticated resolve. - internal_api.py (new): GET /internal/usage[?bucket=day] and /internal/usage/recent, gated by x-internal-secret (hidden 404 while unconfigured), backed by new host_store.usage_totals (includes cached tokens) and recent_calls(caller=...). - Router: per-tenant BYO provider credentials — the shim activates the tenant env from the trusted header; serve.py threads control_plane_client.env_get into the openai-compatible + anthropic + google adapters (bedrock/codex/antseed stay platform-only; background source pollers never see tenant keys). _rate_ok/_route_allowed now accept the already-resolved meta, dropping a duplicate consumer_keys read per proxied request.
📝 WalkthroughWalkthroughThis PR introduces an optional external "bring your own control plane" integration. A new ChangesControl plane integration
Estimated code review effort: 4 (Complex) | ~75 minutes Sequence Diagram(s)sequenceDiagram
participant Client
participant AuthProxy
participant ControlPlaneClient
participant Upstream
Client->>AuthProxy: Request with API key
AuthProxy->>AuthProxy: Check local caller keys
alt key unknown locally and CP enabled
AuthProxy->>ControlPlaneClient: resolve_key(digest)
ControlPlaneClient-->>AuthProxy: ResolvedKey (tenant, limits) or None
end
AuthProxy->>AuthProxy: _route_allowed(meta), _rate_ok(meta)
AuthProxy->>Upstream: Forward request with x-llm-router-tenant header
Upstream-->>AuthProxy: Response
AuthProxy-->>Client: Response
sequenceDiagram
participant Shim
participant ControlPlaneClient
participant ProviderAdapter
Shim->>Shim: Read x-llm-router-tenant header
Shim->>ControlPlaneClient: tenant_env(tenant_id)
ControlPlaneClient-->>Shim: env mapping
Shim->>ControlPlaneClient: activate_tenant_env(env)
Shim->>ProviderAdapter: Handle request
ProviderAdapter->>ControlPlaneClient: env_get("API_KEY")
ControlPlaneClient-->>ProviderAdapter: Tenant or platform key
Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (3)
control_plane_client.py (2)
212-242: 🚀 Performance & Scalability | 🔵 Trivial | 💤 Low valueOptional:
tenant_envlacks the single-flight coalescing thatresolve_keyhas.On a cold cache or TTL expiry, concurrent requests for the same tenant will each issue a
/provider-envfetch (a small stampede against the control plane). It's fail-soft and short-lived, so this is minor, but mirroring the_resolve_inflightpattern here would bound the fan-out.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@control_plane_client.py` around lines 212 - 242, The tenant_env helper currently allows multiple concurrent cache misses to trigger duplicate /provider-env requests for the same tenant. Add single-flight coalescing similar to the _resolve_inflight pattern used by resolve_key so only one fetch runs per tenant_id while others await the same result; update tenant_env and any supporting cache/inflight state accordingly.
105-113: 🚀 Performance & Scalability | 🔵 Trivial | 💤 Low valueOptional: full-cache clear can evict live positive entries.
When the cache is full of still-valid entries,
_evict_if_full()clears everything, forcing a re-resolve stampede for active keys until the cache repopulates. Consider an LRU/oldest-first partial eviction to preserve hot positive entries.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@control_plane_client.py` around lines 105 - 113, The cache eviction logic in _evict_if_full() is too aggressive because it clears the entire _resolve_cache when the cache is still full after removing expired items, which can evict still-valid hot entries. Update _evict_if_full() to evict only the oldest or least-recently-used entries from _resolve_cache instead of calling clear(), preserving live positive entries while making room for new ones. Use the existing _resolve_cache, _RESOLVE_CACHE_MAX, and _entry_ttl(e) logic to identify expired entries first, then perform partial eviction for the remaining over-capacity entries.tests/test_auth_proxy_control_plane.py (1)
232-242: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueConsider mirroring the 404/403 gating tests for
/internal/usage/recent.Only
/internal/usagehas explicit tests for the unconfigured-secret (404) and wrong-secret (403) cases. Since both routes share_gate, this is low risk today, but a symmetric test would catch future regressions if the routes ever diverge.Also applies to: 266-279
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_auth_proxy_control_plane.py` around lines 232 - 242, Add symmetric gate coverage for the recent-usage endpoint by mirroring the existing `test_internal_usage_hidden_without_secret` and `test_internal_usage_wrong_secret_403` cases for `/internal/usage/recent`. Reuse the same `TestClient(auth_proxy.app)`, `CONTROL_PLANE_INTERNAL_SECRET` monkeypatch, and `x-internal-secret` header patterns to assert 404 when the secret is unset and 403 when the secret is incorrect, so `_gate` behavior is verified consistently across both routes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@auth_proxy.py`:
- Around line 3306-3316: The header sanitization in auth_proxy.py is missing
x-llm-router-caller, so a client can smuggle a mixed-case version through and
have it forwarded alongside the trusted value. Update the inbound header filter
in the request header-building block to strip x-llm-router-caller before setting
headers["x-llm-router-caller"] from the authenticated caller value, keeping the
trust boundary in the same place as the existing x-llm-router-tenant handling.
In `@tests/test_tenant_env_shim.py`:
- Around line 171-174: The streaming test helper in streaming_call is invoking
the async emit emitter without awaiting it, so the streaming path is never
exercised. Update the streaming_call helper to await emit with the expected
delta payload shape, and keep the assertion on cpc.env_get("OPENAI_API_KEY") in
place so the test still verifies the tenant env shim behavior.
---
Nitpick comments:
In `@control_plane_client.py`:
- Around line 212-242: The tenant_env helper currently allows multiple
concurrent cache misses to trigger duplicate /provider-env requests for the same
tenant. Add single-flight coalescing similar to the _resolve_inflight pattern
used by resolve_key so only one fetch runs per tenant_id while others await the
same result; update tenant_env and any supporting cache/inflight state
accordingly.
- Around line 105-113: The cache eviction logic in _evict_if_full() is too
aggressive because it clears the entire _resolve_cache when the cache is still
full after removing expired items, which can evict still-valid hot entries.
Update _evict_if_full() to evict only the oldest or least-recently-used entries
from _resolve_cache instead of calling clear(), preserving live positive entries
while making room for new ones. Use the existing _resolve_cache,
_RESOLVE_CACHE_MAX, and _entry_ttl(e) logic to identify expired entries first,
then perform partial eviction for the remaining over-capacity entries.
In `@tests/test_auth_proxy_control_plane.py`:
- Around line 232-242: Add symmetric gate coverage for the recent-usage endpoint
by mirroring the existing `test_internal_usage_hidden_without_secret` and
`test_internal_usage_wrong_secret_403` cases for `/internal/usage/recent`. Reuse
the same `TestClient(auth_proxy.app)`, `CONTROL_PLANE_INTERNAL_SECRET`
monkeypatch, and `x-internal-secret` header patterns to assert 404 when the
secret is unset and 403 when the secret is incorrect, so `_gate` behavior is
verified consistently across both routes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 66fb9615-9a00-4c13-8540-91e4b2db8c4c
📒 Files selected for processing (13)
.env.exampleauth_proxy.pycompose.ymlcontrol_plane_client.pyhost_store.pyinternal_api.pyproviders.pyserve.pyshim.pytests/test_auth_proxy_control_plane.pytests/test_control_plane_client.pytests/test_host_store.pytests/test_tenant_env_shim.py
| headers = { | ||
| k: v for k, v in request.headers.items() | ||
| if k.lower() not in {"authorization", "host", "connection", "content-length"} | ||
| if k.lower() not in {"authorization", "host", "connection", "content-length", | ||
| "x-llm-router-tenant", "x-internal-secret"} | ||
| } | ||
| headers["x-llm-router-caller"] = caller | ||
| # Tenant identity for per-tenant provider credentials (BYO keys): set ONLY | ||
| # from the authenticated resolve — the client-sent header is stripped above, | ||
| # so it can never be smuggled past auth. | ||
| if auth.get("tenant_id") is not None: | ||
| headers["x-llm-router-tenant"] = str(auth["tenant_id"]) |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Confirm upstream trusts x-llm-router-caller and whether it's already stripped elsewhere.
rg -nP --type=py -C3 'x-llm-router-caller'Repository: genlayerlabs/unhardcoded
Length of output: 162
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '\n== file locations ==\n'
git ls-files | rg 'auth_proxy\.py$|README|pyproject|requirements|setup\.py|package\.json|\.py$' | head -n 50
printf '\n== header usages ==\n'
rg -n --hidden --glob '!**/.git/**' 'x-llm-router-(caller|tenant)|x-internal-secret' .
printf '\n== auth_proxy.py around target lines ==\n'
python3 - <<'PY'
from pathlib import Path
p = Path('auth_proxy.py')
lines = p.read_text().splitlines()
for start in (3296, 3306, 3316, 3326):
end = min(start+20, len(lines))
print(f'\n-- {start}-{end} --')
for i in range(start-1, end):
print(f'{i+1}: {lines[i]}')
PYRepository: genlayerlabs/unhardcoded
Length of output: 11616
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '\n== shim.py caller/tenant sections ==\n'
python3 - <<'PY'
from pathlib import Path
p = Path('shim.py')
lines = p.read_text().splitlines()
for start in (80, 270, 836, 882, 890):
end = min(start+30, len(lines))
print(f'\n-- {start}-{end} --')
for i in range(start-1, end):
print(f'{i+1}: {lines[i]}')
PY
printf '\n== auth_proxy control-plane tests around header forwarding ==\n'
python3 - <<'PY'
from pathlib import Path
p = Path('tests/test_auth_proxy_control_plane.py')
lines = p.read_text().splitlines()
for start in (110, 120, 126):
end = min(start+25, len(lines))
print(f'\n-- {start}-{end} --')
for i in range(start-1, end):
print(f'{i+1}: {lines[i]}')
PYRepository: genlayerlabs/unhardcoded
Length of output: 13243
🏁 Script executed:
#!/bin/bash
set -euo pipefail
python3 - <<'PY'
def try_probe():
probes = []
try:
import httpx
h = httpx.Headers({"X-Llm-Router-Caller": "client", "x-llm-router-caller": "server"})
probes.append(("httpx", {
"type": type(h).__name__,
"items": list(h.items()),
"raw": getattr(h, "raw", None),
"get": h.get("x-llm-router-caller"),
}))
except Exception as e:
probes.append(("httpx_error", repr(e)))
try:
from starlette.datastructures import Headers
h = Headers(raw=[
(b"X-Llm-Router-Caller", b"client"),
(b"x-llm-router-caller", b"server"),
])
probes.append(("starlette", {
"type": type(h).__name__,
"items": list(h.items()),
"raw": getattr(h, "raw", None),
"get": h.get("x-llm-router-caller"),
}))
except Exception as e:
probes.append(("starlette_error", repr(e)))
for name, data in probes:
print(name, data)
try_probe()
PYRepository: genlayerlabs/unhardcoded
Length of output: 561
Strip x-llm-router-caller from inbound headers too
x-llm-router-caller is consumed downstream for session ownership and request attribution, so a client-supplied mixed-case variant can survive the filter and get forwarded alongside the injected value. Exclude it before re-adding the trusted caller header.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@auth_proxy.py` around lines 3306 - 3316, The header sanitization in
auth_proxy.py is missing x-llm-router-caller, so a client can smuggle a
mixed-case version through and have it forwarded alongside the trusted value.
Update the inbound header filter in the request header-building block to strip
x-llm-router-caller before setting headers["x-llm-router-caller"] from the
authenticated caller value, keeping the trust boundary in the same place as the
existing x-llm-router-tenant handling.
| async def streaming_call(request, emit): | ||
| seen.append(cpc.env_get("OPENAI_API_KEY")) | ||
| emit({"delta": "hi"}) | ||
| return _ok_result() |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Await the streaming emitter with the expected delta type.
emit is async; calling it without await leaves the coroutine unexecuted, so this test does not exercise the streaming emission path.
Suggested fix
async def streaming_call(request, emit):
seen.append(cpc.env_get("OPENAI_API_KEY"))
- emit({"delta": "hi"})
+ await emit("hi")
return _ok_result()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| async def streaming_call(request, emit): | |
| seen.append(cpc.env_get("OPENAI_API_KEY")) | |
| emit({"delta": "hi"}) | |
| return _ok_result() | |
| async def streaming_call(request, emit): | |
| seen.append(cpc.env_get("OPENAI_API_KEY")) | |
| await emit("hi") | |
| return _ok_result() |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/test_tenant_env_shim.py` around lines 171 - 174, The streaming test
helper in streaming_call is invoking the async emit emitter without awaiting it,
so the streaming path is never exercised. Update the streaming_call helper to
await emit with the expected delta payload shape, and keep the assertion on
cpc.env_get("OPENAI_API_KEY") in place so the test still verifies the tenant env
shim behavior.
|
Consumer side is up: genlayerlabs/unhardcoded-cloud#1 (pins this branch as the |
What
Optional "bring your own control plane" integration for the host — entirely off unless
CONTROL_PLANE_URL+CONTROL_PLANE_INTERNAL_SECRETare set. Nothing SaaS-specific lands here: the host speaks a small generic HTTP contract any external control plane can implement (first consumer:unhardcoded-cloud).Contract
GET {CP}/internal/keys/resolve?sha256=…→{active, consumer, tenant_id, rate_per_min, burst};GET {CP}/internal/tenants/<id>/provider-env→{env: {NAME: secret}}. Both carryx-internal-secret.GET /internal/usage?caller=…&since_ts=…[&bucket=day]andGET /internal/usage/recent?caller=…&limit=…, same shared secret, hidden (404) while unconfigured.Pieces
control_plane_client.py(new leaf, shared by ingress + router): resolve cache (60s positive / 15s negative TTL, single-flight, stale-grace 300s served only while the CP is unreachable — a definitive answer always replaces the cache),tenant_env(allowlist-filtered, cached, fail-soft to platform keys),env_get(ContextVar tenant map → process env)._caller_auth_async— local key stores stay authoritative; only a pure local miss consults the CP. Resolved caller = tenant slug (usage attribution lands incalls.callerfor free). Plan rate limits ride the resolve into the existing_rate_ok. Operator kill-switch: a localstatus=inactiverecord for the slug blocks it.x-llm-router-tenant/x-internal-secretstripped from client input; tenant header stamped only post-auth.serve.pythreadsenv_getinto openai-compatible/anthropic/google adapters (bedrock/codex/antseed stay platform-only; source pollers never see tenant keys). Context propagates through streaming/flowcreate_taskboundaries — covered by tests._rate_ok/_route_allowednow reuse the already-resolved meta → one fewer Postgres read per proxied request.Known limits (documented)
_sync_provider_auth_statereads the platform env).cost_usdis still stamped on BYO calls (informational for the CP).Tests
nix-shell --run 'python -m pytest tests -q'→ 518 passed, 2 skipped (38 new: client cache/grace/single-flight, auth fallback + header hygiene + kill-switch + rate limits, /internal/* gating + shapes + catch-all regression, BYO isolation incl. concurrent tenants and streaming, store aggregates).Part 1 of the SaaS integration; the Django side (unhardcoded-cloud) consumes this branch. Do not merge yet — jm reviews tomorrow morning.
Summary by CodeRabbit
New Features
Bug Fixes