
feat: external control-plane protocol (key resolve, usage API, tenant BYO keys)#72

Open
jmlago wants to merge 1 commit into main from feat/control-plane-integration

Conversation

jmlago (Member) commented Jul 4, 2026

What

Optional "bring your own control plane" integration for the host — entirely off unless CONTROL_PLANE_URL + CONTROL_PLANE_INTERNAL_SECRET are set. Nothing SaaS-specific lands here: the host speaks a small generic HTTP contract any external control plane can implement (first consumer: unhardcoded-cloud).

Contract

  • Outbound: GET {CP}/internal/keys/resolve?sha256=… → {active, consumer, tenant_id, rate_per_min, burst}; GET {CP}/internal/tenants/<id>/provider-env → {env: {NAME: secret}}. Both carry x-internal-secret.
  • Inbound (ingress): GET /internal/usage?caller=…&since_ts=…[&bucket=day] and GET /internal/usage/recent?caller=…&limit=…, same shared secret, hidden (404) while unconfigured.
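
A minimal sketch of how a host might build the outbound resolve request from the contract above. It assumes the sha256 parameter is the lowercase hex SHA-256 digest of the raw API key, which the description does not spell out. `build_resolve_request` and its arguments are hypothetical names, not the PR's actual API.

```python
import hashlib
from urllib.parse import urlencode


def build_resolve_request(cp_url: str, internal_secret: str, api_key: str):
    """Return (url, headers) for GET {CP}/internal/keys/resolve.

    Assumption: the control plane looks keys up by the hex SHA-256 digest
    of the raw key, so the plaintext key never leaves the host.
    """
    digest = hashlib.sha256(api_key.encode("utf-8")).hexdigest()
    query = urlencode({"sha256": digest})
    url = f"{cp_url.rstrip('/')}/internal/keys/resolve?{query}"
    # Every outbound call carries the shared secret header.
    headers = {"x-internal-secret": internal_secret}
    return url, headers


url, headers = build_resolve_request("https://cp.example", "s3cret", "sk-demo")
```

Only the digest appears in the URL, so access logs on either side never see the plaintext key.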

Pieces

  • control_plane_client.py (new leaf, shared by ingress + router): resolve cache (60s positive / 15s negative TTL, single-flight, stale-grace 300s served only while the CP is unreachable — a definitive answer always replaces the cache), tenant_env (allowlist-filtered, cached, fail-soft to platform keys), env_get (ContextVar tenant map → process env).
  • Ingress: _caller_auth_async — local key stores stay authoritative; only a pure local miss consults the CP. Resolved caller = tenant slug (usage attribution lands in calls.caller for free). Plan rate limits ride the resolve into the existing _rate_ok. Operator kill-switch: a local status=inactive record for the slug blocks it. x-llm-router-tenant/x-internal-secret stripped from client input; tenant header stamped only post-auth.
  • Router BYO keys: shim activates the tenant env from the trusted header; serve.py threads env_get into openai-compatible/anthropic/google adapters (bedrock/codex/antseed stay platform-only; source pollers never see tenant keys). Context propagates through streaming/flow create_task boundaries — covered by tests.
  • Perf side-effect: _rate_ok/_route_allowed now reuse the already-resolved meta → one fewer Postgres read per proxied request.
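
The resolve-cache rules in the first bullet can be illustrated with a simplified, synchronous sketch. It uses the TTL values quoted above and assumes the grace window is measured from TTL expiry, which matches the "≤TTL+grace" limit stated under Known limits. `ResolveCache`, `ControlPlaneUnreachable`, and the fail-closed `None` on an outage without a usable entry are assumptions for illustration; the real client is async and also does single-flight.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional

POSITIVE_TTL = 60.0   # seconds, from the PR description
NEGATIVE_TTL = 15.0
STALE_GRACE = 300.0   # honoured only while the control plane is unreachable


class ControlPlaneUnreachable(Exception):
    """Hypothetical error: a transport failure, not a definitive answer."""


@dataclass
class Entry:
    value: Optional[dict]   # None records a negative ("unknown key") answer
    stored_at: float


class ResolveCache:
    def __init__(self, fetch: Callable[[str], Optional[dict]], clock=time.monotonic):
        self._fetch = fetch
        self._clock = clock
        self._entries: Dict[str, Entry] = {}

    def resolve(self, digest: str) -> Optional[dict]:
        now = self._clock()
        entry = self._entries.get(digest)
        if entry is not None:
            ttl = POSITIVE_TTL if entry.value is not None else NEGATIVE_TTL
            if now - entry.stored_at < ttl:
                return entry.value  # fresh hit, positive or negative
        try:
            value = self._fetch(digest)
        except ControlPlaneUnreachable:
            # Stale-grace: reuse an expired positive entry, outage only.
            if entry is not None and entry.value is not None:
                if now < entry.stored_at + POSITIVE_TTL + STALE_GRACE:
                    return entry.value
            return None  # assumption: fail closed when nothing usable is cached
        # A definitive answer (hit or miss) always replaces the cache.
        self._entries[digest] = Entry(value, now)
        return value


# Usage with a fake clock and a control plane that goes down.
now = [0.0]
state = {"up": True}


def fetch(digest: str) -> Optional[dict]:
    if not state["up"]:
        raise ControlPlaneUnreachable()
    return {"active": True, "tenant_id": 7}


cache = ResolveCache(fetch, clock=lambda: now[0])
first = cache.resolve("abc")          # fetched and cached at t=0
state["up"] = False
now[0] = 100.0
during_outage = cache.resolve("abc")  # expired, but inside the grace window
now[0] = 361.0
after_grace = cache.resolve("abc")    # past 60s + 300s, so no longer served
```

Under these assumptions a revoked key can keep working for at most 360 seconds, and only if the control plane stays unreachable for that whole time.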

Known limits (documented)

  • Revocation latency ≤60s (≤TTL+grace during a CP outage).
  • BYO only overrides providers the platform has enabled (_sync_provider_auth_state reads the platform env).
  • cost_usd is still stamped on BYO calls (informational for the CP).

Tests

nix-shell --run 'python -m pytest tests -q' → 518 passed, 2 skipped (38 new: client cache/grace/single-flight, auth fallback + header hygiene + kill-switch + rate limits, /internal/* gating + shapes + catch-all regression, BYO isolation incl. concurrent tenants and streaming, store aggregates).

Part 1 of the SaaS integration; the Django side (unhardcoded-cloud) consumes this branch. Do not merge yet — jm reviews tomorrow morning.

Summary by CodeRabbit

  • New Features

    • Added optional control-plane support for tenant-aware key resolution and provider credentials.
    • Introduced internal usage endpoints for viewing caller-specific usage and recent calls.
    • Enabled per-tenant routing and request-scoped credentials for supported providers.
  • Bug Fixes

    • Improved access control and header filtering for proxy traffic.
    • Added safer fallback behavior when control-plane lookups fail or are unavailable.
    • Preserved local operator overrides and kill-switch behavior when resolving callers.

… BYO keys)

Adds an optional "bring your own control plane" integration, entirely off
unless CONTROL_PLANE_URL + CONTROL_PLANE_INTERNAL_SECRET are set:

- control_plane_client.py (new leaf module): resolve_key with positive/
  negative caching, single-flight, and stale-grace served only while the
  control plane is unreachable; tenant_env (allowlisted, cached, fail-soft
  to platform keys); env_get — a request-scoped credential chain
  (ContextVar tenant map, then process env).
- Ingress: _caller_auth_async falls through to the control plane only on a
  pure local miss (local stores stay authoritative; explicit local
  status=inactive is an operator kill-switch for a tenant slug). Plan rate
  limits ride the resolve response into the existing _rate_ok meta.
  x-llm-router-tenant / x-internal-secret are stripped from client input;
  the tenant header is stamped only from the authenticated resolve.
- internal_api.py (new): GET /internal/usage[?bucket=day] and
  /internal/usage/recent, gated by x-internal-secret (hidden 404 while
  unconfigured), backed by new host_store.usage_totals (includes cached
  tokens) and recent_calls(caller=...).
- Router: per-tenant BYO provider credentials — the shim activates the
  tenant env from the trusted header; serve.py threads
  control_plane_client.env_get into the openai-compatible + anthropic +
  google adapters (bedrock/codex/antseed stay platform-only; background
  source pollers never see tenant keys).

_rate_ok/_route_allowed now accept the already-resolved meta, dropping a
duplicate consumer_keys read per proxied request.
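
The request-scoped credential chain described for env_get can be sketched with the standard contextvars module. The names `env_get` and `activate_tenant_env` appear in this PR, but the signatures and bodies below are assumptions, not the actual implementation.

```python
import asyncio
import contextvars
import os
from typing import Mapping, Optional

# Per-request tenant credentials; None means "no tenant override active".
_tenant_env = contextvars.ContextVar("tenant_env", default=None)


def activate_tenant_env(env: Mapping[str, str]) -> contextvars.Token:
    """Bind a tenant's provider credentials to the current context."""
    return _tenant_env.set(dict(env))


def env_get(name: str, default: Optional[str] = None) -> Optional[str]:
    """Look in the tenant map first, then fall back to the process env."""
    tenant = _tenant_env.get()
    if tenant and name in tenant:
        return tenant[name]
    return os.environ.get(name, default)


async def read_key() -> Optional[str]:
    await asyncio.sleep(0)  # yield so concurrent requests interleave
    return env_get("OPENAI_API_KEY")


async def handle(tenant_key: Optional[str]) -> Optional[str]:
    if tenant_key is not None:
        activate_tenant_env({"OPENAI_API_KEY": tenant_key})
    # create_task copies the current context, so the child task sees
    # the tenant map that was active when it was created.
    return await asyncio.create_task(read_key())


async def main():
    os.environ["OPENAI_API_KEY"] = "platform-key"
    # gather runs each coroutine in its own task, each with its own context copy.
    return await asyncio.gather(handle("tenant-a"), handle("tenant-b"), handle(None))


results = asyncio.run(main())
```

Because each task gets its own copy of the context, concurrent tenants cannot read each other's keys, and a request without a tenant falls through to the platform key.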
coderabbitai Bot commented Jul 4, 2026

Walkthrough

This PR introduces an optional external "bring your own control plane" integration. A new control_plane_client module resolves API keys and tenant provider credentials against an external control plane with caching/grace-period fallback. auth_proxy.py adds async caller authentication with control-plane fallback, tightened header forwarding, and metadata-driven route/rate authorization. A new internal_api.py exposes secret-gated /internal/usage and /internal/usage/recent endpoints backed by new host_store.usage_totals()/updated recent_calls(). providers.py, serve.py, and shim.py thread per-tenant credential resolution (env_get) into provider adapters. Configuration and compose files add the new environment variables, and extensive tests cover all new behaviors.

Changes

Control plane integration

| Layer / File(s) | Summary |
| --- | --- |
| Control plane client module: control_plane_client.py, tests/test_control_plane_client.py | New module implementing resolve_key, tenant_env, caching with TTL/grace, secret validation, ContextVar-based tenant env override, and lifecycle helpers, with corresponding unit tests. |
| Async caller auth and authorization: auth_proxy.py | Adds _caller_auth_async() with control-plane fallback, updates _route_allowed()/_rate_ok() to use auth metadata, tightens forwarded headers, adds x-llm-router-tenant, registers internal_api.router, and closes the client on shutdown. |
| Internal metering API: internal_api.py, host_store.py, tests/test_auth_proxy_control_plane.py, tests/test_host_store.py | New secret-gated /internal/usage and /internal/usage/recent endpoints backed by new host_store.usage_totals() and updated recent_calls(caller=...). |
| Tenant credential threading: providers.py, serve.py, shim.py, tests/test_tenant_env_shim.py | Adds optional env_get to provider adapter factories/dispatchers and a shim _activate_tenant helper activating per-tenant credentials before request handling. |
| Configuration: .env.example, compose.yml | Adds CONTROL_PLANE_URL and CONTROL_PLANE_INTERNAL_SECRET environment variables for router and ingress services. |

Estimated code review effort: 4 (Complex) | ~75 minutes

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant AuthProxy
  participant ControlPlaneClient
  participant Upstream
  Client->>AuthProxy: Request with API key
  AuthProxy->>AuthProxy: Check local caller keys
  alt key unknown locally and CP enabled
    AuthProxy->>ControlPlaneClient: resolve_key(digest)
    ControlPlaneClient-->>AuthProxy: ResolvedKey (tenant, limits) or None
  end
  AuthProxy->>AuthProxy: _route_allowed(meta), _rate_ok(meta)
  AuthProxy->>Upstream: Forward request with x-llm-router-tenant header
  Upstream-->>AuthProxy: Response
  AuthProxy-->>Client: Response
sequenceDiagram
  participant Shim
  participant ControlPlaneClient
  participant ProviderAdapter
  Shim->>Shim: Read x-llm-router-tenant header
  Shim->>ControlPlaneClient: tenant_env(tenant_id)
  ControlPlaneClient-->>Shim: env mapping
  Shim->>ControlPlaneClient: activate_tenant_env(env)
  Shim->>ProviderAdapter: Handle request
  ProviderAdapter->>ControlPlaneClient: env_get("API_KEY")
  ControlPlaneClient-->>ProviderAdapter: Tenant or platform key

Possibly related PRs

  • genlayerlabs/unhardcoded#25: Both PRs touch the native provider adapter factories, threading an env_get parameter for request-scoped credential lookup.
  • genlayerlabs/unhardcoded#36: Both PRs modify host_store.py's recent_calls() signature/behavior for caller scoping.
  • genlayerlabs/unhardcoded#38: The main PR's usage_totals()/recent_calls() changes including tokens_cached build on this PR's calls table schema evolution.
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: an external control-plane integration for key resolution, usage APIs, and tenant-scoped BYO credentials. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (3)
control_plane_client.py (2)

212-242: 🚀 Performance & Scalability | 🔵 Trivial | 💤 Low value

Optional: tenant_env lacks the single-flight coalescing that resolve_key has.

On a cold cache or TTL expiry, concurrent requests for the same tenant will each issue a /provider-env fetch (a small stampede against the control plane). It's fail-soft and short-lived, so this is minor, but mirroring the _resolve_inflight pattern here would bound the fan-out.
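
A minimal sketch of the coalescing suggested here, assuming an asyncio-based fetch. `SingleFlight` and `fake_fetch` are hypothetical names; the PR's `_resolve_inflight` implementation is not shown on this page and may differ.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class SingleFlight:
    """Coalesce concurrent calls for the same key into one in-flight fetch."""

    def __init__(self, fetch: Callable[[str], Awaitable[dict]]):
        self._fetch = fetch
        self._inflight: Dict[str, asyncio.Future] = {}

    async def get(self, key: str) -> dict:
        existing = self._inflight.get(key)
        if existing is not None:
            return await existing          # join the fetch already running
        task = asyncio.ensure_future(self._fetch(key))
        self._inflight[key] = task
        try:
            return await task
        finally:
            self._inflight.pop(key, None)  # always clear, even on failure


stats = {"calls": 0}


async def fake_fetch(tenant_id: str) -> dict:
    # Stand-in for the /provider-env request.
    stats["calls"] += 1
    await asyncio.sleep(0.01)
    return {"OPENAI_API_KEY": f"key-for-{tenant_id}"}


async def main():
    flight = SingleFlight(fake_fetch)
    # Five concurrent cache misses for the same tenant.
    return await asyncio.gather(*(flight.get("42") for _ in range(5)))


results = asyncio.run(main())
```

In this sketch, cancelling the first caller also cancels the shared fetch for everyone waiting on it. Wrapping the awaited task in asyncio.shield is one way to avoid that, at the cost of letting the fetch outlive its caller.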

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@control_plane_client.py` around lines 212 - 242, The tenant_env helper
currently allows multiple concurrent cache misses to trigger duplicate
/provider-env requests for the same tenant. Add single-flight coalescing similar
to the _resolve_inflight pattern used by resolve_key so only one fetch runs per
tenant_id while others await the same result; update tenant_env and any
supporting cache/inflight state accordingly.

105-113: 🚀 Performance & Scalability | 🔵 Trivial | 💤 Low value

Optional: full-cache clear can evict live positive entries.

When the cache is full of still-valid entries, _evict_if_full() clears everything, forcing a re-resolve stampede for active keys until the cache repopulates. Consider an LRU/oldest-first partial eviction to preserve hot positive entries.
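
One way the partial eviction could look, as a sketch only. It assumes the cache can be modelled as an insertion-ordered mapping of key to (expires_at, value); the real `_resolve_cache` entry shape and `_entry_ttl` logic are not visible here, and `evict_if_full` and `CACHE_MAX` are hypothetical stand-ins.

```python
from collections import OrderedDict

CACHE_MAX = 4  # hypothetical small cap for the demo


def evict_if_full(cache: OrderedDict, now: float, max_size: int = CACHE_MAX) -> None:
    """Make room for one new entry without wiping live entries.

    `cache` maps key -> (expires_at, value), oldest insertion first.
    """
    if len(cache) < max_size:
        return
    # 1) Drop everything that has already expired.
    for key in [k for k, (expires_at, _) in cache.items() if expires_at <= now]:
        del cache[key]
    # 2) Still full: drop the oldest entries only, one at a time.
    while len(cache) >= max_size:
        cache.popitem(last=False)


# Case 1: one expired entry is enough to make room.
with_expired = OrderedDict(
    [("a", (5.0, "A")), ("b", (100.0, "B")), ("c", (100.0, "C")), ("d", (100.0, "D"))]
)
evict_if_full(with_expired, now=10.0)

# Case 2: everything is still live, so only the oldest entry goes.
all_live = OrderedDict(
    [("w", (100.0, "W")), ("x", (100.0, "X")), ("y", (100.0, "Y")), ("z", (100.0, "Z"))]
)
evict_if_full(all_live, now=10.0)
```

Calling `cache.move_to_end(key)` on every cache hit would turn the oldest-first order into least-recently-used order.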

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@control_plane_client.py` around lines 105 - 113, The cache eviction logic in
_evict_if_full() is too aggressive because it clears the entire _resolve_cache
when the cache is still full after removing expired items, which can evict
still-valid hot entries. Update _evict_if_full() to evict only the oldest or
least-recently-used entries from _resolve_cache instead of calling clear(),
preserving live positive entries while making room for new ones. Use the
existing _resolve_cache, _RESOLVE_CACHE_MAX, and _entry_ttl(e) logic to identify
expired entries first, then perform partial eviction for the remaining
over-capacity entries.
tests/test_auth_proxy_control_plane.py (1)

232-242: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Consider mirroring the 404/403 gating tests for /internal/usage/recent.

Only /internal/usage has explicit tests for the unconfigured-secret (404) and wrong-secret (403) cases. Since both routes share _gate, this is low risk today, but a symmetric test would catch future regressions if the routes ever diverge.
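
The symmetric coverage can be expressed as one table of cases applied to both routes. The sketch below models the gate as a pure function rather than using the project's TestClient, so it runs on its own. `gate_status` and `handle` are hypothetical stand-ins for `_gate` and the route handlers, and the 403 for a missing header is an assumption.

```python
import hmac
from typing import Optional


def gate_status(configured: Optional[str], presented: Optional[str]) -> int:
    """Hypothetical stand-in for the shared gate on /internal/* routes."""
    if not configured:
        return 404  # feature off: the routes stay hidden
    if presented is None or not hmac.compare_digest(
        configured.encode(), presented.encode()
    ):
        return 403  # assumption: a missing header is treated like a wrong secret
    return 200


# Both routes sit behind the same gate, so one table drives both.
ROUTES = ("/internal/usage", "/internal/usage/recent")
CASES = (
    (None, "anything", 404),     # secret unset
    ("s3cret", "wrong", 403),    # wrong secret
    ("s3cret", "s3cret", 200),   # correct secret
)


def handle(route: str, configured: Optional[str], presented: Optional[str]) -> int:
    if route not in ROUTES:
        return 404
    return gate_status(configured, presented)


outcomes = {
    (route, expected): handle(route, configured, presented)
    for route in ROUTES
    for configured, presented, expected in CASES
}
```

In the real suite the same table would map naturally onto a parametrized test that issues requests against both paths.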

Also applies to: 266-279

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_auth_proxy_control_plane.py` around lines 232 - 242, Add symmetric
gate coverage for the recent-usage endpoint by mirroring the existing
`test_internal_usage_hidden_without_secret` and
`test_internal_usage_wrong_secret_403` cases for `/internal/usage/recent`. Reuse
the same `TestClient(auth_proxy.app)`, `CONTROL_PLANE_INTERNAL_SECRET`
monkeypatch, and `x-internal-secret` header patterns to assert 404 when the
secret is unset and 403 when the secret is incorrect, so `_gate` behavior is
verified consistently across both routes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 66fb9615-9a00-4c13-8540-91e4b2db8c4c

📥 Commits

Reviewing files that changed from the base of the PR and between 23ffc37 and 5bbc419.

📒 Files selected for processing (13)
  • .env.example
  • auth_proxy.py
  • compose.yml
  • control_plane_client.py
  • host_store.py
  • internal_api.py
  • providers.py
  • serve.py
  • shim.py
  • tests/test_auth_proxy_control_plane.py
  • tests/test_control_plane_client.py
  • tests/test_host_store.py
  • tests/test_tenant_env_shim.py

Comment thread auth_proxy.py
Comment on lines 3306 to +3316
     headers = {
         k: v for k, v in request.headers.items()
-        if k.lower() not in {"authorization", "host", "connection", "content-length"}
+        if k.lower() not in {"authorization", "host", "connection", "content-length",
+                             "x-llm-router-tenant", "x-internal-secret"}
     }
     headers["x-llm-router-caller"] = caller
+    # Tenant identity for per-tenant provider credentials (BYO keys): set ONLY
+    # from the authenticated resolve — the client-sent header is stripped above,
+    # so it can never be smuggled past auth.
+    if auth.get("tenant_id") is not None:
+        headers["x-llm-router-tenant"] = str(auth["tenant_id"])


🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm upstream trusts x-llm-router-caller and whether it's already stripped elsewhere.
rg -nP --type=py -C3 'x-llm-router-caller'

Repository: genlayerlabs/unhardcoded

Length of output: 162


🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '\n== file locations ==\n'
git ls-files | rg 'auth_proxy\.py$|README|pyproject|requirements|setup\.py|package\.json|\.py$' | head -n 50

printf '\n== header usages ==\n'
rg -n --hidden --glob '!**/.git/**' 'x-llm-router-(caller|tenant)|x-internal-secret' .

printf '\n== auth_proxy.py around target lines ==\n'
python3 - <<'PY'
from pathlib import Path
p = Path('auth_proxy.py')
lines = p.read_text().splitlines()
for start in (3296, 3306, 3316, 3326):
    end = min(start+20, len(lines))
    print(f'\n-- {start}-{end} --')
    for i in range(start-1, end):
        print(f'{i+1}: {lines[i]}')
PY

Repository: genlayerlabs/unhardcoded

Length of output: 11616


🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '\n== shim.py caller/tenant sections ==\n'
python3 - <<'PY'
from pathlib import Path
p = Path('shim.py')
lines = p.read_text().splitlines()
for start in (80, 270, 836, 882, 890):
    end = min(start+30, len(lines))
    print(f'\n-- {start}-{end} --')
    for i in range(start-1, end):
        print(f'{i+1}: {lines[i]}')
PY

printf '\n== auth_proxy control-plane tests around header forwarding ==\n'
python3 - <<'PY'
from pathlib import Path
p = Path('tests/test_auth_proxy_control_plane.py')
lines = p.read_text().splitlines()
for start in (110, 120, 126):
    end = min(start+25, len(lines))
    print(f'\n-- {start}-{end} --')
    for i in range(start-1, end):
        print(f'{i+1}: {lines[i]}')
PY

Repository: genlayerlabs/unhardcoded

Length of output: 13243


🏁 Script executed:

#!/bin/bash
set -euo pipefail

python3 - <<'PY'
def try_probe():
    probes = []
    try:
        import httpx
        h = httpx.Headers({"X-Llm-Router-Caller": "client", "x-llm-router-caller": "server"})
        probes.append(("httpx", {
            "type": type(h).__name__,
            "items": list(h.items()),
            "raw": getattr(h, "raw", None),
            "get": h.get("x-llm-router-caller"),
        }))
    except Exception as e:
        probes.append(("httpx_error", repr(e)))

    try:
        from starlette.datastructures import Headers
        h = Headers(raw=[
            (b"X-Llm-Router-Caller", b"client"),
            (b"x-llm-router-caller", b"server"),
        ])
        probes.append(("starlette", {
            "type": type(h).__name__,
            "items": list(h.items()),
            "raw": getattr(h, "raw", None),
            "get": h.get("x-llm-router-caller"),
        }))
    except Exception as e:
        probes.append(("starlette_error", repr(e)))

    for name, data in probes:
        print(name, data)

try_probe()
PY

Repository: genlayerlabs/unhardcoded

Length of output: 561


Strip x-llm-router-caller from inbound headers too

x-llm-router-caller is consumed downstream for session ownership and request attribution, so a client-supplied mixed-case variant can survive the filter and get forwarded alongside the injected value. Exclude it before re-adding the trusted caller header.
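
A self-contained sketch of the suggested hardening, written as a pure function over (name, value) pairs instead of the proxy's request object. `build_upstream_headers` is a hypothetical helper, and lowercasing the forwarded names is a choice made for this sketch, not something the PR is known to do.

```python
from typing import Iterable, Optional, Tuple

# Credential and hop-by-hop headers, plus every identity header the proxy stamps.
_STRIPPED = frozenset({
    "authorization", "host", "connection", "content-length",
    "x-llm-router-tenant", "x-internal-secret", "x-llm-router-caller",
})


def build_upstream_headers(
    inbound: Iterable[Tuple[str, str]],
    caller: str,
    tenant_id: Optional[object],
) -> dict:
    # Compare and store names in lowercase so no casing variant survives.
    headers = {k.lower(): v for k, v in inbound if k.lower() not in _STRIPPED}
    # Identity headers come only from the authenticated result.
    headers["x-llm-router-caller"] = caller
    if tenant_id is not None:
        headers["x-llm-router-tenant"] = str(tenant_id)
    return headers


inbound = [
    ("Authorization", "Bearer sk-demo"),
    ("X-Llm-Router-Caller", "admin"),      # client-supplied, must not survive
    ("X-LLM-Router-Tenant", "999"),        # client-supplied, must not survive
    ("Accept", "application/json"),
]
forwarded = build_upstream_headers(inbound, caller="tenant-slug", tenant_id=7)
```

Keeping every proxy-stamped header in the same strip set means the trust boundary is defined in one place.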

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@auth_proxy.py` around lines 3306 - 3316, The header sanitization in
auth_proxy.py is missing x-llm-router-caller, so a client can smuggle a
mixed-case version through and have it forwarded alongside the trusted value.
Update the inbound header filter in the request header-building block to strip
x-llm-router-caller before setting headers["x-llm-router-caller"] from the
authenticated caller value, keeping the trust boundary in the same place as the
existing x-llm-router-tenant handling.

Comment on lines +171 to +174
    async def streaming_call(request, emit):
        seen.append(cpc.env_get("OPENAI_API_KEY"))
        emit({"delta": "hi"})
        return _ok_result()


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Await the streaming emitter with the expected delta type.

emit is async; calling it without await leaves the coroutine unexecuted, so this test does not exercise the streaming emission path.
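
The underlying Python behaviour can be reproduced in isolation. This standalone demo is not taken from the test suite, and its `emit` is a stand-in for the real emitter.

```python
import asyncio


async def main():
    received = []

    async def emit(delta):
        received.append(delta)

    pending = emit("dropped")        # builds a coroutine object; the body has not run
    ran_without_await = list(received)
    pending.close()                  # discard it explicitly to avoid a "never awaited" warning

    await emit("hi")                 # the body runs only when awaited
    return ran_without_await, received


ran_without_await, received = asyncio.run(main())
```

Nothing reaches `received` until the call is awaited, which is why the un-awaited call in the test helper never exercises the streaming path.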

Suggested fix
     async def streaming_call(request, emit):
         seen.append(cpc.env_get("OPENAI_API_KEY"))
-        emit({"delta": "hi"})
+        await emit("hi")
         return _ok_result()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_tenant_env_shim.py` around lines 171 - 174, The streaming test
helper in streaming_call is invoking the async emit emitter without awaiting it,
so the streaming path is never exercised. Update the streaming_call helper to
await emit with the expected delta payload shape, and keep the assertion on
cpc.env_get("OPENAI_API_KEY") in place so the test still verifies the tenant env
shim behavior.

jmlago (Member, Author) commented Jul 4, 2026

Consumer side is up: genlayerlabs/unhardcoded-cloud#1 (pins this branch as the dataplane/ submodule). A live two-process E2E of the full loop (mint → resolve → proxy headers → ledger → /internal/usage → plan rate-limit 429 → revocation 401) is documented in that PR's comments.
