feat: external control-plane protocol (key resolve, usage API, tenant BYO keys) by jmlago · Pull Request #72 · genlayerlabs/unhardcoded

jmlago · 2026-07-04T00:10:31Z

What

Optional "bring your own control plane" integration for the host — entirely off unless CONTROL_PLANE_URL + CONTROL_PLANE_INTERNAL_SECRET are set. Nothing SaaS-specific lands here: the host speaks a small generic HTTP contract any external control plane can implement (first consumer: unhardcoded-cloud).

Contract

Outbound: GET {CP}/internal/keys/resolve?sha256=… → {active, consumer, tenant_id, rate_per_min, burst}; GET {CP}/internal/tenants/<id>/provider-env → {env: {NAME: secret}}. Both carry x-internal-secret.
Inbound (ingress): GET /internal/usage?caller=…&since_ts=…[&bucket=day] and GET /internal/usage/recent?caller=…&limit=…, same shared secret, hidden (404) while unconfigured.
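
As a rough illustration of the outbound half of the contract, the sketch below builds the two requests without sending them. The helper names are hypothetical, and it assumes the `sha256` parameter is the hex SHA-256 digest of the raw API key; the PR text only shows `?sha256=…`.

```python
import hashlib
from urllib.parse import quote, urlencode


def build_resolve_request(cp_url: str, internal_secret: str, api_key: str):
    """Return (url, headers) for GET {CP}/internal/keys/resolve.

    Assumption: the control plane indexes keys by hex SHA-256 digest, so the
    raw key never has to leave the host.
    """
    digest = hashlib.sha256(api_key.encode("utf-8")).hexdigest()
    query = urlencode({"sha256": digest})
    url = f"{cp_url.rstrip('/')}/internal/keys/resolve?{query}"
    return url, {"x-internal-secret": internal_secret}


def build_provider_env_request(cp_url: str, internal_secret: str, tenant_id):
    """Return (url, headers) for GET {CP}/internal/tenants/<id>/provider-env."""
    tenant = quote(str(tenant_id), safe="")  # keep the id inside one path segment
    url = f"{cp_url.rstrip('/')}/internal/tenants/{tenant}/provider-env"
    return url, {"x-internal-secret": internal_secret}
```

Both requests carry the same shared secret header, which matches the "Both carry x-internal-secret" note above.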

Pieces

control_plane_client.py (new leaf, shared by ingress + router): resolve cache (60s positive / 15s negative TTL, single-flight, stale-grace 300s served only while the CP is unreachable — a definitive answer always replaces the cache), tenant_env (allowlist-filtered, cached, fail-soft to platform keys), env_get (ContextVar tenant map → process env).
Ingress: _caller_auth_async — local key stores stay authoritative; only a pure local miss consults the CP. Resolved caller = tenant slug (usage attribution lands in calls.caller for free). Plan rate limits ride the resolve into the existing _rate_ok. Operator kill-switch: a local status=inactive record for the slug blocks it. x-llm-router-tenant/x-internal-secret stripped from client input; tenant header stamped only post-auth.
Router BYO keys: shim activates the tenant env from the trusted header; serve.py threads env_get into openai-compatible/anthropic/google adapters (bedrock/codex/antseed stay platform-only; source pollers never see tenant keys). Context propagates through streaming/flow create_task boundaries — covered by tests.
Perf side-effect: _rate_ok/_route_allowed now reuse the already-resolved meta → one fewer Postgres read per proxied request.
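
The resolve-cache rules listed under Pieces (60s positive TTL, 15s negative TTL, 300s stale-grace used only when the control plane is unreachable) can be sketched as follows. This is a simplified synchronous model, not the code in control_plane_client.py: the class, the exception, and the fail-closed behaviour with nothing cached are assumptions, and single-flight is left out.

```python
import time
from typing import Callable, Dict, Optional, Tuple

POSITIVE_TTL = 60.0   # seconds a resolved (active) key stays fresh
NEGATIVE_TTL = 15.0   # seconds a definitive "unknown key" answer stays fresh
STALE_GRACE = 300.0   # extra seconds an expired entry may be served in an outage


class ControlPlaneUnreachable(Exception):
    """Hypothetical error: the fetch produced no definitive answer."""


class ResolveCache:
    def __init__(self, fetch: Callable[[str], Optional[dict]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._clock = clock
        # digest -> (stored_at, value); value None means "definitively unknown"
        self._entries: Dict[str, Tuple[float, Optional[dict]]] = {}

    def resolve(self, digest: str) -> Optional[dict]:
        now = self._clock()
        entry = self._entries.get(digest)
        ttl = 0.0
        if entry is not None:
            stored_at, value = entry
            ttl = POSITIVE_TTL if value is not None else NEGATIVE_TTL
            if now - stored_at < ttl:
                return value  # fresh hit, no network call
        try:
            value = self._fetch(digest)
        except ControlPlaneUnreachable:
            # Outage: serve the expired entry only inside the grace window.
            if entry is not None and now - entry[0] < ttl + STALE_GRACE:
                return entry[1]
            return None  # assumption: fail closed when nothing usable is cached
        # A definitive answer (positive or negative) always replaces the cache.
        self._entries[digest] = (now, value)
        return value
```

Under these constants a revoked key is honoured for at most 60s while the control plane is reachable, and for at most 60s + 300s during an outage, which is the bound quoted under Known limits.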

Known limits (documented)

Revocation latency ≤60s (≤TTL+grace during a CP outage).
BYO only overrides providers the platform has enabled (_sync_provider_auth_state reads the platform env).
cost_usd is still stamped on BYO calls (informational for the CP).

Tests

nix-shell --run 'python -m pytest tests -q' → 518 passed, 2 skipped (38 new: client cache/grace/single-flight, auth fallback + header hygiene + kill-switch + rate limits, /internal/* gating + shapes + catch-all regression, BYO isolation incl. concurrent tenants and streaming, store aggregates).

Part 1 of the SaaS integration; the Django side (unhardcoded-cloud) consumes this branch. Do not merge yet — jm reviews tomorrow morning.

Summary by CodeRabbit

New Features
- Added optional control-plane support for tenant-aware key resolution and provider credentials.
- Introduced internal usage endpoints for viewing caller-specific usage and recent calls.
- Enabled per-tenant routing and request-scoped credentials for supported providers.
Bug Fixes
- Improved access control and header filtering for proxy traffic.
- Added safer fallback behavior when control-plane lookups fail or are unavailable.
- Preserved local operator overrides and kill-switch behavior when resolving callers.

… BYO keys)

Adds an optional "bring your own control plane" integration, entirely off unless CONTROL_PLANE_URL + CONTROL_PLANE_INTERNAL_SECRET are set:

- control_plane_client.py (new leaf module): resolve_key with positive/negative caching, single-flight, and stale-grace served only while the control plane is unreachable; tenant_env (allowlisted, cached, fail-soft to platform keys); env_get, a request-scoped credential chain (ContextVar tenant map, then process env).
- Ingress: _caller_auth_async falls through to the control plane only on a pure local miss (local stores stay authoritative; explicit local status=inactive is an operator kill-switch for a tenant slug). Plan rate limits ride the resolve response into the existing _rate_ok meta. x-llm-router-tenant / x-internal-secret are stripped from client input; the tenant header is stamped only from the authenticated resolve.
- internal_api.py (new): GET /internal/usage[?bucket=day] and /internal/usage/recent, gated by x-internal-secret (hidden 404 while unconfigured), backed by new host_store.usage_totals (includes cached tokens) and recent_calls(caller=...).
- Router: per-tenant BYO provider credentials. The shim activates the tenant env from the trusted header; serve.py threads control_plane_client.env_get into the openai-compatible + anthropic + google adapters (bedrock/codex/antseed stay platform-only; background source pollers never see tenant keys). _rate_ok/_route_allowed now accept the already-resolved meta, dropping a duplicate consumer_keys read per proxied request.

coderabbitai · 2026-07-04T00:10:49Z

📝 Walkthrough


This PR introduces an optional external "bring your own control plane" integration. A new control_plane_client module resolves API keys and tenant provider credentials against an external control plane with caching/grace-period fallback. auth_proxy.py adds async caller authentication with control-plane fallback, tightened header forwarding, and metadata-driven route/rate authorization. A new internal_api.py exposes secret-gated /internal/usage and /internal/usage/recent endpoints backed by new host_store.usage_totals()/updated recent_calls(). providers.py, serve.py, and shim.py thread per-tenant credential resolution (env_get) into provider adapters. Configuration and compose files add the new environment variables, and extensive tests cover all new behaviors.

Changes

Control plane integration

Layer / File(s)	Summary
Control plane client module `control_plane_client.py`, `tests/test_control_plane_client.py`	New module implementing `resolve_key`, `tenant_env`, caching with TTL/grace, secret validation, ContextVar-based tenant env override, and lifecycle helpers, with corresponding unit tests.
Async caller auth and authorization `auth_proxy.py`	Adds `_caller_auth_async()` with control-plane fallback, updates `_route_allowed()`/`_rate_ok()` to use auth metadata, tightens forwarded headers, adds `x-llm-router-tenant`, registers `internal_api.router`, and closes the client on shutdown.
Internal metering API `internal_api.py`, `host_store.py`, `tests/test_auth_proxy_control_plane.py`, `tests/test_host_store.py`	New secret-gated `/internal/usage` and `/internal/usage/recent` endpoints backed by new `host_store.usage_totals()` and updated `recent_calls(caller=...)`.
Tenant credential threading `providers.py`, `serve.py`, `shim.py`, `tests/test_tenant_env_shim.py`	Adds optional `env_get` to provider adapter factories/dispatchers and a shim `_activate_tenant` helper activating per-tenant credentials before request handling.
Configuration `.env.example`, `compose.yml`	Adds `CONTROL_PLANE_URL` and `CONTROL_PLANE_INTERNAL_SECRET` environment variables for router and ingress services.

Estimated code review effort: 4 (Complex) | ~75 minutes

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant AuthProxy
  participant ControlPlaneClient
  participant Upstream
  Client->>AuthProxy: Request with API key
  AuthProxy->>AuthProxy: Check local caller keys
  alt key unknown locally and CP enabled
    AuthProxy->>ControlPlaneClient: resolve_key(digest)
    ControlPlaneClient-->>AuthProxy: ResolvedKey (tenant, limits) or None
  end
  AuthProxy->>AuthProxy: _route_allowed(meta), _rate_ok(meta)
  AuthProxy->>Upstream: Forward request with x-llm-router-tenant header
  Upstream-->>AuthProxy: Response
  AuthProxy-->>Client: Response

sequenceDiagram
  participant Shim
  participant ControlPlaneClient
  participant ProviderAdapter
  Shim->>Shim: Read x-llm-router-tenant header
  Shim->>ControlPlaneClient: tenant_env(tenant_id)
  ControlPlaneClient-->>Shim: env mapping
  Shim->>ControlPlaneClient: activate_tenant_env(env)
  Shim->>ProviderAdapter: Handle request
  ProviderAdapter->>ControlPlaneClient: env_get("API_KEY")
  ControlPlaneClient-->>ProviderAdapter: Tenant or platform key
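
The second diagram relies on a request-scoped lookup chain: the tenant map first, then the process environment. A minimal sketch of that idea with `contextvars` follows. The function names mirror `activate_tenant_env` and `env_get` from the diagram, but the bodies, the `handle_request` helper, and the `DEMO_PROVIDER_KEY` variable are illustrative assumptions rather than the module's real code.

```python
import asyncio
import contextvars
import os

# Per-request tenant credentials; None means "no tenant active".
_tenant_env = contextvars.ContextVar("tenant_env", default=None)


def activate_tenant_env(env):
    """Bind a tenant's provider env to the current context (request/task)."""
    return _tenant_env.set(dict(env))


def env_get(name, default=None):
    """Tenant map first, then the process environment."""
    tenant = _tenant_env.get()
    if tenant and name in tenant:
        return tenant[name]
    return os.environ.get(name, default)


async def _adapter_lookup(name):
    return env_get(name)


async def handle_request(tenant_env, name):
    activate_tenant_env(tenant_env)
    await asyncio.sleep(0)  # yield so concurrent requests interleave
    # create_task copies the current context, so the child sees the tenant env.
    return await asyncio.create_task(_adapter_lookup(name))


async def demo():
    os.environ["DEMO_PROVIDER_KEY"] = "platform-key"  # demo-only side effect
    a, b, c = await asyncio.gather(
        handle_request({"DEMO_PROVIDER_KEY": "tenant-a-key"}, "DEMO_PROVIDER_KEY"),
        handle_request({"DEMO_PROVIDER_KEY": "tenant-b-key"}, "DEMO_PROVIDER_KEY"),
        handle_request({}, "DEMO_PROVIDER_KEY"),
    )
    # The outer context never activated a tenant, so it still sees the platform key.
    return a, b, c, env_get("DEMO_PROVIDER_KEY")
```

Because `asyncio.gather` wraps each coroutine in its own task with a copied context, the three requests cannot see each other's credentials, and a tenant with no override falls back to the platform key.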

Possibly related PRs

genlayerlabs/unhardcoded#25: Both PRs touch the native provider adapter factories, threading an env_get parameter for request-scoped credential lookup.
genlayerlabs/unhardcoded#36: Both PRs modify host_store.py's recent_calls() signature/behavior for caller scoping.
genlayerlabs/unhardcoded#38: The main PR's usage_totals()/recent_calls() changes including tokens_cached build on this PR's calls table schema evolution.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: an external control-plane integration for key resolution, usage APIs, and tenant-scoped BYO credentials.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (3)

control_plane_client.py (2)
212-242: 🚀 Performance & Scalability | 🔵 Trivial | 💤 Low value

Optional: tenant_env lacks the single-flight coalescing that resolve_key has.

On a cold cache or TTL expiry, concurrent requests for the same tenant will each issue a /provider-env fetch (a small stampede against the control plane). It's fail-soft and short-lived, so this is minor, but mirroring the _resolve_inflight pattern here would bound the fan-out.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@control_plane_client.py` around lines 212 - 242, The tenant_env helper
currently allows multiple concurrent cache misses to trigger duplicate
/provider-env requests for the same tenant. Add single-flight coalescing similar
to the _resolve_inflight pattern used by resolve_key so only one fetch runs per
tenant_id while others await the same result; update tenant_env and any
supporting cache/inflight state accordingly.
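
For reference, single-flight coalescing of the kind suggested here might look like the sketch below. The `SingleFlight` class and its `run` method are hypothetical; the actual `_resolve_inflight` code is not shown in this thread and may differ.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class SingleFlight:
    """Coalesce concurrent calls for the same key into one in-flight fetch."""

    def __init__(self) -> None:
        self._inflight: Dict[str, asyncio.Task] = {}

    async def run(self, key: str, fetch: Callable[[], Awaitable[dict]]) -> dict:
        task = self._inflight.get(key)
        if task is None:
            # First caller for this key starts the fetch.
            task = asyncio.create_task(fetch())
            self._inflight[key] = task
            # Drop the entry once the fetch settles, whether it succeeded or failed.
            task.add_done_callback(lambda _t, k=key: self._inflight.pop(k, None))
        # shield: cancelling one waiter must not cancel the shared fetch.
        return await asyncio.shield(task)


async def demo(callers: int = 5):
    calls = 0

    async def fetch_provider_env() -> dict:
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)  # stand-in for the /provider-env round trip
        return {"DEMO_PROVIDER_KEY": "tenant-key"}

    flights = SingleFlight()
    results = await asyncio.gather(
        *(flights.run("tenant-1", fetch_provider_env) for _ in range(callers))
    )
    return calls, results
```

With five concurrent callers for the same tenant, only one fetch runs and every caller receives the same mapping.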
105-113: 🚀 Performance & Scalability | 🔵 Trivial | 💤 Low value

Optional: full-cache clear can evict live positive entries.

When the cache is full of still-valid entries, _evict_if_full() clears everything, forcing a re-resolve stampede for active keys until the cache repopulates. Consider an LRU/oldest-first partial eviction to preserve hot positive entries.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@control_plane_client.py` around lines 105 - 113, The cache eviction logic in
_evict_if_full() is too aggressive because it clears the entire _resolve_cache
when the cache is still full after removing expired items, which can evict
still-valid hot entries. Update _evict_if_full() to evict only the oldest or
least-recently-used entries from _resolve_cache instead of calling clear(),
preserving live positive entries while making room for new ones. Use the
existing _resolve_cache, _RESOLVE_CACHE_MAX, and _entry_ttl(e) logic to identify
expired entries first, then perform partial eviction for the remaining
over-capacity entries.
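
One way to read this suggestion is "expired entries first, then oldest-first". The sketch below uses a hypothetical entry shape of `(stored_at, value)` and a caller-supplied `ttl_of` function in place of the real `_resolve_cache` and `_entry_ttl(e)`, whose definitions are not shown here.

```python
import heapq
from typing import Callable, Dict, Optional, Tuple

Entry = Tuple[float, Optional[dict]]  # (stored_at, value), hypothetical shape


def evict_if_full(cache: Dict[str, Entry], max_entries: int, now: float,
                  ttl_of: Callable[[Entry], float]) -> None:
    """Make room for one new entry without clearing live hot entries."""
    if len(cache) < max_entries:
        return
    # Pass 1: drop everything that has already expired.
    for key in [k for k, e in cache.items() if now - e[0] >= ttl_of(e)]:
        del cache[key]
    # Pass 2: if still at capacity, drop only the oldest entries needed.
    overflow = len(cache) - max_entries + 1
    if overflow > 0:
        for key in heapq.nsmallest(overflow, cache, key=lambda k: cache[k][0]):
            del cache[key]
```

Oldest-first is used here because it needs no extra bookkeeping; a true LRU would also have to record reads.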
tests/test_auth_proxy_control_plane.py (1)
232-242: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Consider mirroring the 404/403 gating tests for /internal/usage/recent.

Only /internal/usage has explicit tests for the unconfigured-secret (404) and wrong-secret (403) cases. Since both routes share _gate, this is low risk today, but a symmetric test would catch future regressions if the routes ever diverge.

Also applies to: 266-279
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_auth_proxy_control_plane.py` around lines 232 - 242, Add symmetric
gate coverage for the recent-usage endpoint by mirroring the existing
`test_internal_usage_hidden_without_secret` and
`test_internal_usage_wrong_secret_403` cases for `/internal/usage/recent`. Reuse
the same `TestClient(auth_proxy.app)`, `CONTROL_PLANE_INTERNAL_SECRET`
monkeypatch, and `x-internal-secret` header patterns to assert 404 when the
secret is unset and 403 when the secret is incorrect, so `_gate` behavior is
verified consistently across both routes.
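
To make the symmetry concrete without depending on the real app, the sketch below models a shared gate as a plain function and evaluates it for both routes. `gate_status` and `gate_matrix` are hypothetical names; the 404 and 403 outcomes come from the PR description, while the constant-time comparison and the treatment of a missing header as 403 are assumptions.

```python
import hmac
from typing import Dict, Optional

ROUTES = ("/internal/usage", "/internal/usage/recent")


def gate_status(configured_secret: Optional[str], presented: Optional[str]) -> int:
    """Status a shared gate would return before the handler runs."""
    if not configured_secret:
        return 404  # hidden while the integration is unconfigured
    if presented is None or not hmac.compare_digest(
        presented.encode("utf-8"), configured_secret.encode("utf-8")
    ):
        return 403  # wrong (or, by assumption, missing) shared secret
    return 200


def gate_matrix(configured_secret: Optional[str],
                presented: Optional[str]) -> Dict[str, int]:
    """Evaluate the same gate for every internal route."""
    return {route: gate_status(configured_secret, presented) for route in ROUTES}
```

A parametrized test over `ROUTES` in the real suite would give the same coverage for `/internal/usage/recent` that `/internal/usage` already has.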

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 66fb9615-9a00-4c13-8540-91e4b2db8c4c

📥 Commits

Reviewing files that changed from the base of the PR and between 23ffc37 and 5bbc419.

📒 Files selected for processing (13)

.env.example
auth_proxy.py
compose.yml
control_plane_client.py
host_store.py
internal_api.py
providers.py
serve.py
shim.py
tests/test_auth_proxy_control_plane.py
tests/test_control_plane_client.py
tests/test_host_store.py
tests/test_tenant_env_shim.py

coderabbitai · 2026-07-04T00:17:32Z

    headers = {
        k: v for k, v in request.headers.items()
-        if k.lower() not in {"authorization", "host", "connection", "content-length"}
+        if k.lower() not in {"authorization", "host", "connection", "content-length",
+                             "x-llm-router-tenant", "x-internal-secret"}
    }
    headers["x-llm-router-caller"] = caller
+    # Tenant identity for per-tenant provider credentials (BYO keys): set ONLY
+    # from the authenticated resolve — the client-sent header is stripped above,
+    # so it can never be smuggled past auth.
+    if auth.get("tenant_id") is not None:
+        headers["x-llm-router-tenant"] = str(auth["tenant_id"])


🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

    #!/bin/bash
    # Confirm upstream trusts x-llm-router-caller and whether it's already stripped elsewhere.
    rg -nP --type=py -C3 'x-llm-router-caller'

Repository: genlayerlabs/unhardcoded

Length of output: 162

🏁 Script executed:

    #!/bin/bash
    set -euo pipefail
    printf '\n== file locations ==\n'
    git ls-files | rg 'auth_proxy\.py$|README|pyproject|requirements|setup\.py|package\.json|\.py$' | head -n 50
    printf '\n== header usages ==\n'
    rg -n --hidden --glob '!**/.git/**' 'x-llm-router-(caller|tenant)|x-internal-secret' .
    printf '\n== auth_proxy.py around target lines ==\n'
    python3 - <<'PY'
    from pathlib import Path
    p = Path('auth_proxy.py')
    lines = p.read_text().splitlines()
    for start in (3296, 3306, 3316, 3326):
        end = min(start+20, len(lines))
        print(f'\n-- {start}-{end} --')
        for i in range(start-1, end):
            print(f'{i+1}: {lines[i]}')
    PY

Repository: genlayerlabs/unhardcoded

Length of output: 11616

🏁 Script executed:

    #!/bin/bash
    set -euo pipefail
    printf '\n== shim.py caller/tenant sections ==\n'
    python3 - <<'PY'
    from pathlib import Path
    p = Path('shim.py')
    lines = p.read_text().splitlines()
    for start in (80, 270, 836, 882, 890):
        end = min(start+30, len(lines))
        print(f'\n-- {start}-{end} --')
        for i in range(start-1, end):
            print(f'{i+1}: {lines[i]}')
    PY
    printf '\n== auth_proxy control-plane tests around header forwarding ==\n'
    python3 - <<'PY'
    from pathlib import Path
    p = Path('tests/test_auth_proxy_control_plane.py')
    lines = p.read_text().splitlines()
    for start in (110, 120, 126):
        end = min(start+25, len(lines))
        print(f'\n-- {start}-{end} --')
        for i in range(start-1, end):
            print(f'{i+1}: {lines[i]}')
    PY

Repository: genlayerlabs/unhardcoded

Length of output: 13243

🏁 Script executed:

    #!/bin/bash
    set -euo pipefail
    python3 - <<'PY'
    def try_probe():
        probes = []
        try:
            import httpx
            h = httpx.Headers({"X-Llm-Router-Caller": "client", "x-llm-router-caller": "server"})
            probes.append(("httpx", {
                "type": type(h).__name__,
                "items": list(h.items()),
                "raw": getattr(h, "raw", None),
                "get": h.get("x-llm-router-caller"),
            }))
        except Exception as e:
            probes.append(("httpx_error", repr(e)))
        try:
            from starlette.datastructures import Headers
            h = Headers(raw=[
                (b"X-Llm-Router-Caller", b"client"),
                (b"x-llm-router-caller", b"server"),
            ])
            probes.append(("starlette", {
                "type": type(h).__name__,
                "items": list(h.items()),
                "raw": getattr(h, "raw", None),
                "get": h.get("x-llm-router-caller"),
            }))
        except Exception as e:
            probes.append(("starlette_error", repr(e)))
        for name, data in probes:
            print(name, data)
    try_probe()
    PY

Repository: genlayerlabs/unhardcoded

Length of output: 561

Strip x-llm-router-caller from inbound headers too

x-llm-router-caller is consumed downstream for session ownership and request attribution, so a client-supplied mixed-case variant can survive the filter and get forwarded alongside the injected value. Exclude it before re-adding the trusted caller header.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@auth_proxy.py` around lines 3306 - 3316, The header sanitization in auth_proxy.py is missing x-llm-router-caller, so a client can smuggle a mixed-case version through and have it forwarded alongside the trusted value. Update the inbound header filter in the request header-building block to strip x-llm-router-caller before setting headers["x-llm-router-caller"] from the authenticated caller value, keeping the trust boundary in the same place as the existing x-llm-router-tenant handling.
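
A standalone sketch of the header hygiene being asked for is shown below. `build_forward_headers` is a hypothetical helper that takes inbound headers as (name, value) pairs; lowercasing the forwarded names is an extra assumption that makes the "one trusted value only" property easy to check.

```python
from typing import Dict, Iterable, Optional, Tuple

# Headers a client must never be able to inject into the upstream request.
STRIPPED = frozenset({
    "authorization", "host", "connection", "content-length",
    "x-llm-router-tenant", "x-internal-secret", "x-llm-router-caller",
})


def build_forward_headers(inbound: Iterable[Tuple[str, str]], caller: str,
                          tenant_id: Optional[object] = None) -> Dict[str, str]:
    """Filter client headers, then stamp the trusted identity headers."""
    headers = {
        name.lower(): value
        for name, value in inbound
        if name.lower() not in STRIPPED
    }
    # Set only from the authenticated result, never from client input.
    headers["x-llm-router-caller"] = caller
    if tenant_id is not None:
        headers["x-llm-router-tenant"] = str(tenant_id)
    return headers
```

Whatever casing the client uses for `x-llm-router-caller` or `x-llm-router-tenant`, the forwarded request carries exactly one value for each, taken from the auth result.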

coderabbitai · 2026-07-04T00:17:32Z

+    async def streaming_call(request, emit):
+        seen.append(cpc.env_get("OPENAI_API_KEY"))
+        emit({"delta": "hi"})
+        return _ok_result()


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Await the streaming emitter with the expected delta type.

emit is async; calling it without await leaves the coroutine unexecuted, so this test does not exercise the streaming emission path.

Suggested fix

     async def streaming_call(request, emit):
         seen.append(cpc.env_get("OPENAI_API_KEY"))
    -    emit({"delta": "hi"})
    +    await emit("hi")
         return _ok_result()

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_tenant_env_shim.py` around lines 171 - 174, The streaming test helper in streaming_call is invoking the async emit emitter without awaiting it, so the streaming path is never exercised. Update the streaming_call helper to await emit with the expected delta payload shape, and keep the assertion on cpc.env_get("OPENAI_API_KEY") in place so the test still verifies the tenant env shim behavior.
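
The underlying Python behaviour can be reproduced in isolation. In the sketch below, `emit` is a stand-in async function, not the project's real emitter: calling it without `await` only creates a coroutine object, and its body never runs.

```python
import asyncio


async def demo():
    emitted = []

    async def emit(delta):
        emitted.append(delta)

    # Bare call: builds a coroutine object but does not execute the body.
    pending = emit("dropped")
    after_bare_call = list(emitted)
    pending.close()  # discard it explicitly to avoid a "never awaited" warning

    # Awaited call: the body runs and the delta is recorded.
    await emit("hi")
    return after_bare_call, emitted
```

Nothing is recorded after the bare call, which is why the un-awaited version of the test helper cannot exercise the streaming path.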

jmlago · 2026-07-04T00:24:22Z

Consumer side is up: genlayerlabs/unhardcoded-cloud#1 (pins this branch as the dataplane/ submodule). A live two-process E2E of the full loop (mint → resolve → proxy headers → ledger → /internal/usage → plan rate-limit 429 → revocation 401) is documented in that PR's comments.

jmlago wants to merge 1 commit into main from feat/control-plane-integration
