Implement CORE-01: Per-Tenant Routing Policies #9
Reviewer's Guide
Implements per-tenant routing preferences that flow from Supabase user records through auth middleware into the routing engine and policy/scoring logic, plus adds tenant settings APIs and a new routing_preference column in the users table.
Sequence diagram for per-tenant routing preference in chat completions
sequenceDiagram
actor Tenant
participant ClientApp
participant APIService
participant AuthMiddleware
participant Supabase
participant RoutingEngine
participant PolicyEngine
participant Scorer
Tenant->>ClientApp: Send chat request with API key
ClientApp->>APIService: POST /v1/chat/completions
APIService->>AuthMiddleware: Incoming request
AuthMiddleware->>Supabase: Select api_keys and users(tier, routing_preference)
Supabase-->>AuthMiddleware: user_id, tier, routing_preference
AuthMiddleware->>APIService: Set request.state.user_id, tier, routing_preference
APIService->>RoutingEngine: route(request, user_id, tier, routing_preference)
RoutingEngine->>RoutingEngine: _prepare_context(..., routing_preference)
RoutingEngine->>PolicyEngine: apply(context with tenant.routing_preference)
PolicyEngine->>PolicyEngine: Set directive.routing_preference = tenant_pref
PolicyEngine-->>RoutingEngine: ordered_providers, directive
RoutingEngine->>Scorer: compute_expected_utility(..., directive)
Scorer->>Scorer: Adjust weights based on directive.routing_preference
Scorer-->>RoutingEngine: scores per provider
RoutingEngine-->>APIService: ChatCompletionResponse
APIService-->>ClientApp: Response to chat request
ER diagram for users table with routing_preference
erDiagram
USERS {
uuid id PK
text email
text tier
text routing_preference
timestamptz created_at
}
Class diagram for routing context and tenant settings models
classDiagram
class TenantSettingsRequest {
+str routing_preference
}
class TenantSettingsResponse {
+bool success
+str routing_preference
}
class RequestContext {
+ChatCompletionRequest request
+WorkloadProfile workload_profile
+ContextBundle compression
+str user_id
+int total_tokens
+float schema_success_ratio
+str tenant_tier
+str routing_preference
+RoutingDirective policy_directive
+dict policy_context()
}
class RoutingDirective {
+list~str~ prefer
+list~str~ exclude
+str require_hedging
+bool human_gate
+float policy_weight
+str routing_preference
}
class RoutingPolicy {
+list~RoutingRule~ rules
+tuple~list~str~~ RoutingDirective~ apply(context, available_providers)
}
class RoutingEngine {
+RequestContext _prepare_context(request, user_id, tier, routing_preference)
+ChatCompletionResponse route(request, user_id, tier, routing_preference)
+Any route_stream(request, user_id, tier, routing_preference)
}
class AuthMiddleware {
+str api_key
+dispatch(request, call_next)
-_verify_token_supabase(token_hash) dict
}
RoutingEngine --> RequestContext : creates
RoutingEngine --> RoutingPolicy : uses
RoutingPolicy --> RoutingDirective : returns
RoutingDirective <-- RequestContext : policy_directive
AuthMiddleware --> TenantSettingsRequest : validates routing_preference via API
AuthMiddleware --> TenantSettingsResponse : returns responses
TenantSettingsRequest <.. TenantSettingsResponse : API models
File-Level Changes
Hey - I've found 2 issues, and left some high level feedback:
- The set of allowed routing_preference values is hardcoded in multiple places (Pydantic model comment, validation in the settings endpoint, default values in policy/routing/auth); consider centralizing these as a shared enum/constant to avoid drift and typos.
- The tenant settings endpoints are annotated with response_model=TenantSettingsResponse but sometimes return JSONResponse with error payloads; it may be clearer to use HTTPException or adjust response_model/return types so success and error responses are consistent with FastAPI’s typing.
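A minimal sketch of the first suggestion, assuming the three values listed in the schema comment ('cost-optimized', 'balanced', 'performance-first') are the complete set. The names `RoutingPreference`, `DEFAULT_ROUTING_PREFERENCE`, and `parse_routing_preference` are hypothetical and do not appear in the PR.

```python
from enum import Enum
from typing import Optional


class RoutingPreference(str, Enum):
    """Hypothetical single source of truth for allowed routing_preference values."""

    COST_OPTIMIZED = "cost-optimized"
    BALANCED = "balanced"
    PERFORMANCE_FIRST = "performance-first"


# Hypothetical shared default, so policy, routing, and auth code agree on one value.
DEFAULT_ROUTING_PREFERENCE = RoutingPreference.BALANCED


def parse_routing_preference(raw: Optional[str]) -> RoutingPreference:
    """Validate a raw string, falling back to the default when it is missing."""
    if raw is None:
        return DEFAULT_ROUTING_PREFERENCE
    # Enum lookup by value raises ValueError for anything outside the allowed set.
    return RoutingPreference(raw)
```

Because the enum subclasses `str`, members compare equal to their raw string values, so existing comparisons such as `pref == "cost-optimized"` would likely keep working during a gradual migration.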
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location path="freerelay/core/routing/policy.py" line_range="87-88" />
<code_context>
require_hedging=self.require_hedging,
human_gate=self.human_gate,
policy_weight=self.policy_weight,
+ routing_preference="balanced", # Default, can be overridden by rule if we want
)
</code_context>
<issue_to_address>
**issue (bug_risk):** Rule-level routing preference is effectively ignored/overridden by tenant preference in `apply`.
Because `apply` always overwrites `directive.routing_preference` with `tenant_pref`, any rule-specific routing preference is effectively ignored. If rule-level overrides are intended, only apply `tenant_pref` when the directive is still at its default (e.g., `"balanced"`). If tenant preference should always win, update the comment/default on `RoutingRule.directive` to match that behavior.
</issue_to_address>
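The two precedence options described in this comment could be sketched as follows. `Directive` is a simplified stand-in for `RoutingDirective`, and `resolve_preference` and its `tenant_wins` flag are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, replace

DEFAULT_PREFERENCE = "balanced"


@dataclass(frozen=True)
class Directive:
    """Simplified stand-in for RoutingDirective; only the relevant field is modeled."""

    routing_preference: str = DEFAULT_PREFERENCE


def resolve_preference(directive: Directive, tenant_pref: str,
                       tenant_wins: bool = False) -> Directive:
    """Apply the tenant preference according to the chosen precedence.

    tenant_wins=False: a rule-level value survives unless it is still the default.
    tenant_wins=True:  the tenant preference always overwrites the rule value.
    """
    if tenant_wins or directive.routing_preference == DEFAULT_PREFERENCE:
        return replace(directive, routing_preference=tenant_pref)
    return directive


rule_override = Directive(routing_preference="performance-first")
kept = resolve_preference(rule_override, "cost-optimized")
overwritten = resolve_preference(rule_override, "cost-optimized", tenant_wins=True)
```

One caveat with comparing against the default: a rule that explicitly asks for "balanced" cannot be told apart from a rule that set nothing. If that distinction matters, a `None` sentinel for "unset" may be a cleaner choice.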
### Comment 2
<location path="freerelay/core/routing/scorer.py" line_range="118-120" />
<code_context>
- return success_prob * quality * schema * latency * cost * safety * policy_weight
+ # Adjust weights based on routing preference
+ pref = directive.routing_preference if directive else "balanced"
+ if pref == "cost-optimized":
+ # Square cost to make it more dominant, root latency and quality
+ cost = cost**1.5
+ latency = latency**0.5
+ quality = quality**0.5
</code_context>
<issue_to_address>
**nitpick:** The cost-optimized comment doesn't match the actual exponent used.
The inline description and the actual exponent don’t match: the comment says “square cost” but the code uses `cost**1.5`. Please update either the comment or the exponent so the intended weighting is clear to future readers.
</issue_to_address>
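To make the effect of the exponents concrete, here is a small sketch of the cost-optimized branch. It assumes every factor is normalized to (0, 1] with higher meaning better (so a high `cost` score means a cheap provider), and it omits the other factors in the real product. The function name and the sample numbers are illustrative only.

```python
def expected_utility(quality: float, latency: float, cost: float,
                     pref: str = "balanced") -> float:
    """Multiplicative score over factors assumed to lie in (0, 1]."""
    if pref == "cost-optimized":
        # Exponent 1.5 (not a square) amplifies differences in the cost factor;
        # square roots pull latency and quality toward 1, muting their influence.
        cost = cost ** 1.5
        latency = latency ** 0.5
        quality = quality ** 0.5
    return quality * latency * cost


cheap_slow = {"quality": 0.6, "latency": 0.5, "cost": 0.9}
pricey_fast = {"quality": 0.9, "latency": 0.9, "cost": 0.4}

balanced_prefers_fast = (
    expected_utility(**pricey_fast) > expected_utility(**cheap_slow)
)
cost_optimized_prefers_cheap = (
    expected_utility(**cheap_slow, pref="cost-optimized")
    > expected_utility(**pricey_fast, pref="cost-optimized")
)
```

With these sample inputs the ranking flips between the two modes, which is the behavior the inline comment should describe accurately, whichever exponent is kept.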
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
email TEXT UNIQUE NOT NULL,
tier TEXT NOT NULL DEFAULT 'free', -- 'free', 'bronze', 'silver', 'gold'
+ routing_preference TEXT NOT NULL DEFAULT 'balanced', -- 'cost-optimized', 'balanced', 'performance-first'
[🟠 High] [🔵 Bug]
This adds routing_preference only inside the bootstrap CREATE TABLE IF NOT EXISTS users block, which means existing Supabase deployments will never receive the new column because PostgreSQL skips the whole statement once users already exists. Verified against @README.md:462, which instructs operators to run supabase_schema.sql, and against @freerelay/middleware/auth.py:36-50 and @freerelay/main.py:322-361, which now immediately read and write users.routing_preference; on an upgraded install those queries will fail due to the missing column. Add an explicit ALTER TABLE users ADD COLUMN IF NOT EXISTS routing_preference TEXT NOT NULL DEFAULT 'balanced' migration/backfill step before shipping the new readers.
```sql
-- supabase_schema.sql
CREATE TABLE IF NOT EXISTS users (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
email TEXT UNIQUE NOT NULL,
tier TEXT NOT NULL DEFAULT 'free',
routing_preference TEXT NOT NULL DEFAULT 'balanced',
```
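The skip behavior described above can be reproduced locally. The sketch below uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, which is an assumption made purely so the example runs without a database server; SQLite has no `ADD COLUMN IF NOT EXISTS`, so a plain `ADD COLUMN` is used here. The table is trimmed to the columns relevant to the point.

```python
import sqlite3


def column_names(conn: sqlite3.Connection, table: str) -> list:
    """Return the column names of a table via PRAGMA table_info."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


conn = sqlite3.connect(":memory:")

# An "existing deployment": users was created before the PR, without the column.
conn.execute(
    "CREATE TABLE users (id TEXT PRIMARY KEY, email TEXT UNIQUE NOT NULL, "
    "tier TEXT NOT NULL DEFAULT 'free')"
)
conn.execute("INSERT INTO users (id, email) VALUES ('u1', 'a@example.com')")

# Re-running the bootstrap statement with the new column is a no-op,
# because the table already exists.
conn.execute(
    "CREATE TABLE IF NOT EXISTS users (id TEXT PRIMARY KEY, "
    "email TEXT UNIQUE NOT NULL, tier TEXT NOT NULL DEFAULT 'free', "
    "routing_preference TEXT NOT NULL DEFAULT 'balanced')"
)
before = column_names(conn, "users")

# The explicit migration adds the column and backfills existing rows
# with the default value.
conn.execute(
    "ALTER TABLE users ADD COLUMN routing_preference TEXT NOT NULL DEFAULT 'balanced'"
)
after = column_names(conn, "users")
backfilled = conn.execute(
    "SELECT routing_preference FROM users WHERE id = 'u1'"
).fetchone()[0]
```

On PostgreSQL the `IF NOT EXISTS` form of `ADD COLUMN` suggested in the comment makes the migration safe to re-run.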
tier = str(users_data.get("tier", "free"))
- return {"user_id": user_id, "tier": tier}
+ routing_preference = str(users_data.get("routing_preference", "balanced"))
+ return {"user_id": user_id, "tier": tier, "routing_preference": routing_preference}
[🟠 High] [🔵 Bug]
_verify_token_supabase is wrapped in @lru_cache, and this PR adds routing_preference to the cached payload without adding any invalidation when /v1/tenant/settings updates users.routing_preference. After a tenant changes their preference, subsequent requests with the same API key will keep reusing the old cached value until process restart or cache eviction, so the new routing policy never takes effect even though the settings API returns success. Fix by not caching mutable tenant settings here, or by clearing/refreshing the cache when settings are updated.
# freerelay/middleware/auth.py
if isinstance(users_data, dict):
tier = str(users_data.get("tier", "free"))
routing_preference = str(users_data.get("routing_preference", "balanced"))
return {"user_id": user_id, "tier": tier, "routing_preference": routing_preference}
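The staleness described here is easy to reproduce with `functools.lru_cache` alone. In this sketch `_USERS` is an in-memory stand-in for the Supabase users table, and `verify_token` and `update_preference` are hypothetical simplifications of `_verify_token_supabase` and the settings endpoint.

```python
from functools import lru_cache

# In-memory stand-in for the users table, keyed by token hash.
_USERS = {
    "hash-1": {"user_id": "u1", "tier": "free", "routing_preference": "balanced"},
}


@lru_cache(maxsize=1024)
def verify_token(token_hash: str) -> tuple:
    """Cached lookup; returns (user_id, tier, routing_preference)."""
    row = _USERS[token_hash]
    return (row["user_id"], row["tier"], row["routing_preference"])


def update_preference(token_hash: str, pref: str) -> None:
    """Persist the new preference and drop cached auth payloads."""
    _USERS[token_hash]["routing_preference"] = pref
    # lru_cache has no per-key eviction, so the whole cache is cleared.
    verify_token.cache_clear()


verify_token("hash-1")  # warm the cache

# Writing without invalidation: the cached payload is now stale.
_USERS["hash-1"]["routing_preference"] = "cost-optimized"
stale = verify_token("hash-1")

# Writing through the helper clears the cache, so the next read is fresh.
update_preference("hash-1", "performance-first")
fresh = verify_token("hash-1")
```

Clearing the whole cache is coarse, and it only affects the current process. With several workers, caching just the immutable part of the payload (or using a short TTL) is probably the more robust option.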
try:
supabase = get_supabase_admin_client()
supabase.table("users").update(
[🟡 Medium] [🔵 Bug]
POST /v1/tenant/settings unconditionally returns success after the update call, and the paired GET path turns an empty result into 200 {success:false} instead of surfacing a missing tenant. This is reachable because authentication is cached in AuthMiddleware, so a token can remain authorized after its backing tenant row has been removed or the data becomes inconsistent; in that state the update matches zero rows, but the client still gets a success response even though nothing was persisted. Check the Supabase result and return 404/401 when no row matched before reporting success.
# freerelay/main.py
supabase.table("users").update(
{"routing_preference": settings_req.routing_preference}
).eq("id", user_id).execute()
return TenantSettingsResponse(
success=True,
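A minimal sketch of the suggested check, kept free of framework imports so it runs on its own. It assumes the object returned by `.execute()` exposes the matched rows as a list in `.data`. `FakeResult`, `TenantNotFound`, and `confirm_update` are hypothetical names; in the real endpoint the error would more likely be raised as a FastAPI `HTTPException` with status 404 or 401.

```python
from dataclasses import dataclass, field


class TenantNotFound(Exception):
    """Hypothetical error; an HTTP handler would map this to 404 (or 401)."""


@dataclass
class FakeResult:
    """Stand-in for the update result; assumes matched rows are listed in .data."""

    data: list = field(default_factory=list)


def confirm_update(result: FakeResult, user_id: str) -> dict:
    """Report success only when the update actually matched a row."""
    if not result.data:
        raise TenantNotFound(f"no users row matched id={user_id}")
    return {
        "success": True,
        "routing_preference": result.data[0]["routing_preference"],
    }


ok = confirm_update(FakeResult([{"routing_preference": "cost-optimized"}]), "u1")

try:
    confirm_update(FakeResult(), "deleted-user")
    missing_detected = False
except TenantNotFound:
    missing_detected = True
```

Returning the persisted value from the result, rather than echoing the request body, also makes the response reflect what was actually stored.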
This PR implements per-tenant routing policies by adding a routing_preference column to the database and updating the routing engine to respect these preferences during the scoring phase. It also adds API endpoints to get and update these settings.
Summary by Sourcery
Add support for per-tenant routing preferences and propagate them through authentication, routing policies, and scoring to influence provider selection.
New Features:
Enhancements: