feat(service): add external model API support for inference service#1183
Conversation
Code Review
This pull request introduces an "External Model Mode" to the inference service, enabling the routing of inference requests to external OpenAI-compatible providers. The implementation involves updates across the controller, gateway, router, and data proxy to support external model registration, request forwarding, and interaction caching. Feedback highlights an opportunity to improve security by removing a fallback to the admin API key in external mode and suggests a more robust design for interaction caches to avoid inheritance issues.
```python
class ExternalSessionData(SessionData):
```
The implementation of `ExternalSessionData` violates the Liskov Substitution Principle. It inherits from `SessionData` but replaces `_active_completions` (an `InteractionCache` in the base class) with an `ExternalInteractionCache`, which has a different interface. This is why `type: ignore` and `cast` are needed, which is a code smell and makes the code less maintainable and more fragile to changes in the base class.
A better design would be to define a common interface for caches using a `Protocol` and have both `InteractionCache` and `ExternalInteractionCache` implement it. This would make the class hierarchy more robust and type-safe.
For example (method signatures here are illustrative; the real ones live in `areal/experimental/openai/cache.py`):

```python
from typing import Any, Protocol


class TrajectoryCache(Protocol):
    def add(self, interaction_id: str, request: Any, response: Any) -> None: ...
    def set_reward(self, interaction_id: str, reward: float) -> None: ...
    def export(self) -> Any: ...
    # ... other common methods


class InteractionCache:
    ...  # token-based cache; structurally satisfies TrajectoryCache


class ExternalInteractionCache:
    ...  # string-based cache; structurally satisfies TrajectoryCache
```

Then `SessionData` and `ExternalSessionData` could hold a `TrajectoryCache` without breaking type contracts. This could be addressed in a follow-up, but it's worth noting for future maintainability.
This should be resolved later, when moving the inference service out of experimental, since it requires changes in `areal/experimental/openai/cache.py` to unify string- and token-based interactions and caches.
garrett4wade
left a comment
In general, we don't need an "external" variant of everything to support a real external model.
Validating your implementation should also be very simple: use the inference controller to set up an internal model as the service, then set up another inference service that connects to the former as an external model. The code changes to your example script should be minimal.
```python
# -- External model API ------------------------------------------------
external_api_url: str | None = None
external_api_key: str | None = None
external_api_model: str | None = None
external_model_name: str = "ext-model"
```
We'd better not differentiate internal and external configurations. The external model should have a different name than the internal models (nobody will name their local model gpt-5.2). We can reuse the same set of endpoints as the "internal" model; no new endpoints should be added. As long as an api_url is provided, we know it is an external model.
```python
if cfg.external_api_key is None:
    raise ValueError(
        "external_api_key must be set when using external model mode. "
        "Without it, the internal admin API key would be leaked to the "
        "external provider."
    )
```
Weird error message.
```python
resp = requests.post(
    f"{self._gateway_addr}/register_model",
    json={
        "name": cfg.external_model_name,
        "url": cfg.external_api_url,
        "model": cfg.external_api_model,
    },
    headers={"Authorization": f"Bearer {cfg.openai.admin_api_key}"},
    timeout=cfg.request_timeout,
)
```
You call `/register_model` rather than `/register_external_model`. Then why do you need the latter endpoint?
```python
@dataclass
class ExternalInteractionEntry:
    interaction_id: str
    request: str
    response: str
    reward: float | None = None


class ExternalInteractionCache:
```
You don't have to add an "external" variant of everything. The major difference is that we are storing strings rather than token IDs; in other words, we can't access token IDs. Therefore, the "external" interactions can just set `token_ids` to `None`. Adding an "external" session cache is too verbose.
```python
@app.post("/external/chat/completions")
async def external_chat_completions(request: Request):
    ...

# (later in the handler)
    except (json.JSONDecodeError, AttributeError):
        pass

    if model_name is not None:
```
We should assume that `model_name` is always non-None, and use `model_name` to route over both internal and external models.
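A minimal sketch of that name-based routing (the registry shape and function name are assumptions, not the PR's code):

```python
def pick_backend(model_name: str, external_registry: dict[str, str]) -> str:
    """Return the upstream for a model name.

    Externally registered models win; any other name is treated as a
    local model and falls through to the normal session-based routing.
    """
    if model_name in external_registry:
        return external_registry[model_name]  # e.g. the provider's base URL
    return "local"


registry = {"gpt-4o": "https://api.openai.com/v1"}
print(pick_backend("gpt-4o", registry))    # https://api.openai.com/v1
print(pick_backend("qwen3-8b", registry))  # local
```

With a single routing key, the `/external/chat/completions` endpoint becomes unnecessary.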
Description
Add support for routing chat completions to external OpenAI-compatible APIs through the unified gateway/router/data-proxy stack. This enables interaction caching, reward assignment, and trajectory export for models from external providers.
Details of Important API Changes
Registering an external model
Before sending chat requests to an external provider, you must register the model with the gateway. Registration tells the router which upstream URL to forward to and which provider-side model name to use:
- name: the routing key, matched against the "model" field of subsequent requests.
- url: base URL of the external provider (e.g. https://api.openai.com/v1).
- model: the provider-side model name. If omitted, the "model" field is stripped from forwarded requests, letting the provider use its default.

The gateway forwards the registration through the router (which picks a healthy data proxy worker) and then to the data proxy itself. If the data proxy registration fails, the router entry is automatically rolled back.
You can list registered models with GET /models and remove one with POST /remove_model (both require the admin key).
Using the model field in requests
Once a model is registered, the model field in request bodies acts as the routing key that connects chat completions, reward assignment (which follows the OpenRouter request standard), and trajectory export to the same external provider session.
POST /chat/completions (or /v1/chat/completions)
Include "model": "<name>" in the request body. The gateway checks whether <name> matches a registered external model. If it does, the request is proxied to the external API; otherwise it falls through to the normal session-based routing for local inference. Both streaming and non-streaming modes are supported. The Authorization header you provide is forwarded as-is to the external provider.
POST /rl/set_reward
Include "model": "<name>" to attach a reward to the most recent interaction of that external model. When model is present, the gateway routes by model name instead of the bearer token, so no session API key is needed.
POST /export_trajectories
Use "model": "<name>" as a shorthand for "session_id". The gateway translates model into the corresponding session ID and routes the export request to the correct data proxy.
"external_api": trueand the raw request/response pairs (no token log-probs, since the external provider does not expose them).Related Issue
Type of Change
Checklist
- pre-commit run --all-files
- ./docs/build_all.sh
- main
- /review-pr command
- /create-pr

Breaking Change Details (if applicable):
N/A
Additional Context
Key changes:
- ExternalSessionData for interaction caching and trajectory export
- external_mode: skip inference server launch, validate config
- external_api_key to prevent admin key leakage
- None inf_bridge in external mode
- workflow=None and group_size=1
- /v1/chat/completions and /v1/models OpenAI-compatible aliases
- --external-url flags

Files changed (16):
- areal/experimental/inference_service/controller/config.py: external model config fields
- areal/experimental/inference_service/controller/controller.py: external mode logic, model registration
- areal/experimental/inference_service/controller/workflow.py: external API trajectory passthrough
- areal/experimental/inference_service/data_proxy/app.py: external model endpoints, proxy forwarding
- areal/experimental/inference_service/data_proxy/session.py: ExternalSessionData, interaction cache
- areal/experimental/inference_service/gateway/app.py: model-based routing, /v1 aliases
- areal/experimental/inference_service/gateway/streaming.py: router helpers for external models
- areal/experimental/inference_service/router/app.py: model registry endpoints
- areal/experimental/inference_service/router/state.py: ExternalModelRegistry
- examples/experimental/inference_service/README.md: updated docs
- examples/experimental/inference_service/human_in_the_loop_demo.py: --external-url support
- examples/experimental/inference_service/online_rollout.py: --external-url support
- tests/experimental/inference_service/test_controller.py: updated controller tests
- tests/experimental/inference_service/test_data_proxy_rtensor.py: updated proxy tests
- tests/experimental/inference_service/test_external_model.py: new unit tests (755 lines)
- tests/experimental/inference_service/test_external_model_integration.py: new integration tests (402 lines)