
feat(service): add external model API support for inference service #1183

Open
nuzant wants to merge 1 commit into main from mzy/inf-ext-api

Conversation

Collaborator

@nuzant nuzant commented Apr 14, 2026

Description

Add support for routing chat completions to external OpenAI-compatible APIs through the unified gateway/router/data-proxy stack. This enables interaction caching, reward assignment, and trajectory export for models from external providers.

Details of Important API Changes

Registering an external model

Before sending chat requests to an external provider you must register it with the gateway. Registration tells the router which upstream URL to forward to and which provider-side model name to use:

curl -X POST http://127.0.0.1:<PORT>/register_model \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer <ADMIN_API_KEY>" \
    -d '{
      "name": "gpt-4o",
      "url": "https://api.openai.com/v1",
      "model": "gpt-4o-2024-08-06"
    }'
| Field | Required | Description |
|-------|----------|-------------|
| `name` | Yes | A local alias used to route requests. This is the value you put in the `model` field of subsequent requests. |
| `url` | Yes | Base URL of the external OpenAI-compatible API (e.g. `https://api.openai.com/v1`). |
| `model` | No | The model name sent to the external provider. If omitted, the `model` field is stripped from forwarded requests, letting the provider use its default. |

The gateway forwards the registration through the router (which picks a healthy data proxy worker) and then to the data proxy itself. If the data proxy registration fails, the router entry is automatically rolled back.
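The register-then-rollback flow described above can be illustrated with a minimal in-memory sketch. `RouterRegistry` and `register_with_proxy` are hypothetical stand-ins, not the actual classes:

```python
class RouterRegistry:
    """In-memory stand-in for the router's model registry."""

    def __init__(self):
        self.models: dict[str, dict] = {}

    def register(self, name: str, entry: dict) -> None:
        self.models[name] = entry

    def remove(self, name: str) -> None:
        self.models.pop(name, None)


def register_model(registry: RouterRegistry, name: str, entry: dict,
                   register_with_proxy) -> bool:
    """Add the router entry first; roll it back if the data proxy fails."""
    registry.register(name, entry)
    try:
        register_with_proxy(name, entry)
    except Exception:
        registry.remove(name)  # rollback keeps router and proxy consistent
        return False
    return True
```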

You can list registered models with GET /models and remove one with POST /remove_model (both require the admin key).

Using the model field in requests

Once a model is registered, the model field in request bodies acts as the routing key that connects chat completions, reward assignment (which follows the OpenRouter request standard), and trajectory export to the same external provider session.

POST /chat/completions (or /v1/chat/completions)

Include "model": "<name>" in the request body. The gateway checks whether <name> matches a registered external model. If it does, the request is proxied to the external API; otherwise it falls through to the normal session-based routing for local inference. Both streaming and non-streaming modes are supported. The Authorization header you provide is forwarded as-is to the external provider.

curl -X POST http://127.0.0.1:<PORT>/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer <EXTERNAL_API_KEY>" \
    -d '{
      "model": "gpt-4o",
      "messages": [{"role": "user", "content": "Hello!"}]
    }'
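The fall-through behavior described above amounts to a lookup against the set of registered external model names. A minimal sketch (function name is illustrative):

```python
def route_chat_request(body: dict, external_models: set[str]) -> str:
    """Decide whether a chat request is proxied to an external provider
    or falls through to session-based routing for local inference."""
    model = body.get("model")
    if model in external_models:
        return "external"
    return "local"
```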

POST /rl/set_reward

Include "model": "<name>" to attach a reward to the most recent interaction of that external model. When model is present, the gateway routes by model name instead of the bearer token, so no session API key is needed.

curl -X POST http://127.0.0.1:<PORT>/rl/set_reward \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer <ADMIN_API_KEY>" \
    -d '{"model": "gpt-4o", "reward": 1.0}'
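The "most recent interaction" semantics can be sketched with a toy in-memory cache. The structure below is illustrative only, not the actual `ExternalSessionData` implementation:

```python
interactions: dict[str, list[dict]] = {}  # model name -> interaction log


def record_interaction(model: str, request: str, response: str) -> None:
    """Append a request/response pair to the model's interaction log."""
    interactions.setdefault(model, []).append(
        {"request": request, "response": response, "reward": None}
    )


def set_reward(model: str, reward: float) -> None:
    """Attach a reward to the most recent interaction of the model."""
    history = interactions.get(model)
    if not history:
        raise KeyError(f"no interactions recorded for model {model!r}")
    history[-1]["reward"] = reward  # reward applies to the latest entry only
```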

POST /export_trajectories

Use "model": "<name>" as a shorthand for "session_id". The gateway translates model into the corresponding session ID and routes the export request to the correct data proxy:

curl -X POST http://127.0.0.1:<PORT>/export_trajectories \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer <ADMIN_API_KEY>" \
    -d '{"model": "gpt-4o"}'

The response includes "external_api": true and the raw request/response pairs (no token log-probs, since the external provider does not expose them).
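The model-to-session translation can be sketched as follows; `resolve_session_id` and the mapping are hypothetical names for illustration:

```python
def resolve_session_id(body: dict, model_sessions: dict[str, str]) -> str:
    """Translate the model shorthand into a session ID for export routing."""
    if "session_id" in body:
        return body["session_id"]  # explicit session ID takes precedence
    model = body.get("model")
    if model in model_sessions:
        return model_sessions[model]
    raise KeyError("request must carry a session_id or a registered model")
```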

Related Issue

Type of Change

  • 🐛 Bug fix
  • ✨ New feature
  • 💥 Breaking change
  • 📝 Documentation update
  • ♻️ Refactoring
  • ⚡ Performance improvement
  • ✅ Test coverage improvement

Checklist

  • I have read the Contributing Guide
  • Pre-commit hooks pass (pre-commit run --all-files)
  • Relevant tests pass; new tests added for new functionality
  • Documentation updated (if applicable; built with ./docs/build_all.sh)
  • Branch is up to date with main
  • Self-reviewed via /review-pr command
  • This PR was created by a coding agent via /create-pr
  • This PR is a breaking change

Breaking Change Details (if applicable):

N/A

Additional Context

Key changes:

  • External model registration flow: gateway -> router -> data proxy
  • Proxy forwarding for streaming and non-streaming chat completions
  • ExternalSessionData for interaction caching and trajectory export
  • Controller external_mode: skip inference server launch, validate config
  • Fail-fast on missing external_api_key to prevent admin key leakage
  • Rollback router entry when data proxy registration fails
  • Guard pause/continue against None inf_bridge in external mode
  • Only cache successful external API responses
  • Assert external mode requires workflow=None and group_size=1
  • /v1/chat/completions and /v1/models OpenAI-compatible aliases
  • HITL demo and online_rollout support for --external-url flags
  • Comprehensive unit and integration tests (1150+ lines)

Files changed (16):

  • areal/experimental/inference_service/controller/config.py — external model config fields
  • areal/experimental/inference_service/controller/controller.py — external mode logic, model registration
  • areal/experimental/inference_service/controller/workflow.py — external API trajectory passthrough
  • areal/experimental/inference_service/data_proxy/app.py — external model endpoints, proxy forwarding
  • areal/experimental/inference_service/data_proxy/session.py — ExternalSessionData, interaction cache
  • areal/experimental/inference_service/gateway/app.py — model-based routing, /v1 aliases
  • areal/experimental/inference_service/gateway/streaming.py — router helpers for external models
  • areal/experimental/inference_service/router/app.py — model registry endpoints
  • areal/experimental/inference_service/router/state.py — ExternalModelRegistry
  • examples/experimental/inference_service/README.md — updated docs
  • examples/experimental/inference_service/human_in_the_loop_demo.py — --external-url support
  • examples/experimental/inference_service/online_rollout.py — --external-url support
  • tests/experimental/inference_service/test_controller.py — updated controller tests
  • tests/experimental/inference_service/test_data_proxy_rtensor.py — updated proxy tests
  • tests/experimental/inference_service/test_external_model.py — new unit tests (755 lines)
  • tests/experimental/inference_service/test_external_model_integration.py — new integration tests (402 lines)

@nuzant nuzant changed the title from "feat(service): add external model API with safety guards" to "feat(service): add external model API support for inference serivce" Apr 14, 2026
@nuzant nuzant changed the title from "feat(service): add external model API support for inference serivce" to "feat(service): add external model API support for inference service" Apr 14, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces an "External Model Mode" to the inference service, enabling the routing of inference requests to external OpenAI-compatible providers. The implementation involves updates across the controller, gateway, router, and data proxy to support external model registration, request forwarding, and interaction caching. Feedback highlights an opportunity to improve security by removing a fallback to the admin API key in external mode and suggests a more robust design for interaction caches to avoid inheritance issues.

Comment thread areal/experimental/inference_service/controller/controller.py Outdated
]


class ExternalSessionData(SessionData):

medium

The implementation of ExternalSessionData violates the Liskov Substitution Principle. It inherits from SessionData but replaces _active_completions (which is an InteractionCache in the base class) with an ExternalInteractionCache, which has a different interface. This is why type: ignore and cast are needed, which is a code smell and makes the code less maintainable and more fragile to changes in the base class.

A better design would be to define a common interface for caches using a Protocol and have both InteractionCache and ExternalInteractionCache implement it. This would make the class hierarchy more robust and type-safe.

For example:

from typing import Any, Protocol

class TrajectoryCache(Protocol):
    def add(self, ...) -> None: ...
    def set_reward(self, ...) -> None: ...
    def export(self, ...) -> Any: ...
    # ... other common methods

class InteractionCache(TrajectoryCache):
    ...

class ExternalInteractionCache(TrajectoryCache):
    ...

Then SessionData and ExternalSessionData could use a TrajectoryCache without breaking type contracts. This could be addressed in a follow-up, but it's worth noting for future maintainability.

Collaborator Author

@nuzant nuzant Apr 14, 2026


This should be resolved later, when moving the inference service out of experimental, since it requires changes in areal/experimental/openai/cache.py to unify string-based and token-based interactions and caches.

@nuzant nuzant added the safe-to-test label (Ready to run unit-tests in a PR.) Apr 14, 2026
@nuzant nuzant temporarily deployed to AReaL-unittests April 14, 2026 09:17 — with GitHub Actions Inactive
Collaborator

@garrett4wade garrett4wade left a comment


In general, we don't require "external" everything to support a real external model.

And the validation of your implementation should be very simple - you use the inference controller to set up an internal model as the service, and set up another inference service that connects to the former service as an external model. The code change of your example script should be minimal.

Comment on lines +57 to +62

# -- External model API ------------------------------------------------
external_api_url: str | None = None
external_api_key: str | None = None
external_api_model: str | None = None
external_model_name: str = "ext-model"

We'd better not differentiate internal and external configurations. The external model should have a different name than the internal models (nobody will name their local model gpt-5.2). We can use the same set of endpoints as the "internal" model - no new endpoints should be added. As long as an api_url is provided, we know that it is an external model.

Comment on lines +567 to +572
if cfg.external_api_key is None:
raise ValueError(
"external_api_key must be set when using external model mode. "
"Without it, the internal admin API key would be leaked to the "
"external provider."
)

Weird error message.

Comment on lines +573 to +582
resp = requests.post(
f"{self._gateway_addr}/register_model",
json={
"name": cfg.external_model_name,
"url": cfg.external_api_url,
"model": cfg.external_api_model,
},
headers={"Authorization": f"Bearer {cfg.openai.admin_api_key}"},
timeout=cfg.request_timeout,
)

You call /register_model rather than /register_external_model. Then why do you need the latter endpoint?

Comment on lines +329 to +337
@dataclass
class ExternalInteractionEntry:
interaction_id: str
request: str
response: str
reward: float | None = None


class ExternalInteractionCache:

You don't have to add the "external" variant of everything. The major difference is that we are storing strings rather than token IDs, or in other words, we can't access token IDs. Therefore, the "external" interactions can just set token_ids to None. Adding the "external" session cache is too verbose.

Comment on lines +482 to +483
@app.post("/external/chat/completions")
async def external_chat_completions(request: Request):

why?

except (json.JSONDecodeError, AttributeError):
pass

if model_name is not None:

We should assume that model_name is always non-None. We should use "model_name" to route over both internal and external models.
