
feat(service): add external model API support for inference service #1183

Open
nuzant wants to merge 1 commit into main from mzy/inf-ext-api

Conversation

Collaborator

@nuzant nuzant commented Apr 14, 2026

Description

Add support for routing chat completions to external OpenAI-compatible APIs through the unified gateway/router/data-proxy stack. This enables interaction caching, reward assignment, and trajectory export for models from external providers.

Details of Important API Changes

Registering an external model

Before sending chat requests to an external provider you must register it with the gateway. Registration tells the router which upstream URL to forward to and which provider-side model name to use:

curl -X POST http://127.0.0.1:<PORT>/register_model \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer <ADMIN_API_KEY>" \
    -d '{
      "name": "gpt-4o",
      "url": "https://api.openai.com/v1",
      "model": "gpt-4o-2024-08-06"
    }'
| Field | Required | Description |
|-------|----------|-------------|
| `name` | Yes | A local alias used to route requests. This is the value you put in the `model` field of subsequent requests. |
| `url` | Yes | Base URL of the external OpenAI-compatible API (e.g. `https://api.openai.com/v1`). |
| `model` | No | The model name sent to the external provider. If omitted, the `model` field is stripped from forwarded requests, letting the provider use its default. |

The gateway forwards the registration through the router (which picks a healthy data proxy worker) and then to the data proxy itself. If the data proxy registration fails, the router entry is automatically rolled back.
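The register-then-rollback flow described above can be illustrated with a minimal in-memory sketch. `RouterRegistry` and `register_with_proxy` are hypothetical stand-ins, not the actual classes:

```python
class RouterRegistry:
    """In-memory stand-in for the router's model registry."""

    def __init__(self):
        self.models: dict[str, dict] = {}

    def register(self, name: str, entry: dict) -> None:
        self.models[name] = entry

    def remove(self, name: str) -> None:
        self.models.pop(name, None)


def register_model(registry: RouterRegistry, name: str, entry: dict,
                   register_with_proxy) -> bool:
    """Add the router entry first; roll it back if the data proxy fails."""
    registry.register(name, entry)
    try:
        register_with_proxy(name, entry)
    except Exception:
        registry.remove(name)  # rollback keeps router and proxy consistent
        return False
    return True
```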

You can list registered models with GET /models and remove one with POST /remove_model (both require the admin key).

Using the model field in requests

Once a model is registered, the model field in request bodies acts as the routing key that connects chat completions, reward assignment (which follows the OpenRouter request standard), and trajectory export to the same external provider session.

POST /chat/completions (or /v1/chat/completions)

Include "model": "<name>" in the request body. The gateway checks whether <name> matches a registered external model. If it does, the request is proxied to the external API; otherwise it falls through to the normal session-based routing for local inference. Both streaming and non-streaming modes are supported. The Authorization header you provide is forwarded as-is to the external provider.

curl -X POST http://127.0.0.1:<PORT>/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer <EXTERNAL_API_KEY>" \
    -d '{
      "model": "gpt-4o",
      "messages": [{"role": "user", "content": "Hello!"}]
    }'
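The fall-through behavior described above amounts to a lookup against the set of registered external model names. A minimal sketch (function name is illustrative):

```python
def route_chat_request(body: dict, external_models: set[str]) -> str:
    """Decide whether a chat request is proxied to an external provider
    or falls through to session-based routing for local inference."""
    model = body.get("model")
    if model in external_models:
        return "external"
    return "local"
```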

POST /rl/set_reward

Include "model": "<name>" to attach a reward to the most recent interaction of that external model. When model is present, the gateway routes by model name instead of the bearer token, so no session API key is needed.

curl -X POST http://127.0.0.1:<PORT>/rl/set_reward \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer <ADMIN_API_KEY>" \
    -d '{"model": "gpt-4o", "reward": 1.0}'
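The "most recent interaction" semantics can be sketched with a toy in-memory cache. The structure below is illustrative only, not the actual `ExternalSessionData` implementation:

```python
interactions: dict[str, list[dict]] = {}  # model name -> interaction log


def record_interaction(model: str, request: str, response: str) -> None:
    """Append a request/response pair to the model's interaction log."""
    interactions.setdefault(model, []).append(
        {"request": request, "response": response, "reward": None}
    )


def set_reward(model: str, reward: float) -> None:
    """Attach a reward to the most recent interaction of the model."""
    history = interactions.get(model)
    if not history:
        raise KeyError(f"no interactions recorded for model {model!r}")
    history[-1]["reward"] = reward  # reward applies to the latest entry only
```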

POST /export_trajectories

Use "model": "<name>" as a shorthand for "session_id". The gateway translates model into the corresponding session ID and routes the export request to the correct data proxy:

curl -X POST http://127.0.0.1:<PORT>/export_trajectories \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer <ADMIN_API_KEY>" \
    -d '{"model": "gpt-4o"}'

The response includes "external_api": true and the raw request/response pairs (no token log-probs, since the external provider does not expose them).
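The model-to-session translation can be sketched as follows; `resolve_session_id` and the mapping are hypothetical names for illustration:

```python
def resolve_session_id(body: dict, model_sessions: dict[str, str]) -> str:
    """Translate the model shorthand into a session ID for export routing."""
    if "session_id" in body:
        return body["session_id"]  # explicit session ID takes precedence
    model = body.get("model")
    if model in model_sessions:
        return model_sessions[model]
    raise KeyError("request must carry a session_id or a registered model")
```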

Related Issue

Type of Change

  • 🐛 Bug fix
  • ✨ New feature
  • 💥 Breaking change
  • 📝 Documentation update
  • ♻️ Refactoring
  • ⚡ Performance improvement
  • ✅ Test coverage improvement

Checklist

  • I have read the Contributing Guide
  • Pre-commit hooks pass (pre-commit run --all-files)
  • Relevant tests pass; new tests added for new functionality
  • Documentation updated (if applicable; built with ./docs/build_all.sh)
  • Branch is up to date with main
  • Self-reviewed via /review-pr command
  • This PR was created by a coding agent via /create-pr
  • This PR is a breaking change

Breaking Change Details (if applicable):

N/A

Additional Context

Key changes:

  • External model registration flow: gateway -> router -> data proxy
  • Proxy forwarding for streaming and non-streaming chat completions
  • ExternalSessionData for interaction caching and trajectory export
  • Controller external_mode: skip inference server launch, validate config
  • Fail-fast on missing external_api_key to prevent admin key leakage
  • Rollback router entry when data proxy registration fails
  • Guard pause/continue against None inf_bridge in external mode
  • Only cache successful external API responses
  • Assert external mode requires workflow=None and group_size=1
  • /v1/chat/completions and /v1/models OpenAI-compatible aliases
  • HITL demo and online_rollout support for --external-url flags
  • Comprehensive unit and integration tests (1150+ lines)

Files changed (16):

  • areal/experimental/inference_service/controller/config.py — external model config fields
  • areal/experimental/inference_service/controller/controller.py — external mode logic, model registration
  • areal/experimental/inference_service/controller/workflow.py — external API trajectory passthrough
  • areal/experimental/inference_service/data_proxy/app.py — external model endpoints, proxy forwarding
  • areal/experimental/inference_service/data_proxy/session.py — ExternalSessionData, interaction cache
  • areal/experimental/inference_service/gateway/app.py — model-based routing, /v1 aliases
  • areal/experimental/inference_service/gateway/streaming.py — router helpers for external models
  • areal/experimental/inference_service/router/app.py — model registry endpoints
  • areal/experimental/inference_service/router/state.py — ExternalModelRegistry
  • examples/experimental/inference_service/README.md — updated docs
  • examples/experimental/inference_service/human_in_the_loop_demo.py — --external-url support
  • examples/experimental/inference_service/online_rollout.py — --external-url support
  • tests/experimental/inference_service/test_controller.py — updated controller tests
  • tests/experimental/inference_service/test_data_proxy_rtensor.py — updated proxy tests
  • tests/experimental/inference_service/test_external_model.py — new unit tests (755 lines)
  • tests/experimental/inference_service/test_external_model_integration.py — new integration tests (402 lines)

@nuzant nuzant changed the title from "feat(service): add external model API with safety guards" to "feat(service): add external model API support for inference serivce" Apr 14, 2026
@nuzant nuzant changed the title from "feat(service): add external model API support for inference serivce" to "feat(service): add external model API support for inference service" Apr 14, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces an "External Model Mode" to the inference service, enabling the routing of inference requests to external OpenAI-compatible providers. The implementation involves updates across the controller, gateway, router, and data proxy to support external model registration, request forwarding, and interaction caching. Feedback highlights an opportunity to improve security by removing a fallback to the admin API key in external mode and suggests a more robust design for interaction caches to avoid inheritance issues.

Comment thread areal/experimental/inference_service/controller/controller.py Outdated
]


class ExternalSessionData(SessionData):

medium

The implementation of ExternalSessionData violates the Liskov Substitution Principle. It inherits from SessionData but replaces _active_completions (which is an InteractionCache in the base class) with an ExternalInteractionCache, which has a different interface. This is why type: ignore and cast are needed, which is a code smell and makes the code less maintainable and more fragile to changes in the base class.

A better design would be to define a common interface for caches using a Protocol and have both InteractionCache and ExternalInteractionCache implement it. This would make the class hierarchy more robust and type-safe.

For example:

from typing import Any, Protocol

class TrajectoryCache(Protocol):
    def add(self, ...) -> None: ...
    def set_reward(self, ...) -> None: ...
    def export(self, ...) -> Any: ...
    # ... other common methods

class InteractionCache(TrajectoryCache):
    ...

class ExternalInteractionCache(TrajectoryCache):
    ...

Then SessionData and ExternalSessionData could use a TrajectoryCache without breaking type contracts. This could be addressed in a follow-up, but it's worth noting for future maintainability.

Collaborator Author

@nuzant nuzant Apr 14, 2026


This should be resolved later, when moving the inference service out of experimental, since it requires changes in areal/experimental/openai/cache.py to unify string-based and token-based interactions and caches.

@nuzant nuzant added the safe-to-test label (Ready to run unit-tests in a PR.) Apr 14, 2026
@nuzant nuzant temporarily deployed to AReaL-unittests April 14, 2026 09:17 — with GitHub Actions Inactive
Collaborator

@garrett4wade garrett4wade left a comment


In general, we don't require "external" everything to support a real external model.

And the validation of your implementation should be very simple - you use the inference controller to set up an internal model as the service, and set up another inference service that connects to the former service as an external model. The code change of your example script should be minimal.

Comment on lines +57 to +62

# -- External model API ------------------------------------------------
external_api_url: str | None = None
external_api_key: str | None = None
external_api_model: str | None = None
external_model_name: str = "ext-model"

We'd better not differentiate internal and external configurations. The external model should have a different name than the internal models (nobody will name their local model gpt-5.2). We can use the same set of endpoints as the "internal" model - no new endpoints should be added. As long as an api_url is provided, we know that it is an external model.

Comment on lines +567 to +572
if cfg.external_api_key is None:
raise ValueError(
"external_api_key must be set when using external model mode. "
"Without it, the internal admin API key would be leaked to the "
"external provider."
)

Weird error message.

Comment on lines +573 to +582
resp = requests.post(
f"{self._gateway_addr}/register_model",
json={
"name": cfg.external_model_name,
"url": cfg.external_api_url,
"model": cfg.external_api_model,
},
headers={"Authorization": f"Bearer {cfg.openai.admin_api_key}"},
timeout=cfg.request_timeout,
)

You call /register_model rather than /register_external_model. Then why do you need the latter endpoint?

Comment on lines +329 to +337
@dataclass
class ExternalInteractionEntry:
interaction_id: str
request: str
response: str
reward: float | None = None


class ExternalInteractionCache:

You don't have to add the "external" variant of everything. The major difference is that we are storing strings rather than token IDs, or in other words, we can't access token IDs. Therefore, the "external" interactions can just set token_ids to None. Adding the "external" session cache is too verbose.

Comment on lines +482 to +483
@app.post("/external/chat/completions")
async def external_chat_completions(request: Request):

why?

except (json.JSONDecodeError, AttributeError):
pass

if model_name is not None:

We should assume that model_name is always non-None. We should use "model_name" to route over both internal and external models.
