feat(inference): per-request VLM model override by Liuhaai · Pull Request #64 · machinefi/trio-core

Liuhaai · 2026-05-07T20:42:18Z

Description

Adds an optional model field to DescribeRequest and CropDescribeRequest so callers can pick a different upstream model per request without restarting the server. The cortex client uses this to route VLM scene-describe and segmentation ground-region requests to different upstream Qwen models from a single trio-core instance (e.g. qwen3-vl-flash for scenes, qwen3.6-plus for segmentation).

Type of Change

New feature (non-breaking change that adds functionality)

Changes

DescribeRequest / CropDescribeRequest: add optional model: str | None (default None)
/api/inference/describe and /api/inference/crop-describe handlers: forward req.model into engine.analyze_frame(...)
TrioCore.analyze_video / analyze_frame: accept and forward model
BaseBackend.generate / stream_generate: gain optional model kwarg
RemoteHTTPBackend: uses model or self._remote_model in the upstream chat.completions.create call; log line includes the effective model
Local backends (transformers, mlx, tome, compressed): accept the kwarg and call BaseBackend._warn_model_override_once(model) — one warning per backend instance, then ignore (local backends can't swap models per-request)

When the request omits model, behavior is identical to today.

Test Plan

All existing tests pass (python -m pytest tests/ -v for backends/engine/config/inference_router → 60 passed)
New tests added for new functionality
- test_generate_uses_per_request_model_override — asserts the OpenAI client receives the per-request model
- test_generate_falls_back_to_configured_model — asserts fallback to _remote_model when override is None
Regression test passes (python examples/run_regression.py) — not run; this PR doesn't change inference math (only routes the model name through), and remote inference depends on a live DashScope endpoint
Manual sanity: targeted tests cover the changed code path; cortex-side caller wiring in a separate PR has been validated end-to-end against the new request schema

Checklist

My code follows the project's code style (type hints, no unnecessary abstractions)
I have not introduced any new AGPL-licensed dependencies
This PR contains no breaking API changes — model is optional with default None and existing callers are unaffected
I have not committed secrets, API keys, or credentials

🤖 Generated with Claude Code

Adds an optional `model` field to `DescribeRequest` and `CropDescribeRequest` so callers can pick a different upstream model per request without a server reload. Threaded through `TrioCore.analyze_video/analyze_frame` and `BaseBackend.generate/stream_generate`. `RemoteHTTPBackend` honors the override on the OpenAI-compatible chat.completions call (falls back to the configured `remote_vlm_model` when omitted). Local backends (transformers, mlx, tome, compressed) log a one-shot warning and ignore the override since they cannot swap models per request — `BaseBackend._warn_model_override_once` centralizes this. Use case: route VLM scene-describe and segmentation ground-region requests to different upstream Qwen models from a single trio-core instance. Tests: two new RemoteHTTPBackend tests cover the override path and the fallback to the configured default. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Liuhaai merged commit 5c9e0d3 into main May 7, 2026
7 checks passed

Liuhaai deleted the feat/per-request-vlm-model-override branch May 7, 2026 20:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(inference): per-request VLM model override#64

feat(inference): per-request VLM model override#64
Liuhaai merged 1 commit into
mainfrom
feat/per-request-vlm-model-override

Liuhaai commented May 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Liuhaai commented May 7, 2026

Description

Type of Change

Changes

Test Plan

Checklist

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant