Summary
Calls to POST /v1/chat/completions for a model that is served by a local slot (agent-hermes in our case) consistently return 502 with:
{"error":{"code":"dispatch.upstream_unavailable","message":"upstream 'hal0' unreachable: PoolTimeout","details":{"upstream":"hal0","target":"http://127.0.0.1:8080/v1/chat/completions","error":""}}}
The dispatch target is hal0-api's own bind URL. From the api startup log:
slots.autoregistered_composite upstream=hal0
omni_router.attached base_url=http://127.0.0.1:13305
/api/slots lists a kind: "slot" entry named hal0 with _synthetic: true, advertising url=http://127.0.0.1:8080/v1. Even with all advertised models matched against a real local slot (agent-hermes, primary), the synthetic-hal0 slot wins routing and the request ends up dispatched to a URL that resolves to hal0-api itself.
Reproduction
On hal0 v0.3.0a1, with agent-hermes slot loaded and /v1/models advertising qwen3-coder-reap-25b-a3b-q5km:
curl -sS -X POST http://127.0.0.1:8080/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{"model":"qwen3-coder-reap-25b-a3b-q5km","messages":[{"role":"user","content":"hi"}],"max_tokens":5}' \
--max-time 60
→ 502 dispatch.upstream_unavailable after ~30-60s.
The same request against the llama-server slot directly works:
curl http://127.0.0.1:8001/v1/chat/completions ... # → 200, generates response
Expected
Either the synthetic upstream should not exist when a same-model local slot is configured, or _synthetic_reason: "install a local slot of the same name to take over" should describe how to actually achieve that. Creating a slot with name = hal0 via hal0 slot create doesn't take over the synthetic; the synthetic entry still wins routing.
Workaround
Point OpenAI-compatible clients at the slot's port directly (http://127.0.0.1:8001/v1) instead of at omni_router on port 8080. This gives up model-swap routing across slots, but chat works.
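For clients that are scripted rather than configured, the workaround might look like the sketch below. It only builds the request, using the Python standard library; the helper name build_chat_request and the SLOT_BASE_URL constant are illustrative, and the port is the one observed in this report, so it will differ on other setups.

```python
import json
from urllib.request import Request

# Slot port taken from this report; an assumption for any other install.
SLOT_BASE_URL = "http://127.0.0.1:8001/v1"


def build_chat_request(model, prompt, base_url=SLOT_BASE_URL, max_tokens=5):
    """Build (but do not send) a POST to <base_url>/chat/completions."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return Request(
        f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then a matter of urllib.request.urlopen(build_chat_request("qwen3-coder-reap-25b-a3b-q5km", "hi"), timeout=60), which bypasses the router entirely.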
Suggested fix area
omni_router should refuse to dispatch to a URL that resolves to its own bind.
slots.autoregistered_composite should defer to same-named real slots when present, instead of being preferred.
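A minimal sketch of how the two suggestions above could combine, assuming nothing about hal0's actual internals: the Slot record, endpoint, and choose_upstream are hypothetical names, and the fields only mirror what /api/slots shows in this report. Candidates whose URL points at the router's own bind are dropped, and real slots are tried before synthetic ones.

```python
from dataclasses import dataclass
from ipaddress import ip_address
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


@dataclass(frozen=True)
class Slot:
    # Hypothetical record; not hal0's real data model.
    name: str
    url: str
    models: tuple
    synthetic: bool = False


def endpoint(url):
    """Return (host, port), collapsing loopback spellings to 127.0.0.1."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    port = parts.port or DEFAULT_PORTS.get(parts.scheme, 0)
    if host == "localhost":
        return "127.0.0.1", port
    try:
        if ip_address(host).is_loopback:
            host = "127.0.0.1"
    except ValueError:
        pass  # a DNS name; no resolution is attempted in this sketch
    return host, port


def choose_upstream(slots, model, own_bind):
    """Pick a slot for `model`: never our own bind, real before synthetic."""
    own = endpoint(own_bind)
    candidates = [
        s for s in slots
        if model in s.models and endpoint(s.url) != own
    ]
    # list.sort is stable, so real slots keep their configured order.
    candidates.sort(key=lambda s: s.synthetic)
    return candidates[0] if candidates else None
```

With the two entries from this report (synthetic hal0 at http://127.0.0.1:8080/v1 and agent-hermes at http://127.0.0.1:8001/v1) and own_bind set to http://127.0.0.1:8080, this picks agent-hermes. If only the synthetic entry matched, it would return None, which a router could surface as a clear configuration error rather than a PoolTimeout.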
Environment
- hal0 v0.3.0a1, CT 105 (LXC), AMD Strix Halo
- Discovered while wiring NousResearch/hermes-agent via openrouter spawn