hal0-api connection pool wedges under sustained upstream-slot load (concurrent /v1/chat/completions polling) #415

## Summary

A modest keep-warm loop (one `/v1/chat/completions` request every 15s, `max_tokens=1`) against a local slot wedged hal0-api. After about 5 minutes, `/api/slots`, `/api/slots/{name}`, and `/v1/*` requests all hung until the client gave up, and they kept hanging until I ran `systemctl restart hal0-api`.

## Reproduction

In one shell on the LXC:
```bash
while true; do
  curl -sS -X POST http://127.0.0.1:8001/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{"model":"qwen3-coder-reap-25b-a3b-q5km","messages":[{"role":"user","content":"hi"}],"max_tokens":1}' \
    -o /dev/null --max-time 30
  sleep 15
done
```

In another shell, after ~5 minutes:
```bash
curl http://127.0.0.1:8080/api/slots --max-time 20
# → curl: (28) Operation timed out
```

After `systemctl restart hal0-api`, normal service resumes.

## Expected

A keep-warm loop running at a single-digit requests-per-minute rate (4/min here) should never wedge the API. Either the per-upstream client pool should have bounded queueing with timeouts and a circuit breaker, or hal0-api should at least expose pool saturation in `/api/health` so operators can detect it and react.
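A circuit breaker for one upstream could look roughly like the following minimal, synchronous sketch. It is not hal0 code: `CircuitBreaker`, `CircuitOpenError`, and the parameter names are hypothetical. After `failure_threshold` consecutive failures it fails fast for `reset_after` seconds instead of sending more requests to a stuck upstream. After that window it lets calls through again ("half-open"). A success closes the breaker and a failure re-opens it. The `clock` parameter exists only so the behaviour can be tested without sleeping.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the upstream while the breaker is open."""


class CircuitBreaker:
    """Minimal closed -> open -> half-open breaker (hypothetical, not hal0's API)."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_after:
            return "half-open"  # cool-down elapsed: allow trial calls
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("upstream circuit open; failing fast")
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # A failure while half-open re-opens at once; otherwise open at the threshold.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the breaker and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

In an async service the same state machine would wrap an awaited upstream call, and a production version would admit only one trial request while half-open rather than all of them.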

## Hypothesis

The omnirouter's httpx client to lemond either:
- has an unbounded queue and never times out individual upstream calls
- shares a pool across `/v1` and `/api` routes so saturation on one starves the other
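Both hypotheses can be illustrated with a toy model. In this model an `asyncio.Semaphore` stands in for the httpx connection pool. That is an assumption: this is not hal0's code, and `upstream_call` and `api_wait_time` are hypothetical names. The model starts two `/v1` calls that hang upstream with no timeout, and they hold every connection (hypothesis 1). An `/api`-style call on the same pool then waits until it gives up. On a separate pool, it completes immediately (hypothesis 2).

```python
import asyncio
from typing import Optional


async def upstream_call(pool: asyncio.Semaphore, seconds: float) -> None:
    """Hold one 'connection' from `pool` for `seconds` (stand-in for an HTTP request)."""
    async with pool:
        await asyncio.sleep(seconds)


async def api_wait_time(shared_pool: bool, pool_size: int = 2) -> Optional[float]:
    """Seconds an /api-style call needed, or None if it gave up after 1 s."""
    v1_pool = asyncio.Semaphore(pool_size)
    api_pool = v1_pool if shared_pool else asyncio.Semaphore(pool_size)
    # /v1 calls that hang upstream and never time out: they occupy every connection.
    hung = [asyncio.create_task(upstream_call(v1_pool, 3600)) for _ in range(pool_size)]
    await asyncio.sleep(0.05)  # let the hung calls grab their connections first
    loop = asyncio.get_running_loop()
    start = loop.time()
    try:
        await asyncio.wait_for(upstream_call(api_pool, 0.01), timeout=1.0)
        return loop.time() - start
    except asyncio.TimeoutError:
        return None  # starved: no connection became free within 1 s
    finally:
        for task in hung:
            task.cancel()
        await asyncio.gather(*hung, return_exceptions=True)
```

`asyncio.run(api_wait_time(shared_pool=True))` returns `None`, while `asyncio.run(api_wait_time(shared_pool=False))` returns roughly 0.01 s. With a shared pool and no per-call timeout, a handful of stuck upstream requests is enough to make unrelated routes hang.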

## Workaround

Don't run high-frequency keep-warm loops. Instead, run `hal0 slot load` from a systemd timer at a 4-minute cadence. Tradeoff: lemond's own eviction (see #B4) can still kick in between timer firings.

## Suggested fix area

- Per-upstream httpx client with per-request timeouts (e.g. 5s for the `/v1/models` probe, 60s for `/v1/chat/completions`).
- Bounded pool with explicit overflow handling instead of silent queue growth.
- `/api/health` should report upstream pool state.
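The last two bullets could be combined roughly as in this asyncio sketch. It assumes hal0-api is asyncio-based, and `UpstreamGate`, `PoolSaturated`, and the `health()` field names are hypothetical. In httpx itself, the closest built-in knobs are `httpx.Limits(max_connections=...)` plus a `pool` timeout in `httpx.Timeout`, which raises `httpx.PoolTimeout` instead of waiting forever for a connection. The gate below adds a hard cap on queued waiters, a per-request timeout, and counters that an `/api/health` handler could return.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class PoolSaturated(RuntimeError):
    """Raised instead of queueing when an upstream has no capacity left."""


@dataclass
class UpstreamGate:
    """Caps in-flight and queued requests to one upstream, with timeouts (hypothetical)."""

    name: str
    max_in_flight: int = 4          # analogous to httpx.Limits(max_connections=...)
    max_waiters: int = 8            # hard cap on queue length: no silent growth
    acquire_timeout: float = 2.0    # analogous to httpx.Timeout(..., pool=...)
    request_timeout: float = 60.0   # e.g. 60 s for chat completions, 5 s for a probe
    in_flight: int = field(default=0, init=False)
    waiting: int = field(default=0, init=False)
    rejected: int = field(default=0, init=False)
    timed_out: int = field(default=0, init=False)

    def __post_init__(self) -> None:
        # Construct inside a running event loop to be safe on Python < 3.10.
        self._sem = asyncio.Semaphore(self.max_in_flight)

    async def run(self, make_call: Callable[[], Awaitable[T]]) -> T:
        # Explicit overflow handling: reject instead of growing an unbounded queue.
        if self._sem.locked() and self.waiting >= self.max_waiters:
            self.rejected += 1
            raise PoolSaturated(f"{self.name}: {self.waiting} requests already queued")
        self.waiting += 1
        try:
            await asyncio.wait_for(self._sem.acquire(), self.acquire_timeout)
        except asyncio.TimeoutError:
            self.rejected += 1
            raise PoolSaturated(
                f"{self.name}: no free connection within {self.acquire_timeout}s") from None
        finally:
            self.waiting -= 1
        self.in_flight += 1
        try:
            # Per-request timeout so a hung upstream cannot hold a slot forever.
            return await asyncio.wait_for(make_call(), self.request_timeout)
        except asyncio.TimeoutError:
            self.timed_out += 1
            raise
        finally:
            self.in_flight -= 1
            self._sem.release()

    def health(self) -> dict:
        """Snapshot that an /api/health handler could include per upstream."""
        return {
            "upstream": self.name,
            "in_flight": self.in_flight,
            "max_in_flight": self.max_in_flight,
            "waiting": self.waiting,
            "rejected": self.rejected,
            "timed_out": self.timed_out,
            "saturated": self.in_flight >= self.max_in_flight,
        }
```

Keeping separate gates per upstream, or per route class (for example one with `request_timeout=5.0` for the `/v1/models` probe and one with `request_timeout=60.0` for chat completions), means a stuck chat request fails with a timeout or `PoolSaturated` instead of starving the probe and the `/api` routes.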

## Environment

- hal0 v0.3.0a1, CT 105
