hal0-api connection pool wedges under sustained upstream-slot load (concurrent /v1/chat/completions polling) #415

@thinmintdev

Summary

A modest keep-warm loop (one /v1/chat/completions request every 15s, max_tokens=1) against a local slot wedged hal0-api: subsequent /api/slots, /api/slots/{name}, and /v1/* requests all timed out until I ran systemctl restart hal0-api.

Reproduction

In one shell on the LXC:

while true; do
  curl -sS -X POST http://127.0.0.1:8001/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{"model":"qwen3-coder-reap-25b-a3b-q5km","messages":[{"role":"user","content":"hi"}],"max_tokens":1}' \
    -o /dev/null --max-time 30
  sleep 15
done

In another shell, after ~5 minutes:

curl http://127.0.0.1:8080/api/slots --max-time 20
# → curl: (28) Operation timed out

After systemctl restart hal0-api, normal service resumes.

Expected

A keep-warm loop issuing four requests per minute should never wedge the API. Either the per-upstream client pool should have bounded queueing with timeouts and a circuit breaker, or hal0-api should expose pool saturation in /api/health so operators can detect and react.
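To make the circuit-breaker part concrete, here is a minimal sketch of the pattern in plain Python. The class name, thresholds, and method names are all hypothetical and not taken from hal0-api. It only shows the state machine: open after a run of consecutive failures, then allow trial requests again after a cooldown.

```python
import time


class CircuitBreaker:
    """Hypothetical breaker: opens after N consecutive failures, retries after a cooldown."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock  # injectable so the cooldown can be tested
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self):
        """Return True if a call to the upstream may be attempted."""
        if self.opened_at is None:
            return True
        # Once the cooldown has elapsed, let trial requests through (half-open).
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self):
        # Any success closes the circuit and clears the failure streak.
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            # (Re)open the circuit and restart the cooldown.
            self.opened_at = self.clock()
```

A router would check allow() before forwarding to the upstream and return an error immediately when it is False, so requests to a stuck upstream fail fast instead of piling up.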

Hypothesis

The omnirouter's httpx client to lemond either:

  • has an unbounded queue and never times out individual upstream calls
  • shares a pool across /v1 and /api routes so saturation on one starves the other

Workaround

Don't run high-frequency keep-warm loops. Instead, run hal0 slot load from a systemd timer at a 4-minute cadence. Tradeoff: lemond's own eviction (see #B4) still kicks in between timer firings.

Suggested fix area

  • Per-upstream httpx client with per-request timeout (e.g. 5s for /v1/models probe, 60s for /v1/chat/completions).
  • Bounded pool with explicit overflow handling instead of silent queue growth.
  • /api/health should report upstream pool state.
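The three points above could fit together roughly as follows. This is a stdlib-only sketch, not hal0-api code: BoundedUpstream, UpstreamBusy, and every parameter name are hypothetical, and make_request stands in for whatever coroutine performs the upstream call. The gate caps in-flight calls, waits for a free slot for a bounded time only, applies a per-request deadline, and keeps counters that a health endpoint could report.

```python
import asyncio


class UpstreamBusy(Exception):
    """Raised when no upstream slot frees up within the wait budget."""


class BoundedUpstream:
    """Hypothetical gate: capped in-flight calls, bounded wait, per-call timeout."""

    def __init__(self, max_in_flight=4, acquire_timeout=2.0):
        self._sem = asyncio.Semaphore(max_in_flight)
        self.max_in_flight = max_in_flight
        self.acquire_timeout = acquire_timeout
        self.in_flight = 0
        self.rejected = 0

    async def call(self, make_request, timeout):
        # Fail fast instead of queueing forever when the pool is saturated.
        try:
            await asyncio.wait_for(self._sem.acquire(), self.acquire_timeout)
        except asyncio.TimeoutError:
            self.rejected += 1
            raise UpstreamBusy("upstream pool saturated")
        self.in_flight += 1
        try:
            # Per-request deadline, e.g. 5s for a probe, 60s for a completion.
            return await asyncio.wait_for(make_request(), timeout)
        finally:
            self.in_flight -= 1
            self._sem.release()

    def stats(self):
        """Snapshot that a health endpoint could include in its response."""
        return {
            "max_in_flight": self.max_in_flight,
            "in_flight": self.in_flight,
            "rejected": self.rejected,
        }
```

Using one gate per upstream, or separate gates for /v1 and /api routes, would keep saturation on one path from starving the other. A steadily rising rejected count in stats() is the kind of signal an operator could alert on.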

Environment

  • hal0 v0.3.0a1, CT 105
