Summary
Setting idle_timeout_s in a slot's config (via PUT /api/slots/{name}/config) has no effect on lemond's eviction behaviour. With it set to 86400 (24h), the slot is still evicted within ~60-90s of the last request.
Reproduction
curl -X PUT http://127.0.0.1:8080/api/slots/agent-hermes/config \
  -H 'Content-Type: application/json' \
  -d '{"name":"agent-hermes","type":"llm","device":"gpu-vulkan","group":"chat",
       "port":8001,"idle_timeout_s":86400,
       "model":{"default":"qwen3-coder-reap-25b-a3b-q5km","ctx_size":65536}}'
# Confirm the config was accepted:
curl -sS http://127.0.0.1:8080/api/slots/agent-hermes/config
# → ...,"idle_timeout_s": 86400,...
# Load it:
curl -X POST http://127.0.0.1:8080/api/slots/agent-hermes/load \
  -d '{"model_id":"qwen3-coder-reap-25b-a3b-q5km"}'
# Wait 90s, then re-check:
sleep 90
curl http://127.0.0.1:8080/api/slots/agent-hermes
# → state: offline, message: "model evicted from lemond (auto-reloads on next request)"
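To pin down the eviction delay more precisely than a fixed sleep 90, the status endpoint can be polled until the slot goes offline. This is only a sketch under assumptions: it assumes the status response is JSON with a string field "state" whose value is "offline" after eviction, and that polling the status endpoint does not itself count as activity. The function names (fetch_slot_json, slot_state, time_to_eviction) and the BASE_URL variable are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: measure how long a slot stays loaded after the last request.
# ASSUMPTIONS (not confirmed by lemond): GET /api/slots/<name> returns JSON
# with a string field "state", and "offline" means the model was evicted.

BASE_URL="${BASE_URL:-http://127.0.0.1:8080}"

# Fetch the raw slot status body; prints nothing if the request fails.
fetch_slot_json() {
  curl -sS --max-time 5 "$BASE_URL/api/slots/$1" 2>/dev/null || true
}

# Extract the value of the "state" field without requiring jq.
slot_state() {
  fetch_slot_json "$1" \
    | grep -o '"state"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | head -n 1 \
    | sed 's/.*"\([^"]*\)"$/\1/'
}

# Poll every <interval> seconds until the slot reports "offline" or <max>
# seconds have passed. Prints the elapsed seconds at first observed eviction.
time_to_eviction() {
  local slot="$1" interval="${2:-5}" max="${3:-300}" elapsed=0
  while [ "$elapsed" -le "$max" ]; do
    if [ "$(slot_state "$slot")" = "offline" ]; then
      echo "$elapsed"
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "not evicted within ${max}s" >&2
  return 1
}
```

Run right after the load step, time_to_eviction agent-hermes 5 300 prints the approximate number of seconds until eviction was observed, to the resolution of the polling interval.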
Expected
With idle_timeout_s: 86400 configured, the slot should remain loaded for up to 24h of idle time, not 60-90s.
Impact
For agent runtimes that depend on multi-turn tool-call flows (hermes-agent here), the slot is evicted mid-conversation, between the first inference pass (which generates the tool-call JSON) and the second pass (which processes the tool result). The second hop then fails with a Connection error, even though the slot was healthy 30s earlier.
Workaround
A systemd timer re-issues hal0 slot load <name> every 4 minutes. This works, but it adds load to lemond and races against the eviction.
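One possible variant of the script the timer runs is sketched below: it reloads only when the slot already reports offline, which might reduce the extra load on lemond. It does not remove the race. It rests on the same assumption as before (the status response is JSON with a string field "state" that reads "offline" after eviction), and keepalive_once and the other helper names are invented for this sketch. Only hal0 slot load <name> comes from the actual workaround.

```shell
#!/usr/bin/env bash
# Sketch of a keep-alive step a systemd timer could invoke.
# ASSUMPTIONS: GET /api/slots/<name> returns JSON with a string field
# "state" ("offline" after eviction). `hal0 slot load <name>` is the
# command used in the workaround described above.

BASE_URL="${BASE_URL:-http://127.0.0.1:8080}"

# Fetch the raw slot status body; prints nothing if the request fails.
fetch_slot_json() {
  curl -sS --max-time 5 "$BASE_URL/api/slots/$1" 2>/dev/null || true
}

# Extract the value of the "state" field without requiring jq.
slot_state() {
  fetch_slot_json "$1" \
    | grep -o '"state"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | head -n 1 \
    | sed 's/.*"\([^"]*\)"$/\1/'
}

# Reload the slot only when it reports "offline"; otherwise do nothing.
# If the status request fails, the state is empty and no reload is issued.
keepalive_once() {
  local slot="$1" state
  state="$(slot_state "$slot")"
  if [ "$state" = "offline" ]; then
    hal0 slot load "$slot"
  fi
}
```

The timer's service would call keepalive_once agent-hermes. Because eviction happens after 60-90s, a request can still hit an offline slot between timer runs.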
Suggested fix area
The slot config's idle_timeout_s needs to actually reach lemond's eviction loop. Right now it appears to be stored but ignored.
Environment