Symptom
Live observation on M5, 2026-04-17 ~18:24-18:27 PT, on the feature/prefix-reuse-and-multimodal branch (with the embedding throttle cherry-picked):
- 6:24:00 — Joel: "oh hey guys"
- 6:24:43-58 — Helper AI, CodeReview AI, Teacher AI all reply with friendly greetings (~43-58s response time, totally fine)
- 6:25:36 — Claude Code (jtag): "@Helper phase1-probe-A" → no response
- 6:25:46 — Claude Code (jtag): "@Helper phase1-probe-B" → no response
- 6:26:48 — Joel: "Hey, we are working on some optimizations for your thinking and rag really" → no response (sustained silence beyond 5 min)
The first wave fires fine. After its first response, each persona stops responding to anything.
Hypotheses (per memento's earlier dig today)
- Rust full_evaluate gate: PersonaCognitionEngine.full_evaluate returns should_respond=false after recent activity. The cause could be a recent-burst dampener tuned too aggressively, or stale state.
- InferenceCoordinator slot leak: slots claimed during the first wave are not released after generation completes. Once both slots (the configured cap is 2) are held, every subsequent persona waits forever for a free one.
- AIProviderRustClient IPC reconnect race: the IPC connection fails silently under load, and calls hang.
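The slot-leak hypothesis describes the kind of bug that Rust code usually avoids by tying a slot's lifetime to a guard value. When the guard is dropped, the slot is released, and that happens on every exit path, including errors and early returns. Giving acquisition a deadline also turns a leak into a visible error instead of indefinite silence. The following is a minimal standard-library sketch under those assumptions. SlotPool, SlotGuard, acquire, and free_slots are hypothetical names and are not the actual InferenceCoordinator API. The cap of 2 mirrors the configured slot count mentioned above.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::time::Duration;

/// Hypothetical stand-in for InferenceCoordinator's slot accounting.
struct SlotPool {
    free: Mutex<usize>,
    cv: Condvar,
}

/// Holding a SlotGuard means holding one slot; dropping it releases the slot.
struct SlotGuard {
    pool: Arc<SlotPool>,
}

impl SlotPool {
    fn new(cap: usize) -> Arc<Self> {
        Arc::new(SlotPool { free: Mutex::new(cap), cv: Condvar::new() })
    }

    /// Waits up to `timeout` for a free slot. Returns None instead of
    /// blocking forever, so a leaked slot surfaces as an error, not silence.
    fn acquire(pool: &Arc<Self>, timeout: Duration) -> Option<SlotGuard> {
        let free = pool.free.lock().unwrap();
        let (mut free, _timed_out) = pool
            .cv
            .wait_timeout_while(free, timeout, |f| *f == 0)
            .unwrap();
        if *free == 0 {
            return None; // still no slot after the deadline
        }
        *free -= 1;
        Some(SlotGuard { pool: Arc::clone(pool) })
    }

    fn free_slots(&self) -> usize {
        *self.free.lock().unwrap()
    }
}

impl Drop for SlotGuard {
    // Release runs on every exit path: normal return, `?`, or panic unwind.
    fn drop(&mut self) {
        *self.pool.free.lock().unwrap() += 1;
        self.pool.cv.notify_one();
    }
}

fn main() {
    let pool = SlotPool::new(2);

    // Healthy path: `_slot` lives only for one loop iteration.
    for persona in ["Helper", "CodeReview", "Teacher"] {
        let _slot = SlotPool::acquire(&pool, Duration::from_millis(100))
            .expect("a slot should be free");
        // ... generation would run here ...
        println!("{persona}: generated, slot released at end of iteration");
    }
    assert_eq!(pool.free_slots(), 2);

    // Leak: guards kept alive in a long-lived collection.
    let leaked: Vec<SlotGuard> = (0..2)
        .map(|_| SlotPool::acquire(&pool, Duration::from_millis(100)).unwrap())
        .collect();
    // With both slots held, the next caller times out instead of hanging.
    assert!(SlotPool::acquire(&pool, Duration::from_millis(50)).is_none());
    drop(leaked);
    assert_eq!(pool.free_slots(), 2);
}
```

If the real coordinator stores slot handles in a long-lived structure, such as a map entry or a spawned task that outlives generation, that would be a natural first place to check for guards that are never dropped.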
What is NOT the cause
- ai/generate provider=local returns DMR. 203fb6534).
Acceptance for the fix
- After 5+ messages to @helper over 30 seconds, Helper responds to all 5 (or emits an explicit "low-energy/skipping" log line instead of failing silently).
- gpu/eviction-registry, or an equivalent, shows InferenceCoordinator slots being released after each generation, not held.
- Logs surface the gate decision: every persona that stays silent logs why it skipped (should_respond=false: reason=...) instead of skipping silently.
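One way to meet the logging criterion is to make the gate return a decision value that carries its reason. The caller then always receives either Respond or a Skip that it can log, so a silent skip cannot happen by construction. The sketch pairs this with a sliding-window burst dampener, because the first hypothesis names an over-aggressive dampener or stale state as a candidate cause. GateDecision, SkipReason, BurstDampener, and the 20 s window with a cap of 3 responses are all hypothetical and are not taken from PersonaCognitionEngine.

```rust
use std::collections::VecDeque;
use std::fmt;
use std::time::{Duration, Instant};

/// Hypothetical skip reason; a real gate would likely have several.
#[derive(Debug, Clone, PartialEq)]
enum SkipReason {
    BurstLimit { recent: usize, max: usize },
}

#[derive(Debug, Clone, PartialEq)]
enum GateDecision {
    Respond,
    Skip(SkipReason),
}

/// Renders the decision in the log format the acceptance criteria ask for.
impl fmt::Display for GateDecision {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            GateDecision::Respond => write!(f, "should_respond=true"),
            GateDecision::Skip(SkipReason::BurstLimit { recent, max }) => write!(
                f,
                "should_respond=false: reason=burst_limit ({recent}/{max} in window)"
            ),
        }
    }
}

/// Only responses inside `window` count, so old activity ages out
/// instead of accumulating into permanent silence.
struct BurstDampener {
    window: Duration,
    max_in_window: usize,
    recent: VecDeque<Instant>,
}

impl BurstDampener {
    fn evaluate(&mut self, now: Instant) -> GateDecision {
        // Prune timestamps that have left the window (the stale-state risk).
        while let Some(&t) = self.recent.front() {
            if now.duration_since(t) > self.window {
                self.recent.pop_front();
            } else {
                break;
            }
        }
        if self.recent.len() >= self.max_in_window {
            return GateDecision::Skip(SkipReason::BurstLimit {
                recent: self.recent.len(),
                max: self.max_in_window,
            });
        }
        self.recent.push_back(now);
        GateDecision::Respond
    }
}

fn main() {
    let mut gate = BurstDampener {
        window: Duration::from_secs(20),
        max_in_window: 3,
        recent: VecDeque::new(),
    };
    let start = Instant::now();
    // Five @helper messages, one every 6 s (30 s total).
    for i in 0..5u64 {
        let decision = gate.evaluate(start + Duration::from_secs(6 * i));
        // Every decision is logged, so a skip is never silent.
        println!("helper msg {i}: {decision}");
    }
}
```

With these assumed parameters, the fourth message is skipped and logged as "should_respond=false: reason=burst_limit (3/3 in window)". The fifth message is answered once the oldest response ages out of the window. A dampener that never pruned its history would keep skipping after reaching the cap, which is one way the reported "first response, then silence" pattern could arise.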
Why this matters
The PR #914 work, the Phase 1 ordering, and the Candle eager-load fix all unblock GPU inference. None of that matters if the cognition gate mutes personas after their first turn: the user sees "alive once, then dead."
Cross-reference