Personas go silent after first response wave — Rust full_evaluate gate or InferenceCoordinator slot leak #919

## Symptom

Live observation on M5, 2026-04-17, ~18:24-18:27 PT, on the `feature/prefix-reuse-and-multimodal` branch (with the embedding throttle cherry-picked):

1. **6:24:00** — Joel: "oh hey guys"
2. **6:24:43-58** — Helper AI, CodeReview AI, and Teacher AI all reply with friendly greetings (~43-58s response time, totally fine)
3. **6:25:36** — Claude Code (jtag): "@helper phase1-probe-A" → **no response**
4. **6:25:46** — Claude Code (jtag): "@helper phase1-probe-B" → **no response**
5. **6:26:48** — Joel: "Hey, we are working on some optimizations for your thinking and rag really" → **no response (still silent after 5+ min)**

The first wave fires fine. After its first response, each persona stops responding to anything, including direct `@helper` mentions.

## Hypotheses (per memento's earlier dig today)

1. **Rust `full_evaluate` gate** — `PersonaCognitionEngine.full_evaluate` returns `should_respond=false` after recent activity. This could be a recent-burst dampener tuned too aggressively, or stale state that never resets.
2. **InferenceCoordinator slot leak** — slots claimed during the first wave aren't released after generation completes. Once the configured cap of 2 slots is held, every subsequent persona waits forever for a free slot.
3. **AIProviderRustClient IPC reconnect race** — the IPC connection fails silently under load, so calls hang.
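If hypothesis 2 holds, a typical cause is a release call that runs only on the success path, so an error or early return keeps the slot forever. Below is a minimal sketch of the usual fix, assuming a simple counting pool with the configured cap of 2. `SlotPool`, `SlotGuard`, `try_acquire`, and `generate` are hypothetical names, not the actual InferenceCoordinator API, and the real coordinator may not be written in Rust; this only illustrates tying release to `Drop` so the slot comes back on every exit path.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the coordinator's slot accounting.
pub struct SlotPool {
    cap: usize,
    in_use: AtomicUsize,
}

/// RAII guard: the slot is released when this value is dropped,
/// whether generation succeeded, failed, or returned early.
pub struct SlotGuard {
    pool: Arc<SlotPool>,
}

impl SlotPool {
    pub fn new(cap: usize) -> Arc<Self> {
        Arc::new(SlotPool { cap, in_use: AtomicUsize::new(0) })
    }

    /// Claim a slot if one is free; returns None instead of blocking forever.
    pub fn try_acquire(self: &Arc<Self>) -> Option<SlotGuard> {
        let mut current = self.in_use.load(Ordering::Acquire);
        loop {
            if current >= self.cap {
                return None;
            }
            match self.in_use.compare_exchange_weak(
                current,
                current + 1,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return Some(SlotGuard { pool: Arc::clone(self) }),
                Err(observed) => current = observed, // lost a race; retry
            }
        }
    }

    pub fn in_use(&self) -> usize {
        self.in_use.load(Ordering::Acquire)
    }
}

impl Drop for SlotGuard {
    fn drop(&mut self) {
        self.pool.in_use.fetch_sub(1, Ordering::AcqRel);
    }
}

/// Simulated generation that may fail; the slot is released either way.
fn generate(pool: &Arc<SlotPool>, fail: bool) -> Result<String, String> {
    // Bind to `_slot`, not `_`: `let _ = ...` would drop the guard immediately.
    let _slot = pool.try_acquire().ok_or("no free slot")?;
    if fail {
        return Err("provider error".to_string()); // early return still drops _slot
    }
    Ok("reply".to_string())
}

fn main() {
    let pool = SlotPool::new(2);
    for i in 0..5 {
        let result = generate(&pool, i % 2 == 0);
        println!("request {i}: {:?}, slots in use after = {}", result, pool.in_use());
    }
    assert_eq!(pool.in_use(), 0);
}
```

`try_acquire` returns `None` rather than waiting, so a full pool shows up as an explicit error instead of a silent hang. A production coordinator would more likely wait asynchronously with a timeout, but the `Drop`-based release is the part that addresses the leak.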

## What is NOT the cause

- **Provider routing** — confirmed working (PR #914 fix). `ai/generate provider=local` returns DMR.
- **Embedding storm** — throttle is in place (commit `203fb6534`).
- **Phase 1 ordering** (issue #918) — pre-existing on main; this branch's Phase 1 changes only sort sections, can't introduce a silence bug.

## Acceptance for the fix

- After 5+ messages to `@helper` over 30 seconds, Helper responds to all 5 (or emits an explicit "low-energy/skipping" log line instead of failing silently).
- `gpu/eviction-registry` or equivalent shows InferenceCoordinator slots being released after each generation, not held.
- Logs surface the gate decision: every silent persona logs *why* it skipped (`should_respond=false: reason=...`) instead of skipping silently.
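The last criterion amounts to making the reason part of the gate's return value, so a skip cannot happen without one. Below is a minimal sketch, assuming a hypothetical `GateDecision` type and a sliding-window burst dampener (`BurstDampener`, `Reason`, and `evaluate` are illustrative names). The real `full_evaluate` signature, its inputs, and whether it contains a dampener at all are not confirmed here. The `prune` step also shows one way "stale state" could mute a persona: if old timestamps are never dropped, the burst count never falls below the limit.

```rust
use std::collections::VecDeque;
use std::fmt;
use std::time::{Duration, Instant};

/// Hypothetical reasons; every decision carries exactly one.
#[derive(Debug, Clone, PartialEq)]
pub enum Reason {
    Mentioned,
    NotAddressed,
    RecentBurst { count: usize, window_secs: u64 },
}

#[derive(Debug, Clone, PartialEq)]
pub struct GateDecision {
    pub should_respond: bool,
    pub reason: Reason,
}

impl fmt::Display for GateDecision {
    // Log shape from the acceptance criteria: should_respond=false: reason=...
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "should_respond={}: reason={:?}", self.should_respond, self.reason)
    }
}

/// Sliding-window dampener: at most `limit` responses per `window`.
pub struct BurstDampener {
    window: Duration,
    limit: usize,
    recent: VecDeque<Instant>,
}

impl BurstDampener {
    pub fn new(window: Duration, limit: usize) -> Self {
        BurstDampener { window, limit, recent: VecDeque::new() }
    }

    /// Forget responses older than the window. If this never runs, the count
    /// never decays and the gate stays shut: one form of "stale state".
    fn prune(&mut self, now: Instant) {
        while let Some(&oldest) = self.recent.front() {
            if now.duration_since(oldest) > self.window {
                self.recent.pop_front();
            } else {
                break;
            }
        }
    }

    /// Decide whether to respond, and always log the reason.
    pub fn evaluate(&mut self, now: Instant, mentioned: bool) -> GateDecision {
        self.prune(now);
        let decision = if self.recent.len() >= self.limit {
            GateDecision {
                should_respond: false,
                reason: Reason::RecentBurst {
                    count: self.recent.len(),
                    window_secs: self.window.as_secs(),
                },
            }
        } else if mentioned {
            GateDecision { should_respond: true, reason: Reason::Mentioned }
        } else {
            GateDecision { should_respond: false, reason: Reason::NotAddressed }
        };
        if decision.should_respond {
            self.recent.push_back(now);
        }
        eprintln!("[gate] {decision}");
        decision
    }
}

fn main() {
    let t0 = Instant::now();
    let mut gate = BurstDampener::new(Duration::from_secs(30), 3);
    // Five @-mentions 5 s apart, then one more after the window has passed.
    for secs in [0u64, 5, 10, 15, 20, 45] {
        gate.evaluate(t0 + Duration::from_secs(secs), true);
    }
}
```

With a 30 s window and a limit of 3, the first three mentions pass, the mentions at 15 s and 20 s log `should_respond=false: reason=RecentBurst { count: 3, window_secs: 30 }`, and the mention at 45 s passes again once the window has expired. Either outcome satisfies the criterion because the skip is logged with a reason.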

## Why this matters

The PR #914 work, the Phase 1 ordering, and the Candle eager-load fix all *unblock GPU inference*. None of that matters if the cognition gate mutes personas after their first turn: the user sees "alive once, then dead."

## Cross-reference

- Issue #918 — Phase 1 prefix-stability verification is currently blocked by this: we can't fire repeated identical probes to confirm deterministic ordering when the second probe is silently dropped.
- PR #914 — landing this PR doesn't fix the gate, just unblocks routing.
