Summary
The one-shot openlogi_hid::enumerate() free fn — used only by the CLI (openlogi list, openlogi diag *) — retries through a transient probe miss (added in #237), but its early-exit gate never trips for a Unifying receiver. A Unifying probe that drops a device because its arrival event landed after the drain window reports healthy = true, so the retry loop returns the partial list immediately instead of retrying. #237's retry therefore covers the Bolt path (the #218 reporter's case) but is effectively a no-op on Unifying.
The continuous consumer (the GUI/agent inventory watcher) is not affected: it re-enumerates every ~2 s forever, so a single partial drain self-corrects on a later tick. This gap is specific to the single-shot CLI path.
Mechanism
#237 made the one-shot enumerate() retry while any probed node is unhealthy:
|
pub async fn enumerate() -> Result<Vec<DeviceInventory>, InventoryError> { |
|
// The polling [`Enumerator`] keeps a per-node ledger across ticks, so a |
|
// transient probe miss replays the node's last good inventory. A one-shot |
|
// caller (CLI `list` / `diag`) builds a fresh `Enumerator` whose ledger is |
|
// empty, so a miss has nothing to replay and would surface as an empty or |
|
// partial list — the two isolated runs in #218 read 3 devices and 0. Retry a |
|
// few times instead, reusing the same enumerator so its ledger accumulates a |
|
// snapshot a later attempt can replay and the opened channel stays warm. |
|
// #226's 5 s request timeout inside `HidppChannel::send` makes a dead probe |
|
// fail fast, so a short bounded retry is cheap. |
|
let mut enumerator = Enumerator::default(); |
|
let mut attempt = 1u8; |
|
loop { |
|
let (inventories, all_healthy) = enumerator.enumerate_reporting_health().await?; |
|
if all_healthy || attempt >= ONESHOT_ATTEMPTS { |
|
return Ok(inventories); |
|
} |
|
debug!( |
|
attempt, |
|
"one-shot enumerate saw an unhealthy node — retrying" |
|
); |
|
tokio::time::sleep(ONESHOT_RETRY_DELAY).await; |
|
attempt += 1; |
|
} |
|
} |
|
|
|
/// Attempts a one-shot [`enumerate`] makes before returning whatever it last |
|
/// read, when a node keeps coming back unhealthy. |
|
const ONESHOT_ATTEMPTS: u8 = 4; |
|
|
|
/// Delay between one-shot [`enumerate`] retries. A first probe usually wakes an |
|
/// asleep device, so a short pause lets the next attempt read it cleanly. |
|
const ONESHOT_RETRY_DELAY: Duration = Duration::from_millis(300); |
|
|
The exit condition is all_healthy || attempt >= ONESHOT_ATTEMPTS (inventory.rs:271), where all_healthy is the logical AND of each probed node's probe.healthy. The two receiver kinds define healthy very differently:
- Bolt (
inventory.rs:537, probe_bolt_receiver): healthy = pairing_count.is_some_and(|count| paired.len() == count). A device missing from the arrival drain lowers paired.len() below count → healthy = false → the retry fires. ✅
- Unifying (
inventory.rs:622, probe_unifying_receiver): healthy = pairing_count.is_some(). Offline Unifying devices aren't enumerable yet (the 0xB5/0x5N pairing-info sub-register format isn't resolved), so paired.len() != count is expected and cannot be the health signal. The arrival drain is the only device source, so a device whose arrival event arrives after ARRIVAL_DRAIN (1.5 s) is silently dropped — yet pairing_count.is_some() is still true → healthy = true → the retry never fires. ❌
So on Unifying, a single timing-induced partial drain in openlogi list / diag returns a short device set with no retry, and (because that partial list is written to the per-node ledger's last_good) it is returned as-is.
Originally flagged by Greptile as a P2 on #237 and acknowledged as a known limitation in the probe_unifying_receiver code comment.
Impact
Proposed fix
Make the retry's early-exit transport-agnostic: stop when all_healthy OR the returned inventory is unchanged from the previous attempt.
- A Unifying partial drain → the missing device appears on a later attempt → the inventory changes → keep retrying until it stabilizes.
- Also removes a smaller wart: a chronically-unhealthy node (e.g. an asleep BT-direct device that never answers) keeps
all_healthy = false forever, forcing every one-shot call to burn all ONESHOT_ATTEMPTS even when the receiver's devices read cleanly on attempt 1. A "stable inventory" stop returns as soon as the set stops growing.
Cost: needs DeviceInventory / PairedDevice equality + a stable ordering for the comparison, and (if all_healthy is dropped as the fast path) a minimum of two passes on the happy path. Keeping all_healthy as the fast-exit and adding "unchanged inventory" as the fallback stop preserves the one-pass happy path while closing the Unifying gap.
Rejected alternative: a Unifying-specific "online-device count vs a known-good baseline" health signal — stateful, needs persistence, and has its own flapping failure modes.
Refs #237, #222, #218.
Summary
The one-shot
openlogi_hid::enumerate()free fn — used only by the CLI (openlogi list,openlogi diag *) — retries through a transient probe miss (added in #237), but its early-exit gate never trips for a Unifying receiver. A Unifying probe that drops a device because its arrival event landed after the drain window reportshealthy = true, so the retry loop returns the partial list immediately instead of retrying. #237's retry therefore covers the Bolt path (the #218 reporter's case) but is effectively a no-op on Unifying.The continuous consumer (the GUI/agent inventory watcher) is not affected: it re-enumerates every ~2 s forever, so a single partial drain self-corrects on a later tick. This gap is specific to the single-shot CLI path.
Mechanism
#237 made the one-shot
enumerate()retry while any probed node is unhealthy:OpenLogi/crates/openlogi-hid/src/inventory.rs
Lines 257 to 290 in a823986
The exit condition is
all_healthy || attempt >= ONESHOT_ATTEMPTS(inventory.rs:271), whereall_healthyis the logical AND of each probed node'sprobe.healthy. The two receiver kinds definehealthyvery differently:inventory.rs:537,probe_bolt_receiver):healthy = pairing_count.is_some_and(|count| paired.len() == count). A device missing from the arrival drain lowerspaired.len()belowcount→healthy = false→ the retry fires. ✅inventory.rs:622,probe_unifying_receiver):healthy = pairing_count.is_some(). Offline Unifying devices aren't enumerable yet (the0xB5/0x5Npairing-info sub-register format isn't resolved), sopaired.len() != countis expected and cannot be the health signal. The arrival drain is the only device source, so a device whose arrival event arrives afterARRIVAL_DRAIN(1.5 s) is silently dropped — yetpairing_count.is_some()is still true →healthy = true→ the retry never fires. ❌So on Unifying, a single timing-induced partial drain in
openlogi list/diagreturns a short device set with no retry, and (because that partial list is written to the per-node ledger'slast_good) it is returned as-is.Originally flagged by Greptile as a P2 on #237 and acknowledged as a known limitation in the
probe_unifying_receivercode comment.Impact
openlogi list/openlogi diag *occasionally under-reports devices on a Unifying receiver, with no automatic retry to recover within the single invocation.Proposed fix
Make the retry's early-exit transport-agnostic: stop when
all_healthyOR the returned inventory is unchanged from the previous attempt.all_healthy = falseforever, forcing every one-shot call to burn allONESHOT_ATTEMPTSeven when the receiver's devices read cleanly on attempt 1. A "stable inventory" stop returns as soon as the set stops growing.Cost: needs
DeviceInventory/PairedDeviceequality + a stable ordering for the comparison, and (ifall_healthyis dropped as the fast path) a minimum of two passes on the happy path. Keepingall_healthyas the fast-exit and adding "unchanged inventory" as the fallback stop preserves the one-pass happy path while closing the Unifying gap.Rejected alternative: a Unifying-specific "online-device count vs a known-good baseline" health signal — stateful, needs persistence, and has its own flapping failure modes.
Refs #237, #222, #218.