fix(hid): keep last-known-good devices when a receiver read is incomplete (#218)#221
fix(hid): keep last-known-good devices when a receiver read is incomplete (#218)#221buliwyf42 wants to merge 1 commit into
Conversation
Greptile SummaryThis PR fixes the device-list flapping bug (#218) on Logi Bolt receivers by introducing
Confidence Score: 5/5Safe to merge — the change is narrowly scoped to inventory polling, adds no new externally-visible states that existing callers need to handle exhaustively, and all affected code paths are covered by new unit tests. The completeness check (receiver_read_complete) is simple, deterministic, and well-tested; the Enumerator correctly updates the probe cache for successfully-read devices even on incomplete cycles while skipping eviction; and WatchState::classify preserves the pre-existing Err→suppress / Ok→forward contract while making it testable. The one-shot retry loop in the free enumerate() function is bounded and reuses the warmed channel. No regressions are apparent for the direct-device or multi-receiver paths. No files require special attention. Important Files Changed
Sequence DiagramsequenceDiagram
participant W as Watcher Thread
participant E as Enumerator
participant R as Bolt Receiver (HID++)
participant A as Agent Orchestrator
loop Every ~2 s
W->>E: enumerate()
E->>R: count_pairings()
alt count_pairings succeeds → N
E->>R: probe_bolt_slots 1..6
alt "paired.len() >= N (complete)"
E-->>W: Ok(full inventory)
W->>W: classify → Some(Snapshot)
W->>A: Snapshot(inventory)
Note over A: Overwrites inventory state ✓
else "paired.len() < N (slot timeout)"
Note over E: cache updated for read slots, evict_unseen skipped
E-->>W: Err(Incomplete)
W->>W: "classify → None (succeeded=true)"
Note over A: Keeps last-known-good snapshot ✓
end
else count_pairings fails → None
Note over E: receiver_read_complete(None, _) = false
E-->>W: Err(Incomplete)
W->>W: "classify → None (succeeded=true)"
Note over A: Keeps last-known-good snapshot ✓
end
end
Note over W,A: Genuine disconnect → clean Ok(smaller count) → forwarded as Snapshot
Reviews (2): Last reviewed commit: "fix(hid): keep last-known-good devices w..." | Re-trigger Greptile |
…lete (AprilNEA#218) The device list flapped between all paired devices and "No devices connected" roughly every 2s on a Logi Bolt receiver (macOS 27 beta, MX Master 4 + MX Master 3S + MX Mechanical Mini). Root cause: a transient HID++ register read — count_pairings() or a per-slot get_device_pairing_information() timing out, or a whole device probe exceeding PROBE_BUDGET — left enumerate() returning Ok with a *truncated* device list. The agent's resilience is built on the rule "Err = couldn't check, keep the last snapshot; Ok = authoritative", but a partial read was mis-classified as an authoritative Ok. So Orchestrator::refresh_inventory overwrote the last-known-good set with the short/empty list, the IPC inventory() poll served it, and the GUI rendered "No devices connected" — then a good cycle restored all three. Fix: surface a partial read as InventoryError::Incomplete so the existing keep-last-snapshot machinery engages instead of publishing it. - inventory.rs: a Bolt receiver read is "complete" only when count_pairings() succeeds and at least that many slots read (receiver_read_complete); a probe timeout counts as incomplete too. On an incomplete cycle enumerate() skips cache eviction and returns Incomplete rather than a truncated list. - The one-shot enumerate() free fn (CLI / diagnostics) has no last-known-good to fall back on, so it retries an Incomplete cycle a few times before giving up — the two isolated runs in the report read 3 devices and 0; a retry would have read 3 both times. - watchers/inventory.rs: factor the tick->event decision into WatchState::classify (no behaviour change — Err already kept the last snapshot) and unit-test that an incomplete read keeps the last snapshot while a clean empty Ok is still forwarded as a disconnect. A genuine unpair/disconnect still reads cleanly as an Ok carrying the receiver's real, smaller count and propagates exactly as before. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
0bf7384 to
560efef
Compare
|
Closing in favor of #222 — it fixes the same transient-probe-failure root cause behind #218, and more thoroughly than this PR. We independently landed on the same core signal (a Bolt read is healthy only when The only bit here that #222 doesn't cover is the one-shot |
Summary
Fixes the device-list flapping half of #218: on a Logi Bolt receiver the GUI oscillated between showing all paired devices and "No devices connected" roughly every 2 s, so they never stayed long enough to configure. Reported on 0.6.6, macOS 27 beta, Apple Silicon, with an MX Master 4 + MX Master 3S + MX Mechanical Mini on one receiver.
This PR is scoped to the flap only. The device-images half of #218 is already addressed by #142 (Settings → Assets runtime sync, no CLI needed), #137 (match every
modelId), and #140 (retry sync with backoff), so I've left it to those.Root cause
The agent's inventory resilience rests on one rule, stated in
watchers/inventory.rs:A transient HID++ register read broke that rule.
count_pairings()and the per-slotget_device_pairing_information()are independent HID++ 1.0 register reads; on the macOS 27-beta IOHID stack (with the brand-new MX Master 4 adding notification traffic) they time out intermittently. A failed slot read was silently skipped, soprobe_onereturnedOkwith a truncatedpairedlist — possibly empty — and a whole-receiver probe timeout dropped the receiver entirely.enumerate()already detected the shortfall (paired.len() != pairing_count) but only logged a warning.That truncated
OkreachedOrchestrator::refresh_inventory, which unconditionally overwritesInventoryState::Ready; the IPCinventory()poll served the short/empty list and the GUI rendered "No devices connected" until the next good cycle ~2 s later. The GUI's ownINVENTORY_MISS_GRACE = 2rides out 1–2 transient empties, but a run of ≥3 (plausible on this stack) drops the devices anyway — and the agent-side snapshot flapped regardless.I also checked the duplicate-agent spawn race from #207's review (the
open -npath inipc_client.rs). It's bounded —just_disconnectedskips the spawn gate on the drop tick,SPAWN_RETRY_PERIOD = 30 srate-limits respawns, and the agent'ssingle_instance::acquire("agent.lock")makes any duplicate exit — so it can't produce a 2 s flap, and the report's own evidence (one agent process; isolated cold agent runs flap) rules it out. This PR doesn't touchipc_client.rs, so it won't conflict with #207.The fix
Make
enumerate()honest about partial reads so the existing keep-last-snapshot machinery engages, instead of papering over it in every consumer.openlogi-hid/inventory.rsInventoryError::Incomplete.receiver_read_complete(pairing_count, paired_len): a Bolt read is complete only whencount_pairings()succeeded and at least that many slots read. An unreadable count or a short read is incomplete; reading more than the count is still complete.probe_onereports that completeness; a probe timeout counts as incomplete too.enumerate()skips cache eviction (a device merely unread this cycle keeps its entry) and returnsIncompleterather than publishing a truncated list.enumerate()free fn (CLI /diag) has no last-known-good to fall back on, so it retries anIncompletecycle a few times before giving up — the report's two isolated runs read 3 devices and 0; a retry would have read 3 both times.openlogi-agent-core/watchers/inventory.rs— factor the tick→event decision intoWatchState::classify(no behaviour change; theErrarm already kept the last snapshot) so it's unit-testable.A genuine unpair/disconnect still reads cleanly as an
Okcarrying the receiver's real, smaller count, so the list still shrinks when devices actually leave.Tests
New unit tests (host-only, no hardware):
receiver_read_complete:Some(3)/3andSome(0)/0complete;Some(3)/2,Some(3)/0,None/*incomplete.WatchState::classify: an incomplete read after a success keeps the last snapshot; a clean emptyOkis still forwarded as a disconnect; persistent initial failure reportsUnavailableonce, then recovers.I couldn't build
openlogi-guilocally (it compiles Metal shaders against a full Xcode toolchain I don't have), but the GUI is untouched,InventoryErrorhas no exhaustive match anywhere, and the new variant is additive — so the GUI build is unaffected. CI covers it.Manual hardware verification still needed
I don't have the Bolt-receiver hardware, so the live repro needs a hand (ideally @buliwyf42, who filed it):
receiver read incomplete this cycle … keeping last snapshotand emits no snapshot, instead of flapping to "No devices".Okwith the smaller count), confirming genuine disconnects still propagate.Closes the flapping half of #218.