fix(hook): break the macOS run loop on a stop flag, not CFRunLoopStop alone#263
fix(hook): break the macOS run loop on a stop flag, not CFRunLoopStop alone#263AprilNEA wants to merge 1 commit into
Conversation
… alone CFRunLoopStop is a no-op when it lands in the gap between the hook thread's 500ms run_in_mode slices: the shutdown signal is dropped, the loop re-enters a fresh slice, and stop()'s thread.join() blocks forever with the event tap still live at the HID location. Add an AtomicBool checked at the top of every slice. stop() now sets the flag before calling run_loop.stop(), so shutdown always terminates: promptly via the CF stop while a slice is running, otherwise within one 500ms slice via the flag. Mirrors the Linux backend's stop-flag pattern.
Greptile SummaryThis PR fixes a genuine hang-on-teardown race in the macOS hook backend where
Confidence Score: 4/5Safe to merge — the fix is a small, self-contained addition that eliminates a real permanent-hang path and always falls through to the existing disable_tap() cleanup. The AtomicBool pattern is correct and already proven in the Linux backend. The only gap is that the flag is not rechecked in the TimedOut/HandledSource arm, leaving a cosmetic enabled→disabled tap toggle and a small IPC overhead in the race case. Only crates/openlogi-hook/src/macos.rs changed; the race-case flag-recheck gap in the TimedOut/HandledSource arm is worth a follow-up. Important Files Changed
Sequence DiagramsequenceDiagram
participant C as Caller (stop())
participant T as Hook Thread
Note over C,T: Common case — stop fires during run_in_mode slice
C->>C: stop.store(true, Relaxed)
C->>T: run_loop.stop()
T-->>T: run_in_mode returns Stopped
T-->>T: break → disable_tap()
C->>T: thread.join()
T-->>C: joined (~0 ms)
Note over C,T: Race case — stop fires between slices
T-->>T: flag check passes (false)
T-->>T: enter run_in_mode (500 ms slice)
C->>C: stop.store(true, Relaxed)
C->>T: run_loop.stop() [no-op, loop not running]
T-->>T: run_in_mode returns TimedOut
T-->>T: has_accessibility() + tap.enable()
T-->>T: loop top: flag check → true → break → disable_tap()
C->>T: thread.join()
T-->>C: joined (≤500 ms + IPC overhead)
|
Problem
macos::stop()relies solely onCFRunLoopStopto terminate the hook thread. ButCFRunLoopStopis a no-op unless the run loop is currently running: it only sets the stop flag whenrl->_currentModeis non-null. The hook thread services the tap in 500 msrun_in_modeslices and, between slices, runshas_accessibility()(a TCC IPC) andtap.enable().If
stop()fires in that between-slices gap, the CF stop is dropped, nothing records the request, the thread re-enters a fresh 500 ms slice, andstop()'sthread.join()blocks forever — with theCGEventTapstill live at the HID location. The window is small (a few ms out of every 500 ms) but the failure mode is a permanent hang on hook teardown.Fix
Add an
AtomicBoolstop flag, checked at the top of every run-loop slice — mirroring the Linux backend's existing stop-flag pattern.stop()now sets the flag before callingrun_loop.stop(), so shutdown always terminates:run_in_mode):run_loop.stop()wakes it immediately → returnsStopped→ breaks, ~0 ms;Either way the loop exits and falls through to the existing
disable_tap(), so the tap is always detached on teardown.Relaxedordering suffices: the flag carries no other data, andthread.join()provides the final synchronisation.Testing
cargo clippy -p openlogi-hook --all-targets -- -D warnings— clean.Not unit-tested: the path is FFI/timing-bound (a real repro needs Accessibility + a live
CGEventTap), and the crate has no macOS unit tests today. A flag-only assertion would be tautological. The change is small and verifiable by inspection; happy to extract the slice-loop's termination decision behind a seam if a regression test is wanted.