Skip to content

fix(netwatch): drain AF_ROUTE socket to avoid macOS route monitor stall#165

Merged
flub merged 1 commit into
n0-computer:mainfrom
cbenhagen:fix/macos-route-monitor-stall
Jun 16, 2026
Merged

fix(netwatch): drain AF_ROUTE socket to avoid macOS route monitor stall#165
flub merged 1 commit into
n0-computer:mainfrom
cbenhagen:fix/macos-route-monitor-stall

Conversation

@cbenhagen

@cbenhagen cbenhagen commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Description

Fixes a read stall in the BSD/macOS route monitor.

netwatch/src/netmon/bsd.rs reads the AF_ROUTE socket through a tokio::net::UnixStream with one AsyncRead::read per await, reading a single message per wakeup and never draining. On BSD/macOS the fd is registered edge-triggered, so leaving data queued can drop the next readiness notification: the reader then stalls with data still queued in the socket (FIONREAD > 0, poll reports it readable) and stops delivering NetworkMessage::Change. The loop only recreates the socket on a read error, and the stall surfaces no error, so it does not recover.

This drain-until-WouldBlock requirement is the documented contract of mio: mio's Poll says an operation "must be performed repeatedly until it returns WouldBlock", else "there is no guarantee that another readiness event will be delivered, even if further data is received".

Changes:

  • Read the socket via readable() + try_read() drained to WouldBlock on each wakeup, instead of one AsyncRead::read per await. The UnixStream and its construction are unchanged.
  • Keep the existing buffer growth, parse_rib filtering, and socket recreation with exponential backoff.
  • Add a deterministic test that read_available drains all queued messages in one readiness episode (via an AF_UNIX datagram socketpair).

Verified on macOS: under sustained route churn the old one-read-per-await reader stalls with data queued and the OS still reporting its socket readable, while the drain-to-WouldBlock reader keeps its socket empty.

Breaking Changes

None.

Notes & open questions

None.

Change checklist

  • Self-review.
  • Documentation updates following the style guide, if relevant.
  • Tests if relevant.
  • All breaking changes documented. (None.)

@cbenhagen cbenhagen force-pushed the fix/macos-route-monitor-stall branch from 74a39df to 0c744fe Compare June 16, 2026 08:33
@n0bot n0bot Bot added this to iroh Jun 16, 2026
@github-project-automation github-project-automation Bot moved this to 🚑 Needs Triage in iroh Jun 16, 2026

@flub flub left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this is looking good! Mostly some nits, thanks for finding this bug!

Comment thread netwatch/src/netmon/bsd.rs Outdated
Comment thread netwatch/src/netmon/bsd.rs
Comment thread netwatch/src/netmon/bsd.rs Outdated
Comment thread netwatch/src/netmon/bsd.rs Outdated
@github-project-automation github-project-automation Bot moved this from 🚑 Needs Triage to 🏗 In progress in iroh Jun 16, 2026
The BSD/macOS route monitor read the AF_ROUTE socket with one
AsyncRead::read per await and never drained. On an edge-triggered
reactor that can drop the next readiness notification, so under
sustained route churn the reader stalled with data still queued and
stopped delivering NetworkMessage::Change for the rest of the process.

Read via readable() + try_read() drained to WouldBlock instead. The
UnixStream and its construction are unchanged.
@cbenhagen cbenhagen force-pushed the fix/macos-route-monitor-stall branch from 0c744fe to bb19855 Compare June 16, 2026 10:34
@cbenhagen

Copy link
Copy Markdown
Contributor Author

Thanks for the quick review! Addressed all your comments.

@flub flub left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

(And please use the small "re-request review" button on github, it makes notifications a bit easier to handle on the reviewer-side. It is hidden under the "reviewers" menu to the right of each reviewer and only shows up when you have un-reviewed changed - not the best UX!)

@flub flub merged commit 6206b74 into n0-computer:main Jun 16, 2026
29 checks passed
@github-project-automation github-project-automation Bot moved this from 🏗 In progress to ✅ Done in iroh Jun 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

2 participants