fix(iroh): evict cached mapped addrs when a RemoteStateActor shuts down#4294
Open
drlukeangel wants to merge 1 commit into
Open
fix(iroh): evict cached mapped addrs when a RemoteStateActor shuts down#4294drlukeangel wants to merge 1 commit into
drlukeangel wants to merge 1 commit into
Conversation
AddrMap (the per-remote endpoint/relay address cache behind mapped_addrs) has no eviction path and is never pruned. On clean RemoteStateActor shutdown, remove_or_restart_actor removed the senders entry but left the remote's cached addresses in endpoint_addrs/relay_addrs forever, so they grew once per remote-ever-seen under churn. Add remove()/retain() to AddrMap (keeping the forward addrs and reverse lookup maps consistent) and evict the departing remote's cached addresses on the clean-shutdown path. Add a unit test for remove/retain forward+reverse consistency.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
While running iroh under sustained remote churn (remotes continuously appearing and going away) in
a downstream project, we saw the per-remote address cache grow without bound.
AddrMap— the cacheof mapped endpoint/relay addresses behind
mapped_addrs— has no eviction path and is never pruned.When a
RemoteStateActorshuts down cleanly,RemoteMap::remove_or_restart_actorremoves itssendersentry but leaves that remote's cached addresses inmapped_addrsforever, soendpoint_addrs/relay_addrsgrow once per remote-ever-seen.(Also: thank you for iroh — the endpoint/discovery/QUIC stack let us build a multi-node mesh without
hand-rolling NAT traversal or relay handling, which is exactly why we could focus on this one cache.)
Fix
Add
remove()andretain()toAddrMap(keeping the forwardaddrsand reverselookupmapsconsistent), and call them in the clean-shutdown branch of
remove_or_restart_actorto drop thegone remote's cached endpoint and relay addresses. Eviction happens only on clean shutdown, so
active remotes are untouched.
relay_addrsis keyed by(addr, remote_id), so it usesretaintodrop all entries for the departing remote.
Breaking Changes
None.
remove/retainarepub(super); behavior change is limited to releasing state for remotesthat have already gone away.
Notes & open questions
AddrMap::remove/retaincovering forward removal andreverse-lookup consistency (the subtle part — keeping
addrsandlookupin sync). Theend-to-end "cache tracks only live remotes" behavior is churn-dependent; we validated it via a
downstream heap profile under continuous remote kill/respawn, where the
mapped_addrsentriesstopped growing with remotes-ever-seen. A before/after
dhatcapture on your churn harness is thecleanest way to quantify it.
mapped_addrsentries (e.g. deliberatelycached across reconnects)? If so, the eviction point may want to be narrower — guidance welcome.
Checklist
retainremove/retain; integration validation noted aboveRelated
Found together with two other churn-related leak fixes in the iroh stack:
Connectingis dropped pre-handshakeResolves #4293