fix: action server clock use-after-free#651
Open
azerupi wants to merge 3 commits into
Open
Conversation
An action server's expired-goals timer borrows the node's clock by raw pointer (passed to rcl_action_server_init), and rcl_action_notify_goal_done -- called whenever a goal terminates or is dropped -- dereferences it. But ActionServerHandle kept only the node handle alive, not the clock (which is owned by the node's NodeState). A goal holds an Arc<ActionServerHandle>, so a goal that outlives the node -- e.g. a goal-handler future still being torn down at executor/context shutdown -- would, on drop, call rcl_action_notify_goal_done against a freed clock: an intermittent use-after-free at teardown (manifesting as a "transition_to_aborted RCL_RET_ERROR" log or a SIGSEGV). Keep the clock alive by holding it in ActionServerHandle so it lives as long as the rcl_action_server and any goal handle that references it. Adds a regression test asserting the server takes a reference to the clock it borrows; without the fix it fails deterministically. The full single-process test suite, which segfaulted ~25-33% of runs before, was clean across 32 runs under gdb after.
ActionServerHandle had no Drop, so rcl_action_server_fini was never called: the rcl action server -- and its underlying services, publishers, and expired-goals timer -- leaked every time an action server was dropped. The action *client* already finalizes itself this way (ActionClientHandle::drop), so the server was an oversight. Add a Drop that finalizes the server, mirroring the action client and the service/client handles. It runs before the struct's fields are dropped, so the clock the handle now holds is still alive when rcl_action_server_fini touches the expired-goals timer.
mxgrey
approved these changes
Jun 22, 2026
mxgrey
left a comment
Collaborator
There was a problem hiding this comment.
Thanks for this fix, looks good! I just left one minor remark on a field attribute.
3628296 to
8ffffec
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
While working on something else I discovered the action server had two bugs related to teardown. These bugs have been narrowed down and fixed using Claude code.
Bug 1: Action server dereferences a freed clock on goal teardown
rcl_action_server_initis given a raw pointer to the node'srcl_clockfor the action server's expired-goals timer.rcl_action_notify_goal_done, called whenever a goal terminates or is dropped, dereferences that clock.But
ActionServerHandleonly kept the node handle alive, not the clock (which is owned by the node'sNodeState). A goal holdsArc<ActionServerHandle>, so it kept thercl_action_serverandrcl_nodealive but not the clock. When a node'sNodeStateis dropped while a goal still lives (e.g. a goal-handler future still being torn down at executor/context shutdown), the clock is freed and the goal'sDropcallsrcl_action_notify_goal_doneagainst it.Fix:
ActionServerHandlenow holds theClock(it'sArc-refcounted), so the clock lives as long as the server and any goal handle that references it.Bug 2 —
rcl_action_server_finiis never called (resource leak)ActionServerHandlehad noDrop, sorcl_action_server_finiwas never invoked. The action server and its underlying services, publishers, and expired-goals timer leaked on every drop. The action client (ActionClientHandle::drop) already finalizes itself the standard way (as doServiceHandle/ClientHandle). The server was simply missing it.Fix: add a
Dropthat finalizes the server, mirroring the action client. It runs before the struct's fields are dropped, so the clock kept alive by Bug 1's fix is still valid whenfinitouches the expired-goals timer, which is why the two fixes belong together.Testing
SIGSEGV'd ~25–33% of runs before ran clean across 32 runs under gdb after the fix.