Skip to content

fix: action server clock use-after-free#651

Open
azerupi wants to merge 3 commits into
ros2-rust:mainfrom
azerupi:fix/action-server-clock-uaf
Open

fix: action server clock use-after-free#651
azerupi wants to merge 3 commits into
ros2-rust:mainfrom
azerupi:fix/action-server-clock-uaf

Conversation

@azerupi

@azerupi azerupi commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

While working on something else I discovered the action server had two bugs related to teardown. These bugs have been narrowed down and fixed using Claude code.

Bug 1: Action server dereferences a freed clock on goal teardown

rcl_action_server_init is given a raw pointer to the node's rcl_clock for the action server's expired-goals timer. rcl_action_notify_goal_done, called whenever a goal terminates or is dropped, dereferences that clock.

But ActionServerHandle only kept the node handle alive, not the clock (which is owned by the node's NodeState). A goal holds Arc<ActionServerHandle>, so it kept the rcl_action_server and rcl_node alive but not the clock. When a node's NodeState is dropped while a goal still lives (e.g. a goal-handler future still being torn down at executor/context shutdown), the clock is freed and the goal's Drop calls rcl_action_notify_goal_done against it.

Fix: ActionServerHandle now holds the Clock (it's Arc-refcounted), so the clock lives as long as the server and any goal handle that references it.

Bug 2 — rcl_action_server_fini is never called (resource leak)

ActionServerHandle had no Drop, so rcl_action_server_fini was never invoked. The action server and its underlying services, publishers, and expired-goals timer leaked on every drop. The action client (ActionClientHandle::drop) already finalizes itself the standard way (as do ServiceHandle/ClientHandle). The server was simply missing it.

Fix: add a Drop that finalizes the server, mirroring the action client. It runs before the struct's fields are dropped, so the clock kept alive by Bug 1's fix is still valid when fini touches the expired-goals timer, which is why the two fixes belong together.

Testing

  • Regression test added: asserts that creating an action server takes a strong reference to the clock it borrows by raw pointer. Fails deterministically without Bug 1's fix, passes with it. I chose this invariant assertion over a crash-based test because a UAF doesn't crash reliably, so a crash test would be flaky, whereas this fails the instant the clock reference is dropped.
  • The full single-process lib suite — which SIGSEGV'd ~25–33% of runs before ran clean across 32 runs under gdb after the fix.

azerupi added 2 commits June 20, 2026 18:01
An action server's expired-goals timer borrows the node's clock by raw
pointer (passed to rcl_action_server_init), and rcl_action_notify_goal_done
-- called whenever a goal terminates or is dropped -- dereferences it. But
ActionServerHandle kept only the node handle alive, not the clock (which is
owned by the node's NodeState). A goal holds an Arc<ActionServerHandle>, so
a goal that outlives the node -- e.g. a goal-handler future still being torn
down at executor/context shutdown -- would, on drop, call
rcl_action_notify_goal_done against a freed clock: an intermittent
use-after-free at teardown (manifesting as a "transition_to_aborted
RCL_RET_ERROR" log or a SIGSEGV).

Keep the clock alive by holding it in ActionServerHandle so it lives as long
as the rcl_action_server and any goal handle that references it.

Adds a regression test asserting the server takes a reference to the clock it
borrows; without the fix it fails deterministically. The full single-process
test suite, which segfaulted ~25-33% of runs before, was clean across 32 runs
under gdb after.
ActionServerHandle had no Drop, so rcl_action_server_fini was never called:
the rcl action server -- and its underlying services, publishers, and
expired-goals timer -- leaked every time an action server was dropped. The
action *client* already finalizes itself this way (ActionClientHandle::drop),
so the server was an oversight.

Add a Drop that finalizes the server, mirroring the action client and the
service/client handles. It runs before the struct's fields are dropped, so
the clock the handle now holds is still alive when rcl_action_server_fini
touches the expired-goals timer.
@azerupi azerupi changed the title Fix action server clock use-after-free fix: action server clock use-after-free Jun 21, 2026

@mxgrey mxgrey left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this fix, looks good! I just left one minor remark on a field attribute.

Comment thread rclrs/src/action/action_server.rs Outdated
@azerupi azerupi force-pushed the fix/action-server-clock-uaf branch from 3628296 to 8ffffec Compare June 22, 2026 18:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants