feat: add an event-driven multi-threaded Tokio executor (Tokio Executor 2/3) #654

Draft
azerupi wants to merge 2 commits into ros2-rust:main from azerupi:pr/2-tokio-executor

@azerupi azerupi commented Jun 21, 2026

Contributor

Note: This is based on #653. I was hoping to target that PR as the base branch, but that seems to work only if the branches are in the ros2_rust repo and not in my fork.


This is part of a set of PRs that attempt to add an event-driven Tokio executor to rclrs. The goal is to have something that:

  1. Works similarly to the events executor in rclcpp
  2. Interoperates nicely with the tokio ecosystem

I tried to split the work up into multiple PRs that build on top of each other to make the reviewing easier.

  1. feat: add push-callback registration for rcl primitives (Tokio Executor 1/3) #653
  2. feat: add an event-driven multi-threaded Tokio executor (Tokio Executor 2/3) #654
  3. feat: support actions on the Tokio executor (Tokio Executor 3/3) #655

add an event-driven multi-threaded Tokio executor

Why

The basic executor discovers work by polling rcl_wait. This PR adds a second executor, built on Tokio, that learns about work from the push callbacks added in #653. There are two main motivations for this:

  1. Performance
    Using the event interface from rcl we can get better performance with less CPU overhead.

  2. Interoperability
    Callbacks and tasks run on a real Tokio runtime, so you can use Tokio from inside a ROS callback: spawn a task, await a tokio::time timer, drive async I/O, or call a library built on Tokio.

How it works

Each worker gets its own tokio::sync::mpsc mailbox and a single Tokio task that drains it. A push callback firing in the middleware sends a message into the mailbox. The worker task takes it and runs the entity's callback against the worker's payload.

One task per worker gives a few things for free. A Tokio task is never polled by two threads at once, so that task provides mutual exclusion and FIFO ordering within a worker without locking on the hot path. Tokio's scheduler provides the thread pool and work stealing, so different workers run on different threads automatically. There is no per-event task spawn: a single subscriber runs at full speed, and independent workers scale across cores without extra configuration.

Subscriptions, services, and clients are driven by push callbacks. Timers have no push callback, so each timer gets a small tokio::time driver task that sleeps until the next deadline and enqueues a tick on the worker's mailbox.
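
The mailbox-per-worker idea can be sketched with the standard library alone. This is a minimal illustration, not the code in this PR: a plain thread and `std::sync::mpsc` stand in for the Tokio task and the Tokio channel, and `Worker`, `Message`, `notify`, and `finish` are hypothetical names.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical mailbox message: which entity became ready.
enum Message {
    Ready { entity_id: usize },
    Shutdown,
}

/// Hypothetical worker: one mailbox plus the single task that drains it.
/// A plain thread stands in for the Tokio task in this sketch.
pub struct Worker {
    mailbox: mpsc::Sender<Message>,
    task: thread::JoinHandle<Vec<usize>>,
}

impl Worker {
    pub fn spawn() -> Self {
        let (mailbox, inbox) = mpsc::channel();
        let task = thread::spawn(move || {
            // The payload is owned by this one task, so callbacks for this
            // worker never race on it and need no lock.
            let mut payload = Vec::new();
            for message in inbox {
                match message {
                    // "Run the callback": here it only records the entity.
                    Message::Ready { entity_id } => payload.push(entity_id),
                    Message::Shutdown => break,
                }
            }
            payload
        });
        Worker { mailbox, task }
    }

    /// What a push callback would do: enqueue a message and return.
    pub fn notify(&self, entity_id: usize) {
        let _ = self.mailbox.send(Message::Ready { entity_id });
    }

    /// Stop the worker and return its payload (the order callbacks ran in).
    pub fn finish(self) -> Vec<usize> {
        let _ = self.mailbox.send(Message::Shutdown);
        self.task.join().expect("worker thread panicked")
    }
}
```

Because a single consumer drains the channel, notifications for entities 3, 1, and 2 are handled in exactly that order, which is the FIFO and mutual-exclusion property described above.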

The spin contract

Callbacks for ROS entities only run while spin() is active. Worker tasks observe a gate and park when spinning stops. This preserves the rclrs contract that nothing runs before you spin, and that nothing is still running once you stop. spin() waits for any in-flight callback to finish before it returns.
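
One way to picture the gate is a flag plus an in-flight counter behind a mutex and condition variable. This is a sketch under assumptions, not the mechanism used in this PR: `SpinGate`, `start`, `stop`, and `run_callback` are hypothetical names, blocking threads stand in for parked Tokio tasks, and a panicking callback is not handled.

```rust
use std::sync::{Arc, Condvar, Mutex};

/// Hypothetical gate state shared by the executor and its workers.
#[derive(Default)]
struct GateState {
    spinning: bool,
    in_flight: usize,
}

#[derive(Clone, Default)]
pub struct SpinGate {
    inner: Arc<(Mutex<GateState>, Condvar)>,
}

impl SpinGate {
    /// Open the gate: parked workers may start running callbacks.
    pub fn start(&self) {
        let (state, changed) = &*self.inner;
        state.lock().unwrap().spinning = true;
        changed.notify_all();
    }

    /// Close the gate, then block until every in-flight callback is done.
    pub fn stop(&self) {
        let (state, changed) = &*self.inner;
        let mut guard = state.lock().unwrap();
        guard.spinning = false;
        while guard.in_flight > 0 {
            guard = changed.wait(guard).unwrap();
        }
    }

    /// Worker side: park until spinning, then run one callback.
    pub fn run_callback<R>(&self, callback: impl FnOnce() -> R) -> R {
        let (state, changed) = &*self.inner;
        let mut guard = state.lock().unwrap();
        while !guard.spinning {
            guard = changed.wait(guard).unwrap();
        }
        guard.in_flight += 1;
        drop(guard); // do not hold the lock while user code runs

        let result = callback();

        state.lock().unwrap().in_flight -= 1;
        changed.notify_all();
        result
    }
}
```

A worker that calls `run_callback` before `start()` blocks without running anything, and `stop()` returns only once the in-flight count is back to zero, which mirrors the two halves of the contract.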

spin() honors the same SpinOptions as the basic executor. A zero or short timeout with only_next_available_work drains the currently available work and returns (the spin_once pattern). A timeout that elapses is reported as a Timeout error rather than a silent return, matching the basic executor.

Coalescing

While spinning is paused, or under a burst, the middleware can fire many notifications for the same entity. Each entity coalesces rather than letting the mailbox grow without bound. A notification always adds its event count to a pending counter, but enqueues a mailbox message only if one is not already pending. When the worker handles the entity it takes that many items.

This bounds the mailbox to at most one pending message per entity, and it does not lose messages when spinning resumes.
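
The coalescing rule can be sketched with an atomic counter. This is an illustration under assumptions rather than the PR's implementation: `Coalescer`, `notify`, and `take` are hypothetical names, and `std::sync::mpsc` stands in for the Tokio mailbox.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::Sender;

/// Hypothetical per-entity state: events counted but not yet handled.
pub struct Coalescer {
    entity_id: usize,
    pending: AtomicUsize,
}

impl Coalescer {
    pub fn new(entity_id: usize) -> Self {
        Coalescer { entity_id, pending: AtomicUsize::new(0) }
    }

    /// Push-callback side: always count the events, but enqueue a mailbox
    /// message only on the zero-to-nonzero transition.
    pub fn notify(&self, events: usize, mailbox: &Sender<usize>) {
        if events == 0 {
            return;
        }
        let previously_pending = self.pending.fetch_add(events, Ordering::AcqRel);
        if previously_pending == 0 {
            let _ = mailbox.send(self.entity_id);
        }
    }

    /// Worker side: claim the whole accumulated count in one step.
    pub fn take(&self) -> usize {
        self.pending.swap(0, Ordering::AcqRel)
    }
}
```

Three notifications carrying 1, 3, and 1 events leave a single message in the mailbox, and `take()` then returns 5, so the worker knows how many items to pull from the entity. The next notification after `take()` sees a zero counter and enqueues again.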

Feature flag

The executor lives behind a tokio-executor Cargo feature, enabled by default. Opting out with default-features = false drops the Tokio multi-threaded runtime and macros for a lighter build, without affecting the basic executor.

Performance

Numbers below were measured on a Ryzen 9 9900X (12 cores) under a continuous saturate workload: the executor spins on a background thread, a publisher thread publishes as fast as it can, and the main thread samples the received count in one-second windows.

Figures are the median of at least five reps, with threads pinned to a single CCD (publisher on core 0, executor on cores 1 to 5). Comparison points are rclcpp's SingleThreaded, MultiThreaded, and Events executors built and run the same way. All subscribers here are worker subscribers.
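
For anyone reproducing the methodology, the reduction from sampled counts to a reported figure can be sketched as below. The helper names `per_window` and `median` are hypothetical and this is not the benchmark harness itself; it only shows the arithmetic of per-window deltas and a median over repetitions.

```rust
/// Turn cumulative received-message counts, sampled once per window,
/// into per-window throughput.
pub fn per_window(samples: &[u64]) -> Vec<u64> {
    samples
        .windows(2)
        .map(|pair| pair[1].saturating_sub(pair[0]))
        .collect()
}

/// Median of a set of repetitions, or None when there are none.
/// For an even count this is the lower-rounded mean of the two middle values.
pub fn median(reps: &[u64]) -> Option<u64> {
    if reps.is_empty() {
        return None;
    }
    let mut sorted = reps.to_vec();
    sorted.sort_unstable();
    let mid = sorted.len() / 2;
    Some(if sorted.len() % 2 == 1 {
        sorted[mid]
    } else {
        // Written this way to avoid overflow when adding two large counts.
        sorted[mid - 1] + (sorted[mid] - sorted[mid - 1]) / 2
    })
}
```

With one-second windows each delta is already a messages-per-second value, so the median of those deltas across repetitions is directly comparable to the table entries.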

For a single subscriber the Tokio executor is competitive with the basic executor and ahead of every rclcpp executor. Throughput in messages per second:

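| Executor | 0 B payload (msg/s) | 64 KB payload (msg/s) |
| --- | --- | --- |
| rclrs basic | ~478k | ~237k |
| rclrs tokio | ~455k | ~235k |
| rclcpp events | ~393k | ~232k |
| rclcpp single-thread | ~134k | ~179k |
| rclcpp multi-thread | ~4k | ~91k |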

At a fixed 10 kHz delivery rate with no loss, I observed the following CPU usage at 0 B:

  • rclcpp events 5%
  • rclrs tokio 6%
  • rclcpp single 8%
  • rclrs basic 9%

Many other benchmarks could be run here, each exercising a different scenario. I encourage everyone to run their own so that we can find pathological cases, current limitations, and optimization opportunities for follow-up PRs.

Known limitations

I will keep a list of the known current limitations and major performance issues that have been identified so far.

Actions

Actions are not supported in this PR yet. The follow-up PR #655 adds them.

High-rate timers

High-rate timers are currently capped. For example, a 1 kHz timer fires at roughly 600 Hz (measured); lower rates such as 100 Hz are fine. Each timer is driven by its own task, which sleeps with tokio::time until the next deadline and then waits for the worker to run the callback before scheduling the next fire. The rate is therefore bounded by Tokio's timer granularity (around 1 ms) plus that per-fire round trip.
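
To make the bound concrete: 600 Hz corresponds to about 1.67 ms per fire (1 / 600 s) against a 1 ms target period. The loop below is a minimal sketch of such a driver, not the code in this PR. It uses `std::thread::sleep` and `std::sync::mpsc` in place of tokio::time and the Tokio mailbox, `Tick` and `drive_timer` are hypothetical names, and it assumes the next deadline is computed from the moment the callback finished.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical tick message: the worker signals `done` after the callback.
pub struct Tick {
    pub done: Sender<()>,
}

/// Fire `ticks` times at `period`, waiting for each callback to finish
/// before scheduling the next fire. Returns the total elapsed time.
pub fn drive_timer(period: Duration, ticks: usize, mailbox: &Sender<Tick>) -> Duration {
    let start = Instant::now();
    let mut deadline = start + period;
    for _ in 0..ticks {
        // Sleep until the deadline; an overdue deadline sleeps for zero time.
        thread::sleep(deadline.saturating_duration_since(Instant::now()));

        let (done, finished) = mpsc::channel();
        if mailbox.send(Tick { done }).is_err() {
            break; // the worker is gone
        }
        // Per-fire round trip: block until the worker has run the callback.
        let _ = finished.recv();

        // Assumption: schedule relative to now, so lateness accumulates
        // instead of being made up by firing early.
        deadline = Instant::now() + period;
    }
    start.elapsed()
}
```

Under that assumption each fire costs at least `period` plus the sleep overshoot plus the round trip, so the achieved rate is always below `1 / period`, and the gap grows as the period approaches the sleep granularity.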

To support high-rate timers we will most likely need something other than the Tokio timers, but I'm leaving this for a follow-up PR.

Simulation time

The current PR does not attempt to support simulation time (see #654 (comment)).

azerupi added 2 commits June 21, 2026 01:54
Add an opt-in way for primitives to report readiness via rcl's push
callbacks (rcl_*_set_on_new_*_callback) instead of being polled in a wait
set, as the foundation for an event-driven executor.

`RclPrimitive::register_on_ready` installs a callback that the middleware
invokes when the entity becomes ready and returns an `OnReadyHandle`
(RAII) that deregisters on drop. `OnReadyRegistration` wraps the unsafe
rcl setter: it boxes the callback context for a stable address and, on
drop, clears the callback before freeing the context (finalizing the rcl
entity first) so the middleware can never invoke a freed context during
teardown.

Implemented for subscriptions, services, and clients. No executor consumes
this yet, so the basic executor is unchanged.

Add a Tokio-based executor (behind the `tokio-executor` feature,
enabled by default) that learns entity readiness from the rcl push
callbacks added in the previous commit instead of polling rcl_wait.

Each Worker drains its own mailbox on a dedicated Tokio task, so one
Worker's callbacks are serialized and ordered while independent Workers
run concurrently across Tokio's thread pool — multi-core concurrency with
no per-event task spawn. Subscriptions, services, and clients are driven
by push callbacks; timers by tokio::time. Worker tasks are gated by
spinning (callbacks only run while spinning, and spin() waits for
in-flight callbacks before returning); spin() honors
only_next_available_work (spin_once) and reports a timeout as a Timeout
error, matching the basic executor. Notifications coalesce per entity to
bound the mailbox, a panicking callback is contained rather than wedging
the worker, and push-callback registrations finalize the rcl entity before
freeing their context to avoid a teardown use-after-free.

Opt out with `default-features = false` to drop the Tokio multi-threaded
runtime and macros for a lighter build. Action support follows in a
separate commit.

@balthasarschuess

Contributor

First of all, thanks for your effort. This PR gets me excited!

I think the main point these PRs do not address yet is handling simulation time:

  • timer handling on simulation time (custom sleep futures for tokio on ros time)
  • sleeps in callbacks
  • jump time callbacks (also scoped by worker payload)

A key difficulty is that clock updates (imo) should not run inside the executor, as they do in cpp, where simulation-time sleeps in callbacks block single-threaded executors and may block multi-threaded executors. At the same time, we need some synchronization mechanism to make sure all jump callbacks have run before the clock continues and old messages are processed or discarded.

azerupi commented Jun 22, 2026

Contributor Author

Indeed, good callout. I haven't attempted to support simulation time yet. That sounds like a deep rabbit hole I'm not totally ready to jump into. 😄 I've added it to the list of known limitations.
