feat: add an event-driven multi-threaded Tokio executor (Tokio Executor 2/3) by azerupi · Pull Request #654 · ros2-rust/ros2_rust

azerupi · 2026-06-21T17:38:09Z

Note: This is based on #653. I was hoping to target that PR as the base branch, but that seems to work only if the branches are in the ros2_rust repo and not in my fork.

This is part of a set of PRs that attempt to add an event-driven Tokio executor to rclrs. The goal is to have something that:

Works similarly to the events executor in rclcpp
Interoperates nicely with the tokio ecosystem

I tried to split the work up into multiple PRs that build on top of each other to make the reviewing easier.

add an event-driven multi-threaded Tokio executor

Why

The basic executor discovers work by polling rcl_wait. This PR adds a second executor, built on Tokio, that learns about work from the push callbacks added in #653. There are two main motivations for this:

Performance
Using the event interface from rcl we can get better performance with less CPU overhead.
Interoperability
Callbacks and tasks run on a real Tokio runtime, so you can use Tokio from inside a ROS callback: spawn a task, await a tokio::time timer, drive async I/O, or call a library built on Tokio.

How it works

Each worker gets its own tokio::sync::mpsc mailbox and a single Tokio task that drains it. When a push callback fires in the middleware, it sends a message into the mailbox. The worker task takes the message and runs the entity's callback against the worker's payload.

One task per worker gives a few things for free. A Tokio task is never polled by two threads at once, so that task provides mutual exclusion and FIFO ordering within a worker without locking on the hot path. Tokio's scheduler provides the thread pool and work stealing, so different workers run on different threads automatically. There is no per-event task spawn: a single subscriber runs at full speed, and independent workers scale across cores without extra configuration.
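As a rough illustration of the mailbox pattern described above (not the code in this PR), here is a minimal std-only sketch. A std::sync::mpsc channel and a thread stand in for the Tokio mailbox and task, and `Message`, `spawn_worker`, and the `Vec<u32>` payload are hypothetical names chosen for the example.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical mailbox message: which entity became ready.
enum Message {
    Ready(u32),
    Shutdown,
}

/// Spawns one drain loop for a worker. Returns the mailbox sender and a
/// join handle that yields the payload once the loop has shut down.
/// A thread stands in here for the single Tokio task per worker.
fn spawn_worker(mut payload: Vec<u32>) -> (mpsc::Sender<Message>, thread::JoinHandle<Vec<u32>>) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        // Only this loop ever touches `payload`, so no lock is needed,
        // and messages are handled in the order they were sent (FIFO).
        for msg in rx {
            match msg {
                // Stand-in for "run the entity's callback against the payload".
                Message::Ready(entity_id) => payload.push(entity_id),
                Message::Shutdown => break,
            }
        }
        payload
    });
    (tx, handle)
}
```

Sending `Ready(1)`, `Ready(2)`, `Ready(3)` and then `Shutdown` leaves the payload holding the ids in that same order, which is the per-worker ordering guarantee in miniature.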

Subscriptions, services, and clients are driven by push callbacks. Timers have no push callback, so each timer gets a small tokio::time driver task that sleeps until the next deadline and enqueues a tick on the worker's mailbox.
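The timer driver can be pictured roughly as follows. This is a simplified std-only sketch, assuming a thread and std::thread::sleep in place of a Tokio task and a tokio::time sleep. `Tick` and `run_timer_driver` are hypothetical names, and advancing from the previous deadline (rather than from "now") is a choice made for this example, not something confirmed by the PR.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical mailbox message for one timer firing.
#[derive(Debug, PartialEq)]
struct Tick(u32);

/// Sleeps until each deadline, then enqueues a tick on the worker's mailbox.
/// std::thread::sleep stands in for a tokio::time sleep.
fn run_timer_driver(period: Duration, ticks: u32, mailbox: mpsc::Sender<Tick>) {
    let mut deadline = Instant::now() + period;
    for n in 0..ticks {
        // Sleep only for the time remaining until the deadline.
        let remaining = deadline.saturating_duration_since(Instant::now());
        thread::sleep(remaining);
        if mailbox.send(Tick(n)).is_err() {
            return; // The worker is gone, so stop driving the timer.
        }
        // Assumption of this sketch: step from the previous deadline so that
        // late wake-ups do not accumulate as drift.
        deadline += period;
    }
}
```

The sketch leaves out the step where the driver waits for the worker to run the callback before scheduling the next fire.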

The spin contract

Callbacks for ROS entities only run while spin() is active. Worker tasks observe a gate and park when spinning stops. This preserves the rclrs contract that nothing runs before you spin, and that nothing is still running once you stop. spin() waits for any in-flight callback to finish before it returns.
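To make the gate idea concrete, here is a small sketch of one way such a gate could work, using a Mutex and Condvar from std instead of Tokio primitives. `SpinGate`, `GateState`, and the method names are hypothetical and are not taken from this PR.

```rust
use std::sync::{Condvar, Mutex};

/// Hypothetical gate state shared between spin() and the workers.
struct GateState {
    spinning: bool,
    in_flight: usize,
}

/// Hypothetical gate: workers may only run callbacks while it is open.
struct SpinGate {
    state: Mutex<GateState>,
    changed: Condvar,
}

impl SpinGate {
    fn new() -> Self {
        SpinGate {
            state: Mutex::new(GateState { spinning: false, in_flight: 0 }),
            changed: Condvar::new(),
        }
    }

    /// What spin() would do on entry: open the gate and wake parked workers.
    fn open(&self) {
        self.state.lock().unwrap().spinning = true;
        self.changed.notify_all();
    }

    /// What spin() would do on exit: close the gate, then wait until every
    /// in-flight callback has finished.
    fn close_and_wait(&self) {
        let mut state = self.state.lock().unwrap();
        state.spinning = false;
        while state.in_flight > 0 {
            state = self.changed.wait(state).unwrap();
        }
    }

    /// Worker side: park until the gate is open, then run one callback.
    fn run_when_open(&self, callback: impl FnOnce()) {
        let mut state = self.state.lock().unwrap();
        while !state.spinning {
            state = self.changed.wait(state).unwrap();
        }
        state.in_flight += 1;
        drop(state); // Do not hold the lock while the callback runs.
        callback();
        self.state.lock().unwrap().in_flight -= 1;
        self.changed.notify_all();
    }
}
```

A callback handed to `run_when_open` before `open()` simply parks, and `close_and_wait()` does not return while a callback is still running, which mirrors both halves of the contract. The sketch does not handle a panicking callback.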

spin() honors the same SpinOptions as the basic executor. A zero or short timeout with only_next_available_work drains the currently available work and returns (the spin_once pattern). A timeout that elapses is reported as a Timeout error rather than a silent return, matching the basic executor.

Coalescing

While spinning is paused, or under a burst, the middleware can fire many notifications for the same entity. Each entity coalesces rather than letting the mailbox grow without bound. A notification only enqueues a mailbox message if one is not already pending, and it accumulates the event count in a pending counter. When the worker handles the entity it takes that many items.

This bounds the mailbox to at most one pending message per entity, and it does not lose messages when spinning resumes.
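The coalescing rule can be sketched with a single atomic counter per entity. This is an illustrative sketch and not the PR's implementation: `Coalescer`, `notify`, and `take` are hypothetical names, and the memory orderings are a conservative choice for the example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical per-entity state: events reported but not yet handled.
struct Coalescer {
    pending: AtomicUsize,
}

impl Coalescer {
    fn new() -> Self {
        Coalescer { pending: AtomicUsize::new(0) }
    }

    /// Called from the push callback with the number of new events.
    /// Returns true when the caller should enqueue a mailbox message,
    /// that is, when nothing was pending before this notification.
    fn notify(&self, new_events: usize) -> bool {
        if new_events == 0 {
            return false;
        }
        self.pending.fetch_add(new_events, Ordering::AcqRel) == 0
    }

    /// Called by the worker when it handles the entity's mailbox message.
    /// Returns how many items to take and resets the counter to zero.
    fn take(&self) -> usize {
        self.pending.swap(0, Ordering::AcqRel)
    }
}
```

With this shape, three notifications of 1, 3, and 2 events enqueue only one message, and the worker's single `take()` returns 6, so no event is dropped while the mailbox holds at most one message for the entity.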

Feature flag

The executor lives behind a tokio-executor Cargo feature, enabled by default. Opting out with default-features = false drops the Tokio multi-threaded runtime and macros for a lighter build, without affecting the basic executor.

Performance

Numbers below were measured on a Ryzen 9 9900X (12 cores) under a continuous saturate workload: the executor spins on a background thread, a publisher thread publishes as fast as it can, and the main thread samples the received count in one-second windows.

Figures are the median of at least five reps, with threads pinned to a single CCD (publisher on core 0, executor on cores 1 to 5). Comparison points are rclcpp's SingleThreaded, MultiThreaded, and Events executors built and run the same way. All subscribers here are worker subscribers.

For a single subscriber the Tokio executor is competitive with the basic executor and ahead of every rclcpp executor. Throughput in messages per second:

Executor	0 B	64 KB
rclrs basic	~478k	~237k
rclrs tokio	~455k	~235k
rclcpp events	~393k	~232k
rclcpp single-thread	~134k	~179k
rclcpp multi-thread	~4k	~91k

At a fixed 10 kHz delivery rate with no loss, I observed the following CPU usage at 0 B:

rclcpp events 5%
rclrs tokio 6%
rclcpp single 8%
rclrs basic 9%

I think there are a lot of different benchmarks we could run here that each show different scenarios. I would encourage everyone to run their own benchmarks in order for us to find pathological cases or current limitations and potential performance optimizations for follow-up PRs.

Known limitations

I will keep a list of the known current limitations and major performance issues that have been identified so far.

Actions

In the current PR, Actions are not supported yet but I have a follow-up PR to fix that.

High-rate timers

High-rate timers are currently capped. For example, a 1 kHz timer fires at roughly 600 Hz (measured). Lower rates such as 100 Hz are fine. Each timer is driven by its own task, which sleeps with tokio::time until the next deadline and then waits for the worker to run the callback before scheduling the next fire. The rate is therefore bounded by Tokio's timer granularity (around 1 ms) plus that per-fire round-trip.
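For a sense of scale, the measured figure can be converted into a per-fire time budget. The helper below is only arithmetic on the numbers quoted above, and the function names are made up for the example: roughly 600 Hz corresponds to about 1.67 ms between fires, so about 0.67 ms per fire is spent beyond the nominal 1 ms period.

```rust
/// Average time between fires implied by a rate, in milliseconds.
fn per_fire_ms(rate_hz: f64) -> f64 {
    1000.0 / rate_hz
}

/// Extra time per fire beyond the nominal period, in milliseconds.
fn overhead_ms(nominal_hz: f64, observed_hz: f64) -> f64 {
    per_fire_ms(observed_hz) - per_fire_ms(nominal_hz)
}
```

Calling `overhead_ms(1000.0, 600.0)` gives approximately 0.67. How that overhead splits between timer granularity and the round-trip to the worker is not something this calculation can tell.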

To support high-rate timers we will most likely need to use something other than the Tokio timers, but I'm leaving this for a follow-up PR.

Simulation time

The current PR does not attempt to support simulation time (see #654 (comment)).

Add an opt-in way for primitives to report readiness via rcl's push callbacks (rcl_*_set_on_new_*_callback) instead of being polled in a wait set, as the foundation for an event-driven executor. `RclPrimitive::register_on_ready` installs a callback that the middleware invokes when the entity becomes ready and returns an `OnReadyHandle` (RAII) that deregisters on drop. `OnReadyRegistration` wraps the unsafe rcl setter: it boxes the callback context for a stable address and, on drop, clears the callback before freeing the context (finalizing the rcl entity first) so the middleware can never invoke a freed context during teardown. Implemented for subscriptions, services, and clients. No executor consumes this yet, so the basic executor is unchanged.
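The drop ordering described in this commit message (clear the callback first, free its context afterwards) can be illustrated with a toy model. Nothing here calls rcl: `Slot` stands in for the place where the middleware stores the callback, and `Registration` is a hypothetical handle, not the real `OnReadyHandle` or `OnReadyRegistration`. Finalizing the rcl entity is not modeled.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Stand-in for the middleware-side slot that a setter would write to.
type Slot = Rc<RefCell<Option<Box<dyn Fn()>>>>;

/// Hypothetical RAII handle: the callback stays installed while it is alive.
struct Registration {
    slot: Slot,
}

impl Registration {
    /// Boxes the callback (and whatever it captured) and installs it.
    fn install(slot: &Slot, callback: impl Fn() + 'static) -> Self {
        *slot.borrow_mut() = Some(Box::new(callback));
        Registration { slot: Rc::clone(slot) }
    }
}

impl Drop for Registration {
    fn drop(&mut self) {
        // Step 1: clear the slot so nothing can invoke the callback anymore.
        let context = self.slot.borrow_mut().take();
        // Step 2: only now free the boxed callback and its captured context.
        drop(context);
    }
}
```

Dropping the `Registration` leaves the slot empty before the captured state is released, which is the order that avoids invoking a freed context.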

Add a Tokio-based executor (opt-in via the `tokio-executor` feature, enabled by default) that learns entity readiness from the rcl push callbacks added in the previous commit instead of polling rcl_wait. Each Worker drains its own mailbox on a dedicated Tokio task, so one Worker's callbacks are serialized and ordered while independent Workers run concurrently across Tokio's thread pool — multi-core concurrency with no per-event task spawn. Subscriptions, services, and clients are driven by push callbacks; timers by tokio::time. Worker tasks are gated by spinning (callbacks only run while spinning, and spin() waits for in-flight callbacks before returning); spin() honors only_next_available_work (spin_once) and reports a timeout as a Timeout error, matching the basic executor. Notifications coalesce per entity to bound the mailbox, a panicking callback is contained rather than wedging the worker, and push-callback registrations finalize the rcl entity before freeing their context to avoid a teardown use-after-free. Opt out with `default-features = false` to drop the Tokio multi-threaded runtime and macros for a lighter build. Action support follows in a separate commit.
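One common way to contain a panicking callback in Rust is std::panic::catch_unwind. The sketch below shows that technique in isolation. It is an assumption that the executor does something along these lines, and `drain` is a hypothetical function written for the example.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Runs each callback in order and contains panics, so that one failing
/// callback does not stop the ones queued after it.
/// Returns how many callbacks completed without panicking.
fn drain(callbacks: Vec<Box<dyn FnMut()>>) -> usize {
    let mut completed = 0;
    for mut callback in callbacks {
        // AssertUnwindSafe: this sketch assumes it is acceptable to carry on
        // even if a callback panicked partway through its work.
        if catch_unwind(AssertUnwindSafe(|| callback())).is_ok() {
            completed += 1;
        }
    }
    completed
}
```

With three callbacks where the second one panics, `drain` still runs the third and reports two completions. This relies on the default unwinding panic strategy and does nothing under `panic = "abort"`.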

balthasarschuess · 2026-06-22T11:52:13Z

First of all, thanks for your effort. This PR gets me excited!

I think the main point these PRs do not address yet is handling simulation time:

timer handling on simulation time (custom sleep futures for tokio on ros time)
sleeps in callbacks
jump time callbacks (also scoped by worker payload)

A key difficulty is that clock updates (imo) should not run inside the executor, as they do in cpp, where simulation-time sleeps in callbacks block single-threaded executors and may block multi-threaded executors. At the same time, we need some synchronization mechanism to make sure all jump callbacks have run before the clock continues and old messages are processed or discarded.

azerupi · 2026-06-22T18:43:52Z

Indeed, good call out. I didn't even attempt to support simulation time at the moment. That sounds like a deep rabbit hole I'm not totally ready to jump into yet. 😄 I've added it to the list of known limitations.

azerupi added 2 commits June 21, 2026 01:54

This was referenced Jun 21, 2026

feat: support actions on the Tokio executor (Tokio Executor 3/3) #655

Draft

feat: add push-callback registration for rcl primitives (Tokio Executor 1/3) #653

Draft

azerupi force-pushed the pr/2-tokio-executor branch 4 times, most recently from 39c7750 to 9b519ea Compare June 21, 2026 19:52

azerupi wants to merge 2 commits into ros2-rust:main from azerupi:pr/2-tokio-executor