feat: add an event-driven multi-threaded Tokio executor (Tokio Executor 2/3) #654
Draft
azerupi wants to merge 2 commits into
Add an opt-in way for primitives to report readiness via rcl's push callbacks (rcl_*_set_on_new_*_callback) instead of being polled in a wait set, as the foundation for an event-driven executor. `RclPrimitive::register_on_ready` installs a callback that the middleware invokes when the entity becomes ready and returns an `OnReadyHandle` (RAII) that deregisters on drop. `OnReadyRegistration` wraps the unsafe rcl setter: it boxes the callback context for a stable address and, on drop, clears the callback before freeing the context (finalizing the rcl entity first) so the middleware can never invoke a freed context during teardown. Implemented for subscriptions, services, and clients. No executor consumes this yet, so the basic executor is unchanged.
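To illustrate the teardown ordering described above, here is a minimal std-only sketch. It is not the code from this commit: `MockEntity`, `Context`, `Registration`, and `trampoline` are hypothetical stand-ins, and the mock only models the "clear the callback before freeing the boxed context" part, not finalizing a real rcl entity.

```rust
use std::cell::Cell;
use std::ffi::c_void;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Callback shape used by this mock: an opaque context pointer plus an event count.
type RawCallback = unsafe extern "C" fn(user_data: *const c_void, count: usize);

/// Hypothetical stand-in for a middleware entity that stores one callback slot.
#[derive(Default)]
pub struct MockEntity {
    slot: Cell<Option<(RawCallback, *const c_void)>>,
}

impl MockEntity {
    pub fn set_callback(&self, cb: Option<(RawCallback, *const c_void)>) {
        self.slot.set(cb);
    }

    /// Simulates the middleware reporting `count` new events.
    /// Returns false when no callback is installed.
    pub fn fire(&self, count: usize) -> bool {
        match self.slot.get() {
            Some((cb, data)) => {
                // SAFETY: the slot is cleared before its context is freed (see Drop).
                unsafe { cb(data, count) };
                true
            }
            None => false,
        }
    }
}

/// Hypothetical callback context; boxed so its address stays stable.
pub struct Context {
    pub hits: AtomicUsize,
}

unsafe extern "C" fn trampoline(user_data: *const c_void, count: usize) {
    // SAFETY: `user_data` points at the boxed Context owned by a live Registration.
    let ctx = unsafe { &*(user_data as *const Context) };
    ctx.hits.fetch_add(count, Ordering::SeqCst);
}

/// RAII handle: installs the callback on creation, removes it on drop.
pub struct Registration<'a> {
    entity: &'a MockEntity,
    context: Box<Context>,
}

impl<'a> Registration<'a> {
    pub fn new(entity: &'a MockEntity) -> Self {
        let context = Box::new(Context { hits: AtomicUsize::new(0) });
        let ptr = &*context as *const Context as *const c_void;
        let cb: RawCallback = trampoline;
        entity.set_callback(Some((cb, ptr)));
        Self { entity, context }
    }

    pub fn hits(&self) -> usize {
        self.context.hits.load(Ordering::SeqCst)
    }
}

impl Drop for Registration<'_> {
    fn drop(&mut self) {
        // Clear the slot first. The boxed context is only freed after this
        // body returns, when the struct's fields are dropped.
        self.entity.set_callback(None);
    }
}
```

While a `Registration` is alive, `entity.fire(n)` reaches the context; once it is dropped, `fire` finds an empty slot and never touches freed memory.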
Add a Tokio-based executor (behind the `tokio-executor` feature, enabled by default) that learns entity readiness from the rcl push callbacks added in the previous commit instead of polling rcl_wait. Each Worker drains its own mailbox on a dedicated Tokio task, so one Worker's callbacks are serialized and ordered while independent Workers run concurrently across Tokio's thread pool, giving multi-core concurrency with no per-event task spawn. Subscriptions, services, and clients are driven by push callbacks; timers by tokio::time. Worker tasks are gated by spinning (callbacks only run while spinning, and spin() waits for in-flight callbacks before returning); spin() honors only_next_available_work (spin_once) and reports a timeout as a Timeout error, matching the basic executor. Notifications coalesce per entity to bound the mailbox, a panicking callback is contained rather than wedging the worker, and push-callback registrations finalize the rcl entity before freeing their context to avoid a teardown use-after-free. Opt out with `default-features = false` to drop the Tokio multi-threaded runtime and macros for a lighter build. Action support follows in a separate commit.
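As a rough illustration of what "a panicking callback is contained" can mean, here is a std-only sketch built on `std::panic::catch_unwind`. This is an assumption about the general technique, not the mechanism this commit uses; `run_contained` is a hypothetical helper.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Runs each callback in order. A panic in one callback is caught and
/// counted, and the loop keeps going with the next callback.
/// Returns the values from the callbacks that finished and the panic count.
pub fn run_contained(callbacks: Vec<Box<dyn FnMut() -> u32>>) -> (Vec<u32>, usize) {
    let mut results = Vec::new();
    let mut panics = 0;
    for mut callback in callbacks {
        // AssertUnwindSafe: the callback is discarded after a panic, so no
        // half-updated state from it is observed afterwards.
        match catch_unwind(AssertUnwindSafe(|| callback())) {
            Ok(value) => results.push(value),
            Err(_) => panics += 1,
        }
    }
    (results, panics)
}
```

This only works when the crate is built with the default `panic = "unwind"` strategy; with `panic = "abort"` there is nothing to catch.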
Contributor
First of all, thanks for your effort; this PR gets me excited! I think the main point these PRs do not address yet is handling simulation time.
A key difficulty is that clock updates (imo) should not run inside the executor (as they do in cpp, and simulation time sleeps in callbacks block single-threaded executors and may block multi-threaded executors), but at the same time we need some synchronization mechanism to make sure all jump callbacks have run before the clock continues and old messages are processed/discarded.
Contributor
Author
Indeed, good call out. I didn't even attempt to support simulation time at the moment. That sounds like a deep rabbit hole I'm not totally ready to jump into yet. 😄 I've added it to the list of known limitations.
Note: This is based on #653. I was hoping I could target that PR as the base branch, but that seems to only work if the branches are in the ros2_rust repo and not in my fork.
This is part of a set of PRs that attempt to add an event-driven Tokio executor to rclrs. The goal is to have something that:
I tried to split the work up into multiple PRs that build on top of each other to make reviewing easier.
add an event-driven multi-threaded Tokio executor
Why
The basic executor discovers work by polling `rcl_wait`. This PR adds a second executor, built on Tokio, that learns about work from the push callbacks added in #653. There are two main motivations for this:
Performance
Using the event interface from rcl we can get better performance with less CPU overhead.
Interoperability
Callbacks and tasks run on a real Tokio runtime, so you can use Tokio from inside a ROS callback: spawn a task, await a `tokio::time` timer, drive async I/O, or call a library built on Tokio.
How it works
Each worker gets its own `tokio::sync::mpsc` mailbox and a single Tokio task that drains it. A push callback firing in the middleware sends a message into the mailbox. The worker task takes it and runs the entity's callback against the worker's payload.
One task per worker gives a few things for free. A Tokio task is never polled by two threads at once, so that task provides mutual exclusion and FIFO ordering within a worker without locking on the hot path. Tokio's scheduler provides the thread pool and work stealing, so different workers run on different threads automatically. There is no per-event task spawn: a single subscriber runs at full speed, and independent workers scale across cores without extra configuration.
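The mailbox-per-worker shape can be sketched with the standard library alone. Note the substitution: this uses `std::sync::mpsc` and one OS thread per worker in place of `tokio::sync::mpsc` and a Tokio task, so it shows the ownership and ordering argument, not the scheduling. `spawn_worker` and the `Vec<u32>` payload are hypothetical.

```rust
use std::sync::mpsc;
use std::thread;

/// Spawns one worker that exclusively owns its payload and drains its own
/// mailbox. Because a single consumer handles every message, callbacks for
/// this worker are serialized and run in FIFO order without a lock.
pub fn spawn_worker() -> (mpsc::Sender<u32>, thread::JoinHandle<Vec<u32>>) {
    let (mailbox_tx, mailbox_rx) = mpsc::channel::<u32>();
    let handle = thread::spawn(move || {
        // The payload lives inside the worker; nothing else can touch it.
        let mut payload = Vec::new();
        // The loop ends once every sender has been dropped.
        for entity_id in mailbox_rx {
            // Stand-in for "run the entity's callback against the payload".
            payload.push(entity_id);
        }
        payload
    });
    (mailbox_tx, handle)
}
```

Spawning several workers this way gives each one its own mailbox and consumer, so they proceed independently while each keeps its own ordering.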
Subscriptions, services, and clients are driven by push callbacks. Timers have no push callback, so each timer gets a small `tokio::time` driver task that sleeps until the next deadline and enqueues a tick on the worker's mailbox.
The spin contract
Callbacks for ROS entities only run while `spin()` is active. Worker tasks observe a gate and park when spinning stops. This preserves the rclrs contract that nothing runs before you spin, and that nothing is still running once you stop. `spin()` waits for any in-flight callback to finish before it returns.
`spin()` honors the same `SpinOptions` as the basic executor. A zero or short timeout with `only_next_available_work` drains the currently available work and returns (the spin_once pattern). A timeout that elapses is reported as a `Timeout` error rather than a silent return, matching the basic executor.
Coalescing
While spinning is paused, or under a burst, the middleware can fire many notifications for the same entity. Each entity coalesces rather than letting the mailbox grow without bound. A notification only enqueues a mailbox message if one is not already pending, and it accumulates the event count in a `pending` counter. When the worker handles the entity it takes that many items.
This bounds the mailbox to at most one pending message per entity, and it does not lose messages when spinning resumes.
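A minimal sketch of the coalescing rule, assuming an atomic counter per entity and using `std::sync::mpsc` instead of the Tokio mailbox. `Entity`, `Notifier`, `entity_mailbox`, and `drain` are hypothetical names, not types from this PR.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::Arc;

pub struct Entity {
    id: usize,
    /// Events reported by the middleware but not yet taken by the worker.
    pending: AtomicUsize,
}

pub struct Notifier {
    entity: Arc<Entity>,
    mailbox: Sender<Arc<Entity>>,
}

impl Notifier {
    /// Called from the push callback with the number of new events.
    pub fn notify(&self, count: usize) {
        if count == 0 {
            return;
        }
        // Only the notification that moves `pending` off zero enqueues a
        // message; later ones just grow the counter.
        if self.entity.pending.fetch_add(count, Ordering::AcqRel) == 0 {
            let _ = self.mailbox.send(Arc::clone(&self.entity));
        }
    }
}

pub fn entity_mailbox(id: usize) -> (Notifier, Receiver<Arc<Entity>>) {
    let (mailbox, rx) = mpsc::channel();
    let entity = Arc::new(Entity { id, pending: AtomicUsize::new(0) });
    (Notifier { entity, mailbox }, rx)
}

/// Worker side: for each queued entity, take the whole accumulated count.
/// Returns `(entity id, items to take)` pairs.
pub fn drain(mailbox: &Receiver<Arc<Entity>>) -> Vec<(usize, usize)> {
    let mut handled = Vec::new();
    while let Ok(entity) = mailbox.try_recv() {
        // Resetting to zero re-arms the entity: the next notify enqueues again.
        let count = entity.pending.swap(0, Ordering::AcqRel);
        handled.push((entity.id, count));
    }
    handled
}
```

Three notifications of 1, 2, and 4 events before the worker runs produce a single mailbox message, and the worker then takes 7 items for that entity.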
Feature flag
The executor lives behind a `tokio-executor` Cargo feature, enabled by default. Opting out with `default-features = false` drops the Tokio multi-threaded runtime and macros for a lighter build, without affecting the basic executor.
Performance
Numbers below were measured on a Ryzen 9 9900X (12 cores) under a continuous saturate workload: the executor spins on a background thread, a publisher thread publishes as fast as it can, and the main thread samples the received count in one-second windows.
Figures are the median of at least five reps, with threads pinned to a single CCD (publisher on core 0, executor on cores 1 to 5). Comparison points are rclcpp's SingleThreaded, MultiThreaded, and Events executors built and run the same way. All subscribers here are worker subscribers.
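For anyone reproducing the methodology, the reduction from counter samples to a reported figure could look like the following. This is a hypothetical helper pair, not the benchmark harness I used: it assumes the received count is read once per window boundary and that the reported figure is the median of per-window (or per-rep) values.

```rust
/// Turns cumulative counter readings taken at window boundaries into
/// per-window message counts.
pub fn window_rates(samples: &[u64]) -> Vec<u64> {
    samples
        .windows(2)
        .map(|pair| pair[1].saturating_sub(pair[0]))
        .collect()
}

/// Median of the given values, or None for an empty input.
/// For an even number of values, the two middle values are averaged
/// (integer division).
pub fn median(mut values: Vec<u64>) -> Option<u64> {
    if values.is_empty() {
        return None;
    }
    values.sort_unstable();
    let mid = values.len() / 2;
    if values.len() % 2 == 1 {
        Some(values[mid])
    } else {
        Some((values[mid - 1] + values[mid]) / 2)
    }
}
```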
For a single subscriber the Tokio executor is competitive with the basic executor and ahead of every rclcpp executor. Throughput in messages per second:
At a fixed 10 kHz delivery rate with no loss, I observed the following CPU usage at 0 B
I think there are a lot of different benchmarks we could run here that each show different scenarios. I would encourage everyone to run their own benchmarks in order for us to find pathological cases or current limitations and potential performance optimizations for follow-up PRs.
Known limitations
I will keep a list of the known current limitations and major performance issues that have been identified so far.
Actions
In the current PR, Actions are not supported yet but I have a follow-up PR to fix that.
High-rate timers
High-rate timers are currently capped. For example, a 1 kHz timer fires at roughly 600 Hz (measured). Lower rates such as 100 Hz are fine. Each timer is driven by its own task that sleeps with Tokio until the next deadline and then waits for the worker to run the callback before scheduling the next fire, so the rate is bounded by Tokio's timer granularity (around 1 ms) plus that per-fire round-trip.
To support high-rate timers we will most likely need to use something other than Tokio's timers, but I'm leaving this for a follow-up PR.
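The shape of that driver loop, and why its rate is bounded, can be sketched with std threads. This substitutes `std::thread::sleep` for Tokio's timer and a thread for the worker task, so absolute numbers will differ from the executor; `run_timer` and the acknowledgement channel are hypothetical.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// Drives `ticks` timer fires at the requested period and returns how many
/// callbacks ran and how long the whole run took.
pub fn run_timer(period: Duration, ticks: u32) -> (u32, Duration) {
    // Each tick carries a sender the worker uses to report completion.
    let (tick_tx, tick_rx) = mpsc::channel::<mpsc::Sender<()>>();
    let worker = thread::spawn(move || {
        let mut fired = 0u32;
        for done in tick_rx {
            fired += 1; // stand-in for running the timer callback
            let _ = done.send(());
        }
        fired
    });

    let start = Instant::now();
    let mut deadline = start;
    for _ in 0..ticks {
        deadline += period;
        // Sleep until the next deadline; the sleep may overshoot.
        thread::sleep(deadline.saturating_duration_since(Instant::now()));
        let (done_tx, done_rx) = mpsc::channel();
        tick_tx.send(done_tx).expect("worker is alive");
        // Wait for the callback to finish before scheduling the next fire.
        done_rx.recv().expect("worker acknowledged the tick");
    }
    drop(tick_tx);
    let fired = worker.join().expect("worker panicked");
    (fired, start.elapsed())
}
```

Dividing the tick count by the elapsed time gives the achieved rate. Any sleep overshoot or round-trip that exceeds the period pushes it below the requested one, which is the effect described above.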
Simulation time
The current PR does not attempt to support simulation time (see #654 (comment)).