A lightweight, C++23 coroutine library for embedded systems and resource-constrained environments. Built with stack-based allocation to avoid heap usage and designed to fit within a single cache line for optimal performance.
Caution
🚧 This project is still under construction! 🚧
#include <cassert>
#include <chrono>
#include <coroutine>
#include <print>
#include <thread>
import async_context;
using namespace std::chrono_literals;
// Simulates reading sensor data with I/O delay
async::future<int> read_sensor(async::context& ctx, std::string_view p_name)
{
std::println("['{}': Sensor] Starting read...", p_name);
co_await ctx.block_by_io(); // Simulate I/O operation
std::println("['{}': Sensor] Read complete: 42", p_name);
co_return 42;
}
// Processes data with computation delay
async::future<int> process_data(async::context& ctx,
std::string_view p_name,
int value)
{
std::println("['{}': Process] Processing {}...", p_name, value);
co_await 10ms; // Simulate processing time
int result = value * 2;
std::println("['{}': Process] Result: {}", p_name, result);
co_return result;
}
// Writes result with I/O delay
async::future<void> write_actuator(async::context& ctx,
std::string_view p_name,
int value)
{
std::println("['{}': Actuator] Writing {}...", p_name, value);
co_await ctx.block_by_io();
std::println("['{}': Actuator] Write complete!", p_name);
}
// Coordinates the full pipeline
async::future<void> sensor_pipeline(async::context& ctx,
std::string_view p_name)
{
std::println("Pipeline '{}' starting...", p_name);
int sensor_value = co_await read_sensor(ctx, p_name);
int processed = co_await process_data(ctx, p_name, sensor_value);
co_await write_actuator(ctx, p_name, processed);
std::println("Pipeline '{}' complete!\n", p_name);
}
int main()
{
// Create two contexts, each with an 8 KiB coroutine stack
async::basic_context<8192> ctx1;
async::basic_context<8192> ctx2;
// Run two independent pipelines concurrently
auto pipeline1 = sensor_pipeline(ctx1, "🌟 System 1");
auto pipeline2 = sensor_pipeline(ctx2, "🔥 System 2");
// Round robin between each context
while (true) {
bool all_done = true;
for (auto* ctx : std::to_array({ &ctx1, &ctx2 })) {
if (not ctx->done()) {
all_done = false;
if (ctx->state() == async::blocked_by::nothing) {
ctx->resume();
}
if (ctx->state() == async::blocked_by::time) {
std::this_thread::sleep_for(ctx->pending_delay());
ctx->unblock();
}
}
}
if (all_done) {
break;
}
}
assert(pipeline1.done());
assert(pipeline2.done());
std::println("Both pipelines completed successfully!");
return 0;
}

Output:
Pipeline '🌟 System 1' starting...
['🌟 System 1': Sensor] Starting read...
Pipeline '🔥 System 2' starting...
['🔥 System 2': Sensor] Starting read...
['🌟 System 1': Sensor] Read complete: 42
['🌟 System 1': Process] Processing 42...
['🔥 System 2': Sensor] Read complete: 42
['🔥 System 2': Process] Processing 42...
['🌟 System 1': Process] Result: 84
['🌟 System 1': Actuator] Writing 84...
['🔥 System 2': Process] Result: 84
['🔥 System 2': Actuator] Writing 84...
['🌟 System 1': Actuator] Write complete!
Pipeline '🌟 System 1' complete!
['🔥 System 2': Actuator] Write complete!
Pipeline '🔥 System 2' complete!
Both pipelines completed successfully!
- Stack-based coroutine allocation - No heap allocations; coroutine frames are allocated from a user-provided stack buffer
- Cache-line optimized - Context object fits within `std::hardware_constructive_interference_size` (typically 64 bytes)
- Blocking state tracking - Built-in support for time, I/O, sync, and external blocking states
- Scheduler integration - Virtual `do_schedule()` method allows custom scheduler implementations
- Proxy contexts - Support for supervised coroutines with timeout capabilities
- Exception propagation - Proper exception handling through the coroutine chain
- Cancellation support - Clean cancellation with RAII-based resource cleanup
- C++23 compiler with coroutine support
- Tested with Clang 20+
- Uses C++20 modules
Unlike typical coroutine implementations that allocate frames on the heap,
async_context uses a stack-based allocation scheme. Each context owns a
contiguous buffer of memory that grows upward as coroutines are called.
┌─────────────────────────────┐ Address 0
│ &context::m_stack_pointer   │
├─────────────────────────────┤
│ Coroutine Frame A           │
│ (promise + locals)          │
│ (96 B)                      │
├─────────────────────────────┤
│ &context::m_stack_pointer   │
├─────────────────────────────┤
│ Coroutine Frame B           │
│ (promise + locals)          │
│ (192 B)                     │
│                             │
├─────────────────────────────┤
│ &context::m_stack_pointer   │
├─────────────────────────────┤
│ Coroutine Frame C           │
│ (promise + locals)          │
│ (128 B)                     │
├─────────────────────────────┤
│ Unused Memory               │ <-- context::m_stack_pointer
│                             │
│                             │
│                             │
└─────────────────────────────┘ Address N (bytes of stack memory)
- Allocation: When a coroutine is created, the promise's `operator new` requests memory from the context. The context:
  - Stores the address of `m_stack_pointer` at the current position
  - Returns the next address as the coroutine frame location
  - Advances `m_stack_pointer` past the allocated frame
- Deallocation: When a coroutine completes, `operator delete`:
  - Reads the stored `&m_stack_pointer` from just before the frame
  - Resets `m_stack_pointer` back to that position
This creates a strict LIFO stack where coroutines must complete in reverse
order of their creation, which naturally matches how co_await chains
work.
- No heap allocation on frame creation: Ideal for embedded systems without dynamic memory
- Deterministic: Memory usage is bounded by the stack buffer size
- Cache-friendly: Coroutine frames are contiguous in memory
- Fast: Simple pointer arithmetic instead of malloc/free
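The allocation and deallocation steps can be sketched as a standalone toy allocator. This is a hypothetical model for illustration only, not the library's implementation; `toy_stack` and its members are invented names, and alignment handling is omitted for brevity:

```cpp
#include <cstddef>

// Toy model of the LIFO frame allocator described above (hypothetical,
// not the library's actual code). Each allocation stores the address of
// the stack pointer just before the frame, so deallocation can rewind
// the stack pointer to where the frame's header began.
struct toy_stack {
  alignas(std::max_align_t) std::byte buffer[1024]{};
  std::byte* stack_pointer = buffer;

  void* allocate(std::size_t frame_size) {
    std::byte* header = stack_pointer;
    // 1. Store the address of `stack_pointer` at the current position
    *reinterpret_cast<std::byte***>(header) = &stack_pointer;
    // 2. The frame begins right after the stored pointer
    std::byte* frame = header + sizeof(std::byte**);
    // 3. Advance the stack pointer past the allocated frame
    stack_pointer = frame + frame_size;
    return frame;
  }

  void deallocate(void* frame) {
    std::byte* header = static_cast<std::byte*>(frame) - sizeof(std::byte**);
    // Read the stored pointer and rewind it back to the header position
    std::byte** sp = *reinterpret_cast<std::byte***>(header);
    *sp = header;
  }
};
```

Because deallocation simply rewinds the pointer, frames must be released in reverse order of allocation, matching how nested `co_await` chains unwind.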
The base context class that manages coroutine execution and memory. Derived classes must:
- Provide stack memory via `initialize_stack_memory()`, preferably within the constructor.
- Implement `do_schedule()` to handle blocking state notifications.
A coroutine return type containing either a value, an asynchronous operation, or a `std::exception_ptr`. If this object contains a coroutine handle, the future must be resumed until it is converted into a value of type `T`.
- Synchronous returns (no coroutine frame allocation)
- `co_await` for composing asynchronous operations
- `co_return` for returning values
- Move semantics (non-copyable)
An alias for `async::future<void>` - an async operation with no return value.
An enum describing what a coroutine is blocked by:
- `nothing` - Ready to run
- `io` - Blocked by an I/O operation
- `sync` - Blocked by resource contention (mutex, semaphore)
- `external` - Blocked by an external coroutine system
- `time` - Blocked until a duration elapses
The current state can be queried via `async::context::state()`. All states besides `time` are safe to resume at any point. If a context is blocked by time, the scheduler must defer calling `resume()` until that duration has elapsed.
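This resume rule can be modeled in isolation. The `safe_to_resume` helper below is an invented illustration, not part of the library's API:

```cpp
#include <chrono>

// Mirrors the blocking states listed above.
enum class blocked_by { nothing, io, sync, external, time };

// Every state except `time` may be resumed at any point. A context
// blocked by time must not be resumed before its delay has elapsed.
bool safe_to_resume(blocked_by state,
                    std::chrono::steady_clock::time_point now,
                    std::chrono::steady_clock::time_point wake_at)
{
  if (state == blocked_by::time) {
    return now >= wake_at;
  }
  return true;
}
```

A scheduler loop like the one in the example at the top of this README applies exactly this check before calling `resume()` on a context.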
import async_context;
async::future<int> compute(async::context& p_ctx) {
co_return 42;
}

async::future<void> delay_example(async::context& p_ctx) {
using namespace std::chrono_literals;
co_await 100ms; // Ask the scheduler to resume this coroutine after >= 100ms
co_return;
}

async::future<void> io_example(async::context& p_ctx) {
dma_controller.on_completion([&p_ctx]() {
p_ctx.unblock();
});
// Start DMA transaction...
while (!dma_complete) {
co_await p_ctx.block_by_io();
}
co_return;
}

Please note that this coroutine has a loop where it continually reports that it is blocked by I/O. Any coroutine blocking on I/O must check whether the I/O has completed before proceeding. If it has not, it must `co_await p_ctx.block_by_io();` again at some point to hand control back to the resumer.
async::future<int> inner(async::context& p_ctx) {
co_return 10;
}
async::future<int> outer(async::context& p_ctx) {
int value = co_await inner(p_ctx);
co_return value * 2;
}

class my_context : public async::context {
public:
std::array<async::uptr, 1024> m_stack{};
my_context() {
initialize_stack_memory(m_stack);
}
~my_context() {
// ‼️ The most derived context must call cancel in its destructor
cancel();
// If memory was allocated, deallocate it here...
}
private:
void do_schedule(async::blocked_by p_state,
async::block_info p_info) noexcept override {
// Notify your scheduler of state changes
}
};

In order to create a usable custom context, the stack memory must be initialized with a call to `initialize_stack_memory(span)`, passing a span over the memory to use for the stack. There are no requirements on where this memory comes from, only that it remain valid for the lifetime of the context. It may be an array member of the context object, dynamically allocated memory that the context solely owns, or statically allocated memory over which the context has sole control and ownership.
The custom context must call cancel() before deallocating the stack memory.
Once cancel completes, the stack memory may be deallocated.
class simple_context : public async::basic_context {
public:
std::array<async::uptr, 8192> m_stack{};
simple_context() {
initialize_stack_memory(m_stack);
}
~simple_context() {
cancel();
}
};
simple_context ctx;
auto future = my_coroutine(ctx);
ctx.sync_wait([](async::sleep_duration p_sleep_time) {
std::this_thread::sleep_for(p_sleep_time);
});

Exceptions thrown in coroutines propagate up the coroutine chain until they reach the top-level coroutine, where they are rethrown from the call to `.resume()`.
async::future<void> may_throw(async::context& p_ctx) {
throw std::runtime_error("error");
co_return;
}
async::future<void> just_calls(async::context& p_ctx) {
co_await may_throw(p_ctx);
co_return;
}
simple_context ctx;
auto future = just_calls(ctx);
try {
future.resume();
} catch (const std::runtime_error& e) {
// Handle exception
}

Operation stacking is when you load an additional operation into a coroutine's stack memory before finishing the previous operation. This results in a memory leak: the previous coroutine's frame is no longer accessible and cannot be deallocated, permanently reducing the memory available to the context and extending the lifetime of the objects held within it. It is undefined behavior to allow the context to be destroyed at this point.
my_context ctx;
auto future1 = async_op1(ctx); // ✅ Okay, may create some objects on its stack
auto future2 = async_op2(ctx); // ❌ Memory leak! Don't do this
// UB to allow ctx to be destroyed at this point 😱

async::future<int> supervised(async::context& p_ctx) {
auto proxy = async::proxy_context::from(p_ctx);
auto child_future = child_coroutine(proxy);
int timeout = 10;
while (!child_future.done() && timeout-- > 0) {
child_future.resume();
co_await std::suspend_always{};
}
if (not child_future.done()) {
throw timed_out();
}
co_return child_future.value();
}

`async::proxy_context::from()` consumes the rest of the context's stack memory for itself. The original context's stack is clamped to where it was when the supervised function was called, and is restored once the proxy is destroyed. This prevents the original context from being used again and overwriting the proxy's stack memory.
Each coroutine frame may have at most one proxy on its stack, which allows top-down chaining of proxies as shown below. When a proxy blocks on something, that blocking state is communicated to the original context and its schedule function is executed. When the original context is resumed, it executes from its active coroutine, which contains a proxy. That coroutine may check for timeouts and resume the future it supervises. That future may contain yet another proxy, which performs the same timeout detection and resumes the future it supervises in turn. This continues until the bottom is reached, or until a coroutine cancels its future or exits via an exception or a normal return.

This naturally gives the top-most supervising coroutine priority in determining whether its time budget has expired.
┌─────────────────────────────┐ Address 0
│ &context::m_stack_pointer   │
├─────────────────────────────┤
│ Coroutine Frame A           │
│ (promise + locals)          │
│ (96 B)                      │
├─────────────────────────────┤
│ &context::m_stack_pointer   │
├─────────────────────────────┤
│ Coroutine Frame B           │
│ (promise + locals)          │
│ (64 B)                      │
├─────────────────────────────┤
│ &context::m_stack_pointer   │ <-- Origin Stack ends here
├─────────────────────────────┤     (m_stack_pointer shrunk to here)
│ Coroutine Frame C           │
│ (promise + locals)          │     Proxy-1 Stack begins
│ (128 B)                     │
├─────────────────────────────┤
│ &proxy1::m_stack_pointer    │
├─────────────────────────────┤
│ Coroutine Frame D           │
│ (promise + locals)          │
│ (80 B)                      │
├─────────────────────────────┤
│ &proxy1::m_stack_pointer    │
├─────────────────────────────┤
│ Coroutine Frame E           │
│ (promise + locals)          │
│ (72 B)                      │
├─────────────────────────────┤
│ &proxy1::m_stack_pointer    │ <-- Proxy-1 Stack ends here
├─────────────────────────────┤
│ Coroutine Frame F           │
│ (promise + locals)          │     Proxy-2 Stack begins
│ (96 B)                      │
├─────────────────────────────┤
│ &proxy2::m_stack_pointer    │
├─────────────────────────────┤
│ Coroutine Frame G           │
│ (promise + locals)          │
│ (144 B)                     │
├─────────────────────────────┤
│ &proxy2::m_stack_pointer    │ <-- Proxy-2 Stack ends here
├─────────────────────────────┤
│ Coroutine Frame H           │
│ (promise + locals)          │     Proxy-3 Stack begins
│ (88 B)                      │
├─────────────────────────────┤
│ &proxy3::m_stack_pointer    │ <-- Proxy-3 current position
├─────────────────────────────┤
│ Unused Memory               │
│                             │
│                             │
└─────────────────────────────┘ Address N (bytes of stack memory)
Before getting started, if you haven't used libhal before, follow the Getting Started guide.
To create the library package call:
conan create . -pr hal/tc/llvm-20 -pr hal/os/mac --version=<insert-version>

Replace `mac` with `linux` or `windows` if that is what you are building on.
This will build and run unit tests, benchmarks, and a test package to confirm that the package was built correctly.
To run tests on their own:
./build/Release/async_context_tests

To run the benchmarks on their own:

./build/Release/async_benchmark

Within the CMakeLists.txt, you can disable unit tests or benchmarks by setting the following to OFF:
set(BUILD_UNIT_TESTS OFF)
set(BUILD_BENCHMARKS OFF)

Apache License 2.0 - See LICENSE for details.
Copyright 2024 - 2025 Khalil Estell and the libhal contributors