
Add cuda_buffer_backend and torch_buffer_backend for rosidl::Buffer #1

Open
yuanknv wants to merge 3 commits into ros2:main from yuanknv:native_buffer_backends

Conversation


@yuanknv yuanknv commented Apr 7, 2026

Description

This pull request adds CUDA and PyTorch buffer backend implementations for rosidl::Buffer, enabling zero-copy GPU memory sharing between ROS 2 publishers and subscribers.

CUDA buffer backend: Enables fully asynchronous, zero-copy GPU data transport, so data can stay on the GPU across ROS nodes. allocate_msg allocates from a CUDA Virtual Memory Management (VMM) based IPC memory pool; each block carries a pre-exported POSIX FD for zero-overhead IPC reuse. from_buffer returns a WriteHandle/ReadHandle that manages GPU stream ordering via CUDA events (no cudaStreamSynchronize in the pipeline). On transmit, the plugin checks locality via a shared-memory endpoint registry: for same-host, same-GPU peers it sends the block's FD over a Unix socket along with an IPC event handle for cross-process GPU sync; otherwise it falls back to CPU serialization. On receive, the block is imported and mapped (cached per source block), with a shared-memory refcount and UID validation to prevent stale reuse. A background recycler thread handles event synchronization and block reclamation off the callback thread.

Torch buffer backend: A device-agnostic layer on top of device buffer backends (e.g. cuda_buffer_backend) that lets users work with torch::Tensor directly. allocate_msg creates a TorchBufferImpl wrapping a buffer with tensor metadata (shape, strides, dtype); the device is auto-detected at compile time, and if no accelerated buffer backend is installed it falls back to CPU. from_buffer returns a torch::Tensor view backed by the device buffer's handle (write or read, captured in the tensor deleter for event lifetime safety). to_buffer copies a pre-existing torch tensor into the allocated buffer. On transmit, the TorchBufferDescriptor carries tensor metadata alongside a nested device_data field that RMW serializes via whichever device backend plugin is registered.

This pull request consists of the following key components:

cuda_buffer: Core CUDA buffer library providing a VMM-backed CUDA IPC memory pool, a host endpoint manager for locality discovery over shared memory, and user-facing allocate_msg/from_buffer/to_buffer APIs with RAII, CUDA-event-based GPU synchronization (ReadHandle/WriteHandle).
cuda_buffer_backend: BufferBackend plugin registered via pluginlib. Handles endpoint discovery, CudaBufferDescriptor serialization with VMM IPC handles, IPC refcount lifecycle, and automatic CPU fallback when CUDA IPC is unavailable.
cuda_buffer_backend_msgs: ROS 2 message definition for CudaBufferDescriptor.
torch_buffer: PyTorch buffer library wrapping device buffers with tensor metadata (shape, strides, dtype). Provides allocate_msg/from_buffer/to_buffer APIs that auto-detect device backend at compile time.
torch_buffer_backend: BufferBackend plugin for PyTorch tensors. Handles TorchBufferDescriptor serialization with nested device buffer delegation.
torch_buffer_backend_msgs: ROS 2 message definition for TorchBufferDescriptor.

Is this a user-facing behavior change?

No.

Did you use Generative AI?

Yes. Claude (claude-4.6-opus) via Cursor was used to assist with creating an initial prototype version of the changes contained in this PR.

Additional Information

This PR is part of the broader ROS 2 native buffer feature introduced in this post.

@@ -0,0 +1,84 @@
cmake_minimum_required(VERSION 3.8)

Suggested change
cmake_minimum_required(VERSION 3.8)
cmake_minimum_required(VERSION 3.20)

@@ -0,0 +1,24 @@
cmake_minimum_required(VERSION 3.8)

Suggested change
cmake_minimum_required(VERSION 3.8)
cmake_minimum_required(VERSION 3.20)

<?xml-model href="http://download.ros.org/schema/package_format3.xsd" schematypens="http://www.w3.org/2001/XMLSchema"?>
<package format="3">
<name>torch_buffer_backend_msgs</name>
<version>0.1.0</version>

Suggested change
<version>0.1.0</version>
<version>0.0.0</version>

<?xml-model href="http://download.ros.org/schema/package_format3.xsd" schematypens="http://www.w3.org/2001/XMLSchema"?>
<package format="3">
<name>cuda_buffer</name>
<version>0.1.0</version>

Suggested change
<version>0.1.0</version>
<version>0.0.0</version>

<description>CUDA buffer implementation (CudaBuffer, CudaBufferImpl, CUDAMemoryPool, IPC)
for the ROS2 Buffer backend system. Contains both headers and compiled sources
for IPC manager and host endpoint manager.</description>


Missing <author> tag.

at::ScalarType dtype = at::kByte)
{
if (buffer.empty()) {return {};}
const auto * impl = static_cast<const TorchBufferImpl<uint8_t> *>(buffer.get_impl());

Suggested change
const auto * impl = static_cast<const TorchBufferImpl<uint8_t> *>(buffer.get_impl());
const auto * impl = detail::get_torch_impl<uint8_t>(buffer);

Comment on lines +61 to +62
const auto * torch_impl =
static_cast<const torch_buffer_backend::TorchBufferImpl<uint8_t> *>(impl);

Suggested change
const auto * torch_impl =
static_cast<const torch_buffer_backend::TorchBufferImpl<uint8_t> *>(impl);
const auto * torch_impl = dynamic_cast<const torch_buffer_backend::TorchBufferImpl<uint8_t> *>(
static_cast<const rosidl::BufferImplBase<uint8_t> *>(impl));
if (!torch_impl) {
return nullptr;
}

{
(void)endpoint_info;
(void)existing_endpoints;
(void)endpoint_supported_backends;

Check whether the backend exists in endpoint_supported_backends?


std::unique_ptr<rosidl::BufferImplBase<T>> to_cpu() const override
{
if (device_buffer_.empty()) {return nullptr;}

Consistent with CudaBufferImpl::to_cpu()

Suggested change
if (device_buffer_.empty()) {return nullptr;}
if (device_buffer_.empty()) {return std::make_unique<rosidl::CpuBufferImpl<T>>();}


int64_t numel = 1;
for (auto s : shape) {
numel *= s;

Suggested change
numel *= s;
if (s < 0) {
throw std::runtime_error(
"allocate_msg: negative shape dimension (" + std::to_string(s) + ")");
}
numel *= s;

find_package(cuda_buffer_backend_msgs REQUIRED)
find_package(rmw REQUIRED)
find_package(rcutils REQUIRED)
find_package(CUDAToolkit REQUIRED)

We need to include this dependency in the package.xml; the requirement is probably nvcc?
Is this key enough: https://github.com/ros/rosdistro/blob/master/rosdep/base.yaml#L8367C1-L8367C12 ?

<depend>rcutils</depend>
<depend>rmw</depend>
<depend>rosidl_buffer</depend>


Suggested change
<depend>nvidia-cuda</depend>

set(CMAKE_CUDA_HOST_COMPILER "${CMAKE_CXX_COMPILER}")
endif()

find_package(Torch REQUIRED)

Not sure about this one. Is there an Ubuntu package for this?

wget https://download.pytorch.org/libtorch/nightly/cpu/libtorch-shared-with-deps-latest.zip
unzip libtorch-shared-with-deps-latest.zip

then I used -DCMAKE_PREFIX_PATH=<path to pytorch>


I ended up using version 11.8, but nvcc is 12.0 in Ubuntu:

https://download.pytorch.org/libtorch/cu118/libtorch-cxx11-abi-shared-with-deps-2.7.1%2Bcu118.zip

Maybe we should consider whether it's worth adding a vendor package to install libtorch.


@ahcorde ahcorde left a comment


I also detected some linter failures.

test_cuda_image_cpu_fallback_fastrtps_launch is also not passing, with this error:

6: FAIL: test_cpu_fallback_paths (cuda_buffer_backend.TestCudaImageCpuFallbackFastRTPS.test_cpu_fallback_paths)
6: Test all CPU fallback paths and normal IPC simultaneously over FastRTPS.
6: ----------------------------------------------------------------------
6: Traceback (most recent call last):
6:   File "/tmp/ws/src/rosidl_buffer_backends/cuda_buffer_backend/cuda_buffer_backend/test/test_cuda_image_cpu_fallback_fastrtps_launch.py", line 203, in test_cpu_fallback_paths
6:     self.assertTrue(
6: AssertionError: False is not true : Cross-device fallback validation failed (expected backend="cpu")

