Add cuda_buffer_backend and torch_buffer_backend for rosidl::Buffer #1
Conversation
@@ -0,0 +1,84 @@
cmake_minimum_required(VERSION 3.8)

Suggested change:
-cmake_minimum_required(VERSION 3.8)
+cmake_minimum_required(VERSION 3.20)
@@ -0,0 +1,24 @@
cmake_minimum_required(VERSION 3.8)

Suggested change:
-cmake_minimum_required(VERSION 3.8)
+cmake_minimum_required(VERSION 3.20)
<?xml-model href="http://download.ros.org/schema/package_format3.xsd" schematypens="http://www.w3.org/2001/XMLSchema"?>
<package format="3">
<name>torch_buffer_backend_msgs</name>
<version>0.1.0</version>

Suggested change:
-<version>0.1.0</version>
+<version>0.0.0</version>
<?xml-model href="http://download.ros.org/schema/package_format3.xsd" schematypens="http://www.w3.org/2001/XMLSchema"?>
<package format="3">
<name>cuda_buffer</name>
<version>0.1.0</version>

Suggested change:
-<version>0.1.0</version>
+<version>0.0.0</version>
<description>CUDA buffer implementation (CudaBuffer, CudaBufferImpl, CUDAMemoryPool, IPC)
for the ROS2 Buffer backend system. Contains both headers and compiled sources
for IPC manager and host endpoint manager.</description>
at::ScalarType dtype = at::kByte)
{
  if (buffer.empty()) {return {};}
  const auto * impl = static_cast<const TorchBufferImpl<uint8_t> *>(buffer.get_impl());

Suggested change:
-const auto * impl = static_cast<const TorchBufferImpl<uint8_t> *>(buffer.get_impl());
+const auto * impl = detail::get_torch_impl<uint8_t>(buffer);

const auto * torch_impl =
  static_cast<const torch_buffer_backend::TorchBufferImpl<uint8_t> *>(impl);

Suggested change:
-const auto * torch_impl =
-  static_cast<const torch_buffer_backend::TorchBufferImpl<uint8_t> *>(impl);
+const auto * torch_impl = dynamic_cast<const torch_buffer_backend::TorchBufferImpl<uint8_t> *>(
+  static_cast<const rosidl::BufferImplBase<uint8_t> *>(impl));
+if (!torch_impl) {
+  return nullptr;
+}
{
  (void)endpoint_info;
  (void)existing_endpoints;
  (void)endpoint_supported_backends;

Should this check whether the backend exists in endpoint_supported_backends?
std::unique_ptr<rosidl::BufferImplBase<T>> to_cpu() const override
{
  if (device_buffer_.empty()) {return nullptr;}

To be consistent with CudaBufferImpl::to_cpu():

Suggested change:
-if (device_buffer_.empty()) {return nullptr;}
+if (device_buffer_.empty()) {return std::make_unique<rosidl::CpuBufferImpl<T>>();}
int64_t numel = 1;
for (auto s : shape) {
  numel *= s;

Suggested change:
-numel *= s;
+if (s < 0) {
+  throw std::runtime_error(
+    "allocate_msg: negative shape dimension (" + std::to_string(s) + ")");
+}
+numel *= s;
find_package(cuda_buffer_backend_msgs REQUIRED)
find_package(rmw REQUIRED)
find_package(rcutils REQUIRED)
find_package(CUDAToolkit REQUIRED)

We need to declare this dependency in the package.xml; the requirement is probably nvcc. Is this rosdep key enough: https://github.com/ros/rosdistro/blob/master/rosdep/base.yaml#L8367C1-L8367C12 ?

<depend>rcutils</depend>
<depend>rmw</depend>
<depend>rosidl_buffer</depend>

Suggested change:
+<depend>nvidia-cuda</depend>
set(CMAKE_CUDA_HOST_COMPILER "${CMAKE_CXX_COMPILER}")
endif()

find_package(Torch REQUIRED)

Not sure about this one. Is there an Ubuntu package for it? I fetched libtorch manually:

wget https://download.pytorch.org/libtorch/nightly/cpu/libtorch-shared-with-deps-latest.zip
unzip libtorch-shared-with-deps-latest.zip

and then used -DCMAKE_PREFIX_PATH=<path to pytorch>.

I finally used the CUDA 11.8 build, but nvcc is 12.0 in Ubuntu:
https://download.pytorch.org/libtorch/cu118/libtorch-cxx11-abi-shared-with-deps-2.7.1%2Bcu118.zip

Maybe we should consider whether it is worth adding a vendor package to install libtorch.
ahcorde left a comment

I also detected some linter failures, and test_cuda_image_cpu_fallback_fastrtps_launch is not passing, with this error:
6: FAIL: test_cpu_fallback_paths (cuda_buffer_backend.TestCudaImageCpuFallbackFastRTPS.test_cpu_fallback_paths)
6: Test all CPU fallback paths and normal IPC simultaneously over FastRTPS.
6: ----------------------------------------------------------------------
6: Traceback (most recent call last):
6: File "/tmp/ws/src/rosidl_buffer_backends/cuda_buffer_backend/cuda_buffer_backend/test/test_cuda_image_cpu_fallback_fastrtps_launch.py", line 203, in test_cpu_fallback_paths
6: self.assertTrue(
6: AssertionError: False is not true : Cross-device fallback validation failed (expected backend="cpu")
Description
This pull request adds CUDA and PyTorch buffer backend implementations for `rosidl::Buffer`, enabling zero-copy GPU memory sharing between ROS 2 publishers and subscribers.

**CUDA buffer backend**: Enables fully asynchronous, zero-copy GPU data transport; data can stay on the GPU across ROS nodes.

`allocate_msg` allocates from a CUDA Virtual Memory Management (VMM) based IPC memory pool; each block carries a pre-exported POSIX FD for zero-overhead IPC reuse. `from_buffer` returns a `WriteHandle`/`ReadHandle` that manages GPU stream ordering via CUDA events (no `cudaStreamSynchronize` in the pipeline). On transmit, the plugin checks locality via a shared-memory endpoint registry: for same-host, same-GPU peers, it sends the block's FD over a Unix socket and an IPC event handle for cross-process GPU sync; otherwise it falls back to CPU serialization. On receive, the block is imported and mapped (cached per source block), with a shared-memory refcount and UID validation to prevent stale reuse. A background recycler thread handles event synchronization and block reclamation off the callback thread.

**Torch buffer backend**: A device-agnostic layer on top of device buffer backends (e.g. cuda_buffer_backend) that lets users work with `torch::Tensor` directly.

`allocate_msg` creates a `TorchBufferImpl` wrapping a buffer with tensor metadata (shape, strides, dtype); the device is auto-detected at compile time, falling back to CPU if no accelerated buffer backend is installed. `from_buffer` returns a `torch::Tensor` view backed by the device buffer's handle (write or read, captured in the tensor deleter for event-lifetime safety). `to_buffer` copies a pre-existing torch tensor into the allocated buffer. On transmit, the `TorchBufferDescriptor` carries tensor metadata alongside a nested `device_data` field that RMW serializes via whichever device backend plugin is registered.

This pull request consists of the following key components:

- `cuda_buffer`: Core CUDA buffer library providing a VMM-backed CUDA IPC memory pool, a host endpoint manager for locality discovery over shared memory, and user-facing `allocate_msg`/`from_buffer`/`to_buffer` APIs with RAII, CUDA event based GPU synchronization (`ReadHandle`/`WriteHandle`).
- `cuda_buffer_backend`: `BufferBackend` plugin registered via pluginlib. Handles endpoint discovery, `CudaBufferDescriptor` serialization with VMM IPC handles, IPC refcount lifecycle, and automatic CPU fallback when CUDA IPC is unavailable.
- `cuda_buffer_backend_msgs`: ROS 2 message definition for `CudaBufferDescriptor`.
- `torch_buffer`: PyTorch buffer library wrapping device buffers with tensor metadata (shape, strides, dtype). Provides `allocate_msg`/`from_buffer`/`to_buffer` APIs that auto-detect the device backend at compile time.
- `torch_buffer_backend`: `BufferBackend` plugin for PyTorch tensors. Handles `TorchBufferDescriptor` serialization with nested device buffer delegation.
- `torch_buffer_backend_msgs`: ROS 2 message definition for `TorchBufferDescriptor`.

Is this a user-facing behavior change?
No.
Did you use Generative AI?
Yes. Claude (claude-4.6-opus) via Cursor was used to assist with creating an initial prototype version of the changes contained in this PR.
Additional Information
This PR is part of the broader ROS 2 native buffer feature introduced in this post.