Add support for using CCCL for CUDA buffers #641
Draft
vyasr wants to merge 12 commits into rapidsai:main from
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
Introduces cpp/cmake/thirdparty/get_cccl.cmake using rapids_cpm_cccl() and adds a UCXX_ENABLE_CCCL option to cpp/CMakeLists.txt with conditional compile definitions, CCCL linkage, CUDA language enablement, and registration of the buffer_cccl.cu source.
Adds a BufferType::CCCL enum value and a CCCLBuffer class to buffer.h using the PIMPL pattern (opaque CCCLBufferImpl* in the header, full definition in buffer_cccl.cu). The implementation uses cuda::buffer<cuda::std::byte, cuda::mr::device_accessible> with a cudaFree(0) guard to initialize the CUDA context.
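A minimal sketch of the PIMPL layout this describes, assuming simplified constructor and accessor signatures (the real UCXX Buffer hierarchy is larger):

```cpp
// Hypothetical sketch of the PIMPL layout described above. The method set
// and signatures are assumptions, not the actual UCXX API.
#include <cstddef>

namespace ucxx {

class CCCLBufferImpl;  // defined only in buffer_cccl.cu, so CCCL headers
                       // never appear in the public header

class CCCLBuffer {
 public:
  explicit CCCLBuffer(size_t size);  // allocates via cuda::buffer in the .cu file
  ~CCCLBuffer();                     // defined where CCCLBufferImpl is complete

  void* data();            // device-accessible pointer
  size_t getSize() const;  // allocation size in bytes

 private:
  CCCLBufferImpl* _impl{nullptr};  // opaque pointer; a later commit converts
                                   // this to std::unique_ptr<CCCLBufferImpl>
  size_t _size{0};
};

}  // namespace ucxx
```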
Adds a CCCL branch to the allocateBuffer() factory in buffer.cpp and adds #if UCXX_ENABLE_CCCL / #elif UCXX_ENABLE_RMM / #else dispatch to request_tag_multi.cpp so that isCUDA allocations prefer CCCLBuffer when CCCL is enabled.
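A sketch of how that preprocessor dispatch might read; the enclosing function name is invented for illustration, and the ucxx buffer classes are assumed from this PR rather than shown in full:

```cpp
// Hypothetical sketch of the #if/#elif/#else dispatch described above;
// allocateForTransfer() is an invented name.
#include <memory>
#include <stdexcept>

std::shared_ptr<ucxx::Buffer> allocateForTransfer(bool isCUDA, size_t size)
{
  if (isCUDA) {
#if UCXX_ENABLE_CCCL
    return std::make_shared<ucxx::CCCLBuffer>(size);  // CCCL preferred when enabled
#elif UCXX_ENABLE_RMM
    return std::make_shared<ucxx::RMMBuffer>(size);   // RMM fallback
#else
    throw std::runtime_error("no CUDA buffer backend was enabled at build time");
#endif
  }
  return std::make_shared<ucxx::HostBuffer>(size);
}
```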
Mirrors the existing RMM parameterized test suite for CCCLBuffer: TestType, TestSize, TestData, TestThrowAfterRelease, all guarded by UCXX_ENABLE_CCCL. 36 tests in total (12 Host + 12 RMM + 12 CCCL).
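A sketch of what one such parameterized test might look like in GoogleTest, assuming an allocateBuffer() factory keyed on BufferType (the real fixtures differ):

```cpp
// Hypothetical GoogleTest sketch of a buffer-type-parameterized test; the
// fixture name and the ucxx::allocateBuffer() signature are assumptions
// based on the commit message.
#include <gtest/gtest.h>

class BufferTest : public ::testing::TestWithParam<ucxx::BufferType> {};

TEST_P(BufferTest, TestType)
{
  auto buffer = ucxx::allocateBuffer(GetParam(), 4096);
  ASSERT_EQ(buffer->getType(), GetParam());
}

INSTANTIATE_TEST_SUITE_P(HostAndRmm,
                         BufferTest,
                         ::testing::Values(ucxx::BufferType::Host, ucxx::BufferType::RMM));

#if UCXX_ENABLE_CCCL
// The CCCL variant only exists when the build enables it.
INSTANTIATE_TEST_SUITE_P(Cccl, BufferTest, ::testing::Values(ucxx::BufferType::CCCL));
#endif
```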
Adds CcclBufferInterfaceBase and four CCCL memory-type variants (cccl-device, cccl-shared, cccl-cuda-async, cccl-cuda-async-managed) to buffer_interface.hpp, a UCXX_BENCHMARKS_ENABLE_CCCL CMake option, and isCcclMemoryType() dispatch in perftest.cpp.
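A sketch of the string dispatch an isCcclMemoryType() helper might perform; the variant names come from this commit, but the body is an assumption:

```cpp
// Hypothetical sketch of memory-type dispatch for the benchmark CLI; only
// the four variant strings are taken from the commit message.
#include <string>

bool isCcclMemoryType(const std::string& memoryType)
{
  return memoryType == "cccl-device" || memoryType == "cccl-shared" ||
         memoryType == "cccl-cuda-async" || memoryType == "cccl-cuda-async-managed";
}
```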
Updates the basic.cpp example with CCCL buffer type parsing and allocation. Adds a CCCLBuffer Cython class declaration (ucxx_api.pxd), _CCCLBufferWrapper, _get_cccl_buffer(), _cccl_am_allocator(), and UCXWorker AM allocator registration under IF UCXX_ENABLE_CCCL (libucxx.pyx). Propagates the UCXX_ENABLE_CCCL flag to Cython compilation in both CMakeLists.txt files.
Adds a CCCL Memory Support subsection to the benchmark documentation showing cccl-device, cccl-shared, cccl-cuda-async, and cccl-cuda-async-managed memory-type examples and requirements.
- Use std::unique_ptr<CCCLBufferImpl> internally (RAII-safe)
- Destructor becomes = default (compiler-generated via unique_ptr)
- Constructor uses std::make_unique (exception-safe)
- release() uses _impl.release() to return a raw CCCLBufferImpl* (preserves the C/Cython ABI; callers cannot consume a unique_ptr; see the sketch after this list)
- Add Doxygen @code examples to constructor, release(), data()
- Add six explanatory comments at intentional CCCL/RMM deviation points: PIMPL pattern rationale, cudaFree(0) context init, static_cast necessity, CCCL priority over RMM for the AM allocator, no Python DeviceBuffer equivalent, nullptr stream in benchmarks
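A sketch of the resulting ownership layout under these assumptions (abbreviated; not the actual UCXX header):

```cpp
// Hypothetical sketch of the unique_ptr-based PIMPL described in the list
// above; the class is abbreviated and the signatures are assumptions.
#include <cstddef>
#include <memory>

namespace ucxx {

class CCCLBufferImpl;  // still only forward-declared in the header

class CCCLBuffer {
 public:
  explicit CCCLBuffer(size_t size);  // std::make_unique in the .cu file
  ~CCCLBuffer();  // '= default' in buffer_cccl.cu, where the impl is complete

  // Hands ownership to the caller as a raw pointer, because C/Cython
  // callers cannot consume a std::unique_ptr across the ABI boundary.
  // (A later commit removes this from the public API; see below.)
  CCCLBufferImpl* release() { return _impl.release(); }

 private:
  std::unique_ptr<CCCLBufferImpl> _impl;  // RAII-owned opaque implementation
};

}  // namespace ucxx
```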
CCCLBufferImpl is an internal type that requires nvcc to instantiate; it should not be part of the public C++ or Cython API. No external consumer (Python, benchmarks, production code) calls release() on CCCLBuffer; only tests used it for cleanup. RMMBuffer keeps its release() because rmm.DeviceBuffer.c_from_unique_ptr() is a real consumer in the RMM Python ecosystem. The PIMPL pattern is preserved (forward declaration plus unique_ptr member), as the compilation firewall still serves a purpose. Only the public methods that leaked the impl type are removed.
Contributor
Author
/ok to test
1 similar comment
Contributor
Author
/ok to test
CCCL support is now assumed present when building the Python library, matching the RMM pattern. Remove IF UCXX_ENABLE_CCCL guards and the CMake flag propagation that was required to define them.
Add UCXX_ENABLE_CCCL to build.sh matching the RMM pattern. Set CUDA architectures via rapids_cuda_set_architectures(RAPIDS) when CCCL is enabled, fixing conda builds that lack a GPU for native detection.
Contributor
Author
/ok to test
pentschev reviewed Apr 29, 2026
Comment on lines +261 to +262
assert(cudaMemcpy(host_buffer.data(), buffer->data(), buffer->getSize(), cudaMemcpyDefault) ==
       cudaSuccess);
Member
Why drop the async memcpy API?
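For context, a sketch of the asynchronous variant the question refers to, with the copy wrapped in a hypothetical helper and an explicit synchronization point (the dedicated stream argument is an assumption):

```cpp
// Hypothetical sketch of the asynchronous copy the reviewer is asking
// about; the helper name and local host buffer are invented for
// illustration and mirror the names in the diff hunk above.
#include <cassert>
#include <cstddef>
#include <cuda_runtime_api.h>
#include <vector>

void copyDeviceToHostAsync(const void* src, size_t size, cudaStream_t stream)
{
  std::vector<char> host_buffer(size);
  assert(cudaMemcpyAsync(host_buffer.data(), src, size, cudaMemcpyDefault, stream) ==
         cudaSuccess);
  // The copy is only complete once the stream is synchronized.
  assert(cudaStreamSynchronize(stream) == cudaSuccess);
}
```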
Member
Do we really need a .cu file? I would very much like to keep UCXX free from requiring nvcc.
#if UCXX_ENABLE_CCCL
  auto buffer = std::dynamic_pointer_cast<ucxx::CCCLBuffer>(_buffer);
  ASSERT_EQ(buffer->getType(), _type);
  return;  // CCCLBuffer does not expose release(); post-release assertions do not apply
buffer_cccl only uses host APIs (cudaFree, cuda::buffer), and CCCL headers conditionally define __host__/__device__ to nothing outside nvcc. Rename .cu to .cpp, and replace enable_language(CUDA) and rapids_cuda_set_architectures with find_package(CUDAToolkit) and CUDA::cudart, eliminating the CUDA_ARCHITECTURES requirement.
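A sketch of the host-only pattern that makes this possible; the helper name is invented, but everything here is host code that g++ or clang++ can compile when linked against CUDA::cudart:

```cpp
// Hypothetical sketch: the translation unit only touches host-side CUDA
// runtime APIs, so it needs cudart headers and linkage but not nvcc.
#include <cassert>
#include <cuda_runtime_api.h>

void ensureCudaContext()
{
  // cudaFree(nullptr) is a documented no-op that forces lazy CUDA context
  // initialization; it is a plain host API call.
  assert(cudaFree(nullptr) == cudaSuccess);
}
```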
Contributor
Author
/ok to test
This PR adds support for using CCCL instead of RMM for handling device memory.