
Add support for using CCCL for CUDA buffers#641

Draft
vyasr wants to merge 12 commits into rapidsai:main from
vyasr:feat/cccl_buffer

Conversation

@vyasr
Contributor

@vyasr vyasr commented Apr 29, 2026

This PR adds support for using CCCL instead of rmm for handling device memory.

@vyasr vyasr self-assigned this Apr 29, 2026
@vyasr vyasr added the labels "feature request" (New feature or request) and "non-breaking" (Introduces a non-breaking change) on Apr 29, 2026
@copy-pr-bot

copy-pr-bot Bot commented Apr 29, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

vyasr and others added 9 commits April 28, 2026 17:52
Introduces cpp/cmake/thirdparty/get_cccl.cmake using rapids_cpm_cccl() and adds UCXX_ENABLE_CCCL option to cpp/CMakeLists.txt with conditional compile definitions, CCCL link, CUDA language enable, and buffer_cccl.cu source registration.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
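The CMake wiring described in this commit can be sketched as follows. This is an illustrative fragment only: the option and source names come from the commit message, while the `ucxx` target name and exact call sites are assumptions about the PR's layout.

```cmake
# Illustrative sketch of the UCXX_ENABLE_CCCL option wiring; exact
# target names and file paths in the PR may differ.
option(UCXX_ENABLE_CCCL "Use CCCL for CUDA device buffers" OFF)

if(UCXX_ENABLE_CCCL)
  enable_language(CUDA)                        # buffer_cccl.cu needs nvcc
  include(cmake/thirdparty/get_cccl.cmake)     # calls rapids_cpm_cccl()
  target_sources(ucxx PRIVATE src/buffer_cccl.cu)
  target_compile_definitions(ucxx PRIVATE UCXX_ENABLE_CCCL)
  target_link_libraries(ucxx PRIVATE CCCL::CCCL)
endif()
```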
Adds BufferType::CCCL enum value and CCCLBuffer class to buffer.h using the PIMPL pattern (opaque CCCLBufferImpl* in header, full definition in buffer_cccl.cu). Implementation uses cuda::buffer<cuda::std::byte, cuda::mr::device_accessible> with cudaFree(0) context initialization guard.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Adds CCCL branch to allocateBuffer() factory in buffer.cpp and adds UCXX_ENABLE_CCCL/#elif UCXX_ENABLE_RMM/#else dispatch to request_tag_multi.cpp so isCUDA allocations prefer CCCLBuffer when CCCL is enabled.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Mirrors the existing RMM parameterized test suite for CCCLBuffer: TestType, TestSize, TestData, TestThrowAfterRelease — all guarded by UCXX_ENABLE_CCCL. 36 tests total (12 Host + 12 RMM + 12 CCCL).

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Adds CcclBufferInterfaceBase and 4 CCCL memory type variants (cccl-device, cccl-shared, cccl-cuda-async, cccl-cuda-async-managed) to buffer_interface.hpp, UCXX_BENCHMARKS_ENABLE_CCCL CMake option, and isCcclMemoryType() dispatch in perftest.cpp.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Updates basic.cpp example with CCCL buffer type parsing and allocation. Adds CCCLBuffer Cython class declaration (ucxx_api.pxd), _CCCLBufferWrapper, _get_cccl_buffer(), _cccl_am_allocator(), and UCXWorker AM allocator registration under IF UCXX_ENABLE_CCCL (libucxx.pyx). Propagates UCXX_ENABLE_CCCL flag to Cython compilation in both CMakeLists.txt files.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Adds CCCL Memory Support subsection to the benchmark documentation showing cccl-device, cccl-shared, cccl-cuda-async, and cccl-cuda-async-managed memory type examples and requirements.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
- Use std::unique_ptr<CCCLBufferImpl> internally (RAII-safe)
- Destructor becomes = default (compiler-generated via unique_ptr)
- Constructor uses std::make_unique (exception-safe)
- release() uses _impl.release() to return raw CCCLBufferImpl*
  (preserves C/Cython ABI -- callers cannot consume unique_ptr)
- Add Doxygen @code examples to constructor, release(), data()
- Add 6 explanatory comments at intentional CCCL/RMM deviation points:
  PIMPL pattern rationale, cudaFree(0) context init, static_cast necessity,
  CCCL priority over RMM for AM allocator, no Python DeviceBuffer equivalent,
  nullptr stream in benchmarks
CCCLBufferImpl is an internal type that requires nvcc to instantiate;
it should not be part of the public C++ or Cython API. No external
consumer (Python, benchmarks, production code) calls release() on
CCCLBuffer — only tests used it for cleanup. RMMBuffer keeps its
release() because rmm.DeviceBuffer.c_from_unique_ptr() is a real
consumer in the RMM Python ecosystem.

The PIMPL pattern is preserved (forward declaration + unique_ptr member)
as the compilation firewall still serves a purpose. Only the public
methods that leaked the impl type are removed.
@vyasr vyasr force-pushed the feat/cccl_buffer branch from 3fe361e to 5bfc3d5 on April 29, 2026 00:52
@vyasr
Contributor Author

vyasr commented Apr 29, 2026

/ok to test

1 similar comment
@vyasr
Contributor Author

vyasr commented Apr 29, 2026

/ok to test

vyasr added 2 commits April 29, 2026 09:47
CCCL support is now assumed present when building the Python library,
matching the RMM pattern. Remove IF UCXX_ENABLE_CCCL guards and the
CMake flag propagation that was required to define them.
Add UCXX_ENABLE_CCCL to build.sh matching the RMM pattern. Set CUDA
architectures via rapids_cuda_set_architectures(RAPIDS) when CCCL is
enabled, fixing conda builds that lack a GPU for native detection.
@vyasr vyasr force-pushed the feat/cccl_buffer branch from b14651b to b736e74 on April 29, 2026 16:55
@vyasr
Contributor Author

vyasr commented Apr 29, 2026

/ok to test

Comment thread cpp/examples/basic.cpp
Comment on lines +261 to +262
assert(cudaMemcpy(host_buffer.data(), buffer->data(), buffer->getSize(), cudaMemcpyDefault) == cudaSuccess);
Member

Why drop the async memcpy API?

Comment thread cpp/src/buffer_cccl.cpp
Member


Do we really need a .cu file? I would very much like to keep UCXX free from requiring nvcc.

Comment thread cpp/tests/buffer.cpp
#if UCXX_ENABLE_CCCL
auto buffer = std::dynamic_pointer_cast<ucxx::CCCLBuffer>(_buffer);
ASSERT_EQ(buffer->getType(), _type);
return; // CCCLBuffer does not expose release(); post-release assertions do not apply
Member


Why not?

buffer_cccl only uses host APIs (cudaFree, cuda::buffer) and CCCL
headers conditionally define __host__/__device__ to nothing outside
nvcc. Rename .cu to .cpp, replace enable_language(CUDA) and
rapids_cuda_set_architectures with find_package(CUDAToolkit) and
CUDA::cudart, eliminating the CUDA_ARCHITECTURES requirement.
@vyasr
Contributor Author

vyasr commented Apr 29, 2026

/ok to test
