Migrate RMM usage to CCCL MR design#940

Open
bdice wants to merge 9 commits intorapidsai:mainfrom
bdice:rmm-cccl-migration

Conversation

Contributor

@bdice bdice commented Apr 3, 2026

Summary

  • Rewrite RmmResourceAdaptor as a thin shell inheriting cuda::mr::shared_resource<detail::RmmResourceAdaptorImpl>, with all mutable state in the impl class for copyable shared ownership.
  • Replace device_memory_resource* with rmm::device_async_resource_ref for non-owning references and cuda::mr::any_resource for owning storage.
  • Remove rmm::mr::owning_wrapper usage (removed in RMM 26.06).
  • Update pool_memory_resource usage (no longer a template; upstream passed as device_async_resource_ref).
  • Replace set_current_device_resource(ptr) with set_current_device_resource_ref(ref).
  • Update test resources to satisfy CCCL resource concept (allocate_sync, deallocate_sync, operator==, get_property).
  • Update Cython bindings (.pxd/.pyx) to use device_async_resource_ref instead of device_memory_resource.
  • Point get_cucascade.cmake to local cuCascade source directory (companion cuCascade PR: Migrate RMM usage to CCCL MR design NVIDIA/cuCascade#98).

Notes

RMM 26.06 removes rmm::mr::device_memory_resource base class and adopts
CCCL-native memory resource concepts. This migrates rapidsmpf accordingly:

- RmmResourceAdaptor rewritten as thin shell inheriting
  cuda::mr::shared_resource<detail::RmmResourceAdaptorImpl>, with all
  mutable state in the impl class for copyable shared ownership.
- Replace device_memory_resource* with rmm::device_async_resource_ref for
  non-owning references and cuda::mr::any_resource for owning storage.
- Remove rmm::mr::owning_wrapper usage.
- pool_memory_resource is no longer a template; upstream passed as
  device_async_resource_ref.
- Replace set_current_device_resource(ptr) with
  set_current_device_resource_ref(ref).
- Update test resources to satisfy CCCL resource concept (allocate_sync,
  deallocate_sync, operator==, get_property).
- Update Cython bindings to use device_async_resource_ref instead of
  device_memory_resource.
- Point get_cucascade.cmake to local cuCascade source directory.

copy-pr-bot bot commented Apr 3, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


Comment thread cpp/benchmarks/utils/rmm_stack.hpp Outdated
- * @return A owning memory resource, which must be kept alive.
+ * @return An owning memory resource, which must be kept alive.
*/
[[nodiscard]] inline std::shared_ptr<rapidsmpf::RmmResourceAdaptor>
Contributor

this could be a cuda::mr::shared_resource, couldn't it?

Comment on lines +31 to +32
class RmmResourceAdaptor
: public cuda::mr::shared_resource<detail::RmmResourceAdaptorImpl> {
Contributor

Do all RMM adaptors follow this pattern now?
As opposed to moving RmmResourceAdaptorImpl into RmmResourceAdaptor and, whenever we want to use the adaptor, doing:

auto adapter = cuda::mr::make_shared_resource<RmmResourceAdaptor>(mr);

Contributor Author

Really we have two types of memory resources:

  • Stateless: trivially copyable, movable, etc. Example: cuda_memory_resource{}.
  • Stateful: adaptors and/or nontrivial base resources. Examples: any adaptor (statistics, logging, pool) or cuda_async_memory_resource which holds a CUDA driver pool object.

So yes, all adaptors use this pattern, but it's not just adaptors, it's anything that has shared state. The way we handle stateful resources is PIMPL-like, with some nuances.

  • AdaptorImpl is a private/detail class that holds the upstream any_resource and/or adaptor state as members,
  • Adaptor is the public API, it uses cuda::mr::shared_resource which forwards allocation methods to the Impl class.

With that split, copying stateful resources is cheap (just the cost of copying a shared_ptr). Crucially, we don't actually use shared_ptr, we use cuda::mr::shared_resource<Impl> and that does the magic for us. It has its own shared-pointer-like implementation.

Comment on lines +31 to +37
// Manual comparison of optionals to avoid recursive constraint satisfaction in
// CCCL 3.2. std::optional::operator== triggers infinite concept checking when the
// wrapped type (rmm::device_async_resource_ref) inherits from CCCL's concept-based
// resource_ref.
// TODO: Revert this after the RMM resource ref types are replaced with
// plain cuda::mr ref types. This depends on
// https://github.com/rapidsai/rmm/issues/2011.
Contributor

is this comment still valid?

Contributor Author

Good catch, this can be removed now. I'm pretty sure this can be reduced to:

bool RmmResourceAdaptorImpl::operator==(
    RmmResourceAdaptorImpl const& other
) const noexcept {
    return this == std::addressof(other);
}

See this example pattern.

RmmResourceAdaptor mr(primary_mr);

- EXPECT_THROW(mr.allocate_sync(8_MiB), rmm::out_of_memory);
+ EXPECT_THROW((void)mr.allocate_sync(8_MiB), rmm::out_of_memory);
Contributor

nit

Suggested change
- EXPECT_THROW((void)mr.allocate_sync(8_MiB), rmm::out_of_memory);
+ EXPECT_THROW(std::ignore = mr.allocate_sync(8_MiB), rmm::out_of_memory);

RmmResourceAdaptor mr(primary_mr, fallback_mr);

- EXPECT_THROW(mr.allocate_sync(2_MiB), std::logic_error);
+ EXPECT_THROW((void)mr.allocate_sync(2_MiB), std::logic_error);
Contributor

Suggested change
- EXPECT_THROW((void)mr.allocate_sync(2_MiB), std::logic_error);
+ EXPECT_THROW(std::ignore = mr.allocate_sync(2_MiB), std::logic_error);

Contributor Author

This same proposal came up recently and I did some research: rapidsai/rmm#2301 (comment)

I concluded that the current state is the best option we have.

I did some reading on this. Apparently std::ignore is really only meant to ignore members during tuple unpacking. The cppreference docs state:

std::ignore is intended for use with std::tie when unpacking a std::tuple, as a placeholder for the arguments that are not used, but can be used for any unwanted assignment.

Some code guides recommend using std::ignore to avoid warnings from unused return values of [[nodiscard]] functions, even though an assignment isn't required.

For ignoring values not requiring assignment, one may cast to void. For variables that have names, but whose value is unused, one may cast those to void or declare those variables with [[maybe_unused]].

The std::ignore name even requires including <tuple>, suggesting to me that this is probably the wrong direction. I confirmed with a little more googling: we might actually have the best thing with (void).

bdice added 3 commits April 14, 2026 19:18
…BufferResource

Replace non-owning device_async_resource_ref with owning
cuda::mr::any_resource<device_accessible> for constructor parameters
and member storage in RmmResourceAdaptor, RmmResourceAdaptorImpl, and
BufferResource. Getters still return device_async_resource_ref via
const_cast, matching the pattern in RMM's prefetch_resource_adaptor.

Forward the alignment parameter in RmmResourceAdaptorImpl::allocate
and deallocate to fix CCCL deprecation warnings.

Update Cython declarations to use any_resource and RMM's
make_any_device_resource/make_device_async_resource_ref helpers,
replacing the custom make_rapidsmpf_resource_ref inline function.
@nirandaperera added the breaking (Introduces a breaking change) and improvement (Improves an existing functionality) labels Apr 16, 2026
bdice added 4 commits April 17, 2026 00:05
throw_at_limit_resource is split into a non-copyable impl holding the
mutable allocs set, and a shared_resource<impl> wrapper. When
any_resource copies the resource, copies now share the same underlying
state via ref counting, so test assertions on the original object
correctly observe allocations made through the adaptor's internal copy.
@bdice bdice force-pushed the rmm-cccl-migration branch from 0b13b4b to 1d28b47 Compare April 17, 2026 05:26
Contributor Author

rapidsmpf should be pinned to a specific commit of cuCascade (the tip of PR 98) until NVIDIA/cuCascade#98 is merged.

@bdice bdice force-pushed the rmm-cccl-migration branch from 1d28b47 to fe393cb Compare April 17, 2026 06:31
Contributor Author

@bdice bdice left a comment

@nirandaperera Thanks for your help with this so far. I would like to ask if you can help with the remaining refactoring steps here. I will handle the cuCascade pinning soon (just need to get cuDF artifacts built and verify that the cuCascade tests pass, then I will pin to that commit).

create_context(ProgramOptions& arguments, RmmResourceAdaptor* mr) {
rmm::mr::set_current_device_resource(mr);
rmm::mr::set_current_device_resource_ref(mr);
rmm::mr::set_current_device_resource(*mr);
Contributor Author

We may be able to pass RmmResourceAdaptor by value rather than by pointer here.

if (args.device_mem_limit_mb >= 0) {
memory_available[rapidsmpf::MemoryType::DEVICE] = rapidsmpf::LimitAvailableMemory{
- stat_enabled_mr.get(), args.device_mem_limit_mb << 20
+ &stat_enabled_mr, args.device_mem_limit_mb << 20
Contributor Author

We may be able to pass by value rather than by pointer here.

Contributor Author

We don't really need this to be a "stack" anymore. Previously that was required to keep the MRs alive but now adaptors will keep their upstreams alive.

Action: more renaming / cleanup?

@@ -151,7 +151,7 @@ void BM_DeviceToHostCopyInclAlloc(benchmark::State& state) {
auto device_mr = std::make_unique<rmm::mr::cuda_memory_resource>();
Contributor Author

This probably doesn't need to be owned by unique_ptr anymore.

(Pattern repeats.)



// ---------------------------------------------------------------------------
namespace detail {

RmmResourceAdaptorImpl::RmmResourceAdaptorImpl(
Contributor Author

RMM uses a separate .cpp file for the Impl classes to avoid accidental overlap between private and public APIs. Either way is fine.

[[nodiscard]] bool operator==(
throw_at_limit_resource_impl const& other
) const noexcept {
return this == &other;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- return this == &other;
+ return this == std::addressof(other);

- # need to keep this alive here.
+ # Stored for the Python device_mr/pinned_mr property accessors.
+ # The C++ BufferResource owns the resource via any_resource.
self._device_mr = device_mr
Contributor Author

Previously these members were for keeping the MR alive. However, now it seems like they're used for the property accessors. I updated the comment.


@bdice bdice force-pushed the rmm-cccl-migration branch 2 times, most recently from a8012b7 to a871b98 Compare April 17, 2026 07:09
@bdice bdice marked this pull request as ready for review April 17, 2026 07:13
@bdice bdice requested review from a team as code owners April 17, 2026 07:13
@bdice bdice requested a review from msarahan April 17, 2026 07:13
@bdice bdice force-pushed the rmm-cccl-migration branch 4 times, most recently from 2350135 to de72e5c Compare April 17, 2026 20:51
@bdice bdice force-pushed the rmm-cccl-migration branch from de72e5c to 1c157af Compare April 17, 2026 21:07
@bdice bdice force-pushed the rmm-cccl-migration branch from 1c157af to 1b8e20a Compare April 18, 2026 15:52

Labels

breaking: Introduces a breaking change
improvement: Improves an existing functionality
