Conversation
RMM 26.06 removes the rmm::mr::device_memory_resource base class and adopts CCCL-native memory resource concepts. This migrates rapidsmpf accordingly:

- RmmResourceAdaptor rewritten as a thin shell inheriting cuda::mr::shared_resource<detail::RmmResourceAdaptorImpl>, with all mutable state in the impl class for copyable shared ownership.
- Replace device_memory_resource* with rmm::device_async_resource_ref for non-owning references and cuda::mr::any_resource for owning storage.
- Remove rmm::mr::owning_wrapper usage.
- pool_memory_resource is no longer a template; the upstream is passed as a device_async_resource_ref.
- Replace set_current_device_resource(ptr) with set_current_device_resource_ref(ref).
- Update test resources to satisfy the CCCL resource concept (allocate_sync, deallocate_sync, operator==, get_property).
- Update Cython bindings to use device_async_resource_ref instead of device_memory_resource.
- Point get_cucascade.cmake to a local cuCascade source directory.
| * @return A owning memory resource, which must be kept alive. |
| * @return An owning memory resource, which must be kept alive. |
| */ |
| [[nodiscard]] inline std::shared_ptr<rapidsmpf::RmmResourceAdaptor> |
this could be a cuda::mr::shared_resource, couldn't it?
| class RmmResourceAdaptor |
| : public cuda::mr::shared_resource<detail::RmmResourceAdaptorImpl> { |
Do all rmm adapters follow this pattern now?
As opposed to: move RmmResourceAdaptorImpl into RmmResourceAdaptor, and whenever we want to use the adapter, do

auto adapter = cuda::mr::make_shared_resource<RmmResourceAdaptor>(mr);
Really we have two types of memory resources:
- Stateless: trivially copyable, movable, etc. Example: cuda_memory_resource{}.
- Stateful: adaptors and/or nontrivial base resources. Examples: any adaptor (statistics, logging, pool) or cuda_async_memory_resource, which holds a CUDA driver pool object.
So yes, all adaptors use this pattern, but it's not just adaptors, it's anything that has shared state. The way we handle stateful resources is PIMPL-like, with some nuances.
AdaptorImpl is a private/detail class that holds the upstream any_resource and/or adaptor state as members; Adaptor is the public API, and it uses cuda::mr::shared_resource, which forwards allocation methods to the Impl class.

With that split, copying stateful resources is cheap (just the cost of copying a shared_ptr). Crucially, we don't actually use shared_ptr; we use cuda::mr::shared_resource<Impl> and that does the magic for us. It has its own shared-pointer-like implementation.
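The shared-state behavior described above can be sketched in plain C++ with a std::shared_ptr-based PIMPL (an analogy only: the real implementation uses cuda::mr::shared_resource rather than std::shared_ptr, and the Adaptor/AdaptorImpl names here are placeholders):

```cpp
#include <cstddef>
#include <memory>
#include <new>

// All mutable state lives in the impl class (like RmmResourceAdaptorImpl).
struct AdaptorImpl {
    std::size_t bytes_allocated = 0;
    void* allocate(std::size_t n) {
        bytes_allocated += n;
        return ::operator new(n);
    }
    void deallocate(void* p, std::size_t n) {
        bytes_allocated -= n;
        ::operator delete(p);
    }
};

// The public adaptor is a thin shell; copies share one impl, so state
// observed through any copy reflects allocations made through all copies.
class Adaptor {
  public:
    Adaptor() : impl_{std::make_shared<AdaptorImpl>()} {}
    void* allocate(std::size_t n) { return impl_->allocate(n); }
    void deallocate(void* p, std::size_t n) { impl_->deallocate(p, n); }
    std::size_t bytes_allocated() const { return impl_->bytes_allocated; }
    // Identity comparison: two adaptors are equal iff they share an impl.
    bool operator==(Adaptor const& other) const { return impl_ == other.impl_; }
  private:
    std::shared_ptr<AdaptorImpl> impl_;
};
```

Copying Adaptor costs one shared_ptr copy; cuda::mr::shared_resource provides the same semantics with its own shared-pointer-like implementation.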
| // Manual comparison of optionals to avoid recursive constraint satisfaction in |
| // CCCL 3.2. std::optional::operator== triggers infinite concept checking when the |
| // wrapped type (rmm::device_async_resource_ref) inherits from CCCL's concept-based |
| // resource_ref. |
| // TODO: Revert this after the RMM resource ref types are replaced with |
| // plain cuda::mr ref types. This depends on |
| // https://github.com/rapidsai/rmm/issues/2011. |
is this comment still valid?
Good catch, this can be removed now. I'm pretty sure this can be reduced to:
bool RmmResourceAdaptorImpl::operator==(
RmmResourceAdaptorImpl const& other
) const noexcept {
return this == std::addressof(other);
}

| RmmResourceAdaptor mr(primary_mr); |
| EXPECT_THROW(mr.allocate_sync(8_MiB), rmm::out_of_memory); |
| EXPECT_THROW((void)mr.allocate_sync(8_MiB), rmm::out_of_memory); |
nit
| EXPECT_THROW((void)mr.allocate_sync(8_MiB), rmm::out_of_memory); |
| EXPECT_THROW(std::ignore = mr.allocate_sync(8_MiB), rmm::out_of_memory); |
| RmmResourceAdaptor mr(primary_mr, fallback_mr); |
| EXPECT_THROW(mr.allocate_sync(2_MiB), std::logic_error); |
| EXPECT_THROW((void)mr.allocate_sync(2_MiB), std::logic_error); |
| EXPECT_THROW((void)mr.allocate_sync(2_MiB), std::logic_error); |
| EXPECT_THROW(std::ignore = mr.allocate_sync(2_MiB), std::logic_error); |
This same proposal came up recently and I did some research: rapidsai/rmm#2301 (comment)
I concluded that the current state is the best option we have.
I did some reading on this. Apparently std::ignore is really only meant to ignore members during tuple unpacking. The cppreference docs state:

"std::ignore is intended for use with std::tie when unpacking a std::tuple, as a placeholder for the arguments that are not used, but can be used for any unwanted assignment."

"Some code guides recommend using std::ignore to avoid warnings from unused return values of [[nodiscard]] functions, even though an assignment isn't required. For ignoring values not requiring assignment, one may cast to void. For variables that have names, but whose value is unused, one may cast those to void or declare those variables with [[maybe_unused]]."

The std::ignore name even requires including <tuple>, suggesting to me that this is probably the wrong direction. I confirmed with a little more googling: we might actually have the best thing with (void).
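The trade-off can be seen in a minimal standalone snippet (answer/demo are illustrative names, not from the codebase):

```cpp
#include <tuple>  // needed only to spell std::ignore

[[nodiscard]] int answer() {
    return 42;
}

void demo() {
    (void)answer();          // plain cast to void; no extra header required
    std::ignore = answer();  // also silences -Wunused-result, but drags in <tuple>
}
```

Both forms suppress the unused-result warning for a [[nodiscard]] function; only the std::ignore form requires the <tuple> include.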
# Conflicts:
#   cpp/src/memory/buffer_resource.cpp
…BufferResource

Replace non-owning device_async_resource_ref with owning cuda::mr::any_resource<device_accessible> for constructor parameters and member storage in RmmResourceAdaptor, RmmResourceAdaptorImpl, and BufferResource. Getters still return device_async_resource_ref via const_cast, matching the pattern in RMM's prefetch_resource_adaptor. Forward the alignment parameter in RmmResourceAdaptorImpl::allocate and deallocate to fix CCCL deprecation warnings. Update Cython declarations to use any_resource and RMM's make_any_device_resource/make_device_async_resource_ref helpers, replacing the custom make_rapidsmpf_resource_ref inline function.
throw_at_limit_resource is split into a non-copyable impl holding the mutable allocs set, and a shared_resource<impl> wrapper. When any_resource copies the resource, copies now share the same underlying state via ref counting, so test assertions on the original object correctly observe allocations made through the adaptor's internal copy.
Force-pushed 0b13b4b to 1d28b47
rapidsmpf should be pinned to a specific commit of cuCascade (the tip of PR 98) until NVIDIA/cuCascade#98 is merged.
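For example, the pin could look roughly like the following in get_cucascade.cmake (a sketch only: the exact helper, version variable, and arguments depend on how rapidsmpf currently fetches cuCascade, and the commit SHA placeholder must be filled in with the tip of PR 98):

```cmake
# Sketch: pin cuCascade to a fixed commit instead of a local SOURCE_DIR.
# rapids_cpm_find is the rapids-cmake/CPM-style helper; adjust to match
# the existing get_cucascade.cmake.
rapids_cpm_find(
  cuCascade ${cucascade_version}
  CPM_ARGS
    GIT_REPOSITORY https://github.com/NVIDIA/cuCascade.git
    GIT_TAG <tip-of-PR-98-commit-sha>  # replace once NVIDIA/cuCascade#98 merges
)
```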
Force-pushed 1d28b47 to fe393cb
bdice left a comment:
@nirandaperera Thanks for your help with this so far. I would like to ask if you can help with the remaining refactoring steps here. I will handle the cuCascade pinning soon (just need to get cuDF artifacts built and verify that the cuCascade tests pass, then I will pin to that commit).
| create_context(ProgramOptions& arguments, RmmResourceAdaptor* mr) { |
| rmm::mr::set_current_device_resource(mr); |
| rmm::mr::set_current_device_resource_ref(mr); |
| rmm::mr::set_current_device_resource(*mr); |
We may be able to pass RmmResourceAdaptor by value rather than by pointer here.
| if (args.device_mem_limit_mb >= 0) { |
| memory_available[rapidsmpf::MemoryType::DEVICE] = rapidsmpf::LimitAvailableMemory{ |
| stat_enabled_mr.get(), args.device_mem_limit_mb << 20 |
| &stat_enabled_mr, args.device_mem_limit_mb << 20 |
We may be able to pass by value rather than by pointer here.
We don't really need this to be a "stack" anymore. Previously that was required to keep the MRs alive but now adaptors will keep their upstreams alive.
Action: more renaming / cleanup?
| @@ -151,7 +151,7 @@ void BM_DeviceToHostCopyInclAlloc(benchmark::State& state) { |
| auto device_mr = std::make_unique<rmm::mr::cuda_memory_resource>(); |
This probably doesn't need to be owned by unique_ptr anymore.
(Pattern repeats.)
| // --------------------------------------------------------------------------- |
| namespace detail { |
| RmmResourceAdaptorImpl::RmmResourceAdaptorImpl( |
RMM uses a separate .cpp file for the Impl classes to avoid accidental overlap between private and public APIs. Either way is fine.
| [[nodiscard]] bool operator==( |
| throw_at_limit_resource_impl const& other |
| ) const noexcept { |
| return this == &other; |

| return this == &other; |
| return this == std::addressof(other); |
| # need to keep this alive here. |
| # Stored for the Python device_mr/pinned_mr property accessors. |
| # The C++ BufferResource owns the resource via any_resource. |
| self._device_mr = device_mr |
Previously these members were for keeping the MR alive. However, now it seems like they're used for the property accessors. I updated the comment.
Force-pushed a8012b7 to a871b98
Force-pushed 2350135 to de72e5c
Force-pushed de72e5c to 1c157af
Force-pushed 1c157af to 1b8e20a
Summary
- RmmResourceAdaptor rewritten as a thin shell inheriting cuda::mr::shared_resource<detail::RmmResourceAdaptorImpl>, with all mutable state in the impl class for copyable shared ownership.
- device_memory_resource* replaced with rmm::device_async_resource_ref for non-owning references and cuda::mr::any_resource for owning storage.
- rmm::mr::owning_wrapper usage removed (removed in RMM 26.06).
- pool_memory_resource usage updated (no longer a template; upstream passed as device_async_resource_ref).
- set_current_device_resource(ptr) replaced with set_current_device_resource_ref(ref).
- Test resources updated to satisfy the CCCL resource concept (allocate_sync, deallocate_sync, operator==, get_property).
- Cython bindings (.pxd/.pyx) updated to use device_async_resource_ref instead of device_memory_resource.
- get_cucascade.cmake pointed to a local cuCascade source directory (companion cuCascade PR: Migrate RMM usage to CCCL MR design, NVIDIA/cuCascade#98).

Notes
The get_cucascade.cmake change to use SOURCE_DIR is a development convenience and should be updated to point to the merged cuCascade commit before this PR is finalized.