Commit 5e9a5ea

cuda.core: fix C++ teardown leak when buffer has no attached stream (#2001)
Issue: the C++ ``shared_ptr`` deleter for a buffer's device-pointer handle invokes ``MemoryResource.deallocate`` via ``_mr_dealloc_callback``. The handle's deallocation stream is set separately via ``set_deallocation_stream``; if it was never set (e.g. buffers minted via ``Buffer.from_handle(ptr, size, mr=mr)`` from DLPack import, IPC import, or third-party adapters), the callback would pass ``stream=None`` to ``mr.deallocate``. After the strict-stream changes for #2001, the stream-ordered MR overrides reject ``stream=None`` via ``Stream_accept`` and raise ``TypeError``. The ``noexcept`` callback catches the exception, prints a warning to stderr, and returns -- silently **leaking** the underlying CUDA allocation (and any associated IPC handles).

Fix: when ``h_stream`` is empty in ``_mr_dealloc_callback``, fall back to ``default_stream()`` instead of ``None``. The C++ teardown path is the unique legitimate "no-stream-context" caller (no Python frame from which to obtain a stream), so this is the one place where an implicit default-stream fallback is necessary; everywhere else the policy remains "stream is required and must be passed explicitly".

Add ``test_mr_dealloc_callback_falls_back_to_default_stream`` covering the regression: a strict stream-ordered mock MR is used to back a ``Buffer.from_handle`` (no attached stream), and the test asserts that ``deallocate`` is invoked with the default stream rather than failing with ``TypeError`` and leaking.

Co-authored-by: Cursor <cursoragent@cursor.com>
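The leak described in the issue can be reproduced without a GPU. The sketch below is a pure-Python model, not cuda.core code: `FakeStream`, `strict_deallocate`, and `pre_fix_dealloc_callback` are hypothetical stand-ins for `Stream`, a stream-ordered MR's `deallocate`, and the pre-fix `_mr_dealloc_callback`. It shows how a `TypeError` that a `noexcept`-style wrapper can only log turns into an allocation that is never freed.

```python
import io
import sys
from contextlib import redirect_stderr


class FakeStream:
    """Hypothetical stand-in for cuda.core's Stream (not the real class)."""

    def __init__(self, handle):
        self.handle = handle


# Pointers that are still allocated; a successful free removes the entry.
live_allocations = {0x1000}


def strict_deallocate(ptr, size, *, stream):
    # Models a stream-ordered MR whose deallocate rejects stream=None,
    # the way the commit describes Stream_accept behaving after #2001.
    if not isinstance(stream, FakeStream):
        raise TypeError(f"stream must be a Stream, got {type(stream).__name__}")
    live_allocations.discard(ptr)


def pre_fix_dealloc_callback(ptr, size, stream_handle):
    # Models the old callback: an empty handle becomes stream=None.
    stream = FakeStream(stream_handle) if stream_handle else None
    try:
        strict_deallocate(ptr, size, stream=stream)
    except Exception as exc:
        # Like a noexcept callback, the error cannot propagate; it is only logged.
        print(f"Warning: deallocate failed: {exc}", file=sys.stderr)


err = io.StringIO()
with redirect_stderr(err):
    pre_fix_dealloc_callback(0x1000, 1024, stream_handle=None)

print(0x1000 in live_allocations)  # True: the allocation was never freed
print(err.getvalue().strip())  # Warning: deallocate failed: stream must be a Stream, got NoneType
```

The only trace of the failure is the warning on stderr; the caller that dropped the buffer never sees an exception.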
1 parent 91807ee commit 5e9a5ea

2 files changed

Lines changed: 66 additions & 4 deletions

cuda_core/cuda/core/_memory/_buffer.pyx

Lines changed: 16 additions & 4 deletions
@@ -49,11 +49,23 @@ cdef void _mr_dealloc_callback(
     size_t size,
     const StreamHandle& h_stream,
 ) noexcept:
-    """Called by the C++ deleter to deallocate via MemoryResource.deallocate."""
+    """Called by the C++ deleter to deallocate via MemoryResource.deallocate.
+
+    This is the C++ teardown path: there is no Python caller frame from
+    which to obtain a stream. If the device-pointer handle was created
+    without ``set_deallocation_stream`` being called (e.g. buffers minted
+    via ``Buffer.from_handle(ptr, size, mr=mr)`` from DLPack import,
+    third-party adapters, or other foreign sources), ``h_stream`` is
+    empty here. Stream-ordered MR ``deallocate`` overrides reject
+    ``stream=None`` (issue #2001), so without a fallback the destructor
+    would print a warning and leak the allocation. Fall back to the
+    legacy/per-thread default stream so the free still happens; this is
+    the unique exception to the "no implicit default-stream fallback"
+    policy because the teardown has no other source of truth.
+    """
+    cdef Stream stream
     try:
-        stream = None
-        if h_stream:
-            stream = Stream._from_handle(Stream, h_stream)
+        stream = Stream._from_handle(Stream, h_stream) if h_stream else default_stream()
         mr.deallocate(int(ptr), size, stream=stream)
     except Exception as exc:
         print(f"Warning: mr.deallocate() failed during Buffer destruction: {exc}",
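The fixed selection logic can be modeled the same way. This is a hedged pure-Python sketch, not the Cython source: `FakeStream`, `default_stream`, `strict_deallocate`, and `fixed_dealloc_callback` are hypothetical, and the `"default"` handle is a placeholder rather than a real CUDA stream handle. It checks both branches of the conditional: an empty handle falls back to the default stream, while an attached stream is still used when present.

```python
class FakeStream:
    """Hypothetical stand-in for cuda.core's Stream (not the real class)."""

    def __init__(self, handle):
        self.handle = handle


# Hypothetical default stream; the "default" handle is a placeholder value.
_DEFAULT_STREAM = FakeStream("default")


def default_stream():
    return _DEFAULT_STREAM


live_allocations = {0x1000, 0x2000}
received = {}


def strict_deallocate(ptr, size, *, stream):
    # Stream-ordered MR model: stream=None is rejected with TypeError.
    if not isinstance(stream, FakeStream):
        raise TypeError("stream must be a Stream")
    received["stream"] = stream
    live_allocations.discard(ptr)


def fixed_dealloc_callback(ptr, size, stream_handle):
    # Mirrors the one-line fix: use the attached stream when there is one,
    # otherwise fall back to the default stream instead of passing None.
    stream = FakeStream(stream_handle) if stream_handle else default_stream()
    try:
        strict_deallocate(ptr, size, stream=stream)
    except Exception as exc:
        print(f"Warning: deallocate failed: {exc}")


# No attached stream (e.g. a Buffer.from_handle buffer): the default stream is used.
fixed_dealloc_callback(0x1000, 1024, stream_handle=None)
print(0x1000 in live_allocations, received["stream"].handle)  # False default

# An attached stream still takes precedence over the fallback.
fixed_dealloc_callback(0x2000, 1024, stream_handle=7)
print(0x2000 in live_allocations, received["stream"].handle)  # False 7
```

Keeping the fallback inside the callback, rather than relaxing `deallocate` to accept `None`, matches the commit's stated policy: only the teardown path, which has no caller to ask, gets an implicit stream.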

cuda_core/tests/test_memory.py

Lines changed: 50 additions & 0 deletions
@@ -521,6 +521,56 @@ def deallocate(self, ptr, size, *, stream=None):
     assert received["stream"].handle == stream.handle
 
 
+def test_mr_dealloc_callback_falls_back_to_default_stream():
+    """When a Buffer's device-pointer handle has no attached deallocation
+    stream (e.g. buffers minted via :meth:`Buffer.from_handle` from DLPack
+    import, IPC import, or third-party adapters), the C++ deleter callback
+    must fall back to the default stream rather than passing ``stream=None``
+    to ``mr.deallocate``. Stream-ordered MRs validate the stream and would
+    otherwise raise ``TypeError`` from inside the ``noexcept`` callback,
+    which only logs a warning and silently leaks the allocation. See
+    `#2001 <https://github.com/NVIDIA/cuda-python/issues/2001>`__.
+    """
+    import gc
+
+    from cuda.core._stream import Stream_accept, default_stream
+
+    device = Device()
+    device.set_current()
+    captured = {}
+
+    class StrictCapturingMR(MemoryResource):
+        # Models a stream-ordered MR: deallocate validates the stream
+        # the same way DeviceMemoryResource.deallocate does.
+        @property
+        def is_device_accessible(self):
+            return True
+
+        @property
+        def is_host_accessible(self):
+            return False
+
+        @property
+        def device_id(self):
+            return device.device_id
+
+        def allocate(self, size, *, stream):
+            raise NotImplementedError  # not used; we use from_handle below
+
+        def deallocate(self, ptr, size, *, stream):
+            captured["stream"] = Stream_accept(stream)
+
+    mr = StrictCapturingMR()
+    # Buffer.from_handle binds mr but does not attach a deallocation stream.
+    # ptr=1 is fine because StrictCapturingMR.deallocate does not free.
+    buf = Buffer.from_handle(1, 1024, mr=mr)
+    del buf
+    gc.collect()
+
+    assert "stream" in captured, "deallocate was not invoked (callback raised and leaked)"
+    assert captured["stream"].handle == default_stream().handle
+
+
 def test_memory_resource_and_owner_disallowed():
     with pytest.raises(ValueError, match="cannot be both specified together"):
         a = (ctypes.c_byte * 20)()
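The regression test depends on dropping the last reference (`del buf` followed by `gc.collect()`) to make the deleter run. The same capture-then-collect pattern can be sketched in pure Python, with `weakref.finalize` standing in for the C++ `shared_ptr` deleter. This is an analogy under stated assumptions, not how `Buffer` is implemented: `FakeStream`, `CapturingMR`, `FakeBuffer`, and `_deleter` are all hypothetical.

```python
import gc
import weakref


class FakeStream:
    """Hypothetical stand-in for cuda.core's Stream (not the real class)."""

    def __init__(self, handle):
        self.handle = handle


DEFAULT_STREAM = FakeStream("default")  # hypothetical default stream


class CapturingMR:
    """Hypothetical strict MR that records the stream passed to deallocate."""

    def __init__(self):
        self.captured = {}

    def deallocate(self, ptr, size, *, stream):
        if not isinstance(stream, FakeStream):
            raise TypeError("stream must be a Stream")
        self.captured["stream"] = stream


def _deleter(mr, ptr, size, stream):
    # Runs at teardown with no caller-supplied stream; falls back like the fix.
    mr.deallocate(ptr, size, stream=stream if stream is not None else DEFAULT_STREAM)


class FakeBuffer:
    """Hypothetical buffer whose deleter fires when the object is collected."""

    def __init__(self, ptr, size, mr, stream=None):
        # The finalizer must not reference self, or the buffer would never die.
        self._finalizer = weakref.finalize(self, _deleter, mr, ptr, size, stream)


mr = CapturingMR()
buf = FakeBuffer(1, 1024, mr)  # no deallocation stream attached
del buf
gc.collect()  # CPython already ran the finalizer at `del`; this forces collection elsewhere

assert "stream" in mr.captured, "deleter did not run"
print(mr.captured["stream"].handle)  # default
```

The analogy also covers the failure mode: exceptions raised by a `weakref.finalize` callback during garbage collection are reported on stderr but cannot propagate, much like the `noexcept` callback, which is why the test asserts on what `deallocate` captured instead of expecting an exception.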
