@Andy-Jost Andy-Jost commented Jan 12, 2026

Summary

  • Replace the PyCapsule-based function pointer table with direct Cython cimport
  • Consumer modules now call resource handle functions directly through _resource_handles.so
  • Simplifies the architecture while correctly sharing static/thread-local state
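To illustrate the difference between the two call styles, here is a minimal C++ sketch. Every name in it (`demo`, `HandleApiTable`, `get_api_table`, `create_handle`, `call_counter`) is hypothetical and not taken from the PR. It assumes the handle functions are compiled into exactly one binary, so a call through an exported function pointer table and a direct call both reach the same static state.

```cpp
#include <cassert>

namespace demo {

// Hypothetical process-wide state, standing in for the static/thread-local
// state that must exist exactly once.
inline int& call_counter() {
    static int counter = 0;
    return counter;
}

// Hypothetical handle function, defined in a single translation unit.
int create_handle() { return ++call_counter(); }

// Capsule-style approach: the owning module exports a table of function
// pointers and consumers look the functions up at run time.
struct HandleApiTable {
    int (*create_handle)();
};

const HandleApiTable* get_api_table() {
    static const HandleApiTable table{&create_handle};
    return &table;
}

// Consumer that goes through the table (extra indirection and setup).
int consumer_via_table() {
    const HandleApiTable* api = get_api_table();
    assert(api != nullptr && api->create_handle != nullptr);
    return api->create_handle();
}

// Consumer that links against the symbol and calls it directly.
int consumer_direct() { return create_handle(); }

}  // namespace demo
```

Both consumers increment the same counter because there is only one definition of `create_handle`. The direct form drops the table and its initialization step, which is the kind of simplification the summary describes.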

Changes

  • Remove _CXX_API capsule infrastructure (resource_handles_cxx_api.hpp, _resource_handles_cxx_api.pxd, get_resource_handles_cxx_api_v1())
  • Remove _init_handles_table() calls from all consumer modules
  • Rename create_event_handle(flags) to create_event_handle_noctx(flags) to avoid C++ overload ambiguity for Cython binding
  • Update DESIGN.md to reflect the simplified architecture
  • Add clarifying comment in build_hooks.py for cpp file discovery
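The PR does not spell out the exact failure behind the rename, so the following C++ sketch only shows one common way an overloaded name becomes awkward to bind: referring to the function by name alone no longer identifies a single function. The names `demo_overload`, `Context`, `make_event`, and `make_event_noctx` are hypothetical stand-ins, not the real `cuda_core` signatures.

```cpp
#include <cassert>

namespace demo_overload {

struct Context {
    unsigned id;
};

// Two overloads that share one name.
unsigned make_event(const Context& ctx, unsigned flags) { return flags + ctx.id; }
unsigned make_event(unsigned flags) { return flags; }

using NoCtxFn = unsigned (*)(unsigned);

// Taking the address of an overloaded name requires spelling out the target
// type. The line below would not compile because the type cannot be deduced:
//     auto fn = &make_event;
NoCtxFn pick_overload() { return static_cast<NoCtxFn>(&make_event); }

// Giving the context-free variant its own name removes the ambiguity, so the
// name alone identifies exactly one function.
unsigned make_event_noctx(unsigned flags) { return flags; }

NoCtxFn pick_renamed() {
    auto fn = &make_event_noctx;  // no cast needed
    return fn;
}

}  // namespace demo_overload
```

With a unique name, a binding layer can refer to the function by a single string or symbol without extra disambiguation.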

Test Plan

  • Verified resource_handles.cpp symbols exist only in _resource_handles.so (confirmed via nm -C)
  • CI tests pass

Stats

16 files changed, 179 insertions(+), 512 deletions(-)

Net reduction: 333 lines

Closes #1452

@Andy-Jost Andy-Jost added this to the cuda.core beta 11 milestone Jan 12, 2026
@Andy-Jost Andy-Jost added the enhancement and cuda.core labels Jan 12, 2026
@Andy-Jost Andy-Jost self-assigned this Jan 12, 2026
copy-pr-bot bot commented Jan 12, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@Andy-Jost commented

/ok to test 6ec8330

@Andy-Jost commented

/ok to test bcefde1

@rwgk rwgk left a comment

That's a great simplification! Nothing stood out to me looking through visually. Cursor found one minor inconsistency.


# Context handles
ContextHandle create_context_handle_ref "cuda_core::create_context_handle_ref" (
    cydriver.CUcontext ctx) nogil
Cursor discovered this:


Minor Issue: Inconsistent noexcept Annotations

The .pxd file declares all handle functions as noexcept nogil:

# _resource_handles.pxd
cdef ContextHandle create_context_handle_ref(cydriver.CUcontext ctx) noexcept nogil
cdef StreamHandle create_stream_handle(...) noexcept nogil
cdef EventHandle create_event_handle(...) noexcept nogil
# etc.

But the .pyx file's cdef extern from declarations are missing noexcept on many of these same functions:

# _resource_handles.pyx
ContextHandle create_context_handle_ref "cuda_core::create_context_handle_ref" (
    cydriver.CUcontext ctx) nogil  # <-- missing noexcept
StreamHandle create_stream_handle "cuda_core::create_stream_handle" (
    ContextHandle h_ctx, unsigned int flags, int priority) nogil  # <-- missing noexcept
EventHandle create_event_handle "cuda_core::create_event_handle" (
    ContextHandle h_ctx, unsigned int flags) nogil  # <-- missing noexcept

Meanwhile, some functions do have noexcept in both places (e.g., get_primary_context, get_current_context, get_legacy_stream, get_per_thread_stream).

Functions missing noexcept in .pyx:

  • create_context_handle_ref
  • create_stream_handle
  • create_stream_handle_ref
  • create_event_handle
  • create_event_handle_noctx
  • create_event_handle_ipc
  • create_mempool_handle
  • create_mempool_handle_ref
  • create_mempool_handle_ipc
  • deviceptr_alloc_from_pool
  • deviceptr_alloc_async
  • deviceptr_alloc
  • deviceptr_alloc_host
  • deviceptr_create_ref
  • deviceptr_import_ipc

Impact: The C++ functions don't throw exceptions, so noexcept is semantically correct for all of them. The inconsistency probably doesn't cause runtime issues, but it would be cleaner if the .pyx declarations matched the .pxd declarations exactly.

Suggested fix: Add noexcept to all the cdef extern from declarations in _resource_handles.pyx that have it in _resource_handles.pxd.
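On the C++ side, a no-throw contract can be stated and checked at compile time. The sketch below is an illustration only: `demo_noexcept`, `Handle`, and this `create_handle` are hypothetical, and the review does not say whether the real `cuda_core` functions carry the C++ `noexcept` specifier. Note also that Cython's `noexcept` concerns Python exception propagation, which is a separate declaration from the C++ specifier shown here.

```cpp
#include <cassert>
#include <memory>

namespace demo_noexcept {

using Handle = std::shared_ptr<int>;

// Hypothetical factory that reports failure with an empty handle instead of
// throwing. Allocation failures are caught so the noexcept promise holds.
Handle create_handle(int value) noexcept {
    if (value < 0) {
        return Handle{};
    }
    try {
        return std::make_shared<int>(value);
    } catch (...) {
        return Handle{};
    }
}

// Compile-time check: the build fails if the specifier is ever removed.
static_assert(noexcept(create_handle(0)),
              "create_handle must be declared noexcept");

}  // namespace demo_noexcept
```

Callers then test the returned handle for emptiness rather than relying on exceptions.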

Labels

cuda.core, enhancement

Development

Successfully merging this pull request may close these issues.

Investigate eliminating _CXX_API capsule for resource handle functions

2 participants