
@mivertowski
Owner

Comprehensive analysis of CUDA features that would enable true
persistent GPU actors, based on RingKernel implementation experience.

Key proposals include:

  • Native host-kernel signaling (replace polling)
  • Kernel-to-kernel mailboxes (first-class messaging)
  • Dynamic block scheduling (work stealing)
  • Persistent kernel preemption (cooperative)
  • Checkpointing and migration (fault tolerance)
  • Extended cooperative groups (hierarchical sync)
  • Memory model enhancements (SC atomics)
  • Multi-GPU persistent kernels
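
For context on the first proposal, the polling pattern that native host-kernel signaling would replace can be sketched roughly as below. This is a CPU-only analogue, not RingKernel code: a `std::thread` stands in for the persistent kernel, and `Mailbox`, `persistent_actor`, `send_and_wait`, and `run_demo` are hypothetical names chosen for illustration. On a real GPU the mailbox would presumably live in host-mapped memory, and the doubling step is a placeholder for actual message handling.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical mailbox states (illustrative only, not taken from RingKernel).
enum : uint32_t { kEmpty = 0, kRequest = 1, kResponse = 2, kShutdown = 3 };

// Single-slot mailbox shared by the "host" and the "persistent actor".
struct Mailbox {
    std::atomic<uint32_t> state{kEmpty};
    uint32_t payload{0};
};

// Stand-in for a persistent kernel: it never exits on its own and has to
// poll the mailbox state, because nothing can wake it up.
void persistent_actor(Mailbox& box) {
    for (;;) {
        uint32_t s = box.state.load(std::memory_order_acquire);
        if (s == kShutdown) {
            return;
        }
        if (s == kRequest) {
            box.payload *= 2;  // placeholder "work": double the message
            box.state.store(kResponse, std::memory_order_release);
        } else {
            std::this_thread::yield();  // spin until the host posts something
        }
    }
}

// Host side: post a request, then poll until the actor has answered.
uint32_t send_and_wait(Mailbox& box, uint32_t value) {
    box.payload = value;
    box.state.store(kRequest, std::memory_order_release);
    while (box.state.load(std::memory_order_acquire) != kResponse) {
        std::this_thread::yield();
    }
    uint32_t result = box.payload;
    box.state.store(kEmpty, std::memory_order_release);
    return result;
}

// Runs one request/response round trip and shuts the actor down.
uint32_t run_demo(uint32_t value) {
    Mailbox box;
    std::thread actor([&box] { persistent_actor(box); });
    uint32_t result = send_and_wait(box, value);
    box.state.store(kShutdown, std::memory_order_release);
    actor.join();
    return result;
}
```

Calling `run_demo(21)` performs one round trip through the mailbox. Both sides spend their waiting time spinning on `state`, which is the overhead that a native signaling primitive would be expected to remove.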

https://claude.ai/code/session_01TD1CHULcRkSAJ1KUqyhpF9

@mivertowski mivertowski merged commit bd9bd76 into main Jan 30, 2026
7 checks passed
@mivertowski mivertowski deleted the claude/cuda-persistent-gpu-actors-B7wfn branch January 30, 2026 22:15
3 participants