Use CUDA batch memory copy API wherever possible #954

Draft
kingcrimsontianyu wants to merge 5 commits into rapidsai:main from kingcrimsontianyu:cuda-batch-memcpy
kingcrimsontianyu (Contributor) commented Apr 14, 2026

The driver team recommends the CUDA batched memory copy API for general CPU-GPU copies because it avoids certain limitations of the traditional memory copy API, such as unexpected device-wide synchronization. This PR replaces cuMemcpyHtoDAsync and cuMemcpyDtoHAsync with cuMemcpyBatchAsync.
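To illustrate the shape of the change, here is a minimal sketch of how individual copies might be gathered into the parallel arrays (destinations, sources, sizes) that a batched copy call consumes. It is not the code in this PR: `CopyRequest`, `CopyBatch`, and `build_batch` are hypothetical names, and `std::uintptr_t` stands in for `CUdeviceptr` so the sketch compiles without the CUDA headers. The real `cuMemcpyBatchAsync` call additionally takes copy attributes and a stream, and its exact signature depends on the CUDA version, so it is left out here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one pending copy; not a KvikIO type.
// std::uintptr_t is a stand-in for CUdeviceptr (an assumption made so
// that this sketch builds without cuda.h).
struct CopyRequest {
  std::uintptr_t dst;  // destination address (host or device)
  std::uintptr_t src;  // source address (host or device)
  std::size_t size;    // number of bytes to copy
};

// Parallel arrays: entry i of each vector describes copy i of the batch.
struct CopyBatch {
  std::vector<std::uintptr_t> dsts;
  std::vector<std::uintptr_t> srcs;
  std::vector<std::size_t> sizes;

  std::size_t count() const { return sizes.size(); }
};

// Gather individual requests into one batch, preserving their order.
inline CopyBatch build_batch(const std::vector<CopyRequest>& requests) {
  CopyBatch batch;
  batch.dsts.reserve(requests.size());
  batch.srcs.reserve(requests.size());
  batch.sizes.reserve(requests.size());
  for (const auto& r : requests) {
    batch.dsts.push_back(r.dst);
    batch.srcs.push_back(r.src);
    batch.sizes.push_back(r.size);
  }
  return batch;
}
```

With the traditional API, each `CopyRequest` would correspond to one `cuMemcpyHtoDAsync` or `cuMemcpyDtoHAsync` call. With the batched API, the three arrays and `count()` would instead be handed to a single call on the target stream.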

kingcrimsontianyu added the improvement (Improves an existing functionality), non-breaking (Introduces a non-breaking change), and c++ (Affects the C++ API of KvikIO) labels Apr 14, 2026
copy-pr-bot bot commented Apr 14, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

kingcrimsontianyu (Contributor, Author) commented

/ok to test 4d1e8c1

kingcrimsontianyu changed the title from "Use cuda batch memory copy API wherever possible" to "Use CUDA batch memory copy API wherever possible" Apr 15, 2026