1301 | 1301 | <section id="examples"> |
1302 | 1302 | <h1>Examples<a class="headerlink" href="#examples" title="Link to this heading">#</a></h1> |
1303 | 1303 | <p>This page links to the <code class="docutils literal notranslate"><span class="pre">cuda.bindings</span></code> examples shipped in the |
1304 | | -<a class="extlink-cuda-bindings-examples reference external" href="https://github.com/NVIDIA/cuda-python/tree/1308bc517929e7a17432c413890f96fa6f3b96a2/cuda_bindings/examples/">cuda-python repository</a>. |
| 1304 | +<a class="extlink-cuda-bindings-examples reference external" href="https://github.com/NVIDIA/cuda-python/tree/32da37d3defbc31aef11bfd7e2d35112eb70602b/cuda_bindings/examples/">cuda-python repository</a>. |
1305 | 1305 | Use it as a quick index when you want a runnable sample for a specific API area |
1306 | 1306 | or CUDA feature.</p> |
1307 | 1307 | <section id="introduction"> |
1308 | 1308 | <h2>Introduction<a class="headerlink" href="#introduction" title="Link to this heading">#</a></h2> |
1309 | 1309 | <ul class="simple"> |
1310 | | -<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/1308bc517929e7a17432c413890f96fa6f3b96a2/cuda_bindings/examples/0_Introduction/clock_nvrtc.py">clock_nvrtc.py</a> |
| 1310 | +<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/32da37d3defbc31aef11bfd7e2d35112eb70602b/cuda_bindings/examples/0_Introduction/clock_nvrtc.py">clock_nvrtc.py</a> |
1311 | 1311 | uses NVRTC-compiled CUDA code and the device clock to time a reduction |
1312 | 1312 | kernel.</p></li> |
1313 | | -<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/1308bc517929e7a17432c413890f96fa6f3b96a2/cuda_bindings/examples/0_Introduction/simple_cubemap_texture.py">simple_cubemap_texture.py</a> |
| 1313 | +<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/32da37d3defbc31aef11bfd7e2d35112eb70602b/cuda_bindings/examples/0_Introduction/simple_cubemap_texture.py">simple_cubemap_texture.py</a> |
1314 | 1314 | demonstrates cubemap texture sampling and transformation.</p></li> |
1315 | | -<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/1308bc517929e7a17432c413890f96fa6f3b96a2/cuda_bindings/examples/0_Introduction/simple_p2p.py">simple_p2p.py</a> |
| 1315 | +<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/32da37d3defbc31aef11bfd7e2d35112eb70602b/cuda_bindings/examples/0_Introduction/simple_p2p.py">simple_p2p.py</a> |
1316 | 1316 | shows peer-to-peer memory access and transfers between multiple GPUs.</p></li> |
1317 | | -<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/1308bc517929e7a17432c413890f96fa6f3b96a2/cuda_bindings/examples/0_Introduction/simple_zero_copy.py">simple_zero_copy.py</a> |
| 1317 | +<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/32da37d3defbc31aef11bfd7e2d35112eb70602b/cuda_bindings/examples/0_Introduction/simple_zero_copy.py">simple_zero_copy.py</a> |
1318 | 1318 | uses zero-copy mapped host memory for vector addition.</p></li> |
1319 | | -<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/1308bc517929e7a17432c413890f96fa6f3b96a2/cuda_bindings/examples/0_Introduction/system_wide_atomics.py">system_wide_atomics.py</a> |
| 1319 | +<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/32da37d3defbc31aef11bfd7e2d35112eb70602b/cuda_bindings/examples/0_Introduction/system_wide_atomics.py">system_wide_atomics.py</a> |
1320 | 1320 | demonstrates system-wide atomic operations on managed memory.</p></li> |
1321 | | -<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/1308bc517929e7a17432c413890f96fa6f3b96a2/cuda_bindings/examples/0_Introduction/vector_add_drv.py">vector_add_drv.py</a> |
| 1321 | +<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/32da37d3defbc31aef11bfd7e2d35112eb70602b/cuda_bindings/examples/0_Introduction/vector_add_drv.py">vector_add_drv.py</a> |
1322 | 1322 | uses the CUDA Driver API and unified virtual addressing for vector addition.</p></li> |
1323 | | -<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/1308bc517929e7a17432c413890f96fa6f3b96a2/cuda_bindings/examples/0_Introduction/vector_add_mmap.py">vector_add_mmap.py</a> |
| 1323 | +<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/32da37d3defbc31aef11bfd7e2d35112eb70602b/cuda_bindings/examples/0_Introduction/vector_add_mmap.py">vector_add_mmap.py</a> |
1324 | 1324 | uses virtual memory management APIs such as <code class="docutils literal notranslate"><span class="pre">cuMemCreate</span></code> and |
1325 | 1325 | <code class="docutils literal notranslate"><span class="pre">cuMemMap</span></code> for vector addition.</p></li> |
1326 | 1326 | </ul> |
1327 | 1327 | </section> |
1328 | 1328 | <section id="concepts-and-techniques"> |
1329 | 1329 | <h2>Concepts and techniques<a class="headerlink" href="#concepts-and-techniques" title="Link to this heading">#</a></h2> |
1330 | 1330 | <ul class="simple"> |
1331 | | -<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/1308bc517929e7a17432c413890f96fa6f3b96a2/cuda_bindings/examples/2_Concepts_and_Techniques/stream_ordered_allocation.py">stream_ordered_allocation.py</a> |
| 1331 | +<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/32da37d3defbc31aef11bfd7e2d35112eb70602b/cuda_bindings/examples/2_Concepts_and_Techniques/stream_ordered_allocation.py">stream_ordered_allocation.py</a> |
1332 | 1332 | demonstrates <code class="docutils literal notranslate"><span class="pre">cudaMallocAsync</span></code> and <code class="docutils literal notranslate"><span class="pre">cudaFreeAsync</span></code> together with |
1333 | 1333 | memory-pool release thresholds.</p></li> |
1334 | 1334 | </ul> |
1335 | 1335 | </section> |
1336 | 1336 | <section id="cuda-features"> |
1337 | 1337 | <h2>CUDA features<a class="headerlink" href="#cuda-features" title="Link to this heading">#</a></h2> |
1338 | 1338 | <ul class="simple"> |
1339 | | -<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/1308bc517929e7a17432c413890f96fa6f3b96a2/cuda_bindings/examples/3_CUDA_Features/global_to_shmem_async_copy.py">global_to_shmem_async_copy.py</a> |
| 1339 | +<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/32da37d3defbc31aef11bfd7e2d35112eb70602b/cuda_bindings/examples/3_CUDA_Features/global_to_shmem_async_copy.py">global_to_shmem_async_copy.py</a> |
1340 | 1340 | compares asynchronous global-to-shared-memory copy strategies in matrix |
1341 | 1341 | multiplication kernels.</p></li> |
1342 | | -<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/1308bc517929e7a17432c413890f96fa6f3b96a2/cuda_bindings/examples/3_CUDA_Features/simple_cuda_graphs.py">simple_cuda_graphs.py</a> |
| 1342 | +<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/32da37d3defbc31aef11bfd7e2d35112eb70602b/cuda_bindings/examples/3_CUDA_Features/simple_cuda_graphs.py">simple_cuda_graphs.py</a> |
1343 | 1343 | shows both manual CUDA graph construction and stream-capture-based replay.</p></li> |
1344 | 1344 | </ul> |
1345 | 1345 | </section> |
1346 | 1346 | <section id="libraries-and-tools"> |
1347 | 1347 | <h2>Libraries and tools<a class="headerlink" href="#libraries-and-tools" title="Link to this heading">#</a></h2> |
1348 | 1348 | <ul class="simple"> |
1349 | | -<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/1308bc517929e7a17432c413890f96fa6f3b96a2/cuda_bindings/examples/4_CUDA_Libraries/conjugate_gradient_multi_block_cg.py">conjugate_gradient_multi_block_cg.py</a> |
| 1349 | +<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/32da37d3defbc31aef11bfd7e2d35112eb70602b/cuda_bindings/examples/4_CUDA_Libraries/conjugate_gradient_multi_block_cg.py">conjugate_gradient_multi_block_cg.py</a> |
1350 | 1350 | implements a conjugate-gradient solver with cooperative groups and |
1351 | 1351 | multi-block synchronization.</p></li> |
1352 | | -<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/1308bc517929e7a17432c413890f96fa6f3b96a2/cuda_bindings/examples/4_CUDA_Libraries/nvidia_smi.py">nvidia_smi.py</a> |
| 1352 | +<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/32da37d3defbc31aef11bfd7e2d35112eb70602b/cuda_bindings/examples/4_CUDA_Libraries/nvidia_smi.py">nvidia_smi.py</a> |
1353 | 1353 | uses NVML to implement a Python subset of <code class="docutils literal notranslate"><span class="pre">nvidia-smi</span></code>.</p></li> |
1354 | 1354 | </ul> |
1355 | 1355 | </section> |
1356 | 1356 | <section id="advanced-and-interoperability"> |
1357 | 1357 | <h2>Advanced and interoperability<a class="headerlink" href="#advanced-and-interoperability" title="Link to this heading">#</a></h2> |
1358 | 1358 | <ul class="simple"> |
1359 | | -<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/1308bc517929e7a17432c413890f96fa6f3b96a2/cuda_bindings/examples/extra/iso_fd_modelling.py">iso_fd_modelling.py</a> |
| 1359 | +<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/32da37d3defbc31aef11bfd7e2d35112eb70602b/cuda_bindings/examples/extra/iso_fd_modelling.py">iso_fd_modelling.py</a> |
1360 | 1360 | runs isotropic finite-difference wave propagation across multiple GPUs with |
1361 | 1361 | peer-to-peer halo exchange.</p></li> |
1362 | | -<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/1308bc517929e7a17432c413890f96fa6f3b96a2/cuda_bindings/examples/extra/jit_program.py">jit_program.py</a> |
| 1362 | +<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/32da37d3defbc31aef11bfd7e2d35112eb70602b/cuda_bindings/examples/extra/jit_program.py">jit_program.py</a> |
1363 | 1363 | JIT-compiles a SAXPY kernel with NVRTC and launches it through the Driver |
1364 | 1364 | API.</p></li> |
1365 | 1365 | </ul> |