Commit c0fdf7a

Author: cuda-python-bot
Deploy doc preview for PR 1972 (bcf5683)
1 parent def7e3c commit c0fdf7a

1,432 files changed

Lines changed: 162886 additions & 93619 deletions

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file records the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 24f700c83f1e43e45cad77d0056d7d36
+config: d3ef2eba0d62819a718b215ccbc581b3
 tags: 645f666f9bcd5a90fca523b33c5a78b7

docs/pr-preview/pr-1972/cuda-bindings/latest/_sources/api.rst.txt

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@ CUDA Python API Reference
 module/nvjitlink
 module/nvvm
 module/nvfatbin
+module/cudla
 module/cufile
 module/nvml
 module/utils

docs/pr-preview/pr-1972/cuda-bindings/latest/_sources/examples.rst.txt

Lines changed: 15 additions & 18 deletions
@@ -5,64 +5,61 @@ Examples
 ========
 
 This page links to the ``cuda.bindings`` examples shipped in the
-`cuda-python repository <https://github.com/NVIDIA/cuda-python/tree/|cuda_bindings_github_ref|/cuda_bindings/examples>`_.
+:cuda-bindings-examples:`cuda-python repository </>`.
 Use it as a quick index when you want a runnable sample for a specific API area
 or CUDA feature.
 
 Introduction
 ------------
 
-- `clock_nvrtc.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/clock_nvrtc.py>`_
+- :cuda-bindings-example:`clock_nvrtc.py <0_Introduction/clock_nvrtc.py>`
 uses NVRTC-compiled CUDA code and the device clock to time a reduction
 kernel.
-- `simple_cubemap_texture.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/simple_cubemap_texture.py>`_
+- :cuda-bindings-example:`simple_cubemap_texture.py <0_Introduction/simple_cubemap_texture.py>`
 demonstrates cubemap texture sampling and transformation.
-- `simple_p2p.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/simple_p2p.py>`_
+- :cuda-bindings-example:`simple_p2p.py <0_Introduction/simple_p2p.py>`
 shows peer-to-peer memory access and transfers between multiple GPUs.
-- `simple_zero_copy.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/simple_zero_copy.py>`_
+- :cuda-bindings-example:`simple_zero_copy.py <0_Introduction/simple_zero_copy.py>`
 uses zero-copy mapped host memory for vector addition.
-- `system_wide_atomics.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/system_wide_atomics.py>`_
+- :cuda-bindings-example:`system_wide_atomics.py <0_Introduction/system_wide_atomics.py>`
 demonstrates system-wide atomic operations on managed memory.
-- `vector_add_drv.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/vector_add_drv.py>`_
+- :cuda-bindings-example:`vector_add_drv.py <0_Introduction/vector_add_drv.py>`
 uses the CUDA Driver API and unified virtual addressing for vector addition.
-- `vector_add_mmap.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/vector_add_mmap.py>`_
+- :cuda-bindings-example:`vector_add_mmap.py <0_Introduction/vector_add_mmap.py>`
 uses virtual memory management APIs such as ``cuMemCreate`` and
 ``cuMemMap`` for vector addition.
 
 Concepts and techniques
 -----------------------
 
-- `stream_ordered_allocation.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/2_Concepts_and_Techniques/stream_ordered_allocation.py>`_
+- :cuda-bindings-example:`stream_ordered_allocation.py <2_Concepts_and_Techniques/stream_ordered_allocation.py>`
 demonstrates ``cudaMallocAsync`` and ``cudaFreeAsync`` together with
 memory-pool release thresholds.
 
 CUDA features
 -------------
 
-- `global_to_shmem_async_copy.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/3_CUDA_Features/global_to_shmem_async_copy.py>`_
+- :cuda-bindings-example:`global_to_shmem_async_copy.py <3_CUDA_Features/global_to_shmem_async_copy.py>`
 compares asynchronous global-to-shared-memory copy strategies in matrix
 multiplication kernels.
-- `simple_cuda_graphs.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/3_CUDA_Features/simple_cuda_graphs.py>`_
+- :cuda-bindings-example:`simple_cuda_graphs.py <3_CUDA_Features/simple_cuda_graphs.py>`
 shows both manual CUDA graph construction and stream-capture-based replay.
 
 Libraries and tools
 -------------------
 
-- `conjugate_gradient_multi_block_cg.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/4_CUDA_Libraries/conjugate_gradient_multi_block_cg.py>`_
+- :cuda-bindings-example:`conjugate_gradient_multi_block_cg.py <4_CUDA_Libraries/conjugate_gradient_multi_block_cg.py>`
 implements a conjugate-gradient solver with cooperative groups and
 multi-block synchronization.
-- `nvidia_smi.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/4_CUDA_Libraries/nvidia_smi.py>`_
+- :cuda-bindings-example:`nvidia_smi.py <4_CUDA_Libraries/nvidia_smi.py>`
 uses NVML to implement a Python subset of ``nvidia-smi``.
 
 Advanced and interoperability
 -----------------------------
 
-- `iso_fd_modelling.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/extra/iso_fd_modelling.py>`_
+- :cuda-bindings-example:`iso_fd_modelling.py <extra/iso_fd_modelling.py>`
 runs isotropic finite-difference wave propagation across multiple GPUs with
 peer-to-peer halo exchange.
-- `jit_program.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/extra/jit_program.py>`_
+- :cuda-bindings-example:`jit_program.py <extra/jit_program.py>`
 JIT-compiles a SAXPY kernel with NVRTC and launches it through the Driver
 API.
-- `numba_emm_plugin.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/extra/numba_emm_plugin.py>`_
-shows how to back Numba's EMM interface with the NVIDIA CUDA Python Driver
-API.

docs/pr-preview/pr-1972/cuda-bindings/latest/_sources/install.rst.txt

Lines changed: 8 additions & 2 deletions
@@ -12,7 +12,7 @@ Runtime Requirements
 * Linux (x86-64, arm64) and Windows (x86-64)
 * Python 3.10 - 3.14
 * Driver: Linux (580.65.06 or later) Windows (580.88 or later)
-* Optionally, NVRTC, nvJitLink, NVVM, and cuFile from CUDA Toolkit 13.x
+* Optionally, NVRTC, nvJitLink, nvFatBin, NVVM, cuFile, and cuDLA from CUDA Toolkit 13.x
 
 .. note::
 
@@ -52,10 +52,12 @@ Where the optional dependencies include:
 
 * ``nvidia-cuda-nvrtc`` (NVRTC runtime compilation library)
 * ``nvidia-nvjitlink`` (nvJitLink library)
+* ``nvidia-nvfatbin`` (nvFatBin library)
 * ``nvidia-nvvm`` (NVVM library)
 * ``nvidia-cufile`` (cuFile library, Linux only)
+* ``nvidia-cudla`` (cuDLA library, Linux aarch64 only)
 
-These are now installed through the ``cuda-toolkit`` metapackage for improved dependency resolution.
+These are now installed through the ``cuda-toolkit`` metapackage, where available, for improved dependency resolution.
 
 Installing from Conda
 ---------------------
@@ -74,6 +76,10 @@ For example:
 
 $ conda install -c conda-forge cuda-python cuda-version=13
 
+.. note::
+
+Tegra users can install the cuDLA conda package from conda-forge through ``conda install -c conda-forge libcudla cuda-version=13``, if it does not already exist on the system.
+
 Installing from Source
 ----------------------
 
Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
+.. SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+.. SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE
+
+.. default-role:: cpp:any
+
+cudla
+=====
+
+Note
+----
+
+The cuDLA bindings require a Jetson platform with DLA hardware (Xavier or Orin).
+cuDLA is not available on desktop GPUs.
+
+Functions
+---------
+
+cuDLA defines the following functions for DLA device management and inference.
+
+.. autofunction:: cuda.bindings.cudla.get_version
+.. autofunction:: cuda.bindings.cudla.device_get_count
+.. autofunction:: cuda.bindings.cudla.create_device
+.. autofunction:: cuda.bindings.cudla.destroy_device
+.. autofunction:: cuda.bindings.cudla.mem_register
+.. autofunction:: cuda.bindings.cudla.mem_unregister
+.. autofunction:: cuda.bindings.cudla.module_load_from_memory
+.. autofunction:: cuda.bindings.cudla.module_get_attributes
+.. autofunction:: cuda.bindings.cudla.module_unload
+.. autofunction:: cuda.bindings.cudla.submit_task
+.. autofunction:: cuda.bindings.cudla.device_get_attribute
+.. autofunction:: cuda.bindings.cudla.get_last_error
+.. autofunction:: cuda.bindings.cudla.set_task_timeout_in_ms
+
+Types
+-----
+
+.. autoclass:: cuda.bindings.cudla.ExternalMemoryHandleDesc
+.. autoclass:: cuda.bindings.cudla.ExternalSemaphoreHandleDesc
+.. autoclass:: cuda.bindings.cudla.ModuleTensorDescriptor
+.. autoclass:: cuda.bindings.cudla.Fence
+.. autoclass:: cuda.bindings.cudla.DevAttribute
+.. autoclass:: cuda.bindings.cudla.ModuleAttribute
+.. autoclass:: cuda.bindings.cudla.WaitEvents
+.. autoclass:: cuda.bindings.cudla.SignalEvents
+.. autoclass:: cuda.bindings.cudla.Task
+
+Enums
+-----
+
+.. autoclass:: cuda.bindings.cudla.Status
+
+.. autoattribute:: cuda.bindings.cudla.Status.Success
+
+.. autoclass:: cuda.bindings.cudla.Mode
+
+.. autoattribute:: cuda.bindings.cudla.Mode.CUDA_DLA
+.. autoattribute:: cuda.bindings.cudla.Mode.STANDALONE
+
+.. autoclass:: cuda.bindings.cudla.ModuleAttributeType
+.. autoclass:: cuda.bindings.cudla.FenceType
+.. autoclass:: cuda.bindings.cudla.ModuleLoadFlags
+.. autoclass:: cuda.bindings.cudla.SubmissionFlags
+.. autoclass:: cuda.bindings.cudla.AccessPermissionFlags
+.. autoclass:: cuda.bindings.cudla.DevAttributeType
