diff --git a/docs/pr-preview/pr-2186/cuda-bindings/latest/_sources/install.rst.txt b/docs/pr-preview/pr-2186/cuda-bindings/latest/_sources/install.rst.txt
Lines changed: 35 additions & 3 deletions
diff --git a/docs/pr-preview/pr-2186/cuda-bindings/latest/_sources/module/driver.rst.txt b/docs/pr-preview/pr-2186/cuda-bindings/latest/_sources/module/driver.rst.txt
Lines changed: 205 additions & 81 deletions
diff --git a/docs/pr-preview/pr-2186/cuda-bindings/latest/_sources/module/runtime.rst.txt b/docs/pr-preview/pr-2186/cuda-bindings/latest/_sources/module/runtime.rst.txt
Lines changed: 194 additions & 120 deletions
diff --git a/docs/pr-preview/pr-2186/cuda-bindings/latest/api.html b/docs/pr-preview/pr-2186/cuda-bindings/latest/api.html
Lines changed: 4 additions & 4 deletions
diff --git a/docs/pr-preview/pr-2186/cuda-bindings/latest/examples.html b/docs/pr-preview/pr-2186/cuda-bindings/latest/examples.html
Lines changed: 15 additions & 15 deletions
@@ -80,11 +80,43 @@ For example:
 
    Tegra users can install the cuDLA conda package from conda-forge through ``conda install -c conda-forge libcudla cuda-version=13``, if it does not already exist on the system.
 
+Development environment
+-----------------------
+
+The sections above cover end-user installation. The sections below focus on
+a repeatable *development* workflow (editable installs and running tests).
+
+Installing the latest nightly (top-of-tree builds)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Nightly builds are useful for testing new features or bug fixes before they
+are included in a release.
+
+CI publishes wheels as GitHub Actions artifacts on every push to ``main``. To
+obtain the most recent build, use the following commands:
+
+.. code-block:: console
+
+   $ # Find the latest successful CI run on main:
+   $ RUN_ID=$(gh run list -R NVIDIA/cuda-python -w ci.yml -b main -s success -L1 --json databaseId -q '.[0].databaseId')
+
+   $ # Download the wheel (pick your Python version and platform):
+   $ gh run download "$RUN_ID" -R NVIDIA/cuda-python -p "cuda-bindings-python312-cuda13*-linux-64-*"
+
+   $ # Install the downloaded wheel:
+   $ pip install "$(ls cuda-bindings-python312-cuda13*-linux-64-*/cuda_bindings*.whl)[all]"
+
+Replace ``python312`` with your Python version (e.g. ``python310``, ``python311``,
+``python313``, ``python314``, ``python314t``). For aarch64, replace ``linux-64``
+with ``linux-aarch64``; for Windows, use ``win-64``. Only the current CUDA
+major version is built on ``main``; wheels for the prior CUDA major are
+published from the corresponding backport branch.
+
 Installing from Source
-----------------------
+~~~~~~~~~~~~~~~~~~~~~~
 
 Requirements
-~~~~~~~~~~~~
+^^^^^^^^^^^^
 
 * CUDA Toolkit headers[^1]
 * CUDA Runtime static library[^2]
@@ -106,7 +138,7 @@ See :doc:`Environment Variables <environment_variables>` for a description of ot
    Only ``cydriver``, ``cyruntime`` and ``cynvrtc`` are impacted by the header requirement.
 
 Editable Install
-~~~~~~~~~~~~~~~~
+^^^^^^^^^^^^^^^^
 
 You can use:
 
 
@@ -2264,10 +2264,10 @@ <h1>CUDA Python API Reference<a class="headerlink" href="#cuda-python-api-refere
 <li class="toctree-l3"><a class="reference internal" href="module/driver.html#cuda.bindings.driver.CUdevResourceType"><code class="docutils literal notranslate"><span class="pre">CUdevResourceType</span></code></a></li>
 <li class="toctree-l3"><a class="reference internal" href="module/driver.html#cuda.bindings.driver.CUdevWorkqueueConfigScope"><code class="docutils literal notranslate"><span class="pre">CUdevWorkqueueConfigScope</span></code></a></li>
 <li class="toctree-l3"><a class="reference internal" href="module/driver.html#cuda.bindings.driver.CUdevResourceDesc"><code class="docutils literal notranslate"><span class="pre">CUdevResourceDesc</span></code></a></li>
-<li class="toctree-l3"><a class="reference internal" href="module/driver.html#id7"><code class="docutils literal notranslate"><span class="pre">CUdevSmResource</span></code></a></li>
-<li class="toctree-l3"><a class="reference internal" href="module/driver.html#id13"><code class="docutils literal notranslate"><span class="pre">CUdevWorkqueueConfigResource</span></code></a></li>
-<li class="toctree-l3"><a class="reference internal" href="module/driver.html#id18"><code class="docutils literal notranslate"><span class="pre">CUdevWorkqueueResource</span></code></a></li>
-<li class="toctree-l3"><a class="reference internal" href="module/driver.html#id21"><code class="docutils literal notranslate"><span class="pre">CU_DEV_SM_RESOURCE_GROUP_PARAMS</span></code></a></li>
+<li class="toctree-l3"><a class="reference internal" href="module/driver.html#id73"><code class="docutils literal notranslate"><span class="pre">CUdevSmResource</span></code></a></li>
+<li class="toctree-l3"><a class="reference internal" href="module/driver.html#id79"><code class="docutils literal notranslate"><span class="pre">CUdevWorkqueueConfigResource</span></code></a></li>
+<li class="toctree-l3"><a class="reference internal" href="module/driver.html#id84"><code class="docutils literal notranslate"><span class="pre">CUdevWorkqueueResource</span></code></a></li>
+<li class="toctree-l3"><a class="reference internal" href="module/driver.html#id87"><code class="docutils literal notranslate"><span class="pre">CU_DEV_SM_RESOURCE_GROUP_PARAMS</span></code></a></li>
 <li class="toctree-l3"><a class="reference internal" href="module/driver.html#cuda.bindings.driver.cuGreenCtxCreate"><code class="docutils literal notranslate"><span class="pre">cuGreenCtxCreate()</span></code></a></li>
 <li class="toctree-l3"><a class="reference internal" href="module/driver.html#cuda.bindings.driver.cuGreenCtxDestroy"><code class="docutils literal notranslate"><span class="pre">cuGreenCtxDestroy()</span></code></a></li>
 <li class="toctree-l3"><a class="reference internal" href="module/driver.html#cuda.bindings.driver.cuCtxFromGreenCtx"><code class="docutils literal notranslate"><span class="pre">cuCtxFromGreenCtx()</span></code></a></li>
 
@@ -1301,65 +1301,65 @@
   <section id="examples">
 <h1>Examples<a class="headerlink" href="#examples" title="Link to this heading">#</a></h1>
 <p>This page links to the <code class="docutils literal notranslate"><span class="pre">cuda.bindings</span></code> examples shipped in the
-<a class="extlink-cuda-bindings-examples reference external" href="https://github.com/NVIDIA/cuda-python/tree/d98094fefa33d0e6629204cf0665a5bce3f66a39/cuda_bindings/examples/">cuda-python repository</a>.
+<a class="extlink-cuda-bindings-examples reference external" href="https://github.com/NVIDIA/cuda-python/tree/3694e06171e9b2316396377103e31ba605eaef6e/cuda_bindings/examples/">cuda-python repository</a>.
 Use it as a quick index when you want a runnable sample for a specific API area
 or CUDA feature.</p>
 <section id="introduction">
 <h2>Introduction<a class="headerlink" href="#introduction" title="Link to this heading">#</a></h2>
 <ul class="simple">
-<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/d98094fefa33d0e6629204cf0665a5bce3f66a39/cuda_bindings/examples/0_Introduction/clock_nvrtc.py">clock_nvrtc.py</a>
+<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/3694e06171e9b2316396377103e31ba605eaef6e/cuda_bindings/examples/0_Introduction/clock_nvrtc.py">clock_nvrtc.py</a>
 uses NVRTC-compiled CUDA code and the device clock to time a reduction
 kernel.</p></li>
-<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/d98094fefa33d0e6629204cf0665a5bce3f66a39/cuda_bindings/examples/0_Introduction/simple_cubemap_texture.py">simple_cubemap_texture.py</a>
+<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/3694e06171e9b2316396377103e31ba605eaef6e/cuda_bindings/examples/0_Introduction/simple_cubemap_texture.py">simple_cubemap_texture.py</a>
 demonstrates cubemap texture sampling and transformation.</p></li>
-<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/d98094fefa33d0e6629204cf0665a5bce3f66a39/cuda_bindings/examples/0_Introduction/simple_p2p.py">simple_p2p.py</a>
+<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/3694e06171e9b2316396377103e31ba605eaef6e/cuda_bindings/examples/0_Introduction/simple_p2p.py">simple_p2p.py</a>
 shows peer-to-peer memory access and transfers between multiple GPUs.</p></li>
-<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/d98094fefa33d0e6629204cf0665a5bce3f66a39/cuda_bindings/examples/0_Introduction/simple_zero_copy.py">simple_zero_copy.py</a>
+<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/3694e06171e9b2316396377103e31ba605eaef6e/cuda_bindings/examples/0_Introduction/simple_zero_copy.py">simple_zero_copy.py</a>
 uses zero-copy mapped host memory for vector addition.</p></li>
-<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/d98094fefa33d0e6629204cf0665a5bce3f66a39/cuda_bindings/examples/0_Introduction/system_wide_atomics.py">system_wide_atomics.py</a>
+<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/3694e06171e9b2316396377103e31ba605eaef6e/cuda_bindings/examples/0_Introduction/system_wide_atomics.py">system_wide_atomics.py</a>
 demonstrates system-wide atomic operations on managed memory.</p></li>
-<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/d98094fefa33d0e6629204cf0665a5bce3f66a39/cuda_bindings/examples/0_Introduction/vector_add_drv.py">vector_add_drv.py</a>
+<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/3694e06171e9b2316396377103e31ba605eaef6e/cuda_bindings/examples/0_Introduction/vector_add_drv.py">vector_add_drv.py</a>
 uses the CUDA Driver API and unified virtual addressing for vector addition.</p></li>
-<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/d98094fefa33d0e6629204cf0665a5bce3f66a39/cuda_bindings/examples/0_Introduction/vector_add_mmap.py">vector_add_mmap.py</a>
+<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/3694e06171e9b2316396377103e31ba605eaef6e/cuda_bindings/examples/0_Introduction/vector_add_mmap.py">vector_add_mmap.py</a>
 uses virtual memory management APIs such as <code class="docutils literal notranslate"><span class="pre">cuMemCreate</span></code> and
 <code class="docutils literal notranslate"><span class="pre">cuMemMap</span></code> for vector addition.</p></li>
 </ul>
 </section>
 <section id="concepts-and-techniques">
 <h2>Concepts and techniques<a class="headerlink" href="#concepts-and-techniques" title="Link to this heading">#</a></h2>
 <ul class="simple">
-<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/d98094fefa33d0e6629204cf0665a5bce3f66a39/cuda_bindings/examples/2_Concepts_and_Techniques/stream_ordered_allocation.py">stream_ordered_allocation.py</a>
+<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/3694e06171e9b2316396377103e31ba605eaef6e/cuda_bindings/examples/2_Concepts_and_Techniques/stream_ordered_allocation.py">stream_ordered_allocation.py</a>
 demonstrates <code class="docutils literal notranslate"><span class="pre">cudaMallocAsync</span></code> and <code class="docutils literal notranslate"><span class="pre">cudaFreeAsync</span></code> together with
 memory-pool release thresholds.</p></li>
 </ul>
 </section>
 <section id="cuda-features">
 <h2>CUDA features<a class="headerlink" href="#cuda-features" title="Link to this heading">#</a></h2>
 <ul class="simple">
-<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/d98094fefa33d0e6629204cf0665a5bce3f66a39/cuda_bindings/examples/3_CUDA_Features/global_to_shmem_async_copy.py">global_to_shmem_async_copy.py</a>
+<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/3694e06171e9b2316396377103e31ba605eaef6e/cuda_bindings/examples/3_CUDA_Features/global_to_shmem_async_copy.py">global_to_shmem_async_copy.py</a>
 compares asynchronous global-to-shared-memory copy strategies in matrix
 multiplication kernels.</p></li>
-<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/d98094fefa33d0e6629204cf0665a5bce3f66a39/cuda_bindings/examples/3_CUDA_Features/simple_cuda_graphs.py">simple_cuda_graphs.py</a>
+<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/3694e06171e9b2316396377103e31ba605eaef6e/cuda_bindings/examples/3_CUDA_Features/simple_cuda_graphs.py">simple_cuda_graphs.py</a>
 shows both manual CUDA graph construction and stream-capture-based replay.</p></li>
 </ul>
 </section>
 <section id="libraries-and-tools">
 <h2>Libraries and tools<a class="headerlink" href="#libraries-and-tools" title="Link to this heading">#</a></h2>
 <ul class="simple">
-<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/d98094fefa33d0e6629204cf0665a5bce3f66a39/cuda_bindings/examples/4_CUDA_Libraries/conjugate_gradient_multi_block_cg.py">conjugate_gradient_multi_block_cg.py</a>
+<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/3694e06171e9b2316396377103e31ba605eaef6e/cuda_bindings/examples/4_CUDA_Libraries/conjugate_gradient_multi_block_cg.py">conjugate_gradient_multi_block_cg.py</a>
 implements a conjugate-gradient solver with cooperative groups and
 multi-block synchronization.</p></li>
-<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/d98094fefa33d0e6629204cf0665a5bce3f66a39/cuda_bindings/examples/4_CUDA_Libraries/nvidia_smi.py">nvidia_smi.py</a>
+<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/3694e06171e9b2316396377103e31ba605eaef6e/cuda_bindings/examples/4_CUDA_Libraries/nvidia_smi.py">nvidia_smi.py</a>
 uses NVML to implement a Python subset of <code class="docutils literal notranslate"><span class="pre">nvidia-smi</span></code>.</p></li>
 </ul>
 </section>
 <section id="advanced-and-interoperability">
 <h2>Advanced and interoperability<a class="headerlink" href="#advanced-and-interoperability" title="Link to this heading">#</a></h2>
 <ul class="simple">
-<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/d98094fefa33d0e6629204cf0665a5bce3f66a39/cuda_bindings/examples/extra/iso_fd_modelling.py">iso_fd_modelling.py</a>
+<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/3694e06171e9b2316396377103e31ba605eaef6e/cuda_bindings/examples/extra/iso_fd_modelling.py">iso_fd_modelling.py</a>
 runs isotropic finite-difference wave propagation across multiple GPUs with
 peer-to-peer halo exchange.</p></li>
-<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/d98094fefa33d0e6629204cf0665a5bce3f66a39/cuda_bindings/examples/extra/jit_program.py">jit_program.py</a>
+<li><p><a class="extlink-cuda-bindings-example reference external" href="https://github.com/NVIDIA/cuda-python/blob/3694e06171e9b2316396377103e31ba605eaef6e/cuda_bindings/examples/extra/jit_program.py">jit_program.py</a>
 JIT-compiles a SAXPY kernel with NVRTC and launches it through the Driver
 API.</p></li>
 </ul>