Add support for cuDSS lib #931

Merged
rwgk merged 8 commits into NVIDIA:main from ZzEeKkAa:yhavrylko/fix/cudss_conda
Aug 29, 2025

ZzEeKkAa commented Aug 29, 2025

Contributor

Description

Bump pathfinder version to 1.2.1 (for release)

Relevant links:

copy-pr-bot Bot commented Aug 29, 2025

Contributor

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

leofang requested a review from rwgk August 29, 2025 17:53
leofang added the bug (Something isn't working), P0 (High priority - Must do!), and cuda.pathfinder (Everything related to the cuda.pathfinder module) labels Aug 29, 2025
ZzEeKkAa marked this pull request as draft August 29, 2025 17:56
copy-pr-bot Bot commented Aug 29, 2025

Contributor

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

ZzEeKkAa commented Aug 29, 2025

Contributor Author

Putting this in draft until I test it on Windows (shortly).

rwgk commented Aug 29, 2025

Contributor

Could you please also update the version in

cuda_pathfinder/cuda/pathfinder/_version.py

to

__version__ = "1.2.1a0"

so that we don't have misleading versioning?

I'll also need to update docs/source/release; I will do that in a separate PR.
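
A minimal sketch of why the alpha suffix avoids misleading versioning: under PEP 440 ordering, 1.2.1a0 sorts after 1.2.0 but before the final 1.2.1, so a development build cannot be mistaken for the release. The helper name version_key is hypothetical and handles only a simplified subset of PEP 440 (dotted release numbers with an optional a/b/rc suffix); it is not part of cuda.pathfinder.

```python
import re

# Simplified subset of PEP 440: dotted release numbers with an optional
# a/b/rc pre-release suffix. Other forms (dev, post, epochs) are rejected.
_VERSION_RE = re.compile(r"^(\d+(?:\.\d+)*)(?:(a|b|rc)(\d+))?$")
_PRE_ORDER = {"a": 0, "b": 1, "rc": 2}


def version_key(version: str):
    """Return a sort key in which pre-releases order before the final release."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    release = tuple(int(part) for part in match.group(1).split("."))
    if match.group(2) is None:
        pre = (1, 0, 0)  # a final release sorts after any of its pre-releases
    else:
        pre = (0, _PRE_ORDER[match.group(2)], int(match.group(3)))
    return release, pre


if __name__ == "__main__":
    versions = ["1.2.1", "1.2.0", "1.2.1a0"]
    print(sorted(versions, key=version_key))
```

Sorting the three strings with this key places 1.2.1a0 between 1.2.0 and 1.2.1.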

copy-pr-bot Bot commented Aug 29, 2025

Contributor

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

ZzEeKkAa marked this pull request as ready for review August 29, 2025 20:06
@ZzEeKkAa

Contributor Author

/ok to test

rwgk commented Aug 29, 2025

Contributor

/ok to test 554ebc7

rwgk changed the title from "Fix cudss in conda" to "Add support for cuDSS lib" Aug 29, 2025
rwgk commented Aug 29, 2025

Contributor

For easy reference: All tests passed at commit 554ebc7

https://github.com/NVIDIA/cuda-python/actions/runs/17334713680?pr=931

rwgk commented Aug 29, 2025

Contributor

To ensure our testing covers libname cudss:

https://github.com/NVIDIA/cuda-python/actions/runs/17334713680?pr=931

Download log archive:

logs_44471412178.zip

$ grep 'INFO test_load_nvidia_dynamic_lib\[cudss\]' *.txt
11_Test linux-64 _ py3.10, 12.9.0, local, GPU l4.txt:2025-08-29T21:56:11.3008882Z INFO test_load_nvidia_dynamic_lib[cudss]: abs_path='/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/nvidia/cu12/lib/libcudss.so.0'
12_Test linux-64 _ py3.11, 12.9.0, wheels, GPU l4.txt:2025-08-29T21:54:45.1988660Z INFO test_load_nvidia_dynamic_lib[cudss]: abs_path='/opt/hostedtoolcache/Python/3.11.13/x64/lib/python3.11/site-packages/nvidia/cu12/lib/libcudss.so.0'
13_Test linux-64 _ py3.12, 12.9.0, local, GPU l4.txt:2025-08-29T21:55:49.8445624Z INFO test_load_nvidia_dynamic_lib[cudss]: abs_path='/opt/hostedtoolcache/Python/3.12.11/x64/lib/python3.12/site-packages/nvidia/cu12/lib/libcudss.so.0'
15_Test linux-64 _ py3.9, 12.9.0, wheels, GPU l4.txt:2025-08-29T21:53:13.5603808Z INFO test_load_nvidia_dynamic_lib[cudss]: abs_path='/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/nvidia/cu12/lib/libcudss.so.0'
16_Test linux-aarch64 _ py3.9, 12.9.0, wheels, GPU a100.txt:2025-08-29T21:47:47.2084003Z INFO test_load_nvidia_dynamic_lib[cudss]: abs_path='/opt/hostedtoolcache/Python/3.9.23/arm64/lib/python3.9/site-packages/nvidia/cu12/lib/libcudss.so.0'
22_Test linux-aarch64 _ py3.13, 12.9.0, wheels, GPU a100.txt:2025-08-29T21:47:26.0070266Z INFO test_load_nvidia_dynamic_lib[cudss]: abs_path='/opt/hostedtoolcache/Python/3.13.7/arm64/lib/python3.13/site-packages/nvidia/cu12/lib/libcudss.so.0'
23_Test linux-aarch64 _ py3.11, 12.9.0, wheels, GPU a100.txt:2025-08-29T21:47:30.4891870Z INFO test_load_nvidia_dynamic_lib[cudss]: abs_path='/opt/hostedtoolcache/Python/3.11.13/arm64/lib/python3.11/site-packages/nvidia/cu12/lib/libcudss.so.0'
24_Test linux-aarch64 _ py3.10, 12.9.0, local, GPU a100.txt:2025-08-29T21:48:12.1113794Z INFO test_load_nvidia_dynamic_lib[cudss]: abs_path='/opt/hostedtoolcache/Python/3.10.18/arm64/lib/python3.10/site-packages/nvidia/cu12/lib/libcudss.so.0'
25_Test linux-aarch64 _ py3.12, 12.9.0, local, GPU a100.txt:2025-08-29T21:48:35.7158367Z INFO test_load_nvidia_dynamic_lib[cudss]: abs_path='/opt/hostedtoolcache/Python/3.12.11/arm64/lib/python3.12/site-packages/nvidia/cu12/lib/libcudss.so.0'
8_Test linux-64 _ py3.13, 12.9.0, wheels, GPU l4.txt:2025-08-29T21:56:00.5487365Z INFO test_load_nvidia_dynamic_lib[cudss]: abs_path='/opt/hostedtoolcache/Python/3.13.7/x64/lib/python3.13/site-packages/nvidia/cu12/lib/libcudss.so.0'
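
The grep above can also be sketched in Python, which makes it easy to collect the reported abs_path per log file. This is only an illustration: the function names are hypothetical, and the regular expression assumes the INFO line format shown in the log excerpt.

```python
import re
from pathlib import Path

# Matches lines such as:
#   ... INFO test_load_nvidia_dynamic_lib[cudss]: abs_path='/.../libcudss.so.0'
_INFO_RE = re.compile(
    r"INFO test_load_nvidia_dynamic_lib\[(?P<libname>[^\]]+)\]: "
    r"abs_path='(?P<abs_path>[^']*)'"
)


def find_loaded_libs(log_text: str, libname: str) -> list[str]:
    """Return every abs_path reported for libname in the given log text."""
    return [
        m.group("abs_path")
        for m in _INFO_RE.finditer(log_text)
        if m.group("libname") == libname
    ]


def scan_log_dir(log_dir: str, libname: str) -> dict[str, list[str]]:
    """Map each *.txt log file in log_dir to the paths it reports for libname."""
    results = {}
    for path in sorted(Path(log_dir).glob("*.txt")):
        hits = find_loaded_libs(path.read_text(errors="replace"), libname)
        if hits:
            results[path.name] = hits
    return results


if __name__ == "__main__":
    # Assumes the log archive was unpacked into the current directory.
    for name, paths in scan_log_dir(".", "cudss").items():
        print(name, paths)
```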

rwgk commented Aug 29, 2025

Contributor

Additional manual testing with conda:

conda create -n python3.12-cudss python=3.12 libcudss
conda activate python3.12-cudss
(python3.12-cudss) rwgk-win11.localdomain:~/forked/cuda-python/cuda_pathfinder $ pip install .
Processing /home/rgrossekunst/forked/cuda-python/cuda_pathfinder
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: cuda-pathfinder
  Building wheel for cuda-pathfinder (pyproject.toml) ... done
  Created wheel for cuda-pathfinder: filename=cuda_pathfinder-1.2.1a0-py3-none-any.whl size=22230 sha256=eed9ec121225ab962c8764cd71def9c32006b750050f327aa03bce5f9f53ce35
  Stored in directory: /tmp/pip-ephem-wheel-cache-9mgagsaq/wheels/d1/04/8c/c856bb3f85f3511d35e8d69acc9ca63f99f4d52d879aba968a
Successfully built cuda-pathfinder
Installing collected packages: cuda-pathfinder
Successfully installed cuda-pathfinder-1.2.1a0
(python3.12-cudss) rwgk-win11.localdomain:~/forked/cuda-python/cuda_pathfinder $ python -c "from cuda import pathfinder as pf; print(pf.__version__);
print(pf.load_nvidia_dynamic_lib('cudss'))"
1.2.1a0
LoadedDL(abs_path='/home/rgrossekunst/miniforge3/envs/python3.12-cudss/lib/python3.12/lib-dynload/../../libcudss.so.0', was_already_loaded_from_elsewhere=False, _handle_uint=106718980198128)
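
The one-liner above can be expanded into a small script. This is a sketch, not the project's test code: describe_loaded_lib is a hypothetical helper, and it relies only on the load_nvidia_dynamic_lib call and the abs_path and was_already_loaded_from_elsewhere fields visible in the output above. Handling of a missing library is omitted.

```python
def describe_loaded_lib(libname, loader):
    """Load libname via loader and summarize where it was found."""
    loaded = loader(libname)
    if loaded.was_already_loaded_from_elsewhere:
        origin = "already loaded from elsewhere"
    else:
        origin = "newly loaded"
    return f"{libname}: {loaded.abs_path} ({origin})"


if __name__ == "__main__":
    try:
        from cuda import pathfinder
    except ImportError:
        print("cuda-pathfinder is not installed; nothing to check")
    else:
        print(pathfinder.__version__)
        print(describe_loaded_lib("cudss", pathfinder.load_nvidia_dynamic_lib))
```

Passing the loader as an argument keeps the helper usable with a stub on machines that have no CUDA libraries installed.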

rwgk commented Aug 29, 2025

Contributor

/ok to test bc73539

rwgk commented Aug 29, 2025

Contributor

@leofang I decided it's most efficient to pack the 1.2.1 release prep right into this PR.

Assuming the tests pass (which seems extremely likely, because there are no code changes compared to the previous passing run), and with your approval, I could release this today; it should be quick.

rwgk requested a review from leofang August 29, 2025 22:48
rparolin self-requested a review August 29, 2025 22:52

rparolin left a comment

Collaborator

LGTM, but please still wait for @leofang's final approval before merging.

rwgk commented Aug 29, 2025

Contributor

This is a most bizarre error:

https://github.com/NVIDIA/cuda-python/actions/runs/17335731665/job/49221850858?pr=931

++ cat .github/BACKPORT_BRANCH
+ OLD_BRANCH=12.9.x
+ OLD_BASENAME='cuda-bindings-python311-cuda*-linux-64*'
++ gh run list -b 12.9.x -L 1 -w ci.yml -s completed -R NVIDIA/cuda-python --json databaseId
++ jq '.[]| .databaseId'
unknown shorthand flag: 'b' in -b

Usage:  gh run list [flags]

Flags:
  -q, --jq expression     Filter JSON output using a jq expression
      --json fields       Output JSON with the specified fields
  -L, --limit int         Maximum number of runs to fetch (default 20)
  -t, --template string   Format JSON output using a Go template
  -w, --workflow string   Filter runs by workflow
  
+ LATEST_PRIOR_RUN_ID=

rwgk commented Aug 29, 2025

Contributor

I'm rerunning the one failed job, in hopes that it was something weird on that particular runner.

rwgk merged commit f8c49f3 into NVIDIA:main Aug 29, 2025
94 of 95 checks passed

@github-actions

Doc Preview CI
Preview removed because the pull request was closed or merged.

leofang commented Aug 30, 2025

Member

> This is a most bizarre error:
> https://github.com/NVIDIA/cuda-python/actions/runs/17335731665/job/49221850858?pr=931
>
> I'm rerunning the one failed job, in hopes that it was something weird on that particular runner.

Yes, I've seen that too. It's due to network glitches. We want a new enough gh on the runners, whether self-hosted or GitHub-hosted, so we install it in the CI. But because of the glitch, apt could not reach the cli.github.com package index, so a very old gh (2.4.0, which does not recognize the -b flag) was installed instead. From the same CI log, same job step:

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

Hit:1 http://security.ubuntu.com/ubuntu jammy-security InRelease
Hit:2 http://archive.ubuntu.com/ubuntu jammy InRelease
Hit:3 http://archive.ubuntu.com/ubuntu jammy-updates InRelease
Hit:4 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
Ign:5 https://cli.github.com/packages stable InRelease
Ign:5 https://cli.github.com/packages stable InRelease
Ign:5 https://cli.github.com/packages stable InRelease
Err:5 https://cli.github.com/packages stable InRelease
  Could not connect to cli.github.com:443 (185.199.110.153), connection timed out Could not connect to cli.github.com:443 (185.199.109.153), connection timed out Could not connect to cli.github.com:443 (185.199.111.153), connection timed out Could not connect to cli.github.com:443 (185.199.108.153), connection timed out
Reading package lists...
Building dependency tree...
Reading state information...
W: Failed to fetch https://cli.github.com/packages/dists/stable/InRelease  Could not connect to cli.github.com:443 (185.199.110.153), connection timed out Could not connect to cli.github.com:443 (185.199.109.153), connection timed out Could not connect to cli.github.com:443 (185.199.111.153), connection timed out Could not connect to cli.github.com:443 (185.199.108.153), connection timed out
3 packages can be upgraded. Run 'apt list --upgradable' to see them.
W: Some index files failed to download. They have been ignored, or old ones used instead.

(...)

Preparing to unpack .../gh_2.4.0+dfsg1-2_amd64.deb ...
Unpacking gh (2.4.0+dfsg1-2) ...
Setting up gh (2.4.0+dfsg1-2) ...

https://github.com/NVIDIA/cuda-python/actions/runs/17335731665/job/49221850858?pr=931#step:12:79
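
One way a CI step could surface this failure earlier is to feature-detect the branch filter before calling gh run list -b, instead of failing later with an empty LATEST_PRIOR_RUN_ID. The sketch below is hypothetical and not taken from the repository's workflows: the function names are made up, and it assumes that a sufficiently new gh lists the option as "-b, --branch" in its gh run list --help output.

```python
import re
import subprocess


def parse_gh_version(version_output: str):
    """Extract (major, minor, patch) from `gh --version` output, or None."""
    match = re.search(r"gh version (\d+)\.(\d+)\.(\d+)", version_output)
    return tuple(int(g) for g in match.groups()) if match else None


def supports_branch_flag(help_text: str) -> bool:
    """Feature-detect the branch filter in `gh run list --help` text."""
    return re.search(r"(^|\s)-b,\s+--branch\b", help_text) is not None


def check_gh() -> None:
    """Exit with a clear message if the installed gh lacks `run list -b`."""
    version_out = subprocess.run(
        ["gh", "--version"], capture_output=True, text=True, check=True
    ).stdout
    help_out = subprocess.run(
        ["gh", "run", "list", "--help"], capture_output=True, text=True, check=True
    ).stdout
    if not supports_branch_flag(help_out):
        raise SystemExit(
            f"gh {parse_gh_version(version_out)} does not support `run list -b`; "
            "the package index may have been unreachable during installation"
        )


if __name__ == "__main__":
    check_gh()
    print("gh supports `run list -b`")
```

Checking the help text rather than comparing version numbers avoids hard-coding a minimum gh release, which this thread does not state.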

leofang added this to the pathfinder-nvmath-support milestone Aug 30, 2025