Skip to content

Improvements to some cuda bindings example#471

Merged
leofang merged 3 commits into
NVIDIA:mainfrom
oleksandr-pavlyk:fix-cuda-bindings-example
Feb 26, 2025
Merged

Improvements to some cuda bindings example#471
leofang merged 3 commits into
NVIDIA:mainfrom
oleksandr-pavlyk:fix-cuda-bindings-example

Conversation

@oleksandr-pavlyk

Copy link
Copy Markdown
Contributor

Modified cuda_bindings/examples/0_Introduction/vectorAddDrv_test.py to use randn(N) instead of randn(size). Also renamed size to nbytes to readability.

Modified clock_nvrtc_test.py to improve readability.

The argument to rand should be the number of elements, not number of bytes.
Renamed variables for readability
@copy-pr-bot

copy-pr-bot Bot commented Feb 25, 2025

Copy link
Copy Markdown
Contributor

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Introduced elems_to_bytes utility function.
Introduce grid_dim, and block_dim variables and used Python unpacking
to splice them inside cudaLaunchKernel arguments.

Introduced shared_memory_nbytes variable for readability
keenan-simpson
keenan-simpson previously approved these changes Feb 25, 2025

@keenan-simpson keenan-simpson left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. added vlad as reviewer. Thanks!

1. Use NumPy to vectorize operations.
2. Move cuda resource deallocation calls before correctness checking,
   since the checking would terminate the script before deallocating
   resources.
@oleksandr-pavlyk

Copy link
Copy Markdown
Contributor Author

simpleCubmapTexture_test.py just like it's original in cuda-samples does not pass the test when num_layers is other than 1. If this is always the case, the code can be further simplified, since in that case we always have h_data_ref = -h_data.

@leofang leofang added enhancement Any code-related improvements P1 Medium priority - Should do cuda.bindings Everything related to the cuda.bindings module labels Feb 25, 2025
@leofang leofang added this to the cuda-python 12-next, 11-next milestone Feb 25, 2025
@leofang

leofang commented Feb 25, 2025

Copy link
Copy Markdown
Member

/ok to test



def elems_to_bytes(nelems, dt):
return nelems * np.dtype(dt).itemsize

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seberg this has been an eternal mystery to me 😛 Can the NumPy types (not dtypes) carry an itemsize class attribute so that we can just do dt.itemsize without wrapping it with np.dtype first?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scalar type can't have it e.g. for strings/structs and it mixes the two concepts a bit more. But mostly I think just nobody ever really thought about whether just allowing it may be convenient.
(maybe the more annoying thing is that there is no shorter spelling for the singleton/default np.dtype(np.float32) instance.)

@github-actions

This comment has been minimized.

@leofang leofang merged commit 5d86ebb into NVIDIA:main Feb 26, 2025
@github-actions

Copy link
Copy Markdown
Doc Preview CI
Preview removed because the pull request was closed or merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cuda.bindings Everything related to the cuda.bindings module enhancement Any code-related improvements P1 Medium priority - Should do

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants