Improvements to some cuda bindings example #471
The argument to rand should be the number of elements, not number of bytes. Renamed variables for readability
Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here.
Introduced elems_to_bytes utility function. Introduced grid_dim and block_dim variables and used Python unpacking to splice them into the cudaLaunchKernel arguments. Introduced shared_memory_nbytes variable for readability.
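A minimal sketch of the unpacking pattern this commit describes. The launch function here, fake_launch_kernel, is a hypothetical stand-in that only mirrors a positional layout of three grid values, three block values, then the shared-memory size; it is not the real binding, and the sizes are assumed for illustration.

```python
import numpy as np


def elems_to_bytes(nelems, dt):
    # Same helper as in the diff: element count times per-element size.
    return nelems * np.dtype(dt).itemsize


def fake_launch_kernel(func, gx, gy, gz, bx, by, bz,
                       shared_mem_nbytes, stream, params, extra):
    # Hypothetical stand-in for a kernel launch call; it only records
    # the arguments it received so the unpacking can be inspected.
    return {
        "grid": (gx, gy, gz),
        "block": (bx, by, bz),
        "shared": shared_mem_nbytes,
    }


NUM_THREADS = 256                  # assumed value, for illustration only
grid_dim = (64, 1, 1)              # assumed value, for illustration only
block_dim = (NUM_THREADS, 1, 1)
shared_memory_nbytes = elems_to_bytes(2 * NUM_THREADS, np.float32)

# The two tuples are spliced into the positional argument list with *.
launch = fake_launch_kernel(
    None, *grid_dim, *block_dim, shared_memory_nbytes, 0, None, None
)
```

Naming the tuples keeps the call site short and makes it harder to transpose a grid value with a block value.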
Force-pushed from 1404c60 to 8257f10
keenan-simpson left a comment
lgtm. added vlad as reviewer. Thanks!
1. Use NumPy to vectorize operations. 2. Move cuda resource deallocation calls before correctness checking, since the checking would terminate the script before deallocating resources.
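The ordering described in point 2 can be sketched as follows. This is an illustration under stated assumptions, not the example's actual code: free_resource is a hypothetical stand-in for the real deallocation calls, and the host arrays stand in for results copied back from the device.

```python
import numpy as np

released = []  # records which (hypothetical) resources were freed


def free_resource(name):
    # Hypothetical stand-in for a real deallocation call.
    released.append(name)


h_a = np.arange(8, dtype=np.float32)
h_b = np.ones(8, dtype=np.float32)
h_c = h_a + h_b  # stand-in for the result copied back from the device

# Release resources first, so a failed check cannot skip the cleanup.
for name in ("d_a", "d_b", "d_c"):
    free_resource(name)

# Vectorized check: one NumPy call instead of a per-element Python loop.
ok = np.allclose(h_c, h_a + h_b)
if not ok:
    raise SystemExit("Result = FAIL")
```

A try/finally block would be another way to guarantee cleanup; the commit instead reorders the calls, which keeps the example script linear.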
/ok to test
def elems_to_bytes(nelems, dt):
    return nelems * np.dtype(dt).itemsize
@seberg this has been an eternal mystery to me 😛 Can the NumPy types (not dtypes) carry an itemsize class attribute so that we can just do dt.itemsize without wrapping it with np.dtype first?
The scalar type can't have it e.g. for strings/structs and it mixes the two concepts a bit more. But mostly I think just nobody ever really thought about whether just allowing it may be convenient.
(maybe the more annoying thing is that there is no shorter spelling for the singleton/default np.dtype(np.float32) instance.)
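To make the distinction in this thread concrete, here is a short sketch of how itemsize behaves on a NumPy scalar type, a scalar instance, and a dtype. The variable names are illustrative only.

```python
import numpy as np

dt = np.float32  # a scalar type, not a dtype instance

# Wrapping the scalar type in np.dtype gives the per-element size.
itemsize_via_dtype = np.dtype(dt).itemsize

# A scalar instance also exposes itemsize.
itemsize_via_instance = dt(0).itemsize

# On the class itself, the attribute is a descriptor rather than an int,
# which is why dt.itemsize cannot be used directly in arithmetic.
class_attr_is_int = isinstance(dt.itemsize, int)

# For strings the size belongs to the dtype instance, not the scalar type:
# the unsized default reports 0, while a 4-character unicode dtype reports 16.
unsized_str_itemsize = np.dtype(np.str_).itemsize
sized_str_itemsize = np.dtype("U4").itemsize
```

The string case shows why a single class-level itemsize could not be correct for every scalar type.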
Modified cuda_bindings/examples/0_Introduction/vectorAddDrv_test.py to use randn(N) instead of randn(size). Also renamed size to nbytes for readability.
Modified clock_nvrtc_test.py to improve readability.
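The element-count versus byte-count fix can be illustrated with a small sketch. The value of N and the array names are assumptions for illustration and are not taken from the example file.

```python
import numpy as np

N = 50000                                   # assumed element count
nbytes = N * np.dtype(np.float32).itemsize  # size of one buffer in bytes

# Correct: randn takes the number of elements.
h_A = np.random.randn(N).astype(dtype=np.float32)

# Passing the byte count instead allocates four times too many elements.
oversized = np.random.randn(nbytes)
```

With float32 data, passing the byte count makes the host array four times longer than the device buffer it is meant to fill.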