add launching dummy kernels in cuda examples#65

Open
m-fila wants to merge 1 commit into main from nanospin
m-fila (Member) commented Apr 10, 2026

Adds a dummy kernel to the "reconstruction", "delegation" and "event poll" examples to keep the GPU busy for longer. In the "event poll" examples in particular, cudaEventQuery usually returned immediately, so retries were rarely observed. Now several retries can be observed:

139800136908800     event0:main.reconstruction  Event not ready, retrying...
139800136908800     event1:main.reconstruction  Event not ready, retrying...
139800136908800     event0:main.reconstruction  Event not ready, retrying...
139800136908800     event1:main.reconstruction  Event not ready, retrying...
139800136908800     event0:main.reconstruction  Event completed successfully
139800136908800     event1:main.reconstruction  Event not ready, retrying...
139800136908800     event1:main.reconstruction  Event completed successfully
139799920902144     event0:main.reconstruction  Finishing reconstruction
139799841210368     event1:main.reconstruction  Finishing reconstruction
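The polling pattern behind this log can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `spin_kernel` and `poll_until_done` are hypothetical names, and busy-waiting on `clock64()` is only one assumed way to write a dummy kernel that keeps the GPU occupied.

```cuda
#include <cuda_runtime.h>

// Hypothetical dummy kernel: busy-waits for roughly `cycles` GPU clock cycles.
__global__ void spin_kernel(long long cycles) {
    const long long start = clock64();
    while (clock64() - start < cycles) {
        // Intentionally empty: the only goal is to keep the GPU busy.
    }
}

// Launches the dummy kernel, records an event behind it, and polls the event.
// Returns the number of "not ready" polls, or -1 on any CUDA error.
int poll_until_done(long long cycles) {
    cudaStream_t stream;
    cudaEvent_t event;
    if (cudaStreamCreate(&stream) != cudaSuccess) return -1;
    if (cudaEventCreateWithFlags(&event, cudaEventDisableTiming) != cudaSuccess) {
        cudaStreamDestroy(stream);
        return -1;
    }

    spin_kernel<<<1, 1, 0, stream>>>(cycles);
    cudaEventRecord(event, stream);

    // cudaEventQuery returns cudaErrorNotReady while the work queued before
    // the event is still running, and cudaSuccess once it has completed.
    int retries = 0;
    cudaError_t status;
    while ((status = cudaEventQuery(event)) == cudaErrorNotReady) {
        ++retries;
    }

    cudaEventDestroy(event);
    cudaStreamDestroy(stream);
    return status == cudaSuccess ? retries : -1;
}
```

Without the dummy kernel the event would typically be complete by the first query, so the retry count would stay near zero. A longer spin should give more "not ready" polls, although the exact count depends on the device and on host timing.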

Now, the examples should be more useful for estimating the costs of synchronization:

nsys (NVIDIA L40S):

 Time (%)  Total Time (ns)  Num Calls   Avg (ns)   Med (ns)  Min (ns)  Max (ns)   StdDev (ns)            Name          
 --------  ---------------  ---------  ----------  --------  --------  ---------  -----------  ------------------------
      0.2           236178        506       466.8     460.0       340       3670        236.9  cudaEventQuery
      0.0            33941          9      3771.2    2270.0      1730      14611       4123.1  cudaEventRecord

For comparison, the "delegation" example, which uses a callback:

 Time (%)  Total Time (ns)  Num Calls   Avg (ns)   Med (ns)  Min (ns)  Max (ns)   StdDev (ns)            Name           
 --------  ---------------  ---------  ----------  --------  --------  ---------  -----------  -------------------------
      1.0          1228536          9    136504.0   15571.0      3370    1058232     346172.5  cudaLaunchHostFunc_v10000
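For readers unfamiliar with the callback approach, a rough sketch of how `cudaLaunchHostFunc` can signal completion is shown below. It is an assumption-laden illustration rather than the "delegation" example itself: `spin_kernel`, `on_done` and `run_with_callback` are hypothetical names, and the callback here only sets a flag, whereas a real example would presumably hand control back to its scheduler.

```cuda
#include <atomic>
#include <cuda_runtime.h>

// Hypothetical dummy kernel: busy-waits for roughly `cycles` GPU clock cycles.
__global__ void spin_kernel(long long cycles) {
    const long long start = clock64();
    while (clock64() - start < cycles) {
        // Intentionally empty: the only goal is to keep the GPU busy.
    }
}

// Host function run by the CUDA runtime once all preceding work in the
// stream has finished. It must not call CUDA API functions itself.
void CUDART_CB on_done(void* user_data) {
    static_cast<std::atomic<bool>*>(user_data)->store(true);
}

// Launches the dummy kernel, enqueues the callback behind it, and waits for
// the stream to drain. Returns true if everything succeeded and the callback ran.
bool run_with_callback(long long cycles) {
    std::atomic<bool> done{false};
    cudaStream_t stream;
    if (cudaStreamCreate(&stream) != cudaSuccess) return false;

    spin_kernel<<<1, 1, 0, stream>>>(cycles);
    cudaError_t status = cudaLaunchHostFunc(stream, on_done, &done);

    // Synchronizing here is only for the sake of a self-contained sketch;
    // the point of a callback is that the host thread does not have to poll.
    if (status == cudaSuccess) status = cudaStreamSynchronize(stream);

    cudaStreamDestroy(stream);
    return status == cudaSuccess && done.load();
}
```

The design trade-off visible in the two tables is that polling issues many cheap `cudaEventQuery` calls, while the callback approach issues one `cudaLaunchHostFunc` call per item with a higher and more variable cost.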

@ericcano
