GPU Task progression using callback function implemented. #554
josephjohnjj wants to merge 1 commit into ICLDisco:master
The callback function for a stream is implemented using cudaLaunchHostFunc(). When a stream finishes a stage, its callback pushes the task to the next stream. This replaces the CUDA event-based task progression.
A few high-level comments for now:
This is the result from the master.

The callback function calls complete_stage for the stage it completed and pushes the task to the queue for the next stage (for instance, after stage-in, the task is pushed to the execution queue). As the manager thread also has access to the queues, there can be contention on these queues, and this is one aspect that can increase the cost of callbacks. Another drawback is that nothing can be added to the stream while the callback is being triggered.

We still need a progress thread, as the progress thread (manager) calls the function that offloads actions (execution or memcopy) to the streams. Immediately after this, the progress thread also pushes the callback to the stream. But the callback trigger itself is CUDA functionality, and we don't need a manager or any other worker thread for it.

One main advantage of the PR, I think, is that we can progress more tasks. In master, when a task has successfully progressed, the progression of all other tasks in the same stage is delayed while the progressed task is moved to the next stage, and this progression is the responsibility of the manager thread. With this PR, the manager thread's responsibility is limited to initiating the task stage-in (by moving the task to the stage-in queue) and task completion (__parsec_complete_execution() and parsec_cuda_kernel_epilog()); the rest of the task progression is done by the callbacks.
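To make the handoff described above concrete, here is a rough CPU-only sketch of it. It is not the PR's code and it does not call CUDA: `Task`, `StageQueue`, `CallbackArg`, `on_stage_done` and `run_pipeline` are all hypothetical names invented for illustration. The only thing taken from CUDA is the callback shape, `void fn(void *userData)`, which is the form cudaLaunchHostFunc() expects. In this model the callback is invoked directly where a real stream would trigger it after the preceding work completes.

```cpp
#include <deque>
#include <mutex>
#include <vector>

// Hypothetical task record; the real PaRSEC GPU task carries far more state.
struct Task {
    int id;
    int stage;  // 0 = stage-in, 1 = execute, 2 = stage-out, 3 = ready to complete
};

// Hypothetical per-stage queue. The mutex marks the contention point:
// both the manager thread and the callbacks access these queues.
struct StageQueue {
    std::mutex lock;
    std::deque<Task*> tasks;

    void push(Task* t) {
        std::lock_guard<std::mutex> guard(lock);
        tasks.push_back(t);
    }

    Task* try_pop() {
        std::lock_guard<std::mutex> guard(lock);
        if (tasks.empty()) return nullptr;
        Task* t = tasks.front();
        tasks.pop_front();
        return t;
    }
};

// Hypothetical payload handed to the callback: the task plus its next queue.
struct CallbackArg {
    Task* task;
    StageQueue* next;
};

// Same shape as a CUDA host function. It marks the stage as complete and
// pushes the task to the queue of the next stage.
void on_stage_done(void* user_data) {
    CallbackArg* arg = static_cast<CallbackArg*>(user_data);
    arg->task->stage += 1;
    arg->next->push(arg->task);
}

// Model of the flow: the manager only starts stage-in and collects finished
// tasks; every move between stages happens inside the callback.
int run_pipeline(int num_tasks) {
    std::vector<Task> tasks(num_tasks);
    StageQueue queues[4];  // stage-in, execute, stage-out, completion

    for (int i = 0; i < num_tasks; ++i) {
        tasks[i] = Task{i, 0};
        queues[0].push(&tasks[i]);  // manager initiates stage-in
    }

    for (int s = 0; s < 3; ++s) {
        while (Task* t = queues[s].try_pop()) {
            // A real manager would offload the kernel or memcopy to the
            // stream here and then enqueue the callback behind it. This
            // model calls the callback directly instead.
            CallbackArg arg{t, &queues[s + 1]};
            on_stage_done(&arg);
        }
    }

    int completed = 0;
    while (Task* t = queues[3].try_pop()) {
        if (t->stage == 3) ++completed;  // manager-side task completion
    }
    return completed;
}
```

Calling `run_pipeline(4)` walks four tasks through the three stages and returns the number that reached the completion queue. The lock in `StageQueue` is the part to watch: in a real build the callback runs on a thread owned by the CUDA runtime, so every `push` can compete with the manager's `try_pop`.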