Use VK_KHR_timeline_semaphore to reduce cvk_command_batch submit latency #808

Open
rjodinchr wants to merge 4 commits into kpet:main from rjodinchr:pr/timeline-sempahores

@rjodinchr
Contributor

Use a cvk_semaphore instead of a std::condition_variable when VK_KHR_timeline_semaphore is supported.

cvk_event holds a cvk_condition_variable, which is either a cvk_std_condition_variable (backed by std::condition_variable) or a cvk_semaphore_condition_variable (backed by cvk_semaphore).
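
The split described above can be pictured with a minimal sketch. It is not the code from this PR: the names `condition_variable_iface`, `std_cv`, `timeline_cv`, `fake_timeline` and `make_cv` are hypothetical, and the timeline semaphore is replaced by a plain atomic counter so the example builds without Vulkan.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical interface; the real cvk_condition_variable may look different.
struct condition_variable_iface {
    virtual ~condition_variable_iface() = default;
    virtual void notify() = 0;
    virtual void wait() = 0;
};

// Host-side variant backed by std::condition_variable.
class std_cv final : public condition_variable_iface {
public:
    void notify() override {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_signaled = true;
        }
        m_cv.notify_all();
    }
    void wait() override {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this] { return m_signaled; });
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    bool m_signaled = false;
};

// Stand-in for a timeline semaphore: a 64-bit counter that only increases.
struct fake_timeline {
    std::atomic<uint64_t> counter{0};
};

// Variant tied to one value on a (simulated) timeline.
class timeline_cv final : public condition_variable_iface {
public:
    timeline_cv(std::shared_ptr<fake_timeline> timeline, uint64_t value)
        : m_timeline(std::move(timeline)), m_value(value) {
        assert(m_timeline != nullptr);
    }
    void notify() override {
        // Raise the counter to m_value, never lowering it.
        uint64_t cur = m_timeline->counter.load();
        while (cur < m_value &&
               !m_timeline->counter.compare_exchange_weak(cur, m_value)) {
        }
    }
    void wait() override {
        // Spin until the counter reaches the value assigned to this event.
        while (m_timeline->counter.load(std::memory_order_acquire) < m_value) {
            std::this_thread::yield();
        }
    }

private:
    std::shared_ptr<fake_timeline> m_timeline;
    uint64_t m_value;
};

// Pick the implementation based on (simulated) extension support.
std::unique_ptr<condition_variable_iface>
make_cv(bool timeline_supported, std::shared_ptr<fake_timeline> timeline,
        uint64_t value) {
    if (timeline_supported) {
        return std::make_unique<timeline_cv>(std::move(timeline), value);
    }
    return std::make_unique<std_cv>();
}
```

Callers only see `notify()` and `wait()`, so the event code does not need to know which backend was selected.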

When an event is created, it is given a cvk_semaphore and a value on that semaphore's timeline. This assumes that nothing else is created between the creation of the event and its submission to the queue, so that values are allocated in submission order.

That assumption does not hold in general, so we enforce it by using 3 timelines, on each of which we can guarantee that nothing else is created between an event's creation and its submission to the queue.
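
To illustrate why allocation order matters, here is a small sketch of a single timeline that hands out values at event creation and checks them at submission. `timeline_allocator` and its methods are hypothetical names, not identifiers from this PR, and the sketch models one timeline only; how the 3 timelines are divided up is not covered here. The underlying constraint is that a timeline semaphore's counter only moves forward, so values signaled on one timeline have to increase.

```cpp
#include <cstdint>
#include <mutex>

// Hypothetical model of one timeline: values are reserved when an event is
// created and must later be submitted in exactly the same order.
class timeline_allocator {
public:
    // Called at event creation: reserve the value this event will signal.
    uint64_t allocate() {
        std::lock_guard<std::mutex> lock(m_mutex);
        return ++m_last_allocated;
    }

    // Called at submission: accept the value only if it is the next one in
    // allocation order; otherwise report the ordering violation.
    bool submit(uint64_t value) {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (value != m_last_submitted + 1) {
            return false;
        }
        m_last_submitted = value;
        return true;
    }

private:
    std::mutex m_mutex;
    uint64_t m_last_allocated = 0;
    uint64_t m_last_submitted = 0;
};
```

If a second event is created before the first one is submitted and then reaches the queue first, `submit` returns false in this model. That is the situation the in-order assumption rules out.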

Add notify, wait, poll and poll_once implementations to cvk_semaphore.
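
The difference between these operations can be sketched on a simulated timeline. The semantics below are assumptions, not taken from the PR: `poll_once` is taken to be a single non-blocking check and `poll` a busy loop around it. In Vulkan 1.2 (and with a KHR suffix in VK_KHR_timeline_semaphore) the related entry points are `vkSignalSemaphore`, `vkWaitSemaphores` and `vkGetSemaphoreCounterValue`; whether cvk_semaphore maps its methods onto them in this way is also an assumption. The class name `timeline_semaphore_model` is hypothetical, and a blocking `wait` is left out because it would be delegated to the driver.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical host-only model of a timeline semaphore counter.
class timeline_semaphore_model {
public:
    // Signal: raise the counter to `value`; lower values are ignored so the
    // counter never moves backwards.
    void notify(uint64_t value) {
        uint64_t cur = m_counter.load();
        while (cur < value && !m_counter.compare_exchange_weak(cur, value)) {
        }
    }

    // Single non-blocking check: has the counter reached `value` yet?
    bool poll_once(uint64_t value) const {
        return m_counter.load(std::memory_order_acquire) >= value;
    }

    // Busy-poll until the counter reaches `value`, yielding between checks.
    void poll(uint64_t value) const {
        while (!poll_once(value)) {
            std::this_thread::yield();
        }
    }

    // Current counter value.
    uint64_t value() const { return m_counter.load(); }

private:
    std::atomic<uint64_t> m_counter{0};
};
```

Polling trades CPU time for lower wake-up latency, which is presumably why it is useful as a diagnostic alternative to a blocking wait.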

When cvk_command_queue::end_current_command_batch is called, if timeline semaphores can be used and no synchronous command has been submitted, submit the batch.

Add a config option to poll the timeline semaphore instead of waiting on it, with separate settings for the main thread and the executors. This can help when investigating performance issues or driver bugs.

rjodinchr force-pushed the pr/timeline-sempahores branch 2 times, most recently from c39a392 to 9f1879c (August 21, 2025 12:29)

oscarbg commented Aug 21, 2025

curious if it brings enhancements to clpeak "Kernel launch latency" score (--kernel-latency)..

@rjodinchr
Contributor Author

> curious if it brings enhancements to clpeak "Kernel launch latency" score (--kernel-latency)..

I'm seeing an improvement on all the hardware I've tried so far with clpeak. Even with llvmpipe, it goes from 47 to 40 us for the --kernel-latency.

But to be honest, where we gain a lot is when we do not have a clFinish call between kernels. For workloads with lots of kernels between clFinish calls (with potential clFlush calls in between), that's where we have significant improvement.

rjodinchr force-pushed the pr/timeline-sempahores branch 2 times, most recently from 1808fa2 to 5d09625 (August 21, 2025 15:25)
rjodinchr force-pushed the pr/timeline-sempahores branch from 5d09625 to 8584500 (August 29, 2025 06:24)
rjodinchr force-pushed the pr/timeline-sempahores branch from 8584500 to 67f7102 (September 9, 2025 15:58)
