Use VK_KHR_timeline_semaphore to reduce cvk_command_batch submit latency by rjodinchr · Pull Request #808 · kpet/clvk

rjodinchr · 2025-08-21T09:18:35Z

Use a cvk_semaphore instead of a std::condition_variable when VK_KHR_timeline_semaphore is supported.

cvk_event holds a cvk_condition_variable, which is either a cvk_std_condition_variable (using std::condition_variable) or a cvk_semaphore_condition_variable (using cvk_semaphore).
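
To illustrate the split described above, here is a minimal sketch of one interface with two backends. All names in it (`event_cv`, `std_event_cv`, `timeline_event_cv`, `fake_timeline`, `make_event_cv`) are hypothetical and are not clvk's actual classes. The semaphore backend is simulated with an atomic counter rather than a real Vulkan timeline semaphore, purely so the example stays self-contained.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstdint>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical interface; clvk's actual cvk_condition_variable may differ.
struct event_cv {
    virtual ~event_cv() = default;
    virtual void notify() = 0; // mark the event as complete
    virtual void wait() = 0;   // block until notify() has happened
};

// Backend used when timeline semaphores are unavailable.
class std_event_cv : public event_cv {
public:
    void notify() override {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_done = true;
        }
        m_cv.notify_all();
    }
    void wait() override {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this] { return m_done; });
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    bool m_done = false;
};

// Stand-in for a timeline semaphore: a counter that only ever increases.
struct fake_timeline {
    std::atomic<uint64_t> counter{0};
};

// Backend used when timeline semaphores are available: the event is complete
// once the shared counter has reached the value assigned to the event.
class timeline_event_cv : public event_cv {
public:
    timeline_event_cv(std::shared_ptr<fake_timeline> timeline, uint64_t value)
        : m_timeline(std::move(timeline)), m_value(value) {}

    void notify() override {
        uint64_t current = m_timeline->counter.load();
        // Raise the counter to m_value, never lowering it.
        while (current < m_value &&
               !m_timeline->counter.compare_exchange_weak(current, m_value)) {
        }
    }
    void wait() override {
        while (m_timeline->counter.load() < m_value) {
            std::this_thread::yield();
        }
    }

private:
    std::shared_ptr<fake_timeline> m_timeline;
    uint64_t m_value;
};

// Pick the backend once, at event creation time.
std::unique_ptr<event_cv> make_event_cv(bool timeline_supported,
                                        std::shared_ptr<fake_timeline> timeline,
                                        uint64_t value) {
    if (timeline_supported) {
        return std::make_unique<timeline_event_cv>(std::move(timeline), value);
    }
    return std::make_unique<std_event_cv>();
}
```

Callers only see `notify()` and `wait()`, so the rest of the event code does not need to know which backend was selected.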

When an event is created, it gets a cvk_semaphore and a value. This assumes that nothing else is created between the creation of the event and its submission to the queue, so that values are allocated in order.

Because this assumption does not always hold, we enforce it by using 3 timelines, for each of which we can ensure that nothing else is created between the creation of an event and its submission to the queue.
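
A rough sketch of why in-order allocation matters: a timeline semaphore's counter may only be signaled with strictly increasing values, so an event whose value is signaled out of creation order breaks the scheme. The `timeline_pool`, `event_ticket` and `timeline_slot` names below are hypothetical, and the timeline index is only a stand-in, since the description does not say what the 3 timelines correspond to.

```cpp
#include <array>
#include <cstdint>

// Per-timeline bookkeeping (hypothetical, host-side model only).
struct timeline_slot {
    uint64_t next_value = 1; // next value handed out at event creation
    uint64_t signaled = 0;   // highest value signaled so far
};

// What an event would remember: which timeline it uses and which value.
struct event_ticket {
    unsigned timeline;
    uint64_t value;
};

class timeline_pool {
public:
    static constexpr unsigned kNumTimelines = 3;

    // Called at event creation: values are handed out in creation order.
    event_ticket allocate(unsigned timeline) {
        timeline_slot& slot = m_slots.at(timeline);
        return {timeline, slot.next_value++};
    }

    // Called when the event's work completes. Returns false if the signal
    // would not strictly increase the counter, which a timeline forbids.
    bool signal(const event_ticket& ticket) {
        timeline_slot& slot = m_slots.at(ticket.timeline);
        if (ticket.value <= slot.signaled) {
            return false;
        }
        slot.signaled = ticket.value;
        return true;
    }

    bool is_complete(const event_ticket& ticket) const {
        return m_slots.at(ticket.timeline).signaled >= ticket.value;
    }

private:
    std::array<timeline_slot, kNumTimelines> m_slots;
};
```

In this model, signaling a later ticket first makes the earlier one look complete before its work has run, which is the failure mode that keeping creation and submission order aligned on each timeline avoids.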

Add notify, wait, poll and poll_once implementations to cvk_semaphore.
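
For readers unfamiliar with these four operations, here is a host-only model of what they might look like. It is an assumption-laden sketch, not the code from this PR: `sim_semaphore` is a hypothetical class, the exact semantics of `poll` versus `poll_once` are guessed from their names, and the counter is a plain atomic. With a real timeline semaphore, the equivalent operations would presumably go through `vkSignalSemaphore`, `vkWaitSemaphores` and `vkGetSemaphoreCounterValue` (or their `KHR`-suffixed variants from the extension).

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical host-side model of a timeline semaphore.
class sim_semaphore {
public:
    // Raise the counter to `value`; the counter never decreases.
    void notify(uint64_t value) {
        uint64_t current = m_counter.load(std::memory_order_relaxed);
        while (current < value &&
               !m_counter.compare_exchange_weak(current, value,
                                                std::memory_order_release,
                                                std::memory_order_relaxed)) {
        }
    }

    // Single non-blocking check: has the counter reached `value`?
    bool poll_once(uint64_t value) const {
        return m_counter.load(std::memory_order_acquire) >= value;
    }

    // Busy-poll until the counter reaches `value`.
    void poll(uint64_t value) const {
        while (!poll_once(value)) {
            std::this_thread::yield();
        }
    }

    // Wait with a timeout. A real implementation would block in the driver;
    // this model sleeps between checks. Returns false on timeout.
    bool wait(uint64_t value, std::chrono::nanoseconds timeout) const {
        const auto deadline = std::chrono::steady_clock::now() + timeout;
        while (!poll_once(value)) {
            if (std::chrono::steady_clock::now() >= deadline) {
                return false;
            }
            std::this_thread::sleep_for(std::chrono::microseconds(50));
        }
        return true;
    }

private:
    std::atomic<uint64_t> m_counter{0};
};
```

Polling trades CPU time for lower wake-up latency, while waiting lets the thread sleep until the counter advances.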

When cvk_command_queue::end_current_command_batch is called, if timeline semaphores can be used and no synchronous command has been submitted, submit the batch immediately.
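
The early-submit rule above can be sketched with a toy queue. Everything here (`toy_queue`, its members, the string commands) is hypothetical and far simpler than cvk_command_queue. In particular, when the real "synchronous command" flag is set or reset is not stated in this description, so the toy exposes it as an explicit call and never clears it.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy model of the early-submit decision; not clvk's real queue.
class toy_queue {
public:
    explicit toy_queue(bool timeline_supported)
        : m_timeline_supported(timeline_supported) {}

    void enqueue(std::string command) { m_current.push_back(std::move(command)); }

    // Assumption: some commands are "synchronous" and disable early submit.
    void mark_synchronous_command() { m_sync_seen = true; }

    // Close the current batch. Submit right away when timeline semaphores
    // are usable and no synchronous command was seen; otherwise defer.
    void end_current_command_batch() {
        if (m_current.empty()) {
            return;
        }
        if (m_timeline_supported && !m_sync_seen) {
            m_submitted.push_back(std::move(m_current));
        } else {
            m_deferred.push_back(std::move(m_current));
        }
        m_current.clear(); // leave the moved-from vector in a known state
    }

    // Deferred batches are only submitted on an explicit flush.
    void flush() {
        for (auto& batch : m_deferred) {
            m_submitted.push_back(std::move(batch));
        }
        m_deferred.clear();
    }

    std::size_t submitted_count() const { return m_submitted.size(); }
    std::size_t deferred_count() const { return m_deferred.size(); }

private:
    bool m_timeline_supported;
    bool m_sync_seen = false;
    std::vector<std::string> m_current;
    std::vector<std::vector<std::string>> m_submitted;
    std::vector<std::vector<std::string>> m_deferred;
};
```

In the toy, a batch closed on the timeline path is counted as submitted immediately, while the fallback path keeps it pending until `flush()`.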

Add a config option to poll the timeline semaphore instead of waiting on it. The option differentiates between the main thread and the executors. This can be useful for investigating performance issues or driver bugs.
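
As a sketch of how a per-thread-role choice between polling and waiting could be expressed: the names below (`semaphore_poll_config`, `thread_role`, `wait_mode`, `select_wait_mode`) are hypothetical, the actual option names are not given in this description, and the assumption that waiting is the default is mine.

```cpp
// Which kind of thread is about to wait on the semaphore.
enum class thread_role { main_thread, executor };

// How that thread should wait.
enum class wait_mode { blocking_wait, poll };

// Hypothetical settings; defaulting to blocking waits is an assumption.
struct semaphore_poll_config {
    bool poll_on_main_thread = false;
    bool poll_on_executors = false;
};

// Pick the wait mode independently for the main thread and the executors.
inline wait_mode select_wait_mode(const semaphore_poll_config& config,
                                  thread_role role) {
    const bool poll = (role == thread_role::main_thread)
                          ? config.poll_on_main_thread
                          : config.poll_on_executors;
    return poll ? wait_mode::poll : wait_mode::blocking_wait;
}
```

Keeping the two roles separate makes it possible to force polling on only one side when trying to narrow down where latency or a driver issue comes from.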

oscarbg · 2025-08-21T13:29:12Z

curious if it brings enhancements to clpeak "Kernel launch latency" score (--kernel-latency)..

rjodinchr · 2025-08-21T14:28:20Z

> curious if it brings enhancements to clpeak "Kernel launch latency" score (--kernel-latency)..

I'm seeing an improvement on all the hardware I've tried so far with clpeak. Even with llvmpipe, it goes from 47 to 40 us for the --kernel-latency.

But to be honest, where we gain a lot is when we do not have a clFinish call between kernels. For workloads with lots of kernels between clFinish calls (with potential clFlush calls in between), that's where we have significant improvement.

rjodinchr force-pushed the pr/timeline-sempahores branch 2 times, most recently from c39a392 to 9f1879c (August 21, 2025 12:29)

rjodinchr force-pushed the pr/timeline-sempahores branch 2 times, most recently from 1808fa2 to 5d09625 (August 21, 2025 15:25)

rjodinchr force-pushed the pr/timeline-sempahores branch from 5d09625 to 8584500 (August 29, 2025 06:24)

rjodinchr added 4 commits September 9, 2025 17:57

Add support for timeline semaphores

f65ccd0

Try to submit batch as soon as possible to reduce latency to its minimum.

Add config option to force polling timeline semaphore state

e2393f0

fix perfetto tests

ce975a5

api_tests: add missing clReleaseEvent

67f7102

rjodinchr force-pushed the pr/timeline-sempahores branch from 8584500 to 67f7102 (September 9, 2025 15:58)

rjodinchr wants to merge 4 commits into kpet:main from rjodinchr:pr/timeline-sempahores
