recalibrate timestamps when drift is detected #818
Conversation
Because of device clock drift, we can end up with incoherent submit/start/end timestamps. When such a case is detected, recalibrate the start/end time using the device timestamps. For the start time, consider that the end time is correct and place the start time before the end time by the same offset we have observed in the device domain; make sure that submit still happens before the start time. For the end time, we can only rely on the submit time.
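The rules described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the `Timestamps` struct and `recalibrate` function are hypothetical names, and the real implementation works on translated host-domain values inside the profiler.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical sketch of the recalibration rules described above.
struct Timestamps {
    uint64_t submit_time;     // host domain
    uint64_t start_time;      // host domain (translated from device)
    uint64_t end_time;        // host domain (translated from device)
    uint64_t start_dev_time;  // device domain
    uint64_t end_dev_time;    // device domain
};

void recalibrate(Timestamps& t) {
    // Elapsed time in the device clock domain: the one quantity we trust.
    const uint64_t dev_elapsed = t.end_dev_time - t.start_dev_time;
    if (t.end_time < t.submit_time) {
        // End time drifted to before submit: only submit_time is reliable,
        // so anchor end_time on it using the device-domain elapsed time,
        // and start_time can only be set equal to submit_time.
        t.end_time = t.submit_time + dev_elapsed;
        t.start_time = t.submit_time;
    } else if (t.start_time < t.submit_time) {
        // Start time drifted to before submit: treat end_time as correct,
        // back off by the device-domain elapsed time, then clamp so that
        // submit still happens before start.
        t.start_time = std::max(t.end_time - dev_elapsed, t.submit_time);
    }
}
```

After either branch the invariant `submit_time <= start_time <= end_time` holds, which is the coherence goal of the PR.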
We might want to put that behind an option?
| "due to a drift in the device timer. Recalibrating the host " | ||
| "end time using the device timestamps.", | ||
| end_time, submit_time); | ||
| return submit_time + (end_dev_time - start_dev_time); |
This doesn't include the time between submit_time and start_dev_time. If the GPU is backed up due to a lot of queued GPU work, I think that could be a non-negligible amount of time.
We re-calibrate end_time when it is below submit_time. In that case we have no way to know how much time elapsed between submit_time and start_time. The only thing available to us is the runtime in the device domain, so we use that to place end_time not too far in the future (compared to where the translation would have put it), and then we have no choice but to set start_time equal to submit_time.
```cpp
    start_time, submit_time, end_time);
start_time = end_time - (end_dev_time - start_dev_time);
// Make sure submit time happens before start time.
return std::max(start_time, submit_time);
```
I could be wrong but I'm not sure that start_time is actually being updated above since start_time and end_time would probably be based on the same m_sync_host and m_sync_dev.
If start_time is not being updated, then we're fixing the problem by setting start_time to submit_time. If start_time is getting updated, at least some of the time we're still doing that. Needing to do it this way makes this approach seem a bit like hiding the problem rather than fixing it.
> Needing to do it this way makes this approach seem a bit like hiding the problem rather than fixing it.
The goal here is not to find a way to get the correct timestamp. The goal is to expose coherent timestamps even when the device clock has drifted.
When the clock drifts, the information this PR preserves is the elapsed time in the device clock domain, for as long as we can, on the assumption that end_time is about right.
We could try to do much better. I tried different approaches keeping a queue of (device, host) timestamp pairs, but it always kept a little bit of flakiness. I think it could work if we had a side thread taking regular data points to ensure we are not missing any drift. But even then, we need to consider how many timestamps to keep; we cannot store them forever, so you can still hit a use case where no good data point is left in the queue, leading to inaccuracy.
I think what's important as far as profiling is concerned is to do the minimum when a drift is detected to return something coherent (submit < start < end).
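For context, the queue-of-sync-points approach mentioned above could look roughly like this. This is a sketch of the idea only, under assumed names (`SyncPoint`, `SyncHistory`); the experiments described are not part of this PR:

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical sketch of the rejected approach: keep a bounded history of
// (device, host) sync points and translate device timestamps using the
// latest sync point at or before them.
struct SyncPoint {
    uint64_t dev;   // device-domain timestamp
    uint64_t host;  // host-domain timestamp
};

class SyncHistory {
public:
    explicit SyncHistory(std::size_t capacity) : m_capacity(capacity) {}

    void record(uint64_t dev, uint64_t host) {
        // Bounded storage: old points get dropped, which is exactly why a
        // translation request can end up with no good data point left.
        if (m_points.size() == m_capacity)
            m_points.pop_front();
        m_points.push_back({dev, host});
    }

    // Translate a device timestamp using the latest sync point at or before
    // it. Assumes at least one point was recorded and dev_time is not older
    // than the oldest retained point; otherwise the result is extrapolated
    // and may be inaccurate.
    uint64_t to_host(uint64_t dev_time) const {
        const SyncPoint* best = &m_points.front();
        for (const SyncPoint& p : m_points)
            if (p.dev <= dev_time) best = &p;
        return best->host + (dev_time - best->dev);
    }

private:
    std::size_t m_capacity;
    std::deque<SyncPoint> m_points;
};
```

The flakiness comes from choosing the capacity and sampling cadence: too few or too infrequent sync points and a query lands far from any recorded pair, reintroducing the drift error.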
I thought of another idea for recalibrating timestamps. We could add another sync timestamp (let's call it …). The idea would be that all the timestamps belonging to an event would key off of a single …. Don't have to do it but thought I'd share.