
Commit 200e7de

Make latch-kernel helper compile only once
The latch-kernel helper test recently started failing. It seems to have passed before I updated from CUDA 13.2 to 13.3, though I am not certain the update is the trigger. The helper is not failing because it lacks thread safety. Instead, something (presumably module loading/unloading) triggers synchronizations, and those force threads to wait for their LatchKernel to finish. The test depends on that never happening. Compiling and loading the latch kernel exactly once per device appears to avoid the problem.

Signed-off-by: Sebastian Berg <sebastianb@nvidia.com>
1 parent a9ef88d commit 200e7de

1 file changed

Lines changed: 25 additions & 6 deletions


cuda_core/tests/helpers/latch.py

```diff
@@ -1,7 +1,8 @@
-# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
 
 import ctypes
+import threading
 
 import pytest
 
```
```diff
@@ -20,9 +21,15 @@ class LatchKernel:
     Manages a kernel that blocks stream progress until released.
     """
 
-    def __init__(self, device, timeout_sec=60):
-        if helpers.CUDA_INCLUDE_PATH is None:
-            pytest.skip("need CUDA header")
+    _latch_kernel_lock = threading.Lock()
+    _latch_kernels = {}
+
+    @classmethod
+    def _get_kernel(cls, device):
+        kernel = cls._latch_kernels.get(device.uuid)
+        if kernel is not None:
+            return kernel
+
         code = """
         #include <cuda/atomic>
 
```
```diff
@@ -41,6 +48,7 @@ def __init__(self, device, timeout_sec=60):
 
             // Check for timeout
             if (clock64() - start >= timeout_cycles) {
+                signal.store(-1, cuda::memory_order_relaxed);
                 break; // Timeout reached
             }
 
```
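The added `signal.store(-1, ...)` lets the kernel leave a record that it gave up on timeout instead of being released. The full kernel source is not part of this diff, so the following is only a host-side analogy in plain Python threads (no CUDA). It assumes the flag starts at 0 and that the releasing side writes a nonzero value (here 1) to release the latch; `Flag`, `spin_latch`, and the polling interval are hypothetical names and choices, not part of the helper.

```python
import threading
import time


class Flag:
    """Hypothetical stand-in for the pinned int32 the kernel polls."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self.value

    def store(self, v):
        with self._lock:
            self.value = v


def spin_latch(flag, timeout_sec):
    """Busy-wait until the flag becomes nonzero (assumed release) or time runs out.

    On timeout, write -1 so the other side can tell the latch gave up,
    mirroring the signal.store(-1, ...) line added to the kernel.
    """
    start = time.monotonic()
    while flag.load() == 0:
        if time.monotonic() - start >= timeout_sec:
            flag.store(-1)  # record that the latch timed out
            break
        time.sleep(0.001)  # polling interval (arbitrary choice for the sketch)


# Case 1: released in time, so the flag keeps the release value.
released = Flag()
t = threading.Thread(target=spin_latch, args=(released, 5.0))
t.start()
released.store(1)
t.join()
print(released.value)  # 1

# Case 2: never released, so the latch times out and records -1.
abandoned = Flag()
spin_latch(abandoned, 0.05)
print(abandoned.value)  # -1
```

With the sentinel in place, a caller can distinguish "released normally" from "timed out" by reading the flag after the stream finishes.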
```diff
@@ -56,14 +64,25 @@ def __init__(self, device, timeout_sec=60):
         )
         prog = Program(code, code_type="c++", options=program_options)
         mod = prog.compile(target_type="cubin")
-        self.kernel = mod.get_kernel("latch")
+        kernel = mod.get_kernel("latch")
+
+        return cls._latch_kernels.setdefault(device.uuid, kernel)
+
+    def __init__(self, device, timeout_sec=60):
+        if helpers.CUDA_INCLUDE_PATH is None:
+            pytest.skip("need CUDA header")
+
+        with self._latch_kernel_lock:
+            self.kernel = self._get_kernel(device)
 
         mr = LegacyPinnedMemoryResource()
         self.buffer = mr.allocate(4)
-        self.busy_wait_flag[0] = 0
+        self.busy_wait_flag[0] = 1
         clock_rate_hz = device.properties.clock_rate * 1000
         self.timeout_cycles = int(timeout_sec * clock_rate_hz)
 
+        self.busy_wait_flag[0] = 0
+
     def launch(self, stream):
         """Launch the latch kernel, blocking stream progress via busy waiting."""
         config = LaunchConfig(grid=1, block=1)
```
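The compile-once change boils down to a lock-guarded cache keyed by device. Below is a minimal pure-Python sketch of that pattern with no CUDA involved. `KernelCache`, `compile_fn`, and the `"gpu-0"`/`"gpu-1"` keys are hypothetical stand-ins for `LatchKernel`, the `Program(...).compile(...)` plus `get_kernel("latch")` calls, and `device.uuid`. One difference from the diff: the sketch takes the lock inside `get` rather than in `__init__`, which has the same effect here.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class KernelCache:
    """Hypothetical compile-once, per-device cache modeled on LatchKernel._get_kernel."""

    _lock = threading.Lock()
    _kernels = {}
    compile_count = 0

    @classmethod
    def compile_fn(cls, device_uuid):
        # Hypothetical stand-in for compiling and loading the latch module.
        cls.compile_count += 1
        return object()  # placeholder for a loaded kernel handle

    @classmethod
    def get(cls, device_uuid):
        # Lookup, compile, and insert all run under the lock, so two threads
        # cannot both miss the cache and compile (and load) a second module.
        with cls._lock:
            kernel = cls._kernels.get(device_uuid)
            if kernel is not None:
                return kernel
            kernel = cls.compile_fn(device_uuid)
            # setdefault returns the already-stored value if the key exists.
            return cls._kernels.setdefault(device_uuid, kernel)


with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(KernelCache.get, ["gpu-0"] * 32 + ["gpu-1"] * 32))

print(KernelCache.compile_count)  # 2: one compile per device
print(len({id(k) for k in results}))  # 2 distinct kernel objects
```

Because the whole lookup-or-compile sequence holds the lock, `setdefault` acts as a plain insert in this pattern; it would only matter if some caller reached the cache without taking the lock.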
