3 changes: 3 additions & 0 deletions .sublime-project
@@ -13,5 +13,8 @@
"format_on_save": true,
"style": "file"
},
"python-black": {
"format_on_save": "on",
},
Comment on lines +16 to +18

⚠️ Potential issue | 🟡 Minor

Trailing comma produces invalid JSON.

Line 18 has a trailing comma after "on", which is not valid in strict JSON. Some editors tolerate this, but strict parsers will reject the file.
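
💡 Sketch: a quick way to confirm the failure mode, using Python's standard json module (a strict JSON parser; the snippet is illustrative, not part of the project):

import json

json.loads('{ "format_on_save": "on" }')        # parses fine
try:
    json.loads('{ "format_on_save": "on", }')   # same object with the trailing comma
except json.JSONDecodeError as e:
    print("rejected:", e)                       # strict JSON treats the comma as a syntax error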

Proposed fix
 		"python-black": {
-	      "format_on_save": "on",
-	    },
+	      "format_on_save": "on"
+	    }
🤖 Prompt for AI Agents
In @.sublime-project around lines 16 - 18, the JSON for the "python-black"
object contains a trailing comma after the "format_on_save" value, which makes
the file invalid; remove the trailing comma after "format_on_save": "on" (i.e.,
ensure the "python-black" object ends with a closing brace and no trailing
comma) so the file becomes strictly JSON-parsable while keeping the
"python-black" and "format_on_save" keys intact.

}
}
3 changes: 0 additions & 3 deletions mkn.yaml
@@ -11,21 +11,18 @@ profile:
- name: rocm
parent: headers
arg: -DMKN_GPU_ROCM=1
# main: tst/any/managed.cpp
test: tst/any/(\w).cpp
tst/hip/(\w).cpp

- name: cuda
parent: headers
arg: -DMKN_GPU_CUDA=1
# main: tst/any/managed.cpp
test: tst/any/(\w).cpp
tst/cuda/(\w).cpp

# if you have no GPU but want to test your code
- name: cpu
parent: headers
# main: tst/any/class1.cpp
arg: -DMKN_GPU_CPU=1
test: tst/any/(\w).cpp
tst/cpu/(\w).cpp
34 changes: 34 additions & 0 deletions res/poc/jit/eg.py
@@ -0,0 +1,34 @@
import numpy as np
from numba import config

config.CUDA_ENABLE_PYNVJITLINK = 1

import warnings
from numba import cuda
from numba.core.errors import NumbaPerformanceWarning

warnings.simplefilter("ignore", category=NumbaPerformanceWarning)

import poc_pyb

N = 32 # or warpsize


@cuda.jit
def vadd(i, a, b, c):
c[i] = a[i] + b[i]
Comment on lines +17 to +19

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

Numba CUDA @cuda.jit device=True device function kernel calling conventions

💡 Result:

In Numba CUDA, a @cuda.jit(device=True) function is a CUDA device function (roughly analogous to __device__ in CUDA C/C++): it can only be called from GPU code (a kernel or another device function), and it is invoked with a normal Python call expression inside the kernel (not with the kernel[grid, block](...) launch syntax). Device functions may return a value; kernels may not. [1][2]

Defining + calling a device function (the “calling convention”)

from numba import cuda

@cuda.jit(device=True)
def bar(a, b):
    return a + b

@cuda.jit
def kern(x, y, out):
    i = cuda.grid(1)
    out[i] = bar(x[i], y[i])   # normal call (device-side)

This is the core convention: call like a regular function from within GPU code. [1]

Signatures / specialization

  • You can supply an explicit signature for a device function (and kernel), e.g. 'int32(int32, int32)', but you don't have to: Numba can specialize based on argument types (and caches compiled variants). An example of a device signature is shown in Numba's CUDA JIT docs [3]; a short sketch follows after the citations below.
  • Kernel arguments support device arrays (CUDA Array Interface), NumPy arrays (with host↔device copying caveats), scalars, and records; device functions use the same underlying type system when called from kernels. [2]

Inlining / call overhead controls

  • @cuda.jit exposes inlining controls (inline="always" at Numba IR level, forceinline=True at NVVM IR level). [2]
  • Practically, many small device helpers get inlined by compilation/optimization anyway; Numba community guidance often treats device-function calls as effectively “inlined into the caller” from a performance perspective. [4]

What you cannot do (common confusion)

  • No dynamic parallelism: you cannot launch a kernel from inside a kernel in Numba (i.e., no device-side kernel launches). [5][6]

Also: calling non-cuda.jit(device=True) helpers

Numba can also compile a regular @numba.jit/@njit helper for the GPU when it’s called from a CUDA kernel (subject to CUDA-Python feature restrictions). [7]

Sources: Numba CUDA “Writing Device Functions” and kernel API docs, plus related Numba docs/discussions. [1][2][4][5][6][7]

[1] (nvidia.github.io)
[2] (nvidia.github.io)
[3] (numba.pydata.org)
[4] (numba.discourse.group)
[5] (nvidia.github.io)
[6] (numba.discourse.group)
[7] (developer.nvidia.com)
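
💡 Sketch: an explicitly-signed device function, as described in the signatures bullet above (a minimal example assuming numba-cuda is installed; the names are illustrative):

from numba import cuda

@cuda.jit("int32(int32, int32)", device=True)   # explicit signature: returns int32
def add2(a, b):
    return a + b

@cuda.jit
def kern(x, y, out):
    i = cuda.grid(1)
    if i < out.size:
        out[i] = add2(x[i], y[i])               # normal device-side call, checked against the signature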


🏁 Script executed:

cat test/poc/jit/eg.py

Repository: mkn/mkn.gpu

Length of output: 606


vadd must be a device function to be callable from vector_add_gpu.

@cuda.jit without arguments declares a kernel, which cannot be called from another kernel in Numba CUDA. Since vadd is invoked on line 26 from within vector_add_gpu, it must be marked as a device function.

Fix
-@cuda.jit
+@cuda.jit(device=True)
 def vadd(i, a, b, c):
     c[i] = a[i] + b[i]
🤖 Prompt for AI Agents
In `@res/poc/jit/eg.py` around lines 17 - 19, the vadd function is currently
decorated as a kernel (`@cuda.jit`) but is called from inside vector_add_gpu, so
change its decorator to `@cuda.jit(device=True)` to make vadd a device function
that can be invoked from the kernel; keep the function name vadd and its
parameter list unchanged and ensure vector_add_gpu continues to call vadd from
within the kernel.



@cuda.jit
def vector_add_gpu(a, b, c):
vadd(cuda.threadIdx.x, a, b, c)


s = poc_pyb.FunctionSupport()
s.print()
a, b, c = s.A(), s.B(), s.C()
print(c)
vector_add_gpu[1, N](a, b, c)
print(c)
a += 11
Comment on lines +31 to +33

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n res/poc/jit/eg.py

Repository: mkn/mkn.gpu

Length of output: 844


🏁 Script executed:

find . -name "*.py" -o -name "*.pyx" | xargs grep -l "class FunctionSupport\|def FunctionSupport"

Repository: mkn/mkn.gpu

Length of output: 39


🏁 Script executed:

find . -name "*poc*" -type f | head -20

Repository: mkn/mkn.gpu

Length of output: 37


🏁 Script executed:

rg "class FunctionSupport|def FunctionSupport" --type py

Repository: mkn/mkn.gpu

Length of output: 37


🏁 Script executed:

find . -name "poc_pyb*" -o -name "*poc*" | grep -v "__pycache__"

Repository: mkn/mkn.gpu

Length of output: 64


🏁 Script executed:

ls -la res/poc/

Repository: mkn/mkn.gpu

Length of output: 215


🏁 Script executed:

ls -la res/poc/jit/

Repository: mkn/mkn.gpu

Length of output: 405


🏁 Script executed:

cat -n res/poc/jit/pybind.cpp

Repository: mkn/mkn.gpu

Length of output: 2359


🏁 Script executed:

rg "mkn::gpu::alloc_managed" --type cpp -A 2 -B 2

Repository: mkn/mkn.gpu

Length of output: 399


🏁 Script executed:

web_search "Numba CUDA kernel launch synchronous asynchronous cuda.synchronize"

Repository: mkn/mkn.gpu

Length of output: 105


🌐 Web query:

Numba CUDA kernel launch asynchronous synchronize behavior

💡 Result:

  • Kernel launches are asynchronous w.r.t. the host in Numba CUDA: a launch enqueues work on the GPU and returns immediately; use numba.cuda.synchronize() to wait for previously-launched work to finish. [1]
  • Why launches sometimes look synchronous: if you pass a host (NumPy) array to a kernel, Numba may do implicit host↔device transfers, and the launch can block because the copy-back to host is synchronous, making execution appear synchronous. [1]
  • Stream-specific waiting: to wait for just one stream, use that stream's stream.synchronize() (or with stream.auto_synchronize(): ...). Transfers/operations issued into a specified stream can be asynchronous; a sketch follows after the citations below. [2]
  • Implicit synchronization when consuming __cuda_array_interface__ inputs: when Numba is a consumer (e.g., you pass an external CUDA-array-like object to a kernel), if the interface provides stream as an integer, Numba will immediately synchronize on that stream for correctness (configurable via NUMBA_CUDA_ARRAY_INTERFACE_SYNC / CUDA_ARRAY_INTERFACE_SYNC). [3]
  • Errors often surface only at synchronization: in CUDA generally, some launch/runtime errors may not be reported until you synchronize a stream/device (or wait on an event). [4]

Sources: Numba “Writing CUDA Kernels / kernel invocation” docs [1], Numba CUDA transfer/stream docs [2], Numba CUDA Array Interface sync semantics [3], NVIDIA CUDA Programming Guide (Asynchronous Execution) [4].

[1] (numba.readthedocs.io)
[2] (numba.pydata.org)
[3] (numba.readthedocs.io)
[4] (docs.nvidia.com)
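
💡 Sketch: the stream-scoped waiting described in the bullets above (hypothetical arrays and sizes; assumes numba-cuda is installed and a kernel like vector_add_gpu exists):

import numpy as np
from numba import cuda

stream = cuda.stream()
d_a = cuda.to_device(np.ones(32, dtype=np.float32), stream=stream)
d_b = cuda.to_device(np.ones(32, dtype=np.float32), stream=stream)
d_c = cuda.device_array(32, dtype=np.float32, stream=stream)

with stream.auto_synchronize():                 # synchronizes this stream on exit
    vector_add_gpu[1, 32, stream](d_a, d_b, d_c)

print(d_c.copy_to_host())                       # safe: the stream's work is done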


Add an explicit cuda.synchronize() after the kernel launch, before accessing arrays on the host.

Numba CUDA kernel launches are asynchronous—the launch enqueues work on the GPU and returns immediately to the host. Without synchronization, print(c) and a += 11 execute while vector_add_gpu may still be running on the device. Although passing regular NumPy arrays may trigger implicit synchronization, explicit synchronization is the safe, documented approach.

Minimal fix
 vector_add_gpu[1, N](a, b, c)
+cuda.synchronize()
 print(c)
 a += 11
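
💡 Sketch: the same launch with explicit device arrays, so correctness does not hinge on implicit copy-back synchronization (a sketch; reuses N and vector_add_gpu from this file):

d_a = cuda.to_device(a)
d_b = cuda.to_device(b)
d_c = cuda.to_device(c)

vector_add_gpu[1, N](d_a, d_b, d_c)
cuda.synchronize()               # wait for the kernel before any host-side access
print(d_c.copy_to_host())        # safe: GPU work has finished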
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@res/poc/jit/eg.py` around lines 31 - 33, the kernel launch
vector_add_gpu[1, N](a, b, c) is asynchronous, so add an explicit
cuda.synchronize() immediately after the launch and before any host-side access
(e.g., before print(c) and before a += 11) to ensure the GPU work has finished;
locate the vector_add_gpu launch and insert a call to cuda.synchronize() right
after it.

s.print()
15 changes: 15 additions & 0 deletions res/poc/jit/mkn.yaml
@@ -0,0 +1,15 @@
#! build -p pyb -qx ../../../res/mkn/nvcc.yaml -a "-g"

name: poc
dep: mkn.gpu

profile:
- name: pyb
inc: .
dep: lang.pybind11 mkn.gpu
mod: lang.python3 python3.module
src: pybind.cpp
install: .
env: |
MKN_LIB_LINK_LIB=1
MKN_PYTHON_LIB_EMBED=1
71 changes: 71 additions & 0 deletions res/poc/jit/pybind.cpp
@@ -0,0 +1,71 @@

#include <pybind11/stl.h>
#include <pybind11/numpy.h>
#include <pybind11/pybind11.h>
#include <pybind11/functional.h>
#include <typeinfo>
#include <iostream>

#define PRINT(x) std::cout << __FILE__ << " " << __LINE__ << " " << x << std::endl;

#include "mkn/gpu.hpp"
#include "mkn/gpu/tuple.hpp"

void init(float* a, float* b) {
mkn::gpu::DLauncher()([a = a, b = b] __device__() {
a[threadIdx.x] = threadIdx.x + 1;
b[threadIdx.x] = threadIdx.x + 2;
});
}

namespace py = pybind11;

struct FunctionSupport {
FunctionSupport() {
mkn::gpu::alloc_managed(a, 32);
mkn::gpu::alloc_managed(b, 32);
mkn::gpu::alloc_managed(c, 32);
print();
init(a, b);
print();
Comment on lines +24 to +30

⚠️ Potential issue | 🟠 Major

Don't log these buffers before this module has populated them.

Both print() calls happen before this file assigns anything to c, and the first one also happens before init(a, b) writes a / b. That makes the constructor output — and the first print(c) in res/poc/jit/eg.py — read buffers before this module has initialized them.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@res/poc/jit/pybind.cpp` around lines 24 - 30, the constructor
FunctionSupport() calls print() before the managed buffers are populated (the
first call precedes init(a, b), and both calls precede any write to c), causing
reads of uninitialized memory; move or remove the early print() calls so that
printing only happens after init(a, b) has run and after this module
assigns/initializes c (i.e., ensure print() is invoked only once buffers a, b,
and c are populated).
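
💡 Sketch: one possible ordering (a single print after everything is populated; zero-fills c so it has a defined value, and assumes the init launch completes before the host reads, so add a device sync before print() if DLauncher does not synchronize):

FunctionSupport() {
  mkn::gpu::alloc_managed(a, 32);
  mkn::gpu::alloc_managed(b, 32);
  mkn::gpu::alloc_managed(c, 32);
  init(a, b);                 // device kernel fills a and b
  std::fill_n(c, 32, 0.f);    // give c a defined value (requires <algorithm>)
  print();                    // every printed buffer is now initialized
}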

}
~FunctionSupport() {
mkn::gpu::destroy(a);
mkn::gpu::destroy(b);
mkn::gpu::destroy(c);
}

void print() {
PRINT(a[0]);
PRINT(b[0]);
PRINT(c[0]);
}

py::array_t<float> make(auto p) {
return {{32}, {sizeof(float)}, p, py::capsule(p, [](void* f) { /* noop */ })};
}
Comment on lines +44 to +46

⚠️ Potential issue | 🟠 Major

Dangling pointer risk: the no-op capsule lets Python arrays outlive the backing memory.

The make method wraps a raw CUDA-managed pointer into a py::array_t with a no-op destructor capsule. If Python code holds a reference to the returned array after the FunctionSupport object is destroyed, accessing the array will hit freed memory (use-after-free). In a PoC this may be fine, but for anything beyond that, tie the array's lifetime to the owner of the backing memory (e.g., have the capsule hold a reference to the parent object, set the parent as the array's base, or copy the data into a NumPy-owned array).

One approach: pass a reference to the parent py::object as the array's base so NumPy keeps FunctionSupport alive.

💡 Sketch: tie array lifetime to parent
- py::array_t<float> make(auto p) {
-   return {{32}, {sizeof(float)}, p, py::capsule(p, [](void* f) { /* noop */ })};
+ py::array_t<float> make(auto p, py::object self) {
+   // Use `self` as the base object so NumPy keeps FunctionSupport alive
+   return py::array_t<float>({32}, {sizeof(float)}, p, self);
  }

Then pass py::object from each accessor:

- py::array_t<float> A() { return make(a); }
+ py::array_t<float> A(py::object self) { return make(a, self); }
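
💡 Sketch: an alternative that leaves make() untouched, using pybind11's keep_alive call policy. keep_alive<0, 1> ties the returned array (index 0) to the implicit this argument (index 1); it relies on the return value being weak-referenceable, which NumPy arrays are. This keeps FunctionSupport alive while any array exists, though it does not guard against the pointers being freed some other way.

.def("A", &FunctionSupport::A, py::keep_alive<0, 1>())
.def("B", &FunctionSupport::B, py::keep_alive<0, 1>())
.def("C", &FunctionSupport::C, py::keep_alive<0, 1>())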
🤖 Prompt for AI Agents
In `@res/poc/jit/pybind.cpp` around lines 44 - 46, make(auto p) creates a
py::array_t<float> with a no-op capsule, risking use-after-free if the returned
array outlives the FunctionSupport that owns the CUDA-managed pointer; change
the capsule to hold (and incref) a reference to the parent Python object, or
pass the parent as the array base, so NumPy keeps the parent alive until the
array is freed. Ensure the accessors that call make provide the parent
py::object so the returned py::array_t<float> extends the parent's lifetime.


py::array_t<float> A() { return make(a); }
py::array_t<float> B() { return make(b); }
py::array_t<float> C() { return make(c); }

float* a = nullptr;
float* b = nullptr;
float* c = nullptr;
};

PYBIND11_MODULE(poc_pyb, m) {
py::class_<FunctionSupport, py::smart_holder>(m, "FunctionSupport")

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check pybind11 version references in configuration files
echo "=== Pybind11 version references ==="
rg -rn 'pybind11' --glob='*.yaml' --glob='*.yml' --glob='*.txt' --glob='*.cmake' --glob='CMakeLists.txt' -C2

Repository: mkn/mkn.gpu

Length of output: 296


🏁 Script executed:

#!/bin/bash
# Check CMakeLists.txt for pybind11 version specifications
echo "=== CMakeLists.txt content (if present) ==="
find . -name 'CMakeLists.txt' -type f -exec head -50 {} + 2>/dev/null

# Check setup.py or pyproject.toml for dependencies
echo -e "\n=== Setup/config files ==="
find . -maxdepth 3 -name 'setup.py' -o -name 'pyproject.toml' -o -name 'requirements.txt' | head -10

Repository: mkn/mkn.gpu

Length of output: 126


🏁 Script executed:

#!/bin/bash
# Check smart_holder usage in the codebase
echo "=== Smart_holder usage ==="
rg -rn 'smart_holder' -C3

Repository: mkn/mkn.gpu

Length of output: 813


🏁 Script executed:

#!/bin/bash
# Check the specific file context around lines 58 and 69
echo "=== Context around pybind.cpp lines 58 and 69 ==="
cat -n test/poc/jit/pybind.cpp | sed -n '50,80p'

Repository: mkn/mkn.gpu

Length of output: 977


🏁 Script executed:

#!/bin/bash
# Read the full mkn.yaml file to see pybind11/dependency specifications
echo "=== Full mkn.yaml content ==="
find . -name '*.yaml' -o -name '*.yml' | xargs cat 2>/dev/null | head -100

Repository: mkn/mkn.gpu

Length of output: 2280


🏁 Script executed:

#!/bin/bash
# Search for any pybind11 version specifications across all files
echo "=== Pybind11 version specifications ==="
rg -rn 'pybind11.*2\.' 
rg -rn 'pybind11.*version'

Repository: mkn/mkn.gpu

Length of output: 94


🏁 Script executed:

#!/bin/bash
# Look for dependency resolution in mkn related files
echo "=== MKN related config files ==="
find . -maxdepth 3 -type f \( -name 'mkn*' -o -name '*.mkn' \) | head -20

Repository: mkn/mkn.gpu

Length of output: 98


🏁 Script executed:

#!/bin/bash
# Read the mkn.yaml in test/poc/jit directory
echo "=== test/poc/jit/mkn.yaml ==="
cat test/poc/jit/mkn.yaml

Repository: mkn/mkn.gpu

Length of output: 348


🏁 Script executed:

#!/bin/bash
# Search for any pybind11 include or dependency statements
echo "=== Pybind11 includes in codebase ==="
rg -rn '#include.*pybind11'

# Check if there are any header files or submodule references
echo -e "\n=== Git submodules ==="
cat .gitmodules 2>/dev/null || echo "No submodules file found"

Repository: mkn/mkn.gpu

Length of output: 278


🏁 Script executed:

#!/bin/bash
# Look for any CI/build documentation that might specify pybind11 version
echo "=== README or documentation files ==="
find . -maxdepth 2 -type f \( -name 'README*' -o -name 'CONTRIBUTING*' \) -exec head -50 {} + 2>/dev/null

# Check for any version pins in GitHub workflows
echo -e "\n=== Full workflow file ==="
find . -path './.github/workflows/*.yml' -o -path './.github/workflows/*.yaml' 2>/dev/null | xargs cat 2>/dev/null

Repository: mkn/mkn.gpu

Length of output: 2389


Ensure pybind11 version >= 2.13 is explicitly specified or verified.

The code uses py::smart_holder at lines 58 and 69, which requires pybind11 v2.13 or later. The mkn.yaml dependency specifies lang.pybind11 without explicit version pinning. Verify that mkn's lang.pybind11 module provides v2.13+, or add an explicit version constraint to the mkn.yaml configuration to guarantee compatibility.
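
💡 Sketch: a build-time guard along the lines the prompt below suggests (PYBIND11_VERSION_HEX encodes the version as 0xMMmmPP..; 2.13.0 is 0x020D0000):

#include <pybind11/pybind11.h>

#if !defined(PYBIND11_VERSION_HEX) || PYBIND11_VERSION_HEX < 0x020D0000
#error "py::smart_holder requires pybind11 >= 2.13"
#endif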

🤖 Prompt for AI Agents
In `@res/poc/jit/pybind.cpp` at line 58, the binding uses py::smart_holder in
py::class_<FunctionSupport, py::smart_holder> (and in the other class_
instantiation), which requires pybind11 v2.13+; either pin that version in the
build config (add an explicit constraint for lang.pybind11 in mkn.yaml, such as
>=2.13) or add a build-time verification (check PYBIND11_VERSION_MAJOR/MINOR or
equivalent) that fails with a clear message if the installed pybind11 is older;
update mkn.yaml to explicitly request pybind11 >=2.13 and/or add the version
check near the module initialization that defines FunctionSupport to guarantee
compatibility.

.def(py::init<>())
.def("print", &FunctionSupport::print)
.def_readwrite("a", &FunctionSupport::a)
.def("A", &FunctionSupport::A)
.def_readwrite("b", &FunctionSupport::b)
.def("B", &FunctionSupport::B)
.def_readwrite("a", &FunctionSupport::a)
.def("C", &FunctionSupport::C);
Comment on lines +57 to +66

⚠️ Potential issue | 🔴 Critical

Duplicate binding: Line 65 binds "a" again instead of "c".

Line 61 already binds def_readwrite("a", &FunctionSupport::a). Line 65 repeats the exact same binding. This almost certainly should be "c" to expose the c pointer, matching the pattern for a and b.

🐛 Proposed fix
       .def_readwrite("b", &FunctionSupport::b)
       .def("B", &FunctionSupport::B)
-      .def_readwrite("a", &FunctionSupport::a)
+      .def_readwrite("c", &FunctionSupport::c)
       .def("C", &FunctionSupport::C);
🤖 Prompt for AI Agents
In `@res/poc/jit/pybind.cpp` around lines 57 - 66, the PYBIND11_MODULE binding
for class FunctionSupport contains a duplicate def_readwrite("a",
&FunctionSupport::a) where the third member should be exposed; change the
duplicate binding to def_readwrite("c", &FunctionSupport::c) so the class
exposes members a, b, and c correctly, and keep the other method bindings
(print, A, B, C) unchanged.


using Span_t = mkn::gpu::Span<float>;
py::class_<Span_t, py::smart_holder>(m, "Span_s")
.def("__getitem__", [](Span_t& self, unsigned index) { return self[index]; });
}
2 changes: 2 additions & 0 deletions res/poc/jit/requirements.txt
@@ -0,0 +1,2 @@
numpy
numba-cuda
2 changes: 2 additions & 0 deletions tst/any/async_streaming.cpp
@@ -7,6 +7,8 @@

#include <algorithm>

#include <algorithm>

Comment on lines +10 to +11

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n test/any/async_streaming.cpp | head -30

Repository: mkn/mkn.gpu

Length of output: 989


🏁 Script executed:

rg -n 'std::this_thread' test/any/async_streaming.cpp

Repository: mkn/mkn.gpu

Length of output: 292


🏁 Script executed:

rg -n '#include' test/any/async_streaming.cpp | head -20

Repository: mkn/mkn.gpu

Length of output: 199


Add an explicit #include <thread> header.

The file uses std::this_thread::sleep_for at lines 38, 70, 103, 138, and 173 but does not directly include <thread>; the lines added here duplicate the existing #include <algorithm> instead. Relying on transitive inclusion from one of the project headers is fragile and non-portable across compilers and standard library implementations.

Proposed fix

-#include <algorithm>
+#include <thread>
🤖 Prompt for AI Agents
In `@tst/any/async_streaming.cpp` around lines 8 - 9, the test uses
std::this_thread::sleep_for (at several points in the async streaming tests)
but doesn't include <thread>, relying on transitive includes; add an explicit
#include <thread> at the top of async_streaming.cpp so
std::this_thread::sleep_for is available portably and no longer depends on
indirect headers.

using namespace mkn::gpu;
using namespace std::chrono_literals;
