python jit Poc #31
base: master
Changes from all commits
Editor-settings JSON hunk (filename not shown in this excerpt):

```diff
@@ -13,5 +13,8 @@
     "format_on_save": true,
     "style": "file"
   },
+  "python-black": {
+    "format_on_save": "on",
+  },
 }
 }
```
res/poc/jit/eg.py (new file, +34 lines):

```python
import numpy as np
from numba import config

config.CUDA_ENABLE_PYNVJITLINK = 1

import warnings
from numba import cuda
from numba.core.errors import NumbaPerformanceWarning

warnings.simplefilter("ignore", category=NumbaPerformanceWarning)

import poc_pyb

N = 32  # or warpsize


@cuda.jit
def vadd(i, a, b, c):
    c[i] = a[i] + b[i]


@cuda.jit
def vector_add_gpu(a, b, c):
    vadd(cuda.threadIdx.x, a, b, c)


s = poc_pyb.FunctionSupport()
s.print()
a, b, c = s.A(), s.B(), s.C()
print(c)
vector_add_gpu[1, N](a, b, c)
print(c)
a += 11
s.print()
```

Comment on lines +17 to +19:

`vadd` is declared with plain `@cuda.jit` but is only ever called from inside `vector_add_gpu`. The documented convention for a device-callable helper is `@cuda.jit(device=True)`: device functions are called like ordinary functions from within GPU code and may return values, whereas kernels cannot return values and are normally launched from the host with a `[blocks, threads]` configuration. Recent numba-cuda releases do compile a kernel-decorated function as a device function when it is called from device code, so this works, but the explicit `device=True` form states the intent.

Comment on lines +31 to +33:

Add explicit synchronization. Numba CUDA kernel launches are asynchronous — the launch enqueues work on the GPU and returns immediately to the host — so without synchronization, `print(c)` and `a += 11` may touch the managed buffers while the kernel is still running.

Minimal fix:

```diff
 vector_add_gpu[1, N](a, b, c)
+cuda.synchronize()
 print(c)
 a += 11
```
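The asynchronous-launch pitfall the comment above describes can be sketched without a GPU. The following is a CPU-side analogy (stdlib only, not numba): submitting work to a thread pool returns immediately, just as a kernel launch enqueues work and returns to the host, and reading the output buffer before waiting on the future mirrors printing `c` before `cuda.synchronize()`.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def vadd(a, b, out):
    time.sleep(0.1)  # simulate device latency
    for i in range(len(out)):
        out[i] = a[i] + b[i]


c = [0.0] * 4
with ThreadPoolExecutor(max_workers=1) as pool:
    fut = pool.submit(vadd, [1, 2, 3, 4], [10, 20, 30, 40], c)
    snapshot = list(c)  # likely still the stale zeros: work not yet done
    fut.result()        # the "synchronize": block until the work completes
print(snapshot, c)      # snapshot is (almost certainly) stale; c is complete
```

The only ordering guarantee comes from `fut.result()`; dropping it leaves `print(c)` racing the worker, which is exactly the race the suggested `cuda.synchronize()` closes.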
mkn.yaml (new file, +15 lines):

```yaml
#! build -p pyb -qx ../../../res/mkn/nvcc.yaml -a "-g"

name: poc
dep: mkn.gpu

profile:
- name: pyb
  inc: .
  dep: lang.pybind11 mkn.gpu
  mod: lang.python3 python3.module
  src: pybind.cpp
  install: .
  env: |
    MKN_LIB_LINK_LIB=1
    MKN_PYTHON_LIB_EMBED=1
```
pybind.cpp (new file, +71 lines):

```cpp
#include <pybind11/stl.h>
#include <pybind11/numpy.h>
#include <pybind11/pybind11.h>
#include <pybind11/functional.h>
#include <typeinfo>
#include <iostream>

#define PRINT(x) std::cout << __FILE__ << " " << __LINE__ << " " << x << std::endl;

#include "mkn/gpu.hpp"
#include "mkn/gpu/tuple.hpp"

void init(float* a, float* b) {
  mkn::gpu::DLauncher()([a = a, b = b] __device__() {
    a[threadIdx.x] = threadIdx.x + 1;
    b[threadIdx.x] = threadIdx.x + 2;
  });
}

namespace py = pybind11;

struct FunctionSupport {
  FunctionSupport() {
    mkn::gpu::alloc_managed(a, 32);
    mkn::gpu::alloc_managed(b, 32);
    mkn::gpu::alloc_managed(c, 32);
    print();
    init(a, b);
    print();
  }
  ~FunctionSupport() {
    mkn::gpu::destroy(a);
    mkn::gpu::destroy(b);
    mkn::gpu::destroy(c);
  }

  void print() {
    PRINT(a[0]);
    PRINT(b[0]);
    PRINT(c[0]);
  }

  py::array_t<float> make(auto p) {
    return {{32}, {sizeof(float)}, p, py::capsule(p, [](void* f) { /* noop */ })};
  }

  py::array_t<float> A() { return make(a); }
  py::array_t<float> B() { return make(b); }
  py::array_t<float> C() { return make(c); }

  float* a = nullptr;
  float* b = nullptr;
  float* c = nullptr;
};

PYBIND11_MODULE(poc_pyb, m) {
  py::class_<FunctionSupport, py::smart_holder>(m, "FunctionSupport")
      .def(py::init<>())
      .def("print", &FunctionSupport::print)
      .def_readwrite("a", &FunctionSupport::a)
      .def("A", &FunctionSupport::A)
      .def_readwrite("b", &FunctionSupport::b)
      .def("B", &FunctionSupport::B)
      .def_readwrite("a", &FunctionSupport::a)
      .def("C", &FunctionSupport::C);

  using Span_t = mkn::gpu::Span<float>;
  py::class_<Span_t, py::smart_holder>(m, "Span_s")
      .def("__getitem__", [](Span_t& self, unsigned index) { return self[index]; });
}
```

Comment on lines +24 to +30:

Don't log these buffers before this module has populated them. The first `print()` runs before `init(a, b)`, so `a[0]` and `b[0]` are read while still unwritten, and `c` is never initialized here at all — the values printed are whatever the managed allocation happens to contain.

Comment on lines +44 to +46:

Dangling-pointer risk: the noop capsule means Python arrays can outlive the backing memory. The `py::array_t` returned by `make` wraps the raw managed pointer, but its capsule does nothing on destruction, so if the `FunctionSupport` instance is destroyed first, the arrays point at freed memory. One approach: pass a reference to the parent object as the array's base so NumPy keeps it alive.

Sketch — tie array lifetime to the parent:

```diff
- py::array_t<float> make(auto p) {
-   return {{32}, {sizeof(float)}, p, py::capsule(p, [](void* f) { /* noop */ })};
- }
+ py::array_t<float> make(auto p, py::object self) {
+   // Use `self` as the base object so NumPy keeps FunctionSupport alive
+   return py::array_t<float>({32}, {sizeof(float)}, p, self);
+ }
```

Then pass the holder through the accessors:

```diff
- py::array_t<float> A() { return make(a); }
+ py::array_t<float> A(py::object self) { return make(a, self); }
```

Comment on the `py::smart_holder` bindings:

Ensure pybind11 >= 2.13 is explicitly specified or verified. The code uses `py::smart_holder`, which is only available in mainline pybind11 from 2.13 onward, and neither `mkn.yaml` nor any other configuration file in the repository pins a pybind11 version, so the build silently depends on whatever `lang.pybind11` resolves to.

Comment on lines +57 to +66:

Duplicate binding: the second `.def_readwrite("a", ...)` re-binds `a`, so `c` is never exposed as an attribute.

Proposed fix:

```diff
       .def_readwrite("b", &FunctionSupport::b)
       .def("B", &FunctionSupport::B)
-      .def_readwrite("a", &FunctionSupport::a)
+      .def_readwrite("c", &FunctionSupport::c)
       .def("C", &FunctionSupport::C);
```
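The lifetime fix suggested for `make` relies on NumPy's base-object mechanism: an array constructed with another Python object as its base holds a reference to that object for as long as the array is alive, which is exactly how pybind11's `py::array_t(..., base)` overload keeps the C++ holder from being destroyed. A minimal pure-NumPy sketch of that mechanism (no pybind11 required; `backing` stands in for the managed buffer):

```python
import sys

import numpy as np

backing = np.arange(32, dtype=np.float32)  # stands in for the managed buffer
view = backing[:8]                         # new array over the same memory
assert view.base is backing                # NumPy records `backing` as .base

before = sys.getrefcount(backing)
another = backing.view()                   # each dependent array adds a reference
after = sys.getrefcount(backing)
assert after == before + 1                 # `backing` cannot be freed while `another` lives
```

So long as any dependent array exists, the base object's refcount stays elevated and its memory is kept alive — the property the noop-capsule version of `make` lacks.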
New two-line Python requirements file:

```
numpy
numba-cuda
```
test/any/async_streaming.cpp:

```diff
@@ -7,6 +7,8 @@

 #include <algorithm>

+#include <algorithm>
+
 using namespace mkn::gpu;
 using namespace std::chrono_literals;
```

Comment on lines +10 to +11:

The added include duplicates the `#include <algorithm>` already present above it. Meanwhile the file uses `std::this_thread::sleep_for`, which requires an explicit `#include <thread>` — that is the header this hunk should add.

Proposed fix:

```diff
 #include <algorithm>
-#include <algorithm>
+#include <thread>
```
Comment on the editor-settings hunk (lines +16 to +18):

Trailing comma produces invalid JSON. Line 18 has a trailing comma after `"on"`, and the added block itself ends with `},` as the last member of its object. Some editors tolerate this, but strict JSON parsers reject it.

Proposed fix:

```diff
   "python-black": {
-    "format_on_save": "on",
-  },
+    "format_on_save": "on"
+  }
```