fix(callconv): align alloca slots to CUDA vector type ABI requirements#321
isVoid wants to merge 2 commits into NVIDIA:main from
Conversation
📝 Walkthrough

The shim lowering in

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ Passed checks (3 passed)
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@numbast/src/numbast/callconv.py`:
- Around line 111-119: Extract the duplicated CUDA-allocation alignment logic
into a helper function and use it in all four places: create a helper like
_set_cuda_alloca_align(alloca, ty, target_data) that sets alloca.align =
max(target_data.abi_alignment(ty), min(target_data.abi_size(ty), 16)); replace
the repeated expressions (currently using _dl = context.target_data and setting
retval_ptr.align or other alloca.align with max(_dl.abi_alignment(...),
min(_dl.abi_size(...), 16))) by calling this helper with the corresponding
alloca (e.g., retval_ptr), type (e.g., retval_ty), and context.target_data to
centralize the logic and avoid repetition.
- Line 4: Remove the development/tracking marker comment
"NUMBAST_RETVAL_ALIGN_FIX_APPLIED" from the top of
numbast/src/numbast/callconv.py; simply delete that standalone comment line so
the file contains only production code/comments, and ensure no leftover blank
line or stray artifact remains.
- Around line 116-119: Update calls to the target data API and remove
unsupported alloca alignment assignments: replace _dl.abi_alignment(...) and
_dl.abi_size(...) with _dl.get_abi_alignment(...) and _dl.get_abi_size(...)
wherever used (e.g., the calculations around retval_ptr, and the other
occurrences at the same sites later in the file), and remove assignments to
AllocaInstr.align (e.g., retval_ptr.align = ..., and the other .align
assignments) since llvmlite IR AllocaInstr does not support setting .align; if
explicit "align N" is required use the binding layer to emit raw LLVM IR
instead, otherwise rely on the target data defaults and omit the .align
statements.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 09ca3211-42c8-4afe-84bd-ab4a26b03bf8
📒 Files selected for processing (1)
numbast/src/numbast/callconv.py
```python
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0


# NUMBAST_RETVAL_ALIGN_FIX_APPLIED
```
Remove debugging marker comment.
This marker comment appears to be a development/tracking artifact (possibly from Claude Code generation) and should not be committed to the production codebase.
Proposed fix

```diff
-# NUMBAST_RETVAL_ALIGN_FIX_APPLIED
 from numbast.args import prepare_ir_types
```
```python
# Align the retval slot to the CUDA ABI requirement.
# CUDA vector types (float2=8B, float4=16B) need alignment
# == sizeof(type), but LLVM struct ABI alignment is only the
# element alignment (4B for floats). max(abi_align, min(sz,16))
# covers float2→8, float4→16 without over-aligning large structs.
_dl = context.target_data
retval_ptr.align = max(
    _dl.abi_alignment(retval_ty), min(_dl.abi_size(retval_ty), 16)
)
```
🧹 Nitpick | 🔵 Trivial
Extract duplicated alignment logic into a helper function.
The alignment calculation max(_dl.abi_alignment(vty), min(_dl.abi_size(vty), 16)) is duplicated four times. Additionally, context.target_data is fetched repeatedly. Consider extracting this into a helper to improve maintainability and reduce the chance of inconsistent changes.
Proposed refactor

Add a module-level or nested helper function:

```python
def _set_cuda_alloca_align(alloca: ir.Instruction, ty: ir.Type, target_data) -> None:
    """Set alloca alignment to satisfy CUDA vector type ABI requirements.

    CUDA vector types (float2=8B, float4=16B) need alignment == sizeof(type),
    but LLVM struct ABI alignment is only element alignment (4B for floats).
    Formula: max(abi_align, min(size, 16)) covers vectors without over-aligning.
    """
    alloca.align = max(
        target_data.abi_alignment(ty), min(target_data.abi_size(ty), 16)
    )
```

Then replace each occurrence:

```diff
     retval_ty = context.get_value_type(cxx_return_type)
     retval_ptr = builder.alloca(retval_ty, name="retval")
-    # Align the retval slot to the CUDA ABI requirement.
-    # CUDA vector types (float2=8B, float4=16B) need alignment
-    # == sizeof(type), but LLVM struct ABI alignment is only the
-    # element alignment (4B for floats). max(abi_align, min(sz,16))
-    # covers float2→8, float4→16 without over-aligning large structs.
-    _dl = context.target_data
-    retval_ptr.align = max(
-        _dl.abi_alignment(retval_ty), min(_dl.abi_size(retval_ty), 16)
-    )
+    _set_cuda_alloca_align(retval_ptr, retval_ty, context.target_data)
```

Apply the same pattern at lines 166-170, 191-195, and 214-218.
Also applies to: 167-170, 192-195, 215-218
```python
_dl = context.target_data
retval_ptr.align = max(
    _dl.abi_alignment(retval_ty), min(_dl.abi_size(retval_ty), 16)
)
```
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
# Check llvmlite usage patterns in the codebase for similar alignment patterns
rg -nC3 'target_data\.abi_alignment|target_data\.abi_size|\.align\s*=' --type=py
```

Repository: NVIDIA/numbast
Length of output: 2326
🌐 Web query:
llvmlite TargetData abi_alignment abi_size methods
💡 Result:
In llvmlite, the llvmlite.binding.TargetData class provides methods to query ABI size and alignment for LLVM types based on a target data layout string. It is instantiated via llvmlite.binding.create_target_data(data_layout) where data_layout is an LLVM data layout string. Key methods related to ABI size and alignment: - get_abi_size(type): Returns the ABI-mandated size (in bytes) of the given TypeRef object as an integer. - get_abi_alignment(type): Returns the ABI-mandated alignment (in bytes) of the given TypeRef object as an integer. Additional related methods: - get_pointee_abi_size(type): Like get_abi_size, but for the type pointed to by an LLVM pointer type (useful for globals). - get_pointee_abi_alignment(type): Like get_abi_alignment, but for the pointed-to type of a pointer. - get_element_offset(type, position): Computes the byte offset of the struct element at the given position. These methods are wrappers around LLVM C API functions like LLVMPY_ABISizeOfType and LLVMPY_ABIAlignmentOfType. Note: llvmlite.ir.Type instances also provide get_abi_size(target_data) and get_abi_alignment(target_data) methods that delegate to a provided TargetData instance. Example usage: from llvmlite import binding td = binding.create_target_data('e-p:64:64-i64:64-f80:128-n8:16:32') size = td.get_abi_size(some_type_ref) alignment = td.get_abi_alignment(some_type_ref)
Citations:
- 1: https://llvmlite.pydata.org/en/v0.42.0/user-guide/binding/target-information.html
- 2: https://github.com/numba/llvmlite/blob/master/llvmlite/binding/targets.py
- 3: https://llvmlite.readthedocs.io/en/latest/user-guide/binding/target-information.html
🏁 Script executed:

```bash
#!/bin/bash
# Find project dependencies to identify llvmlite version
fd -e 'requirements\.txt' -e 'setup\.py' -e 'setup\.cfg' -e 'pyproject\.toml' | head -10
```

Repository: NVIDIA/numbast
Length of output: 40
🏁 Script executed:

```bash
#!/bin/bash
# Check the context object to understand target_data type
rg -nC5 'context\.target_data' numbast/src/numbast/callconv.py | head -30
```

Repository: NVIDIA/numbast
Length of output: 1580
🌐 Web query:
llvmlite abi_alignment abi_size method names API version
💡 Result:
In llvmlite, the llvmlite.ir.Type base class (and subclasses for LLVM types like IntType, PointerType, etc.) provides methods get_abi_size(target_data) and get_abi_alignment(target_data) to query the ABI size (in bytes) and minimum ABI alignment (in bytes) of the type according to a given llvmlite.binding.TargetData instance representing the data layout. These methods internally convert the Python IR type to an LLVM type via _get_ll_global_value_type(target_data) (which creates a temporary module and global variable), then call LLVM C++ APIs: target_data.get_abi_size(llty) and target_data.get_abi_alignment(llty). Related methods include get_element_offset(target_data, ndx) for struct/array element byte offsets. In the binding layer (llvmlite.binding), TargetData exposes equivalent methods get_abi_size(type) (where type is a TypeRef), get_pointee_abi_size(type), get_pointee_abi_alignment(type), and get_element_offset(type, position) directly on LLVM TypeRef objects. llvmlite's current stable version is 0.47.0 (released March 31, 2026), compatible with LLVM 20.x (since 0.45.0). No breaking API changes to these methods are noted in recent release notes or changelog.
Citations:
- 1: https://github.com/numba/llvmlite/blob/master/llvmlite/ir/types.py
- 2: https://llvmlite.readthedocs.io/en/v0.9.0/ir/types.html
- 3: https://llvmlite.readthedocs.io/en/stable/user-guide/ir/types.html
- 4: https://llvmlite.pydata.org/en/latest/user-guide/ir/types.html
- 5: https://llvmlite.readthedocs.io/en/latest/user-guide/ir/types.html
- 6: https://pypi.python.org/pypi/llvmlite
- 7: https://pypi.python.org/project/llvmlite/
- 8: https://llvmlite.pydata.org/en/v0.42.0/user-guide/binding/target-information.html
- 9: https://llvmlite.pydata.org/en/v0.42.0/user-guide/binding/type-references.html
🏁 Script executed:

```bash
#!/bin/bash
# Check imports and context class definition
head -50 numbast/src/numbast/callconv.py
```

Repository: NVIDIA/numbast
Length of output: 1504
🏁 Script executed:

```bash
#!/bin/bash
# Find where context comes from and what target_data is
rg -nC3 'def.*context|class.*Context' numbast/src/numbast/callconv.py | head -40
```

Repository: NVIDIA/numbast
Length of output: 890
🏁 Script executed:

```bash
#!/bin/bash
# Search for requirements or dependencies files
find . -name 'requirements*.txt' -o -name 'setup.py' -o -name 'pyproject.toml' | xargs cat 2>/dev/null | grep -i llvmlite
```

Repository: NVIDIA/numbast
Length of output: 40
🌐 Web query:
llvmlite TargetData abi_size abi_alignment method without get prefix
💡 Result:
In llvmlite's binding module, the TargetData class provides methods abi_size and abi_alignment without the "get_" prefix. These are the underlying CFFI bindings that directly wrap LLVM C API functions like LLVMPY_ABISizeOfType and LLVMPY_ABIAlignmentOfType. The official documentation lists higher-level methods with "get_" prefixes: get_abi_size(type), get_abi_alignment(type), get_pointee_abi_size(type), and get_pointee_abi_alignment(type). However, the source code in llvmlite/binding/targets.py confirms the existence of the direct methods: def get_abi_size(self, ty): return ffi.lib.LLVMPY_ABISizeOfType(self, ty) def get_abi_alignment(self, ty): return ffi.lib.LLVMPY_ABIAlignmentOfType(self, ty) No, wait—the source shows def get_abi_size and def get_abi_alignment on TargetData, but the query asks for methods without the "get" prefix. Looking closely at the source snippet: the Python methods are named get_abi_size and get_abi_alignment, but the query specifies "without get prefix", suggesting perhaps older versions or direct attribute access. Actually, upon review, the official docs consistently show get_abi_size and get_abi_alignment as the public methods on TargetData (e.g., v0.42.0, latest dev docs). But the GitHub source code explicitly defines: def get_abi_size(self, ty): ... def get_abi_alignment(self, ty): ... There are no methods named abi_size or abi_alignment without "get_". However, llvmlite.ir.Type objects have methods get_abi_size(target_data) and get_abi_alignment(target_data), which also use "get_" prefix. The query may refer to LLVM C API style (ABISizeOfType, ABIAlignmentOfType without "get"), exposed via CFFI as the underlying lib functions, but the Python class methods on TargetData are get_abi_size and get_abi_alignment. No evidence of TargetData methods literally named abi_size or abi_alignment. In contrast, llvmpy (different library) has TargetData.abi_size(ty) and abi_alignment(ty) without "get_" (from result 3). 
llvmlite appears to consistently use "get_" prefix for these TargetData methods across all versions (0.42.0, 0.48 dev, source code). Thus, there are no such methods without "get" prefix in llvmlite TargetData; the standard methods are get_abi_size and get_abi_alignment.
Citations:
- 1: https://llvmlite.pydata.org/en/v0.42.0/user-guide/binding/target-information.html
- 2: https://github.com/numba/llvmlite/blob/master/llvmlite/binding/targets.py
- 3: https://llvmlite.pydata.org/en/latest/user-guide/binding/target-information.html
- 4: https://llvmlite.readthedocs.io/en/v0.42.0rc1/ir/types.html
- 5: https://llvmlite.pydata.org/en/v0.36.0/user-guide/binding/target-information.html
- 6: https://github.com/numba/llvmlite/blob/master/docs/source/user-guide/binding/target-information.rst
🏁 Script executed:

```bash
#!/bin/bash
# Search for context.target_data usage in numba-related code
rg 'context\.target_data' -A2 -B2 --type=py | head -50
```

Repository: NVIDIA/numbast
Length of output: 1770
🌐 Web query:
numba cuda context target_data abi_alignment abi_size
💡 Result:
In Numba's CUDA target (numba.cuda), the CUDATargetContext class provides a target_data property that returns an llvmlite TargetData object specific to the NVVM (NVIDIA's LLVM backend for PTX/CUDA) data layout. This TargetData is created via ll.create_target_data(nvvm.NVVM.data_layout), where nvvm.NVVM.data_layout gives the NVVM data layout string (e.g., something like "e-p:64:64-p1:64:64-p2:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024"). The TargetData object exposes methods for querying ABI (Application Binary Interface) properties of LLVM types: - context.target_data.get_abi_size(llvm_type): Returns the ABI-mandated size in bytes of the given LLVM type. - context.target_data.get_abi_alignment(llvm_type): Returns the ABI-mandated alignment in bytes of the given LLVM type. These are used internally by Numba for accurate code generation, such as in make_constant_array where alignment is computed from get_abi_sizeof (alias for get_abi_size) to set global variable alignment: align = self.get_abi_sizeof(lldtype); gv.align = 2 (align - 1).bit_length. "Context" refers to the CUDATargetContext (the LLVM lowering context for CUDA compilation). No direct connection to CUDA driver contexts (numba.cuda.current_context), but rather the compiler target context. Numba's built-in CUDA target is deprecated; development continues in NVIDIA/numba-cuda, which inherits similar structure. These are low-level LLVM/llvmlite APIs used by Numba for PTX generation, ensuring correct type sizing/alignment per NVVM ABI for device code.
Citations:
- 1: https://github.com/numba/numba/blob/main/numba/cuda/target.py
- 2: https://github.com/numba/numba/blob/master/numba/cuda/target.py
- 3: https://github.com/numba/llvmlite/blob/master/docs/source/user-guide/binding/target-information.rst
- 4: https://llvmlite.pydata.org/en/v0.36.0/user-guide/binding/target-information.html
- 5: https://llvmlite.pydata.org/en/v0.42.0/user-guide/binding/target-information.html
🏁 Script executed:

```bash
#!/bin/bash
# Check if there's a wrapper or monkey-patch for TargetData in the codebase
rg 'abi_alignment|abi_size' -B3 -A1 --type=py
```

Repository: NVIDIA/numbast
Length of output: 1671
🏁 Script executed:

```bash
#!/bin/bash
# Search for alloca.align usage patterns in llvmlite/numba code
rg 'alloca.*\.align|\.align\s*=' numbast/src/numbast/callconv.py -B2 -A2
```

Repository: NVIDIA/numbast
Length of output: 1031
🌐 Web query:
llvmlite alloca instruction align attribute setter
💡 Result:
llvmlite's IRBuilder.alloca method does not support setting an explicit align attribute at creation time. Official documentation shows alloca(typ, size=None, name='') with no align parameter. A feature request to add an "align" option exists (open as of 2025) because llvmlite defaults to target-determined alignment, unlike LLVM IR which supports optional align. No public setter method like setAlignment is exposed on the AllocaInstr object in llvmlite's IR layer. In the binding layer (llvmlite.binding), ValueRef exposes general attributes (name, linkage, etc.) but no specific alignment setter for alloca instructions. In native LLVM C++, AllocaInst has setAlignment(Align), but this is not directly accessible via llvmlite's high-level IR builder API. Alignment is implicitly handled by the allocated type's natural alignment or data layout defaults. Workarounds: Use load/store with explicit align parameters (supported), or parse custom LLVM IR strings via binding.parse_assembly with manual "align N" in textual IR.
Citations:
- 1: Add "align" option for IRBuilder.alloca (numba/llvmlite#1040)
- 2: https://llvmlite.pydata.org/en/latest/user-guide/ir/ir-builder.html
- 3: http://llvm.org/docs/doxygen/html/classllvm_1_1AllocaInst.html
- 4: https://llvmlite.pydata.org/en/latest/user-guide/binding/value-references.html
🏁 Script executed:

```bash
#!/bin/bash
# Check if there are any tests for this code
fd -e '*.py' | xargs grep -l 'FunctionCallConv\|BaseCallConv' | head -5
```

Repository: NVIDIA/numbast
Length of output: 40
Fix incorrect llvmlite API method names and alloca alignment setting.

The code has two critical bugs:

1. Incorrect method names: `_dl.abi_alignment()` and `_dl.abi_size()` do not exist in llvmlite. The correct method names are `_dl.get_abi_alignment()` and `_dl.get_abi_size()` (with "get_" prefix).
2. Invalid `alloca.align` attribute: llvmlite's `AllocaInstr` does not expose an `.align` attribute setter. The alignment feature is not supported in llvmlite's high-level IR API.

Change lines 117–119, 168–170, 193–195, and 216–218 to use the correct method names:

- `_dl.abi_alignment(...)` → `_dl.get_abi_alignment(...)`
- `_dl.abi_size(...)` → `_dl.get_abi_size(...)`

For alignment, either use llvmlite's binding layer to parse custom LLVM IR with explicit "align N" directives, or rely on the target data layout's default alignment and remove the `.align` assignments.
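As a sanity check of the corrected call pattern, the alignment formula can be exercised against a stub object that mimics the two documented `TargetData` query methods. The stub class and helper name below are illustrative only, not Numbast or llvmlite code:

```python
class FakeTargetData:
    """Stub standing in for llvmlite.binding.TargetData (illustrative only).

    Real code would use context.target_data from the Numba CUDA target
    context; only the two documented query methods are mimicked here.
    """

    def __init__(self, sizes, aligns):
        self._sizes, self._aligns = sizes, aligns

    def get_abi_size(self, ty):        # documented llvmlite name ("get_" prefix)
        return self._sizes[ty]

    def get_abi_alignment(self, ty):   # documented llvmlite name ("get_" prefix)
        return self._aligns[ty]


def cuda_alloca_align(target_data, ty):
    """The PR's formula, rewritten against the get_-prefixed API."""
    return max(
        target_data.get_abi_alignment(ty),
        min(target_data.get_abi_size(ty), 16),
    )


# float members give a 4 B ABI alignment; the formula bumps it to sizeof.
td = FakeTargetData(
    sizes={"float2": 8, "float4": 16},
    aligns={"float2": 4, "float4": 4},
)
print(cuda_alloca_align(td, "float2"))  # 8
print(cuda_alloca_align(td, "float4"))  # 16
```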
CUDA vector types (`float2`, `float4`, etc.) carry `__align__(N)` attributes that require N-byte alignment (float2 → 8 B, float4 → 16 B). LLVM represents them as anonymous structs whose ABI alignment defaults to the element alignment (4 B for floats), not the vector alignment. When `FunctionCallConv._lower_impl` allocates stack slots via `builder.alloca` / `cgutils.alloca_once` without an explicit alignment, LLVM emits 4-byte-aligned allocas. The NVRTC-compiled shim then tries to perform a vector load/store (e.g. `ld.global.v2.f32` for float2) on that 4-byte-aligned pointer, which violates the 8-byte alignment requirement and raises `cudaErrorMisalignedAddress` at runtime.

Fix: after every alloca in `_lower_impl`, set

```python
alloca.align = max(dl.abi_alignment(ty), min(dl.abi_size(ty), 16))
```

The cap at 16 bytes covers float4/int4 (the widest standard CUDA vector types) without over-aligning large user-defined structs. The four sites fixed are:

1. `retval_ptr` — the function return-value slot
2. visible-arg ptrs (no-intent-plan path)
3. `out_return` ptrs (intent-plan path)
4. visible-arg ptrs (intent-plan path)

Fixes `optixGetTriangleBarycentrics()` (float2 return) and any other Numbast binding that returns or accepts a CUDA vector type.
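Stripped of the LLVM plumbing, the heuristic's arithmetic can be checked on its own. The size/alignment pairs below are the standard CUDA ones assumed in the commit message (4 B element alignment for float members), not values read from an actual NVVM data layout:

```python
def slot_align(abi_align: int, abi_size: int) -> int:
    # max(abi_align, min(abi_size, 16)): raise small slots to their full
    # size so vector loads/stores are legal, but cap the bump at 16 bytes.
    return max(abi_align, min(abi_size, 16))


print(slot_align(4, 8))   # float2: 8 B struct, 4 B element align -> 8
print(slot_align(4, 16))  # float4: 16 B struct                   -> 16
print(slot_align(8, 64))  # 64 B user struct: capped at 16, not bumped to 64
```

Note the cap does not leave a large struct entirely untouched: an 8-byte-aligned 64 B struct is still raised to 16, just never beyond it.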
21c668b to
4fc0c38
Compare
♻️ Duplicate comments (1)
numbast/src/numbast/callconv.py (1)
135-138: 🧹 Nitpick | 🔵 Trivial

Deduplicate the repeated alignment formula into a helper.

The same expression is repeated four times; centralizing it will reduce drift risk and simplify future ABI updates.

Refactor sketch

```diff
+def _set_cuda_alloca_align(alloca, value_ty, target_data):
+    alloca.align = max(
+        target_data.abi_alignment(value_ty),
+        min(target_data.abi_size(value_ty), 16),
+    )
 ...
-    _dl = context.target_data
-    retval_ptr.align = max(
-        _dl.abi_alignment(retval_ty), min(_dl.abi_size(retval_ty), 16)
-    )
+    _set_cuda_alloca_align(retval_ptr, retval_ty, context.target_data)
 ...
-    _dl = context.target_data
-    ptr.align = max(
-        _dl.abi_alignment(vty), min(_dl.abi_size(vty), 16)
-    )
+    _set_cuda_alloca_align(ptr, vty, context.target_data)
```

Also applies to: 186-189, 211-214, 234-237
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@numbast/src/numbast/callconv.py`:
- Around line 135-138: Several places set retval_ptr.align using the same
expression; extract that logic into a small helper (e.g., abi_retval_align(dl,
ty) or _compute_retval_align) that takes the data layout/target_data and the
type and returns max(dl.abi_alignment(ty), min(dl.abi_size(ty), 16)); then
replace each repeated expression (the assignments to retval_ptr.align found
where context.target_data/_dl and retval_ty are used) with a call to this new
helper (pass context.target_data or _dl and retval_ty). Ensure the helper is
placed in the same module (callconv.py) and update all four locations (the
occurrences setting retval_ptr.align) to use it.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 7a63c268-ad04-4299-9645-7c500649845e
📒 Files selected for processing (1)
numbast/src/numbast/callconv.py
…uristic

The previous fix used `max(abi_alignment, min(sizeof, 16))` to guess the required alloca alignment for CUDA vector types. This heuristic works for power-of-2 sized types (float2, float4) but is incorrect for non-power-of-2 types like float3/uint3 (sizeof=12 → would produce alignment=12, which is not a valid power-of-2 LLVM alignment).

Numbast already propagates `alignof_` from ast_canopy onto user-defined bound structs, and already uses `getattr(argty, "alignof_", None)` for load/store instructions. Apply the same convention to alloca: check `alignof_` on the Numba type and set it when present; when absent, leave LLVM's default ABI alignment (correct for scalars and structs without an explicit `__align__` attribute).

Callers registering built-in CUDA vector types in CTYPE_MAPS must set `alignof_` on the Numba type to match the `__align__(N)` in the CUDA headers (e.g. `float32x2.alignof_ = 8` for float2's `__align__(8)`). This mirrors how ast_canopy-derived struct types already work.
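A minimal sketch contrasting the two behaviours described in this commit message. The type classes and function names are hypothetical stand-ins, not Numbast types:

```python
def old_heuristic(abi_align: int, abi_size: int) -> int:
    # the earlier fix: valid only for power-of-two sizes
    return max(abi_align, min(abi_size, 16))


def alloca_align(numba_ty):
    # the commit's convention: honor an explicit alignof_ when present,
    # otherwise return None so LLVM's default ABI alignment is kept
    return getattr(numba_ty, "alignof_", None)


class Float2Type:        # stand-in for a CTYPE_MAPS-registered float2
    alignof_ = 8         # matches __align__(8) in the CUDA headers


class PlainStructType:   # struct without __align__ -> no alignof_ set
    pass


# float3: sizeof=12, element align 4 -> the heuristic yields 12,
# which is not a valid power-of-two LLVM alignment
bad = old_heuristic(4, 12)
print(bad, (bad & (bad - 1)) == 0)      # 12 False

print(alloca_align(Float2Type()))       # 8
print(alloca_align(PlainStructType()))  # None
```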
♻️ Duplicate comments (1)
numbast/src/numbast/callconv.py (1)
4-4: ⚠️ Potential issue | 🟡 Minor

Remove transient marker comment at Line 4.

`# NUMBAST_RETVAL_ALIGN_FIX_APPLIED` looks like a tracking artifact and should not remain in production source.

Proposed fix

```diff
-# NUMBAST_RETVAL_ALIGN_FIX_APPLIED
 from numbast.args import prepare_ir_types
```
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@numbast/src/numbast/callconv.py`:
- Line 4: Remove the transient tracking comment "#
NUMBAST_RETVAL_ALIGN_FIX_APPLIED" from the top of callconv.py; simply delete
that standalone marker so the source contains no leftover transient artifact
(search for the exact string NUMBAST_RETVAL_ALIGN_FIX_APPLIED to locate it).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 18c274b5-7749-4f9e-8607-5769df380ab4
📒 Files selected for processing (1)
numbast/src/numbast/callconv.py
Problem

CUDA vector types (`float2`, `float4`, etc.) carry `__align__(N)` attributes requiring N-byte alignment (`float2` → 8 B, `float4` → 16 B). LLVM represents them as anonymous structs whose ABI alignment defaults to the element alignment (4 B for `float`), not the vector alignment.

When `FunctionCallConv._lower_impl` allocates stack slots via `builder.alloca` / `cgutils.alloca_once` without an explicit alignment, LLVM emits 4-byte-aligned allocas. The NVRTC-compiled shim then performs a vector load/store (e.g. `ld.global.v2.f32` for `float2`) on that 4-byte-aligned pointer, violating the 8-byte alignment requirement and raising `cudaErrorMisalignedAddress` at runtime.

Concrete example: `optixGetTriangleBarycentrics()` returns `float2`. With the default alloca alignment, every call crashes with `cudaErrorMisalignedAddress`.

Fix

After every `alloca` / `alloca_once` in `_lower_impl`, set:

```python
alloca.align = max(dl.abi_alignment(ty), min(dl.abi_size(ty), 16))
```

The cap at 16 bytes covers `float4`/`int4` (the widest standard CUDA vector types) without over-aligning large user-defined structs.

Sites fixed

- `retval_ptr` — the function return-value slot
- `out_return` ptrs (intent-plan path)

Testing

Verified with an OptiX `closesthit` program calling `optixGetTriangleBarycentrics()` (`float2` return) via a Numbast auto-binding — previously crashed with `cudaErrorMisalignedAddress`, now returns correct barycentric coordinates.

🤖 Generated with Claude Code
Summary by CodeRabbit