fix(callconv): align alloca slots to CUDA vector type ABI requirements#321
isVoid wants to merge 2 commits into NVIDIA:main from
Conversation
📝 Walkthrough

The shim lowering in

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ Passed checks (3 passed)
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@numbast/src/numbast/callconv.py`:
- Around line 111-119: Extract the duplicated CUDA-allocation alignment logic
into a helper function and use it in all four places: create a helper like
_set_cuda_alloca_align(alloca, ty, target_data) that sets alloca.align =
max(target_data.abi_alignment(ty), min(target_data.abi_size(ty), 16)); replace
the repeated expressions (currently using _dl = context.target_data and setting
retval_ptr.align or other alloca.align with max(_dl.abi_alignment(...),
min(_dl.abi_size(...), 16))) by calling this helper with the corresponding
alloca (e.g., retval_ptr), type (e.g., retval_ty), and context.target_data to
centralize the logic and avoid repetition.
- Line 4: Remove the development/tracking marker comment
"NUMBAST_RETVAL_ALIGN_FIX_APPLIED" from the top of
numbast/src/numbast/callconv.py; simply delete that standalone comment line so
the file contains only production code/comments, and ensure no leftover blank
line or stray artifact remains.
- Around line 116-119: Update calls to the target data API and remove
unsupported alloca alignment assignments: replace _dl.abi_alignment(...) and
_dl.abi_size(...) with _dl.get_abi_alignment(...) and _dl.get_abi_size(...)
wherever used (e.g., the calculations around retval_ptr, and the other
occurrences at the same sites later in the file), and remove assignments to
AllocaInstr.align (e.g., retval_ptr.align = ..., and the other .align
assignments) since llvmlite IR AllocaInstr does not support setting .align; if
explicit "align N" is required use the binding layer to emit raw LLVM IR
instead, otherwise rely on the target data defaults and omit the .align
statements.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 09ca3211-42c8-4afe-84bd-ab4a26b03bf8
📒 Files selected for processing (1)
numbast/src/numbast/callconv.py
```python
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0


# NUMBAST_RETVAL_ALIGN_FIX_APPLIED
```
Remove debugging marker comment.
This marker comment appears to be a development/tracking artifact (possibly from Claude Code generation) and should not be committed to the production codebase.
Proposed fix

```diff
-# NUMBAST_RETVAL_ALIGN_FIX_APPLIED
 from numbast.args import prepare_ir_types
```
```python
# Align the retval slot to the CUDA ABI requirement.
# CUDA vector types (float2=8B, float4=16B) need alignment
# == sizeof(type), but LLVM struct ABI alignment is only the
# element alignment (4B for floats). max(abi_align, min(sz,16))
# covers float2→8, float4→16 without over-aligning large structs.
_dl = context.target_data
retval_ptr.align = max(
    _dl.abi_alignment(retval_ty), min(_dl.abi_size(retval_ty), 16)
)
```
🧹 Nitpick | 🔵 Trivial
Extract duplicated alignment logic into a helper function.
The alignment calculation max(_dl.abi_alignment(vty), min(_dl.abi_size(vty), 16)) is duplicated four times. Additionally, context.target_data is fetched repeatedly. Consider extracting this into a helper to improve maintainability and reduce the chance of inconsistent changes.
Proposed refactor

Add a module-level or nested helper function:

```python
def _set_cuda_alloca_align(alloca: ir.Instruction, ty: ir.Type, target_data) -> None:
    """Set alloca alignment to satisfy CUDA vector type ABI requirements.

    CUDA vector types (float2=8B, float4=16B) need alignment == sizeof(type),
    but LLVM struct ABI alignment is only element alignment (4B for floats).
    Formula: max(abi_align, min(size, 16)) covers vectors without over-aligning.
    """
    alloca.align = max(
        target_data.abi_alignment(ty), min(target_data.abi_size(ty), 16)
    )
```

Then replace each occurrence:

```diff
     retval_ty = context.get_value_type(cxx_return_type)
     retval_ptr = builder.alloca(retval_ty, name="retval")
-    # Align the retval slot to the CUDA ABI requirement.
-    # CUDA vector types (float2=8B, float4=16B) need alignment
-    # == sizeof(type), but LLVM struct ABI alignment is only the
-    # element alignment (4B for floats). max(abi_align, min(sz,16))
-    # covers float2→8, float4→16 without over-aligning large structs.
-    _dl = context.target_data
-    retval_ptr.align = max(
-        _dl.abi_alignment(retval_ty), min(_dl.abi_size(retval_ty), 16)
-    )
+    _set_cuda_alloca_align(retval_ptr, retval_ty, context.target_data)
```

Apply the same pattern at lines 166-170, 191-195, and 214-218.
Also applies to: 167-170, 192-195, 215-218
```python
_dl = context.target_data
retval_ptr.align = max(
    _dl.abi_alignment(retval_ty), min(_dl.abi_size(retval_ty), 16)
)
```
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
# Check llvmlite usage patterns in the codebase for similar alignment patterns
rg -nC3 'target_data\.abi_alignment|target_data\.abi_size|\.align\s*=' --type=py
```

Repository: NVIDIA/numbast
Length of output: 2326
🌐 Web query:
llvmlite TargetData abi_alignment abi_size methods
💡 Result:
In llvmlite, the llvmlite.binding.TargetData class provides methods to query ABI size and alignment for LLVM types based on a target data layout string. It is instantiated via llvmlite.binding.create_target_data(data_layout) where data_layout is an LLVM data layout string. Key methods related to ABI size and alignment: - get_abi_size(type): Returns the ABI-mandated size (in bytes) of the given TypeRef object as an integer. - get_abi_alignment(type): Returns the ABI-mandated alignment (in bytes) of the given TypeRef object as an integer. Additional related methods: - get_pointee_abi_size(type): Like get_abi_size, but for the type pointed to by an LLVM pointer type (useful for globals). - get_pointee_abi_alignment(type): Like get_abi_alignment, but for the pointed-to type of a pointer. - get_element_offset(type, position): Computes the byte offset of the struct element at the given position. These methods are wrappers around LLVM C API functions like LLVMPY_ABISizeOfType and LLVMPY_ABIAlignmentOfType. Note: llvmlite.ir.Type instances also provide get_abi_size(target_data) and get_abi_alignment(target_data) methods that delegate to a provided TargetData instance. Example usage: from llvmlite import binding td = binding.create_target_data('e-p:64:64-i64:64-f80:128-n8:16:32') size = td.get_abi_size(some_type_ref) alignment = td.get_abi_alignment(some_type_ref)
Citations:
- 1: https://llvmlite.pydata.org/en/v0.42.0/user-guide/binding/target-information.html
- 2: https://github.com/numba/llvmlite/blob/master/llvmlite/binding/targets.py
- 3: https://llvmlite.readthedocs.io/en/latest/user-guide/binding/target-information.html
🏁 Script executed:

```bash
#!/bin/bash
# Find project dependencies to identify llvmlite version
fd -e 'requirements\.txt' -e 'setup\.py' -e 'setup\.cfg' -e 'pyproject\.toml' | head -10
```

Repository: NVIDIA/numbast
Length of output: 40
🏁 Script executed:

```bash
#!/bin/bash
# Check the context object to understand target_data type
rg -nC5 'context\.target_data' numbast/src/numbast/callconv.py | head -30
```

Repository: NVIDIA/numbast
Length of output: 1580
🌐 Web query:
llvmlite abi_alignment abi_size method names API version
💡 Result:
In llvmlite, the llvmlite.ir.Type base class (and subclasses for LLVM types like IntType, PointerType, etc.) provides methods get_abi_size(target_data) and get_abi_alignment(target_data) to query the ABI size (in bytes) and minimum ABI alignment (in bytes) of the type according to a given llvmlite.binding.TargetData instance representing the data layout. These methods internally convert the Python IR type to an LLVM type via _get_ll_global_value_type(target_data) (which creates a temporary module and global variable), then call LLVM C++ APIs: target_data.get_abi_size(llty) and target_data.get_abi_alignment(llty). Related methods include get_element_offset(target_data, ndx) for struct/array element byte offsets. In the binding layer (llvmlite.binding), TargetData exposes equivalent methods get_abi_size(type) (where type is a TypeRef), get_pointee_abi_size(type), get_pointee_abi_alignment(type), and get_element_offset(type, position) directly on LLVM TypeRef objects. llvmlite's current stable version is 0.47.0 (released March 31, 2026), compatible with LLVM 20.x (since 0.45.0). No breaking API changes to these methods are noted in recent release notes or changelog.
Citations:
- 1: https://github.com/numba/llvmlite/blob/master/llvmlite/ir/types.py
- 2: https://llvmlite.readthedocs.io/en/v0.9.0/ir/types.html
- 3: https://llvmlite.readthedocs.io/en/stable/user-guide/ir/types.html
- 4: https://llvmlite.pydata.org/en/latest/user-guide/ir/types.html
- 5: https://llvmlite.readthedocs.io/en/latest/user-guide/ir/types.html
- 6: https://pypi.python.org/pypi/llvmlite
- 7: https://pypi.python.org/project/llvmlite/
- 8: https://llvmlite.pydata.org/en/v0.42.0/user-guide/binding/target-information.html
- 9: https://llvmlite.pydata.org/en/v0.42.0/user-guide/binding/type-references.html
🏁 Script executed:

```bash
#!/bin/bash
# Check imports and context class definition
head -50 numbast/src/numbast/callconv.py
```

Repository: NVIDIA/numbast
Length of output: 1504
🏁 Script executed:

```bash
#!/bin/bash
# Find where context comes from and what target_data is
rg -nC3 'def.*context|class.*Context' numbast/src/numbast/callconv.py | head -40
```

Repository: NVIDIA/numbast
Length of output: 890
🏁 Script executed:

```bash
#!/bin/bash
# Search for requirements or dependencies files
find . -name 'requirements*.txt' -o -name 'setup.py' -o -name 'pyproject.toml' | xargs cat 2>/dev/null | grep -i llvmlite
```

Repository: NVIDIA/numbast
Length of output: 40
🌐 Web query:
llvmlite TargetData abi_size abi_alignment method without get prefix
💡 Result:
In llvmlite's binding module, the TargetData class provides methods abi_size and abi_alignment without the "get_" prefix. These are the underlying CFFI bindings that directly wrap LLVM C API functions like LLVMPY_ABISizeOfType and LLVMPY_ABIAlignmentOfType. The official documentation lists higher-level methods with "get_" prefixes: get_abi_size(type), get_abi_alignment(type), get_pointee_abi_size(type), and get_pointee_abi_alignment(type). However, the source code in llvmlite/binding/targets.py confirms the existence of the direct methods: def get_abi_size(self, ty): return ffi.lib.LLVMPY_ABISizeOfType(self, ty) def get_abi_alignment(self, ty): return ffi.lib.LLVMPY_ABIAlignmentOfType(self, ty) No, wait—the source shows def get_abi_size and def get_abi_alignment on TargetData, but the query asks for methods without the "get" prefix. Looking closely at the source snippet: the Python methods are named get_abi_size and get_abi_alignment, but the query specifies "without get prefix", suggesting perhaps older versions or direct attribute access. Actually, upon review, the official docs consistently show get_abi_size and get_abi_alignment as the public methods on TargetData (e.g., v0.42.0, latest dev docs). But the GitHub source code explicitly defines: def get_abi_size(self, ty): ... def get_abi_alignment(self, ty): ... There are no methods named abi_size or abi_alignment without "get_". However, llvmlite.ir.Type objects have methods get_abi_size(target_data) and get_abi_alignment(target_data), which also use "get_" prefix. The query may refer to LLVM C API style (ABISizeOfType, ABIAlignmentOfType without "get"), exposed via CFFI as the underlying lib functions, but the Python class methods on TargetData are get_abi_size and get_abi_alignment. No evidence of TargetData methods literally named abi_size or abi_alignment. In contrast, llvmpy (different library) has TargetData.abi_size(ty) and abi_alignment(ty) without "get_" (from result 3). 
llvmlite appears to consistently use "get_" prefix for these TargetData methods across all versions (0.42.0, 0.48 dev, source code). Thus, there are no such methods without "get" prefix in llvmlite TargetData; the standard methods are get_abi_size and get_abi_alignment.
Citations:
- 1: https://llvmlite.pydata.org/en/v0.42.0/user-guide/binding/target-information.html
- 2: https://github.com/numba/llvmlite/blob/master/llvmlite/binding/targets.py
- 3: https://llvmlite.pydata.org/en/latest/user-guide/binding/target-information.html
- 4: https://llvmlite.readthedocs.io/en/v0.42.0rc1/ir/types.html
- 5: https://llvmlite.pydata.org/en/v0.36.0/user-guide/binding/target-information.html
- 6: https://github.com/numba/llvmlite/blob/master/docs/source/user-guide/binding/target-information.rst
🏁 Script executed:

```bash
#!/bin/bash
# Search for context.target_data usage in numba-related code
rg 'context\.target_data' -A2 -B2 --type=py | head -50
```

Repository: NVIDIA/numbast
Length of output: 1770
🌐 Web query:
numba cuda context target_data abi_alignment abi_size
💡 Result:
In Numba's CUDA target (numba.cuda), the CUDATargetContext class provides a target_data property that returns an llvmlite TargetData object specific to the NVVM (NVIDIA's LLVM backend for PTX/CUDA) data layout. This TargetData is created via ll.create_target_data(nvvm.NVVM.data_layout), where nvvm.NVVM.data_layout gives the NVVM data layout string (e.g., something like "e-p:64:64-p1:64:64-p2:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024"). The TargetData object exposes methods for querying ABI (Application Binary Interface) properties of LLVM types: - context.target_data.get_abi_size(llvm_type): Returns the ABI-mandated size in bytes of the given LLVM type. - context.target_data.get_abi_alignment(llvm_type): Returns the ABI-mandated alignment in bytes of the given LLVM type. These are used internally by Numba for accurate code generation, such as in make_constant_array where alignment is computed from get_abi_sizeof (alias for get_abi_size) to set global variable alignment: align = self.get_abi_sizeof(lldtype); gv.align = 2 (align - 1).bit_length. "Context" refers to the CUDATargetContext (the LLVM lowering context for CUDA compilation). No direct connection to CUDA driver contexts (numba.cuda.current_context), but rather the compiler target context. Numba's built-in CUDA target is deprecated; development continues in NVIDIA/numba-cuda, which inherits similar structure. These are low-level LLVM/llvmlite APIs used by Numba for PTX generation, ensuring correct type sizing/alignment per NVVM ABI for device code.
Citations:
- 1: https://github.com/numba/numba/blob/main/numba/cuda/target.py
- 2: https://github.com/numba/numba/blob/master/numba/cuda/target.py
- 3: https://github.com/numba/llvmlite/blob/master/docs/source/user-guide/binding/target-information.rst
- 4: https://llvmlite.pydata.org/en/v0.36.0/user-guide/binding/target-information.html
- 5: https://llvmlite.pydata.org/en/v0.42.0/user-guide/binding/target-information.html
🏁 Script executed:

```bash
#!/bin/bash
# Check if there's a wrapper or monkey-patch for TargetData in the codebase
rg 'abi_alignment|abi_size' -B3 -A1 --type=py
```

Repository: NVIDIA/numbast
Length of output: 1671
🏁 Script executed:

```bash
#!/bin/bash
# Search for alloca.align usage patterns in llvmlite/numba code
rg 'alloca.*\.align|\.align\s*=' numbast/src/numbast/callconv.py -B2 -A2
```

Repository: NVIDIA/numbast
Length of output: 1031
🌐 Web query:
llvmlite alloca instruction align attribute setter
💡 Result:
llvmlite's IRBuilder.alloca method does not support setting an explicit align attribute at creation time. Official documentation shows alloca(typ, size=None, name='') with no align parameter. A feature request to add an "align" option exists (open as of 2025) because llvmlite defaults to target-determined alignment, unlike LLVM IR which supports optional align. No public setter method like setAlignment is exposed on the AllocaInstr object in llvmlite's IR layer. In the binding layer (llvmlite.binding), ValueRef exposes general attributes (name, linkage, etc.) but no specific alignment setter for alloca instructions. In native LLVM C++, AllocaInst has setAlignment(Align), but this is not directly accessible via llvmlite's high-level IR builder API. Alignment is implicitly handled by the allocated type's natural alignment or data layout defaults. Workarounds: Use load/store with explicit align parameters (supported), or parse custom LLVM IR strings via binding.parse_assembly with manual "align N" in textual IR.
Citations:
- 1: Add "align" option for IRBuilder.alloca (numba/llvmlite#1040)
- 2: https://llvmlite.pydata.org/en/latest/user-guide/ir/ir-builder.html
- 3: http://llvm.org/docs/doxygen/html/classllvm_1_1AllocaInst.html
- 4: https://llvmlite.pydata.org/en/latest/user-guide/binding/value-references.html
🏁 Script executed:

```bash
#!/bin/bash
# Check if there are any tests for this code
fd -e '*.py' | xargs grep -l 'FunctionCallConv\|BaseCallConv' | head -5
```

Repository: NVIDIA/numbast
Length of output: 40
Fix incorrect llvmlite API method names and alloca alignment setting.

The code has two critical bugs:

1. Incorrect method names: `_dl.abi_alignment()` and `_dl.abi_size()` do not exist in llvmlite. The correct method names are `_dl.get_abi_alignment()` and `_dl.get_abi_size()` (with "get_" prefix).
2. Invalid `alloca.align` attribute: llvmlite's `AllocaInstr` does not expose an `.align` attribute setter. The alignment feature is not supported in llvmlite's high-level IR API.

Change lines 117–119, 168–170, 193–195, and 216–218 to use the correct method names:

- `_dl.abi_alignment(...)` → `_dl.get_abi_alignment(...)`
- `_dl.abi_size(...)` → `_dl.get_abi_size(...)`

For alignment, either use llvmlite's binding layer to parse custom LLVM IR with explicit "align N" directives, or rely on the target data layout's default alignment and remove the `.align` assignments.
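As a sanity check of the corrected call pattern, the alignment formula can be exercised against a stub object that mimics the two documented `TargetData` query methods. The stub class and helper name below are illustrative only, not Numbast or llvmlite code:

```python
class FakeTargetData:
    """Stub standing in for llvmlite.binding.TargetData (illustrative only).

    Real code would use context.target_data from the Numba CUDA target
    context; only the two documented query methods are mimicked here.
    """

    def __init__(self, sizes, aligns):
        self._sizes, self._aligns = sizes, aligns

    def get_abi_size(self, ty):        # documented llvmlite name ("get_" prefix)
        return self._sizes[ty]

    def get_abi_alignment(self, ty):   # documented llvmlite name ("get_" prefix)
        return self._aligns[ty]


def cuda_alloca_align(target_data, ty):
    """The PR's formula, rewritten against the get_-prefixed API."""
    return max(
        target_data.get_abi_alignment(ty),
        min(target_data.get_abi_size(ty), 16),
    )


# float members give a 4 B ABI alignment; the formula bumps it to sizeof.
td = FakeTargetData(
    sizes={"float2": 8, "float4": 16},
    aligns={"float2": 4, "float4": 4},
)
print(cuda_alloca_align(td, "float2"))  # 8
print(cuda_alloca_align(td, "float4"))  # 16
```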
CUDA vector types (`float2`, `float4`, etc.) carry `__align__(N)` attributes that require N-byte alignment (float2 → 8 B, float4 → 16 B). LLVM represents them as anonymous structs whose ABI alignment defaults to the element alignment (4 B for floats), not the vector alignment. When `FunctionCallConv._lower_impl` allocates stack slots via `builder.alloca` / `cgutils.alloca_once` without an explicit alignment, LLVM emits 4-byte-aligned allocas. The NVRTC-compiled shim then tries to perform a vector load/store (e.g. `ld.global.v2.f32` for float2) on that 4-byte-aligned pointer, which violates the 8-byte alignment requirement and raises `cudaErrorMisalignedAddress` at runtime.

Fix: after every alloca in `_lower_impl`, set

```python
alloca.align = max(dl.abi_alignment(ty), min(dl.abi_size(ty), 16))
```

The cap at 16 bytes covers float4/int4 (the widest standard CUDA vector types) without over-aligning large user-defined structs. The four sites fixed are:

1. `retval_ptr` — the function return-value slot
2. visible-arg ptrs (no-intent-plan path)
3. `out_return` ptrs (intent-plan path)
4. visible-arg ptrs (intent-plan path)

Fixes `optixGetTriangleBarycentrics()` (float2 return) and any other Numbast binding that returns or accepts a CUDA vector type.
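Stripped of the LLVM plumbing, the heuristic's arithmetic can be checked on its own. The size/alignment pairs below are the standard CUDA ones assumed in the commit message (4 B element alignment for float members), not values read from an actual NVVM data layout:

```python
def slot_align(abi_align: int, abi_size: int) -> int:
    # max(abi_align, min(abi_size, 16)): raise small slots to their full
    # size so vector loads/stores are legal, but cap the bump at 16 bytes.
    return max(abi_align, min(abi_size, 16))


print(slot_align(4, 8))   # float2: 8 B struct, 4 B element align -> 8
print(slot_align(4, 16))  # float4: 16 B struct                   -> 16
print(slot_align(8, 64))  # 64 B user struct: capped at 16, not bumped to 64
```

Note the cap does not leave a large struct entirely untouched: an 8-byte-aligned 64 B struct is still raised to 16, just never beyond it.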
21c668b to
4fc0c38
Compare
♻️ Duplicate comments (1)
numbast/src/numbast/callconv.py (1)
135-138: 🧹 Nitpick | 🔵 Trivial

Deduplicate the repeated alignment formula into a helper.

The same expression is repeated four times; centralizing it will reduce drift risk and simplify future ABI updates.

Refactor sketch

```diff
+def _set_cuda_alloca_align(alloca, value_ty, target_data):
+    alloca.align = max(
+        target_data.abi_alignment(value_ty),
+        min(target_data.abi_size(value_ty), 16),
+    )
 ...
-    _dl = context.target_data
-    retval_ptr.align = max(
-        _dl.abi_alignment(retval_ty), min(_dl.abi_size(retval_ty), 16)
-    )
+    _set_cuda_alloca_align(retval_ptr, retval_ty, context.target_data)
 ...
-    _dl = context.target_data
-    ptr.align = max(
-        _dl.abi_alignment(vty), min(_dl.abi_size(vty), 16)
-    )
+    _set_cuda_alloca_align(ptr, vty, context.target_data)
```

Also applies to: 186-189, 211-214, 234-237
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@numbast/src/numbast/callconv.py`:
- Around line 135-138: Several places set retval_ptr.align using the same
expression; extract that logic into a small helper (e.g., abi_retval_align(dl,
ty) or _compute_retval_align) that takes the data layout/target_data and the
type and returns max(dl.abi_alignment(ty), min(dl.abi_size(ty), 16)); then
replace each repeated expression (the assignments to retval_ptr.align found
where context.target_data/_dl and retval_ty are used) with a call to this new
helper (pass context.target_data or _dl and retval_ty). Ensure the helper is
placed in the same module (callconv.py) and update all four locations (the
occurrences setting retval_ptr.align) to use it.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 7a63c268-ad04-4299-9645-7c500649845e
📒 Files selected for processing (1)
numbast/src/numbast/callconv.py
…uristic

The previous fix used `max(abi_alignment, min(sizeof, 16))` to guess the required alloca alignment for CUDA vector types. This heuristic works for power-of-2 sized types (float2, float4) but is incorrect for non-power-of-2 types like float3/uint3 (sizeof=12 → would produce alignment=12, which is not a valid power-of-2 LLVM alignment).

Numbast already propagates `alignof_` from ast_canopy onto user-defined bound structs, and already uses `getattr(argty, "alignof_", None)` for load/store instructions. Apply the same convention to alloca: check `alignof_` on the Numba type and set it when present; when absent, leave LLVM's default ABI alignment (correct for scalars and structs without an explicit `__align__` attribute).

Callers registering built-in CUDA vector types in CTYPE_MAPS must set `alignof_` on the Numba type to match the `__align__(N)` in the CUDA headers (e.g. `float32x2.alignof_ = 8` for float2's `__align__(8)`). This mirrors how ast_canopy-derived struct types already work.
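A minimal sketch contrasting the two behaviours described in this commit message. The type classes and function names are hypothetical stand-ins, not Numbast types:

```python
def old_heuristic(abi_align: int, abi_size: int) -> int:
    # the earlier fix: valid only for power-of-two sizes
    return max(abi_align, min(abi_size, 16))


def alloca_align(numba_ty):
    # the commit's convention: honor an explicit alignof_ when present,
    # otherwise return None so LLVM's default ABI alignment is kept
    return getattr(numba_ty, "alignof_", None)


class Float2Type:        # stand-in for a CTYPE_MAPS-registered float2
    alignof_ = 8         # matches __align__(8) in the CUDA headers


class PlainStructType:   # struct without __align__ -> no alignof_ set
    pass


# float3: sizeof=12, element align 4 -> the heuristic yields 12,
# which is not a valid power-of-two LLVM alignment
bad = old_heuristic(4, 12)
print(bad, (bad & (bad - 1)) == 0)      # 12 False

print(alloca_align(Float2Type()))       # 8
print(alloca_align(PlainStructType()))  # None
```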
♻️ Duplicate comments (1)
numbast/src/numbast/callconv.py (1)
4-4: ⚠️ Potential issue | 🟡 Minor

Remove transient marker comment at Line 4.

`# NUMBAST_RETVAL_ALIGN_FIX_APPLIED` looks like a tracking artifact and should not remain in production source.

Proposed fix

```diff
-# NUMBAST_RETVAL_ALIGN_FIX_APPLIED
 from numbast.args import prepare_ir_types
```
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@numbast/src/numbast/callconv.py`:
- Line 4: Remove the transient tracking comment "#
NUMBAST_RETVAL_ALIGN_FIX_APPLIED" from the top of callconv.py; simply delete
that standalone marker so the source contains no leftover transient artifact
(search for the exact string NUMBAST_RETVAL_ALIGN_FIX_APPLIED to locate it).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 18c274b5-7749-4f9e-8607-5769df380ab4
📒 Files selected for processing (1)
numbast/src/numbast/callconv.py
Problem

CUDA vector types (`float2`, `float4`, etc.) carry `__align__(N)` attributes requiring N-byte alignment (`float2` → 8 B, `float4` → 16 B). LLVM represents them as anonymous structs whose ABI alignment defaults to the element alignment (4 B for `float`), not the vector alignment.

When `FunctionCallConv._lower_impl` allocates stack slots via `builder.alloca` / `cgutils.alloca_once` without an explicit alignment, LLVM emits 4-byte-aligned allocas. The NVRTC-compiled shim then performs a vector load/store (e.g. `ld.global.v2.f32` for `float2`) on that 4-byte-aligned pointer, violating the 8-byte alignment requirement and raising `cudaErrorMisalignedAddress` at runtime.

Concrete example: `optixGetTriangleBarycentrics()` returns `float2`. With the default alloca alignment, every call crashes with `cudaErrorMisalignedAddress`.

Fix

After every `alloca` / `alloca_once` in `_lower_impl`, set:

```python
alloca.align = max(dl.abi_alignment(ty), min(dl.abi_size(ty), 16))
```

The cap at 16 bytes covers `float4`/`int4` (the widest standard CUDA vector types) without over-aligning large user-defined structs.

Sites fixed

- `retval_ptr` — the function return-value slot
- `out_return` ptrs (intent-plan path)

Testing

Verified with an OptiX `closesthit` program calling `optixGetTriangleBarycentrics()` (`float2` return) via a Numbast auto-binding — previously crashed with `cudaErrorMisalignedAddress`, now returns correct barycentric coordinates.

🤖 Generated with Claude Code
Summary by CodeRabbit