
fix(callconv): align alloca slots to CUDA vector type ABI requirements #321

Open
isVoid wants to merge 2 commits into NVIDIA:main from isVoid:fix/callconv-alloca-alignment-cuda-vector-types

Conversation

@isVoid
Collaborator

@isVoid isVoid commented Apr 15, 2026

Problem

CUDA vector types (float2, float4, etc.) carry __align__(N) attributes requiring N-byte alignment (float2 → 8 B, float4 → 16 B). LLVM represents them as anonymous structs whose ABI alignment defaults to the element alignment (4 B for float), not the vector alignment.

When FunctionCallConv._lower_impl allocates stack slots via builder.alloca / cgutils.alloca_once without an explicit alignment, LLVM emits 4-byte-aligned allocas. The NVRTC-compiled shim then performs a vector load/store (e.g. ld.global.v2.f32 for float2) on that 4-byte-aligned pointer, violating the 8-byte alignment requirement and raising cudaErrorMisalignedAddress at runtime.

Concrete example: optixGetTriangleBarycentrics() returns float2. With the default alloca alignment, every call crashes with cudaErrorMisalignedAddress.
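The element-alignment default described above can be reproduced directly with llvmlite. The sketch below is standalone and assumes llvmlite is installed; the data layout string is a simplified stand-in for NVVM's (both give f32 a 4-byte ABI alignment). A float2-like anonymous struct reports an ABI alignment of only 4 bytes despite an 8-byte size:

```python
from llvmlite import ir
import llvmlite.binding as llvm

# One-time LLVM initialization (required before any IR is parsed internally).
llvm.initialize()
llvm.initialize_native_target()
llvm.initialize_native_asmprinter()

# Simplified data layout; NVVM's real layout also aligns f32 to 4 bytes.
td = llvm.create_target_data("e-p:64:64-i64:64")

# LLVM's view of float2: an anonymous {float, float} struct.
float2_ty = ir.LiteralStructType([ir.FloatType(), ir.FloatType()])

print(float2_ty.get_abi_size(td))       # 8 bytes
print(float2_ty.get_abi_alignment(td))  # 4 bytes: element alignment, not 8
```

This is exactly the mismatch the PR targets: the alloca inherits the 4-byte ABI alignment while the vector load in the shim assumes 8.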

Fix

After every alloca / alloca_once in _lower_impl, set:

_dl = context.target_data
alloca_instr.align = max(_dl.abi_alignment(ty), min(_dl.abi_size(ty), 16))

The cap at 16 bytes covers float4/int4 (the widest standard CUDA vector types) without over-aligning large user-defined structs.
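The formula can be restated as a plain function to make the cap's behavior concrete. This is a sketch of the arithmetic only; the name cuda_alloca_align is illustrative, not taken from the patch:

```python
def cuda_alloca_align(abi_align: int, abi_size: int) -> int:
    """Alignment chosen by the patch: at least the ABI alignment,
    raised toward the type's size but capped at 16 bytes."""
    return max(abi_align, min(abi_size, 16))

print(cuda_alloca_align(4, 8))    # float2: 8
print(cuda_alloca_align(4, 16))   # float4: 16
print(cuda_alloca_align(8, 256))  # large struct: capped at 16, not 256
```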

Sites fixed

  1. retval_ptr — the function return-value slot
  2. Visible-arg ptrs (no-intent-plan path)
  3. out_return ptrs (intent-plan path)
  4. Visible-arg ptrs (intent-plan path)

Testing

Verified with an OptiX closesthit program calling optixGetTriangleBarycentrics() (float2 return) via a Numbast auto-binding — previously crashed with cudaErrorMisalignedAddress, now returns correct barycentric coordinates.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Fixed alignment handling for return value slots and stack allocations, ensuring allocations use the proper type alignment.
    • Improved alignment for out-return pointers and non-passthrough arguments, enhancing stability and ABI compliance for CUDA and similar targets.

@copy-pr-bot
Contributor

copy-pr-bot bot commented Apr 15, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Apr 15, 2026

📝 Walkthrough

Walkthrough

The shim lowering in numbast/src/numbast/callconv.py now reads alignof_ from the C++ return type and sets retval_ptr.align when present. Stack alloca results for non-passthrough arguments and out-return pointer targets also get ptr.align from the corresponding Numba type’s alignof_ (when present). Store alignment uses the local _nb_align variable. A file-level marker comment was added.

Changes

Cohort: Callconv alignment updates
File(s): numbast/src/numbast/callconv.py
Summary: Set explicit alloca alignment for the C++ non-void return slot by reading cxx_return_type.alignof_ and assigning retval_ptr.align. Apply ptr.align to cgutils.alloca_once(...) results using each Numba type's alignof_ when present. Use _nb_align for builder.store(..., align=...). Add the marker comment # NUMBAST_RETVAL_ALIGN_FIX_APPLIED.
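The store-alignment part of this walkthrough relies on llvmlite's explicit align keyword, which is an accepted parameter of IRBuilder.store (the review below suggests alloca, by contrast, had no such parameter at the time). A minimal sketch, assuming llvmlite is installed:

```python
from llvmlite import ir

mod = ir.Module(name="align_demo")
fn = ir.Function(mod, ir.FunctionType(ir.VoidType(), []), name="f")
builder = ir.IRBuilder(fn.append_basic_block())

ptr = builder.alloca(ir.FloatType(), name="slot")
# Explicit store alignment, in the style of the _nb_align-driven stores.
st = builder.store(ir.Constant(ir.FloatType(), 0.0), ptr, align=8)
builder.ret_void()

print("align 8" in str(st))  # True: the store instruction carries "align 8"
```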

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I nibble bytes and hop in kind,

Align the returns so none misbind,
Allocas dressed in tidy lines,
Stores obey my little signs,
A rabbit's patch keeps stacks aligned. 🥕

🚥 Pre-merge checks | ✅ 3 passed
  • Description check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title accurately describes the main change: fixing alignment of stack allocation slots to meet CUDA vector type ABI requirements, which is the core problem addressed in the PR.
  • Docstring coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@numbast/src/numbast/callconv.py`:
- Around line 111-119: Extract the duplicated CUDA-allocation alignment logic
into a helper function and use it in all four places: create a helper like
_set_cuda_alloca_align(alloca, ty, target_data) that sets alloca.align =
max(target_data.abi_alignment(ty), min(target_data.abi_size(ty), 16)); replace
the repeated expressions (currently using _dl = context.target_data and setting
retval_ptr.align or other alloca.align with max(_dl.abi_alignment(...),
min(_dl.abi_size(...), 16))) by calling this helper with the corresponding
alloca (e.g., retval_ptr), type (e.g., retval_ty), and context.target_data to
centralize the logic and avoid repetition.
- Line 4: Remove the development/tracking marker comment
"NUMBAST_RETVAL_ALIGN_FIX_APPLIED" from the top of
numbast/src/numbast/callconv.py; simply delete that standalone comment line so
the file contains only production code/comments, and ensure no leftover blank
line or stray artifact remains.
- Around line 116-119: Update calls to the target data API and remove
unsupported alloca alignment assignments: replace _dl.abi_alignment(...) and
_dl.abi_size(...) with _dl.get_abi_alignment(...) and _dl.get_abi_size(...)
wherever used (e.g., the calculations around retval_ptr, and the other
occurrences at the same sites later in the file), and remove assignments to
AllocaInstr.align (e.g., retval_ptr.align = ..., and the other .align
assignments) since llvmlite IR AllocaInstr does not support setting .align; if
explicit "align N" is required use the binding layer to emit raw LLVM IR
instead, otherwise rely on the target data defaults and omit the .align
statements.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 09ca3211-42c8-4afe-84bd-ab4a26b03bf8

📥 Commits

Reviewing files that changed from the base of the PR and between f29a975 and 21c668b.

📒 Files selected for processing (1)
  • numbast/src/numbast/callconv.py

# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

# NUMBAST_RETVAL_ALIGN_FIX_APPLIED
Contributor


⚠️ Potential issue | 🟡 Minor

Remove debugging marker comment.

This marker comment appears to be a development/tracking artifact (possibly from Claude Code generation) and should not be committed to the production codebase.

Proposed fix
-# NUMBAST_RETVAL_ALIGN_FIX_APPLIED
 from numbast.args import prepare_ir_types
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@numbast/src/numbast/callconv.py` at line 4, Remove the development/tracking
marker comment "NUMBAST_RETVAL_ALIGN_FIX_APPLIED" from the top of
numbast/src/numbast/callconv.py; simply delete that standalone comment line so
the file contains only production code/comments, and ensure no leftover blank
line or stray artifact remains.

Comment thread numbast/src/numbast/callconv.py Outdated
Comment on lines +111 to +119
# Align the retval slot to the CUDA ABI requirement.
# CUDA vector types (float2=8B, float4=16B) need alignment
# == sizeof(type), but LLVM struct ABI alignment is only the
# element alignment (4B for floats). max(abi_align, min(sz,16))
# covers float2→8, float4→16 without over-aligning large structs.
_dl = context.target_data
retval_ptr.align = max(
_dl.abi_alignment(retval_ty), min(_dl.abi_size(retval_ty), 16)
)
Contributor


🧹 Nitpick | 🔵 Trivial

Extract duplicated alignment logic into a helper function.

The alignment calculation max(_dl.abi_alignment(vty), min(_dl.abi_size(vty), 16)) is duplicated four times. Additionally, context.target_data is fetched repeatedly. Consider extracting this into a helper to improve maintainability and reduce the chance of inconsistent changes.

Proposed refactor

Add a module-level or nested helper function:

def _set_cuda_alloca_align(alloca: ir.Instruction, ty: ir.Type, target_data) -> None:
    """Set alloca alignment to satisfy CUDA vector type ABI requirements.
    
    CUDA vector types (float2=8B, float4=16B) need alignment == sizeof(type),
    but LLVM struct ABI alignment is only element alignment (4B for floats).
    Formula: max(abi_align, min(size, 16)) covers vectors without over-aligning.
    """
    alloca.align = max(
        target_data.abi_alignment(ty), min(target_data.abi_size(ty), 16)
    )

Then replace each occurrence:

         retval_ty = context.get_value_type(cxx_return_type)
         retval_ptr = builder.alloca(retval_ty, name="retval")
-            # Align the retval slot to the CUDA ABI requirement.
-            # CUDA vector types (float2=8B, float4=16B) need alignment
-            # == sizeof(type), but LLVM struct ABI alignment is only the
-            # element alignment (4B for floats). max(abi_align, min(sz,16))
-            # covers float2→8, float4→16 without over-aligning large structs.
-            _dl = context.target_data
-            retval_ptr.align = max(
-                _dl.abi_alignment(retval_ty), min(_dl.abi_size(retval_ty), 16)
-            )
+        _set_cuda_alloca_align(retval_ptr, retval_ty, context.target_data)

Apply the same pattern at lines 166-170, 191-195, and 214-218.

Also applies to: 167-170, 192-195, 215-218

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@numbast/src/numbast/callconv.py` around lines 111 - 119, Extract the
duplicated CUDA-allocation alignment logic into a helper function and use it in
all four places: create a helper like _set_cuda_alloca_align(alloca, ty,
target_data) that sets alloca.align = max(target_data.abi_alignment(ty),
min(target_data.abi_size(ty), 16)); replace the repeated expressions (currently
using _dl = context.target_data and setting retval_ptr.align or other
alloca.align with max(_dl.abi_alignment(...), min(_dl.abi_size(...), 16))) by
calling this helper with the corresponding alloca (e.g., retval_ptr), type
(e.g., retval_ty), and context.target_data to centralize the logic and avoid
repetition.

Comment thread numbast/src/numbast/callconv.py Outdated
Comment on lines +116 to +119
_dl = context.target_data
retval_ptr.align = max(
_dl.abi_alignment(retval_ty), min(_dl.abi_size(retval_ty), 16)
)
Contributor


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check llvmlite usage patterns in the codebase for similar alignment patterns
rg -nC3 'target_data\.abi_alignment|target_data\.abi_size|\.align\s*=' --type=py

Repository: NVIDIA/numbast

Length of output: 2326


🌐 Web query:

llvmlite TargetData abi_alignment abi_size methods

💡 Result:

In llvmlite, the llvmlite.binding.TargetData class provides methods to query ABI size and alignment for LLVM types based on a target data layout string. It is instantiated via llvmlite.binding.create_target_data(data_layout) where data_layout is an LLVM data layout string. Key methods related to ABI size and alignment: - get_abi_size(type): Returns the ABI-mandated size (in bytes) of the given TypeRef object as an integer. - get_abi_alignment(type): Returns the ABI-mandated alignment (in bytes) of the given TypeRef object as an integer. Additional related methods: - get_pointee_abi_size(type): Like get_abi_size, but for the type pointed to by an LLVM pointer type (useful for globals). - get_pointee_abi_alignment(type): Like get_abi_alignment, but for the pointed-to type of a pointer. - get_element_offset(type, position): Computes the byte offset of the struct element at the given position. These methods are wrappers around LLVM C API functions like LLVMPY_ABISizeOfType and LLVMPY_ABIAlignmentOfType. Note: llvmlite.ir.Type instances also provide get_abi_size(target_data) and get_abi_alignment(target_data) methods that delegate to a provided TargetData instance. Example usage: from llvmlite import binding td = binding.create_target_data('e-p:64:64-i64:64-f80:128-n8:16:32') size = td.get_abi_size(some_type_ref) alignment = td.get_abi_alignment(some_type_ref)



🏁 Script executed:

#!/bin/bash
# Find project dependencies to identify llvmlite version
fd -e 'requirements\.txt' -e 'setup\.py' -e 'setup\.cfg' -e 'pyproject\.toml' | head -10

Repository: NVIDIA/numbast

Length of output: 40


🏁 Script executed:

#!/bin/bash
# Check the context object to understand target_data type
rg -nC5 'context\.target_data' numbast/src/numbast/callconv.py | head -30

Repository: NVIDIA/numbast

Length of output: 1580


🌐 Web query:

llvmlite abi_alignment abi_size method names API version

💡 Result:

In llvmlite, the llvmlite.ir.Type base class (and subclasses for LLVM types like IntType, PointerType, etc.) provides methods get_abi_size(target_data) and get_abi_alignment(target_data) to query the ABI size (in bytes) and minimum ABI alignment (in bytes) of the type according to a given llvmlite.binding.TargetData instance representing the data layout. These methods internally convert the Python IR type to an LLVM type via _get_ll_global_value_type(target_data) (which creates a temporary module and global variable), then call LLVM C++ APIs: target_data.get_abi_size(llty) and target_data.get_abi_alignment(llty). Related methods include get_element_offset(target_data, ndx) for struct/array element byte offsets. In the binding layer (llvmlite.binding), TargetData exposes equivalent methods get_abi_size(type) (where type is a TypeRef), get_pointee_abi_size(type), get_pointee_abi_alignment(type), and get_element_offset(type, position) directly on LLVM TypeRef objects. llvmlite's current stable version is 0.47.0 (released March 31, 2026), compatible with LLVM 20.x (since 0.45.0). No breaking API changes to these methods are noted in recent release notes or changelog.



🏁 Script executed:

#!/bin/bash
# Check imports and context class definition
head -50 numbast/src/numbast/callconv.py

Repository: NVIDIA/numbast

Length of output: 1504


🏁 Script executed:

#!/bin/bash
# Find where context comes from and what target_data is
rg -nC3 'def.*context|class.*Context' numbast/src/numbast/callconv.py | head -40

Repository: NVIDIA/numbast

Length of output: 890


🏁 Script executed:

#!/bin/bash
# Search for requirements or dependencies files
find . -name 'requirements*.txt' -o -name 'setup.py' -o -name 'pyproject.toml' | xargs cat 2>/dev/null | grep -i llvmlite

Repository: NVIDIA/numbast

Length of output: 40


🌐 Web query:

llvmlite TargetData abi_size abi_alignment method without get prefix

💡 Result:

In llvmlite's binding module, the TargetData class provides methods abi_size and abi_alignment without the "get_" prefix. These are the underlying CFFI bindings that directly wrap LLVM C API functions like LLVMPY_ABISizeOfType and LLVMPY_ABIAlignmentOfType. The official documentation lists higher-level methods with "get_" prefixes: get_abi_size(type), get_abi_alignment(type), get_pointee_abi_size(type), and get_pointee_abi_alignment(type). However, the source code in llvmlite/binding/targets.py confirms the existence of the direct methods: def get_abi_size(self, ty): return ffi.lib.LLVMPY_ABISizeOfType(self, ty) def get_abi_alignment(self, ty): return ffi.lib.LLVMPY_ABIAlignmentOfType(self, ty) No, wait—the source shows def get_abi_size and def get_abi_alignment on TargetData, but the query asks for methods without the "get" prefix. Looking closely at the source snippet: the Python methods are named get_abi_size and get_abi_alignment, but the query specifies "without get prefix", suggesting perhaps older versions or direct attribute access. Actually, upon review, the official docs consistently show get_abi_size and get_abi_alignment as the public methods on TargetData (e.g., v0.42.0, latest dev docs). But the GitHub source code explicitly defines: def get_abi_size(self, ty): ... def get_abi_alignment(self, ty): ... There are no methods named abi_size or abi_alignment without "get_". However, llvmlite.ir.Type objects have methods get_abi_size(target_data) and get_abi_alignment(target_data), which also use "get_" prefix. The query may refer to LLVM C API style (ABISizeOfType, ABIAlignmentOfType without "get"), exposed via CFFI as the underlying lib functions, but the Python class methods on TargetData are get_abi_size and get_abi_alignment. No evidence of TargetData methods literally named abi_size or abi_alignment. In contrast, llvmpy (different library) has TargetData.abi_size(ty) and abi_alignment(ty) without "get_" (from result 3). 
llvmlite appears to consistently use "get_" prefix for these TargetData methods across all versions (0.42.0, 0.48 dev, source code). Thus, there are no such methods without "get" prefix in llvmlite TargetData; the standard methods are get_abi_size and get_abi_alignment.



🏁 Script executed:

#!/bin/bash
# Search for context.target_data usage in numba-related code
rg 'context\.target_data' -A2 -B2 --type=py | head -50

Repository: NVIDIA/numbast

Length of output: 1770


🌐 Web query:

numba cuda context target_data abi_alignment abi_size

💡 Result:

In Numba's CUDA target (numba.cuda), the CUDATargetContext class provides a target_data property that returns an llvmlite TargetData object specific to the NVVM (NVIDIA's LLVM backend for PTX/CUDA) data layout. This TargetData is created via ll.create_target_data(nvvm.NVVM.data_layout), where nvvm.NVVM.data_layout gives the NVVM data layout string (e.g., something like "e-p:64:64-p1:64:64-p2:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024"). The TargetData object exposes methods for querying ABI (Application Binary Interface) properties of LLVM types: - context.target_data.get_abi_size(llvm_type): Returns the ABI-mandated size in bytes of the given LLVM type. - context.target_data.get_abi_alignment(llvm_type): Returns the ABI-mandated alignment in bytes of the given LLVM type. These are used internally by Numba for accurate code generation, such as in make_constant_array where alignment is computed from get_abi_sizeof (alias for get_abi_size) to set global variable alignment: align = self.get_abi_sizeof(lldtype); gv.align = 2 (align - 1).bit_length. "Context" refers to the CUDATargetContext (the LLVM lowering context for CUDA compilation). No direct connection to CUDA driver contexts (numba.cuda.current_context), but rather the compiler target context. Numba's built-in CUDA target is deprecated; development continues in NVIDIA/numba-cuda, which inherits similar structure. These are low-level LLVM/llvmlite APIs used by Numba for PTX generation, ensuring correct type sizing/alignment per NVVM ABI for device code.



🏁 Script executed:

#!/bin/bash
# Check if there's a wrapper or monkey-patch for TargetData in the codebase
rg 'abi_alignment|abi_size' -B3 -A1 --type=py

Repository: NVIDIA/numbast

Length of output: 1671


🏁 Script executed:

#!/bin/bash
# Search for alloca.align usage patterns in llvmlite/numba code
rg 'alloca.*\.align|\.align\s*=' numbast/src/numbast/callconv.py -B2 -A2

Repository: NVIDIA/numbast

Length of output: 1031


🌐 Web query:

llvmlite alloca instruction align attribute setter

💡 Result:

llvmlite's IRBuilder.alloca method does not support setting an explicit align attribute at creation time. Official documentation shows alloca(typ, size=None, name='') with no align parameter. A feature request to add an "align" option exists (open as of 2025) because llvmlite defaults to target-determined alignment, unlike LLVM IR which supports optional align. No public setter method like setAlignment is exposed on the AllocaInstr object in llvmlite's IR layer. In the binding layer (llvmlite.binding), ValueRef exposes general attributes (name, linkage, etc.) but no specific alignment setter for alloca instructions. In native LLVM C++, AllocaInst has setAlignment(Align), but this is not directly accessible via llvmlite's high-level IR builder API. Alignment is implicitly handled by the allocated type's natural alignment or data layout defaults. Workarounds: Use load/store with explicit align parameters (supported), or parse custom LLVM IR strings via binding.parse_assembly with manual "align N" in textual IR.



🏁 Script executed:

#!/bin/bash
# Check if there are any tests for this code
fd -e '*.py' | xargs grep -l 'FunctionCallConv\|BaseCallConv' | head -5

Repository: NVIDIA/numbast

Length of output: 40


Fix incorrect llvmlite API method names and alloca alignment setting.

The code has two critical bugs:

  1. Incorrect method names: _dl.abi_alignment() and _dl.abi_size() do not exist in llvmlite. The correct method names are _dl.get_abi_alignment() and _dl.get_abi_size() (with "get_" prefix).

  2. Invalid alloca.align attribute: llvmlite's AllocaInstr does not expose an .align attribute setter. The alignment feature is not supported in llvmlite's high-level IR API.

Change lines 117–119, 168–170, 193–195, and 216–218 to use the correct method names:

  • _dl.abi_alignment(...) → _dl.get_abi_alignment(...)
  • _dl.abi_size(...) → _dl.get_abi_size(...)

For alignment, either use llvmlite's binding layer to parse custom LLVM IR with explicit "align N" directives, or rely on the target data layout's default alignment and remove the .align assignments.
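The reviewer's corrected, get_-prefixed spellings can be exercised through llvmlite's ir.Type convenience wrappers, which delegate to TargetData.get_abi_alignment / get_abi_size. A sketch under the assumption that llvmlite is installed; the data layout string is a simplified stand-in:

```python
from llvmlite import ir
import llvmlite.binding as llvm

llvm.initialize()
llvm.initialize_native_target()
llvm.initialize_native_asmprinter()

td = llvm.create_target_data("e-p:64:64-i64:64")
float4_ty = ir.LiteralStructType([ir.FloatType()] * 4)  # float4-like struct

# The get_-prefixed API the review points to (here via ir.Type wrappers,
# which forward to TargetData.get_abi_alignment / get_abi_size).
align = max(
    float4_ty.get_abi_alignment(td),
    min(float4_ty.get_abi_size(td), 16),
)
print(align)  # 16 for the float4-like struct
```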

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@numbast/src/numbast/callconv.py` around lines 116 - 119, Update calls to the
target data API and remove unsupported alloca alignment assignments: replace
_dl.abi_alignment(...) and _dl.abi_size(...) with _dl.get_abi_alignment(...) and
_dl.get_abi_size(...) wherever used (e.g., the calculations around retval_ptr,
and the other occurrences at the same sites later in the file), and remove
assignments to AllocaInstr.align (e.g., retval_ptr.align = ..., and the other
.align assignments) since llvmlite IR AllocaInstr does not support setting
.align; if explicit "align N" is required use the binding layer to emit raw LLVM
IR instead, otherwise rely on the target data defaults and omit the .align
statements.

CUDA vector types (float2, float4, etc.) carry __align__(N) attributes
that require N-byte alignment (float2 → 8 B, float4 → 16 B).  LLVM
represents them as anonymous structs whose ABI alignment defaults to
the element alignment (4 B for float), not the vector alignment.

When FunctionCallConv._lower_impl allocates stack slots via
builder.alloca / cgutils.alloca_once without an explicit alignment,
LLVM emits 4-byte-aligned allocas.  The NVRTC-compiled shim then tries
to perform a vector load/store (e.g. ld.global.v2.f32 for float2) on
that 4-byte-aligned pointer, which violates the 8-byte alignment
requirement and raises cudaErrorMisalignedAddress at runtime.

Fix: after every alloca in _lower_impl, set
  alloca.align = max(dl.abi_alignment(ty), min(dl.abi_size(ty), 16))

The cap at 16 bytes covers float4/int4 (the widest standard CUDA
vector types) without over-aligning large user-defined structs.

The four sites fixed are:
  1. retval_ptr — the function return-value slot
  2. visible-arg ptrs (no-intent-plan path)
  3. out_return ptrs (intent-plan path)
  4. visible-arg ptrs (intent-plan path)

Fixes optixGetTriangleBarycentrics() (float2 return) and any other
Numbast binding that returns or accepts a CUDA vector type.
@isVoid isVoid force-pushed the fix/callconv-alloca-alignment-cuda-vector-types branch from 21c668b to 4fc0c38 on April 15, 2026 at 18:57
Contributor

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
numbast/src/numbast/callconv.py (1)

135-138: 🧹 Nitpick | 🔵 Trivial

Deduplicate the repeated alignment formula into a helper.

The same expression is repeated four times; centralizing it will reduce drift risk and simplify future ABI updates.

Refactor sketch
+def _set_cuda_alloca_align(alloca, value_ty, target_data):
+    alloca.align = max(
+        target_data.abi_alignment(value_ty),
+        min(target_data.abi_size(value_ty), 16),
+    )
...
-            _dl = context.target_data
-            retval_ptr.align = max(
-                _dl.abi_alignment(retval_ty), min(_dl.abi_size(retval_ty), 16)
-            )
+            _set_cuda_alloca_align(retval_ptr, retval_ty, context.target_data)
...
-                    _dl = context.target_data
-                    ptr.align = max(
-                        _dl.abi_alignment(vty), min(_dl.abi_size(vty), 16)
-                    )
+                    _set_cuda_alloca_align(ptr, vty, context.target_data)

Also applies to: 186-189, 211-214, 234-237

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@numbast/src/numbast/callconv.py` around lines 135 - 138, Several places set
retval_ptr.align using the same expression; extract that logic into a small
helper (e.g., abi_retval_align(dl, ty) or _compute_retval_align) that takes the
data layout/target_data and the type and returns max(dl.abi_alignment(ty),
min(dl.abi_size(ty), 16)); then replace each repeated expression (the
assignments to retval_ptr.align found where context.target_data/_dl and
retval_ty are used) with a call to this new helper (pass context.target_data or
_dl and retval_ty). Ensure the helper is placed in the same module (callconv.py)
and update all four locations (the occurrences setting retval_ptr.align) to use
it.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@numbast/src/numbast/callconv.py`:
- Around line 135-138: Several places set retval_ptr.align using the same
expression; extract that logic into a small helper (e.g., abi_retval_align(dl,
ty) or _compute_retval_align) that takes the data layout/target_data and the
type and returns max(dl.abi_alignment(ty), min(dl.abi_size(ty), 16)); then
replace each repeated expression (the assignments to retval_ptr.align found
where context.target_data/_dl and retval_ty are used) with a call to this new
helper (pass context.target_data or _dl and retval_ty). Ensure the helper is
placed in the same module (callconv.py) and update all four locations (the
occurrences setting retval_ptr.align) to use it.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 7a63c268-ad04-4299-9645-7c500649845e

📥 Commits

Reviewing files that changed from the base of the PR and between 21c668b and 4fc0c38.

📒 Files selected for processing (1)
  • numbast/src/numbast/callconv.py

…uristic

The previous fix used max(abi_alignment, min(sizeof, 16)) to guess the
required alloca alignment for CUDA vector types. This heuristic works
for power-of-2 sized types (float2, float4) but is incorrect for
non-power-of-2 types like float3/uint3 (sizeof=12 → would produce
alignment=12, which is not a valid power-of-2 LLVM alignment).

Numbast already propagates alignof_ from ast_canopy onto user-defined
bound structs, and already uses getattr(argty, "alignof_", None) for
load/store instructions. Apply the same convention to alloca: check
alignof_ on the Numba type and set it when present; when absent, leave
LLVM's default ABI alignment (correct for scalars and structs without
an explicit __align__ attribute).

Callers registering built-in CUDA vector types in CTYPE_MAPS must set
alignof_ on the Numba type to match the __align__(N) in the CUDA
headers (e.g. float32x2.alignof_ = 8 for float2's __align__(8)).
This mirrors how ast_canopy-derived struct types already work.
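The float3 failure of the old heuristic and the alignof_ convention described above can both be sketched without any LLVM machinery. The class names below are illustrative stand-ins for Numba types, not Numbast code:

```python
# Old heuristic: breaks for float3 (sizeof == 12).
def old_heuristic(abi_align: int, abi_size: int) -> int:
    return max(abi_align, min(abi_size, 16))

align = old_heuristic(4, 12)     # float3: abi_align=4, sizeof=12
print(align)                     # 12
print(align & (align - 1) == 0)  # False: 12 is not a power of two

# New convention: read alignof_ off the Numba type when present.
class Float32x2:   # stand-in for a bound vector type; float2 is __align__(8)
    alignof_ = 8

class Int32:       # scalar with no explicit __align__ attribute
    pass

def alloca_align_for(nbty):
    # None means: leave LLVM's default ABI alignment in place.
    return getattr(nbty, "alignof_", None)

print(alloca_align_for(Float32x2()))  # 8
print(alloca_align_for(Int32()))      # None
```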
Contributor

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
numbast/src/numbast/callconv.py (1)

4-4: ⚠️ Potential issue | 🟡 Minor

Remove transient marker comment at Line 4.

# NUMBAST_RETVAL_ALIGN_FIX_APPLIED looks like a tracking artifact and should not remain in production source.

Proposed fix
-# NUMBAST_RETVAL_ALIGN_FIX_APPLIED
 from numbast.args import prepare_ir_types
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@numbast/src/numbast/callconv.py` at line 4, Remove the transient tracking
comment "# NUMBAST_RETVAL_ALIGN_FIX_APPLIED" from the top of callconv.py; simply
delete that standalone marker so the source contains no leftover transient
artifact (search for the exact string NUMBAST_RETVAL_ALIGN_FIX_APPLIED to locate
it).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@numbast/src/numbast/callconv.py`:
- Line 4: Remove the transient tracking comment "#
NUMBAST_RETVAL_ALIGN_FIX_APPLIED" from the top of callconv.py; simply delete
that standalone marker so the source contains no leftover transient artifact
(search for the exact string NUMBAST_RETVAL_ALIGN_FIX_APPLIED to locate it).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 18c274b5-7749-4f9e-8607-5769df380ab4

📥 Commits

Reviewing files that changed from the base of the PR and between 4fc0c38 and f32141d.

📒 Files selected for processing (1)
  • numbast/src/numbast/callconv.py
