
fix(ci): cudaRoundMode typing failure in FP8 test#834

Merged
gmarkall merged 1 commit into NVIDIA:main from kaeun97:kaeun97/unblock-ci on Mar 12, 2026

kaeun97 (Contributor) commented Mar 11, 2026

Attempt to fix this issue.

cuda-bindings 13.2.0 changed `cudaRoundMode` from a standard Python `IntEnum` to a type built on `FastEnumMetaclass`. Numba's type inference cannot resolve `FastEnumMetaclass` types, which causes three FP8 tests to fail (ref).

This PR adds a local `IntEnum` for `cudaRoundMode` in `cuda_fp8.py`, matching the pattern already used for `saturation_t` and `fp8_interpretation_t`.
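A local definition in this style might look roughly like the sketch below. It assumes the member names and values follow the CUDA runtime's `cudaRoundMode` enum (`cudaRoundNearest`, `cudaRoundZero`, `cudaRoundPosInf`, `cudaRoundMinInf`); the code Numbast actually generates in `cuda_fp8.py` may differ. The point it shows is that the class is created by the stdlib `EnumMeta` and its members are plain `int`s, which is the shape Numba's existing enum typing already handles.

```python
from enum import EnumMeta, IntEnum


# Sketch only: names/values are assumed to mirror the CUDA runtime's
# cudaRoundMode enum; the Numbast-generated cuda_fp8.py may differ.
class cudaRoundMode(IntEnum):
    cudaRoundNearest = 0
    cudaRoundZero = 1
    cudaRoundPosInf = 2
    cudaRoundMinInf = 3


# The class object is built by the stdlib EnumMeta, and every member is also
# an int, so it is typed the same way as any other IntEnum.
print(type(cudaRoundMode) is EnumMeta)               # True
print(isinstance(cudaRoundMode.cudaRoundZero, int))  # True
print(int(cudaRoundMode.cudaRoundZero))              # 1
```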

If this works, let's file an issue on the Numbast side (if needed).

copy-pr-bot (bot) commented Mar 11, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

kaeun97 (Contributor, Author) commented Mar 11, 2026

Thanks in advance, @gmarkall. I was unable to reproduce the error locally, so I'll have to use CI to test this.

```python
    E5M2 = 1


class cudaRoundMode(IntEnum):
```

leofang (Member) commented Mar 12, 2026

@gmarkall @isVoid what does it take for numba-cuda to understand cudaRoundMode from cuda-bindings as a legal enum, without numba-cuda having to repeat the definitions?

On the cuda-bindings side, we switched from the builtin `IntEnum` to a custom fast enum to reduce Python overhead (cc @mdboom). We can evaluate whether a patch to cuda-bindings makes more sense, or whether we could just register the type in numba-cuda (or both).

Member

If it only needs a simple patch to cuda-bindings (and @mdboom deems it acceptable), it'd be better to fix it on the cuda-bindings side and publish 13.2.1/12.9.7, instead of breaking numba-cuda.

cc @rparolin for vis

Contributor

We'd need to add typing to Numba for the FastEnumMetaclass type for Numba to recognise it.
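As a rough illustration of why the type itself has to be registered: `typeof_impl` is a `functools.singledispatch` function, so when a kernel refers to an enum *class*, dispatch is keyed on that class's metaclass. The toy model below uses hypothetical names (`toy_typeof`, `FastEnumMetaclass`, `FastRoundMode`) and plain strings in place of Numba types; it only imitates the cuda-bindings metaclass and is not the real numba-cuda registry.

```python
import enum
from functools import singledispatch


@singledispatch
def toy_typeof(val):
    """Generic case: nothing is registered for type(val)."""
    return None


@toy_typeof.register(enum.EnumMeta)
def _toy_enum_class_handler(cls):
    # A class object dispatches on its metaclass: EnumMeta for a stdlib IntEnum.
    return f"IntEnumClass({cls.__name__})"


class FastEnumMetaclass(type):
    """Hypothetical stand-in for the cuda-bindings fast enum metaclass."""


class StdRoundMode(enum.IntEnum):
    cudaRoundNearest = 0


class FastRoundMode(metaclass=FastEnumMetaclass):
    cudaRoundNearest = 0


# FastEnumMetaclass is not a subclass of EnumMeta, so dispatch falls through.
before = toy_typeof(FastRoundMode)

# Reuse the existing handler for the new metaclass; no new typing logic.
toy_typeof.register(FastEnumMetaclass, _toy_enum_class_handler)
after = toy_typeof(FastRoundMode)

print(toy_typeof(StdRoundMode))  # IntEnumClass(StdRoundMode)
print(before, after)             # None IntEnumClass(FastRoundMode)
```

Registering an existing handler for the extra type, rather than writing a new one, is viable only while the new type behaves like `IntEnum`; in real Numba the handler would also need to cope with however the fast enum exposes its members.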

Member

Would it help if we added `__int__` to the fast enum class?

Contributor

I don't think Numba-CUDA would see that, apart from a small set of cases:

```python
@singledispatch
def typeof_impl(val, c):
    """
    Generic typeof() implementation.
    """
    tp = getattr(val, "_numba_type_", None)
    if tp is not None:
        return tp
    # Check for dlpack objects
    dlpack = getattr(val, "__dlpack__", None)
    if dlpack is not None:
        tp = _typeof_dlpack(dlpack, c)
        if tp is not None:
            return tp
    # Check for __cuda_array_interface__ objects (third-party device arrays)
    # Numba's own DeviceNDArray is handled above via _numba_type_.
    cai = getattr(val, "__cuda_array_interface__", None)
    if cai is not None:
        tp = _typeof_cuda_array_interface(cai, c)
        if tp is not None:
            return tp
    tp = _typeof_buffer(val, c)
    if tp is not None:
        return tp
    # cffi is handled here as it does not expose a public base class
    # for exported functions or CompiledFFI instances.
    from numba.cuda.typing import cffi_utils
    if cffi_utils.SUPPORTED:
        if cffi_utils.is_cffi_func(val):
            return cffi_utils.make_function_type(val)
        if cffi_utils.is_ffi_instance(val):
            return types.ffi
    if HAS_NUMBA:
        # Fallback to Numba's typeof_impl for third-party registrations
        from numba.core.typing.typeof import typeof_impl as core_typeof_impl
        tp = core_typeof_impl(val, c)
        if tp is not None:
            return tp
    return None
```

Numba-CUDA looks at the Python type to map to the Numba type.

Maybe we could (or should) add recognition of `__int__` as well, but that would need a Numba-CUDA change in addition to adding `__int__` to the fast enum class.
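The dispatch-on-type behaviour discussed here can be reproduced with the standard library alone. In this toy model (hypothetical names, strings instead of Numba types), an `IntEnum` member is found through its MRO because `IntEnum` subclasses `int`, whereas an object that merely defines `__int__` falls through to the generic case; only an explicit, hypothetical fallback inside the generic function picks it up.

```python
import enum
from functools import singledispatch


@singledispatch
def toy_typeof(val):
    """Generic case: mirrors typeof_impl returning None for unknown values."""
    return None


@toy_typeof.register(int)
def _(val):
    # IntEnum members land here because IntEnum subclasses int.
    return "int64"


class RoundMode(enum.IntEnum):
    cudaRoundZero = 1


class DuckRoundMode:
    """Hypothetical fast-enum member that is NOT an int subclass."""

    def __init__(self, value):
        self._value = value

    def __int__(self):
        return self._value


member = RoundMode.cudaRoundZero
duck = DuckRoundMode(1)

print(toy_typeof(member))  # int64
print(toy_typeof(duck))    # None: dispatch never consults __int__


@singledispatch
def toy_typeof_duck(val):
    # Hypothetical generic fallback that also accepts anything with __int__.
    if hasattr(type(val), "__int__"):
        return "int64"
    return None


print(toy_typeof_duck(duck))  # int64
```

A check this loose would also accept `float`, which defines `__int__`, so a real fallback in Numba-CUDA would need to be narrower than this sketch.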

Member

Ah, it's done by the same function I just touched recently... Got it. So this is a corner case where Python types do matter and duck-typing does not work.

I would suggest that we merge this PR as a workaround to unblock CI, and discuss a long-term fix. (My 2c: if the surface area grows to other enums, we should register the fast enum type, but one-off patches like this are not bad.) WDYT?

Contributor

Sorry I got to this a bit late. I saw this error yesterday in my Numbast CI but didn't trace it to the root cause. The `cuda_fp8.py` file was auto-generated by Numbast. If we settle on a fast enum type for CUDA types, we should probably update Numbast to generate the corresponding logic.

leofang (Member) commented Mar 12, 2026

/ok to test cfc3787

gmarkall merged commit 84360da into NVIDIA:main Mar 12, 2026 (104 checks passed)
gmarkall pushed a commit that referenced this pull request Mar 13, 2026
To support the new `FastEnum` class in `cuda_bindings` 13.2, this adds new type registrations for it. These instances are otherwise 100% API-compatible with `enum.IntEnum`, so there is no new logic.

This should hopefully be a more sustainable solution than overriding individual enums, so this also reverts #834.
isVoid mentioned this pull request Mar 17, 2026
rparolin pushed a commit that referenced this pull request Mar 17, 2026
- bump version to 0.29.0
- fix: normalize numpy integer types to python int to prevent overflow errors (#774)
- Support cuda.core.GraphBuilder as a kernel-launch stream (#836)
- Support cuda_bindings FastEnum (#837)
- fix(ci): cudaRoundMode typing failure in FP8 test (#834)
- Use `cuda-python` for `nvvm` bindings (#818)
- Fix mixed-IR liveness for inline overload DCE (#795)
- Use dbg.declare for scalar kernel parameters (#828)
- Fix FP8 uint64 cast flake on Windows (#829)
- Extend dbg.value coverage to loadvar for scalar kernel parameters (#813)


Co-authored-by: Michael Wang <isVoid@users.noreply.github.com>

4 participants