fix(base_dsl): drop ArchMeta alias so Arch.sm_*.value is correct #3248
lingolin128 wants to merge 1 commit into
fix: #3249
Hi @lingolin128, thanks for the careful Enum-semantics analysis: the diagnosis that […]
Before we land it, some context that wasn't documented in the original code (and probably why this looked like a clear bug):
What […]
Hi @yiwangchunyu, thanks for the context. The single-source / ptxas-string motivation wasn't obvious from the code. The real-alias pattern is exactly right: it kills […]
One thing that needs to come along:

    if self in [Arch.sm_101a, Arch.sm_101f]:
        return arch in [Arch.sm_101a, Arch.sm_101f]

Holds reflexivity, holds cross-name, still rejects […]
Heads-up on […]
Hi @lingolin128, thanks for your update!
Hi @yiwangchunyu, thanks for the careful review and valuable suggestions!

    # CUDA 12.x
    BlackwellArchs()
    -> (Arch.sm_100, Arch.sm_100a, Arch.sm_100f, Arch.sm_101, Arch.sm_101a, Arch.sm_101f, Arch.sm_103, Arch.sm_103a, Arch.sm_103f, Arch.sm_120, Arch.sm_120a, Arch.sm_120f, Arch.sm_121, Arch.sm_121a, Arch.sm_121f)
    len(BlackwellArchs())
    -> 15

    # CUDA 13.x
    BlackwellArchs()
    -> (Arch.sm_100, Arch.sm_100a, Arch.sm_100f, Arch.sm_110, Arch.sm_110a, Arch.sm_110f, Arch.sm_103, Arch.sm_103a, Arch.sm_103f, Arch.sm_120, Arch.sm_120a, Arch.sm_120f, Arch.sm_121, Arch.sm_121a, Arch.sm_121f)
    len(BlackwellArchs())
    -> 15

Thanks a lot for the review!
Thanks for the update, LGTM!
Hi @Junkai-Wu, please review this PR. Thanks a lot!
Summary
On CUDA < 13, Arch.sm_110f.value returned (10, 1, 'f') instead of (11, 0, 'f'). The same class of bug existed on the other CUDA branch: Arch.sm_101f.value returned (11, 0, 'f') instead of (10, 1, 'f') when CUDA ≥ 13. Anything reading .value, .major, or .minor on these members got the wrong tuple.
Root cause
In python/CuTeDSL/cutlass/base_dsl/arch.py, sm_101* and sm_110* were declared as separate enum members with different value tuples.
A custom ArchMeta(EnumMeta) then tried to alias one set onto the other based on CUDA version via __getattribute__ and __getitem__.
This is fundamentally incompatible with how Enum works. A real enum alias must share the same value tuple as its
canonical member; here the two members had different tuples, and the metaclass only intercepted attribute / subscript
lookup. So Arch.sm_110f got silently rerouted to the sm_101f member object, and .value on that object honestly reported
(10, 1, 'f').
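For reference, this is how a real alias behaves in Python's standard enum module. The sketch below is a minimal illustration, not the project's code: the Demo class and the sm_101f_alias name are hypothetical, and only the value tuples are taken from the description above.

```python
from enum import Enum


class Demo(Enum):
    sm_101f = (10, 1, "f")
    sm_101f_alias = (10, 1, "f")  # same tuple: becomes an alias of sm_101f
    sm_110f = (11, 0, "f")        # different tuple: a distinct member


# An alias is the very same member object as its canonical name.
print(Demo.sm_101f_alias is Demo.sm_101f)  # True
print(Demo.sm_101f_alias.name)             # sm_101f

# Iteration skips aliases; __members__ keeps every declared name.
print([m.name for m in Demo])              # ['sm_101f', 'sm_110f']
print(list(Demo.__members__))              # ['sm_101f', 'sm_101f_alias', 'sm_110f']
```

Because sm_110f carries a different tuple, Enum treats it as a separate member, so no amount of lookup interception can turn it into an alias.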
The bug surfaces on both CUDA branches — just on the opposite name on each. The (10, 1, 'f') symptom means CUDA < 13.
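The failure mode can be reproduced with a small stand-in. This is a sketch under stated assumptions, not the removed ArchMeta: make_arch, RerouteMeta, and the cuda_major parameter are hypothetical names, and only __getattribute__ is intercepted here.

```python
from enum import Enum, EnumMeta


def make_arch(cuda_major):
    """Build a toy Arch enum whose metaclass reroutes names by CUDA version."""

    class RerouteMeta(EnumMeta):
        def __getattribute__(cls, name):
            # Redirect the "wrong" name for this toolkit to the other member.
            if cuda_major < 13 and name.startswith("sm_110"):
                name = "sm_101" + name[len("sm_110"):]
            elif cuda_major >= 13 and name.startswith("sm_101"):
                name = "sm_110" + name[len("sm_101"):]
            return super().__getattribute__(name)

    class Arch(Enum, metaclass=RerouteMeta):
        sm_101f = (10, 1, "f")
        sm_110f = (11, 0, "f")

    return Arch


old = make_arch(12)  # stands in for CUDA < 13
new = make_arch(13)  # stands in for CUDA >= 13

print(old.sm_110f.name, old.sm_110f.value)  # sm_101f (10, 1, 'f')
print(new.sm_101f.name, new.sm_101f.value)  # sm_110f (11, 0, 'f')
```

In both cases the lookup succeeds, but the returned object is the other member, so its tuple does not match the name that was requested.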
Fix
Drop ArchMeta entirely and let sm_101* and sm_110* stand as independent enum members, each carrying its correct (major,
minor, suffix) tuple.
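A minimal sketch of what independent members look like with a plain Enum. The member subset is illustrative, and the major / minor / suffix properties are assumed to be simple tuple accessors; the real arch.py may define them differently.

```python
from enum import Enum


class Arch(Enum):
    # Illustrative subset; each member carries its own (major, minor, suffix).
    sm_101f = (10, 1, "f")
    sm_110f = (11, 0, "f")

    @property
    def major(self):
        return self.value[0]

    @property
    def minor(self):
        return self.value[1]

    @property
    def suffix(self):  # hypothetical accessor name
        return self.value[2]


print(Arch.sm_101f.value)                        # (10, 1, 'f')
print(Arch.sm_110f.value)                        # (11, 0, 'f')
print(Arch.sm_110f.major, Arch.sm_110f.minor)    # 11 0
print(Arch["sm_110f"] is Arch.sm_110f)           # True
```

With no metaclass in the way, attribute access and subscript access resolve to the same member, and the tuple always matches the name.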
The cross-name family relationship (sm_101f is family-of sm_110f but not sm_100f) is already handled inside
Arch.is_family_of via an explicit special case on sm_101a / sm_101f, so no semantics are lost.
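The shape of such a special case might look like the sketch below. Only the two outcomes stated above (sm_101f is family-of sm_110f, but not of sm_100f) come from this description; the member subset, the exact set used in the special case, and the generic same-major fallback are assumptions made for illustration.

```python
from enum import Enum


class Arch(Enum):
    # Hypothetical subset of members, for illustration only.
    sm_100f = (10, 0, "f")
    sm_101a = (10, 1, "a")
    sm_101f = (10, 1, "f")
    sm_110a = (11, 0, "a")
    sm_110f = (11, 0, "f")

    def is_family_of(self, arch: "Arch") -> bool:
        # Explicit special case: sm_101a / sm_101f pair with the sm_110*
        # names rather than with the rest of major 10.
        if self in (Arch.sm_101a, Arch.sm_101f):
            return arch in (Arch.sm_101a, Arch.sm_101f, Arch.sm_110a, Arch.sm_110f)
        # Assumed generic rule (not taken from the source): same major version.
        return self.value[0] == arch.value[0]


print(Arch.sm_101f.is_family_of(Arch.sm_110f))  # True
print(Arch.sm_101f.is_family_of(Arch.sm_100f))  # False
print(Arch.sm_101f.is_family_of(Arch.sm_101f))  # True
```

Handling the relationship inside the method keeps the enum members themselves untouched, so .value stays correct for every name.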
Net diff: arch.py loses ~50 lines of metaclass machinery; the rest of the file is unchanged.
Verification
All pass.
Compatibility notes
[…] numeric_conversion.py, the MLA decode examples) continue to work.
[…] relied on the implicit rerouting should select the canonical member for their target CUDA version explicitly.
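Explicit selection could be as simple as the sketch below. The helper name select_arch_f and the cuda_major parameter are hypothetical, and the toy Arch enum stands in for the real one; how a caller obtains the CUDA major version is left out.

```python
from enum import Enum


class Arch(Enum):
    # Toy stand-in for the real enum.
    sm_101f = (10, 1, "f")
    sm_110f = (11, 0, "f")


def select_arch_f(cuda_major: int) -> Arch:
    """Hypothetical helper: pick the member that matches the toolkit version."""
    return Arch.sm_110f if cuda_major >= 13 else Arch.sm_101f


print(select_arch_f(12))  # Arch.sm_101f
print(select_arch_f(13))  # Arch.sm_110f
```

This makes the version dependence visible at the call site instead of hiding it in attribute lookup.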