fix(base_dsl): drop ArchMeta alias so Arch.sm_*.value is correct by lingolin128 · Pull Request #3248 · NVIDIA/cutlass

lingolin128 · 2026-05-19T11:16:20Z

Summary

Arch.sm_110f.value returned (10, 1, 'f') instead of (11, 0, 'f'). The same class of bug existed on the other CUDA
branch — Arch.sm_101f.value returned (11, 0, 'f') when CUDA ≥ 13. Anything reading .value, .major, or .minor
on these members got the wrong tuple.

Root cause

In python/CuTeDSL/cutlass/base_dsl/arch.py, sm_101* and sm_110* were declared as separate enum members with
different value tuples:

sm_101  = (10, 1, "")
sm_101f = (10, 1, "f")
sm_110  = (11, 0, "")
sm_110f = (11, 0, "f")

A custom ArchMeta(EnumMeta) then tried to alias one set onto the other based on CUDA version via __getattribute__ and
__getitem__:

CUDA ≥ 13: sm_101* → sm_110*
CUDA < 13: sm_110* → sm_101*

This is fundamentally incompatible with how Enum works. A real enum alias must share the same value tuple as its
canonical member; here the two members had different tuples, and the metaclass only intercepted attribute / subscript
lookup. So Arch.sm_110f got silently rerouted to the sm_101f member object, and .value on that object honestly reported
(10, 1, 'f').
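
The mismatch can be reproduced in isolation. The following is a minimal sketch, not the code from arch.py: `RoutingMeta` and `CUDA_MAJOR` are hypothetical stand-ins for ArchMeta and the detected CUDA version, only `__getattribute__` is hooked, and only the `f` variants are declared.

```python
from enum import Enum, EnumMeta

CUDA_MAJOR = 12  # hypothetical stand-in for the detected CUDA major version


class RoutingMeta(EnumMeta):
    """Hypothetical stand-in for ArchMeta: renames members at lookup time."""

    def __getattribute__(cls, name):
        # Reroute sm_110* to sm_101* on CUDA < 13; other names pass through.
        if CUDA_MAJOR < 13 and name.startswith("sm_110"):
            name = "sm_101" + name[len("sm_110"):]
        return super().__getattribute__(name)


class Arch(Enum, metaclass=RoutingMeta):
    sm_101f = (10, 1, "f")
    sm_110f = (11, 0, "f")


# Attribute lookup is rerouted, so the sm_101f member object comes back.
rerouted = Arch.sm_110f
print(rerouted.name, rerouted.value)

# Lookup by value and iteration bypass the metaclass hook and still see
# two distinct members, so the enum is no longer self-consistent.
by_value = Arch((11, 0, "f"))
print(by_value.name, [m.name for m in Arch])
```

With `CUDA_MAJOR = 12`, the first print should report the sm_101f member with (10, 1, 'f'), while the second still finds a separate sm_110f member.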

The bug surfaces on both CUDA branches — just on the opposite name on each. The (10, 1, 'f') symptom means CUDA < 13.

Fix

Drop ArchMeta entirely and let sm_101* and sm_110* stand as independent enum members, each carrying its correct (major,
minor, suffix) tuple.

The cross-name family relationship (sm_101f is family-of sm_110f but not sm_100f) is already handled inside
Arch.is_family_of via an explicit special case on sm_101a / sm_101f, so no semantics are lost.

Net diff: arch.py loses ~50 lines of metaclass machinery; the rest of the file is unchanged.

Verification

  assert Arch.sm_110f.value == (11, 0, 'f')
  assert Arch.sm_101f.value == (10, 1, 'f')
  assert Arch.sm_110.value  == (11, 0, '')
  assert Arch.sm_101.value  == (10, 1, '')

  assert Arch.from_string('sm_110f').value == (11, 0, 'f')
  assert Arch.from_string('sm_101f').value == (10, 1, 'f')

  assert Arch.sm_101f.is_family_of(Arch.sm_110f) is True
  assert Arch.sm_101f.is_family_of(Arch.sm_100f) is False
  assert Arch.sm_103f.is_family_of(Arch.sm_100f) is True

All pass.

Compatibility notes

Existing call sites that already use Arch.sm_110* or Arch.sm_101* (tcgen05/copy.py, tcgen05/mma.py,
numeric_conversion.py, the MLA decode examples) continue to work.
Arch["sm_110f"] / Arch["sm_101f"] both resolve to their own members instead of being version-routed. Callers that
relied on the implicit rerouting should select the canonical member for their target CUDA version explicitly.

lingolin128 · 2026-05-19T11:26:28Z

fix: #3249

yiwangchunyu · 2026-05-21T08:28:06Z

Hi @lingolin128 , thanks for the careful Enum-semantics analysis — the diagnosis that __getattribute__-based aliasing with different _value_ tuples can't be self-consistent is accurate.

Before we land it, some context that wasn't documented in the original code (and probably why this looked like a clear bug):

What `ArchMeta` was trying to do

We want to use the same Python source against both CUDA 12.9 and CUDA 13.1. sm_101 was renamed to sm_110 as of CUDA 13.0. ArchMeta was trying to make this transparent to callers, so that both of these patterns work on either CUDA version without per-call-site handling:

if arch is Arch.sm_110f:                                # same chip on both CUDA
    emit_arch_specific_intrinsic()

target_str = f"sm_{arch.major}{arch.minor}{arch.suffix}"  # toolchain-recognized
target_str = arch.to_string()

The symptom you saw — Arch.sm_110f.value == (10, 1, 'f') on CUDA 12 — is actually the intended outcome of that second pattern: on CUDA 12, the chip you're referring to is what ptxas calls sm_101f, so (10, 1, 'f') is the tuple that produces a ptxas-acceptable string.
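
To make the second pattern concrete, here is a small sketch of how a (major, minor, suffix) tuple maps to a target string. `target_string` is a hypothetical helper written for illustration; it mirrors the f-string above and is not claimed to match the body of `Arch.to_string()`.

```python
def target_string(value):
    """Format a (major, minor, suffix) tuple as an sm_XY[suffix] target name."""
    major, minor, suffix = value
    return f"sm_{major}{minor}{suffix}"


# The same chip under its two toolchain names.
print(target_string((10, 1, "f")))  # sm_101f
print(target_string((11, 0, "f")))  # sm_110f
```

Which of the two strings a given ptxas accepts depends on the CUDA version, which is why the tuple behind a name matters and not only the name itself.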

That said, you're right that the implementation is broken (fights against _member_map_, iteration, pickle, etc.). So I agree it shouldn't stay as-is.

Concern with deleting it

After this PR, Arch.sm_110f and Arch.sm_101f become independent members on every CUDA, so:

`if arch is Arch.sm_110f:` silently misses the sm_101f case (and vice versa), so every site needs a defensive `or`.
A target string built from Arch.sm_110* (for example "sm_110f") is one that the CUDA 12.9 toolchain rejects.
Some sites might silently regress.

A smaller fix

Use real Python Enum aliases gated on CUDA version (same _value_ tuple → automatic alias, no metaclass):

class Arch(Enum):
    sm_100  = (10, 0, "")
    # ...
    if CUDA_VERSION.major >= 13:
        sm_110  = (11, 0, "")
        sm_110a = (11, 0, "a")
        sm_110f = (11, 0, "f")
        sm_101  = sm_110
        sm_101a = sm_110a
        sm_101f = sm_110f
    else:
        sm_101  = (10, 1, "")
        sm_101a = (10, 1, "a")
        sm_101f = (10, 1, "f")
        sm_110  = sm_101
        sm_110a = sm_101a
        sm_110f = sm_101f

CUDA 12.9:  Arch.sm_110f is Arch.sm_101f → True; .value (10,1,'f'); .name 'sm_101f'
CUDA 13.1:  Arch.sm_110f is Arch.sm_101f → True; .value (11,0,'f'); .name 'sm_110f'

This kills ArchMeta and makes .value/.name/identity all self-consistent (your goals), while keeping the single-source / no-OR-at-callers property.

Does this direction match what you had in mind for the fix? I'd like to hear if you see issues with the real-alias pattern, or if there are alternatives you've considered that we should weigh against it. Happy to iterate on the design. Thanks again — your analysis made the right replacement much easier to see.

lingolin128 · 2026-05-22T08:36:35Z

Hi @yiwangchunyu , thanks for the context — the single-source / ptxas-string motivation wasn't obvious from the code. The real-alias pattern is exactly right: it kills ArchMeta, makes .value/.name/identity self-consistent, and keeps the no-OR-at-callers property. Pushed an update that adopts it. Verified on both CUDA branches with a stubbed CUDA_VERSION:

| | CUDA 12.x | CUDA 13.x |
| --- | --- | --- |
| `sm_110f is sm_101f` | `True` | `True` |
| `.value` | `(10, 1, 'f')` | `(11, 0, 'f')` |
| `.name` | `'sm_101f'` | `'sm_110f'` |
| `f"sm_{m}{n}{s}"` / `to_string()` | ptxas-accepted | ptxas-accepted |

if arch is Arch.sm_110f: matches on both CUDAs, toolchain strings stay accepted.
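
A check of this kind can be sketched without a CUDA install. In the sketch below, `make_arch` is a hypothetical factory that takes the stubbed CUDA major version as an argument and builds a reduced enum with only a few members; the real arch.py reads `CUDA_VERSION` instead.

```python
from enum import Enum


def make_arch(cuda_major):
    """Build a reduced Arch enum for a given (stubbed) CUDA major version."""

    class Arch(Enum):
        sm_100f = (10, 0, "f")
        if cuda_major >= 13:
            sm_110f = (11, 0, "f")
            sm_101f = sm_110f  # same value tuple, so Enum records an alias
        else:
            sm_101f = (10, 1, "f")
            sm_110f = sm_101f  # same value tuple, so Enum records an alias

    return Arch


for cuda_major in (12, 13):
    Arch = make_arch(cuda_major)
    member = Arch.sm_110f
    major, minor, suffix = member.value
    print(
        cuda_major,
        member is Arch.sm_101f,
        member.value,
        member.name,
        f"sm_{major}{minor}{suffix}",
    )
```

Because the alias is a real Enum alias, iteration yields only the canonical member, while `Arch["sm_110f"]` and `Arch["sm_101f"]` both resolve to it.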

One thing that needs to come along: is_family_of. The existing special case arch.major == 11 and arch.minor >= 0 breaks reflexivity under real aliases — on CUDA 12 the canonical member has .major == 10, so sm_110f.is_family_of(sm_110f) would return False. Since aliases collapse the two names into one member, I rewrote it as identity-based:

if self in [Arch.sm_101a, Arch.sm_101f]:
    return arch in [Arch.sm_101a, Arch.sm_101f]

Holds reflexivity, holds cross-name, still rejects sm_100f as the family root. sm_103f.is_family_of(sm_100f) is unaffected.
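
The reflexivity problem can be illustrated with a reduced enum as it would look on CUDA 12.x. This is a sketch under assumptions: `family_by_version` is a hypothetical reduction of the old special case to the quoted `major == 11` test, and `family_by_identity` is a free-function version of the rewrite; neither is the full `Arch.is_family_of`.

```python
from enum import Enum


class Arch(Enum):
    """Reduced enum as it would look on CUDA 12.x (assumption for this sketch)."""

    sm_100f = (10, 0, "f")
    sm_101a = (10, 1, "a")
    sm_101f = (10, 1, "f")
    sm_110a = sm_101a  # real alias: same value tuple
    sm_110f = sm_101f  # real alias: same value tuple

    @property
    def major(self):
        return self.value[0]

    @property
    def minor(self):
        return self.value[1]


def family_by_version(self, arch):
    """Hypothetical reduction of the old special case: tests version numbers."""
    if self in (Arch.sm_101a, Arch.sm_101f):
        return arch.major == 11 and arch.minor >= 0
    return False


def family_by_identity(self, arch):
    """Identity-based variant: tests membership instead of version numbers."""
    if self in (Arch.sm_101a, Arch.sm_101f):
        return arch in (Arch.sm_101a, Arch.sm_101f)
    return False


# On CUDA 12.x the canonical member has major == 10, so the version test
# rejects a member compared against itself; the identity test does not.
print(family_by_version(Arch.sm_110f, Arch.sm_110f))   # False
print(family_by_identity(Arch.sm_110f, Arch.sm_110f))  # True
print(family_by_identity(Arch.sm_101f, Arch.sm_100f))  # False
```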

Heads-up on BlackwellArchs(): under real aliases the literal tuple now contains duplicate object refs (sm_101* and sm_110* are the same member), so len(Arch.BlackwellArchs()) no longer matches the number of distinct chips, and set(...) shrinks. I'm not sure whether the original size was load-bearing for any caller, so I didn't pre-emptively dedupe. Let me know if it should be tuple(dict.fromkeys(...)) and I'll add it.

yiwangchunyu · 2026-05-24T13:57:01Z

Hi, @lingolin128 , thanks for your update!

is_family_of: good catch. The rewrite is necessary. One small tweak I'd suggest to keep the behavior strictly equivalent to the pre-PR version: the semantic intent of this function is "can self use family-specific features of arch", so plain sm_101/sm_110 should still be a valid family root for the a/f variants (consistent with sm_101f.is_family_of(sm_101) → True):

if self in [Arch.sm_101a, Arch.sm_101f]:
    return arch in [Arch.sm_101, Arch.sm_101a, Arch.sm_101f]

BlackwellArchs(): both options are fine with me, because we don't rely on the length of the returned list so far. Returning a deduplicated tuple would be cleaner, since under real aliases the duplicate entries also surface as repeated names in print(BlackwellArchs()).
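
The deduplication discussed here can be sketched as follows. The three-member enum is a reduced stand-in for the CUDA 12.x layout and is an assumption of this example; only the `tuple(dict.fromkeys(...))` idiom comes from the thread.

```python
from enum import Enum


class Arch(Enum):
    """Reduced enum as it would look on CUDA 12.x (assumption for this sketch)."""

    sm_100 = (10, 0, "")
    sm_101 = (10, 1, "")
    sm_110 = sm_101  # real alias of sm_101


# Listing both names yields the same member object twice.
with_duplicates = (Arch.sm_100, Arch.sm_101, Arch.sm_110)

# dict.fromkeys keeps the first occurrence of each key and preserves order.
deduplicated = tuple(dict.fromkeys(with_duplicates))

print(with_duplicates)
print(deduplicated)
```

Unlike `set(...)`, this keeps the declaration order of the remaining entries, which matters if callers print or index the result.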

lingolin128 · 2026-05-24T15:08:13Z

Hi @yiwangchunyu, thanks for the careful review and valuable suggestions!

Fixed the is_family_of logic as you pointed out, to keep the behavior consistent.
Optimized BlackwellArchs() and removed the duplicate entries.
It now returns a deduplicated tuple with 15 valid arch entries on either CUDA version.

# CUDA 12.x
BlackwellArchs()
-> (Arch.sm_100, Arch.sm_100a, Arch.sm_100f, Arch.sm_101, Arch.sm_101a, Arch.sm_101f, Arch.sm_103, Arch.sm_103a, Arch.sm_103f, Arch.sm_120, Arch.sm_120a, Arch.sm_120f, Arch.sm_121, Arch.sm_121a, Arch.sm_121f)
len(BlackwellArchs())
-> 15

# CUDA 13.x
BlackwellArchs()
-> (Arch.sm_100, Arch.sm_100a, Arch.sm_100f, Arch.sm_110, Arch.sm_110a, Arch.sm_110f, Arch.sm_103, Arch.sm_103a, Arch.sm_103f, Arch.sm_120, Arch.sm_120a, Arch.sm_120f, Arch.sm_121, Arch.sm_121a, Arch.sm_121f)
len(BlackwellArchs())
-> 15

Thanks a lot for the review!

yiwangchunyu · 2026-05-25T01:26:36Z

Thanks for the update, LGTM!

lingolin128 · 2026-05-26T08:28:33Z

Hi @Junkai-Wu, please review this PR. Thanks a lot!

lingolin128 force-pushed the fix branch from d055610 to 2df6414 on May 22, 2026 08:16

lingolin128 force-pushed the fix branch from 2df6414 to 92283df on May 24, 2026 14:55

fix(base_dsl): replace ArchMeta with conditional Enum aliases

da18917

lingolin128 force-pushed the fix branch from 92283df to da18917 on May 24, 2026 14:57

lingolin128 wants to merge 1 commit into NVIDIA:main from lingolin128:fix
