
Enable autotuner rocblas/hipblaslt fission backends #663

Open
Eetusjo wants to merge 2 commits into rocm-jaxlib-v0.9.0 from ci_enable_autotuner_fission_090

Conversation

@Eetusjo commented Mar 10, 2026

Enables rocBLAS/hipBLASLt fission backends in autotuner.


Example using run_hlo_module

Running input HLO:

HloModule test

triton_gemm {
  p0 = bf16[4096,4096] parameter(0)
  p1 = bf16[4096,4096] parameter(1)
  ROOT dot = bf16[4096,4096] dot(p0, p1),
    lhs_contracting_dims={1}, rhs_contracting_dims={0}
}

ENTRY main {
  p0 = bf16[4096,4096] parameter(0)
  p1 = bf16[4096,4096] parameter(1)
  ROOT r = bf16[4096,4096] fusion(p0, p1),
    kind=kCustom, calls=triton_gemm,
    backend_config={"fusion_backend_config":{"kind":"__triton_gemm"}}
}

Run:

bazel run --config=rocm_ci //xla/tools:run_hlo_module -- \
  --input_format=hlo --platform=gpu $(realpath triton_gemm.hlo) \
  --xla_gpu_enable_cublaslt \
  --xla_gpu_dump_autotune_logs_to=/tmp/at_logs

with --xla_gpu_cublas_fallback={true,false} to enable or disable fission, respectively.

With fission disabled, the autotune results show Triton:

results {
  result {
    triton {
      ...
    }
  }
  ...
}

With fission enabled, we get hipBLASLt:

results {
  ...
  result {
    other {
      name: "HipblasLt_fission"
      ...
    }
  }
}

Comment on lines 381 to 392

   executable_candidates.erase(
-      std::remove_if(executable_candidates.begin(),
-                     executable_candidates.end(),
-                     [](const ExecutableCandidate& candidate) {
-                       return candidate.config.codegen_backend->name() ==
-                              "Cublas_fission";
-                     }),
+      std::remove_if(
+          executable_candidates.begin(), executable_candidates.end(),
+          [](const ExecutableCandidate& candidate) {
+            const auto& name = candidate.config.codegen_backend->name();
+            return name == "Cublas_fission" ||
+                   name == "Rocblas_fission" ||
+                   name == "HipblasLt_fission";
+          }),
       executable_candidates.end());
 }


exclude_cublas_config name no longer matches its scope

This field now also excludes Rocblas_fission and HipblasLt_fission, but the name and comment ("autotuner will not select cublas configs for fusions") still suggest cuBLAS only. Consider renaming to something vendor-neutral like exclude_fission_configs, or at minimum updating the comment in autotuner.h:75 to clarify it covers all BLAS library fission backends.

Comment on lines +84 to +98
  std::vector<std::unique_ptr<CodegenBackend>> backends;
  backends.push_back(std::make_unique<FissionBackend>(
      debug_options, compiler, target_config,
      std::make_unique<RocblasBackend>(stream_executor, debug_options, compiler,
                                       target_config),
      GetCublasRewriterPipeline(target_config->device_description),
      mlir_context));
  backends.push_back(std::make_unique<FissionBackend>(
      debug_options, compiler, target_config,
      std::make_unique<HipblasLtBackend>(stream_executor, debug_options,
                                         compiler, target_config),
      GetCublasRewriterPipeline(target_config->device_description,
                                /*enable_cublaslt=*/true),
      mlir_context));
  return backends;

No tests for the new ROCm fission backends

This PR enables two new backend registrations (Rocblas_fission, HipblasLt_fission) and modifies the exclusion filter, but adds no tests. The existing test at autotuner_test.cc:713 only mocks Cublas_fission. At minimum, a unit test verifying that GetFissionBackendsForROCm returns the correct number/types of backends and a test covering the exclusion of the ROCm backend names when exclude_cublas_config is set would help prevent regressions.

@claude claude bot commented Mar 10, 2026

Re-review: 2/4 prior issues resolved (pass ordering fixed). 2 remain open (exclude_cublas_config naming, missing tests). No new findings.

@Eetusjo force-pushed the ci_enable_autotuner_fission_090 branch from 7a3d591 to 942000b on March 10, 2026 11:36
@Eetusjo Eetusjo marked this pull request as ready for review March 10, 2026 11:39
@Eetusjo (Author) commented Mar 10, 2026

This downstream branch was branched out from upstream when the codegen backend factories were added, but not used in the compilers. Currently I'm duplicating some functionality to enable both the compiler autotuner path and the autotuner_main target. @draganmladjenovic do you have an opinion on whether I should refactor this to the current upstream way which uses the factory everywhere?

Comment on lines +112 to +123
  auto fission_backends_or =
      registry.FindObject<GetFissionBackends>(platform->id());
  if (fission_backends_or.ok()) {
    const GetFissionBackends::Type& get_fission_backends =
        fission_backends_or.value();
    std::vector<std::unique_ptr<CodegenBackend>> fission_backends =
        get_fission_backends(stream_executor, &debug_options, compiler.get(),
                             &target_config, mlir_context);
    backends.insert(backends.end(),
                    std::make_move_iterator(fission_backends.begin()),
                    std::make_move_iterator(fission_backends.end()));
  }

Fission backends loaded unconditionally, bypassing xla_gpu_cublas_fallback gate

In amdgpu_compiler.cc (and nvptx_compiler.cc), fission backends are only added when debug_options.xla_gpu_cublas_fallback() is true. Here in autotuner_main, fission backends are loaded unconditionally whenever a platform has registered them. This means running via autotuner_main will always include fission backends regardless of the xla_gpu_cublas_fallback flag, while the compiler path respects that flag.

Consider gating this block behind the same debug_options.xla_gpu_cublas_fallback() check, e.g.:

if (debug_options.xla_gpu_cublas_fallback()) {
  auto fission_backends_or = ...;
  // ...
}

@i-chaochen (Collaborator) commented Mar 11, 2026

> This downstream branch was branched out from upstream when the codegen backend factories were added, but not used in the compilers. Currently I'm duplicating some functionality to enable both the compiler autotuner path and the autotuner_main target. @draganmladjenovic do you have an opinion on whether I should refactor this to the current upstream way which uses the factory everywhere?

IMHO we don't need to refactor this on 0.9.0, as we're planning to have a 0.9.1 release soon: https://github.com/ROCm/xla/commits/rocm-jaxlib-v0.9.1/

Let's keep what's in 0.9.0 and make the enablement change as small as possible, since we don't want to confuse future debugging or cause more divergences. Hope that makes sense.

If things are in much better shape in 0.9.1, I guess we should land this feature in our 0.9.1 release?

We should restrict refactoring on release branches: https://github.com/ROCm/xla-internal/pull/6/changes

cc @nurmukhametov @hsharsha

@hsharsha (Collaborator)

I tend to agree with @i-chaochen to keep the release branch as close to the point of release, with only bug fixes to them rather than bringing in new code or features.

@nurmukhametov (Member)

> I tend to agree with @i-chaochen to keep the release branch as close to the point of release, with only bug fixes to them rather than bringing in new code or features.

I agree
