Enable autotuner rocblas/hipblaslt fission backends#663
Enable autotuner rocblas/hipblaslt fission backends#663Eetusjo wants to merge 2 commits intorocm-jaxlib-v0.9.0from
Conversation
| executable_candidates.erase( | ||
| std::remove_if(executable_candidates.begin(), | ||
| executable_candidates.end(), | ||
| [](const ExecutableCandidate& candidate) { | ||
| return candidate.config.codegen_backend->name() == | ||
| "Cublas_fission"; | ||
| }), | ||
| std::remove_if( | ||
| executable_candidates.begin(), executable_candidates.end(), | ||
| [](const ExecutableCandidate& candidate) { | ||
| const auto& name = candidate.config.codegen_backend->name(); | ||
| return name == "Cublas_fission" || | ||
| name == "Rocblas_fission" || | ||
| name == "HipblasLt_fission"; | ||
| }), | ||
| executable_candidates.end()); | ||
| } | ||
|
|
There was a problem hiding this comment.
exclude_cublas_config name no longer matches its scope
This field now also excludes Rocblas_fission and HipblasLt_fission, but the name and comment ("autotuner will not select cublas configs for fusions") still suggest cuBLAS only. Consider renaming to something vendor-neutral like exclude_fission_configs, or at minimum updating the comment in autotuner.h:75 to clarify it covers all BLAS library fission backends.
| std::vector<std::unique_ptr<CodegenBackend>> backends; | ||
| backends.push_back(std::make_unique<FissionBackend>( | ||
| debug_options, compiler, target_config, | ||
| std::make_unique<RocblasBackend>(stream_executor, debug_options, compiler, | ||
| target_config), | ||
| GetCublasRewriterPipeline(target_config->device_description), | ||
| mlir_context)); | ||
| backends.push_back(std::make_unique<FissionBackend>( | ||
| debug_options, compiler, target_config, | ||
| std::make_unique<HipblasLtBackend>(stream_executor, debug_options, | ||
| compiler, target_config), | ||
| GetCublasRewriterPipeline(target_config->device_description, | ||
| /*enable_cublaslt=*/true), | ||
| mlir_context)); | ||
| return backends; |
There was a problem hiding this comment.
No tests for the new ROCm fission backends
This PR enables two new backend registrations (Rocblas_fission, HipblasLt_fission) and modifies the exclusion filter, but adds no tests. The existing test at autotuner_test.cc:713 only mocks Cublas_fission. At minimum, a unit test verifying that GetFissionBackendsForROCm returns the correct number/types of backends and a test covering the exclusion of the ROCm backend names when exclude_cublas_config is set would help prevent regressions.
|
Re-review: 2/4 prior issues resolved (pass ordering fixed). 2 remain open (exclude_cublas_config naming, missing tests). No new findings. |
7a3d591 to
942000b
Compare
|
This downstream branch was branched out from upstream when the codegen backend factories were added, but not used in the compilers. Currently I'm duplicating some functionality to enable both the compiler autotuner path and the autotuner_main target. @draganmladjenovic do you have an opinion on whether I should refactor this to the current upstream way which uses the factory everywhere? |
| auto fission_backends_or = | ||
| registry.FindObject<GetFissionBackends>(platform->id()); | ||
| if (fission_backends_or.ok()) { | ||
| const GetFissionBackends::Type& get_fission_backends = | ||
| fission_backends_or.value(); | ||
| std::vector<std::unique_ptr<CodegenBackend>> fission_backends = | ||
| get_fission_backends(stream_executor, &debug_options, compiler.get(), | ||
| &target_config, mlir_context); | ||
| backends.insert(backends.end(), | ||
| std::make_move_iterator(fission_backends.begin()), | ||
| std::make_move_iterator(fission_backends.end())); | ||
| } |
There was a problem hiding this comment.
Fission backends loaded unconditionally, bypassing xla_gpu_cublas_fallback gate
In amdgpu_compiler.cc (and nvptx_compiler.cc), fission backends are only added when debug_options.xla_gpu_cublas_fallback() is true. Here in autotuner_main, fission backends are loaded unconditionally whenever a platform has registered them. This means running via autotuner_main will always include fission backends regardless of the xla_gpu_cublas_fallback flag, while the compiler path respects that flag.
Consider gating this block behind the same debug_options.xla_gpu_cublas_fallback() check, e.g.:
if (debug_options.xla_gpu_cublas_fallback()) {
auto fission_backends_or = ...;
// ...
}
IMHO we no need to refactor this on 0.9.0 as we're planning to have 0.9.1 release soon https://github.com/ROCm/xla/commits/rocm-jaxlib-v0.9.1/ Let's stay what it has in 0.9.0 and make the enablement as little changes as possible. As we don't want to confuse future debug and causes more divegences. Hope it makes sense. If it's much better shape in 0.9.1 I guess we shall land this feature into our 0.9.1 release? We shall restrict our refactoring on release branches https://github.com/ROCm/xla-internal/pull/6/changes |
|
I tend to agree with @i-chaochen to keep the release branch as close to the point of release, with only bug fixes to them rather than bringing in new code or features. |
I agree |
Enables rocBLAS/hipBLASLt fission backends in autotuner.
Example using
run_hlo_moduleRunning input HLO:
Running
bazel run --config=rocm_ci //xla/tools:run_hlo_module -- --input_format=hlo --platform=gpu $(realpath triton_gemm.hlo) --xla_gpu_enable_cublaslt --xla_gpu_dump_autotune_logs_to=/tmp/at_logswith--xla_gpu_cublas_fallback={true,false}to enable or disable fission respectively.Autotune results when fission disabled we see Triton:
Enabled we get hipBLASLt.