Imported from GitHub PR openxla#38507

📝 Summary of Changes: Updated functions in triton/support.cc and triton/support_test to execute correctly on ROCm.
🎯 Justification: support_test was failing on ROCm.
🚀 Kind of Contribution: 🐛 Bug Fix, 🧪 Tests
🧪 Unit Tests: Used the existing triton/support_test.
🧪 Execution Tests: None added.

Copybara import of the project:
-- e067431 by zoranjovanovic-ns <126815388+zoranjovanovic-ns@users.noreply.github.com>: Fixed triton_support_test on rocm.
-- 16f1f07 by Zoran Jovanovic <zjovanov@amd.com>: Review comments.
-- 731fb44 by Zoran Jovanovic <zjovanov@amd.com>: Code review 2.

Merging this change closes openxla#38507
COPYBARA_INTEGRATE_REVIEW=openxla#38507 from ROCm:ci_rocm-fix-triton-support-4 731fb44
PiperOrigin-RevId: 877900520
Imported from GitHub PR openxla#38742

📝 Summary of Changes: Created the expected output for FuseSubchannelDequantizationWithTranspose in triton/fusion_emitter_int4_device_test on ROCm.
🎯 Justification: triton/fusion_emitter_int4_device_test was failing on ROCm.
🚀 Kind of Contribution: 🧪 Tests
🧪 Unit Tests: triton/fusion_emitter_int4_device_test
🧪 Execution Tests: None added.

Copybara import of the project:
-- d00e6d1 by Zoran Jovanovic <zjovanov@amd.com>: Fix expected output in fusion_emitter_int4_device_test for ROCm.

Merging this change closes openxla#38742
COPYBARA_INTEGRATE_REVIEW=openxla#38742 from ROCm:rocm-fusion_emitter_int4_device_test d00e6d1
PiperOrigin-RevId: 880942122
Imported from GitHub PR openxla#38801

📝 Summary of Changes: Skipped CanNotEmitTritonCustomCallOnPreAmpereGpu in gpu_triton_custom_call_test for ROCm.
🎯 Justification: The unit test was failing on ROCm because emitting the Triton custom call does work there, so the pre-Ampere failure check does not apply.
🚀 Kind of Contribution: 🧪 Tests
🧪 Unit Tests: gpu_triton_custom_call_test
🧪 Execution Tests: None added.

Copybara import of the project:
-- 6cf15ac by Zoran Jovanovic <zjovanov@amd.com>: [ROCm] Skip CanNotEmitTritonCustomCallOnPreAmpereGpu in gpu_triton_custom_call_test for ROCm

Merging this change closes openxla#38801
COPYBARA_INTEGRATE_REVIEW=openxla#38801 from ROCm:rocm-fix-gpu_triton_custom_call_test 6cf15ac
PiperOrigin-RevId: 881953512
Imported from GitHub PR openxla#38759

📝 Summary of Changes: Modified IsDotAlgorithmSupportedByTriton to reflect the implementation on ROCm.
🎯 Justification: triton/dot_algorithms_test was failing on ROCm.
🚀 Kind of Contribution: 🐛 Bug Fix, 🧪 Tests
🧪 Unit Tests: triton/dot_algorithms_test
🧪 Execution Tests: None added.

Copybara import of the project:
-- 757877c by Zoran Jovanovic <zjovanov@amd.com>: Fixed dot_algorithms_test. Updated support_legacy and test itself.
-- 5685c9c by Zoran Jovanovic <zjovanov@amd.com>: Review comments.
-- d801633 by Zoran Jovanovic <zjovanov@amd.com>: Review comment.

Merging this change closes openxla#38759
COPYBARA_INTEGRATE_REVIEW=openxla#38759 from ROCm:rocm-fix-dot_algorithms_test d801633
PiperOrigin-RevId: 885489190
Imported from GitHub PR openxla#38792

Enable FissionBackend autotuning for ROCm (rocBLAS + hipBLASLt):
- Added HIPBLASLT_FISSION to the backend proto.
- Updated factory_rocm.cc to register the backends.
- The xla_gpu_experimental_disable_binary_libraries and xla_gpu_enable_cublaslt behavior mirrors CUDA.
- Minor fix: removed a dead return in fission_backend.cc.

🚀 Kind of Contribution: ✨ New Feature

Copybara import of the project:
-- ad88d8c by Eetu Sjöblom <eetu.sjoblom@amd.com>: Enable rocblas/hipBLASLt fission on ROCm

Merging this change closes openxla#38792
COPYBARA_INTEGRATE_REVIEW=openxla#38792 from ROCm:ci_rocm_enable_fission_upstream ad88d8c
PiperOrigin-RevId: 884317138 (cherry picked from commit 77ecf53)
Imported from GitHub PR openxla#37074

📝 Summary of Changes: Adapted GpuCompiler::AddConvAndGemmAutotuningPass to match the pre-refactor behavior of AMDGPUCompiler.
🎯 Justification: On ROCm the MIOpen backend must run even when autotuning is disabled, in order to decompose unsupported fused convolutions back; there is no runtime fallback.
🚀 Kind of Contribution: 🐛 Bug Fix
📊 Benchmark: N/A
🧪 Unit Tests: None
🧪 Execution Tests: None

Copybara import of the project:
-- 486498b by Dragan Mladjenovic <Dragan.Mladjenovic@amd.com>: [ROCm] Reenable miopen autotune when xla_gpu_autotune_level == 0
-- feec478 by Dragan Mladjenovic <Dragan.Mladjenovic@amd.com>: Fix //xla/backends/gpu/autotuner:miopen_test_amdgpu_any

Merging this change closes openxla#37074
COPYBARA_INTEGRATE_REVIEW=openxla#37074 from ROCm:miopen_autotune feec478
PiperOrigin-RevId: 877364967
Backport upstream fix that prevents a crash in the BlockPingpong optimization when applied to FMA (non-MFMA) dot operations. The pass now uses dyn_cast instead of cast for AMDMfmaEncodingAttr and returns early when the encoding is not MFMA.
Motivation
Applied code review comments from Claude.