Code review comments.#698

Open
zoranjovanovic-ns wants to merge 13 commits into v0.9.1-test-ci from
v0.9.1-test-ci-code-review

Conversation

@zoranjovanovic-ns

Motivation

Applied code review comments from Claude.

Technical Details

Test Plan

Test Result

Submission Checklist

zoranjovanovic-ns and others added 13 commits March 18, 2026 11:01
Imported from GitHub PR openxla#38507

📝 Summary of Changes
Updated functions in triton/support.cc and triton/support_test to execute
correctly on ROCm.

🎯 Justification
support_test was failing on ROCm.
🚀 Kind of Contribution
🐛 Bug Fix, 🧪 Tests

📊 Benchmark (for Performance Improvements)
Please measure and include speedups for one of the public HLOs in
`compiler/xla/tools/benchmarks/hlo/`.

🧪 Unit Tests:
Used existing triton/support_test

🧪 Execution Tests:
What execution tests were added? For example, a new optimization should be
tested with an end-to-end execution test triggering the optimization and
asserting correctness. Please provide test cases running with at most 2 GPUs.

Copybara import of the project:

--
e067431 by zoranjovanovic-ns <126815388+zoranjovanovic-ns@users.noreply.github.com>:

Fixed triton_support_test on rocm.

--
16f1f07 by Zoran Jovanovic <zjovanov@amd.com>:

Review comments.

--
731fb44 by Zoran Jovanovic <zjovanov@amd.com>:

Code review 2

Merging this change closes openxla#38507

COPYBARA_INTEGRATE_REVIEW=openxla#38507 from ROCm:ci_rocm-fix-triton-support-4 731fb44
PiperOrigin-RevId: 877900520
…evice_test for ROCm.

Imported from GitHub PR openxla#38742

📝 Summary of Changes
Created expected output for FuseSubchannelDequantizationWithTranspose in triton/fusion_emitter_int4_device_test on ROCm.

🎯 Justification
triton/fusion_emitter_int4_device_test was failing on ROCm.

🚀 Kind of Contribution
🧪 Tests

📊 Benchmark (for Performance Improvements)
Please measure and include speedups for one of the public HLOs in
`compiler/xla/tools/benchmarks/hlo/`.

🧪 Unit Tests:
triton/fusion_emitter_int4_device_test

🧪 Execution Tests:
What execution tests were added? For example, a new optimization should be
tested with an end-to-end execution test triggering the optimization and
asserting correctness. Please provide test cases running with at most 2 GPUs.

Copybara import of the project:

--
d00e6d1 by Zoran Jovanovic <zjovanov@amd.com>:

Fix expected output in fusion_emitter_int4_device_test for ROCm.

Merging this change closes openxla#38742

COPYBARA_INTEGRATE_REVIEW=openxla#38742 from ROCm:rocm-fusion_emitter_int4_device_test d00e6d1
PiperOrigin-RevId: 880942122
…u in gpu_triton_cu…

Imported from GitHub PR openxla#38801

…stom_call_test for ROCm

📝 Summary of Changes
Skipped CanNotEmitTritonCustomCallOnPreAmpereGpu in gpu_triton_custom_call_test for ROCm.

🎯 Justification
The unit test was failing on ROCm: it expects Triton custom-call emission to fail on pre-Ampere GPUs, but that CUDA-specific check does not apply on ROCm, where emission succeeds.

🚀 Kind of Contribution
🧪 Tests

📊 Benchmark (for Performance Improvements)
Please measure and include speedups for one of the public HLOs in
`compiler/xla/tools/benchmarks/hlo/`.

🧪 Unit Tests:
gpu_triton_custom_call_test

🧪 Execution Tests:
What execution tests were added? For example, a new optimization should be
tested with an end-to-end execution test triggering the optimization and
asserting correctness. Please provide test cases running with at most 2 GPUs.

Copybara import of the project:

--
6cf15ac by Zoran Jovanovic <zjovanov@amd.com>:

[ROCm] Skip CanNotEmitTritonCustomCallOnPreAmpereGpu in gpu_triton_custom_call_test for ROCm

Merging this change closes openxla#38801

COPYBARA_INTEGRATE_REVIEW=openxla#38801 from ROCm:rocm-fix-gpu_triton_custom_call_test 6cf15ac
PiperOrigin-RevId: 881953512
…egacy and test itself.

Imported from GitHub PR openxla#38759

📝 Summary of Changes
Modified IsDotAlgorithmSupportedByTriton to reflect implementation on ROCm.
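A platform-aware support check of the kind described might look like the sketch below. The enum values and the individual support decisions are purely illustrative, not the actual ROCm support matrix from support_legacy:

```cpp
// Illustrative only: a dot-algorithm support predicate that differs by
// platform, in the spirit of IsDotAlgorithmSupportedByTriton. The cases and
// the answers here are made up for the sketch.
enum class DotAlgorithm { kBf16Bf16F32, kTf32Tf32F32, kF32F32F32 };

bool IsDotAlgorithmSupported(DotAlgorithm alg, bool is_rocm) {
  switch (alg) {
    case DotAlgorithm::kBf16Bf16F32:
      return true;        // supported on both platforms (illustrative)
    case DotAlgorithm::kTf32Tf32F32:
      return !is_rocm;    // CUDA-only in this sketch
    case DotAlgorithm::kF32F32F32:
      return true;
  }
  return false;
}
```

The test then asserts different expected results depending on the platform it runs on, which is why both the predicate and the test needed updating.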

🎯 Justification
triton/dot_algorithms_test was failing on ROCm.

🚀 Kind of Contribution
🐛 Bug Fix, 🧪 Tests

📊 Benchmark (for Performance Improvements)
Please measure and include speedups for one of the public HLOs in
`compiler/xla/tools/benchmarks/hlo/`.

🧪 Unit Tests:
triton/dot_algorithms_test

🧪 Execution Tests:
What execution tests were added? For example, a new optimization should be
tested with an end-to-end execution test triggering the optimization and
asserting correctness. Please provide test cases running with at most 2 GPUs.

Copybara import of the project:

--
757877c by Zoran Jovanovic <zjovanov@amd.com>:

Fixed dot_algorithms_test. Updated support_legacy and test itself.

--
5685c9c by Zoran Jovanovic <zjovanov@amd.com>:

Review comments.

--
d801633 by Zoran Jovanovic <zjovanov@amd.com>:

Review comment.

Merging this change closes openxla#38759

COPYBARA_INTEGRATE_REVIEW=openxla#38759 from ROCm:rocm-fix-dot_algorithms_test d801633
PiperOrigin-RevId: 885489190
… autotuner

Imported from GitHub PR openxla#38792

Enable FissionBackend autotuning for ROCm (rocBLAS + hipBLASLt)

- Added HIPBLASLT_FISSION to backend proto
- Updated factory_rocm.cc to register the backends
- xla_gpu_experimental_disable_binary_libraries and xla_gpu_enable_cublaslt behavior mirrors CUDA

Also a minor fix: removed a dead return in fission_backend.cc.
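The flag-gated registration described above can be sketched as follows. The helper name and the backend string identifiers are hypothetical, not the actual factory_rocm.cc API; only HIPBLASLT_FISSION is named in the change itself:

```cpp
#include <string>
#include <vector>

// Hypothetical sketch of flag-gated fission-backend registration mirroring
// the CUDA path. The function name and identifiers are illustrative.
std::vector<std::string> RegisterRocmFissionBackends(bool disable_binary_libraries,
                                                     bool enable_blaslt) {
  std::vector<std::string> backends;
  // xla_gpu_experimental_disable_binary_libraries: skip library-backed
  // backends entirely.
  if (disable_binary_libraries) return backends;
  backends.push_back("ROCBLAS_FISSION");
  // xla_gpu_enable_cublaslt gates the hipBLASLt-based backend, as on CUDA.
  if (enable_blaslt) backends.push_back("HIPBLASLT_FISSION");
  return backends;
}
```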

🚀 Kind of Contribution
✨ New Feature

Copybara import of the project:

--
ad88d8c by Eetu Sjöblom <eetu.sjoblom@amd.com>:

Enable rocblas/hipBLASLt fission on ROCm

Merging this change closes openxla#38792

COPYBARA_INTEGRATE_REVIEW=openxla#38792 from ROCm:ci_rocm_enable_fission_upstream ad88d8c
PiperOrigin-RevId: 884317138

(cherry picked from commit 77ecf53)
…ne_level == 0

Imported from GitHub PR openxla#37074

📝 Summary of Changes
Adapt GpuCompiler::AddConvAndGemmAutotuningPass to match the pre-refactor behavior of AMDGPUCompiler.

🎯 Justification
For ROCm we need to run the miopen backend even when autotuning is disabled, in order to decompose unsupported fused convolutions; there is no runtime fallback.
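The justification above amounts to a platform-dependent pass gate. A minimal sketch, with a hypothetical function name rather than the actual GpuCompiler API:

```cpp
// Hypothetical sketch: decide whether to add the conv/gemm autotuning pass.
// On CUDA, xla_gpu_autotune_level == 0 means the pass can be skipped; on ROCm
// the miopen backend must still run so that unsupported fused convolutions
// are decomposed, because there is no runtime fallback for them.
bool ShouldRunConvAutotuningPass(bool is_rocm, int autotune_level) {
  if (autotune_level > 0) return true;  // autotuning enabled: always run
  return is_rocm;                       // ROCm: run even at level 0
}
```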

🚀 Kind of Contribution
🐛 Bug Fix

📊 Benchmark (for Performance Improvements)
N/A

🧪 Unit Tests:
None

🧪 Execution Tests:
None

Copybara import of the project:

--
486498b by Dragan Mladjenovic <Dragan.Mladjenovic@amd.com>:

[ROCm] Reenable miopen autotune when xla_gpu_autotune_level == 0

--
feec478 by Dragan Mladjenovic <Dragan.Mladjenovic@amd.com>:

Fix //xla/backends/gpu/autotuner:miopen_test_amdgpu_any

Merging this change closes openxla#37074

COPYBARA_INTEGRATE_REVIEW=openxla#37074 from ROCm:miopen_autotune feec478
PiperOrigin-RevId: 877364967
Backport upstream fix that prevents a crash in the BlockPingpong
optimization when applied to FMA (non-MFMA) dot operations. The pass
now uses dyn_cast instead of cast for AMDMfmaEncodingAttr and returns
early when the encoding is not MFMA.
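The fix follows the standard LLVM pattern of replacing an asserting cast<> with a checked dyn_cast<> plus an early return. A self-contained sketch using dynamic_cast as a stand-in; the types here are mocks, not the real Triton AMDMfmaEncodingAttr:

```cpp
// Mock encoding hierarchy standing in for Triton's layout attributes.
struct Encoding { virtual ~Encoding() = default; };
struct AMDMfmaEncoding : Encoding {};   // MFMA dots: the pass applies
struct BlockedEncoding : Encoding {};   // FMA (non-MFMA) dots: must bail out

// Before the fix an unconditional cast<AMDMfmaEncodingAttr> asserted/crashed
// on non-MFMA encodings; after, a dyn_cast-style check returns early.
bool ApplyBlockPingpong(const Encoding& enc) {
  auto* mfma = dynamic_cast<const AMDMfmaEncoding*>(&enc);
  if (mfma == nullptr) return false;  // non-MFMA dot: skip the optimization
  // ... pingpong scheduling would go here ...
  return true;
}
```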
