Add `SparseMatricesCSR.jl` Ext by Abdelrahman912 · Pull Request #906 · JuliaGPU/AMDGPU.jl

Abdelrahman912 · 2026-05-09T01:22:58Z

Add support for CPU CSR matrices, e.g., ROCSparseMatrixCSR(::SparseMatrixCSR)

luraess · 2026-05-18T21:09:30Z

Thanks for the contribution! The structure closely mirrors what CUDA.jl does in lib/cusparse/ext/SparseMatricesCSRExt.jl.

One thing to consider (unless I am wrong): SparseMatricesCSR.SparseMatrixCSR{Bi,Tv,Ti} has a type parameter Bi for the index base (0 or 1). The CPU→GPU constructors currently use Mat.rowptr/Mat.colval directly without checking Bi, which would silently produce incorrect index arrays for a 0-based SparseMatrixCSR{0}; ROCSPARSE expects 1-based pointers in the Julia wrappers (matching the SparseMatrixCSR{1} used in the GPU→CPU direction). Restricting the dispatch to {1} would turn this into a clear error instead:

ROCSparseMatrixCSR{T}(Mat::SparseMatrixCSR{1}) where {T} = ...

Note that CUDA.jl has the same gap, but this could also be fixed there if need be.
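To illustrate the dispatch restriction suggested above, here is a minimal sketch that runs without a GPU. `MockCSR` and `MockDeviceCSR` are hypothetical stand-ins for `SparseMatrixCSR{Bi,Tv,Ti}` and `ROCSparseMatrixCSR`; they are not the types or constructors used in this PR, and only the idea of restricting the method to index base 1 is taken from the comment.

```julia
# Hypothetical stand-in for SparseMatricesCSR.SparseMatrixCSR{Bi,Tv,Ti}.
# Bi is the index base (0 or 1) and is carried only in the type.
struct MockCSR{Bi,Tv,Ti}
    rowptr::Vector{Ti}
    colval::Vector{Ti}
    nzval::Vector{Tv}
end

# Hypothetical stand-in for a device-side CSR type that assumes 1-based indices.
struct MockDeviceCSR{Tv,Ti}
    rowptr::Vector{Ti}
    colval::Vector{Ti}
    nzval::Vector{Tv}
end

# Only 1-based inputs are accepted; a 0-based matrix has no matching method.
MockDeviceCSR(A::MockCSR{1,Tv,Ti}) where {Tv,Ti} =
    MockDeviceCSR{Tv,Ti}(copy(A.rowptr), copy(A.colval), copy(A.nzval))

# The same 2x2 diagonal matrix stored with both index bases.
one_based  = MockCSR{1,Float64,Int}([1, 2, 3], [1, 2], [10.0, 20.0])
zero_based = MockCSR{0,Float64,Int}([0, 1, 2], [0, 1], [10.0, 20.0])

dA = MockDeviceCSR(one_based)

# The 0-based input fails with a MethodError instead of being copied as-is.
rejected = try
    MockDeviceCSR(zero_based)
    false
catch err
    err isa MethodError
end
```

With this restriction, a 0-based matrix fails at the call site rather than producing shifted index arrays. Supporting 0-based input would need a separate method that shifts `rowptr` and `colval` by one.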

Abdelrahman912 · 2026-05-28T21:30:02Z

This should be ready for merge.

github-actions

AMDGPU.jl Benchmarks

Benchmark suite	Current: `cd19e25`	Previous: `77983ff`	Ratio
`amdgpu/synchronization/context/device`	`580` ns	`590` ns	`0.98`
`amdgpu/synchronization/stream/blocking`	`240` ns	`250` ns	`0.96`
`amdgpu/synchronization/stream/nonblocking`	`340` ns	`340` ns	`1`
`array/accumulate/Float32/1d`	`86522` ns	`87811` ns	`0.99`
`array/accumulate/Float32/dims=1`	`389157` ns	`403115` ns	`0.97`
`array/accumulate/Float32/dims=1L`	`136393` ns	`135562` ns	`1.01`
`array/accumulate/Float32/dims=2`	`129533` ns	`130651` ns	`0.99`
`array/accumulate/Float32/dims=2L`	`2829561` ns	`2830168` ns	`1.00`
`array/accumulate/Int64/1d`	`94602` ns	`95162` ns	`0.99`
`array/accumulate/Int64/dims=1`	`404208` ns	`287554` ns	`1.41`
`array/accumulate/Int64/dims=1L`	`167973` ns	`163062` ns	`1.03`
`array/accumulate/Int64/dims=2`	`126082` ns	`122662` ns	`1.03`
`array/accumulate/Int64/dims=2L`	`3006764` ns	`3009581` ns	`1.00`
`array/broadcast`	`82021` ns	`92571` ns	`0.89`
`array/construct`	`1760` ns	`1650` ns	`1.07`
`array/copy`	`37901` ns	`40541` ns	`0.93`
`array/copyto!/cpu_to_gpu`	`123523` ns	`101241` ns	`1.22`
`array/copyto!/gpu_to_cpu`	`183374` ns	`171042` ns	`1.07`
`array/copyto!/gpu_to_gpu`	`59931` ns	`67061` ns	`0.89`
`array/iteration/findall/bool`	`183394` ns	`186323` ns	`0.98`
`array/iteration/findall/int`	`196534` ns	`193182` ns	`1.02`
`array/iteration/findfirst/bool`	`120442` ns	`116491` ns	`1.03`
`array/iteration/findfirst/int`	`115812` ns	`116172` ns	`1.00`
`array/iteration/findmin/1d`	`170134` ns	`170953` ns	`1.00`
`array/iteration/findmin/2d`	`156333` ns	`157112` ns	`1.00`
`array/iteration/logical`	`355797` ns	`357464` ns	`1.00`
`array/iteration/scalar`	`289836` ns	`297354` ns	`0.97`
`array/permutedims/2d`	`74041` ns	`76211` ns	`0.97`
`array/permutedims/3d`	`74062` ns	`75451` ns	`0.98`
`array/permutedims/4d`	`76681` ns	`77631` ns	`0.99`
`array/random/rand/Float32`	`52941` ns	`52101` ns	`1.02`
`array/random/rand/Int64`	`57841` ns	`57911` ns	`1.00`
`array/random/rand!/Float32`	`72201` ns	`90721` ns	`0.80`
`array/random/rand!/Int64`	`113362` ns	`79251` ns	`1.43`
`array/random/randn/Float32`	`88392` ns	`94361` ns	`0.94`
`array/random/randn!/Float32`	`109012` ns	`111361` ns	`0.98`
`array/reductions/mapreduce/Float32/1d`	`133433` ns	`134212` ns	`0.99`
`array/reductions/mapreduce/Float32/dims=1`	`94822` ns	`95701` ns	`0.99`
`array/reductions/mapreduce/Float32/dims=1L`	`773215` ns	`781590` ns	`0.99`
`array/reductions/mapreduce/Float32/dims=2`	`97082` ns	`97911` ns	`0.99`
`array/reductions/mapreduce/Float32/dims=2L`	`307446` ns	`296814` ns	`1.04`
`array/reductions/mapreduce/Int64/1d`	`133202` ns	`131561` ns	`1.01`
`array/reductions/mapreduce/Int64/dims=1`	`95131` ns	`95531` ns	`1.00`
`array/reductions/mapreduce/Int64/dims=1L`	`783035` ns	`785040` ns	`1.00`
`array/reductions/mapreduce/Int64/dims=2`	`96311` ns	`97011` ns	`0.99`
`array/reductions/mapreduce/Int64/dims=2L`	`305326` ns	`300435` ns	`1.02`
`array/reductions/reduce/Float32/1d`	`132643` ns	`130022` ns	`1.02`
`array/reductions/reduce/Float32/dims=1`	`95222` ns	`95462` ns	`1.00`
`array/reductions/reduce/Float32/dims=1L`	`773155` ns	`776581` ns	`1.00`
`array/reductions/reduce/Float32/dims=2`	`97282` ns	`97971` ns	`0.99`
`array/reductions/reduce/Float32/dims=2L`	`307396` ns	`299134` ns	`1.03`
`array/reductions/reduce/Int64/1d`	`133453` ns	`134991` ns	`0.99`
`array/reductions/reduce/Int64/dims=1`	`94811` ns	`95171` ns	`1.00`
`array/reductions/reduce/Int64/dims=1L`	`781526` ns	`783650` ns	`1.00`
`array/reductions/reduce/Int64/dims=2`	`96212` ns	`97052` ns	`0.99`
`array/reductions/reduce/Int64/dims=2L`	`299866` ns	`298874` ns	`1.00`
`array/reverse/1d`	`44531` ns	`44560` ns	`1.00`
`array/reverse/1dL`	`75841` ns	`76361` ns	`0.99`
`array/reverse/1dL_inplace`	`110452` ns	`119381` ns	`0.93`
`array/reverse/1d_inplace`	`77421` ns	`79391` ns	`0.98`
`array/reverse/2d`	`52031` ns	`52601` ns	`0.99`
`array/reverse/2dL`	`101522` ns	`102402` ns	`0.99`
`array/reverse/2dL_inplace`	`112843` ns	`104651` ns	`1.08`
`array/reverse/2d_inplace`	`121763` ns	`125452` ns	`0.97`
`array/sorting/1d`	`341026` ns	`370525` ns	`0.92`
`integration/byval/reference`	`38731` ns	`39170` ns	`0.99`
`integration/byval/slices=1`	`39990` ns	`40441` ns	`0.99`
`integration/byval/slices=2`	`162353` ns	`147172` ns	`1.10`
`integration/byval/slices=3`	`243344` ns	`240543` ns	`1.01`
`integration/volumerhs`	`5037498` ns	`5026059` ns	`1.00`
`kernel/indexing`	`73662` ns	`65060` ns	`1.13`
`kernel/indexing_checked`	`72222` ns	`50911` ns	`1.42`
`kernel/launch`	`1340` ns	`1280` ns	`1.05`
`kernel/rand`	`197324` ns	`197633` ns	`1.00`
`latency/import`	`1473069466` ns	`1473646012` ns	`1.00`
`latency/precompile`	`11897056494` ns	`11890085969` ns	`1.00`
`latency/ttfp`	`10343919358` ns	`10356361296` ns	`1.00`

This comment was automatically generated by workflow using github-action-benchmark.

luraess · 2026-05-29T06:11:31Z

Thanks for the update. There are failing tests now. Could you have a look? Once those are fixed, we can merge.

Abdelrahman912 · 2026-05-29T21:20:51Z

+    for (n, bd, p) in [(100, 5, 0.02), (5, 1, 0.8), (4, 2, 0.5)]
+        @testset "conversions between ROCSparseMatrices (n, bd, p) = ($n, $bd, $p)" begin
+            _A = sprand(n, n, p)
+            A = SparseMatrixCSR(_A)
+            blockdim = bd
+            for ROCSparseMatrixType1 in (ROCSparseMatrixCSC, ROCSparseMatrixCSR, ROCSparseMatrixCOO, ROCSparseMatrixBSR)
+                dA1 = ROCSparseMatrixType1 == ROCSparseMatrixBSR ? ROCSparseMatrixType1(A, blockdim) : ROCSparseMatrixType1(A)
+                @testset "conversion $ROCSparseMatrixType1 --> SparseMatrixCSR" begin
+                    @test SparseMatrixCSR(dA1) ≈ A
+                end
+                for ROCSparseMatrixType2 in (ROCSparseMatrixCSC, ROCSparseMatrixCSR, ROCSparseMatrixCOO, ROCSparseMatrixBSR)
+                    ROCSparseMatrixType1 == ROCSparseMatrixType2 && continue
+                    dA2 = ROCSparseMatrixType2 == ROCSparseMatrixBSR ? ROCSparseMatrixType2(dA1, blockdim) : ROCSparseMatrixType2(dA1)
+                    @testset "conversion $ROCSparseMatrixType1 --> $ROCSparseMatrixType2" begin
+                        @test collect(dA1) ≈ collect(dA2)
+                    end
+                end
+            end
+        end


Error During Test at /var/lib/buildkite-agent/builds/amdgpu1-luraess-com/julialang/amdgpu-dot-jl/test/hip_rocsparse/sparse_matrices_csr.jl:12
  Got exception outside of a @test
  MethodError: no method matching (AMDGPU.rocSPARSE.ROCSparseMatrixBSR{Float64})(::AMDGPU.rocSPARSE.ROCSparseMatrixCSC{Float64, Int32}, ::Int64)

  Closest candidates are:
    (AMDGPU.rocSPARSE.ROCSparseMatrixBSR{Float64})(::AMDGPU.rocSPARSE.ROCSparseMatrixCSR{Float64}, ::Integer; dir, inda, indc)
     @ AMDGPU /var/lib/buildkite-agent/builds/amdgpu1-luraess-com/julialang/amdgpu-dot-jl/src/sparse/conversions.jl:224
    (AMDGPU.rocSPARSE.ROCSparseMatrixBSR{T})(::SparseArrays.SparseMatrixCSC, ::Any) where T
     @ AMDGPU /var/lib/buildkite-agent/builds/amdgpu1-luraess-com/julialang/amdgpu-dot-jl/src/sparse/array.jl:410
    (AMDGPU.rocSPARSE.ROCSparseMatrixBSR{T})(::SparseMatricesCSR.SparseMatrixCSR{1}, ::Any) where T
     @ AMDGPUSparseMatricesCSRExt /var/lib/buildkite-agent/builds/amdgpu1-luraess-com/julialang/amdgpu-dot-jl/ext/AMDGPUSparseMatricesCSRExt.jl:26
    ...

  Stacktrace:
    [1] AMDGPU.rocSPARSE.ROCSparseMatrixBSR(x::AMDGPU.rocSPARSE.ROCSparseMatrixCSC{Float64, Int32}, blockdim::Int64)
      @ AMDGPU.rocSPARSE /var/lib/buildkite-agent/builds/amdgpu1-luraess-com/julialang/amdgpu-dot-jl/src/sparse/array.jl:417

This is probably tangential to this PR, but it happens to be the main cause of the CI failure: the nested test loop converts every sparse matrix type to every other sparse matrix type. That said,
@luraess, is there a reason why there is no conversion between CSC and BSR, like this one in CUDA.jl?
https://github.com/JuliaGPU/CUDA.jl/blob/54c758682d909f32d656fcbe8c43c20062850927/lib/cusparse/src/conversions.jl#L710-L711

If those conversions are not needed, I can just update the test; otherwise I can add them, which should hopefully make CI happy.

> otherwise I can add those conversions

That would be the best way if you're willing to do so. Thanks!
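One way such a missing conversion could be provided is a fallback method that routes CSC input through the existing CSR path. The sketch below uses hypothetical mock types (`MockCSC`, `MockCSR`, `MockBSR`) wrapping a dense matrix so that it runs without a GPU; it only illustrates the dispatch pattern and is not the code added in this PR, nor a claim about what the linked CUDA.jl lines contain.

```julia
# Hypothetical mock formats; each wraps a dense matrix purely for illustration.
struct MockCSC
    data::Matrix{Float64}
end

struct MockCSR
    data::Matrix{Float64}
end

struct MockBSR
    data::Matrix{Float64}
    blockdim::Int
end

# Conversions assumed to exist already: CSC -> CSR and CSR -> BSR.
MockCSR(A::MockCSC) = MockCSR(copy(A.data))
MockBSR(A::MockCSR, blockdim::Integer) = MockBSR(copy(A.data), Int(blockdim))

# The previously missing method: CSC -> BSR composed from the two above.
MockBSR(A::MockCSC, blockdim::Integer) = MockBSR(MockCSR(A), blockdim)

csc = MockCSC([1.0 0.0; 0.0 2.0])
bsr = MockBSR(csc, 2)
```

Composing through an intermediate format costs one extra conversion, but it keeps the new method to a single line and reuses a path that is already tested.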

luraess · 2026-05-30T20:31:45Z

Thanks!

init

d0ebffe

Abdelrahman912 added 2 commits May 28, 2026 22:48

Merge remote-tracking branch 'upstream/main' into csr-ext

86a0763

restrict dispatch

422c90f

github-actions Bot reviewed May 28, 2026

Abdelrahman912 commented May 29, 2026

add conversions

cd19e25

luraess merged commit 25220ef into JuliaGPU:main May 30, 2026
4 checks passed

luraess merged 4 commits into JuliaGPU:main from Abdelrahman912:csr-ext
