Added Base.similar methods for CuSparseMatrixCOO and BSR #3114
Open
rainerrodrigues wants to merge 4 commits into JuliaGPU:master
kshyatt
reviewed
Apr 21, 2026
Member
Also, can some tests be added?
Contributor
CUDA.jl Benchmarks
| Benchmark suite | Current: 845e83c | Previous: d08923d | Ratio |
|---|---|---|---|
| array/accumulate/Float32/1d | 101140.5 ns | 101201.5 ns | 1.00 |
| array/accumulate/Float32/dims=1 | 76322 ns | 77253 ns | 0.99 |
| array/accumulate/Float32/dims=1L | 1584657 ns | 1586384.5 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 143544 ns | 144050 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 657740 ns | 658315 ns | 1.00 |
| array/accumulate/Int64/1d | 118704 ns | 118535 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 80145 ns | 80398 ns | 1.00 |
| array/accumulate/Int64/dims=1L | 1705722 ns | 1695368 ns | 1.01 |
| array/accumulate/Int64/dims=2 | 156343 ns | 156400.5 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 961909 ns | 962477 ns | 1.00 |
| array/broadcast | 20273 ns | 20549 ns | 0.99 |
| array/construct | 1273.6 ns | 1244.7 ns | 1.02 |
| array/copy | 17792 ns | 18346 ns | 0.97 |
| array/copyto!/cpu_to_gpu | 213574 ns | 217009 ns | 0.98 |
| array/copyto!/gpu_to_cpu | 281602 ns | 283646 ns | 0.99 |
| array/copyto!/gpu_to_gpu | 10655 ns | 10940 ns | 0.97 |
| array/iteration/findall/bool | 134322 ns | 135015 ns | 0.99 |
| array/iteration/findall/int | 149464 ns | 150942 ns | 0.99 |
| array/iteration/findfirst/bool | 80978 ns | 81514 ns | 0.99 |
| array/iteration/findfirst/int | 83157 ns | 84056 ns | 0.99 |
| array/iteration/findmin/1d | 84753 ns | 87756 ns | 0.97 |
| array/iteration/findmin/2d | 117102 ns | 117744 ns | 0.99 |
| array/iteration/logical | 197029.5 ns | 201505 ns | 0.98 |
| array/iteration/scalar | 64912 ns | 67810 ns | 0.96 |
| array/permutedims/2d | 52533 ns | 52558 ns | 1.00 |
| array/permutedims/3d | 52825.5 ns | 52631 ns | 1.00 |
| array/permutedims/4d | 51771 ns | 51253 ns | 1.01 |
| array/random/rand/Float32 | 12720 ns | 12853 ns | 0.99 |
| array/random/rand/Int64 | 25014 ns | 25414 ns | 0.98 |
| array/random/rand!/Float32 | 9253 ns | 8376.666666666666 ns | 1.10 |
| array/random/rand!/Int64 | 21467 ns | 21965 ns | 0.98 |
| array/random/randn/Float32 | 40917.5 ns | 38474 ns | 1.06 |
| array/random/randn!/Float32 | 25853 ns | 30963 ns | 0.83 |
| array/reductions/mapreduce/Float32/1d | 34170 ns | 34333 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1 | 39205.5 ns | 40570 ns | 0.97 |
| array/reductions/mapreduce/Float32/dims=1L | 51195.5 ns | 51354 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2 | 56360 ns | 56637 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 69013 ns | 69577 ns | 0.99 |
| array/reductions/mapreduce/Int64/1d | 41911 ns | 42670 ns | 0.98 |
| array/reductions/mapreduce/Int64/dims=1 | 42090 ns | 43498 ns | 0.97 |
| array/reductions/mapreduce/Int64/dims=1L | 87036 ns | 87200 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2 | 59145 ns | 59546 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=2L | 84434 ns | 84737 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 34387 ns | 34657.5 ns | 0.99 |
| array/reductions/reduce/Float32/dims=1 | 40647 ns | 39996.5 ns | 1.02 |
| array/reductions/reduce/Float32/dims=1L | 51397.5 ns | 51457 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2 | 56481 ns | 56754 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 69363.5 ns | 70015.5 ns | 0.99 |
| array/reductions/reduce/Int64/1d | 41982 ns | 42974.5 ns | 0.98 |
| array/reductions/reduce/Int64/dims=1 | 50345.5 ns | 42327 ns | 1.19 |
| array/reductions/reduce/Int64/dims=1L | 86951 ns | 87167 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2 | 59596 ns | 59657 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 84767 ns | 84515 ns | 1.00 |
| array/reverse/1d | 17643 ns | 17882 ns | 0.99 |
| array/reverse/1dL | 68225.5 ns | 68439 ns | 1.00 |
| array/reverse/1dL_inplace | 65570 ns | 65793.5 ns | 1.00 |
| array/reverse/1d_inplace | 10158.5 ns | 8645.333333333334 ns | 1.18 |
| array/reverse/2d | 20688 ns | 21177 ns | 0.98 |
| array/reverse/2dL | 72784 ns | 73131 ns | 1.00 |
| array/reverse/2dL_inplace | 65691 ns | 65813 ns | 1.00 |
| array/reverse/2d_inplace | 10372 ns | 9973 ns | 1.04 |
| array/sorting/1d | 2733819 ns | 2735906 ns | 1.00 |
| array/sorting/2d | 1068081 ns | 1068705.5 ns | 1.00 |
| array/sorting/by | 3302391 ns | 3304477 ns | 1.00 |
| cuda/synchronization/context/auto | 1162.1 ns | 1153.8 ns | 1.01 |
| cuda/synchronization/context/blocking | 930.5714285714286 ns | 920.219512195122 ns | 1.01 |
| cuda/synchronization/context/nonblocking | 7707.299999999999 ns | 7049.8 ns | 1.09 |
| cuda/synchronization/stream/auto | 989.2105263157895 ns | 1045.5 ns | 0.95 |
| cuda/synchronization/stream/blocking | 807.7676767676768 ns | 845.6486486486486 ns | 0.96 |
| cuda/synchronization/stream/nonblocking | 7204.8 ns | 7261.5 ns | 0.99 |
| integration/byval/reference | 143774 ns | 143708 ns | 1.00 |
| integration/byval/slices=1 | 145530 ns | 145668 ns | 1.00 |
| integration/byval/slices=2 | 284392 ns | 284436 ns | 1.00 |
| integration/byval/slices=3 | 423048 ns | 422883 ns | 1.00 |
| integration/cudadevrt | 102357 ns | 102385 ns | 1.00 |
| integration/volumerhs | 23462010.5 ns | 23480818 ns | 1.00 |
| kernel/indexing | 12997 ns | 13249 ns | 0.98 |
| kernel/indexing_checked | 13782 ns | 14040 ns | 0.98 |
| kernel/launch | 2047.111111111111 ns | 2194.8888888888887 ns | 0.93 |
| kernel/occupancy | 705.3356164383562 ns | 704.5137931034483 ns | 1.00 |
| kernel/rand | 14118 ns | 16709 ns | 0.84 |
| latency/import | 3837187980.5 ns | 3825894453 ns | 1.00 |
| latency/precompile | 4600227064.5 ns | 4600509263 ns | 1.00 |
| latency/ttfp | 4404423502.5 ns | 4395653546.5 ns | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.
kshyatt
reviewed
Apr 21, 2026
rainerrodrigues
commented
Apr 22, 2026
Author
@kshyatt Hi, can you check if this is suitable and extensive enough for testing?
f08a059 to b48050e
This PR adds the missing Base.similar methods for CuSparseMatrixCOO and CuSparseMatrixBSR, allowing them to fall back gracefully without converting to dense CPU arrays.
Fixes #3061
Fixes #3055
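
For readers unfamiliar with the pattern, here is a minimal CPU-only sketch of what a `similar` method for a COO-style matrix can look like. It is not the code in this PR: `ToyCOO` and its fields are hypothetical stand-ins, and the real CuSparseMatrixCOO and CuSparseMatrixBSR methods may differ. The point it illustrates is that, without a dedicated method, Base's generic `similar` for an `AbstractArray` allocates a dense `Array`, whereas a dedicated method can keep the sparsity pattern and only allocate new value storage.

```julia
# Hypothetical CPU-only stand-in for a COO sparse matrix; NOT the CUDA.jl type.
struct ToyCOO{Tv,Ti<:Integer} <: AbstractMatrix{Tv}
    rowInd::Vector{Ti}   # row index of each stored entry
    colInd::Vector{Ti}   # column index of each stored entry
    nzVal::Vector{Tv}    # stored values
    dims::NTuple{2,Int}  # matrix size
end

Base.size(A::ToyCOO) = A.dims

# Scalar lookup so the AbstractMatrix interface works (duplicate entries are summed).
function Base.getindex(A::ToyCOO{Tv}, i::Int, j::Int) where {Tv}
    @boundscheck checkbounds(A, i, j)
    acc = zero(Tv)
    for k in eachindex(A.nzVal)
        if A.rowInd[k] == i && A.colInd[k] == j
            acc += A.nzVal[k]
        end
    end
    return acc
end

# Same element type: copy the sparsity pattern, allocate uninitialized values.
Base.similar(A::ToyCOO) =
    ToyCOO(copy(A.rowInd), copy(A.colInd), similar(A.nzVal), A.dims)

# New element type: keep the pattern, change only the value storage.
Base.similar(A::ToyCOO, ::Type{T}) where {T} =
    ToyCOO(copy(A.rowInd), copy(A.colInd), similar(A.nzVal, T), A.dims)

A = ToyCOO([1, 2, 3], [1, 3, 2], Float32[1, 2, 3], (3, 3))
B = similar(A, Float64)   # still a ToyCOO, now with Float64 values
```

Tests along these lines would check that the result keeps the sparse type, the size, and the index arrays, and that the index arrays are copies rather than aliases of the originals.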