
@nospecialize testf/_compare forwarders in CUDA tests. #3117

Merged
maleadt merged 1 commit into master from tb/testsuite
Apr 23, 2026

Conversation

Member

@maleadt maleadt commented Apr 23, 2026

The main test/helpers.jl `testf` and the private clones in lib/cublas, lib/cufft, lib/cusolver are all thin forwarders that call out to deepcopy / adapt / ≈. Each unique (f, xs...) call site was getting its own compiled specialization even though the body does no per-type compute.

Matches the GPUArrays-side fix on TestSuite.compare / test_result. That one was the #1 testsuite compile hotspot until it was @nospecialize'd: 322 events dropped to 0 on the broadcasting testset trace.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
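
As a rough illustration of the pattern described above, here is a minimal sketch of a thin forwarder marked with `@nospecialize`. The name `compare_forwarder` and its body are hypothetical stand-ins, not the actual `testf` from test/helpers.jl; the real helper also adapts inputs to the GPU, which is omitted here so the example runs without CUDA.

```julia
using LinearAlgebra  # defines isapprox (≈) for arrays

# Hypothetical stand-in for a test forwarder; NOT the actual CUDA.jl helper.
# @nospecialize hints to the compiler that it should not generate a separate
# specialization of this method for every combination of argument types.
function compare_forwarder(@nospecialize(f), @nospecialize(xs...))
    # Run f on deep copies and on the originals, then compare the results.
    # All per-type work happens inside f, deepcopy and ≈, not in this body.
    reference = f(map(deepcopy, xs)...)
    result = f(xs...)
    return reference ≈ result
end

# Each call below uses a different (f, xs...) signature.
compare_forwarder(sum, [1.0, 2.0, 3.0])
compare_forwarder(+, [1, 2], [3, 4])
compare_forwarder(x -> 2 .* x, Float32[1, 2, 3])
```

Because the body only forwards to other functions, skipping specialization should cost little at run time while saving compile work in a test suite that calls the helper with many distinct signatures.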
Contributor

@github-actions github-actions Bot left a comment

CUDA.jl Benchmarks

Details
| Benchmark suite | Current: 85552c4 | Previous: 22b2689 | Ratio |
|---|---|---|---|
| array/accumulate/Float32/1d | 101042 ns | 102014.5 ns | 0.99 |
| array/accumulate/Float32/dims=1 | 76713 ns | 77428 ns | 0.99 |
| array/accumulate/Float32/dims=1L | 1586552 ns | 1587073 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 143947.5 ns | 144782 ns | 0.99 |
| array/accumulate/Float32/dims=2L | 658019 ns | 661233 ns | 1.00 |
| array/accumulate/Int64/1d | 118225 ns | 118923 ns | 0.99 |
| array/accumulate/Int64/dims=1 | 80053.5 ns | 80472.5 ns | 0.99 |
| array/accumulate/Int64/dims=1L | 1695730 ns | 1706560 ns | 0.99 |
| array/accumulate/Int64/dims=2 | 156555 ns | 156801 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 962158 ns | 962520 ns | 1.00 |
| array/broadcast | 20369 ns | 20669 ns | 0.99 |
| array/construct | 1256.9 ns | 1256.8 ns | 1.00 |
| array/copy | 18009 ns | 18092 ns | 1.00 |
| array/copyto!/cpu_to_gpu | 216616 ns | 217195 ns | 1.00 |
| array/copyto!/gpu_to_cpu | 280853 ns | 284274 ns | 0.99 |
| array/copyto!/gpu_to_gpu | 10742 ns | 10873 ns | 0.99 |
| array/iteration/findall/bool | 134816 ns | 134986 ns | 1.00 |
| array/iteration/findall/int | 149572 ns | 151080 ns | 0.99 |
| array/iteration/findfirst/bool | 81437 ns | 81781 ns | 1.00 |
| array/iteration/findfirst/int | 83808 ns | 84023 ns | 1.00 |
| array/iteration/findmin/1d | 86422.5 ns | 87902.5 ns | 0.98 |
| array/iteration/findmin/2d | 117166 ns | 117417 ns | 1.00 |
| array/iteration/logical | 198169.5 ns | 201691.5 ns | 0.98 |
| array/iteration/scalar | 66685 ns | 66086 ns | 1.01 |
| array/permutedims/2d | 51987.5 ns | 52480 ns | 0.99 |
| array/permutedims/3d | 52434 ns | 53206 ns | 0.99 |
| array/permutedims/4d | 51399.5 ns | 52056 ns | 0.99 |
| array/random/rand/Float32 | 13113 ns | 13513 ns | 0.97 |
| array/random/rand/Int64 | 25032 ns | 25225 ns | 0.99 |
| array/random/rand!/Float32 | 8447.666666666666 ns | 9480 ns | 0.89 |
| array/random/rand!/Int64 | 21770 ns | 21874.5 ns | 1.00 |
| array/random/randn/Float32 | 43447 ns | 39930.5 ns | 1.09 |
| array/random/randn!/Float32 | 30894 ns | 30795 ns | 1.00 |
| array/reductions/mapreduce/Float32/1d | 34244 ns | 35149.5 ns | 0.97 |
| array/reductions/mapreduce/Float32/dims=1 | 49219.5 ns | 40034.5 ns | 1.23 |
| array/reductions/mapreduce/Float32/dims=1L | 51349 ns | 51335 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2 | 56457 ns | 56606 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 69468.5 ns | 69556 ns | 1.00 |
| array/reductions/mapreduce/Int64/1d | 42976.5 ns | 42597 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=1 | 43024.5 ns | 50815 ns | 0.85 |
| array/reductions/mapreduce/Int64/dims=1L | 87461 ns | 87295 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2 | 59307 ns | 59460 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 84710 ns | 84825 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 34755 ns | 35215.5 ns | 0.99 |
| array/reductions/reduce/Float32/dims=1 | 41015.5 ns | 40067 ns | 1.02 |
| array/reductions/reduce/Float32/dims=1L | 51446 ns | 51370 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2 | 56728.5 ns | 56646 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 70065 ns | 69814 ns | 1.00 |
| array/reductions/reduce/Int64/1d | 42968 ns | 42572 ns | 1.01 |
| array/reductions/reduce/Int64/dims=1 | 42021.5 ns | 50859 ns | 0.83 |
| array/reductions/reduce/Int64/dims=1L | 87380 ns | 87191 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2 | 59393 ns | 59743 ns | 0.99 |
| array/reductions/reduce/Int64/dims=2L | 84703.5 ns | 84667 ns | 1.00 |
| array/reverse/1d | 17939 ns | 18121 ns | 0.99 |
| array/reverse/1dL | 68538 ns | 68646 ns | 1.00 |
| array/reverse/1dL_inplace | 65727 ns | 65864 ns | 1.00 |
| array/reverse/1d_inplace | 10232.833333333332 ns | 10292.666666666668 ns | 0.99 |
| array/reverse/2d | 20679 ns | 21138 ns | 0.98 |
| array/reverse/2dL | 72699 ns | 73135 ns | 0.99 |
| array/reverse/2dL_inplace | 65688 ns | 65713 ns | 1.00 |
| array/reverse/2d_inplace | 11080 ns | 11182 ns | 0.99 |
| array/sorting/1d | 2735481 ns | 2735147 ns | 1.00 |
| array/sorting/2d | 1068965 ns | 1073189 ns | 1.00 |
| array/sorting/by | 3304941 ns | 3303865.5 ns | 1.00 |
| cuda/synchronization/context/auto | 1182.1 ns | 1112 ns | 1.06 |
| cuda/synchronization/context/blocking | 938.6515151515151 ns | 888.829268292683 ns | 1.06 |
| cuda/synchronization/context/nonblocking | 7634.4 ns | 6852.5 ns | 1.11 |
| cuda/synchronization/stream/auto | 996 ns | 970.8 ns | 1.03 |
| cuda/synchronization/stream/blocking | 796.6421052631579 ns | 800.5102040816327 ns | 1.00 |
| cuda/synchronization/stream/nonblocking | 8034.1 ns | 7101.299999999999 ns | 1.13 |
| integration/byval/reference | 143711 ns | 143926 ns | 1.00 |
| integration/byval/slices=1 | 145657 ns | 145976 ns | 1.00 |
| integration/byval/slices=2 | 284549 ns | 284590 ns | 1.00 |
| integration/byval/slices=3 | 423087 ns | 423197 ns | 1.00 |
| integration/cudadevrt | 102345 ns | 102511 ns | 1.00 |
| integration/volumerhs | 23471812 ns | 23505751 ns | 1.00 |
| kernel/indexing | 13153 ns | 13253 ns | 0.99 |
| kernel/indexing_checked | 13896 ns | 13903 ns | 1.00 |
| kernel/launch | 2147.222222222222 ns | 2079.8888888888887 ns | 1.03 |
| kernel/occupancy | 664.1314102564102 ns | 693.52 ns | 0.96 |
| kernel/rand | 17278.5 ns | 15424 ns | 1.12 |
| latency/import | 3821736397.5 ns | 3843588033 ns | 0.99 |
| latency/precompile | 4587847485 ns | 4581549814.5 ns | 1.00 |
| latency/ttfp | 4396109132 ns | 4430341918 ns | 0.99 |

This comment was automatically generated by workflow using github-action-benchmark.


codecov Bot commented Apr 23, 2026

Codecov Report

❌ Patch coverage is 0% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 16.58%. Comparing base (22a3b2c) to head (85552c4).
⚠️ Report is 5 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| lib/cublas/test/setup.jl | 0.00% | 4 Missing ⚠️ |
| lib/cusolver/test/setup.jl | 0.00% | 4 Missing ⚠️ |
| lib/cufft/test/setup.jl | 0.00% | 3 Missing ⚠️ |
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3117   +/-   ##
=======================================
  Coverage   16.58%   16.58%           
=======================================
  Files         120      120           
  Lines        9586     9586           
=======================================
  Hits         1590     1590           
  Misses       7996     7996           


@maleadt maleadt merged commit 345c160 into master Apr 23, 2026
2 checks passed
@maleadt maleadt deleted the tb/testsuite branch April 23, 2026 20:33