`cuda::std::simd` Optimize Arithmetic Floating-point Operations by fbusato · Pull Request #8775 · NVIDIA/cccl

fbusato · 2026-04-30T23:43:05Z

Description

This PR introduces the following optimizations and checks:

- HADD2, HMUL2, HFMA for Half and Bfloat16: these instructions have no single-element variants, so the tests only check that every operator relying on them generates the expected number of instructions of that type.
- F32x2 on Blackwell (SM100): ensure that the following operations are mapped to F32x2 instructions:
  - Plus +
  - Minus -
  - Unary minus -
  - Increment ++
  - Decrement --
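To illustrate the "expected number of instructions" check described above, here is a minimal C++ sketch that counts how often a mnemonic appears in a disassembly dump. The helper name `count_instruction` and the idea of scanning plain text are assumptions made for illustration; the PR's actual codegen tests may work differently.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper (not part of CCCL): counts non-overlapping occurrences
// of `mnemonic` in a textual disassembly dump.
// Note: this is a plain substring match, so "HFMA" would also match "HFMA2".
std::size_t count_instruction(std::string_view sass, std::string_view mnemonic)
{
  if (mnemonic.empty())
  {
    return 0;
  }
  std::size_t count = 0;
  for (std::size_t pos = sass.find(mnemonic); pos != std::string_view::npos;
       pos = sass.find(mnemonic, pos + mnemonic.size()))
  {
    ++count;
  }
  return count;
}
```

A test built on such a helper would compare the count for each operator against the number of packed instructions it is expected to emit.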

Jacobfaib · 2026-05-01T20:54:38Z

+#  define _CCCL_HAS_SIMD_F32X2() 0
+#endif // _CCCL_CUDA_COMPILER(NVCC, >=, 12, 8) || (__cccl_ptx_isa >= 860ULL)
+
+#if _CCCL_HAS_SIMD_F32X2()


Does the entire file need to be gated behind this? I notice that only a few functions specifically need it, and for those you can just add an extra #else clause that does the naive algorithm.

This saves downstream users also needing to gate their code behind _CCCL_HAS_SIMD_F32X2() as the symbols always exist.

This is intentional. Users should never use this file.
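For reference, the alternative raised in this thread (a function that is always declared, with an `#else` branch for the naive path) could look roughly like the sketch below. `EXAMPLE_HAS_F32X2`, `float2_t`, `add_f32x2`, and `packed_add` are hypothetical names chosen for illustration, not identifiers from the PR, and the macro is hard-wired to 0 so the sketch builds with any host compiler.

```cpp
#include <array>

// Hypothetical stand-in for a feature macro such as _CCCL_HAS_SIMD_F32X2().
#define EXAMPLE_HAS_F32X2() 0

using float2_t = std::array<float, 2>;

// The symbol always exists, so callers need no feature check of their own.
inline float2_t add_f32x2(float2_t a, float2_t b)
{
#if EXAMPLE_HAS_F32X2()
  // Architecture-specific path: `packed_add` is a placeholder for code that
  // would issue a packed two-lane instruction. It is not a real API.
  return packed_add(a, b);
#else
  // Naive fallback: add each lane separately.
  return {a[0] + b[0], a[1] + b[1]};
#endif
}
```

The trade-off discussed above still applies: if the header is internal and never included by users, gating the whole file is simpler than maintaining a fallback for every function.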

Jacobfaib · 2026-05-01T20:57:40Z

+endif()
+
+set(simd_codegen_cuda_archs 80 90)
+if (CMAKE_CUDA_COMPILER_VERSION VERSION_GREATER_EQUAL 12.9)


Don't assume nvcc here. The compiler can also be clang-cuda.

I don't think checking the SASS code for clang-cuda is critical

Jacobfaib · 2026-05-01T20:59:07Z

License header?

I followed atomic_codegen. I defer it to @wmaxey

github-actions · 2026-05-02T01:16:10Z

😬 CI Workflow Results

🟥 Finished in 1h 08m: Pass: 98%/110 | Total: 19h 00m | Max: 43m 45s | Hits: 98%/307133

half/bfloat plus/multiply/fma

691608b

fbusato self-assigned this Apr 30, 2026

fbusato requested a review from a team as a code owner April 30, 2026 23:43

fbusato added this to CCCL Apr 30, 2026

fbusato requested a review from bernhardmgruber April 30, 2026 23:43

fbusato added the libcu++ For all items related to libcu++ label Apr 30, 2026

github-project-automation Bot moved this to Todo in CCCL Apr 30, 2026

cccl-authenticator-app Bot moved this from Todo to In Review in CCCL Apr 30, 2026

fbusato added 3 commits April 30, 2026 17:02

check comparison

fc02d50

test f32x2

4ef18db

fix _CCCL_DEVICE_API

7bf2ed9

Merge branch 'main' into simd-optimize-add-mul-fma

b8761c1

extend f32x2 optimization to sub/increment/decrement/unary minus

3dd3298

fbusato changed the title ~~[DRAFT] cuda::std::simd Optimize basic vector operations~~ cuda::std::simd Optimize Arithmentic Floating-point Operations May 1, 2026

Jacobfaib reviewed May 1, 2026

address comments

6a29aa2

fbusato changed the title ~~cuda::std::simd Optimize Arithmentic Floating-point Operations~~ cuda::std::simd Optimize Arithmetic Floating-point Operations May 1, 2026

fbusato moved this from In Review to In Progress in CCCL May 1, 2026

improve organization

58e44c5

fbusato wants to merge 8 commits into NVIDIA:main from fbusato:simd-optimize-add-mul-fma

2 participants
