Add cuda_wrappers include feature for clang CUDA toolchains #481

Merged
cloudhan merged 1 commit into bazel-contrib:main from AustinSchuh:cuda_wrappers
Jun 8, 2026

Conversation

@AustinSchuh

Contributor

Clang ships cuda_wrappers/ shims under its resource directory (lib/clang/<v>/include/cuda_wrappers/) that #undef __noinline__ around libstdc++ headers, working around the CUDA host_defines.h macro conflict with libstdc++'s __attribute__((__noinline__)) (seen in bits/shared_ptr_base.h).

Clang's CUDA driver already adds that directory automatically, but only as -internal-isystem. Clang merges all system-tier include groups in argument order, and -internal-isystem sorts after -cxx-isystem. So when the C++ standard library is injected explicitly via -nostdinc++ -cxx-isystem, the real bits/shared_ptr_base.h is found before clang's automatic cuda_wrappers entry; the wrapper never intercepts it via #include_next and the build fails on the __noinline__ clash.

This worked before because clang was allowed to discover libstdc++ through its own GCC detection, which adds the stdlib as -internal-isystem after the cuda_wrappers entry the driver injects first, so the wrapper came first and fired automatically. Tightening the sandbox so clang no longer auto-detects the host toolchain (passing the stdlib via -cxx-isystem instead) reorders the stdlib ahead of the automatic wrapper and exposes the conflict.
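The ordering argument in the two paragraphs above can be sketched as a toy model. This is not clang's implementation: the GROUP_RANK table only encodes the claim that -cxx-isystem entries are searched before -internal-isystem entries, and the STDLIB path and the function name are hypothetical placeholders.

```python
# Toy model of the search order described above; NOT clang's actual code.
# Assumption: inside the system tier, -cxx-isystem directories are searched
# before -internal-isystem directories, and argument order is kept per group.
GROUP_RANK = {"-cxx-isystem": 0, "-internal-isystem": 1}

WRAPPERS = "lib/clang/<v>/include/cuda_wrappers"
STDLIB = "toolchain/include/c++"  # hypothetical libstdc++ location


def system_search_order(args):
    """Return the directories in the order this model would search them."""
    # sorted() is stable, so entries in the same group keep argument order.
    ranked = sorted(args, key=lambda arg: GROUP_RANK[arg[0]])
    return [directory for _, directory in ranked]


# Before: clang's GCC detection adds the stdlib after the driver's wrapper entry.
BEFORE = [("-internal-isystem", WRAPPERS), ("-internal-isystem", STDLIB)]
# After tightening the sandbox: the stdlib is passed explicitly.
AFTER = [("-internal-isystem", WRAPPERS), ("-cxx-isystem", STDLIB)]

print(system_search_order(BEFORE))  # wrapper directory first
print(system_search_order(AFTER))   # stdlib directory first, wrapper bypassed
```

In the second case the real header is found first, which matches the failure described above.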

Fix it by detecting lib/clang/<v>/include in the cc_toolchain's built-in include directories and adding the cuda_wrappers subdir via -isystem to cuda_compile actions. -isystem (System group) sorts ahead of -cxx-isystem (CXXSystem), so the wrapper intercepts again. Clang dedups the now-redundant automatic -internal-isystem copy (RemoveDuplicates keeps the earlier occurrence), so this is safe even when the driver also adds it.
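The detection step might look roughly like the sketch below. It is written in Python for illustration and is not the Starlark code from this PR; the function name, the regular expression, and the example paths are assumptions.

```python
import re

# Matches a clang resource include directory such as ".../lib/clang/17/include".
# Assumption: the version component is a single path segment.
_RESOURCE_INCLUDE = re.compile(r"(^|/)lib/clang/[^/]+/include/?$")


def cuda_wrappers_flags(builtin_include_dirs):
    """Build -isystem flags for the cuda_wrappers subdir of each resource dir."""
    flags = []
    for directory in builtin_include_dirs:
        if _RESOURCE_INCLUDE.search(directory):
            # Drop any trailing slash before appending the subdirectory.
            flags += ["-isystem", directory.rstrip("/") + "/cuda_wrappers"]
    return flags


# Hypothetical built-in include directories reported by a cc_toolchain.
example_dirs = ["/usr/include", "external/llvm/lib/clang/17/include"]
print(cuda_wrappers_flags(example_dirs))
```

Directories that do not look like a clang resource include directory produce no flags, so a non-clang toolchain would be left unchanged under this sketch.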

cloudhan merged commit 345dd02 into bazel-contrib:main on Jun 8, 2026
23 checks passed