Commit fde9e28

Author: peng.li24
ci: fix SIGILL on Azure VMs — add -mprefer-vector-width=256
GitHub Actions ubuntu-latest runners (Azure) expose avx512f in /proc/cpuinfo (CPUID) but have ZMM state disabled by the hypervisor. __builtin_cpu_supports correctly returns false, so the SVML bridge avoids AVX-512 paths.

However, with -mavx512f the GCC auto-vectorizer can emit ZMM instructions in *any* loop, and those raise SIGILL when ZMM state is off.

Fix: add -mprefer-vector-width=256 to limit auto-vectorization to 256-bit. Explicit __attribute__((target("avx512f"))) functions with runtime __builtin_cpu_supports guards are unaffected.

README: document -mprefer-vector-width=256 as a 4th required flag.
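The mismatch described above (CPUID advertises avx512f, but the ZMM register state is not enabled) can be observed directly. The following is a minimal sketch, assuming GCC or Clang on x86-64; the function names are hypothetical and are not part of numpycpp. It reads the CPUID feature bit and then checks XCR0, the register that records which state components the OS or hypervisor has enabled.

```cpp
#include <cpuid.h>   // GCC/Clang helpers for the CPUID instruction (x86 only)
#include <cstdint>

// CPUID leaf 7, subleaf 0, EBX bit 16: the CPU implements AVX-512 Foundation.
bool cpuid_reports_avx512f() {
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) return false;
    return ((ebx >> 16) & 1u) != 0;
}

// XCR0 records which register state the OS (or hypervisor) has enabled.
// AVX-512 needs SSE (bit 1), AVX (bit 2), opmask (bit 5), and the two
// ZMM state components (bits 6 and 7).
bool os_enables_zmm_state() {
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return false;
    if (((ecx >> 27) & 1u) == 0) return false;  // OSXSAVE clear: XGETBV not usable
    std::uint32_t lo = 0, hi = 0;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    const std::uint32_t needed = 0xE6;          // bits 1, 2, 5, 6, 7
    return (lo & needed) == needed;
}

// AVX-512 instructions are only safe to execute when both checks pass.
bool avx512f_usable() {
    return cpuid_reports_avx512f() && os_enables_zmm_state();
}
```

On a VM like the one described in this commit, one would expect `cpuid_reports_avx512f()` to return true while `os_enables_zmm_state()` returns false, which is consistent with `__builtin_cpu_supports("avx512f")` returning false there.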
1 parent 782eec9 commit fde9e28

2 files changed

Lines changed: 10 additions & 3 deletions

README.md

Lines changed: 4 additions & 2 deletions
@@ -114,8 +114,9 @@ caused at least one test failure are marked **required**.
 
 ```cmake
 target_compile_options(<target> PRIVATE
-    -ffp-contract=off # REQUIRED — see below
-    -mavx512f -mfma # REQUIRED — see below
+    -ffp-contract=off # REQUIRED — see below
+    -mavx512f -mfma # REQUIRED — see below
+    -mprefer-vector-width=256 # REQUIRED — see below
 )
 target_link_libraries(<target> PRIVATE dl) # REQUIRED — dlsym
 ```
@@ -124,6 +125,7 @@ target_link_libraries(<target> PRIVATE dl) # REQUIRED — dlsym
 |------|-------------|-------------------------------|
 | `-ffp-contract=off` | Prevents the compiler from silently fusing `a*b + c` into a single FMA instruction. numpycpp's einsum accumulation loops must use the same multiply-then-add order as numpy's BLAS kernels. | 36 einsum tests fail with ±1 ULP differences. |
 | `-mavx512f -mfma` | The SVML bridge declares fast scalar wrappers (`exp_svml_f64`, etc.) inside `#ifdef __AVX512F__`. Without this flag the preprocessor omits those declarations and the dispatcher fails to compile. AVX-512 intrinsics are runtime-guarded via `__builtin_cpu_supports` — the binary is safe on non-AVX-512 CPUs. | Hard compile error: `'exp_svml_f64' was not declared in this scope`. |
+| `-mprefer-vector-width=256` | Prevents the GCC auto-vectorizer from emitting 512-bit (ZMM) instructions globally. Some cloud VMs expose `avx512f` in `/proc/cpuinfo` (CPUID) but have ZMM state disabled by the hypervisor (XSAVE does not save ZMM). `__builtin_cpu_supports` correctly returns false in that case, so the SVML bridge is safe — but any auto-vectorized ZMM instruction in unguarded code still causes SIGILL. `-mprefer-vector-width=256` hard-limits auto-vectorization to 256-bit; explicit `__attribute__((target("avx512f")))` functions and runtime-guarded intrinsics are unaffected. | SIGILL at test startup on cloud VMs (e.g. GitHub Actions azure runners) where ZMM state is not enabled by the hypervisor. |
 | `-ldl` | `dlsym` / `dlopen` are used at startup to locate numpy's `_multiarray_umath.so` and resolve `npy_exp`, `__svml_exp8`, etc. | Link error: `undefined reference to 'dlsym'`. |
 
 #### Recommended (defensive) flags
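The `-ffp-contract=off` row in the README table above turns on the difference between a fused multiply-add (one rounding) and a separate multiply followed by an add (two roundings). The following is a small illustrative sketch, not code from numpycpp; the function names are hypothetical, and the `volatile` temporary is only there to force the intermediate rounding regardless of compiler flags.

```cpp
#include <cmath>

// Multiply, round the product to double, then add: two roundings.
// The volatile temporary keeps the compiler from fusing the two steps.
double mul_then_add(double a, double b, double c) {
    volatile double product = a * b;
    return product + c;
}

// Fused multiply-add: a*b + c is computed exactly and rounded once.
double fused(double a, double b, double c) {
    return std::fma(a, b, c);
}
```

With `a = 1 + 2^-52`, `b = 1 - 2^-52` and `c = -1`, the exact product `1 - 2^-104` rounds to `1.0`, so `mul_then_add` returns `0.0`, while `fused` returns `-2^-104`. A compiler that is free to contract `a*b + c` can therefore change results relative to a reference that multiplies and adds separately.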

tests/CMakeLists.txt

Lines changed: 6 additions & 1 deletion
@@ -69,7 +69,12 @@ endif()
 target_compile_options(numpycpp PRIVATE
     -O2
     -ffp-contract=off # no implicit FMA for a+b*c (keeps Cody-Waite exact)
-    -msse4.1 -mavx512f -mfma # enable AVX-512 code paths
+    -msse4.1 -mavx512f -mfma # -mavx512f needed to compile AVX-512 intrinsics
+    -mprefer-vector-width=256 # prevent auto-vectorizer from emitting 512-bit (ZMM)
+    # instructions globally; Azure VMs expose avx512f in
+    # cpuinfo but may have ZMM state disabled by hypervisor
+    # → explicit __attribute__((target("avx512f"))) + runtime
+    # __builtin_cpu_supports guard handles AVX-512 safely
     # disable builtin replacements so our calls go through SVML/npy_math paths
     -fno-builtin-exp -fno-builtin-log -fno-builtin-sin
     -fno-builtin-cos -fno-builtin-tan -fno-builtin-pow
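The comment added in this hunk refers to explicit `__attribute__((target("avx512f")))` functions behind a runtime `__builtin_cpu_supports` guard. A minimal sketch of that pattern follows, assuming GCC or Clang on x86-64; `sum_f64` and its helpers are hypothetical names chosen for illustration and do not come from numpycpp.

```cpp
#include <cstddef>
#include <immintrin.h>

// AVX-512 is enabled for this one function only, via the target attribute.
__attribute__((target("avx512f")))
static double sum_avx512(const double* x, std::size_t n) {
    __m512d acc = _mm512_setzero_pd();
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8)                 // 8 doubles per ZMM register
        acc = _mm512_add_pd(acc, _mm512_loadu_pd(x + i));
    double total = _mm512_reduce_add_pd(acc);  // horizontal sum of the 8 lanes
    for (; i < n; ++i) total += x[i];          // scalar tail
    return total;
}

// Portable fallback used when AVX-512 is not usable at runtime.
static double sum_scalar(const double* x, std::size_t n) {
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) total += x[i];
    return total;
}

// Runtime dispatch: the ZMM code is reached only if the guard passes.
double sum_f64(const double* x, std::size_t n) {
    if (__builtin_cpu_supports("avx512f")) return sum_avx512(x, n);
    return sum_scalar(x, n);
}
```

The ZMM instructions are confined to `sum_avx512`, so they execute only when the guard reports usable AVX-512. The two paths add elements in a different order, so for general floating-point input their results can differ in the last bits.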
