ci: fix SIGILL on Azure VMs — add -mprefer-vector-width=256
GitHub Actions ubuntu-latest runners (Azure) expose avx512f in
/proc/cpuinfo (CPUID) but have ZMM state disabled by the hypervisor.
__builtin_cpu_supports correctly returns false, so the SVML bridge
avoids AVX-512 paths. But the GCC auto-vectorizer with -mavx512f can
emit ZMM instructions in *any* loop, and those raise SIGILL when ZMM
state is off.

Fix: add -mprefer-vector-width=256 to limit auto-vectorization to
256-bit. Explicit __attribute__((target("avx512f"))) functions with
runtime __builtin_cpu_supports guards are unaffected.

README: document -mprefer-vector-width=256 as a 4th required flag.
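
The guarded-dispatch pattern the message refers to can be sketched roughly as follows. This is a minimal illustration, not numpycpp's actual code: `scale`, `scale_avx512`, and `scale_generic` are hypothetical names, and the sketch assumes GCC (or Clang) on x86-64. Only the function carrying the `target("avx512f")` attribute may contain 512-bit code, and it is reached only when `__builtin_cpu_supports("avx512f")` returns true.

```cpp
#include <cstddef>
#include <immintrin.h>

// Hypothetical kernel: AVX-512F is enabled for this one function only,
// so its ZMM instructions never leak into unguarded code.
__attribute__((target("avx512f")))
static void scale_avx512(double* x, std::size_t n, double k) {
    const __m512d vk = _mm512_set1_pd(k);
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {            // 8 doubles per ZMM register
        __m512d v = _mm512_loadu_pd(x + i);
        _mm512_storeu_pd(x + i, _mm512_mul_pd(v, vk));
    }
    for (; i < n; ++i) x[i] *= k;           // remaining tail elements
}

// Portable fallback, compiled with the translation unit's default flags.
static void scale_generic(double* x, std::size_t n, double k) {
    for (std::size_t i = 0; i < n; ++i) x[i] *= k;
}

// Runtime dispatcher: takes the AVX-512 path only when the runtime check
// reports that AVX-512F is usable.
void scale(double* x, std::size_t n, double k) {
    if (__builtin_cpu_supports("avx512f")) {
        scale_avx512(x, n, k);
    } else {
        scale_generic(x, n, k);
    }
}
```

Because the attribute is per-function, a sketch like this does not depend on a global `-mavx512f`; the auto-vectorizer problem described above arises only when that flag is applied to the whole translation unit.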
|`-ffp-contract=off`| Prevents the compiler from silently fusing `a*b + c` into a single FMA instruction. numpycpp's einsum accumulation loops must use the same multiply-then-add order as numpy's BLAS kernels. | 36 einsum tests fail with ±1 ULP differences. |
|`-mavx512f -mfma`| The SVML bridge declares fast scalar wrappers (`exp_svml_f64`, etc.) inside `#ifdef __AVX512F__`. Without this flag the preprocessor omits those declarations and the dispatcher fails to compile. AVX-512 intrinsics are runtime-guarded via `__builtin_cpu_supports` — the binary is safe on non-AVX-512 CPUs. | Hard compile error: `'exp_svml_f64' was not declared in this scope`. |
|`-mprefer-vector-width=256`| Prevents the GCC auto-vectorizer from emitting 512-bit (ZMM) instructions globally. Some cloud VMs expose `avx512f` in `/proc/cpuinfo` (CPUID) but have ZMM state disabled by the hypervisor (the ZMM bits of XCR0 are clear, so XSAVE does not save ZMM registers). `__builtin_cpu_supports` correctly returns false in that case, so the SVML bridge is safe, but any auto-vectorized ZMM instruction in unguarded code still causes SIGILL. `-mprefer-vector-width=256` limits auto-vectorization to 256-bit; explicit `__attribute__((target("avx512f")))` functions and runtime-guarded intrinsics are unaffected. | SIGILL at test startup on cloud VMs (e.g. Azure-hosted GitHub Actions runners) where ZMM state is not enabled by the hypervisor. |
|`-ldl`|`dlsym` / `dlopen` are used at startup to locate numpy's `_multiarray_umath.so` and resolve `npy_exp`, `__svml_exp8`, etc. | Link error: `undefined reference to 'dlsym'`. |
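
To make the CPUID-versus-ZMM-state distinction in the `-mprefer-vector-width=256` row concrete, here is a hedged sketch of the two separate checks. It is not part of numpycpp: `cpuid_reports_avx512f` and `os_enables_zmm_state` are hypothetical helper names, and the code assumes GCC or Clang on x86-64 (`<cpuid.h>` and GNU inline assembly). The first helper reads only the CPUID feature bit; the second reads XCR0 via `xgetbv` to see whether the opmask and ZMM state components are actually enabled.

```cpp
#include <cpuid.h>
#include <cstdint>

// True if CPUID.(EAX=7,ECX=0):EBX bit 16 (AVX512F) is set.
// This is the hardware feature bit only; it says nothing about
// whether the OS or hypervisor has enabled ZMM state.
static bool cpuid_reports_avx512f() {
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) return false;
    return ((ebx >> 16) & 1u) != 0;
}

// True if XCR0 enables SSE, AVX, opmask, ZMM_Hi256 and Hi16_ZMM state
// (bits 1, 2, 5, 6, 7). Without these bits, ZMM instructions fault.
static bool os_enables_zmm_state() {
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return false;
    if (((ecx >> 27) & 1u) == 0) return false;   // OSXSAVE: xgetbv usable
    std::uint32_t lo = 0, hi = 0;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    const std::uint32_t need = 0xE6u;            // bits 1, 2, 5, 6, 7
    return (lo & need) == need;
}
```

On a VM of the kind described above, one would expect the first helper to return true and the second false, which is the combination where unguarded ZMM instructions raise SIGILL.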