Commit 32cd98a

Author: peng.li24
perf: inline f32 poly loops + AVX-512 sqrt/abs + CMake tests build

Performance improvements (N=524288, 0 ULP maintained):
  exp  f32: 0.085x numpy → 0.70x (+8x vs old scalar-per-call approach)
  log  f32: 0.095x numpy → 0.87x (+9x)
  sin  f32: 0.054x numpy → 0.74x (+14x)
  cos  f32: 0.053x numpy → 0.72x (+14x)
  sqrt f32: 0.910x numpy → 1.07x (now vectorized, AVX-512 immune to throttle)
  sqrt f64: parity maintained

Root causes fixed:
1. noinline helper functions (npy_expf_vec16 etc.) caused 32768 function
   calls per 524288-element array (one call per 16-float vector). The
   polynomial is now inlined directly into each template specialization,
   with all 14-15 constants defined as non-static locals before the loop;
   GCC keeps them in zmm8-zmm31.
2. -ffloat-store in the Makefile caused GCC to spill every __m512
   intermediate to the stack and reload it, doubling the instruction count
   for every operation. Removed (redundant on x86-64 with the SSE/AVX
   default float ABI).
3. sqrt/abs had no AVX-512 specialization. Added 16-wide float and 8-wide
   double loops using _mm512_sqrt_ps/pd and _mm512_abs_ps/pd (IEEE 754
   exact, 0 ULP, immune to CPU frequency throttling caused by other
   AVX-512 loops running in the same process).

Build system:
- Replace tests/Makefile with tests/CMakeLists.txt
    cmake -S tests -B tests/build && cmake --build tests/build
    cmake --build tests/build --target test
- Update root CMakeLists.txt help messages accordingly
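
For readers unfamiliar with the pattern in root cause 3, the following is a minimal sketch of a 16-wide float sqrt loop with a scalar tail. It is not the commit's code: the function names `sqrt_f32` and `count_sqrt_mismatches` are hypothetical, the unaligned loads and the tail handling are assumptions, and the AVX-512 path is only compiled when the compiler defines `__AVX512F__` (for example with `-mavx512f`); otherwise the scalar loop handles everything.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

// Hypothetical helper (name not taken from the commit): element-wise sqrt.
void sqrt_f32(const float* in, float* out, std::size_t n) {
    std::size_t i = 0;
#if defined(__AVX512F__)
    // Main loop: 16 floats per iteration, written inline in the loop body
    // so no per-vector function call is needed. Unaligned loads/stores are
    // an assumption; the real code may require aligned buffers.
    for (; i + 16 <= n; i += 16) {
        __m512 v = _mm512_loadu_ps(in + i);
        _mm512_storeu_ps(out + i, _mm512_sqrt_ps(v));
    }
#endif
    // Scalar tail for the last n % 16 elements, and the full fallback
    // when AVX-512F is not enabled at compile time.
    for (; i < n; ++i) {
        out[i] = std::sqrt(in[i]);
    }
}

// Hypothetical check: counts elements that differ from scalar std::sqrt.
// Both paths are correctly rounded under IEEE 754, so 0 is the expected
// count when default floating-point options are used.
std::size_t count_sqrt_mismatches(std::size_t n) {
    std::vector<float> in(n), out(n);
    for (std::size_t i = 0; i < n; ++i) {
        in[i] = 0.5f * static_cast<float>(i);  // non-negative inputs only
    }
    sqrt_f32(in.data(), out.data(), n);
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (out[i] != std::sqrt(in[i])) ++mismatches;
    }
    return mismatches;
}
```

Calling `count_sqrt_mismatches` with a length that is not a multiple of 16 exercises both the vector loop and the scalar tail. An 8-wide double version would follow the same shape with `__m512d`, `_mm512_loadu_pd`, `_mm512_sqrt_pd`, and `_mm512_storeu_pd`.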
1 parent d81c887 commit 32cd98a

5 files changed

Lines changed: 537 additions & 34 deletions

CMakeLists.txt

Lines changed: 2 additions & 2 deletions
@@ -76,5 +76,5 @@ add_custom_target(deb
 
 message(STATUS "numpycpp v${PROJECT_VERSION} (header-only C++ library)")
 message(STATUS " C++ Standard: ${CMAKE_CXX_STANDARD}")
-message(STATUS " DEB: make deb → numpycpp-dev-${CPACK_PACKAGE_VERSION}-Linux.deb")
-message(STATUS " Test: cd tests/ && make → build + run bit-level alignment tests")
+message(STATUS " DEB: cmake --build <build_dir> --target deb")
+message(STATUS " Test: cmake -S tests -B tests/build && cmake --build tests/build --target test")
