Skip to content

fix: default to AppleClang on macOS; fix Phase 1 correctness bugs#16

Merged
OldCrow merged 8 commits intomainfrom
fix/appleclang-default
May 6, 2026
Merged

fix: default to AppleClang on macOS; fix Phase 1 correctness bugs#16
OldCrow merged 8 commits intomainfrom
fix/appleclang-default

Conversation

@OldCrow
Copy link
Copy Markdown
Owner

@OldCrow OldCrow commented May 6, 2026

Summary

Switch the macOS compiler default from Homebrew LLVM to system AppleClang, and fix a set of correctness bugs identified during a code quality audit.

Motivation

Homebrew LLVM was previously auto-selected on all macOS builds to enable std::execution::par_unseq. This created a real ABI mismatch risk: exceptions and std::type_info records from Homebrew libc++ are binary incompatible with those from Apple libc++, causing silent catch failures when any consumer (Python extensions, test executables) was built with AppleClang. The same failure mode was confirmed in the libhmm project on this development ecosystem.

The library already has a complete GCD (dispatch_apply-based) fallback in include/platform/parallel_execution.h for exactly this case. The core batch distribution paths use WorkStealingPool directly and are unaffected. The utility layer degrades to GCD parallelism — still parallel, not SIMD-hinted — which is the correct trade-off for ABI safety.

Changes

Phase 0 — Compiler default (ABI correctness)

  • CMakeLists.txt: detect Homebrew LLVM but do not apply it by default; expose -DLIBSTATS_USE_HOMEBREW_LLVM=ON as an explicit opt-in with ABI contract documented in the option comment
  • GCD parallel path is now the active macOS parallel strategy

Phase 1 — Correctness bug fixes

  • src/cpu_detection.cpp: brand-string memcpy was passing &eax (4 bytes) as a 16-byte source; fix assembles all four CPUID registers into uint32_t regs[4] before copying
  • src/libstats_init.cpp: fast-path double-checked locking used a plain static bool (data race under the C++ memory model); replaced with std::atomic<bool> using acquire/release ordering
  • src/performance_history.cpp: getBestStrategy() iterated WORK_STEALING twice in the strategy list; learnOptimalThresholds() had a dead else if branch checking the same key string twice — both leftover from the GPU_ACCELERATEDWORK_STEALING rename
  • src/discrete.cpp, exponential.cpp, gamma.cpp, uniform.cpp: removed 12 no-op assignments of the form if (strategy == WORK_STEALING) { strategy = WORK_STEALING; } across the three WithStrategy overloads in each file (same rename remnant)

Catalina attribute warning fix

  • Added LIBSTATS_LIKELY / LIBSTATS_UNLIKELY macros to include/common/utility_common.h using __has_cpp_attribute(likely) for a portable, forward-compatible check
  • Replaced five bare [[likely]] / [[unlikely]] uses in math_utils.h
  • Eliminates -Wunknown-attributes noise on AppleClang 12 while preserving the hint on compilers that support it

CMake / SIMD audit (from cross-platform validation)

  • Corrected AVX-512 detection on MSVC and GCC/Clang; removed redundant cmake/DetectAVX512.cmake
  • Corrected GTest detection on Windows with FetchContent fallback
  • Eight additional CMake build system inefficiencies corrected
  • Suppressed spurious unused-variable warning in test_simd_policy Release build

Validation

Tested on all four machines in the development ecosystem:

Machine OS Compiler SIMD Tests
MacBook Pro 9,1 (Ivy Bridge) macOS Catalina 10.15 AppleClang 12 SSE2 + AVX 32/32 ✅
MacBook Pro 14,1 (Kaby Lake) macOS Ventura AppleClang 14 SSE2 + AVX + AVX2 all ✅
Mac Mini M1 macOS 26 Tahoe AppleClang 16 NEON all ✅
Asus TUF A16 (Ryzen Zen 4) Windows 11 MSVC SSE2 + AVX + AVX2 + AVX-512 all ✅

Build is clean (zero warnings) on all targets. No timing-sensitive tests run for this PR; those remain for a quiet-machine session.


Conversation: https://app.warp.dev/conversation/aed97ef4-35cf-4a9d-ab39-264b97f3c2f4
Code quality plan: https://app.warp.dev/drive/notebook/TppMrlBw7MwyC1NVBAyMMu

OldCrow and others added 8 commits May 5, 2026 21:11
Phase 0 — Compiler default (ABI correctness)
- CMakeLists.txt: detect Homebrew LLVM but do not apply it by default;
  expose -DLIBSTATS_USE_HOMEBREW_LLVM=ON as an explicit opt-in
- Document the ABI boundary risk (C++ exception type-info mismatch)
  in the option comment so it is visible at configure time
- Default path uses system AppleClang + Apple libc++ + GCD parallelism,
  which is ABI-safe for all consumers including Python extensions

Phase 1.1 — cpu_detection.cpp: brand-string memcpy fix
- Each CPUID call returns four 4-byte registers (eax/ebx/ecx/edx) = 16 bytes
- Assemble via uint32_t regs[4] per call then memcpy 16 bytes from regs
- Previous code passed &eax (4 bytes) as 16-byte source

Phase 1.2 — libstats_init.cpp: data race on fast-path flag
- Replace plain static bool with static std::atomic<bool>
- Use memory_order_acquire loads and memory_order_release store
  to satisfy the C++ memory model for double-checked locking

Phase 1.3 — performance_history.cpp: duplicate WORK_STEALING entries
- getBestStrategy(): remove duplicate Strategy::WORK_STEALING entry
  in the strategy iterator list (leftover from GPU_ACCELERATED removal)
- learnOptimalThresholds(): remove dead else-if branch that checked
  the same WORK_STEALING key string a second time

Phase 1.4 — distribution files: remove 12 redundant no-op assignments
- Pattern: if (strategy == WORK_STEALING) { strategy = WORK_STEALING; }
- Appears in getProbabilityWithStrategy / getLogProbabilityWithStrategy /
  getCumulativeProbabilityWithStrategy in discrete.cpp, exponential.cpp,
  gamma.cpp, uniform.cpp (3 sites each, 12 total)
- Remnant of GPU_ACCELERATED -> WORK_STEALING rename; always a no-op

Catalina attribute warning fix
- Add LIBSTATS_LIKELY / LIBSTATS_UNLIKELY macros to utility_common.h
  using __has_cpp_attribute(likely) for a portable, forward-compatible check
- Replace five bare [[likely]] / [[unlikely]] uses in math_utils.h
- Eliminates -Wunknown-attributes noise on AppleClang 12 (Catalina)
  while preserving the hint on compilers that support it

Build verified: AppleClang 12, SIMD SSE2+AVX (correct for Ivy Bridge),
GCD parallel path active. 32/32 tests pass, 0 warnings.

Co-Authored-By: Oz <oz-agent@warp.dev>
policy_level is only referenced inside assert(), which is compiled out
under -DNDEBUG. Add [[maybe_unused]] consistent with all other similarly
NDEBUG-guarded variables in this test file.

Found on Kaby Lake/Ventura (AppleClang 15, Release build).

Co-Authored-By: Oz <oz-agent@warp.dev>
Root cause: the AVX-512 test code strings used CMake \\\\n escape sequences
(double-backslash-n), which evaluate to literal backslash-n characters when
written to the temporary test source file. The #include directive on the
same physical line as the following code caused a compile error, producing
the misleading COMPILE FAILED message during cmake configure.

SSE2, AVX, and AVX2 tests all used properly-formatted multi-line CMake
string literals and were unaffected.

Changes in cmake/SIMDDetection.cmake:
- MSVC path: replace test_runtime_cpu_feature with check_cxx_source_runs
  (compile + execute) following the libhmm pattern. check_cxx_compiler_flag
  only tests that MSVC accepts the flag syntax, not that the build machine
  CPU can execute the instructions. CMAKE_REQUIRED_FLAGS is set to
  /arch:AVX512 before the check and restored afterwards. Uses <intrin.h>
  which is the correct MSVC intrinsics header.
- Non-MSVC path: rewrite the test code as a proper multi-line CMake string
  literal (real newlines). Also fixes an indentation error on the endif().

Validated on Windows MSVC 19.44 (Asus TUF A16 Ryzen 7000/Zen 4): AVX-512
now detected correctly (CPU verified), 30/30 tests pass.

Co-Authored-By: Oz <oz-agent@warp.dev>
Two bugs in the GTest detection inside if(LIBSTATS_BUILD_TESTS):

1. The Homebrew-specific detection block ran unconditionally on all
   platforms. On Windows it defaulted GTEST_ROOT to
   /opt/homebrew/opt/googletest, found nothing, and overwrote
   GTEST_FOUND=FALSE, discarding whatever the Windows-specific block
   earlier in the file had set.

2. No FetchContent fallback existed, so any machine without a system
   GTest package (including this Windows machine, where GTest is only
   available as a libhmm FetchContent download) silently skipped all
   GTest-based tests.

Fix (matches libhmm tests/CMakeLists.txt pattern):
- Step 1: plain find_package(GTest QUIET) -- covers vcpkg, system
  installs, and previously cached results on all platforms.
- Step 2: Apple only -- probe the architecture-appropriate Homebrew path
  and fall back to manual .a file check. Guarded by if(APPLE).
- Step 3: FetchContent fallback using GTest v1.17.0 (same tag as libhmm).
  Sets gtest_force_shared_crt=ON on Windows to match the project CRT.

Result on Windows MSVC: FetchContent fetches GTest v1.17.0; 33/33 tests
pass (up from 30/30), with test_math_comprehensive,
test_performance_history, and test_performance_initialization now included.

Co-Authored-By: Oz <oz-agent@warp.dev>
cmake/SIMDDetection.cmake:
- Extract apply_simd_source_flags() from configure_simd_target(). Source
  file properties are file-global; nine redundant applications reduced
  to one. configure_simd_target() now only links the SIMD interface target.

CMakeLists.txt:
- Restrict Homebrew LLVM detection to if(APPLE). The APPLE OR UNIX guard
  ran find_program(CMAKE_CXX_COMPILER clang++) on Linux before project(),
  silently forcing Clang over GCC on a fresh configure.
- Drop FORCE from CMAKE_BUILD_PARALLEL_LEVEL and guard with
  if(NOT DEFINED) so user-provided values are not overridden.
- Add PARENT_SCOPE to set(CMAKE_CXX_FLAGS ... TBB_CFLAGS_OTHER) in all
  three threading detection functions; without it the flags were lost
  when the function returned.
- Remove redundant Windows-only find_package(GTest) block (gave a
  misleading skip message after FetchContent was added).
- Remove unused option LIBSTATS_CONSERVATIVE_SIMD.
- Remove unused variable LIBSTATS_MSVC_ODR_LINKER_FLAGS.
- Delete cmake/DetectAVX512.cmake: never included, has write-after-read bug.

Validated: configure clean, build succeeds, 33/33 tests pass (MSVC 19.44).

Co-Authored-By: Oz <oz-agent@warp.dev>
Timing test failures (GCC 11/12, MSVC, AppleClang, Coverage):
- ctest was running all tests including those labelled 'timing' and
  'benchmark', which contain speedup assertions calibrated for quiet
  dedicated hardware
- CI runners are shared and under contention; parallel execution makes
  the assertions unreliable by design (documented in WARP.md)
- Fix: add -LE "timing|benchmark" to both ctest invocations

Clang 14/15 build failures (ubuntu-latest):
- ubuntu-latest is now Ubuntu 24.04 with GCC 14 as the system runtime
- GCC 14's <chrono> made hh_mm_ss::_S_fractional_width consteval; Clang
  versions below 17 cannot evaluate it at compile time in C++20 mode
- This is a known incompatibility fixed in Clang 17
- Fix: bump matrix from clang-14/clang-15 to clang-16/clang-17

Co-Authored-By: Oz <oz-agent@warp.dev>
…ore-errors

test_math_comprehensive contains vectorized math speedup assertions that are
timing-sensitive. Add it to the 'timing' label group so -LE "timing|benchmark"
correctly excludes it from parallel CI runs, consistent with all other GTest
tests that have speedup assertions.

Coverage: lcov --capture was failing on missing .gcno files from CMakeTmp probe
artifacts created during configure-time SIMD detection (SIMDDetection.cmake
compiles small test programs in CMakeTmp). Those .gcda files exist at coverage
collection time but their .gcno companions are cleaned up after cmake completes.
Add --ignore-errors gcov to the lcov capture step to skip these transient files.

Co-Authored-By: Oz <oz-agent@warp.dev>
GTest macro-expanded test body symbols (e.g. _Test8TestBodyEv) can produce
mismatched DWARF start/end line ranges in gcov output. This is a known false
positive in lcov/geninfo when processing GTest macro-heavy code, not a source
error. Add mismatch to the existing ignore-errors list alongside gcov.

Co-Authored-By: Oz <oz-agent@warp.dev>
@OldCrow OldCrow merged commit 4c4ed9d into main May 6, 2026
26 checks passed
@OldCrow OldCrow deleted the fix/appleclang-default branch May 6, 2026 03:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant