Rewrite QBLAS as a dispatched C library with per-tier SIMD kernels #14
Closed
SwayamInSync wants to merge 37 commits into
…-164× multi-thread wins
…ME rewrite; PERF_HEADROOM.md
…m/subproject split)
clang on macOS builds without OpenMP, exposing four kinds of warnings
that gcc/clang-on-Linux happen to elide:
- ts_ns() in qblas_cpu.c is only referenced from a _OPENMP-gated
overhead probe; guard it so it doesn't compile as dead code.
- level1 threshold checks compared a signed int against a size_t
field; widen the int operand to size_t.
- cblas_qger / trsm_left_diag / trmm_left_diag declared nthreads
outside their _OPENMP block; move the declaration inside.
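The pattern behind these fixes can be sketched as follows. This is only an illustration, not the code from this PR: `sketch_thresholds`, `sketch_pick_threads`, `sketch_ts_ns`, and `sketch_probe_overhead_ns` are hypothetical names, and the real QBLAS functions may be structured differently. The timer helper exists only when `_OPENMP` is defined, the size comparison is done in `size_t`, and `nthreads` is declared inside the guarded block so a non-OpenMP build sees no unused variable.

```c
#include <stddef.h>

#ifdef _OPENMP
#include <omp.h>

/* The timer is only needed by the OpenMP overhead probe, so it is compiled
 * only when OpenMP is enabled (hypothetical stand-in for ts_ns()). */
static double sketch_ts_ns(void)
{
    return omp_get_wtime() * 1e9;
}

/* Measures the wall-clock cost of one empty parallel region. */
double sketch_probe_overhead_ns(void)
{
    double t0 = sketch_ts_ns();
    #pragma omp parallel
    { }
    return sketch_ts_ns() - t0;
}
#endif

/* Hypothetical threshold record; the field is size_t, like the problem size. */
typedef struct {
    size_t min_parallel_n;
} sketch_thresholds;

/* Returns how many threads to use for a problem of size n. */
int sketch_pick_threads(size_t n, const sketch_thresholds *t)
{
    /* Both operands are size_t, so there is no sign-compare warning. */
    if (n < t->min_parallel_n) {
        return 1;
    }
#ifdef _OPENMP
    /* Declared inside the guard: unused-variable warnings cannot occur
     * in builds without OpenMP. */
    int nthreads = omp_get_max_threads();
    return nthreads > 0 ? nthreads : 1;
#else
    return 1;
#endif
}
```

Built without `-fopenmp`, the translation unit contains neither the timer nor the probe, and `sketch_pick_threads` always returns 1.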
test_against_numpy was failing on macOS because find_package(Python3)
captured the framework python at configure time and missed the venv
where numpy is installed. The CI now exports VIRTUAL_ENV alongside the
PATH update, and tests/CMakeLists.txt checks 'import numpy' at
configure time, skipping the test (with a status message) if numpy is
unreachable from the chosen interpreter.
The ctypes loader hardcoded libtlfloat.so.1, libsleef.so, etc., none of which exist on macOS, where SLEEF installs libtlfloat.1.dylib, libsleef.dylib, and libsleefquad.dylib. Replace the hardcoded names with a per-platform candidate list that tries the versioned soname first and falls back to the unversioned symlink. The QBLAS_TEST_* env vars still let callers override individual paths. Also extend the CI matrix to cover macos-15 alongside macos-14 so the dylib resolution path is exercised on both runners.
Replace the per-platform extension lists with a directory scan that
picks any libfoo.{so,dylib}* file matching the stem, preferring the
shortest (canonical) name. ctypes.CDLL needs an explicit path because
the libs live under .sleef-prefix/lib/ which isn't on the system
loader's search path, but we don't have to spell out which extensions
exist - whichever the bootstrap installed will be picked up.
Set DYLD_LIBRARY_PATH / LD_LIBRARY_PATH in the test environment so the dynamic loader can resolve libqblas, libqblas_shim, and the SLEEF deps by basename. ctypes.CDLL then just hands the loader a name and gets back a loaded handle - no directory scanning, no platform-specific filename lists, no env-var indirection in Python. The only platform sniff left is the .so/.dylib suffix, which CPython's own ctypes handles the same way.
When _POSIX_C_SOURCE is defined (it is, both in the source and via the build system's -D_POSIX_C_SOURCE=200809L), Apple's <unistd.h> hides BSD/Darwin extensions, including _SC_NPROCESSORS_ONLN. The constant is widely available but is not required by POSIX.1-2008, and Apple keeps it gated on _DARWIN_C_SOURCE. Define _DARWIN_C_SOURCE inside the __APPLE__ guard so the constant becomes visible. Also wrap the sysconf call in #ifdef so a hypothetical platform without the constant falls back to single-core instead of breaking the build. Linux is unaffected (the __APPLE__ guard is dead code there). Surfaced by the numpy-quaddtype macOS-arm64 wheel build, which built QBLAS as a subproject under cibuildwheel.
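A minimal sketch of the guard described above, assuming a helper named `sketch_online_cpus` (hypothetical; the actual function in qblas_cpu.c is not shown in this conversation). The feature-test macro has to be defined before the first system header is included, otherwise it has no effect.

```c
/* Must come before any system header so Apple's <unistd.h> exposes
 * the Darwin extensions even when _POSIX_C_SOURCE is set. */
#ifdef __APPLE__
#define _DARWIN_C_SOURCE 1
#endif

#include <unistd.h>

/* Returns the number of online CPUs, or 1 if it cannot be determined. */
int sketch_online_cpus(void)
{
#ifdef _SC_NPROCESSORS_ONLN
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    /* sysconf returns -1 on error; treat that as single-core. */
    return n > 0 ? (int)n : 1;
#else
    /* Platform does not define the constant: fall back to single-core. */
    return 1;
#endif
}
```

The result is always at least 1, so callers can use it directly as a thread-count upper bound.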
QBLAS 1.0.0 was the legacy header-only template implementation that shipped on the main branch; this is the compiled-library rewrite with runtime CPU dispatch, CBLAS-style C ABI, and the perf gains documented in perf_comparison_with_old.md.
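For readers unfamiliar with runtime CPU dispatch behind a C ABI, the general idea can be sketched as below. This is not the QBLAS implementation: every name is hypothetical, `double` stands in for the quad-precision type, and the "AVX2 tier" is a plain unrolled loop rather than a real SIMD kernel. The sketch assumes GCC or Clang on x86-64 for `__builtin_cpu_supports`, and falls back to the scalar kernel elsewhere.

```c
#include <stddef.h>

typedef double (*sketch_dot_fn)(size_t, const double *, const double *);

/* Baseline tier: plain scalar loop. */
static double sketch_dot_scalar(size_t n, const double *x, const double *y)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        s += x[i] * y[i];
    }
    return s;
}

/* Stand-in for a wider tier: four independent accumulators plus a tail. */
static double sketch_dot_unrolled(size_t n, const double *x, const double *y)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i] * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }
    for (; i < n; i++) {
        s0 += x[i] * y[i];
    }
    return (s0 + s1) + (s2 + s3);
}

/* Picks a kernel once, based on what the running CPU reports. */
static sketch_dot_fn sketch_resolve_dot(void)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) {
        return sketch_dot_unrolled;
    }
#endif
    return sketch_dot_scalar;
}

/* Public entry point: one stable symbol, kernel chosen at first call.
 * Note: this lazy initialisation is not thread-safe as written. */
double sketch_dot(size_t n, const double *x, const double *y)
{
    static sketch_dot_fn impl = NULL;
    if (impl == NULL) {
        impl = sketch_resolve_dot();
    }
    return impl(n, x, y);
}
```

Callers link against the single `sketch_dot` symbol and never see the per-tier kernels, which is what lets a compiled library ship several code paths in one binary.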
No description provided.