Rewrite QBLAS as a dispatched C library with per-tier SIMD kernels #14

Closed
SwayamInSync wants to merge 37 commits into main from overhaul-rewrite

@SwayamInSync (Owner) commented May 14, 2026

No description provided.

clang on macOS builds without OpenMP, exposing four kinds of warnings
that gcc/clang-on-Linux happen to elide:

  - ts_ns() in qblas_cpu.c is only referenced from a _OPENMP-gated
    overhead probe; guard it so it doesn't compile as dead code.
  - level1 threshold comparisons compared a signed int against a size_t
    field; widen to size_t.
  - cblas_qger / trsm_left_diag / trmm_left_diag declared nthreads
    outside their _OPENMP block; move the declaration inside.
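The three fixes above share one pattern: anything that only the OpenMP path uses should live inside the `_OPENMP` guard, and size comparisons should use one unsigned type. A minimal sketch of that pattern follows. The names `sketch_thresholds`, `sketch_level1_threads`, and `sketch_omp_overhead_ns` are hypothetical, and the body of `ts_ns()` is an assumption (it uses `omp_get_wtime()`), not the code in qblas_cpu.c.

```c
#include <stddef.h>

#ifdef _OPENMP
#include <omp.h>

/* Timing helper used only by the overhead probe below, so it lives inside
   the same _OPENMP guard and never compiles as unused code.
   Assumption: the real ts_ns() may use a different clock source. */
static double ts_ns(void)
{
    return omp_get_wtime() * 1e9;
}

/* Rough cost of opening an empty parallel region, in nanoseconds. */
double sketch_omp_overhead_ns(void)
{
    double t0 = ts_ns();
    #pragma omp parallel
    { }
    return ts_ns() - t0;
}
#endif

/* Hypothetical threshold record with a size_t field. */
typedef struct {
    size_t level1_min_n;
} sketch_thresholds;

/* Number of threads to use for a level-1 call on n elements. */
int sketch_level1_threads(size_t n, const sketch_thresholds *t)
{
    if (n < t->level1_min_n)   /* size_t vs size_t: no sign-compare warning */
        return 1;
#ifdef _OPENMP
    int nthreads = omp_get_max_threads();  /* declared only where it is used */
    return nthreads;
#else
    return 1;
#endif
}
```

Built without `-fopenmp`, this compiles with no unused-function, unused-variable, or sign-compare warnings, which is the situation clang on macOS exposes.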

test_against_numpy was failing on macOS because find_package(Python3)
captured the framework python at configure time and missed the venv
where numpy is installed. The CI now exports VIRTUAL_ENV alongside the
PATH update, and tests/CMakeLists.txt checks 'import numpy' at
configure time, skipping the test (with a status message) if numpy is
unreachable from the chosen interpreter.

The ctypes loader hardcoded libtlfloat.so.1, libsleef.so, etc., which
don't exist on macOS; SLEEF installs libtlfloat.1.dylib, libsleef.dylib,
and libsleefquad.dylib. Replace the hardcoded names with a per-platform
candidate list that tries the versioned soname first and falls back to
the unversioned symlink. The QBLAS_TEST_* env vars still let callers
override individual paths.

Also extend the CI matrix to cover macos-15 alongside macos-14 so the
dylib resolution path is exercised on both runners.

Replace the per-platform extension lists with a directory scan that
picks any libfoo.{so,dylib}* file matching the stem, preferring the
shortest (canonical) name. ctypes.CDLL needs an explicit path because
the libs live under .sleef-prefix/lib/ which isn't on the system
loader's search path, but we don't have to spell out which extensions
exist - whichever the bootstrap installed will be picked up.

Set DYLD_LIBRARY_PATH / LD_LIBRARY_PATH in the test environment so the
dynamic loader can resolve libqblas, libqblas_shim, and the SLEEF deps
by basename. ctypes.CDLL then just hands the loader a name and gets
back a loaded handle - no directory scanning, no platform-specific
filename lists, no env-var indirection in Python. The only platform
check left is the .so/.dylib suffix, which CPython's own ctypes
handles the same way.

When _POSIX_C_SOURCE is defined (it is, both from the source and from
the build system's -D_POSIX_C_SOURCE=200809L), Apple's <unistd.h>
hides BSD/Darwin extensions including _SC_NPROCESSORS_ONLN. The
constant is a widely supported extension rather than part of
POSIX.1-2008, so Apple keeps it gated on _DARWIN_C_SOURCE.

Define _DARWIN_C_SOURCE inside the __APPLE__ guard so the constant
becomes visible. Also wrap the sysconf call in #ifdef
_SC_NPROCESSORS_ONLN so a hypothetical platform without the constant
falls back to single-core instead of breaking the build. Linux is
unaffected (the __APPLE__ guard is dead code there).
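The guard and fallback described above can be sketched as follows. This is an illustration under assumptions, not the code in the PR: the function name `sketch_online_cpus` is hypothetical, and the feature macro has to be defined before the first system header is included for it to take effect.

```c
/* Must come before any system header so that Apple's <unistd.h> exposes
   the Darwin extensions even when _POSIX_C_SOURCE is defined. */
#if defined(__APPLE__) && !defined(_DARWIN_C_SOURCE)
#define _DARWIN_C_SOURCE 1
#endif

#include <unistd.h>

/* Hypothetical helper: number of online CPUs, never less than 1. */
long sketch_online_cpus(void)
{
#ifdef _SC_NPROCESSORS_ONLN
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;   /* sysconf returns -1 on error */
#else
    return 1;               /* constant not visible: assume a single core */
#endif
}
```

The `#ifdef` on the constant keeps the build working on a platform whose headers do not define it, at the cost of running single-threaded there.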

Surfaced by the numpy-quaddtype macOS-arm64 wheel build, which built
QBLAS as a subproject under cibuildwheel.

@SwayamInSync added the enhancement label May 14, 2026

QBLAS 1.0.0 was the legacy header-only template implementation that
shipped on the main branch; this is the compiled-library rewrite with
runtime CPU dispatch, CBLAS-style C ABI, and the perf gains documented
in perf_comparison_with_old.md.

@SwayamInSync deleted the overhaul-rewrite branch May 14, 2026 17:56
Labels

enhancement New feature or request
