
Commit aaa6322

author
peng.li24
committed
enforce: lock internal headers behind NUMPYCPP_INTERNAL_INCLUDE guard
All 4 arch/OS-specific implementation headers now cause a hard compile error if included directly; external callers must only use #include "numpy/core.h".

Files locked:
  npy_math_float.h - float32 numpy polynomial kernels (numpy-internal constants)
  svml_bridge.h    - SVML/npy scalar bridge (x86_64 + Linux only)
  blas_bridge.h    - OpenBLAS ILP64 bridge (x86_64 + Linux only)
  avx512_loops.h   - AVX-512F template specializations

Mechanism: core.h defines NUMPYCPP_INTERNAL_INCLUDE before pulling in the internal headers, then #undef-s it at the end so the macro cannot leak into the caller's translation unit. Each internal header opens with:

  #ifndef NUMPYCPP_INTERNAL_INCLUDE
  # error "... do not include directly. Use #include \"numpy/core.h\""
  #endif

This is the standard C++ header-only library pattern for API boundary enforcement (same as used by Abseil, Boost.Asio, etc.).

core.h file comment updated to list all 4 locked headers and clarify:
  Public API:   namespace numpy::
  Internal API: namespace numpy::detail:: (DO NOT CALL DIRECTLY)
1 parent 8e45c51 commit aaa6322
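The lock described in the commit message can be reproduced in a single translation unit. The sketch below is only an illustration: the macro `DEMO_INTERNAL_INCLUDE`, the namespace `demo`, and the file names in the comments are hypothetical stand-ins, and the two marked regions would normally live in separate headers.

```cpp
// Single-file sketch of the include lock. All names are hypothetical.
#include <cassert>

// ---- "public_entry.h" (hypothetical): open the lock ------------------------
#define DEMO_INTERNAL_INCLUDE

// ---- "internal_impl.h" (hypothetical): refuses to compile when unlocked ----
#ifndef DEMO_INTERNAL_INCLUDE
#  error "internal_impl.h is internal; include public_entry.h instead."
#endif
namespace demo { namespace detail {
// Implementation detail: callers are expected to go through demo::scale.
inline void scale_impl(const float* src, float* dst, int n, float k) {
    for (int i = 0; i < n; ++i) dst[i] = src[i] * k;
}
}}  // namespace demo::detail

// ---- back in "public_entry.h": public wrapper, then close the lock ---------
namespace demo {
inline void scale(const float* src, float* dst, int n, float k) {
    assert(n >= 0);
    detail::scale_impl(src, dst, n, k);
}
}  // namespace demo
#undef DEMO_INTERNAL_INCLUDE

// After the #undef the macro is no longer visible to the includer.
#ifdef DEMO_INTERNAL_INCLUDE
#  error "lock macro leaked into the including translation unit"
#endif

// Small check that the public wrapper still reaches the internal kernel.
inline bool lock_demo_ok() {
    const float in[3] = {1.0f, 2.0f, 3.0f};
    float out[3] = {0.0f, 0.0f, 0.0f};
    demo::scale(in, out, 3, 2.0f);
    return out[0] == 2.0f && out[1] == 4.0f && out[2] == 6.0f;
}
```

Deleting the `#define DEMO_INTERNAL_INCLUDE` line makes the first `#error` fire, which is the behaviour a direct include of an internal header would trigger.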

5 files changed

Lines changed: 105 additions & 38 deletions


numpy/avx512_loops.h

Lines changed: 16 additions & 2 deletions
@@ -1,5 +1,13 @@
-// INTERNAL HEADER — included at the bottom of core.h, inside namespace numpy.
-// DO NOT include directly.
+// ╔══════════════════════════════════════════════════════════════════════════╗
+// ║ INTERNAL HEADER — DIRECT INCLUSION IS A COMPILE ERROR ║
+// ║ ║
+// ║ This file contains AVX-512 template specializations that override the ║
+// ║ generic loops in core.h. It is x86_64 + AVX-512F specific and must ║
+// ║ be included INSIDE namespace numpy at the end of core.h — nowhere else.║
+// ║ ║
+// ║ ✗ #include "numpy/avx512_loops.h" ← compile error ║
+// ║ ✓ #include "numpy/core.h" ← only correct entry point ║
+// ╚══════════════════════════════════════════════════════════════════════════╝
 //
 // AVX-512 wide-loop specializations for array math functions.
 //
@@ -18,6 +26,12 @@
 // Previously these called noinline helpers → 32768 call/returns per 524k array.
 
 #pragma once
+
+#ifndef NUMPYCPP_INTERNAL_INCLUDE
+# error "avx512_loops.h is an internal header — do not include directly. \
+Use #include \"numpy/core.h\" instead."
+#endif
+
 #ifdef __AVX512F__
 #include <immintrin.h>

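The header comment describes AVX-512 template specializations that override generic loops. A minimal sketch of that shape follows, assuming a hypothetical `demo::add_loop` template (not a function from this repository). The specialization is only compiled when the build defines `__AVX512F__`, as in the diff above; otherwise the generic loop is used.

```cpp
#include <cassert>
#include <cstddef>
#ifdef __AVX512F__
#include <immintrin.h>
#endif

namespace demo {

// Generic loop: works for any arithmetic T.
template <typename T>
inline void add_loop(const T* a, const T* b, T* out, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr && out != nullptr));
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

#ifdef __AVX512F__
// Full specialization for float: 16 lanes per iteration, then a scalar tail.
// Only present when the compiler is targeting AVX-512F.
template <>
inline void add_loop<float>(const float* a, const float* b, float* out,
                            std::size_t n) {
    std::size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        const __m512 va = _mm512_loadu_ps(a + i);
        const __m512 vb = _mm512_loadu_ps(b + i);
        _mm512_storeu_ps(out + i, _mm512_add_ps(va, vb));
    }
    for (; i < n; ++i) out[i] = a[i] + b[i];  // remaining < 16 elements
}
#endif

// Compares whichever add_loop<float> was selected against plain addition.
inline bool add_loop_matches_scalar() {
    constexpr std::size_t n = 37;  // not a multiple of 16, so the tail runs
    float a[n], b[n], out[n];
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = static_cast<float>(i);
        b[i] = 0.5f;
        out[i] = 0.0f;
    }
    add_loop(a, b, out, n);
    for (std::size_t i = 0; i < n; ++i) {
        if (out[i] != a[i] + b[i]) return false;
    }
    return true;
}

}  // namespace demo
```

Because element-wise addition is computed independently per lane, both paths produce identical results; the specialization changes throughput, not values.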
numpy/blas_bridge.h

Lines changed: 17 additions & 2 deletions
@@ -1,5 +1,15 @@
-// INTERNAL HEADER — auto-included by core.h and linalg.h.
-// DO NOT include directly.
+// ╔══════════════════════════════════════════════════════════════════════════╗
+// ║ INTERNAL HEADER — DIRECT INCLUSION IS A COMPILE ERROR ║
+// ║ ║
+// ║ This file wraps OpenBLAS ILP64 (Linux x86_64 only) via dlsym/dlopen. ║
+// ║ All symbols live in numpy::detail — an implementation namespace that ║
+// ║ external code must never reference. ║
+// ║ ║
+// ║ ✗ #include "numpy/blas_bridge.h" ← compile error ║
+// ║ ✗ numpy::detail::blas_sdot(...) ← unsupported internal API ║
+// ║ ✓ #include "numpy/core.h" ← only correct entry point ║
+// ║ ✓ numpy::dot(a, b, n) ← public API ║
+// ╚══════════════════════════════════════════════════════════════════════════╝
 //
 // BLAS bridge — bit-exact dot/norm vs numpy's OpenBLAS-backed np.dot /
 // np.linalg.norm (without axis).
@@ -22,6 +32,11 @@
 
 #pragma once
 
+#ifndef NUMPYCPP_INTERNAL_INCLUDE
+# error "blas_bridge.h is an internal header — do not include directly. \
+Use #include \"numpy/core.h\" instead."
+#endif
+
 #include <cstdint>
 #include <cmath>
 #include <dlfcn.h>

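The header comment says the bridge reaches OpenBLAS through `dlopen`/`dlsym`. The following is a rough sketch of that kind of runtime lookup with a portable fallback, not the code in `blas_bridge.h`. The names `demo::resolve_sdot` and `demo::sdot_fallback` are hypothetical, and the library path and symbol name are left as parameters because the actual values used by the repository are not shown in this diff.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__linux__)
#include <dlfcn.h>
#endif

namespace demo {

// ILP64-style sdot signature: 64-bit lengths and strides (an assumption).
using sdot_fn = float (*)(std::int64_t n, const float* x, std::int64_t incx,
                          const float* y, std::int64_t incy);

// Portable fallback used when no BLAS symbol can be resolved.
inline float sdot_fallback(std::int64_t n, const float* x, std::int64_t incx,
                           const float* y, std::int64_t incy) {
    float acc = 0.0f;
    for (std::int64_t i = 0; i < n; ++i) acc += x[i * incx] * y[i * incy];
    return acc;
}

// Tries to load `symbol` from `lib_path`; returns the fallback on any failure.
// On success the library handle is intentionally kept open so the returned
// function pointer stays valid for the life of the process.
inline sdot_fn resolve_sdot(const char* lib_path, const char* symbol) {
    assert(lib_path != nullptr && symbol != nullptr);
#if defined(__linux__)
    if (void* handle = dlopen(lib_path, RTLD_LAZY | RTLD_LOCAL)) {
        if (void* sym = dlsym(handle, symbol)) {
            return reinterpret_cast<sdot_fn>(sym);
        }
        dlclose(handle);  // library found, symbol missing
    }
#else
    (void)lib_path;
    (void)symbol;
#endif
    return &sdot_fallback;
}

}  // namespace demo
```

Note that a plain accumulation loop like the fallback is not expected to be bit-identical to a BLAS kernel, since BLAS implementations typically reorder the summation; that difference is presumably why the bridge calls the same library numpy uses.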
numpy/core.h

Lines changed: 35 additions & 12 deletions
@@ -1,16 +1,29 @@
-// Native C++ implementations — zero pybind11 dependency.
-// All functions operate on raw pointers + sizes.
+// ════════════════════════════════════════════════════════════════════════════
+// numpycpp — public C++ API (zero pybind11 dependency)
+// The ONLY header external code should include:
 //
-// Usable by any C++ project via #include "numpy/core.h"
+// #include "numpy/core.h"
 //
-// Convention: each function is annotated with its Python numpy equivalent,
-// e.g. /// numpy.sqrt(x, /, out=None, *, where=True, ...)
+// Public namespace: numpy:: e.g. numpy::exp(src, dst, n)
+// Internal namespace: numpy::detail:: ← DO NOT CALL DIRECTLY
 //
-// Acceleration (safe optimizations that keep results bit-exact):
-// - Loop unrolling (4x) for element-wise functions
-// - Stack allocation for small buffers (n ≤ 128)
-// - Reusable fiber buffer in axis reductions
-// - Fused multiply-accumulate in norm_sq/dot
+// The four internal headers pulled in below are LOCKED behind
+// NUMPYCPP_INTERNAL_INCLUDE and will cause a #error if included directly:
+// • svml_bridge.h — SVML/npy scalar bridge (x86_64 + Linux)
+// • blas_bridge.h — OpenBLAS ILP64 bridge (x86_64 + Linux)
+// • npy_math_float.h— float32 poly kernels (numpy internal constants)
+// • avx512_loops.h — AVX-512 specializations (requires AVX-512F CPU)
+//
+// All functions operate on raw pointers + sizes.
+// Each function is annotated with its Python numpy equivalent,
+// e.g. /// numpy.sqrt(x, /, out=None, *, where=True, ...)
+//
+// Acceleration (safe optimizations that keep results bit-exact):
+// - Loop unrolling (4x) for element-wise functions
+// - Stack allocation for small buffers (n ≤ 128)
+// - Reusable fiber buffer in axis reductions
+// - Fused multiply-accumulate in norm_sq/dot
+// ════════════════════════════════════════════════════════════════════════════
 
 #pragma once
 
@@ -22,8 +35,15 @@
 #include <cstddef>
 #include <stdexcept>
 
-#include "svml_bridge.h"
-#include "blas_bridge.h"
+// ── Internal headers ─────────────────────────────────────────────────────────
+// These files contain arch/OS-specific implementations (SVML/AVX-512/BLAS/npy).
+// They MUST NOT be included directly by external code.
+// The macro below is the compile-time lock; it is #undef-ed at the end of this
+// file so it cannot "leak" into translation units that include core.h.
+#define NUMPYCPP_INTERNAL_INCLUDE
+#include "svml_bridge.h" // numpy::detail::{exp,log,sin,...}_f32/f64 — SVML/npy
+#include "blas_bridge.h" // numpy::detail::blas_ops<T> — OpenBLAS ILP64
+// avx512_loops.h included at namespace-close (line ~1024), also guarded.
 
 namespace numpy {
 
@@ -1003,4 +1023,7 @@ inline void norm_axis(const T* src, T* dst, const ptrdiff_t* shape, int ndim, in
 // ============================================================================
 #include "avx512_loops.h"
 
+// Release the internal-include lock so it does not pollute the includer's TU.
+#undef NUMPYCPP_INTERNAL_INCLUDE
+
 } // namespace numpy

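The core.h comment mentions a public namespace that forwards to `detail`, raw pointer + size signatures, and 4x loop unrolling for element-wise functions. A small sketch of how those pieces might fit together is given below. `demo::sqrt` and `demo::detail::sqrt_unrolled` are hypothetical names, and the argument checks are an assumption rather than something the diff shows.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>

namespace demo {
namespace detail {

// Internal kernel: 4x-unrolled element-wise loop followed by a scalar tail.
// Each element is computed independently, so unrolling does not change the
// numerical result compared with a plain loop.
inline void sqrt_unrolled(const float* src, float* dst, std::ptrdiff_t n) {
    assert(n >= 0);
    std::ptrdiff_t i = 0;
    for (; i + 4 <= n; i += 4) {
        dst[i]     = std::sqrt(src[i]);
        dst[i + 1] = std::sqrt(src[i + 1]);
        dst[i + 2] = std::sqrt(src[i + 2]);
        dst[i + 3] = std::sqrt(src[i + 3]);
    }
    for (; i < n; ++i) dst[i] = std::sqrt(src[i]);  // 0 to 3 leftovers
}

}  // namespace detail

// Public entry point: validates arguments, then forwards to the kernel.
inline void sqrt(const float* src, float* dst, std::ptrdiff_t n) {
    if (n < 0 || (n > 0 && (src == nullptr || dst == nullptr))) {
        throw std::invalid_argument("demo::sqrt: bad pointer or size");
    }
    detail::sqrt_unrolled(src, dst, n);
}

}  // namespace demo
```

A caller would write `demo::sqrt(in, out, n)` and never name `demo::detail` directly, mirroring the `numpy::` versus `numpy::detail::` split described in the header comment.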
numpy/npy_math_float.h

Lines changed: 15 additions & 8 deletions
@@ -1,14 +1,21 @@
-// INTERNAL HEADER — DO NOT INCLUDE DIRECTLY.
-// Use #include "numpy/core.h" which pulls this in automatically.
-//
-// All functions live in numpy::detail — do not call directly.
-// Use numpy::exp() etc. from core.h.
-//
-// Bit-exact float32 math matching numpy 1.23.5 SIMD polynomial approximations.
-// Replicates numpy's simd_exp_FLOAT, simd_log_FLOAT, simd_sincos_f32 algorithms.
+// ╔══════════════════════════════════════════════════════════════════════════╗
+// ║ INTERNAL HEADER — DIRECT INCLUSION IS A COMPILE ERROR ║
+// ║ ║
+// ║ This file implements arch/OS-specific float32 polynomial kernels that ║
+// ║ are tied to numpy's internal SIMD constants. The API is UNSTABLE and ║
+// ║ subject to change without notice. ║
+// ║ ║
+// ║ ✗ #include "numpy/npy_math_float.h" ← compile error ║
+// ║ ✓ #include "numpy/core.h" ← only correct entry point ║
+// ╚══════════════════════════════════════════════════════════════════════════╝
 
 #pragma once
 
+#ifndef NUMPYCPP_INTERNAL_INCLUDE
+# error "npy_math_float.h is an internal header — do not include directly. \
+Use #include \"numpy/core.h\" instead."
+#endif
+
 #include <cstdint>
 #include <cstring>
 #include <cmath>

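For readers unfamiliar with polynomial kernels, the general idea can be shown with Horner's rule using one fused multiply-add per coefficient. This is a generic illustration only: `demo::horner_f32` is a hypothetical helper, and neither numpy's constants nor its exact evaluation order are reproduced here.

```cpp
#include <cassert>
#include <cmath>

namespace demo {

// Evaluates c[0] + c[1]*x + ... + c[degree]*x^degree with Horner's rule.
// std::fma rounds once per step, so the operation order is fully pinned down,
// which matters when results must match another implementation bit for bit.
inline float horner_f32(const float* c, int degree, float x) {
    assert(c != nullptr && degree >= 0);
    float acc = c[degree];
    for (int k = degree - 1; k >= 0; --k) {
        acc = std::fma(acc, x, c[k]);  // acc = acc * x + c[k]
    }
    return acc;
}

}  // namespace demo
```

With the coefficients `{1, 2, 3}` (the polynomial 1 + 2x + 3x²), the helper returns 17 at x = 2. A real kernel would add range reduction and reconstruction steps around the polynomial.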
numpy/svml_bridge.h

Lines changed: 22 additions & 14 deletions
@@ -1,23 +1,31 @@
-// INTERNAL HEADER — DO NOT INCLUDE DIRECTLY.
-// Use #include "numpy/core.h" which pulls this in automatically.
-//
-// All functions live in numpy::detail — do not call numpy::detail::exp()
-// directly. Use numpy::exp() from core.h.
+// ╔══════════════════════════════════════════════════════════════════════════╗
+// ║ INTERNAL HEADER — DIRECT INCLUSION IS A COMPILE ERROR ║
+// ║ ║
+// ║ This file bridges numpycpp to numpy's SVML / npy_* scalar kernels. ║
+// ║ It is x86_64 + Linux specific (dlsym, /proc/self/maps, AVX-512). ║
+// ║ All symbols live in numpy::detail — an implementation namespace that ║
+// ║ external code must never reference. ║
+// ║ ║
+// ║ ✗ #include "numpy/svml_bridge.h" ← compile error ║
+// ║ ✗ numpy::detail::exp_svml_f64(x) ← unsupported internal API ║
+// ║ ✓ #include "numpy/core.h" ← only correct entry point ║
+// ║ ✓ numpy::exp(src, dst, n) ← public API ║
+// ╚══════════════════════════════════════════════════════════════════════════╝
 //
 // SVML/npy bridge — bit-exact math on every x86_64 architecture.
-//
-// numpy uses different math implementations depending on CPU features:
-// AVX-512 HW → __svml_exp8 (SVML vector) → resolves via dlsym
-// non-AVX-512 → npy_exp (scalar) → resolves via dlsym
-//
-// This header detects CPU features at RUNTIME and selects the matching path.
-// AVX-512 intrinsics are isolated behind __attribute__((target("avx512f")))
-// so the binary is safe on non-AVX-512 CPUs — no SIGILL.
-//
+// AVX-512 HW → __svml_exp8 (SVML vector) → resolved via dlsym
+// non-AVX-512 → npy_exp (scalar) → resolved via dlsym
+// CPU feature detection is at RUNTIME; AVX-512 intrinsics are isolated behind
+// __attribute__((target("avx512f"))) — safe on non-AVX-512 CPUs (no SIGILL).
 // The .so path is auto-discovered via /proc/self/maps — no manual init needed.
 
 #pragma once
 
+#ifndef NUMPYCPP_INTERNAL_INCLUDE
+# error "svml_bridge.h is an internal header — do not include directly. \
+Use #include \"numpy/core.h\" instead."
+#endif
+
 #include <cmath>
 #include <cstdio>
 #include <dlfcn.h>

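The comment above describes runtime CPU detection with AVX-512 code isolated behind `__attribute__((target("avx512f")))`. A hedged sketch of that dispatch pattern follows, assuming GCC or Clang on x86_64. The `demo::scale*` functions are hypothetical, and a simple multiply stands in for the SVML call; on other compilers or architectures the sketch falls back to the scalar loop.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define DEMO_HAS_AVX512_DISPATCH 1
#endif

namespace demo {

// Baseline path: safe on every CPU.
inline void scale_scalar(const double* src, double* dst, std::size_t n,
                         double k) {
    for (std::size_t i = 0; i < n; ++i) dst[i] = src[i] * k;
}

#ifdef DEMO_HAS_AVX512_DISPATCH
// Only this function is compiled with AVX-512F enabled, so the rest of the
// binary contains no AVX-512 instructions and runs on older CPUs.
__attribute__((target("avx512f")))
inline void scale_avx512(const double* src, double* dst, std::size_t n,
                         double k) {
    const __m512d vk = _mm512_set1_pd(k);
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        _mm512_storeu_pd(dst + i, _mm512_mul_pd(_mm512_loadu_pd(src + i), vk));
    }
    for (; i < n; ++i) dst[i] = src[i] * k;  // remaining < 8 elements
}
#endif

// Dispatcher: checks the CPU at run time and picks the matching path.
inline void scale(const double* src, double* dst, std::size_t n, double k) {
    assert(n == 0 || (src != nullptr && dst != nullptr));
#ifdef DEMO_HAS_AVX512_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) {
        scale_avx512(src, dst, n, k);
        return;
    }
#endif
    scale_scalar(src, dst, n, k);
}

}  // namespace demo
```

Both paths perform one IEEE multiply per element, so the outputs agree exactly regardless of which branch the dispatcher takes.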