Skip to content

Commit d842caa

Browse files
author
peng.li24
committed
feat: add advanced indexing (take/compress/slice/put/putmask/slice_assign); align API naming with numpy
- numpy.take: gather along any axis or flattened (axis=None) - numpy.compress: boolean mask gather → 1-D output - numpy.slice / slice_assign: N-D slicing with arbitrary step (replaces nd_slice) - numpy.put: scatter values at flat indices (wrapping) - numpy.putmask: boolean mask assignment, scalar and array overloads (replaces boolean_assign) - Rename nd_slice → slice, boolean_assign → putmask, nd_slice_assign → slice_assign for consistent numpy-aligned naming across core.h / core_py.h / module.cpp / test_all.py - 108 new bit-exact tests; total now 900 (float64 + float32)
1 parent 4e85157 commit d842caa

5 files changed

Lines changed: 1136 additions & 5 deletions

File tree

README.md

Lines changed: 6 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@
44
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
55
[![C++17](https://img.shields.io/badge/C%2B%2B-17-blue.svg)](https://en.cppreference.com/w/cpp/17)
66
[![CMake](https://img.shields.io/badge/CMake-%3E%3D3.16-green.svg)](https://cmake.org/)
7-
[![Tests](https://img.shields.io/badge/tests-792%20bit--exact-brightgreen.svg)](tests/test_all.py)
7+
[![Tests](https://img.shields.io/badge/tests-900%20bit--exact-brightgreen.svg)](tests/test_all.py)
88
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](CONTRIBUTING.md)
99

1010
## Background
@@ -17,7 +17,7 @@ We created `numpycpp` to keep NumPy's familiar usage patterns while letting C++
1717

1818
`numpycpp` is a **header-only C++ library** implementing numpy's core API (`numpy.*`, `numpy.linalg.*`, `numpy.einsum`) with **bit-level precision alignment**. Raw pointer + size interface. Zero external dependencies — pure C++17 standard library.
1919

20-
All APIs are tested against Python numpy under strict bit-level comparison: every IEEE 754 float bit must match exactly (754 tests, float64 + float32, including NaN passthrough, signed-zero, ±∞, and domain-error cases).
20+
All APIs are tested against Python numpy under strict bit-level comparison: every IEEE 754 float bit must match exactly (900 tests, float64 + float32, including NaN passthrough, signed-zero, ±∞, domain-error cases, and advanced indexing).
2121

2222
**Bit-exact math** is achieved by resolving numpy's own math functions from `_multiarray_umath.so` at runtime. The SVML bridge auto-detects your CPU and selects the same path numpy uses: AVX‑512 SVML (`__svml_exp8`) when available, or scalar `npy_exp`/`npy_log`/etc. otherwise. AVX‑512 intrinsics are isolated behind `__attribute__((target))` — the binary is safe on any x86_64 CPU (no SIGILL). Every transcendental function produces the exact same IEEE 754 bits as numpy on **all architectures**.
2323

@@ -93,7 +93,7 @@ Add `-Ipath/to/numpycpp` to your compiler flags and include the headers directly
9393

9494
The test suite verifies **bit-level precision alignment** between every C++ function and Python numpy.
9595
No tolerance, no `atol`/`rtol` — raw IEEE 754 bits must match exactly.
96-
754 tests: float64 + float32, including NaN passthrough, signed-zero, ±∞, domain errors, and AVX-512 boundary sizes.
96+
900 tests: float64 + float32, including NaN passthrough, signed-zero, ±∞, domain errors, advanced indexing, and AVX-512 boundary sizes.
9797

9898
```bash
9999
# build
@@ -173,7 +173,7 @@ target_compile_options(<target> PRIVATE
173173
### Alignment status
174174

175175
The table below reflects the current bit-level parity between `numpycpp` C++ and Python numpy.
176-
All 754 tests pass under strict IEEE 754 bit comparison (float64 + float32).
176+
All 900 tests pass under strict IEEE 754 bit comparison (float64 + float32).
177177

178178
✅ = bit-exact on ALL architectures (SVML bridge with runtime CPU dispatch).
179179

@@ -185,6 +185,7 @@ All 754 tests pass under strict IEEE 754 bit comparison (float64 + float32).
185185
| Logical ||| bool-only (and/or/not/xor) |
186186
| Special values ||| isnan, isinf, isfinite |
187187
| Manipulation ||| diff, stack, concatenate, transpose, slice, roll, flip, repeat, tile, where |
188+
| **Advanced indexing** ||| take (any axis), compress (bool mask gather), nd_slice (step/reverse), put, boolean_assign, nd_slice_assign |
188189
| Sorting ||| argsort, argmax, argmin |
189190
| Setops / interp ||| isin, intersect1d, interp, safe_divide |
190191
| Access / convert ||| array_get, asarray, to_vector |
@@ -226,7 +227,7 @@ numpycpp/
226227
│ └── einsum_py.h
227228
├── tests/ # bit-level precision tests + test module
228229
│ ├── module.cpp # pybind11 module for testing
229-
│ ├── test_all.py # single entry — all APIs, 754 tests, float64+float32
230+
│ ├── test_all.py # single entry — all APIs, 900 tests, float64+float32
230231
│ ├── conftest.py # silent-mode output suppression
231232
│ └── CMakeLists.txt # test-module build
232233
├── CMakeLists.txt # build & .deb packaging

numpy/core.h

Lines changed: 224 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -796,6 +796,230 @@ inline ptrdiff_t argmin(const T* data, size_t n) {
796796
return mi;
797797
}
798798

799+
// ============================================================================
800+
// Advanced / Fancy indexing
801+
// ============================================================================
802+
803+
/// Compute the number of elements produced by a slice(start, stop, step).
804+
/// Pre-condition: step != 0, and start/stop are already normalized
805+
/// (integers, with -1 allowed as the "before index 0" sentinel for negative step).
806+
///
807+
/// Matches Python's len(range(start, stop, step)) exactly:
808+
/// • step > 0: ceil((stop - start) / step) = max(0, (stop-start+step-1)/step)
809+
/// • step < 0: ceil((start - stop) / -step) = max(0, (start-stop-step-1)/(-step))
810+
///
811+
/// Example: slice(9, -1, -1).indices(10) → start=9, stop=-1, step=-1 → 10 elements
812+
inline ptrdiff_t slice_len(ptrdiff_t start, ptrdiff_t stop, ptrdiff_t step) noexcept {
813+
if (step > 0) {
814+
return (stop > start) ? (stop - start + step - 1) / step : 0;
815+
} else {
816+
return (start > stop) ? (start - stop - step - 1) / (-step) : 0;
817+
}
818+
}
819+
820+
/// numpy.take(a, indices, axis=None)
821+
/// axis = -1 means axis=None (gather from flattened array).
822+
/// Indices may be negative; they are wrapped around the axis dimension size.
823+
///
824+
/// For axis ≥ 0 the output shape is:
825+
/// shape[:axis] + (ni,) + shape[axis+1:]
826+
/// For axis = -1 (None) the output is flat with shape (ni,).
827+
template<typename T>
828+
inline void take(const T* src, T* dst,
829+
const ptrdiff_t* shape, int ndim, int axis,
830+
const ptrdiff_t* indices, size_t ni) {
831+
if (ni == 0) return;
832+
833+
if (axis < 0) {
834+
// axis=None: treat src as flat, gather by flat indices
835+
ptrdiff_t total = 1;
836+
for (int d = 0; d < ndim; ++d) total *= shape[d];
837+
for (size_t k = 0; k < ni; ++k) {
838+
ptrdiff_t idx = indices[k];
839+
if (idx < 0) idx += total;
840+
dst[k] = src[idx];
841+
}
842+
return;
843+
}
844+
845+
// axis in [0, ndim)
846+
ptrdiff_t leading = 1;
847+
for (int d = 0; d < axis; ++d) leading *= shape[d];
848+
ptrdiff_t axis_size = shape[axis];
849+
ptrdiff_t trailing = 1;
850+
for (int d = axis + 1; d < ndim; ++d) trailing *= shape[d];
851+
852+
for (ptrdiff_t l = 0; l < leading; ++l) {
853+
const T* src_l = src + l * axis_size * trailing;
854+
T* dst_l = dst + l * static_cast<ptrdiff_t>(ni) * trailing;
855+
for (size_t k = 0; k < ni; ++k) {
856+
ptrdiff_t idx = indices[k];
857+
if (idx < 0) idx += axis_size;
858+
std::memcpy(dst_l + static_cast<ptrdiff_t>(k) * trailing,
859+
src_l + idx * trailing,
860+
static_cast<size_t>(trailing) * sizeof(T));
861+
}
862+
}
863+
}
864+
865+
/// numpy.compress(condition, a, axis=None)
866+
/// Gathers elements from src (treated as flat) where mask[i] == true.
867+
/// Returns the count of elements written to dst.
868+
template<typename T>
869+
inline size_t compress(const T* src, T* dst, const bool* mask, size_t n) {
870+
size_t cnt = 0;
871+
for (size_t i = 0; i < n; ++i)
872+
if (mask[i]) dst[cnt++] = src[i];
873+
return cnt;
874+
}
875+
876+
/// N-D slice with per-dimension start/stop/step (pre-normalized by caller;
877+
/// stop=-1 is the valid "before index 0" sentinel for negative-step slices).
878+
///
879+
/// Output shape[d] = slice_len(starts[d], stops[d], steps[d]).
880+
/// dst must be pre-allocated to product(output_shape) elements.
881+
///
882+
/// Overload of numpy::slice() that accepts per-dimension step vectors.
883+
/// Covers a[s0:e0:k0, s1:e1:k1, ...] for arbitrary N-D arrays.
884+
template<typename T>
885+
inline void slice(const T* src, T* dst,
886+
const ptrdiff_t* shape, int ndim,
887+
const ptrdiff_t* starts, const ptrdiff_t* stops,
888+
const ptrdiff_t* steps) {
889+
if (ndim == 0) return;
890+
891+
// Compute output shape
892+
std::vector<ptrdiff_t> out_shape(ndim);
893+
ptrdiff_t total = 1;
894+
for (int d = 0; d < ndim; ++d) {
895+
out_shape[d] = slice_len(starts[d], stops[d], steps[d]);
896+
if (out_shape[d] <= 0) return; // empty slice
897+
total *= out_shape[d];
898+
}
899+
900+
// Input strides (C-contiguous)
901+
std::vector<ptrdiff_t> in_stride(ndim);
902+
in_stride[ndim - 1] = 1;
903+
for (int d = ndim - 2; d >= 0; --d)
904+
in_stride[d] = in_stride[d + 1] * shape[d + 1];
905+
906+
// Iterate output in C-order, compute source flat index from multi-index
907+
for (ptrdiff_t out_idx = 0; out_idx < total; ++out_idx) {
908+
ptrdiff_t rem = out_idx;
909+
ptrdiff_t in_idx = 0;
910+
for (int d = ndim - 1; d >= 0; --d) {
911+
ptrdiff_t od = rem % out_shape[d];
912+
rem /= out_shape[d];
913+
in_idx += (starts[d] + od * steps[d]) * in_stride[d];
914+
}
915+
dst[out_idx] = src[in_idx];
916+
}
917+
}
918+
919+
/// N-D slice assignment with scalar: dst[slice] = value (in-place).
920+
/// Overload of slice_assign() that accepts per-dimension step vectors.
921+
/// starts/stops/steps are pre-normalized (same convention as slice()).
922+
template<typename T>
923+
inline void slice_assign(T* dst,
924+
const ptrdiff_t* shape, int ndim,
925+
const ptrdiff_t* starts, const ptrdiff_t* stops,
926+
const ptrdiff_t* steps, T value) {
927+
if (ndim == 0) return;
928+
929+
std::vector<ptrdiff_t> out_shape(ndim);
930+
ptrdiff_t total = 1;
931+
for (int d = 0; d < ndim; ++d) {
932+
out_shape[d] = slice_len(starts[d], stops[d], steps[d]);
933+
if (out_shape[d] <= 0) return;
934+
total *= out_shape[d];
935+
}
936+
937+
std::vector<ptrdiff_t> in_stride(ndim);
938+
in_stride[ndim - 1] = 1;
939+
for (int d = ndim - 2; d >= 0; --d)
940+
in_stride[d] = in_stride[d + 1] * shape[d + 1];
941+
942+
for (ptrdiff_t out_idx = 0; out_idx < total; ++out_idx) {
943+
ptrdiff_t rem = out_idx;
944+
ptrdiff_t in_idx = 0;
945+
for (int d = ndim - 1; d >= 0; --d) {
946+
ptrdiff_t od = rem % out_shape[d];
947+
rem /= out_shape[d];
948+
in_idx += (starts[d] + od * steps[d]) * in_stride[d];
949+
}
950+
dst[in_idx] = value;
951+
}
952+
}
953+
954+
/// N-D slice assignment with array: dst[slice] = values (in-place).
955+
/// Overload of slice_assign() with an array of values (one per slice element).
956+
/// values must contain exactly product(output_shape) elements in C-order.
957+
template<typename T>
958+
inline void slice_assign(T* dst,
959+
const ptrdiff_t* shape, int ndim,
960+
const ptrdiff_t* starts, const ptrdiff_t* stops,
961+
const ptrdiff_t* steps, const T* values) {
962+
if (ndim == 0) return;
963+
964+
std::vector<ptrdiff_t> out_shape(ndim);
965+
ptrdiff_t total = 1;
966+
for (int d = 0; d < ndim; ++d) {
967+
out_shape[d] = slice_len(starts[d], stops[d], steps[d]);
968+
if (out_shape[d] <= 0) return;
969+
total *= out_shape[d];
970+
}
971+
972+
std::vector<ptrdiff_t> in_stride(ndim);
973+
in_stride[ndim - 1] = 1;
974+
for (int d = ndim - 2; d >= 0; --d)
975+
in_stride[d] = in_stride[d + 1] * shape[d + 1];
976+
977+
for (ptrdiff_t out_idx = 0; out_idx < total; ++out_idx) {
978+
ptrdiff_t rem = out_idx;
979+
ptrdiff_t in_idx = 0;
980+
for (int d = ndim - 1; d >= 0; --d) {
981+
ptrdiff_t od = rem % out_shape[d];
982+
rem /= out_shape[d];
983+
in_idx += (starts[d] + od * steps[d]) * in_stride[d];
984+
}
985+
dst[in_idx] = values[out_idx];
986+
}
987+
}
988+
989+
/// numpy.put(a, indices, values, mode='raise')
990+
/// Scatters values into dst (flat) at the given indices.
991+
/// Negative indices are wrapped: idx < 0 → idx += n.
992+
/// Out-of-range indices are silently skipped (clip mode).
993+
template<typename T>
994+
inline void put(T* dst, size_t n,
995+
const ptrdiff_t* indices, const T* values, size_t ni) {
996+
for (size_t k = 0; k < ni; ++k) {
997+
ptrdiff_t idx = indices[k];
998+
if (idx < 0) idx += static_cast<ptrdiff_t>(n);
999+
if (idx >= 0 && idx < static_cast<ptrdiff_t>(n))
1000+
dst[static_cast<size_t>(idx)] = values[k];
1001+
}
1002+
}
1003+
1004+
/// numpy.putmask(a, mask, values) — scalar variant.
1005+
/// Sets dst[i] = value for every i where mask[i] is true.
1006+
/// Equivalent to Python: a[mask] = scalar
1007+
template<typename T>
1008+
inline void putmask(T* dst, const bool* mask, size_t n, T value) {
1009+
for (size_t i = 0; i < n; ++i)
1010+
if (mask[i]) dst[i] = value;
1011+
}
1012+
1013+
/// numpy.putmask(a, mask, values) — array variant.
1014+
/// Sets dst[i] = values[j++] for every i where mask[i] is true (sequential).
1015+
/// Equivalent to Python: a[mask] = array_of_values
1016+
template<typename T>
1017+
inline void putmask(T* dst, const bool* mask, size_t n, const T* values) {
1018+
size_t j = 0;
1019+
for (size_t i = 0; i < n; ++i)
1020+
if (mask[i]) dst[i] = values[j++];
1021+
}
1022+
7991023
// ============================================================================
8001024
// Set operations
8011025
// ============================================================================

0 commit comments

Comments
 (0)