A modern C++20 statistical distributions library demonstrating how to build statistical software correctly, with genuine SIMD vectorization, parallel dispatch, thread safety, and zero external dependencies.
Complete Documentation: For detailed information about building, architecture, parallel processing, and platform support, see the comprehensive guides below.
- PDF/CDF/Quantiles: Full probability density, cumulative distribution, and quantile functions
- Statistical Moments: Mean, variance, skewness, kurtosis with thread-safe access
- Random Sampling: Integration with std:: distributions for high-quality random number generation
- Parameter Estimation: Maximum Likelihood Estimation (MLE) with comprehensive diagnostics
- Statistical Validation: KS and AD Goodness-of-Fit, model selection
- Gaussian (Normal): N(μ, σ²)
- Exponential: Exp(λ)
- Uniform: U(a, b)
- Poisson: P(λ)
- Discrete: Custom discrete distributions with arbitrary support
- Gamma: Γ(α, β)
- Chi-squared: χ²(ν)
- Student's t: t(ν)
- Beta: Beta(α, β)
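To make the PDF/CDF/quantile terminology concrete, here is a minimal standard-library sketch of the textbook formulas for two of the listed families. The function names (`normal_pdf`, `normal_cdf`, `exponential_quantile`) are hypothetical and chosen for illustration; this is not the libstats implementation or API.

```cpp
#include <cmath>

// Illustrative sketch only: hypothetical helper names, not the libstats API.

// Density of N(mu, sigma^2) at x. Assumes sigma > 0.
double normal_pdf(double x, double mu, double sigma) {
    constexpr double kPi = 3.14159265358979323846;
    const double z = (x - mu) / sigma;
    return std::exp(-0.5 * z * z) / (sigma * std::sqrt(2.0 * kPi));
}

// CDF of N(mu, sigma^2) at x, written with erfc so that the lower tail
// keeps its precision instead of cancelling against 1.
double normal_cdf(double x, double mu, double sigma) {
    const double z = (x - mu) / sigma;
    return 0.5 * std::erfc(-z / std::sqrt(2.0));
}

// Quantile of Exp(lambda): the closed-form inverse of F(x) = 1 - exp(-lambda x).
// Assumes 0 <= p < 1 and lambda > 0. log1p avoids precision loss for small p.
double exponential_quantile(double p, double lambda) {
    return -std::log1p(-p) / lambda;
}
```

For example, `normal_cdf(1.96, 0.0, 1.0)` is roughly 0.975, and `exponential_quantile(0.5, 2.0)` gives the median of Exp(2), which is ln(2)/2.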
- Thread-Safe: Concurrent read access with safe cache management
- Zero Dependencies: Only standard library required
- SIMD Optimized: Vectorized operations for bulk calculations
- Memory Safe: RAII principles and smart pointer usage
- Exception Safe: Robust error handling throughout
- C++20 Concepts: Type-safe mathematical function interfaces
- Parallel Processing: Traditional and work-stealing thread pools
- Memory Safety: Comprehensive bounds checking and overflow protection
- Numerical Stability: Safe mathematical operations and edge case handling
- Error Recovery: Multiple strategies for handling numerical failures
- Convergence Detection: Advanced monitoring for iterative algorithms
- Diagnostics: Automated numerical health assessment
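One common technique behind "safe mathematical operations" in statistical code is working in log space. The sketch below shows the standard log-sum-exp trick using only the standard library; `log_sum_exp` is a hypothetical name, and whether libstats uses exactly this formulation is an assumption.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

// Illustrative sketch only: hypothetical helper, not the libstats API.
// Computes log(sum_i exp(v_i)) without overflow or underflow by factoring
// out the largest term. Assumes the input contains no NaN values.
double log_sum_exp(const std::vector<double>& log_values) {
    if (log_values.empty()) {
        // The empty sum is 0, and log(0) is -infinity.
        return -std::numeric_limits<double>::infinity();
    }
    const double max_log = *std::max_element(log_values.begin(), log_values.end());
    if (!std::isfinite(max_log)) {
        // Every term is -infinity, or at least one is +infinity.
        return max_log;
    }
    double sum = 0.0;
    for (double v : log_values) {
        sum += std::exp(v - max_log);  // each term lies in (0, 1]
    }
    return max_log + std::log(sum);
}
```

Summing two log-probabilities of -1000 directly would underflow to log(0); with the shift, the result is -1000 + ln(2).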
- Goodness-of-Fit Tests: Kolmogorov-Smirnov, Anderson-Darling (✅ implemented)
- Model Selection: AIC/BIC information criteria (✅ implemented)
- Residual Analysis: Standardized residuals and diagnostics (✅ implemented)
- Cross-Validation: K-fold validation framework (✅ implemented)
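The quantities named in the list above have short textbook definitions. As a rough sketch, assuming the usual formulas (AIC = 2k - 2 ln L, BIC = k ln n - 2 ln L, and the one-sample Kolmogorov-Smirnov statistic as the largest gap between the empirical and model CDFs), they can be written as follows. The function names are hypothetical and are not the libstats API.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative sketch only: hypothetical helpers, not the libstats API.

// Akaike information criterion for k fitted parameters.
double aic(double log_likelihood, std::size_t k) {
    return 2.0 * static_cast<double>(k) - 2.0 * log_likelihood;
}

// Bayesian information criterion for k fitted parameters and n observations.
double bic(double log_likelihood, std::size_t k, std::size_t n) {
    return static_cast<double>(k) * std::log(static_cast<double>(n))
           - 2.0 * log_likelihood;
}

// One-sample KS statistic D = sup |F_n(x) - F(x)|. The sample is taken by
// value so it can be sorted locally. Assumes a non-empty sample.
template <typename Cdf>
double ks_statistic(std::vector<double> sample, Cdf cdf) {
    std::sort(sample.begin(), sample.end());
    const double n = static_cast<double>(sample.size());
    double d = 0.0;
    for (std::size_t i = 0; i < sample.size(); ++i) {
        const double f = cdf(sample[i]);
        const double above = static_cast<double>(i + 1) / n - f;  // step top
        const double below = f - static_cast<double>(i) / n;      // step bottom
        d = std::max({d, above, below});
    }
    return d;
}
```

Lower AIC or BIC indicates the preferred model among candidates fitted to the same data; a small KS statistic indicates that the empirical CDF tracks the model CDF closely.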
- SIMD Operations: Vectorized statistical computations with cross-platform detection
- Parallel Processing: Both traditional and work-stealing thread pools
- C++20 Parallel Algorithms: Safe wrappers for std::execution policies
- Cache Optimization: Thread-safe caching with lock-free fast paths
Cross-Platform SIMD Support: Automatic detection and optimization for SSE2/AVX/AVX2/AVX-512/NEON instruction sets with runtime safety verification. Validated on Intel (Ivy Bridge/Kaby Lake), Apple Silicon (M1/NEON), AMD Ryzen Zen 4 (AVX-512), and Linux CI.
git clone https://github.com/OldCrow/libstats.git
cd libstats
mkdir build && cd build
cmake .. # Auto-detects optimal configuration
make -j$(nproc) # Parallel build with auto-detected core count
ctest --output-on-failure # Run tests
For complete build information, including cross-platform support, SIMD optimization, and advanced configuration options, see docs/BUILD_SYSTEM_GUIDE.md.
#include "libstats.h"
#include <iostream>
#include <numeric>
#include <span>
#include <vector>
int main() {
    // Initialize performance systems (recommended)
    libstats::initialize_performance_systems();

    // Create distributions with safe factory methods
    auto gaussian_result = libstats::GaussianDistribution::create(0.0, 1.0);
    if (gaussian_result.isOk()) {
        auto& gaussian = gaussian_result.value;

        // Single-value operations
        std::cout << "PDF at 1.0: " << gaussian.getProbability(1.0) << std::endl;
        std::cout << "CDF at 1.0: " << gaussian.getCumulativeProbability(1.0) << std::endl;

        // High-performance batch operations (auto-optimized)
        std::vector<double> values(10000);
        std::vector<double> results(10000);
        std::iota(values.begin(), values.end(), -5.0);
        gaussian.getProbability(std::span<const double>(values),
                                std::span<double>(results));
        std::cout << "Processed " << values.size() << " values with auto-optimization" << std::endl;
    }
    return 0;
}
For comprehensive parallel processing and batch operation guides, see docs/PARALLEL_BATCH_PROCESSING_GUIDE.md.
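The quick start relies on a factory that returns a result object rather than throwing. The following standalone sketch shows how such a pattern can be built. `SketchResult` and `SketchGaussian` are hypothetical types invented for illustration; only the `isOk()` and `value` members mirror the usage shown above, and the real libstats types may differ.

```cpp
#include <cmath>
#include <string>
#include <utility>

// Illustrative sketch only: hypothetical types, not the libstats API.
template <typename T>
struct SketchResult {
    T value{};            // meaningful only when ok is true
    std::string message;  // explanation when ok is false
    bool ok = false;

    bool isOk() const { return ok; }

    static SketchResult success(T v) {
        SketchResult r;
        r.value = std::move(v);
        r.ok = true;
        return r;
    }
    static SketchResult failure(std::string why) {
        SketchResult r;
        r.message = std::move(why);
        return r;
    }
};

class SketchGaussian {
public:
    SketchGaussian() = default;  // standard normal
    // Validates parameters and reports problems through the result object.
    static SketchResult<SketchGaussian> create(double mean, double stddev);
    double mean() const { return mean_; }
    double stddev() const { return stddev_; }

private:
    SketchGaussian(double mean, double stddev) : mean_(mean), stddev_(stddev) {}
    double mean_ = 0.0;
    double stddev_ = 1.0;
};

inline SketchResult<SketchGaussian> SketchGaussian::create(double mean, double stddev) {
    using Result = SketchResult<SketchGaussian>;
    if (!std::isfinite(mean)) {
        return Result::failure("mean must be finite");
    }
    if (!std::isfinite(stddev) || stddev <= 0.0) {
        return Result::failure("stddev must be finite and positive");
    }
    return Result::success(SketchGaussian(mean, stddev));
}
```

The design choice here is that invalid parameters never produce a usable distribution object: callers must check `isOk()` before touching `value`, which keeps validation failures out of the exception path.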
libstats/
├── include/                 # Modular header architecture
│   ├── libstats.h           # Complete library (single include)
│   ├── core/                # Core mathematical and statistical components
│   ├── distributions/       # Statistical distributions (Gaussian, Exponential, etc.)
│   └── platform/            # SIMD, threading, and platform optimizations
├── src/                     # Implementation files
├── tests/                   # Comprehensive unit and integration tests
├── examples/                # Usage demonstrations
├── tools/                   # Performance analysis and optimization utilities
├── docs/                    # Complete documentation guides
└── scripts/                 # Build and development scripts
For detailed header organization and dependency management, see docs/HEADER_ARCHITECTURE_GUIDE.md.
- PDF, CDF, quantiles, parameter estimation, and validation
- 9 distributions across continuous, bounded, and discrete families
- Beyond std:: distributions with full statistical interfaces
- Automatic SIMD optimization (SSE2, AVX, AVX2, AVX-512, NEON)
- Intelligent parallel processing with auto-dispatch
- Thread-safe batch operations with work-stealing pools
- Smart caching and adaptive algorithm selection
- Memory-safe operations with comprehensive bounds checking
- Exception-safe error handling with safe factory methods
- Thread-safe concurrent access with reader-writer locks
- Numerical stability with log-space arithmetic
- Zero external dependencies (standard library only)
- C++20 concepts, std::span, and execution policies
- Cross-platform: Windows, macOS, Linux with automatic optimization
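The list above mentions thread-safe concurrent access with reader-writer locks. A minimal sketch of that idea using `std::shared_mutex` is shown below: many readers can share a cached value, while parameter changes take an exclusive lock and invalidate the cache. The class name `CachedGaussianSketch` and its members are hypothetical, and the lock-free fast path the library describes is not reproduced here.

```cpp
#include <cmath>
#include <mutex>
#include <shared_mutex>

// Illustrative sketch only: hypothetical class, not the libstats API.
class CachedGaussianSketch {
public:
    explicit CachedGaussianSketch(double stddev) : stddev_(stddev) {}

    // log of 1 / (sigma * sqrt(2 pi)), computed lazily and cached.
    double log_normalizer() const {
        {
            // Shared lock: any number of readers may hold it at once.
            std::shared_lock<std::shared_mutex> read(mutex_);
            if (cache_valid_) {
                return cached_log_norm_;
            }
        }
        // Exclusive lock: only one thread refreshes the cache.
        std::unique_lock<std::shared_mutex> write(mutex_);
        if (!cache_valid_) {  // re-check: another writer may have filled it
            constexpr double kLogTwoPi = 1.8378770664093453;  // ln(2 pi)
            cached_log_norm_ = -std::log(stddev_) - 0.5 * kLogTwoPi;
            cache_valid_ = true;
        }
        return cached_log_norm_;
    }

    // Changing a parameter invalidates the cached value.
    void set_stddev(double stddev) {
        std::unique_lock<std::shared_mutex> write(mutex_);
        stddev_ = stddev;
        cache_valid_ = false;
    }

private:
    mutable std::shared_mutex mutex_;
    double stddev_;
    mutable double cached_log_norm_ = 0.0;
    mutable bool cache_valid_ = false;
};
```

The re-check after acquiring the exclusive lock matters: two threads can both miss the cache under the shared lock, and only the first one to obtain the exclusive lock should do the computation.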
| Feature | std:: distributions | libstats |
|---|---|---|
| Random Sampling | ✅ Excellent | ✅ Uses std:: internally |
| PDF Evaluation | ❌ Not available | ✅ Complete implementation |
| CDF Evaluation | ❌ Not available | ✅ Complete implementation |
| Quantile Functions | ❌ Not available | ✅ Complete implementation |
| Parameter Fitting | ❌ Not available | ✅ MLE with diagnostics |
| Statistical Tests | ❌ Not available | ✅ Comprehensive validation |
| Thread Safety | | ✅ Full concurrent access |
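The first and fifth rows of the table can be illustrated together: the standard library supplies sampling, and fitting has to be written on top of it. The sketch below draws from `std::normal_distribution` and then recovers the parameters with the closed-form Gaussian maximum likelihood estimates. `sample_normal`, `fit_gaussian_mle`, and `GaussianFit` are hypothetical names, not the libstats API.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Illustrative sketch only: hypothetical helpers, not the libstats API.
struct GaussianFit {
    double mean;
    double variance;  // MLE variance (divides by n, not n - 1)
};

// Draws n values from N(mean, stddev^2) using the standard library.
std::vector<double> sample_normal(std::size_t n, double mean, double stddev,
                                  unsigned seed) {
    std::mt19937 rng(seed);
    std::normal_distribution<double> dist(mean, stddev);
    std::vector<double> out(n);
    for (double& x : out) {
        x = dist(rng);
    }
    return out;
}

// Closed-form Gaussian MLE. Assumes data is non-empty.
GaussianFit fit_gaussian_mle(const std::vector<double>& data) {
    const double n = static_cast<double>(data.size());
    double sum = 0.0;
    for (double x : data) {
        sum += x;
    }
    const double mean = sum / n;
    double squared_dev = 0.0;
    for (double x : data) {
        squared_dev += (x - mean) * (x - mean);
    }
    return GaussianFit{mean, squared_dev / n};
}
```

With a large sample, for instance `fit_gaussian_mle(sample_normal(20000, 3.0, 2.0, 42u))`, the estimates should land near a mean of 3 and a variance of 4. The exact values depend on the standard library implementation.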
- quick_start_tutorial.cpp - 5-minute introduction to the core API
- basic_usage.cpp - End-to-end usage of creation, evaluation, sampling, fitting, and batch APIs
- distribution_families_demo.cpp - The 9 distributions organized by family: what each models, when to use it, and how to choose within a family
- statistical_validation_demo.cpp - Goodness-of-fit tests, cross-validation, bootstrap CIs, and model selection
- parallel_execution_demo.cpp - Batch-processing and dispatch workflow

- system_inspector - CPU capabilities and system information
- simd_verification - SIMD correctness and speedup verification
- strategy_profile - Canonical forced-strategy profiler for dispatcher threshold tuning
- parallel_batch_fitting_benchmark - Parallel batch fitting performance analysis
# Correctness suite: parallel-safe, always reliable
make run_tests # or: ctest -LE "timing|benchmark"
# Timing/speedup tests β run serially for accurate results
make run_tests_timing # or: ctest -j1 -L timing
# Everything (including dynamic linking tests)
make run_all_tests
# SIMD correctness and speedup measurement
./build/tools/simd_verification
# Run a specific test
ctest -R test_gaussian_basic
Tests are labelled: no label = correctness (parallel-safe); timing = speedup assertions (run serially); benchmark = performance tools (not in standard suite). The *_enhanced GTest tests require GTest to be installed; they are silently skipped when GTest is absent.
- C++20 compatible compiler: GCC 10+, Clang 14+, MSVC 2019+
- CMake: 3.20 or later
- Platform: Windows, macOS, Linux (automatic detection and optimization)
| Configuration | Command | Use Case |
|---|---|---|
| Development (default) | cmake .. | Daily development with light optimization |
| Release | cmake -DCMAKE_BUILD_TYPE=Release .. | Production builds with maximum optimization |
| Debug | cmake -DCMAKE_BUILD_TYPE=Debug .. | Full debugging support |
For complete information about libstats, refer to these comprehensive guides:
BUILD_SYSTEM_GUIDE.md
Complete build system documentation covering:
- Cross-platform build instructions (Windows, macOS, Linux)
- SIMD detection and optimization
- Parallel build configuration
- Advanced CMake options
- Troubleshooting and manual builds
HEADER_ARCHITECTURE_GUIDE.md
Header organization and dependency management:
- Modular header architecture
- Consolidated vs individual includes
- Development patterns for distributions, tools, and tests
- Performance optimization through header design
PARALLEL_BATCH_PROCESSING_GUIDE.md
High-performance parallel and batch processing:
- Auto-dispatch vs explicit strategy control
- SIMD and parallel processing APIs
- Performance optimization guidelines
- Thread safety and memory management
For Windows development environment setup (MSVC activation, DLL CRT handling, Smart App Control), see the Windows session setup section in WARP.md.
libstats can be consumed by external projects in three ways.
# Build and install libstats
cd /path/to/libstats
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --parallel
cmake --install build --prefix /path/to/install
In your project's CMakeLists.txt:
find_package(libstats REQUIRED)
target_link_libraries(your_target PRIVATE libstats::libstats_static)
Configure with -DCMAKE_PREFIX_PATH=/path/to/install.
include(FetchContent)
FetchContent_Declare(
libstats
GIT_REPOSITORY https://github.com/OldCrow/libstats.git
GIT_TAG main
)
FetchContent_MakeAvailable(libstats)
target_link_libraries(your_target PRIVATE libstats_static)
After installing libstats:
pkg-config --cflags --libs libstats
See consumer_example/ for a complete find_package project and consumer_example_fetchcontent/ for FetchContent.
Note: Define LIBSTATS_FULL_INTERFACE before including libstats/libstats.h to get the complete API (distributions, performance framework, etc.). Without it, only forward declarations and core utilities are available.
- 9 distributions (Gaussian, Exponential, Uniform, Poisson, Discrete, Gamma, Chi-squared, Student's t, Beta)
- Complete PDF/CDF/quantile/MLE/validation coverage across the implemented families
- Thread-safe with reader-writer locks and lock-free fast paths
- SIMD batch operations (SSE2/AVX/AVX2/AVX-512/NEON) with runtime dispatch
- Work-stealing parallel thread pool
- Goodness-of-fit tests (KS, AD), information criteria (AIC/BIC), cross-validation, bootstrap
- Honest strategy naming: SCALAR/VECTORIZED/PARALLEL/WORK_STEALING
- Constants consolidated from 10 micro-headers to 3 semantic groups
- Corrected vector_exp_avx underflow bug; Gaussian CDF heap allocation removed
- Cross-platform validated: Intel AVX, Apple Silicon NEON, AMD AVX-512, MSVC/Linux CI
- All compiler warnings addressed (GCC, Clang, MSVC); zero warnings under ClangStrict
- Test labels for parallel-safe correctness runs vs timing-sensitive runs
- find_package(libstats) with exported CMake targets
- FetchContent support (zero-install consumption)
- pkg-config for Linux and Homebrew
- Installed headers use #include "libstats/core/..." prefix
- Consumer examples for both methods
- SIMD batch paths added for Exponential, Gamma, and Uniform where the current VectorOps abstraction makes them worthwhile
- New distributions added: Student's t, Chi-squared, and Beta
- SIMD verification expanded to cover the full current distribution set
- Dispatch heuristics replaced with constexpr lookup table derived from 6912 profiling measurements across NEON, AVX, AVX2, and AVX-512
- AVX-512/MSVC build fix: global compile flag follows SIMDDetection results
- Student-T MLE robustness: upper-bounded Newton-Raphson prevents divergence
- Beta CDF batch optimisation: hoisted lgamma prefix
- Canonical strategy_profile tool replaces ad-hoc benchmarks
- Cross-platform validated: Ivy Bridge AVX, Kaby Lake AVX2, M1 NEON, Asus A16 AVX-512/MSVC
- 54/54 SIMD verification tests pass on all four machines
- Tagged and released on main
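The release notes above mention a constexpr lookup table for dispatch and the strategy names SCALAR, VECTORIZED, PARALLEL, and WORK_STEALING. As a rough sketch of how a compile-time threshold table can drive strategy selection, consider the following. Only the four strategy names come from this document; the batch-size thresholds, the table layout, and the names `ThresholdEntry`, `kDispatchTable`, and `select_strategy` are invented for illustration and do not reflect the profiled values.

```cpp
#include <array>
#include <cstddef>

// Illustrative sketch only: thresholds and names below are hypothetical.
enum class Strategy { SCALAR, VECTORIZED, PARALLEL, WORK_STEALING };

struct ThresholdEntry {
    std::size_t min_batch_size;  // smallest batch for which the strategy applies
    Strategy strategy;
};

// Entries are sorted by ascending min_batch_size. The numbers are placeholders,
// not measurements.
inline constexpr std::array<ThresholdEntry, 4> kDispatchTable{{
    {0, Strategy::SCALAR},
    {64, Strategy::VECTORIZED},
    {16384, Strategy::PARALLEL},
    {262144, Strategy::WORK_STEALING},
}};

// Picks the last entry whose threshold the batch size reaches.
constexpr Strategy select_strategy(std::size_t batch_size) {
    Strategy chosen = kDispatchTable.front().strategy;
    for (const auto& entry : kDispatchTable) {
        if (batch_size >= entry.min_batch_size) {
            chosen = entry.strategy;
        }
    }
    return chosen;
}

// Because everything is constexpr, the mapping can be checked at compile time.
static_assert(select_strategy(8) == Strategy::SCALAR);
static_assert(select_strategy(1000) == Strategy::VECTORIZED);
```

A table like this keeps tuning data separate from dispatch logic, so re-profiling on new hardware only changes the entries. A real table would presumably also be keyed by distribution, operation, and SIMD level.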
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
This project is licensed under the MIT License - see the LICENSE file for details.
For third-party code attributions and licenses, see THIRD_PARTY_NOTICES.md.
This project builds upon concepts and components from libhmm, adapting them for general-purpose statistical computing while maintaining the focus on modern C++ design and performance.
Our SIMD implementations incorporate algorithms inspired by the SLEEF library for high-accuracy mathematical functions.
libstats - Bringing comprehensive statistical computing to modern C++