# libhmm — Modern C++20 Hidden Markov Model Library
A modern, high-performance C++20 implementation of Hidden Markov Models with 15 emission distributions, canonical log-space algorithms, and compile-time SIMD acceleration.
Zero external dependencies — C++20 standard library only.
## Training Algorithms

- Baum-Welch — canonical log-space EM; works with any `EmissionDistribution` via `weightedfit()`
- Viterbi Training — hard-assignment training with `TrainingConfig` presets (`fast`, `balanced`, `precise`)
- Segmental K-Means — for discrete HMMs; useful as an initialiser before EM (see the sketch after this list)
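A common pipeline seeds EM with the hard-assignment pass. A minimal sketch, assuming a `SegmentalKMeansTrainer` class with the same constructor shape as the other trainers (the name is an assumption; only `BaumWelchTrainer` and `ViterbiTrainer` appear in the Quick Start below):

```cpp
#include "libhmm/libhmm.h"
using namespace libhmm;

// Seed a discrete HMM with Segmental K-Means, then refine with Baum-Welch.
// `SegmentalKMeansTrainer` is a hypothetical name for the K-Means trainer;
// check the training/ headers for the actual class.
void initialise_then_refine(Hmm& hmm, ObservationLists& obs) {
    SegmentalKMeansTrainer seed(&hmm, obs);  // hard segmentation pass (assumed API)
    seed.train();
    BaumWelchTrainer em(&hmm, obs);          // full EM from the K-Means starting point
    em.train();
}
```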
## Calculators

- ForwardBackward — canonical log-space calculator; returns `probability()` and `getLogProbability()`
- Viterbi — canonical log-space decoder; returns the MAP state sequence

Both calculators call `getBatchLogProbabilities()` per state per time step, enabling SIMD acceleration directly at the distribution layer.
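A minimal sketch of the two accessors, assuming `probability()` is simply the exponentiated log-space result:

```cpp
#include "libhmm/libhmm.h"
#include <cstdio>
using namespace libhmm;

// Score one sequence with both accessors (constructor usage as in the Quick Start).
void score(Hmm& hmm, ObservationLists& obs) {
    ForwardBackwardCalculator fbc(hmm, obs[0]);
    std::printf("P(O|model)     = %g\n", fbc.probability());        // linear space; can underflow to 0 on long sequences
    std::printf("log P(O|model) = %g\n", fbc.getLogProbability());  // log space; underflow-safe
}
```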
## Distributions

Discrete: `DiscreteDistribution`, `BinomialDistribution`, `NegativeBinomialDistribution`, `PoissonDistribution`

Continuous: `GaussianDistribution`, `ExponentialDistribution`, `GammaDistribution`, `LogNormalDistribution`, `BetaDistribution`, `UniformDistribution`, `WeibullDistribution`, `ParetoDistribution`, `RayleighDistribution`, `StudentTDistribution`, `ChiSquaredDistribution`

All distributions implement `getBatchLogProbabilities()` for SIMD-accelerated batch evaluation. `GaussianDistribution` and `ExponentialDistribution` have explicit AVX-512/AVX2/SSE2/NEON intrinsics (tier 2); the remaining 13 use concrete non-virtual loops that the compiler auto-vectorizes under `-march=native`.
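The batch interface is what the calculators drive. Its exact signature isn't reproduced in this README, so the sketch below assumes a pointer-plus-count form; consult the headers under `include/libhmm/distributions/` for the real declaration:

```cpp
#include "libhmm/libhmm.h"
#include <vector>
using namespace libhmm;

// Hedged sketch: the (input pointer, output pointer, count) signature is an
// assumption, not the documented API.
std::vector<double> batch_logp(GaussianDistribution& dist,
                               const std::vector<double>& observations) {
    std::vector<double> logp(observations.size());
    // One batch call per state, instead of one virtual call per observation,
    // lets the distribution run its SIMD kernel over the whole span.
    dist.getBatchLogProbabilities(observations.data(), logp.data(),
                                  observations.size());
    return logp;
}
```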
## Serialization

- JSON serialization (recommended): `save_json(hmm, path)` / `load_json(path)` — exact IEEE 754 round-trip, no external dependencies, locale-safe (round-trip sketch below).

  ```cpp
  libhmm::save_json(hmm, "model.json");
  auto hmm2 = libhmm::load_json("model.json");
  ```

- Legacy XML (`XMLFileReader`/`XMLFileWriter`): retained for reading existing `.xml` files; deprecated in favour of JSON for new code.
- See `samples/` for ready-to-use reference HMM files in both formats.
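Because the JSON round-trip is exact in IEEE 754, a reloaded model should score any sequence bit-identically. A sanity-check sketch (`save_json`, `load_json`, and `ForwardBackwardCalculator` are the documented API; the templated sequence parameter and the assertion are illustrative):

```cpp
#include "libhmm/libhmm.h"
#include <cassert>

template <typename Sequence>
void roundtrip_check(libhmm::Hmm& hmm, Sequence& seq) {
    libhmm::save_json(hmm, "model.json");
    auto hmm2 = libhmm::load_json("model.json");

    libhmm::ForwardBackwardCalculator before(hmm, seq);
    libhmm::ForwardBackwardCalculator after(hmm2, seq);
    // Exact equality, not approximate: the round-trip preserves every bit.
    assert(before.getLogProbability() == after.getLogProbability());
}
```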
## Performance

- Compile-time SIMD dispatch — each machine builds for its own CPU:
  - GCC/Clang: `-march=native` (AVX-512 on capable x86, NEON on AArch64)
  - MSVC: `/arch:AVX512`, `/arch:AVX2`, or `/arch:AVX` (CPU-verified at configure time)
- Log-space throughout: no numerical underflow on long sequences (illustrated below)
- Pre-computed log transition matrices: amortised once per `compute()` call
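The log-space point is easy to demonstrate in isolation. A standalone illustration (plain C++, not libhmm API) of why multiplying per-step probabilities underflows while summing their logs does not:

```cpp
#include <cmath>
#include <cstdio>

int main() {
    const double p = 1e-4;  // a typical per-step emission probability
    const int    T = 100;   // sequence length

    double linear = 1.0, log_space = 0.0;
    for (int t = 0; t < T; ++t) {
        linear    *= p;            // underflows to exactly 0.0 before t = 100
        log_space += std::log(p);  // accumulates -4*ln(10) per step, stays finite
    }
    std::printf("linear:    %g\n", linear);     // prints 0
    std::printf("log-space: %g\n", log_space);  // prints about -921.03
}
```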
## Building

```bash
git clone https://github.com/OldCrow/libhmm.git
cd libhmm
cmake -B build
cmake --build build --config Release
ctest --test-dir build
```

On Windows with Visual Studio:

```bash
cmake -B build -G "Visual Studio 17 2022" -A x64
cmake --build build --config Release --parallel 4
ctest --test-dir build -C Release --parallel 4
```

On macOS Catalina (10.15), use the guarded configure path:

```bash
./scripts/configure_catalina.sh build
cmake --build build --config Release
ctest --test-dir build
```

## Quick Start

```cpp
#include "libhmm/libhmm.h"
using namespace libhmm;
// Create a 2-state HMM with Gaussian emissions
Hmm hmm(2);
Matrix trans(2, 2);
trans(0, 0) = 0.9; trans(0, 1) = 0.1;
trans(1, 0) = 0.2; trans(1, 1) = 0.8;
hmm.setTrans(trans);
Vector pi(2); pi(0) = 0.6; pi(1) = 0.4;
hmm.setPi(pi);
hmm.setDistribution(0, std::make_unique<GaussianDistribution>(0.0, 1.0));
hmm.setDistribution(1, std::make_unique<GaussianDistribution>(5.0, 1.5));
// Train with Baum-Welch (works with any EmissionDistribution)
ObservationLists obs = { /* your sequences */ };
BaumWelchTrainer trainer(&hmm, obs);
trainer.train();
// Evaluate
ForwardBackwardCalculator fbc(hmm, obs[0]);
double log_p = fbc.getLogProbability();
// Decode
ViterbiCalculator vc(hmm, obs[0]);
StateSequence path = vc.decode();
```

### Training presets

```cpp
// Fast convergence for interactive use
ViterbiTrainer fast_trainer(&hmm, obs, training_presets::fast());
fast_trainer.train();
// Precise convergence for final models
ViterbiTrainer precise_trainer(&hmm, obs, training_presets::precise());
precise_trainer.train();
std::cout << "Converged: " << precise_trainer.hasConverged() << "\n";
```

## Project Structure

```
libhmm/
├── include/libhmm/ # Public headers (layered architecture)
│ ├── platform/ # Layer 0: SIMD detection
│ ├── math/ # Layer 1: constants, log-space, numerics
│ ├── linalg/ # Layer 2: Matrix, Vector types
│ ├── distributions/ # Layer 3: 15 distributions + base
│ ├── hmm.h # Core HMM class
│ ├── calculators/ # Layer 4: ForwardBackward, Viterbi
│ ├── training/ # Layer 4: BaumWelch, Viterbi, SegmentalKMeans
│ └── io/ # JSON (hmm_json.h) + legacy XML I/O
├── src/ # Implementation (mirrors include/)
├── tests/ # 37-test GTest suite
├── examples/ # 12 usage demonstrations
├── tools/ # simd_inspection, batch_performance, hmm_validator (.json/.xml)
├── samples/ # Reference HMM files (two_state_gaussian, casino) in JSON and XML
├── benchmarks/ # Comparative benchmarks (requires external libraries)
├── docs/ # Documentation and checklists
└── CMakeLists.txt
```
## Examples

See `examples/` for demonstrations:

| Example | Distribution(s) | Trainer |
|---|---|---|
| `basic_hmm_example` | Discrete, Gaussian, Poisson | Viterbi + JSON I/O |
| `baum_welch_example` | Gaussian | BaumWelch (with EM convergence table) |
| `viterbi_trainer_example` | Gaussian | Viterbi (preset comparison) |
| `student_t_hmm_example` | StudentT | BaumWelch |
| `poisson_hmm_example` | Poisson | Viterbi |
| `financial_hmm_example` | Beta, LogNormal | Viterbi |
| `reliability_hmm_example` | Weibull, Exponential | Viterbi |
| `quality_control_hmm_example` | Binomial, Uniform | Viterbi |
| `economics_hmm_example` | NegBinomial, Pareto | Viterbi |
| `queuing_theory_hmm_example` | Poisson, Exponential, Gamma | Viterbi |
| `statistical_process_control_hmm_example` | ChiSquared | Viterbi |
| `swarm_coordination_example` | Discrete (243 symbols) | — |
## Requirements

- C++20 compiler: GCC 11+, Clang 14+, MSVC 2019 16.11+
- CMake 3.20+

No external dependencies. GTest is fetched automatically via CMake FetchContent for the test suite.
## Documentation

- `WARP.md` — session guide for the Warp AI agent
- `docs/CROSS_PLATFORM.md` — build options, library output, CI matrix
- `docs/GOLD_STANDARD_CHECKLIST.md` — distribution implementation requirements
- `docs/STYLE_GUIDE.md` — coding conventions
## License

MIT License — see `LICENSE` for details.