Lightweight Human Activity Recognition for Cortex-M Microcontrollers
EfficientHAR-Lite is an end-to-end framework for deploying IMU-based Human Activity Recognition models on resource-constrained microcontrollers (MCUs). It achieves near state-of-the-art accuracy on standard benchmarks while fitting within the Flash and RAM constraints of Cortex-M class processors.
- LiteAttCNN Architecture — Residual 1D-CNN with dual convolutions per block, depthwise separable convolutions, and configurable attention modules
- Tier-Scaled Training — Automatic regularization scaling (weight decay, mixup, label smoothing, dropout) matched to model capacity
- Adaptive Multi-Exit Inference — Confidence-based early termination saving 60–71% inference latency
- INT8 Quantization Pipeline — Full post-training quantization with on-device benchmarking via ST Edge AI Developer Cloud
- Three Model Tiers — Tiny (30–50K params), Default (80–108K), Large (234–305K) targeting different MCU memory budgets
| Dataset | Tiny | Default | Large |
|---|---|---|---|
| UCI-HAR (6 activities) | 94.4 ± 0.5% | 96.1 ± 0.2% | 96.4 ± 0.5% |
| MHEALTH (12 activities) | 82.8 ± 5.9% | 87.8 ± 1.5% | 86.9 ± 2.3% |
| PAMAP2 (12 activities) | 80.0 ± 2.0% | 84.7 ± 0.8% | 85.7 ± 1.0% |
No-attention, single-exit configuration (recommended default).
| Tier | Flash | RAM | Latency (Exit 1) | Latency (Full) |
|---|---|---|---|---|
| Tiny | 39–84 KB | 17–21 KB | 1.77 ms | 6.05 ms |
| Default | 48–139 KB | 25–31 KB | 4.11 ms | 12.85 ms |
| Large | 80–308 KB | 41–51 KB | 11.71 ms | 32.51 ms |
UCI-HAR dataset. Flash/RAM ranges span Exit 1 (minimum) to full single-exit model.
| System | Accuracy | Flash | Acc/KB | Scenario |
|---|---|---|---|---|
| EfficientHAR-Lite (tiny, full) | 94.4% | 65 KB | 1.45 | Full model |
| EfficientHAR-Lite (default, full) | 96.1% | 117 KB | 0.82 | Full model |
| EfficientHAR-Lite (default, E1) | 92.4% | 40 KB | 2.31 | Exit-1 only |
| DeepConvLSTM (quant.) | 98.2% | 137 KB | 0.72 | Full model |
| MHCNLS-HAR | 95.7% | ~600 KB | 0.16 | Full model |
| Efficient TinyML | 92.5% | 320 KB | 0.29 | Full model |
Acc/KB = accuracy / INT8 Flash. Full-model entries pair single-exit accuracy with total INT8 Flash. Exit-1 entries pair E1 accuracy with E1 sub-graph Flash.
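The Acc/KB column can be recomputed from the accuracy and Flash columns. The sketch below uses only the values in the table above; the names `systems` and `acc_per_kb` are illustrative and not part of the repository. The MHCNLS-HAR Flash figure is listed as approximate, so its ratio is approximate too.

```python
# Accuracy (%) and INT8 Flash (KB), copied from the comparison table.
systems = {
    "EfficientHAR-Lite (tiny, full)": (94.4, 65),
    "EfficientHAR-Lite (default, full)": (96.1, 117),
    "EfficientHAR-Lite (default, E1)": (92.4, 40),
    "DeepConvLSTM (quant.)": (98.2, 137),
    "MHCNLS-HAR": (95.7, 600),  # Flash listed as ~600 KB
    "Efficient TinyML": (92.5, 320),
}


def acc_per_kb(accuracy_pct, flash_kb):
    """Accuracy percentage points per KB of Flash."""
    return accuracy_pct / flash_kb


ratios = {
    name: round(acc_per_kb(acc, kb), 2) for name, (acc, kb) in systems.items()
}

# Print systems from most to least accuracy-dense.
for name, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
    print(f"{name:36s} {ratio:.2f}")
```

The metric rewards small models, so the Exit-1 sub-graph ranks first even though its absolute accuracy is lower than that of the full models.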
git clone https://github.com/oezeb/efficienthar-lite.git
cd efficienthar-lite
pip install -r requirements.txt

# Default tier, UCI-HAR, single-exit, no attention (recommended)
python train.py --dataset uci-har --config default --augment --quantize
# Tiny tier with multi-exit
python train.py --dataset uci-har --config tiny --multi-exit --augment --quantize
# Large tier on PAMAP2
python train.py --dataset pamap2 --config large --multi-exit --augment --quantize

After training multi-exit models, analyze confidence-based early termination:
# Generate exit distributions and τ-sensitivity analysis
python generate_exit_distributions.py --outputs-dir outputs --seed 1 --attention none

This produces:
- exit_distributions.csv: Full τ sweep (0.50–0.99) for all configurations
- exit_distributions_tau08.csv: Summary at τ=0.8 (recommended default)
- Printed LaTeX table for paper inclusion
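To illustrate what a confidence threshold τ does at inference time, here is a minimal sketch of confidence-based early termination. It assumes that confidence is the maximum softmax probability at each exit, which this README does not confirm; `early_exit_predict` is a hypothetical helper and not a function from the repository.

```python
def early_exit_predict(exit_probs, tau=0.8):
    """Pick the first exit whose confidence reaches tau.

    exit_probs: per-exit softmax vectors for one sample, ordered from
                Exit 1 to the final exit.
    Returns (exit_index, predicted_class), with exit_index starting at 1.
    """
    if not exit_probs:
        raise ValueError("at least one exit is required")
    last = len(exit_probs)
    for index, probs in enumerate(exit_probs, start=1):
        confidence = max(probs)
        # Stop at a confident exit, or fall through to the final exit.
        if confidence >= tau or index == last:
            return index, probs.index(confidence)


sample = [
    [0.50, 0.30, 0.20],  # Exit 1: not confident enough at tau = 0.8
    [0.10, 0.85, 0.05],  # Exit 2: confident, inference stops here
    [0.00, 1.00, 0.00],  # Exit 3: never evaluated for this sample
]
print(early_exit_predict(sample, tau=0.8))
```

Raising τ pushes more samples to later exits, which trades latency for accuracy. That trade-off is what the τ sweep is meant to quantify.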
| Tier | Filters | Kernels | Parameters | Target MCU |
|---|---|---|---|---|
| Tiny | [24, 48, 96] | [3, 5, 5] | 30–52K | 128 KB+ Flash |
| Default | [40, 80, 160] | [5, 5, 5] | 80–108K | 256 KB+ Flash |
| Large | [80, 160, 240] | [5, 7, 7] | 234–305K | 512 KB+ Flash |
python benchmark.py --model outputs/<experiment>/model_int8.tflite --target stm32h7

Requires an ST Edge AI Developer Cloud account for on-device benchmarking.
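The `model_int8.tflite` file name suggests TensorFlow Lite integer quantization, which represents a real value as `scale * (q - zero_point)` with `q` an 8-bit integer. The sketch below illustrates only that arithmetic in plain Python. It is not the repository's quantization pipeline, and all helper names are hypothetical.

```python
INT8_MIN, INT8_MAX = -128, 127


def choose_params(x_min, x_max):
    """Derive scale and zero point so [x_min, x_max] maps onto int8.

    The range is widened to include 0 so that zero is exactly representable.
    """
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (INT8_MAX - INT8_MIN)
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero range
    zero_point = round(INT8_MIN - x_min / scale)
    return scale, zero_point


def quantize(x, scale, zero_point):
    """Real value -> int8, with saturation at the int8 limits."""
    q = round(x / scale) + zero_point
    return max(INT8_MIN, min(INT8_MAX, q))


def dequantize(q, scale, zero_point):
    """int8 -> approximate real value."""
    return scale * (q - zero_point)


scale, zero_point = choose_params(0.0, 2.55)
q = quantize(1.0, scale, zero_point)
print(q, dequantize(q, scale, zero_point))
```

Values inside the calibrated range are recovered to within half a quantization step; values outside it saturate. This is why post-training quantization depends on representative calibration data.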
Input (128 × C) → [ResBlock1] → [ResBlock2] → [ResBlock3] → GAP → Dense → Softmax
                       ↓             ↓             ↓
                    Exit 1        Exit 2        Exit 3  (multi-exit mode)
Each residual block:
- Two 1D convolutions (standard for Block 1, depthwise separable for Blocks 2–3)
- Batch normalization + ReLU after each convolution
- 1×1 projection shortcut for channel alignment
- Max pooling (factor 2)
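The block structure above can be traced with a short calculation that follows the sequence length and channel count through the three blocks and counts convolution weights. This is a rough sketch under stated assumptions: "same" padding (so only pooling changes the length), no biases, no batch-norm parameters, no attention, and no exit heads. It is not expected to reproduce the parameter counts reported for each tier, and the function names are illustrative.

```python
def conv1d_weights(kernel, c_in, c_out):
    """Weights of a standard 1D convolution (bias omitted)."""
    return kernel * c_in * c_out


def separable_conv1d_weights(kernel, c_in, c_out):
    """Depthwise (kernel * c_in) plus pointwise (c_in * c_out) weights."""
    return kernel * c_in + c_in * c_out


def trace_blocks(length, channels, filters, kernels):
    """Return (block, output_length, output_channels, conv_weights) per block."""
    rows = []
    c_in = channels
    for block, (c_out, k) in enumerate(zip(filters, kernels), start=1):
        # Block 1 uses standard convolutions, Blocks 2-3 depthwise separable.
        conv = conv1d_weights if block == 1 else separable_conv1d_weights
        weights = conv(k, c_in, c_out) + conv(k, c_out, c_out)
        weights += c_in * c_out  # 1x1 projection shortcut
        length //= 2  # max pooling, factor 2
        rows.append((block, length, c_out, weights))
        c_in = c_out
    return rows


# Default tier filters and kernels, with a 6-channel, 128-step input window.
rows = trace_blocks(128, 6, [40, 80, 160], [5, 5, 5])
for block, length, ch, weights in rows:
    print(f"Block {block}: output {length} x {ch}, conv weights {weights}")
```

For comparison, a standard 5-tap convolution from 80 to 160 channels has 64,000 weights, while the depthwise separable version has 13,200. Much of the parameter saving in Blocks 2 and 3 comes from this difference.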
Four configurable variants via --attention {none,channel,temporal,dual}:
Recommendation: Use --attention none (the default). A 360-experiment ablation with 5-seed statistical validation found no statistically significant effect of attention type on HAR accuracy (only 1 of 18 conditions reached p < 0.05; median p = 0.76). The no-attention configuration uses 9–21% fewer parameters with equivalent accuracy.
The framework supports five IMU-based HAR datasets:
| Dataset | Activities | Channels | Train Samples |
|---|---|---|---|
| UCI-HAR | 6 | 6 | 7,352 |
| MHEALTH | 12 | 6 | ~4,200 |
| PAMAP2 | 12 | 18 | ~11,400 |
| WISDM | 6 | 3 | ~25,000 |
| OPPORTUNITY | 18 | 113 | ~57,000 |
Datasets are downloaded automatically on first use.
Regularization scales automatically with model capacity:
| Parameter | Tiny | Default | Large |
|---|---|---|---|
| Epochs | 50 | 100 | 150 |
| AdamW weight decay | 10⁻⁴ | 5×10⁻⁴ | 10⁻³ |
| Mixup α | 0.0 | 0.2 | 0.3 |
| Label smoothing | 0.0 | 0.1 | 0.1 |
| Data augmentation | 3× | 4× | 6× |
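As an illustration of how these settings could be applied, the sketch below stores the table as a lookup and implements label smoothing and mixup in plain Python. The dictionary keys and function names are hypothetical, and the repository's `training.py` may organize this differently. Mixup here is the standard formulation, with the mixing weight drawn from Beta(α, α).

```python
import random

# Values copied from the table above; key names are illustrative.
TIER_CONFIG = {
    "tiny": {"epochs": 50, "weight_decay": 1e-4, "mixup_alpha": 0.0,
             "label_smoothing": 0.0, "augment_factor": 3},
    "default": {"epochs": 100, "weight_decay": 5e-4, "mixup_alpha": 0.2,
                "label_smoothing": 0.1, "augment_factor": 4},
    "large": {"epochs": 150, "weight_decay": 1e-3, "mixup_alpha": 0.3,
              "label_smoothing": 0.1, "augment_factor": 6},
}


def smooth_labels(one_hot, epsilon):
    """Blend a one-hot target with the uniform distribution."""
    k = len(one_hot)
    return [y * (1.0 - epsilon) + epsilon / k for y in one_hot]


def mixup(x_a, y_a, x_b, y_b, alpha, rng=random):
    """Convex combination of two samples; disabled when alpha <= 0."""
    if alpha <= 0.0:
        return list(x_a), list(y_a)
    lam = rng.betavariate(alpha, alpha)
    blend = lambda a, b: [lam * p + (1.0 - lam) * q for p, q in zip(a, b)]
    return blend(x_a, x_b), blend(y_a, y_b)


cfg = TIER_CONFIG["default"]
target = smooth_labels([0, 1, 0, 0, 0, 0], cfg["label_smoothing"])
x_mix, y_mix = mixup([1.0, 2.0], [1, 0], [3.0, 4.0], [0, 1],
                     cfg["mixup_alpha"], rng=random.Random(0))
print(target, y_mix)
```

With the Tiny tier's α = 0.0 and smoothing of 0.0, both functions return their inputs unchanged, which matches the table's pattern of lighter regularization for smaller models.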
efficienthar-lite/
├── train.py # Main training script
├── benchmark.py # MCU deployment benchmarking
├── generate_exit_distributions.py # Exit-rate and τ-sensitivity analysis
├── requirements.txt
├── src/
│ ├── data/ # Dataset loaders and augmentation
│ │ ├── uci_har.py
│ │ ├── mhealth.py
│ │ ├── pamap2.py
│ │ ├── wisdm.py
│ │ ├── opportunity.py
│ │ └── augmentation.py
│ ├── models/ # Architecture and training
│ │ ├── liteattcnn.py # LiteAttCNN architecture
│ │ ├── attention.py # Attention modules
│ │ ├── training.py # Tier-scaled training configs
│ │ ├── quantization.py # INT8 quantization pipeline
│ │ └── qat.py # Quantization-aware training
│ └── deployment/ # ST Edge AI integration
│ ├── benchmark.py
│ └── stm32ai_dc/ # ST Edge AI Developer Cloud client
└── LICENSE
If you use this framework in your research, please cite:
@article{ouedraogo2026attention,
author = {Ouedraogo, Ezekiel B. and Wang, Xingfu and Xu, Xiaohua and Ugwu, Emmanuel U.},
title = {Attention Mechanisms Are Statistically Insignificant for IMU-Based
Human Activity Recognition on Microcontrollers:
A 360-Experiment Empirical Study},
journal = {IEEE Sensors Journal},
year = {2026},
note = {Submitted}
}
@article{ouedraogo2026efficienthar,
author = {Ouedraogo, Ezekiel B. and Xu, Xiaohua and Wang, Xingfu},
title = {EfficientHAR-Lite: Closing the Accuracy Gap Between Cloud and
MCU-Deployable Human Activity Recognition via
Architecture-Training Co-Design},
journal = {ACM Transactions on Internet of Things},
year = {2026},
note = {Submitted}
}

MIT License. See LICENSE for details.