EfficientHAR-Lite

Lightweight Human Activity Recognition for Cortex-M Microcontrollers

EfficientHAR-Lite is an end-to-end framework for deploying IMU-based Human Activity Recognition models on resource-constrained microcontrollers (MCUs). It achieves near state-of-the-art accuracy on standard benchmarks while fitting within the Flash and RAM constraints of Cortex-M class processors.

Key Features

  • LiteAttCNN Architecture: residual 1D-CNN with two convolutions per block, depthwise separable convolutions, and configurable attention modules
  • Tier-Scaled Training: regularization (weight decay, mixup, label smoothing, dropout) scaled automatically to match model capacity
  • Adaptive Multi-Exit Inference: confidence-based early termination that cuts inference latency by 60–71%
  • INT8 Quantization Pipeline: full post-training quantization with on-device benchmarking via ST Edge AI Developer Cloud
  • Three Model Tiers: Tiny (30–50K params), Default (80–108K), and Large (234–305K), targeting different MCU memory budgets

Results

Accuracy (5-seed mean ± std, no-attention single-exit)

| Dataset | Tiny | Default | Large |
|---|---|---|---|
| UCI-HAR (6 activities) | 94.4 ± 0.5% | 96.1 ± 0.2% | 96.4 ± 0.5% |
| MHEALTH (12 activities) | 82.8 ± 5.9% | 87.8 ± 1.5% | 86.9 ± 2.3% |
| PAMAP2 (12 activities) | 80.0 ± 2.0% | 84.7 ± 0.8% | 85.7 ± 1.0% |

No-attention, single-exit configuration (recommended default).

On-Device Benchmarks — STM32H743 @ 480 MHz (INT8)

| Tier | Flash | RAM | Latency (Exit 1) | Latency (Full) |
|---|---|---|---|---|
| Tiny | 39–84 KB | 17–21 KB | 1.77 ms | 6.05 ms |
| Default | 48–139 KB | 25–31 KB | 4.11 ms | 12.85 ms |
| Large | 80–308 KB | 41–51 KB | 11.71 ms | 32.51 ms |

Measured on the UCI-HAR dataset. Flash and RAM ranges span from the Exit 1 sub-graph (minimum) to the full single-exit model.
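The Exit 1 and full-model latencies above bound what early termination can save: if every sample stopped at Exit 1, the saving would be one minus the ratio of the two latencies. The following is a minimal sketch of that arithmetic using only the numbers in the table; the names `LATENCY_MS` and `exit1_saving` are illustrative and not part of the repository.

```python
# Latencies in ms copied from the STM32H743 table: (Exit 1, full model).
LATENCY_MS = {
    "tiny": (1.77, 6.05),
    "default": (4.11, 12.85),
    "large": (11.71, 32.51),
}


def exit1_saving(exit1_ms: float, full_ms: float) -> float:
    """Fraction of full-model latency avoided when inference stops at Exit 1."""
    return 1.0 - exit1_ms / full_ms


for tier, (e1, full) in LATENCY_MS.items():
    print(f"{tier:8s} best-case saving: {exit1_saving(e1, full):.1%}")
```

For these three rows the best-case saving is roughly 64–71%. The realized saving depends on how many samples actually exit early at the chosen threshold.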

Comparison with Deployed TinyML Systems (UCI-HAR)

| System | Accuracy | Flash | Acc/KB | Scenario |
|---|---|---|---|---|
| EfficientHAR-Lite (tiny, full) | 94.4% | 65 KB | 1.45 | Full model |
| EfficientHAR-Lite (default, full) | 96.1% | 117 KB | 0.82 | Full model |
| EfficientHAR-Lite (default, E1) | 92.4% | 40 KB | 2.31 | Exit-1 only |
| DeepConvLSTM (quant.) | 98.2% | 137 KB | 0.72 | Full model |
| MHCNLS-HAR | 95.7% | ~600 KB | 0.16 | Full model |
| Efficient TinyML | 92.5% | 320 KB | 0.29 | Full model |

Acc/KB = accuracy (%) divided by INT8 Flash size (KB). Full-model entries pair single-exit accuracy with total INT8 Flash. Exit-1 entries pair Exit 1 (E1) accuracy with the Flash of the E1 sub-graph.
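The Acc/KB column can be reproduced directly from the accuracy and Flash columns. This short sketch recomputes it from the table values; `SYSTEMS` and `acc_per_kb` are illustrative names, and the MHCNLS-HAR Flash figure is the approximate 600 KB given in the table.

```python
# (accuracy in %, INT8 Flash in KB) copied from the comparison table.
SYSTEMS = {
    "EfficientHAR-Lite (tiny, full)": (94.4, 65),
    "EfficientHAR-Lite (default, full)": (96.1, 117),
    "EfficientHAR-Lite (default, E1)": (92.4, 40),
    "DeepConvLSTM (quant.)": (98.2, 137),
    "MHCNLS-HAR": (95.7, 600),  # Flash is approximate in the source table
    "Efficient TinyML": (92.5, 320),
}


def acc_per_kb(accuracy_pct: float, flash_kb: float) -> float:
    """Accuracy percentage points per kilobyte of INT8 Flash."""
    return accuracy_pct / flash_kb


for name, (acc, flash) in SYSTEMS.items():
    print(f"{name:36s} {acc_per_kb(acc, flash):.2f}")
```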

Quick Start

Installation

git clone https://github.com/oezeb/efficienthar-lite.git
cd efficienthar-lite
pip install -r requirements.txt

Train a Model

# Default tier, UCI-HAR, single-exit, no attention (recommended)
python train.py --dataset uci-har --config default --augment --quantize

# Tiny tier with multi-exit
python train.py --dataset uci-har --config tiny --multi-exit --augment --quantize

# Large tier on PAMAP2
python train.py --dataset pamap2 --config large --multi-exit --augment --quantize
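
For readers unfamiliar with what INT8 post-training quantization does to a tensor, the sketch below shows the generic affine scheme (real value ≈ scale × (q − zero_point)) commonly used for 8-bit inference. It is not the repository's quantization code, which this README does not describe; `quant_params`, `quantize`, and `dequantize` are hypothetical helper names.

```python
def quant_params(x_min: float, x_max: float) -> tuple[float, int]:
    """Derive (scale, zero_point) mapping [x_min, x_max] onto int8 [-128, 127]."""
    # The representable range must include 0 so that zero maps to an exact integer.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / 255.0 or 1.0  # fall back to 1.0 for an all-zero tensor
    zero_point = round(-128 - x_min / scale)
    return scale, zero_point


def quantize(x: float, scale: float, zero_point: int) -> int:
    """Map a real value to int8, clamping to the valid range."""
    return max(-128, min(127, round(x / scale) + zero_point))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Recover the approximate real value from its int8 code."""
    return scale * (q - zero_point)


scale, zp = quant_params(-1.0, 3.0)
q = quantize(1.234, scale, zp)
print(q, dequantize(q, scale, zp))  # the round-trip error is at most scale / 2 here
```

Each weight then occupies one byte instead of four, which is why INT8 Flash size tracks the parameter count so closely.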

Analyze Exit-Rate Distributions

After training multi-exit models, analyze confidence-based early termination:

# Generate exit distributions and τ-sensitivity analysis
python generate_exit_distributions.py --outputs-dir outputs --seed 1 --attention none

This produces:

  • exit_distributions.csv — Full τ sweep (0.50–0.99) for all configurations
  • exit_distributions_tau08.csv — Summary at τ=0.8 (recommended default)
  • Printed LaTeX table for paper inclusion
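
To make the role of τ concrete, here is a minimal sketch of confidence-based early termination. It assumes the confidence score is the maximum softmax probability at each exit, which is a common choice but is not confirmed by this README; `early_exit_predict` is a hypothetical name. In a real deployment each exit would be computed lazily so that later blocks are skipped; here the logits are passed in precomputed for simplicity.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(exit_logits: Sequence[Sequence[float]], tau: float = 0.8):
    """Return (exit_index, predicted_class, confidence); exit_index is 1-based.

    Stops at the first exit whose top probability reaches tau and
    falls back to the final exit when no earlier exit is confident enough.
    """
    if not exit_logits:
        raise ValueError("exit_logits must not be empty")
    for i, logits in enumerate(exit_logits, start=1):
        probs = softmax(logits)
        conf = max(probs)
        if conf >= tau or i == len(exit_logits):
            return i, probs.index(conf), conf


# Three exits, three classes: Exit 1 is unsure, Exit 2 is confident.
sample = [[0.2, 0.1, 0.0], [3.0, 0.1, 0.0], [5.0, 0.0, 0.0]]
print(early_exit_predict(sample, tau=0.8))   # stops at Exit 2
print(early_exit_predict(sample, tau=0.99))  # runs through to Exit 3
```

Raising τ pushes more samples to later exits, trading latency for accuracy, which is the trade-off the τ sweep quantifies.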

Model Configurations

| Tier | Filters | Kernels | Parameters | Target MCU |
|---|---|---|---|---|
| Tiny | [24, 48, 96] | [3, 5, 5] | 30–52K | 128 KB+ Flash |
| Default | [40, 80, 160] | [5, 5, 5] | 80–108K | 256 KB+ Flash |
| Large | [80, 160, 240] | [5, 7, 7] | 234–305K | 512 KB+ Flash |
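
As a rough guide to choosing a tier, the sketch below encodes the table and picks the largest tier whose stated Flash target fits a given budget. The `Tier` dataclass and helper names are illustrative, not repository code. The weight-size estimate assumes one byte per INT8 parameter and ignores graph metadata and runtime code, so measured Flash will be somewhat higher.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    filters: tuple
    kernels: tuple
    max_params: int    # upper end of the parameter range in the table
    min_flash_kb: int  # "Target MCU" column


TIERS = (
    Tier("tiny", (24, 48, 96), (3, 5, 5), 52_000, 128),
    Tier("default", (40, 80, 160), (5, 5, 5), 108_000, 256),
    Tier("large", (80, 160, 240), (5, 7, 7), 305_000, 512),
)


def largest_tier_for(flash_kb: int) -> Optional[Tier]:
    """Largest tier whose Flash target fits the budget, or None if none fits."""
    fitting = [t for t in TIERS if t.min_flash_kb <= flash_kb]
    return max(fitting, key=lambda t: t.max_params) if fitting else None


def int8_weight_kb(n_params: int) -> float:
    """Approximate weight storage in KB at one byte per parameter."""
    return n_params / 1024


tier = largest_tier_for(256)
print(tier.name, f"~{int8_weight_kb(tier.max_params):.0f} KB of weights")
```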

Benchmark on STM32 Hardware

python benchmark.py --model outputs/<experiment>/model_int8.tflite --target stm32h7

Requires an ST Edge AI Developer Cloud account for on-device benchmarking.

Architecture

Input (128 × C) → [ResBlock1] → [ResBlock2] → [ResBlock3] → GAP → Dense → Softmax
                       ↓              ↓              ↓
                    Exit 1          Exit 2        Exit 3  (multi-exit mode)

Each residual block:

  • Two 1D convolutions (standard for Block 1, depthwise separable for Blocks 2–3)
  • Batch normalization + ReLU after each convolution
  • 1×1 projection shortcut for channel alignment
  • Max pooling (factor 2)
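
Two consequences of this block design can be checked with simple arithmetic: how the sequence length shrinks through the network, and how much a depthwise separable convolution saves over a standard one. The sketch below assumes "same" padding and stride-1 convolutions (so only the pooling changes the length) and a single bias vector per convolution; these are assumptions, and the function names are illustrative rather than taken from the repository.

```python
def trace_shapes(length: int, filters: list[int], pool: int = 2) -> list[tuple[int, int]]:
    """(time steps, channels) after each residual block, halving length per block."""
    shapes = []
    for f in filters:
        length //= pool
        shapes.append((length, f))
    return shapes


def conv1d_params(k: int, c_in: int, c_out: int) -> int:
    """Standard 1D convolution: one k-tap filter per (input, output) channel pair."""
    return k * c_in * c_out + c_out


def separable_conv1d_params(k: int, c_in: int, c_out: int) -> int:
    """Depthwise (k taps per input channel) followed by a 1x1 pointwise mix."""
    return k * c_in + c_in * c_out + c_out


# Default tier: filters [40, 80, 160], 128-sample input window.
print(trace_shapes(128, [40, 80, 160]))
# Kernel size 5, 80 -> 160 channels (one convolution in Block 3 of the Default tier).
print(conv1d_params(5, 80, 160), separable_conv1d_params(5, 80, 160))
```

Under these assumptions the separable form needs roughly a fifth of the weights of the standard convolution for that layer, which is consistent with using it in the wider Blocks 2–3.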

Attention Modules

Four variants are selectable via --attention {none,channel,temporal,dual}.

Recommendation: Use --attention none (the default). A 360-experiment ablation with 5-seed statistical validation found no statistically significant effect of attention type on HAR accuracy: only 1 of 18 conditions reached p < 0.05, and the median p-value was 0.76. The no-attention configuration uses 9–21% fewer parameters at equivalent accuracy.

Datasets

The framework supports five IMU-based HAR datasets:

| Dataset | Activities | Channels | Train Samples |
|---|---|---|---|
| UCI-HAR | 6 | 6 | 7,352 |
| MHEALTH | 12 | 6 | ~4,200 |
| PAMAP2 | 12 | 18 | ~11,400 |
| WISDM | 6 | 3 | ~25,000 |
| OPPORTUNITY | 18 | 113 | ~57,000 |

Datasets are downloaded automatically on first use.
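
The model input is a 128 × C window, so continuous recordings have to be segmented before training. The sketch below shows one common way to do that with NumPy. The 64-sample step (50% overlap) matches the published UCI-HAR preprocessing, but whether the repository's loaders use the same step for every dataset is an assumption, and `sliding_windows` is a hypothetical helper.

```python
import numpy as np


def sliding_windows(signal: np.ndarray, window: int = 128, step: int = 64) -> np.ndarray:
    """Split a (T, C) recording into (N, window, C) windows, dropping the tail."""
    n = (signal.shape[0] - window) // step + 1
    if n <= 0:
        # Recording is shorter than one window.
        return np.empty((0, window, signal.shape[1]), dtype=signal.dtype)
    return np.stack([signal[i * step : i * step + window] for i in range(n)])


# A fake 320-sample recording with 6 IMU channels.
recording = np.arange(320 * 6, dtype=np.float32).reshape(320, 6)
windows = sliding_windows(recording)
print(windows.shape)  # (number of windows, 128, 6)
```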

Tier-Scaled Training

Regularization scales automatically with model capacity:

| Parameter | Tiny | Default | Large |
|---|---|---|---|
| Epochs | 50 | 100 | 150 |
| AdamW weight decay | 10⁻⁴ | 5×10⁻⁴ | 10⁻³ |
| Mixup α | 0.0 | 0.2 | 0.3 |
| Label smoothing | 0.0 | 0.1 | 0.1 |
| Data augmentation | | | |
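
The sketch below shows how the mixup and label-smoothing values in the table would typically be applied to a batch, using the standard formulations of both techniques. It is illustrative only: `TIER_REG`, `smooth_labels`, and `mixup` are hypothetical names, and the repository's training code may apply these steps differently.

```python
import numpy as np

# Values copied from the table above.
TIER_REG = {
    "tiny": {"epochs": 50, "weight_decay": 1e-4, "mixup_alpha": 0.0, "label_smoothing": 0.0},
    "default": {"epochs": 100, "weight_decay": 5e-4, "mixup_alpha": 0.2, "label_smoothing": 0.1},
    "large": {"epochs": 150, "weight_decay": 1e-3, "mixup_alpha": 0.3, "label_smoothing": 0.1},
}


def smooth_labels(one_hot: np.ndarray, eps: float) -> np.ndarray:
    """Move eps of the probability mass from the true class to a uniform prior."""
    k = one_hot.shape[-1]
    return one_hot * (1.0 - eps) + eps / k


def mixup(x: np.ndarray, y: np.ndarray, alpha: float, rng: np.random.Generator):
    """Blend each sample with a randomly chosen partner; alpha <= 0 disables mixup."""
    if alpha <= 0.0:
        return x, y
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(x))
    return lam * x + (1.0 - lam) * x[perm], lam * y + (1.0 - lam) * y[perm]


rng = np.random.default_rng(0)
cfg = TIER_REG["default"]
x = rng.normal(size=(4, 128, 6))  # batch of 4 windows, 6 channels
y = np.eye(6)[[0, 1, 2, 3]]       # one-hot labels for 6 activities
y = smooth_labels(y, cfg["label_smoothing"])
x_mix, y_mix = mixup(x, y, cfg["mixup_alpha"], rng)
print(x_mix.shape, y_mix.sum(axis=1))  # each label row still sums to 1
```

With the Tiny settings (α = 0.0, smoothing 0.0) both functions leave the batch unchanged, which matches the table's intent of applying lighter regularization to the smallest model.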

Project Structure

efficienthar-lite/
├── train.py                        # Main training script
├── benchmark.py                    # MCU deployment benchmarking
├── generate_exit_distributions.py  # Exit-rate and τ-sensitivity analysis
├── requirements.txt
├── src/
│   ├── data/                       # Dataset loaders and augmentation
│   │   ├── uci_har.py
│   │   ├── mhealth.py
│   │   ├── pamap2.py
│   │   ├── wisdm.py
│   │   ├── opportunity.py
│   │   └── augmentation.py
│   ├── models/                     # Architecture and training
│   │   ├── liteattcnn.py           # LiteAttCNN architecture
│   │   ├── attention.py            # Attention modules
│   │   ├── training.py             # Tier-scaled training configs
│   │   ├── quantization.py         # INT8 quantization pipeline
│   │   └── qat.py                  # Quantization-aware training
│   └── deployment/                 # ST Edge AI integration
│       ├── benchmark.py
│       └── stm32ai_dc/             # ST Edge AI Developer Cloud client
└── LICENSE

Citation

If you use this framework in your research, please cite:

@article{ouedraogo2026attention,
  author    = {Ouedraogo, Ezekiel B. and Wang, Xingfu and Xu, Xiaohua and Ugwu, Emmanuel U.},
  title     = {Attention Mechanisms Are Statistically Insignificant for IMU-Based
               Human Activity Recognition on Microcontrollers:
               A 360-Experiment Empirical Study},
  journal   = {IEEE Sensors Journal},
  year      = {2026},
  note      = {Submitted}
}

@article{ouedraogo2026efficienthar,
  author    = {Ouedraogo, Ezekiel B. and Xu, Xiaohua and Wang, Xingfu},
  title     = {EfficientHAR-Lite: Closing the Accuracy Gap Between Cloud and
               MCU-Deployable Human Activity Recognition via
               Architecture-Training Co-Design},
  journal   = {ACM Transactions on Internet of Things},
  year      = {2026},
  note      = {Submitted}
}

License

MIT License. See LICENSE for details.
