EfficientHAR-Lite

Lightweight Human Activity Recognition for Cortex-M Microcontrollers

EfficientHAR-Lite is an end-to-end framework for deploying IMU-based Human Activity Recognition (HAR) models on resource-constrained microcontrollers (MCUs). It achieves near state-of-the-art accuracy on standard benchmarks (for example, 94.4–96.4% mean accuracy on UCI-HAR across its three tiers) while fitting within the Flash and RAM constraints of Cortex-M class processors.

Key Features

LiteAttCNN Architecture — Residual 1D-CNN with dual convolutions per block, depthwise separable convolutions, and configurable attention modules
Tier-Scaled Training — Automatic regularization scaling (weight decay, mixup, label smoothing, dropout) matched to model capacity
Adaptive Multi-Exit Inference — Confidence-based early termination saving 60–71% inference latency
INT8 Quantization Pipeline — Full post-training quantization with on-device benchmarking via ST Edge AI Developer Cloud
Three Model Tiers — Tiny (30–50K params), Default (80–108K), Large (234–305K) targeting different MCU memory budgets

Results

Accuracy (5-seed mean ± std, no-attention single-exit)

Dataset	Tiny	Default	Large
UCI-HAR (6 activities)	94.4 ± 0.5%	96.1 ± 0.2%	96.4 ± 0.5%
MHEALTH (12 activities)	82.8 ± 5.9%	87.8 ± 1.5%	86.9 ± 2.3%
PAMAP2 (12 activities)	80.0 ± 2.0%	84.7 ± 0.8%	85.7 ± 1.0%

No-attention, single-exit configuration (recommended default).

On-Device Benchmarks — STM32H743 @ 480 MHz (INT8)

Tier	Flash	RAM	Latency (Exit 1)	Latency (Full)
Tiny	39–84 KB	17–21 KB	1.77 ms	6.05 ms
Default	48–139 KB	25–31 KB	4.11 ms	12.85 ms
Large	80–308 KB	41–51 KB	11.71 ms	32.51 ms

Measured on the UCI-HAR dataset. Flash and RAM ranges span from the Exit 1 sub-graph (minimum) to the full single-exit model.
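
As a worked check on the table above, the best-case latency reduction from stopping at Exit 1 can be computed directly from the two latency columns. This is only a sketch: the names `LATENCY_MS` and `exit1_saving` are illustrative and not part of the repository, and the figures assume every window exits at Exit 1, so real savings depend on the exit-rate distribution.

```python
# Latencies in ms copied from the STM32H743 table (UCI-HAR, INT8).
# Each entry is (Exit 1 latency, full-model latency).
LATENCY_MS = {
    "tiny": (1.77, 6.05),
    "default": (4.11, 12.85),
    "large": (11.71, 32.51),
}


def exit1_saving(exit1_ms, full_ms):
    """Fraction of full-model latency saved when inference stops at Exit 1."""
    return 1.0 - exit1_ms / full_ms


savings = {tier: exit1_saving(*ms) for tier, ms in LATENCY_MS.items()}

for tier, saving in savings.items():
    print(f"{tier:8s} {saving:.1%}")
# tiny     70.7%
# default  68.0%
# large    64.0%
```

These per-tier values fall inside the 60–71% range quoted under Key Features.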

Comparison with Deployed TinyML Systems (UCI-HAR)

System	Accuracy	Flash	Acc/KB	Scenario
EfficientHAR-Lite (tiny, full)	94.4%	65 KB	1.45	Full model
EfficientHAR-Lite (default, full)	96.1%	117 KB	0.82	Full model
EfficientHAR-Lite (default, E1)	92.4%	40 KB	2.31	Exit-1 only
DeepConvLSTM (quant.)	98.2%	137 KB	0.72	Full model
MHCNLS-HAR	95.7%	~600 KB	0.16	Full model
Efficient TinyML	92.5%	320 KB	0.29	Full model

Acc/KB = accuracy / INT8 Flash. Full-model entries pair single-exit accuracy with total INT8 Flash. Exit-1 entries pair E1 accuracy with E1 sub-graph Flash.
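
The Acc/KB column can be reproduced from the accuracy and Flash columns. The snippet below is a small arithmetic sketch; `SYSTEMS` and `acc_per_kb` are illustrative names, and the MHCNLS-HAR Flash figure is the approximate value (~600 KB) given in the table.

```python
# (system, accuracy in %, INT8 Flash in KB) copied from the comparison table.
SYSTEMS = [
    ("EfficientHAR-Lite (tiny, full)", 94.4, 65),
    ("EfficientHAR-Lite (default, full)", 96.1, 117),
    ("EfficientHAR-Lite (default, E1)", 92.4, 40),
    ("DeepConvLSTM (quant.)", 98.2, 137),
    ("MHCNLS-HAR", 95.7, 600),  # Flash is approximate (~600 KB)
    ("Efficient TinyML", 92.5, 320),
]


def acc_per_kb(accuracy_pct, flash_kb):
    """Accuracy (percentage points) per KB of INT8 Flash."""
    return accuracy_pct / flash_kb


ratios = {name: acc_per_kb(acc, flash) for name, acc, flash in SYSTEMS}

for name, ratio in ratios.items():
    print(f"{name:36s} {ratio:.2f}")
```

Because the metric divides by Flash, a small sub-graph such as the Exit-1 model can score highest even though its absolute accuracy is lower than the full model's.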

Quick Start

Installation

git clone https://github.com/oezeb/efficienthar-lite.git
cd efficienthar-lite
pip install -r requirements.txt

Train a Model

# Default tier, UCI-HAR, single-exit, no attention (recommended)
python train.py --dataset uci-har --config default --augment --quantize

# Tiny tier with multi-exit
python train.py --dataset uci-har --config tiny --multi-exit --augment --quantize

# Large tier on PAMAP2
python train.py --dataset pamap2 --config large --multi-exit --augment --quantize

Analyze Exit-Rate Distributions

After training multi-exit models, analyze confidence-based early termination:

# Generate exit distributions and τ-sensitivity analysis
python generate_exit_distributions.py --outputs-dir outputs --seed 1 --attention none

This produces:

exit_distributions.csv — Full τ sweep (0.50–0.99) for all configurations
exit_distributions_tau08.csv — Summary at τ=0.8 (recommended default)
Printed LaTeX table for paper inclusion
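
To make the role of τ concrete, here is a minimal sketch of confidence-based early termination. It assumes that confidence is the maximum softmax probability at each exit, which this README does not spell out, and it takes precomputed logits for every exit purely for illustration (on a device, later blocks would only run if an earlier exit is not confident enough). The function names are hypothetical and do not come from the repository.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(exit_logits, tau=0.8):
    """Return (exit_index, predicted_class, confidence).

    exit_logits holds one list of logits per exit, ordered from Exit 1 to
    the final exit. Inference stops at the first exit whose top softmax
    probability reaches tau; the final exit always answers.
    """
    if not exit_logits:
        raise ValueError("at least one exit is required")
    last = len(exit_logits)
    for idx, logits in enumerate(exit_logits, start=1):
        probs = softmax(logits)
        confidence = max(probs)
        if confidence >= tau or idx == last:
            return idx, probs.index(confidence), confidence


# Toy logits for three exits on a 3-class problem.
toy = [[2.0, 1.5, 0.1], [4.0, 0.5, 0.1], [6.0, 0.0, 0.0]]
exit_idx, label, conf = early_exit_predict(toy, tau=0.8)
print(exit_idx, label, round(conf, 3))  # stops at Exit 2, class 0
```

Lowering τ sends more windows out at Exit 1 (faster, less accurate), while raising it pushes more windows to the final exit, which is the trade-off the τ sweep quantifies.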

Model Configurations

Tier	Filters	Kernels	Parameters	Target MCU
Tiny	[24, 48, 96]	[3, 5, 5]	30–52K	128 KB+ Flash
Default	[40, 80, 160]	[5, 5, 5]	80–108K	256 KB+ Flash
Large	[80, 160, 240]	[5, 7, 7]	234–305K	512 KB+ Flash
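
One way to read the Target MCU column is as a minimum Flash budget per tier. The helper below is a hypothetical sketch (not part of the repository) that picks the largest tier whose stated minimum fits a given Flash size; it ignores RAM and any Flash needed by application code.

```python
# (tier, minimum Flash in KB) taken from the Target MCU column above,
# ordered from smallest to largest model.
TIER_MIN_FLASH_KB = [("tiny", 128), ("default", 256), ("large", 512)]


def largest_tier_for(flash_kb):
    """Return the largest tier whose minimum Flash fits, or None if none do."""
    fitting = [name for name, min_kb in TIER_MIN_FLASH_KB if flash_kb >= min_kb]
    return fitting[-1] if fitting else None


for budget in (64, 128, 256, 1024):
    print(budget, "KB ->", largest_tier_for(budget))
```

The result maps directly onto the --config value passed to train.py.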

Benchmark on STM32 Hardware

python benchmark.py --model outputs/<experiment>/model_int8.tflite --target stm32h7

Requires an ST Edge AI Developer Cloud account for on-device benchmarking.

Architecture

Input (128 × C) → [ResBlock1] → [ResBlock2] → [ResBlock3] → GAP → Dense → Softmax
                       ↓              ↓              ↓
                    Exit 1          Exit 2        Exit 3  (multi-exit mode)

Each residual block:

Two 1D convolutions (standard for Block 1, depthwise separable for Blocks 2–3)
Batch normalization + ReLU after each convolution
1×1 projection shortcut for channel alignment
Max pooling (factor 2)
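
The tensor shapes through the backbone follow from the block description and the filter counts listed under Model Configurations. The sketch below traces them under the assumption that the convolutions use "same" padding, so only the factor-2 max pooling changes the temporal length; `trace_shapes` is an illustrative helper, not a function from the repository.

```python
# Filters per residual block, from the Model Configurations table.
FILTERS = {
    "tiny": [24, 48, 96],
    "default": [40, 80, 160],
    "large": [80, 160, 240],
}


def trace_shapes(tier, window=128, channels=6, pool=2):
    """Return (time_steps, channels) at the input and after each block.

    Assumes 'same'-padded convolutions, so each block only halves the
    temporal length through its max pooling layer.
    """
    shapes = [(window, channels)]
    length = window
    for filters in FILTERS[tier]:
        length //= pool
        shapes.append((length, filters))
    return shapes


# UCI-HAR has 6 input channels (see Datasets).
print(trace_shapes("default"))
# [(128, 6), (64, 40), (32, 80), (16, 160)]
```

Under these assumptions, global average pooling on the Default tier collapses the final 16 × 160 feature map to a 160-dimensional vector before the dense classifier.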

Attention Modules

The --attention flag selects one of four variants: none, channel, temporal, or dual.

Recommendation: Use --attention none (the default). A 360-experiment ablation with 5-seed statistical validation found no statistically significant effect of attention type on HAR accuracy: only 1 of 18 conditions reached p < 0.05, and the median p was 0.76. The no-attention configuration uses 9–21% fewer parameters at equivalent accuracy.

Datasets

The framework supports five IMU-based HAR datasets:

Dataset	Activities	Channels	Train Samples
UCI-HAR	6	6	7,352
MHEALTH	12	6	~4,200
PAMAP2	12	18	~11,400
WISDM	6	3	~25,000
OPPORTUNITY	18	113	~57,000

Datasets are downloaded automatically on first use.

Tier-Scaled Training

Regularization scales automatically with model capacity:

Parameter	Tiny	Default	Large
Epochs	50	100	150
AdamW weight decay	10⁻⁴	5×10⁻⁴	10⁻³
Mixup α	0.0	0.2	0.3
Label smoothing	0.0	0.1	0.1
Data augmentation	3×	4×	6×
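
The table can be expressed as a per-tier configuration, with the two label-level regularizers written out. This is a sketch under stated assumptions rather than the code in src/models/training.py: `TierConfig`, `smooth_labels`, and `mixup_pair` are hypothetical names, label smoothing uses the common uniform formulation, mixup draws λ from Beta(α, α), and α = 0 is treated as "mixup disabled".

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TierConfig:
    epochs: int
    weight_decay: float      # AdamW weight decay
    mixup_alpha: float       # 0.0 disables mixup
    label_smoothing: float
    augment_factor: int      # the "3x / 4x / 6x" augmentation multiplier


# Values copied from the table above.
TIER_CONFIGS = {
    "tiny": TierConfig(50, 1e-4, 0.0, 0.0, 3),
    "default": TierConfig(100, 5e-4, 0.2, 0.1, 4),
    "large": TierConfig(150, 1e-3, 0.3, 0.1, 6),
}


def smooth_labels(one_hot, eps):
    """Uniform label smoothing: y * (1 - eps) + eps / num_classes."""
    k = len(one_hot)
    return [y * (1.0 - eps) + eps / k for y in one_hot]


def mixup_pair(x_a, x_b, y_a, y_b, alpha, rng=random):
    """Blend two samples and their labels with lambda ~ Beta(alpha, alpha)."""
    if alpha <= 0.0:
        return list(x_a), list(y_a)
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


cfg = TIER_CONFIGS["default"]
print(smooth_labels([1, 0, 0, 0, 0, 0], cfg.label_smoothing))
```

With the Default settings on a 6-class problem, the true class target becomes 0.9 + 0.1/6 and every other class receives 0.1/6, so the targets still sum to 1.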

Project Structure

efficienthar-lite/
├── train.py                        # Main training script
├── benchmark.py                    # MCU deployment benchmarking
├── generate_exit_distributions.py  # Exit-rate and τ-sensitivity analysis
├── requirements.txt
├── src/
│   ├── data/                       # Dataset loaders and augmentation
│   │   ├── uci_har.py
│   │   ├── mhealth.py
│   │   ├── pamap2.py
│   │   ├── wisdm.py
│   │   ├── opportunity.py
│   │   └── augmentation.py
│   ├── models/                     # Architecture and training
│   │   ├── liteattcnn.py           # LiteAttCNN architecture
│   │   ├── attention.py            # Attention modules
│   │   ├── training.py             # Tier-scaled training configs
│   │   ├── quantization.py         # INT8 quantization pipeline
│   │   └── qat.py                  # Quantization-aware training
│   └── deployment/                 # ST Edge AI integration
│       ├── benchmark.py
│       └── stm32ai_dc/             # ST Edge AI Developer Cloud client
└── LICENSE

Citation

If you use this framework in your research, please cite:

@article{ouedraogo2026attention,
  author    = {Ouedraogo, Ezekiel B. and Wang, Xingfu and Xu, Xiaohua and Ugwu, Emmanuel U.},
  title     = {Attention Mechanisms Are Statistically Insignificant for IMU-Based
               Human Activity Recognition on Microcontrollers:
               A 360-Experiment Empirical Study},
  journal   = {IEEE Sensors Journal},
  year      = {2026},
  note      = {Submitted}
}

@article{ouedraogo2026efficienthar,
  author    = {Ouedraogo, Ezekiel B. and Xu, Xiaohua and Wang, Xingfu},
  title     = {EfficientHAR-Lite: Closing the Accuracy Gap Between Cloud and
               MCU-Deployable Human Activity Recognition via
               Architecture-Training Co-Design},
  journal   = {ACM Transactions on Internet of Things},
  year      = {2026},
  note      = {Submitted}
}

License

MIT License. See LICENSE for details.
