📖 HTMA-CL: Efficient Hierarchical Tokenization with Multiscale Attention for Compressed Learning

🔧 Dependencies and Installation

Python >= 3.8 (Anaconda or Miniconda recommended)
PyTorch >= 1.12
timm >= 0.9.16
OpenCV >= 4.8.0
MMSegmentation >= 1.2.2 (for the semantic segmentation task)
At least two RTX3090 GPUs are required.

Installation

Clone the repo

git clone https://github.com/acrlife/HTMA-CL.git
cd HTMA-CL

Install dependent packages

conda install pytorch=1.12.0 torchvision==0.13.0 cudatoolkit=11.3 -c pytorch -y
pip install opencv-python
pip install timm==0.9.16
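
After installation, it can help to confirm that the environment meets the minimum versions listed above. The following is a minimal sketch, not part of this repository: the helper names (`parse_version`, `is_older`, `check_requirements`) are hypothetical, and the distribution names (for example `opencv-python`) are assumed to match what pip installed. A conda-installed OpenCV may register under a different name.

```python
from importlib import metadata

# Minimum versions copied from the dependency list above. The distribution
# names are an assumption and may differ for conda-installed packages.
REQUIREMENTS = {
    "torch": "1.12",
    "timm": "0.9.16",
    "opencv-python": "4.8.0",
}


def parse_version(text):
    """Turn '1.12.0+cu113' into (1, 12, 0); non-numeric suffixes are ignored."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_older(installed, minimum):
    """Compare two version strings after padding them to the same length."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def check_requirements(requirements, get_version=metadata.version):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    for name, minimum in requirements.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed (need >= {minimum})")
            continue
        if is_older(installed, minimum):
            problems.append(f"{name} {installed} is older than {minimum}")
    return problems


if __name__ == "__main__":
    for line in check_requirements(REQUIREMENTS) or ["All listed packages look fine."]:
        print(line)
```

The `get_version` parameter exists only so the check can be exercised without the packages installed; in normal use the default `importlib.metadata.version` lookup is enough.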

Training for Classification

Prepare the ImageNet1K training data.
Download the pre-trained checkpoints of our backbone on ImageNet1K (extraction code: a111).

Training on ImageNet with two GPUs (change --data and --transfer-model to your own paths, and adjust the commands below in the same way)

torchrun --nnodes=1 --nproc_per_node=2 train_on_imagenet.py --data '../imagenet' --model htma_14 --epochs 60 -b 80 -j 8 --blocksize 16 --rat 0.1 --transfer-learning True --transfer-model '../checkpoint/pretrained_weight.pth' --lr 1e-3 --warmup-epochs 5 --warmup-lr 1e-5 --min-lr 2e-4 --weight-decay 5e-4 --amp --img-size 384
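
To get a feel for what `--blocksize 16 --rat 0.1` implies at `--img-size 384`, the sketch below does the arithmetic under the usual block-based compressed sensing convention. It assumes that `--rat` is the sampling ratio and `--blocksize` the side length of a square, non-overlapping sampling block. Neither the flag semantics nor the rounding rule is confirmed by this README, and both function names are hypothetical.

```python
def blocks_per_image(img_size, blocksize):
    """Number of non-overlapping blocksize x blocksize blocks in a square image."""
    if img_size % blocksize != 0:
        raise ValueError("img_size must be a multiple of blocksize")
    per_side = img_size // blocksize
    return per_side * per_side


def measurements_per_block(blocksize, ratio):
    """Measurements kept per block and channel.

    Rounding to the nearest integer is an assumption; an implementation
    may truncate instead.
    """
    return max(1, round(ratio * blocksize * blocksize))


if __name__ == "__main__":
    img_size, blocksize = 384, 16
    n = blocks_per_image(img_size, blocksize)
    for ratio in (0.1, 0.05, 0.025, 0.01):
        m = measurements_per_block(blocksize, ratio)
        print(f"ratio={ratio}: {m} of {blocksize * blocksize} values per block, {n} blocks")
```

Under these assumptions, a 384 x 384 image splits into 576 blocks, and a ratio of 0.1 keeps about 26 of the 256 values in each block.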

Training on Cifar100 with two GPUs

python train_on_cifar.py --model htma_14 --dataset cifar100 --data ../data --lr 0.001 --b 128 --img-size 384 --blocksize 16 --rat 0.1 --transfer-model ../checkpoint/pretrained_weight.pth --num-gpu 2

Training on Cifar10 with two GPUs

python train_on_cifar.py --model htma_14 --dataset cifar10 --data ../data --lr 0.001 --b 128 --img-size 384 --blocksize 16 --rat 0.1 --transfer-model ../checkpoint/pretrained_weight.pth --num-gpu 2

If you want to train on one GPU, set '--num-gpu' to 1.
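
The two CIFAR training commands differ only in `--dataset`, so they can be generated from one place. The sketch below is a hypothetical convenience wrapper, not part of this repository: it reproduces the flags exactly as written above and only varies the dataset, the GPU count, and the ratio. Whether the same hyperparameters suit other ratios is an assumption.

```python
import shlex


def build_cifar_train_cmd(dataset, num_gpu=2, ratio=0.1, data_dir="../data",
                          checkpoint="../checkpoint/pretrained_weight.pth"):
    """Return the argument list for train_on_cifar.py using the flags shown above."""
    if dataset not in ("cifar10", "cifar100"):
        raise ValueError("dataset must be 'cifar10' or 'cifar100'")
    return [
        "python", "train_on_cifar.py",
        "--model", "htma_14",
        "--dataset", dataset,
        "--data", data_dir,
        "--lr", "0.001",
        "--b", "128",
        "--img-size", "384",
        "--blocksize", "16",
        "--rat", str(ratio),
        "--transfer-model", checkpoint,
        "--num-gpu", str(num_gpu),
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be inspected first.
    for name in ("cifar10", "cifar100"):
        print(shlex.join(build_cifar_train_cmd(name, num_gpu=1)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root once the printed command looks right.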

Testing for Classification

You can download the pre-trained checkpoints from our model zoo.

Testing on ImageNet with two GPUs

python val_imagenet.py --model htma_14 --data ../imagenet --img-size 384 -b 128 --blocksize 16 --rat 0.1 --eval_checkpoint ../checkpoint/imagenet1k@384_r10.pth --num-gpu 2

Testing on Cifar10 with two GPUs

python val_cifar.py --model htma_14 --img-size 384 --dataset cifar10 --data ../data --b 128 --blocksize 16 --rat 0.10 --eval_checkpoint ../checkpoint/cifar10/cifar10_384_r0.1_97.75.pth --num-gpu 2

Testing on Cifar100 with two GPUs

python val_cifar.py --model htma_14 --img-size 384 --dataset cifar100 --data ../data --b 128 --blocksize 16 --rat 0.10 --eval_checkpoint ../checkpoint/cifar100/cifar100_384_r0.1_86.68.pth --num-gpu 2

If you want to test on one GPU, set '--num-gpu' to 1.

Model Zoo

Classification

Model	Download link
Pre-trained Backbone	URL, extraction code: a111
ImageNet classification (ratio={0.1, 0.05, 0.025, 0.01})	URL, extraction code: a111
Cifar10 classification (ratio={0.25, 0.1, 0.018})	URL, extraction code: a111
Cifar100 classification (ratio={0.25, 0.1, 0.018})	URL, extraction code: a111

Acknowledgements

This project is based on TransCL (paper, code), T2T-ViT (paper, code), timm (code), ml-cvnets (code), and MMSegmentation (code). We thank the authors for their excellent work.

🎓 Citation

If you find the code helpful in your research or work, please cite the following paper:

@article{jing2026htma-cl,
author = {Jing, Yanhao and Wu, Xiangjun and You, Datao and Wang, Hui and Hu, Zhe and Kan, Haibin and Kurths, J\"{u}rgen},
title = {HTMA-CL: A Hierarchical Tokenization and Multiscale Attention Framework for Compressive Domain Multimedia Inference},
year = {2026},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
issn = {1551-6857},
url = {https://doi.org/10.1145/3820057},
doi = {10.1145/3820057},
journal = {ACM Trans. Multimedia Comput. Commun. Appl.},
}
