acrlife/HTMA-CL

📖 HTMA-CL: Efficient Hierarchical Tokenization with Multiscale Attention for Compressed Learning

[Figure: network architecture]

🔧 Dependencies and Installation

  • Python >= 3.8 (Anaconda or Miniconda recommended)
  • PyTorch >= 1.12
  • timm >= 0.9.16
  • OpenCV >= 4.8.0
  • MMSegmentation >= 1.2.2 (for the semantic segmentation task)
  • At least two RTX3090 GPUs are required.

Installation

  1. Clone repo

    git clone https://github.com/acrlife/HTMA-CL.git
    cd HTMA-CL
  2. Install dependent packages

    conda install pytorch=1.12.0 torchvision==0.13.0 cudatoolkit=11.3 -c pytorch -y
    pip install opencv-python
    pip install timm==0.9.16
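
After installing, it can help to confirm that the environment meets the minimum versions listed under Dependencies before starting a long run. The following is a minimal sketch using only the Python standard library. The helper names (`parse_version`, `meets_minimum`, `check_environment`) are hypothetical and not part of this repository, and the distribution names `torch`, `timm`, and `opencv-python` are assumed to be how the packages register themselves, which may differ for some conda builds. MMSegmentation is left out because it is only needed for segmentation.

```python
import re
from importlib import metadata

# Minimum versions copied from the Dependencies list in this README.
# Keys are the usual PyPI distribution names (an assumption for conda installs).
MINIMUMS = {"torch": "1.12", "timm": "0.9.16", "opencv-python": "4.8.0"}


def parse_version(text):
    """Return the leading numeric components, e.g. '1.12.0+cu113' -> (1, 12, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no numeric version in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Compare two version strings numerically, padding the shorter with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(minimums=MINIMUMS):
    """Map each package name to 'ok', 'too old (<version>)', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if meets_minimum(installed, minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed})"
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name}: {status}")
```

Only the leading numeric part of each version is compared, so local build suffixes such as `+cu113` are ignored.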

Training for Classification

  1. Prepare the training data of ImageNet1K
  2. Download the pre-trained checkpoints of our backbone on ImageNet1K (extraction code: a111).

Training on ImageNet with two GPUs (replace the --data and --transfer-model paths with your own, and adjust the commands below in the same way):

torchrun --nnodes=1 --nproc_per_node=2 train_on_imagenet.py --data '../imagenet' --model htma_14 --epochs 60 -b 80 -j 8 --blocksize 16 --rat 0.1 --transfer-learning True --transfer-model '../checkpoint/pretrained_weight.pth' --lr 1e-3 --warmup-epochs 5 --warmup-lr 1e-5 --min-lr 2e-4 --weight-decay 5e-4 --amp --img-size 384
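
The learning-rate flags in the command above (--lr, --warmup-epochs, --warmup-lr, --min-lr, --epochs) give the endpoints of a schedule but not its shape. As a rough illustration, the sketch below assumes a linear warmup followed by cosine decay. That shape is an assumption, since this README does not name the scheduler that train_on_imagenet.py uses, and the function `lr_at_epoch` is hypothetical.

```python
import math


def lr_at_epoch(epoch, epochs=60, base_lr=1e-3, warmup_epochs=5,
                warmup_lr=1e-5, min_lr=2e-4):
    """Learning rate at the start of `epoch` (0-based).

    Linear warmup from warmup_lr to base_lr, then cosine decay to min_lr.
    The cosine shape is an assumption; the README only gives the endpoints.
    """
    if epoch < warmup_epochs:
        return warmup_lr + (base_lr - warmup_lr) * epoch / warmup_epochs
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = (epoch - warmup_epochs) / max(1, epochs - warmup_epochs)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for epoch in (0, 1, 5, 30, 59, 60):
        print(f"epoch {epoch:2d}: lr = {lr_at_epoch(epoch):.6f}")
```

Under these assumptions the rate starts at the --warmup-lr value, reaches the --lr value once warmup ends, and approaches the --min-lr value at the final epoch.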

Training on CIFAR-100 with two GPUs

python train_on_cifar.py --model htma_14 --dataset cifar100 --data ../data --lr 0.001 --b 128 --img-size 384 --blocksize 16 --rat 0.1 --transfer-model ../checkpoint/pretrained_weight.pth --num-gpu 2

Training on CIFAR-10 with two GPUs

python train_on_cifar.py --model htma_14 --dataset cifar10 --data ../data --lr 0.001 --b 128 --img-size 384 --blocksize 16 --rat 0.1 --transfer-model ../checkpoint/pretrained_weight.pth --num-gpu 2

If you want to train on one GPU, set '--num-gpu' to 1.
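
The two CIFAR commands differ only in --dataset, and the single-GPU variant differs only in --num-gpu. For scripted runs, one way to avoid copy-paste mistakes is to build the argument list in Python. This is a minimal sketch, not part of the repository. The function `cifar_train_command` is hypothetical, and every flag and default value is copied from the commands above.

```python
import shlex


def cifar_train_command(dataset="cifar100", num_gpu=2, data="../data",
                        checkpoint="../checkpoint/pretrained_weight.pth"):
    """Build the argument list for train_on_cifar.py using the flags shown above."""
    if dataset not in ("cifar10", "cifar100"):
        raise ValueError(f"unsupported dataset: {dataset!r}")
    if num_gpu < 1:
        raise ValueError("num_gpu must be at least 1")
    return [
        "python", "train_on_cifar.py",
        "--model", "htma_14",
        "--dataset", dataset,
        "--data", data,
        "--lr", "0.001",
        "--b", "128",
        "--img-size", "384",
        "--blocksize", "16",
        "--rat", "0.1",
        "--transfer-model", checkpoint,
        "--num-gpu", str(num_gpu),
    ]


if __name__ == "__main__":
    # Print the single-GPU CIFAR-10 command as a shell-safe string.
    print(shlex.join(cifar_train_command("cifar10", num_gpu=1)))
```

The returned list can be passed directly to `subprocess.run` to launch training from the repository root.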

Testing for Classification

You can download the pre-trained checkpoints from our model zoo.

Testing on ImageNet with two GPUs

python val_imagenet.py --model htma_14 --data ../imagenet --img-size 384 -b 128 --blocksize 16 --rat 0.1 --eval_checkpoint ../checkpoint/imagenet1k@384_r10.pth --num-gpu 2

Testing on CIFAR-10 with two GPUs

python val_cifar.py --model htma_14 --img-size 384 --dataset cifar10 --data ../data --b 128 --blocksize 16 --rat 0.10 --eval_checkpoint ../checkpoint/cifar10/cifar10_384_r0.1_97.75.pth --num-gpu 2

Testing on CIFAR-100 with two GPUs

python val_cifar.py --model htma_14 --img-size 384 --dataset cifar100 --data ../data --b 128 --blocksize 16 --rat 0.10 --eval_checkpoint ../checkpoint/cifar100/cifar100_384_r0.1_86.68.pth --num-gpu 2

If you want to test on one GPU, set '--num-gpu' to 1.

Model Zoo

Classification

Model | Download link
Pre-trained backbone | URL (extraction code: a111)
ImageNet classification (ratio = {0.1, 0.05, 0.025, 0.01}) | URL (extraction code: a111)
CIFAR-10 classification (ratio = {0.25, 0.1, 0.018}) | URL (extraction code: a111)
CIFAR-100 classification (ratio = {0.25, 0.1, 0.018}) | URL (extraction code: a111)
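
To give a feel for what the listed ratios mean, the sketch below reads --rat as a sampling ratio and --blocksize as a block side length in the usual block-based compressed sensing sense. This README does not define either flag, so that reading is an assumption. The README also does not say whether the code rounds or truncates, or whether colour channels are counted, so the numbers are indicative only. Both function names are hypothetical.

```python
def measurements_per_block(ratio, blocksize=16):
    """Measurements kept per blocksize x blocksize single-channel block.

    Uses round(); the repository's actual rounding and channel handling
    are not stated in the README, so treat the result as an estimate.
    """
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    pixels = blocksize * blocksize
    return max(1, round(ratio * pixels))


def blocks_per_image(img_size=384, blocksize=16):
    """Number of non-overlapping blocks in a square img_size x img_size image."""
    if img_size % blocksize != 0:
        raise ValueError("img_size must be a multiple of blocksize")
    return (img_size // blocksize) ** 2


if __name__ == "__main__":
    print(f"blocks per 384x384 image: {blocks_per_image()}")
    for ratio in (0.25, 0.1, 0.05, 0.025, 0.018, 0.01):
        m = measurements_per_block(ratio)
        print(f"ratio {ratio}: about {m} of 256 values per block")
```

With --img-size 384 and --blocksize 16, each image splits into 24 x 24 = 576 blocks of 256 pixels, so lower ratios keep only a handful of values per block.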

Acknowledgements

This project is based on TransCL (paper, code), T2T-ViT (paper, code), timm (code), ml-cvnets (code), and MMSegmentation (code). We thank the authors for their excellent work.

🎓 Citation

If you find the code helpful in your research or work, please cite the following paper:

@article{jing2026htma-cl,
author = {Jing, Yanhao and Wu, Xiangjun and You, Datao and Wang, Hui and Hu, Zhe and Kan, Haibin and Kurths, J\"{u}rgen},
title = {HTMA-CL: A Hierarchical Tokenization and Multiscale Attention Framework for Compressive Domain Multimedia Inference},
year = {2026},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
issn = {1551-6857},
url = {https://doi.org/10.1145/3820057},
doi = {10.1145/3820057},
journal = {ACM Trans. Multimedia Comput. Commun. Appl.},
}

About

The official repo of HTMA-CL
