Code for "Text Image Super-Resolution for Improved OCR in Real-Life Scenarios using Swin Transformers"
For the initial setup (downloading weights and test data) run
bash prepare.bash

and then start the benchmark with

bash benchmark.bash

If either script fails, follow the manual steps below. Otherwise you are done :).
Download the pretrained weights for Aster, MORAN and CRNN:
Aster: https://github.com/ayumiymk/aster.pytorch
MORAN: https://github.com/Canjie-Luo/MORAN_v2
CRNN: https://github.com/meijieru/crnn.pytorch
Place them into the folder model_zoo and rename them to aster.pth.tar, moran.pth and crnn.pth.
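To confirm the weights ended up in the right place, a quick check (filenames taken from the step above) is:

```shell
# Verify the renamed recognizer weights are present in model_zoo.
mkdir -p model_zoo
for f in aster.pth.tar moran.pth crnn.pth; do
  if [ -f "model_zoo/$f" ]; then
    echo "found: model_zoo/$f"
  else
    echo "missing: model_zoo/$f"
  fi
done
```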
Download the Textzoom dataset:
https://github.com/JasonBoy1/TextZoom
Change the paths in extract_images.py to match those of the test folders you just downloaded:
test_paths = {
    'easy': 'textzoom/test/easy',      # to be changed
    'medium': 'textzoom/test/medium',  # to be changed
    'hard': 'textzoom/test/hard'       # to be changed
}

Download our pretrained models for Phase 2 and Phase 3 from here:
https://drive.google.com/drive/folders/14UggkVJH3RPQwF-B_mj0bHEtjoloP3if?usp=sharing
Place them into the model_zoo folder.
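Before running the extraction script, it can help to confirm that the TextZoom test splits resolve on disk. A standalone sketch (the paths mirror the defaults shown above; adjust them to your download location):

```python
import os

# Same default split paths as in extract_images.py; edit to match
# where you placed the TextZoom test data.
test_paths = {
    'easy': 'textzoom/test/easy',
    'medium': 'textzoom/test/medium',
    'hard': 'textzoom/test/hard',
}

# Report any split directory that does not exist yet.
missing = [name for name, path in test_paths.items()
           if not os.path.isdir(path)]
print("missing splits:", missing or "none")
```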
Run our image extraction script to extract LR and HR images:
python extract_images.py

Run the evaluation scripts for image quality and text recognition accuracy:
bash demo_image_quality.bash
bash demo_text_recognition.bash
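demo_image_quality.bash reports standard image-quality metrics. As a reference, here is a minimal PSNR implementation; this is a sketch of the usual definition, not the repository's exact evaluation code, which may also compute SSIM or crop borders:

```python
import numpy as np

def psnr(sr, hr, max_val=255.0):
    """Peak signal-to-noise ratio between an SR output and its HR target."""
    sr = sr.astype(np.float64)
    hr = hr.astype(np.float64)
    mse = np.mean((sr - hr) ** 2)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return float('inf')
    return 10.0 * np.log10(max_val ** 2 / mse)

# A flat gray text-sized image compared with itself gives infinite PSNR.
a = np.full((32, 128, 3), 128, dtype=np.uint8)
print(psnr(a, a))  # inf
```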
References
Our code is built on top of the super-resolution repository KAIR: https://github.com/cszn/KAIR
@inproceedings{liang2021swinir,
title={SwinIR: Image Restoration Using Swin Transformer},
author={Liang, Jingyun and Cao, Jiezhang and Sun, Guolei and Zhang, Kai and Van Gool, Luc and Timofte, Radu},
booktitle={IEEE International Conference on Computer Vision Workshops},
pages={1833--1844},
year={2021}
}
@article{bshi2018aster,
author = {Baoguang Shi and
Mingkun Yang and
Xinggang Wang and
Pengyuan Lyu and
Cong Yao and
Xiang Bai},
title = {ASTER: An Attentional Scene Text Recognizer with Flexible Rectification},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
pages = {1-1},
year = {2018},
}
@article{cluo2019moran,
author = {Canjie Luo and Lianwen Jin and Zenghui Sun},
title = {MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition},
journal = {Pattern Recognition},
volume = {90},
pages = {109--118},
year = {2019},
publisher = {Elsevier}
}
@article{shi2016end,
title={An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition},
author={Shi, Baoguang and Bai, Xiang and Yao, Cong},
journal={IEEE transactions on pattern analysis and machine intelligence},
volume={39},
number={11},
pages={2298--2304},
year={2016},
publisher={IEEE}
}
@inproceedings{wang2020scene,
title={Scene text image super-resolution in the wild},
author={Wang, Wenjia and Xie, Enze and Liu, Xuebo and Wang, Wenhai and Liang, Ding and Shen, Chunhua and Bai, Xiang},
booktitle={European Conference on Computer Vision},
pages={650--666},
year={2020},
organization={Springer}
}

The Aster, MORAN and CRNN models are copied from their respective GitHub repos:
Aster: https://github.com/ayumiymk/aster.pytorch
MORAN: https://github.com/Canjie-Luo/MORAN_v2
CRNN: https://github.com/meijieru/crnn.pytorch