Code for "Text Image Super-Resolution for Improved OCR in Real-Life Scenarios using Swin Transformers"
For the initial setup (downloading weights and test data) run
bash prepare.bash

and then start the benchmark with

bash benchmark.bash

If either script fails, follow the manual steps below. Otherwise you are done :).
Download the pretrained weights for Aster, MORAN and CRNN:
Aster: https://github.com/ayumiymk/aster.pytorch
MORAN: https://github.com/Canjie-Luo/MORAN_v2
CRNN: https://github.com/meijieru/crnn.pytorch
Place them into the folder model_zoo and rename them to aster.pth.tar, moran.pth and crnn.pth.
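To confirm the weights ended up in the right place, a quick check (filenames taken from the step above) is:

```shell
# Verify the renamed recognizer weights are present in model_zoo.
mkdir -p model_zoo
for f in aster.pth.tar moran.pth crnn.pth; do
  if [ -f "model_zoo/$f" ]; then
    echo "found: model_zoo/$f"
  else
    echo "missing: model_zoo/$f"
  fi
done
```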
Download the Textzoom dataset:
https://github.com/JasonBoy1/TextZoom
Change the paths in extract_images.py to match those of the test folders you just downloaded:
test_paths = {
    'easy': 'textzoom/test/easy',      # to be changed
    'medium': 'textzoom/test/medium',  # to be changed
    'hard': 'textzoom/test/hard'       # to be changed
}

Download our pretrained models for Phase 2 and Phase 3 from here:
https://drive.google.com/drive/folders/14UggkVJH3RPQwF-B_mj0bHEtjoloP3if?usp=sharing
Place them into the model_zoo folder.
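Before running the extraction script, it can help to confirm that the TextZoom test splits resolve on disk. A standalone sketch (the paths mirror the defaults shown above; adjust them to your download location):

```python
import os

# Same default split paths as in extract_images.py; edit to match
# where you placed the TextZoom test data.
test_paths = {
    'easy': 'textzoom/test/easy',
    'medium': 'textzoom/test/medium',
    'hard': 'textzoom/test/hard',
}

# Report any split directory that does not exist yet.
missing = [name for name, path in test_paths.items()
           if not os.path.isdir(path)]
print("missing splits:", missing or "none")
```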
Run our image extraction script to extract LR and HR images:
python extract_images.py

Run the evaluation scripts for image quality and text recognition accuracy:
bash demo_image_quality.bash
bash demo_text_recognition.bash
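demo_image_quality.bash reports standard image-quality metrics. As a reference, here is a minimal PSNR implementation; this is a sketch of the usual definition, not the repository's exact evaluation code, which may also compute SSIM or crop borders:

```python
import numpy as np

def psnr(sr, hr, max_val=255.0):
    """Peak signal-to-noise ratio between an SR output and its HR target."""
    sr = sr.astype(np.float64)
    hr = hr.astype(np.float64)
    mse = np.mean((sr - hr) ** 2)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return float('inf')
    return 10.0 * np.log10(max_val ** 2 / mse)

# A flat gray text-sized image compared with itself gives infinite PSNR.
a = np.full((32, 128, 3), 128, dtype=np.uint8)
print(psnr(a, a))  # inf
```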
References
Our code is built on top of the super-resolution repository KAIR: https://github.com/cszn/KAIR
@inproceedings{liang2021swinir,
title={SwinIR: Image Restoration Using Swin Transformer},
author={Liang, Jingyun and Cao, Jiezhang and Sun, Guolei and Zhang, Kai and Van Gool, Luc and Timofte, Radu},
booktitle={IEEE International Conference on Computer Vision Workshops},
pages={1833--1844},
year={2021}
}
@article{bshi2018aster,
author = {Baoguang Shi and
Mingkun Yang and
Xinggang Wang and
Pengyuan Lyu and
Cong Yao and
Xiang Bai},
title = {ASTER: An Attentional Scene Text Recognizer with Flexible Rectification},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
pages = {1-1},
year = {2018},
}
@article{cluo2019moran,
author = {Canjie Luo and Lianwen Jin and Zenghui Sun},
title = {MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition},
journal = {Pattern Recognition},
volume = {90},
pages = {109--118},
year = {2019},
publisher = {Elsevier}
}
@article{shi2016end,
title={An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition},
author={Shi, Baoguang and Bai, Xiang and Yao, Cong},
journal={IEEE transactions on pattern analysis and machine intelligence},
volume={39},
number={11},
pages={2298--2304},
year={2016},
publisher={IEEE}
}
@inproceedings{wang2020scene,
title={Scene text image super-resolution in the wild},
author={Wang, Wenjia and Xie, Enze and Liu, Xuebo and Wang, Wenhai and Liang, Ding and Shen, Chunhua and Bai, Xiang},
booktitle={European Conference on Computer Vision},
pages={650--666},
year={2020},
organization={Springer}
}

The Aster, MORAN and CRNN models are copied from their respective GitHub repos:
Aster: https://github.com/ayumiymk/aster.pytorch
MORAN: https://github.com/Canjie-Luo/MORAN_v2
CRNN: https://github.com/meijieru/crnn.pytorch