Sign Language Recognition Sentiment Enhanced

By Martina Colombari, Omayma Moussadek and Nicolò Rossi.

In this paper we explore several ways to enhance Sign Language Recognition (SLR). We built this project on top of an existing, working SLR system, which we chose for its experimental approach based on skeleton extraction; we discuss this in more detail later. To improve the prediction we work on three different solutions:

Emotion enhancement: facial expressions are often used in sign language to convey words more clearly, so words with an emotional meaning often come with the corresponding facial expression from the signer. Detecting that expression can therefore provide additional context to predict a word more accurately.
Skeleton rotation: as introduced above, the foundation of this project uses skeleton extraction to feed a Graph Convolutional Network. We therefore change the orientation of the skeleton so that it always faces the camera, with the goal of making the system resilient to changes in the signer's orientation.
Similar sign retrieval: from the high-level feature representation of the SL-GCN branch we extract an embedding space in which we can find similar videos. This similarity provides more information for the prediction and could therefore improve accuracy.

In the following paragraphs we explain how we started, how we implemented each component, and what results we obtained.
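The skeleton rotation idea described above can be illustrated with a minimal sketch. It is not the implementation used in this repo: it assumes 3D joint coordinates (x, y, z) are available for one frame, that y is the vertical axis, and that the shoulder line is a good proxy for body orientation. The function name `face_camera` and the shoulder indices are hypothetical, and the left/right sign convention depends on the camera setup.

```python
import numpy as np


def face_camera(joints, left_shoulder, right_shoulder):
    """Rotate one frame of 3D joints about the vertical (y) axis so that
    the shoulder line becomes parallel to the x axis of the camera.

    joints: array of shape (V, 3) holding x, y, z per joint (assumed layout).
    left_shoulder, right_shoulder: joint indices (hypothetical, dataset specific).
    """
    joints = np.asarray(joints, dtype=float)
    # Rotate around the midpoint of the shoulders so the body stays in place.
    center = (joints[left_shoulder] + joints[right_shoulder]) / 2.0
    centered = joints - center

    # Yaw of the shoulder line, measured in the horizontal x-z plane.
    dx, _, dz = centered[right_shoulder] - centered[left_shoulder]
    angle = np.arctan2(dz, dx)
    c, s = np.cos(angle), np.sin(angle)

    # Rotation about y that maps the direction (dx, dz) onto (+x, 0).
    rot = np.array([[c, 0.0, s],
                    [0.0, 1.0, 0.0],
                    [-s, 0.0, c]])
    return centered @ rot.T + center


if __name__ == "__main__":
    frame = np.array([[0.0, 0.0, 0.0],   # left shoulder
                      [1.0, 0.0, 1.0],   # right shoulder, turned 45 degrees
                      [0.5, 1.0, 0.5]])  # an arbitrary third joint
    print(face_camera(frame, left_shoulder=0, right_shoulder=1))
```

For a whole video the same function would be applied frame by frame, or the angle could be estimated once from the first frames and reused.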

This repo contains the official code of Skeleton Aware Multi-modal Sign Language Recognition (SAM-SLR), which ranked 1st in the CVPR 2021 Challenge: Looking at People Large Scale Signer Independent Isolated Sign Language Recognition, together with our updated version for emotion enhancement.

For more information you can read the related paper.

CVPR21 workshop paper / arXiv preprint / YouTube

original schema and results:

News

[2021/10/14] The extended SAM-SLR-v2 paper is available on arXiv.

[2021/09/22] Processed skeleton data for AUTSL dataset is released here.

[2021/08/26] Results on SLR500 and WLASL2000 datasets are reported.

[2021/06/25] Workshop presentation will be available on YouTube.

[2021/04/10] Our workshop paper has been accepted. Citation info updated.

[2021/03/24] A preprint version of our paper is released here.

[2021/03/20] Our work has been verified and announced by the organizers as the 1st place winner of the challenge!

[2021/03/15] The code is released to the public on GitHub.

[2021/03/11] Our team (smilelab2021) ranked 1st in both tracks and here are the links to the leaderboards:

RGB / RGBD

Data Preparation

Download AUTSL Dataset.

We processed the dataset into six modalities in total: skeleton, skeleton features, rgb frames, flow color, hha and flow depth.

Please put the original train, val and test videos in the data folder as follows:

    data
    ├── train
    │   ├── signer0_sample1_color.mp4
    │   ├── signer0_sample1_depth.mp4
    │   ├── signer0_sample2_color.mp4
    │   ├── signer0_sample2_depth.mp4
    │   └── ...
    ├── val
    │   └── ...
    └── test
        └── ...

Follow the data-prepare/readme.md to process the data.
Use TPose/data_process to extract wholebody pose features.
Turkish and English meanings of the class IDs can be found here.
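Before running the data preparation, the folder layout shown above can be sanity-checked with a small script. This is only a sketch: it assumes that every sample is expected to have both a `_color.mp4` and a `_depth.mp4` file, as in the tree above, and the helper name `find_unpaired` is hypothetical.

```python
from pathlib import Path


def find_unpaired(split_dir):
    """Return sample prefixes in split_dir that lack either the color
    or the depth video (file naming taken from the tree above)."""
    split_dir = Path(split_dir)
    color = {p.name[:-len("_color.mp4")] for p in split_dir.glob("*_color.mp4")}
    depth = {p.name[:-len("_depth.mp4")] for p in split_dir.glob("*_depth.mp4")}
    # Symmetric difference: samples present in only one of the two sets.
    return sorted(color ^ depth)


if __name__ == "__main__":
    for split in ("train", "val", "test"):
        missing = find_unpaired(Path("data") / split)
        print(split, "unpaired samples:", missing)
```

An empty list for each split means every color video has a matching depth video.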

Requirements and Docker Image

The code is written using Anaconda Python >= 3.6 and PyTorch 1.7 with OpenCV.

Detailed environment requirements can be found in requirement.txt in each code folder.
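As a quick check of the requirements listed above, the following sketch reports whether the interpreter is recent enough and whether the main packages can be found. It only looks the modules up and does not verify their versions; the function name `check_environment` is hypothetical, and `torch` and `cv2` are the usual import names for PyTorch and OpenCV.

```python
import sys
from importlib.util import find_spec


def check_environment(min_python=(3, 6), modules=("torch", "cv2")):
    """Report whether Python is new enough and whether each module is installed."""
    report = {"python_ok": sys.version_info >= min_python}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print("{}: {}".format(key, "ok" if ok else "missing"))
```

The exact package versions still need to be compared against the requirement.txt of each code folder.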

For convenience, we provide an NVIDIA Docker image to run our code.

Download Docker Image

Pretrained Models

We provide pretrained models for all modalities to reproduce our submitted results. Please download them and put them into the corresponding folders.

Download Pretrained Models

Usage

Reproducing the Results Submitted to CVPR21 Challenge

To test our pretrained models, please put them under the corresponding code folders and run the test code as instructed below. To ensemble the tested results and reproduce our final submission, please copy all the result .pkl files to ensemble/ and follow the instructions to ensemble our final outputs.

For a step-by-step instruction, please see reproduce.md.

Skeleton Keypoints

The skeleton modality can be trained, finetuned and tested using the code in the SL-GCN/ folder. Please follow the instructions in SL-GCN/readme.md to prepare the skeleton data as four streams (joint, bone, joint_motion, bone_motion).

Basic usage:

python main.py --config /path/to/config/file

To train, finetune and test our models, please change the config path to the corresponding config file. Detailed instructions can be found in SL-GCN/readme.md.
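The four skeleton streams mentioned in this section (joint, bone, joint_motion, bone_motion) can be sketched as follows. This is an illustration of the general idea, not the preprocessing code in SL-GCN/: it assumes joints stored as an array of shape (frames, joints, channels), bones defined as the difference between a joint and its parent, and motion defined as the difference between consecutive frames. The function name `make_streams` and the `bone_pairs` list are hypothetical.

```python
import numpy as np


def make_streams(joints, bone_pairs):
    """Derive bone and motion streams from joint coordinates.

    joints: array of shape (T, V, C), T frames, V joints, C channels.
    bone_pairs: list of (child, parent) joint indices (assumed topology).
    """
    joints = np.asarray(joints, dtype=float)

    # Bone stream: vector from the parent joint to the child joint.
    bones = np.zeros_like(joints)
    for child, parent in bone_pairs:
        bones[:, child] = joints[:, child] - joints[:, parent]

    def motion(x):
        # Frame-to-frame difference; the last frame has no successor and stays zero.
        m = np.zeros_like(x)
        m[:-1] = x[1:] - x[:-1]
        return m

    return {
        "joint": joints,
        "bone": bones,
        "joint_motion": motion(joints),
        "bone_motion": motion(bones),
    }


if __name__ == "__main__":
    demo = np.random.rand(4, 3, 2)  # 4 frames, 3 joints, 2D coordinates
    streams = make_streams(demo, bone_pairs=[(1, 0), (2, 1)])
    for name, data in streams.items():
        print(name, data.shape)
```

All four streams keep the shape of the input, so each can be fed to a separate model with the same architecture.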

Skeleton Feature

For the skeleton feature, we propose a Separable Spatial-Temporal Convolution Network (SSTCN) to capture spatio-temporal information from those features.

Please follow the instruction in SSTCN/readme.txt to prepare the data, train and test the model.

RGB Frames

The RGB frames modality can be trained, finetuned and tested using the following commands in Conv3D/ folder.

python Sign_Isolated_Conv3D_clip.py

python Sign_Isolated_Conv3D_clip_finetune.py

python Sign_Isolated_Conv3D_clip_test.py

Detailed instruction can be found in Conv3D/readme.md

Optical Flow

The RGB optical flow modality can be trained, finetuned and tested using the following commands in Conv3D/ folder.

python Sign_Isolated_Conv3D_flow_clip.py

python Sign_Isolated_Conv3D_flow_clip_funtine.py

python Sign_Isolated_Conv3D_flow_clip_test.py

Detailed instruction can be found in Conv3D/readme.md

Depth HHA

The Depth HHA modality can be trained, finetuned and tested using the following commands in Conv3D/ folder.

python Sign_Isolated_Conv3D_hha_clip_mask.py

python Sign_Isolated_Conv3D_hha_clip_mask_finetune.py

python Sign_Isolated_Conv3D_hha_clip_mask_test.py

Detailed instruction can be found in Conv3D/readme.md

Depth Flow

The Depth Flow modality can be trained, finetuned and tested using the following commands in Conv3D/ folder.

python Sign_Isolated_Conv3D_depth_flow_clip.py

python Sign_Isolated_Conv3D_depth_flow_clip_finetune.py

python Sign_Isolated_Conv3D_depth_flow_clip_test.py

Detailed instruction can be found in Conv3D/readme.md

Model Ensemble

For both the RGB and RGBD tracks, the tested results of all modalities need to be ensembled to generate the final results.

For the RGB track, we use the results from the skeleton, skeleton feature, rgb, and flow color modalities to ensemble the final results.

a. Test the model using newly trained weights or the provided pretrained weights.

b. Copy all the test results to the ensemble folder and rename them after their modality names.

c. Ensemble the SL-GCN results from the joint, bone, joint motion and bone motion streams in gcn/.
```
 python ensemble_wo_val.py; python ensemble_finetune.py
```
d. Copy test_gcn_w_val_finetune.pkl to ensemble/. Copy the RGB, TPose and optical flow results to ensemble/. Ensemble the final prediction.
```
 python ensemble_multimodal_rgb.py
```
Final predictions are saved in predictions.csv.

For the RGBD track, we use the results from the skeleton, skeleton feature, rgb, flow color, hha and flow depth modalities to ensemble the final results.

a. Copy the hha and flow depth results to the ensemble/ folder, then run:
```
python ensemble_multimodal_rgbd.py
```

To reproduce our results in the CVPR21 Challenge, we provide .pkl files to ensemble and obtain our final submitted predictions. Detailed instructions can be found in ensemble/readme.md.
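The ensembling step described in this section amounts to a late fusion of per-modality class scores. The sketch below shows one common way to do this, a weighted sum followed by an argmax. It is not the code in ensemble/: it assumes each modality's result is a dict mapping a sample name to a vector of class scores, and the function names, the weights and the CSV layout are all hypothetical.

```python
import csv

import numpy as np


def ensemble_scores(score_dicts, weights):
    """Fuse per-modality scores with a weighted sum and pick the best class.

    score_dicts: list of dicts {sample_name: class score vector} (assumed layout).
    weights: one float per modality (illustrative values, to be tuned).
    """
    fused = {}
    for name in sorted(score_dicts[0]):
        total = sum(w * np.asarray(d[name], dtype=float)
                    for d, w in zip(score_dicts, weights))
        fused[name] = int(np.argmax(total))
    return fused


def write_predictions(fused, path):
    """Write one 'sample_name,label' row per sample (assumed CSV layout)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for name, label in fused.items():
            writer.writerow([name, label])


if __name__ == "__main__":
    rgb = {"a": [0.1, 0.9], "b": [0.8, 0.2]}
    flow = {"a": [0.7, 0.3], "b": [0.6, 0.4]}
    print(ensemble_scores([rgb, flow], weights=[1.0, 0.5]))
```

If the result files do follow this layout, each dict could be loaded with `pickle.load` before being passed to `ensemble_scores`; the actual file format should be checked in ensemble/readme.md.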

License

Licensed under the Creative Commons Zero v1.0 Universal license with the following exceptions:

The code is released for academic research use only. Commercial use is prohibited.
Published versions (changed or unchanged) must include a reference to the origin of the code.

Citation

If you find this project useful in your research, please cite our paper

% SAM-SLR
@inproceedings{jiang2021skeleton,
  title={Skeleton Aware Multi-modal Sign Language Recognition},
  author={Jiang, Songyao and Sun, Bin and Wang, Lichen and Bai, Yue and Li, Kunpeng and Fu, Yun},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  year={2021}
}

% SAM-SLR-v2
@article{jiang2021sign,
  title={Sign Language Recognition via Skeleton-Aware Multi-Model Ensemble},
  author={Jiang, Songyao and Sun, Bin and Wang, Lichen and Bai, Yue and Li, Kunpeng and Fu, Yun},
  journal={arXiv preprint arXiv:2110.06161},
  year={2021}
}

Reference

https://github.com/Sun1992/SSTCN-for-SLR

https://github.com/jin-s13/COCO-WholeBody

https://github.com/open-mmlab/mmpose

https://github.com/0aqz0/SLR

https://github.com/kchengiva/DecoupleGCN-DropGraph

https://github.com/HRNet/HRNet-Human-Pose-Estimation

https://github.com/charlesCXK/Depth2HHA
