This project implements a gloss translation system using the mBART model. It is designed to translate spoken or written text into corresponding ASL gloss sequences, which are used in sign language modeling and generation systems.
```
text2gloss/
├── data/              # Input CSV file (gloss.csv)
├── models/            # mBART model loader
├── t2g_datasets/      # Custom dataset class
├── training/          # Training loop
├── evaluation/        # Evaluation using BLEU and ROUGE
├── utils/             # Configs and helpers
├── checkpoints/       # Saved models (.pkl)
├── main.py            # Entry point for training and evaluation
├── requirements.txt   # Python dependencies
└── README.md          # Project documentation
```
Make sure you have Python 3.8+. Then run:
```bash
git clone https://github.com/abdullaharifx/text2gloss.git
cd text2gloss

# (Optional) Create a virtual environment
python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows

pip install -r requirements.txt
```

Place a `gloss.csv` file inside the `data/` directory with the following columns:
| SENTENCE | GLOSSES |
|---|---|
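As a rough sketch of how such a file can be read and split (the repository's own dataset class may differ; `load_gloss_pairs`, the split ratios, and the seed below are illustrative, not the project's actual API):

```python
import csv
import random

def load_gloss_pairs(path):
    """Read (SENTENCE, GLOSSES) pairs from a gloss.csv with those two columns."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    return [(r["SENTENCE"], r["GLOSSES"]) for r in rows]

def train_val_test_split(pairs, seed=0, val=0.1, test=0.1):
    """Shuffle deterministically and split into train/val/test portions."""
    rng = random.Random(seed)
    pairs = pairs[:]          # copy so the caller's list is untouched
    rng.shuffle(pairs)
    n = len(pairs)
    n_val, n_test = int(n * val), int(n * test)
    return (pairs[n_val + n_test:],      # train
            pairs[:n_val],               # validation
            pairs[n_val:n_val + n_test]) # test
```

A fixed seed keeps the split reproducible across runs, which matters when comparing checkpoint scores.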
```bash
python main.py
```

This will:
- Load and split the dataset
- Fine-tune mBART for gloss translation
- Evaluate using BLEU-4 and ROUGE
- Save checkpoints to `checkpoints/`
The model fine-tunes mBART-large-50 with the Adafactor optimizer for 5 epochs. Evaluation scores (BLEU-4 and ROUGE-L) are printed after training.
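As a rough illustration of what the BLEU-4 score measures over gloss tokens, here is a simplified sentence-level version with add-one smoothing. The project itself uses the Hugging Face `evaluate` library, which differs in details (corpus-level aggregation, smoothing strategy), so treat this as a sketch of the idea rather than the exact metric:

```python
import math
from collections import Counter

def bleu4(reference, hypothesis):
    """Simplified sentence-level BLEU-4 over whitespace-split gloss tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    precisions = []
    for n in range(1, 5):
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        overlap = sum((hyp_ngrams & ref_ngrams).values())  # clipped n-gram matches
        total = max(sum(hyp_ngrams.values()), 1)
        precisions.append((overlap + 1) / (total + 1))     # add-one smoothing
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / 4)
    bp = min(1.0, math.exp(1 - len(ref) / max(len(hyp), 1)))  # brevity penalty
    return bp * geo_mean

print(round(bleu4("IX-1 LIKE BOOK", "IX-1 LIKE BOOK"), 3))  # → 0.841
```

Even a perfect 3-token match scores below 1.0 here because the sentence has no 4-grams and smoothing penalizes the empty bucket; real implementations handle short sentences with more refined smoothing.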
If you use this work, please consider citing the following paper:
```bibtex
@article{zuo2024spoken2sign,
  title={A Simple Baseline for Spoken Language to Sign Language Translation with 3D Avatars},
  author={Zuo, Ronglai and Wei, Fangyun and Chen, Zenggui and Mak, Brian and Yang, Jiaolong and Tong, Xin},
  journal={arXiv preprint arXiv:2401.04730},
  year={2024},
  note={Accepted at ECCV 2024},
  url={https://arxiv.org/abs/2401.04730}
}
```
- Hugging Face Transformers
- Hugging Face Evaluate (BLEU, ROUGE)
- PyTorch and its ecosystem
Feel free to open an issue or contact the maintainer: [business.abdullah.arif@gmail.com](mailto:business.abdullah.arif@gmail.com)