E-MoTChi: Emoji-Mandarin Translation via Computational Homophonic Intelligence

Overview

E-MoTChi is a tuned model for emoji-to-Traditional-Mandarin translation applications.

Key Features

Traditional Mandarin to Emoji (with the power of understanding homophonic)
Emoji to Traditional Mandarin (with the power of understanding homophonic)

Model

Huggingface Link

Dataset

Three different dataset under ./training_data:

1. 20% algo & 80% gpt
2. 50% algo & 50% gpt
3. 80% algo & 20% gpt

Two Competitor model

Get into mt5 and llama folder to read README.md

File usage

data

This contains all the data scraped from the web, including all the sentences we want to translate and characters-to-emoji.

data_processing

This directory contains all files used to generate the dataset. Including web scrapers, openai api calls and codes to generate char-to-emoji

evaluation

This directory contains scripts that is used to calculate the METEOR metric score and a script to aid human evaluation.

llama

This directory contains all the files used to train/inference the TaiwanLLM model. View llama/README.md for more details

mt5

This directory contains all the files used to train/inference the google mt5 large model. View mt5/README.md for more details

plot

This directory contains some training graphs used in our report

training dataset

This directory contain all the mixed training datasets and a testing file and scripts to generate the training datasets.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

E-MoTChi: Emoji-Mandarin Translation via Computational Homophonic Intelligence

Overview

Key Features

Model

Dataset

Two Competitor model

File usage

data

data_processing

evaluation

llama

mt5

plot

training dataset

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 137 Commits
data		data
data_processing		data_processing
evaluation		evaluation
llama		llama
mt5		mt5
plot		plot
training_data		training_data
.gitignore		.gitignore
README.md		README.md
Report.pdf		Report.pdf
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

E-MoTChi: Emoji-Mandarin Translation via Computational Homophonic Intelligence

Overview

Key Features

Model

Dataset

Two Competitor model

File usage

data

data_processing

evaluation

llama

mt5

plot

training dataset

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages