ctaguchi/LSLB

Languages Still Left Behind: Toward a Better Multilingual Machine Translation Benchmark

This repository contains the experimental code used in the EMNLP 2025 paper "Languages Still Left Behind: Toward a Better Multilingual Machine Translation Benchmark". It is organized into two main parts:

  • assessment: Data and code related to the manual re-evaluation of FLORES+.
  • jinghpaw-mt: Data and code related to the Jinghpaw machine translation experiment.

Data use

The Jinghpaw data provided in this repository, except for the FLORES+ data, is released under the CC BY-NC-SA (Creative Commons Attribution-NonCommercial-ShareAlike) license. If you use the Jinghpaw machine translation data released in this repository, please cite the following:

@book{kurabe-2020-jinghpaw-reader,
    author = {Kurabe, Keita},
    title = {Jinghpaw Reader},
    publisher = {The Research Institute for Languages and Cultures of Asia and Africa, Tokyo University of Foreign Studies},
    year = {2020}
}
@book{kurabe-2020-jinghpaw-dictionary,
    author = {Kurabe, Keita},
    title = {A Dictionary of {J}inghpaw Usage},
    publisher = {The Research Institute for Languages and Cultures of Asia and Africa, Tokyo University of Foreign Studies},
    year = {2020}
}
@book{kurabe-2020-jinghpaw-grammar,
    author = {Kurabe, Keita},
    title = {An Introduction to {J}inghpaw Grammar},
    publisher = {The Research Institute for Languages and Cultures of Asia and Africa, Tokyo University of Foreign Studies},
    year = {2020}
}
@misc{kurabe-2013-kachin-folktales,
    author = {Kurabe, Keita},
    title = {Kachin folktales told in {J}inghpaw},
    year = {2013},
    doi = {https://dx.doi.org/10.4225/72/59888e8ab2122}
}
@misc{kurabe-2017-kachin-culture-history,
    author = {Kurabe, Keita},
    title = {Kachin culture and history told in {J}inghpaw},
    year = {2017},
    doi = {https://dx.doi.org/10.26278/5fa1707c5e77c}
}

If you use the FLORES+ data, please follow the original FLORES+ license (https://huggingface.co/datasets/openlanguagedata/flores_plus) and cite it accordingly.

Citation

To be added.

Acknowledgments

This material is based upon work supported by the National Science Foundation (NSF) under grants BCS-2109709 and IIS-2137396 and by the Japan Society for the Promotion of Science (JSPS) under KAKENHI grant JP24K03887.
