This repo contains the scripts that were created for the course Machine Translation Advanced Topics.
Most of the scripts were vibe-coded with ChatGPT, with lots of testing and back-and-forth conversation.
You can find the slides of each chapter in the slides directory.
The course starts with an introduction to MT and a description of what happened before the NMT paradigm.
- GitHub: https://github.com/VincentCCL/MTAT/blob/main/notebooks/MTAT26_DataPreparation.ipynb
- Colab: Data Preparation
Translation through Python with commercial engines, and evaluation with the most common metrics.
- GitHub:
  - https://github.com/VincentCCL/MTAT/blob/main/notebooks/MTAT26_Translation%26Evaluation.ipynb
  - https://github.com/VincentCCL/MTAT/blob/main/notebooks/MTAT2026_BLEURT.ipynb
  - https://github.com/VincentCCL/MTAT/blob/main/notebooks/MTAT26_COMET.ipynb
(Note that GitHub renders these notebooks as "Invalid Notebook", but they do run in Google Colab.)
- Colab:
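To give an idea of what metric-based evaluation involves, here is a minimal sketch of corpus-level BLEU, one of the most widely used MT metrics. It is not taken from the notebooks: the function names `ngrams` and `corpus_bleu` are made up for illustration, and the sketch assumes whitespace tokenization, a single reference per segment, and no smoothing.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU in [0, 1]; illustrative only (hypothetical helper).

    Assumes whitespace tokenization and exactly one reference per hypothesis.
    """
    matches = [0] * max_n  # clipped n-gram matches, per n-gram order
    totals = [0] * max_n   # number of hypothesis n-grams, per n-gram order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tokens, n)
            ref_counts = ngrams(ref_tokens, n)
            # Counter intersection clips each count by its count in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())

    # Without smoothing, zero matches at any order make the score 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes output that is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    hyps = ["the cat sat on the mat"]
    refs = ["the cat sat on the mat"]
    print(corpus_bleu(hyps, refs))
```

For scores that need to be comparable with published results, an established implementation such as sacreBLEU is the safer choice, since tokenization and smoothing choices change the numbers.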
Before we start on MT, we explain RNN language modeling with a toy example and later expand it to a larger language model.
- GitHub:
- Colab / Kaggle:
- GitHub:
- Kaggle:
- GitHub:
- Kaggle:
There are no hands-on sessions for the conclusions.
We are in the process of merging the different transformer encoder-decoder scripts from chapters 6 and 7 into a single script and testing it. The current (unfinished) version is in [code/mtat.py].