This repository collects my machine learning, deep learning, and NLP coursework from FMLAB. It is organized as a learning portfolio: classical ML foundations, PyTorch deep learning practice, and selected Stanford CS224N-style NLP assignments.
Note: this README focuses on `Cs224n`, `Exercise`, and `Homework`, which are the coursework sections intended for portfolio review.
- Implemented classical ML pipelines for preprocessing, classification, model selection, and evaluation.
- Built deep learning notebooks covering MLPs, CNNs, ResNet-style models, DenseNet, RNNs, GRUs, LSTMs, GANs, and data augmentation.
- Completed NLP assignments on word vectors, dependency parsing, neural machine translation, attention, GPT-style pretraining, fine-tuning, and RoPE positional embeddings.
- Practiced experiment tracking with TensorBoard-style runs, checkpoints, predictions, and evaluation outputs.
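
As an illustration of the transition-based dependency parsing topic listed above, here is a minimal sketch of how SHIFT, LEFT-ARC, and RIGHT-ARC transitions build a parse. It is not the code in `parser_transitions.py`: the function name `apply_transitions` and the `"S"` / `"LA"` / `"RA"` labels are chosen here for illustration, and the transition sequence is written by hand rather than predicted by a neural model.

```python
def apply_transitions(sentence, transitions):
    """Apply a sequence of parser transitions to one sentence.

    "S"  (SHIFT):     move the first buffer word onto the stack.
    "LA" (LEFT-ARC):  the second-from-top stack item becomes a dependent of the top item.
    "RA" (RIGHT-ARC): the top stack item becomes a dependent of the item below it.

    Returns a list of (head, dependent) pairs in the order they were created.
    """
    stack = ["ROOT"]
    buffer = list(sentence)
    dependencies = []
    for transition in transitions:
        if transition == "S":
            stack.append(buffer.pop(0))
        elif transition == "LA":
            dependent = stack.pop(-2)
            dependencies.append((stack[-1], dependent))
        elif transition == "RA":
            dependent = stack.pop()
            dependencies.append((stack[-1], dependent))
        else:
            raise ValueError(f"unknown transition: {transition!r}")
    return dependencies


# Hand-written transition sequence for a four-word example sentence.
sentence = ["I", "parsed", "this", "sentence"]
transitions = ["S", "S", "LA", "S", "S", "LA", "RA", "RA"]
deps = apply_transitions(sentence, transitions)
print(deps)
```

In a neural parser, a classifier would choose each transition from features of the current stack and buffer. The sketch only shows the bookkeeping that turns those choices into dependency arcs.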
| Path | Contents | Main Skills |
|---|---|---|
| `Cs224n/` | Four NLP assignments inspired by Stanford CS224N | word vectors, dependency parsing, NMT, attention, GPT, RoPE |
| `Exercise/` | 18 in-class / practice exercise folders | ML algorithms, PyTorch training, CNNs, RNNs, GANs, graph embeddings |
| `Homework/` | 18 homework folders | preprocessing, visualization, ML models, CNNs, sequence models, graph learning |
| Assignment | Focus | Representative Files |
|---|---|---|
| A1 | Word vectors and embedding exploration | exploring_word_vectors.ipynb |
| A2 | Neural transition-based dependency parsing | parser_model.py, parser_transitions.py |
| A3 | Neural machine translation with attention | nmt_model.py, model_embeddings.py |
| A4 | GPT-style pretraining/fine-tuning and RoPE | attention.py, run.py, trainer.py |
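
For readers unfamiliar with RoPE (rotary positional embeddings), named in the A4 row above, the following is a small pure-Python sketch of the core idea. It is not taken from `attention.py`. It assumes the common formulation in which consecutive pairs of dimensions are rotated by an angle proportional to the token position, with base 10000. The helper names `rope` and `dot` are hypothetical.

```python
import math


def rope(vec, pos, base=10000.0):
    """Rotate each consecutive pair (vec[2i], vec[2i+1]) by pos * base**(-2i/d)."""
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(d // 2):
        theta = pos * base ** (-2.0 * i / d)
        x, y = vec[2 * i], vec[2 * i + 1]
        # Standard 2D rotation of the pair (x, y) by angle theta.
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


q = [0.5, -1.0, 2.0, 0.25]   # toy query vector
k = [1.5, 0.75, -0.5, 1.0]   # toy key vector

# Both position pairs have the same offset (2), so the attention scores
# should agree up to floating-point error.
score_a = dot(rope(q, 3), rope(k, 1))
score_b = dot(rope(q, 10), rope(k, 8))
print(math.isclose(score_a, score_b, abs_tol=1e-9))
```

The check at the end illustrates why this scheme suits attention: the query-key score depends on the relative offset between positions rather than on their absolute values.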
The Exercise directory contains practice notebooks and scripts across:
- Data preprocessing and exploratory analysis
- KNN, Naive Bayes, SVM, decision trees, random forests, and model selection
- Linear regression and neural network fundamentals
- PyTorch training loops, CNNs, ResNet, and hyperparameter tuning
- RNN, LSTM, image captioning, GANs, graph embeddings, and lab-style implementations
The Homework directory mirrors the course progression with independent assignments:
- Vietnamese news preprocessing and text classification
- Real estate data visualization and tabular ML workflows
- Decision tree, random forest, Naive Bayes, model selection, and image compression
- Neural networks, MLPs, LeNet, DenseNet, data augmentation, and ensembles
- GRU / bidirectional RNN practice, sequence modeling, DCGAN, and graph learning
Most work is notebook-based. Create an environment with common ML/DL packages, then open the notebooks from the relevant assignment folder.

    python -m venv .venv
    source .venv/bin/activate
    pip install numpy pandas matplotlib scikit-learn jupyter torch torchvision tqdm tensorboard
    jupyter notebook

On Windows PowerShell:

    python -m venv .venv
    .\.venv\Scripts\Activate.ps1
    pip install numpy pandas matplotlib scikit-learn jupyter torch torchvision tqdm tensorboard
    jupyter notebook

Some CS224N assignments include their own environment files or requirements.
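
To confirm that the packages from the install command above are visible in the active environment, a quick check along these lines may help. This is a sketch, not part of the repository. The module list is an assumption derived from the `pip install` line: `scikit-learn` is imported as `sklearn`, and `notebook` stands in for the `jupyter notebook` command.

```python
import importlib.util

# Import names assumed to correspond to the packages in the pip command.
REQUIRED_MODULES = [
    "numpy", "pandas", "matplotlib", "sklearn",
    "notebook", "torch", "torchvision", "tqdm", "tensorboard",
]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All listed modules are importable.")
```

`find_spec` only locates the modules without importing them, so the check stays fast even for heavy packages such as `torch`.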
Dependency parser sanity checks:

    cd Cs224n/a2/student-1
    python parser_model.py --embedding
    python parser_model.py --forward

Neural machine translation sanity checks:

    cd Cs224n/a3/student
    python sanity_check.py 1d
    python sanity_check.py 1e
    python sanity_check.py 1f

GPT-style assignment entry point:

    cd Cs224n/a4/student
    python src/run.py pretrain vanilla wiki.txt --writing_params_path vanilla.model.params

- The main value of this repository is in the implemented notebooks and model code, not in generated artifacts.
- Large datasets, checkpoints, TensorBoard logs, and prediction files should be treated as reproducible outputs.
- `.gitignore` is configured to avoid adding new generated training artifacts such as checkpoints, TensorBoard event files, and model parameter dumps.
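
To see which generated artifacts currently sit in a working copy, a small script like the one below could be used. It is only a sketch. The patterns are assumptions for illustration (`*.params` follows the `vanilla.model.params` example, and `events.out.tfevents.*` is the usual TensorBoard event file naming), and the repository's actual `.gitignore` rules may differ.

```python
from fnmatch import fnmatchcase
from pathlib import Path

# Assumed patterns for illustration; the real .gitignore may use different rules.
ARTIFACT_PATTERNS = ["*.params", "*.ckpt", "*.pt", "events.out.tfevents.*"]
# Directories that are not worth scanning.
SKIPPED_DIRS = {".git", ".venv"}


def is_generated_artifact(filename, patterns=ARTIFACT_PATTERNS):
    """Return True if the file name matches any artifact pattern."""
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


def find_artifacts(root="."):
    """List files under root whose names look like generated training outputs."""
    found = []
    for path in Path(root).rglob("*"):
        if SKIPPED_DIRS.intersection(path.parts):
            continue
        if path.is_file() and is_generated_artifact(path.name):
            found.append(path)
    return sorted(found)


for artifact in find_artifacts("."):
    print(artifact)
```

Run from the repository root, it prints one path per matching file and nothing when the tree is clean.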
Luu Hai Dang
Machine Learning / Deep Learning Coursework Portfolio