# Emoji Stable Diffusion

Emoji Stable Diffusion is a research project dedicated to building an Emoji dataset and a Stable Diffusion training pipeline enhanced with a Curriculum Learning strategy. The primary objective is to improve the quality of generated images despite limited data availability.
## Table of Contents

- Introduction
- System Requirements
- Installation
- Project Structure
- Configuration and Training
- Execution of Training and Evaluation Scripts
- Testing
## Introduction

The Emoji Stable Diffusion project focuses on researching and developing a model that generates emoji-style images with Stable Diffusion. It integrates a Curriculum Learning approach to enhance the training process and thereby improve the quality of generated images, even when constrained by a limited training dataset.
## System Requirements

- Programming Language: Python 3.11.9
- Environment Management: Anaconda (or equivalent tools)
- Hardware: GPU recommended for training
- Dependencies: all required packages are listed in `requirements.txt`
## Installation

To set up a new environment and install the necessary dependencies, execute the following commands in your terminal:

```bash
conda env create -f environment.yml
conda activate esd_env
```

## Project Structure

The repository is organized as follows:
```text
Emoji_SD/
├── config/
│   ├── eval.yaml
│   ├── train.yaml
│   └── visualize.yaml
├── data/
│   ├── processed/
│   │   ├── resized_val_images/
│   │   ├── train_images/
│   │   ├── val_images/
│   │   ├── train.csv
│   │   └── val.csv
│   └── raw/
├── experiments/
│   └── 10042025_2024/
│       ├── gen_images/
│       ├── best_model.pth
│       ├── config.yaml
│       ├── losses.png
│       └── losses.txt
├── notebooks/
├── scripts/
│   ├── eval.sh
│   └── train.sh
├── src/
│   ├── data/
│   ├── models/
│   ├── training/
│   ├── utils/
│   └── __init__.py
├── weights/
├── .env
├── .gitignore
├── eval.py
├── README.md
├── requirements.txt
├── train.py
├── visualize.ipynb
└── visualize.py
```
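Each training run appears to be stored under `experiments/` in a folder named after a run timestamp (the example folder `10042025_2024` suggests a `DDMMYYYY_HHMM` convention). A minimal sketch of producing such a name, assuming that convention (the helper name is hypothetical):

```python
from datetime import datetime

def experiment_folder_name(now: datetime) -> str:
    # Assumed convention inferred from the example folder "10042025_2024":
    # day, month, year, underscore, hour, minute.
    return now.strftime("%d%m%Y_%H%M")

# experiment_folder_name(datetime(2025, 4, 10, 20, 24)) -> "10042025_2024"
```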
## Configuration and Training

The training and evaluation parameters are specified in YAML configuration files located in the config/ directory.

An example configuration in train.yaml is provided below:

```yaml
os:
  seed: 42

model:
  vae_id: "stabilityai/sd-vae-ft-mse"              # VAE model identifier on Hugging Face
  text_encoder_id: "openai/clip-vit-base-patch32"  # Text encoder model identifier on Hugging Face
  unet_dim: 256      # Dimensionality of the UNet blocks
  unet_heads: 8      # Number of heads in the multi-head attention mechanism
  step_dim: 128      # Dimensionality of the step representation
  context_dim: 512   # Dimensionality of the text embedding from the text encoder

dataset:
  train_csv_file: "data/processed/train.csv"         # Path to the training CSV file
  train_image_folder: "data/processed/train_images"  # Directory containing training images
  val_csv_file: "data/processed/val.csv"             # Path to the validation CSV file
  val_image_folder: "data/processed/val_images"      # Directory containing validation images
  batch_size: 512
  resolution: 64     # Image resolution after resizing

training:
  learning_rate: 1e-4
  eta_min: 1e-6      # Minimum learning rate for the cosine annealing scheduler
  num_epochs: 500    # Total number of training epochs
  save_best: True    # Save weights of the best-performing model
  save_after: 0.75   # Save model weights after reaching 75% of the total epochs
  weights_folder: "experiments/"
  weight_name: "best_model.pth"
  experiments_folder: "experiments/"
  gen_images_folder_name: "gen_images"  # Directory for generated images used to compute the FID score
```

An example configuration in eval.yaml is outlined below:
```yaml
os:
  seed: 42

data:
  path_real_images: "data/processed/resized_val_images"          # Directory containing real validation images
  path_generated_images: "experiments/10042025_2024/gen_images"  # Directory containing images generated post-training

inference:
  dims: 2048
  batch_size: 128
```

## Execution of Training and Evaluation Scripts

The scripts in the scripts/ directory facilitate the training and evaluation processes. Execute the following commands in your terminal:
```bash
bash scripts/train.sh  # Initiates model training
bash scripts/eval.sh   # Performs model evaluation
```

- Training: Upon completion, the training process generates:
  - the best model weights file (`.pth`);
  - a set of generated images for FID score assessment;
  - the training configuration file, a loss-curve image, and a corresponding log file.
- Evaluation: Ensure the evaluation configuration points to the correct generated-images directory; the FID score is then displayed in the terminal output.
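The FID score compares Gaussian statistics (mean and covariance) of real and generated image features; the `dims: 2048` setting in eval.yaml matches the usual Inception-v3 feature dimensionality. Purely as an illustration of the underlying formula, here is the Fréchet distance specialized to diagonal covariances; the actual evaluation presumably uses a full-covariance implementation over Inception features, and the function name here is hypothetical:

```python
import math

def fid_diagonal(mu1, var1, mu2, var2):
    # Fréchet distance between two Gaussians, simplified to diagonal
    # covariances: ||mu1 - mu2||^2 + Tr(S1 + S2 - 2*(S1*S2)^(1/2)).
    # With diagonal S1, S2, the matrix square root reduces to an
    # elementwise sqrt of the products of the variances.
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    cov_term = sum(v1 + v2 - 2.0 * math.sqrt(v1 * v2)
                   for v1, v2 in zip(var1, var2))
    return mean_term + cov_term

# Identical distributions give a score of 0; lower is better.
```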
## Testing

To verify that the configurations are properly set, modify the respective YAML configuration files under the config/ directory as needed and follow the steps in the Execution of Training and Evaluation Scripts section.
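As a sketch of how such a configuration might be read before a run (assuming PyYAML is available; the loading code below is illustrative and not part of this repository), note that PyYAML parses scientific-notation values like `1e-4` as strings, so numeric casts are advisable:

```python
import yaml  # PyYAML, assumed to be installed via the project's dependencies

# A fragment mirroring the train.yaml excerpt above.
config_text = """
training:
  learning_rate: 1e-4
  num_epochs: 500
  save_best: True
"""

cfg = yaml.safe_load(config_text)
num_epochs = cfg["training"]["num_epochs"]
# PyYAML's YAML 1.1 resolver reads "1e-4" as a string (no decimal point),
# so cast explicitly before handing it to an optimizer.
learning_rate = float(cfg["training"]["learning_rate"])
```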