Emoji Stable Diffusion

Emoji Stable Diffusion is a research project dedicated to the development of an Emoji dataset and the creation of a Stable Diffusion training pipeline enhanced by a Curriculum Learning strategy. The primary objective is to improve the quality of generated images despite limited data availability.

Introduction

The Emoji Stable Diffusion project focuses on researching and developing a model that generates emoji-style images using Stable Diffusion. The project integrates a Curriculum Learning approach to enhance the training process and, consequently, improve the quality of generated images, even when constrained by a limited training dataset.
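The repository does not spell out its exact curriculum schedule, so the following is only an illustrative sketch of the general idea, assuming each training sample has a scalar difficulty score (for example, caption length): training starts on the easiest fraction of the data and linearly expands to the full set.

```python
def curriculum_subset(samples, difficulty, epoch, num_epochs, start_frac=0.25):
    """Return the easiest-first subset of `samples` visible at `epoch`.

    `difficulty` holds one scalar score per sample (lower = easier); the
    visible fraction grows linearly from `start_frac` to 1.0 over training.
    """
    order = sorted(range(len(samples)), key=lambda i: difficulty[i])
    progress = epoch / max(1, num_epochs - 1)
    frac = min(1.0, start_frac + (1.0 - start_frac) * progress)
    k = max(1, round(frac * len(samples)))
    return [samples[i] for i in order[:k]]
```

How difficulty is scored and how fast the schedule expands are design choices; the sketch above only shows the shared structure of easy-to-hard sampling.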

System Requirements

  • Programming Language: Python 3.11.9
  • Environment Management: Anaconda (or equivalent tools)
  • Hardware: GPU recommended for training procedures
  • Dependencies: All required packages are listed in the requirements.txt file

Installation

To set up a new environment and install the dependencies listed in requirements.txt, execute the following commands in your terminal:

conda create -n esd_env python=3.11.9
conda activate esd_env
pip install -r requirements.txt

Project Structure

The repository is organized in the following structure, ensuring consistency and ease of management:

Emoji_SD/
├── config/
│   ├── eval.yaml  
│   ├── train.yaml  
│   └── visualize.yaml  
├── data/
│   ├── processed/
│   │   ├── resized_val_images/  
│   │   ├── train_images/  
│   │   ├── val_images/  
│   │   ├── train.csv  
│   │   └── val.csv  
│   └── raw/  
├── experiments/
│   └── 10042025_2024/
│       ├── gen_images/  
│       ├── best_model.pth  
│       ├── config.yaml  
│       ├── losses.png  
│       └── losses.txt  
├── notebooks/  
├── scripts/
│   ├── eval.sh  
│   └── train.sh  
├── src/
│   ├── data/  
│   ├── models/  
│   ├── training/  
│   ├── utils/  
│   └── __init__.py  
├── weights/  
├── .env  
├── .gitignore  
├── eval.py  
├── README.md  
├── requirements.txt  
├── train.py  
├── visualize.ipynb  
└── visualize.py  

Configuration and Training

The training and evaluation parameters are specified in YAML configuration files located in the config/ directory.

Training Configuration

An example configuration in train.yaml is provided below:

os:
    seed: 42

model:
    vae_id: "stabilityai/sd-vae-ft-mse"  # VAE model identifier on Hugging Face
    text_encoder_id: "openai/clip-vit-base-patch32"  # Text encoder model identifier on Hugging Face
    unet_dim: 256         # Dimensionality of the UNet blocks
    unet_heads: 8         # Number of heads in the multi-head attention mechanism
    step_dim: 128         # Dimensionality of the step representation
    context_dim: 512      # Dimensionality of the text embedding from the text encoder

dataset:
    train_csv_file: "data/processed/train.csv"      # Path to the training CSV file
    train_image_folder: "data/processed/train_images" # Directory containing training images
    val_csv_file: "data/processed/val.csv"            # Path to the validation CSV file
    val_image_folder: "data/processed/val_images"     # Directory containing validation images
    batch_size: 512                                 # Training batch size
    resolution: 64                                  # Image resolution after resizing

training:
    learning_rate: 1e-4
    eta_min: 1e-6               # Minimum learning rate for the Cosine Annealing Scheduler
    num_epochs: 500             # Total number of training epochs
    save_best: True             # Save weights of the best performing model
    save_after: 0.75            # Save model weights after reaching 75% of the total epochs
    weights_folder: "experiments/" 
    weight_name: "best_model.pth"
    experiments_folder: "experiments/"
    gen_images_folder_name: "gen_images"  # Directory for generated images used to compute FID score
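Configurations like the one above can be loaded with PyYAML. One caveat worth noting: PyYAML's YAML 1.1 resolver parses exponent-only literals such as `1e-4` as strings rather than floats, so numeric hyperparameters written that way should be cast explicitly. A minimal sketch (the keys mirror a subset of the example above, inlined here for self-containment):

```python
import yaml

TRAIN_YAML = """
os:
    seed: 42
model:
    unet_dim: 256
    unet_heads: 8
training:
    learning_rate: 1e-4
    num_epochs: 500
"""

# In the project this would be: yaml.safe_load(open("config/train.yaml"))
cfg = yaml.safe_load(TRAIN_YAML)

# PyYAML treats "1e-4" (no decimal point) as a string, so cast it.
lr = float(cfg["training"]["learning_rate"])
```

Writing the value as `1.0e-4` in the YAML file avoids the cast entirely.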

Evaluation Configuration

An example configuration in eval.yaml is outlined below:

os:
    seed: 42

data:
    path_real_images: "data/processed/resized_val_images"  # Directory containing real validation images
    path_generated_images: "experiments/10042025_2024/gen_images"  # Directory containing images generated post-training

inference:
    dims: 2048
    batch_size: 128
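Evaluation computes the Fréchet Inception Distance (FID) between Inception-v3 features of the real and generated image sets; `dims: 2048` is the dimensionality of the final pooling layer's features. The distance itself is ||μ₁ − μ₂||² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^½). A minimal NumPy/SciPy sketch over precomputed feature matrices (feature extraction is omitted here; in practice a library such as pytorch-fid handles both steps):

```python
import numpy as np
from scipy import linalg

def frechet_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """FID between two feature matrices of shape (n_samples, dims)."""
    mu1, mu2 = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    sigma1 = np.cov(feats_real, rowvar=False)
    sigma2 = np.cov(feats_gen, rowvar=False)
    covmean = linalg.sqrtm(sigma1 @ sigma2)  # matrix square root
    if np.iscomplexobj(covmean):             # drop tiny imaginary parts
        covmean = covmean.real
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```

Identical feature sets give a distance near zero; the score grows as the two distributions drift apart in mean or covariance.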

Execution of Training and Evaluation Scripts

The provided scripts in the scripts/ directory facilitate the training and evaluation processes. Execute the following commands in your terminal:

bash scripts/train.sh  # Initiates model training
bash scripts/eval.sh   # Performs model evaluation

  • Training: upon completion, the training process generates:

    • the best model weights file (.pth)
    • a set of generated images for FID score assessment
    • the training configuration file, a loss-curve image, and the corresponding loss log
  • Evaluation: make sure path_generated_images in eval.yaml points to the directory of generated images; the FID score is then printed to the terminal.

Testing

To verify that the configurations are set correctly, adjust the relevant YAML files under the config/ directory as needed and rerun the commands described in the Execution of Training and Evaluation Scripts section.
