This project was developed as the course project for EE474: Introduction to Multimedia at KAIST, Spring 2024.
ALACen is a pipeline for Automatic Language-Level Adjustment for Video Censorship that uses deep-learning techniques to censor videos containing violent speech while preserving immersion. ALACen consists of four stages: Speech Recognition, Paraphrase Generation, Text-to-Speech Synthesis, and Lip Synchronization.
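The way the four stages feed into one another can be sketched roughly as follows. This is only an illustration of the data flow: the `Pipeline` class, its field names, and the stage signatures are hypothetical and are not taken from the ALACen source, and the stub functions stand in for the real models.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Pipeline:
    """Hypothetical four-stage pipeline; real ALACen interfaces may differ."""

    recognize: Callable[[str], str]        # video path -> transcript
    paraphrase: Callable[[str], str]       # transcript -> non-violent paraphrase
    synthesize: Callable[[str, str], str]  # (paraphrase, video path) -> audio path
    lipsync: Callable[[str, str], str]     # (video path, audio path) -> output video path

    def run(self, video_path: str) -> str:
        # Each stage consumes the output of the previous one.
        transcript = self.recognize(video_path)
        safe_text = self.paraphrase(transcript)
        audio_path = self.synthesize(safe_text, video_path)
        return self.lipsync(video_path, audio_path)


# Stub stages for illustration only; they do no real processing.
demo = Pipeline(
    recognize=lambda video: "original line",
    paraphrase=lambda text: f"softened {text}",
    synthesize=lambda text, video: video.replace(".mp4", ".wav"),
    lipsync=lambda video, audio: video.replace(".mp4", "_censored.mp4"),
)
```

Calling `demo.run("clip.mp4")` with these stubs returns a new file name and exercises the stages in order; in the actual project each stub would be replaced by the corresponding module under `src/alacen/`.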
The instructions below assume that you already have Conda installed. If not, follow the Conda installation guide before you proceed.
- Create a new Conda environment with Python 3.9 and activate it. For example,
conda create -n alacen python=3.9 && conda activate alacen
- Install Mamba using the following command. You need to restart your terminal after the installation finishes. We need Mamba because installing the dependencies with conda through the conda-forge channel often hangs. Note: You can skip this step if you already have Mamba installed.
bash install_mamba.sh
- Run the following command to install the dependencies and download the pre-trained models. If it fails with a connection error, try running it again. Note: You may need to activate your environment before running the command.
bash setup.sh
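If the setup script fails intermittently on connection errors, rerunning it can be automated. The helper below is a minimal sketch, not part of the repository: `run_with_retries` is a hypothetical name, and it retries on any non-zero exit status rather than detecting connection errors specifically.

```python
import subprocess
import time


def run_with_retries(command, attempts=3, delay_seconds=5.0):
    """Run `command` until it exits with status 0 or `attempts` runs are used up.

    Returns the number of the attempt that succeeded (1-based).
    """
    for attempt in range(1, attempts + 1):
        result = subprocess.run(command)
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            # Wait briefly before the next try.
            time.sleep(delay_seconds)
    raise RuntimeError(f"{command!r} failed after {attempts} attempts")


# Example (run from the repository root, with the environment activated):
# run_with_retries(["bash", "setup.sh"])
```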
We provide four options for running ALACen.
- Execute the Python module. Replace <path-to-your-video> with the correct path.
python -m src.alacen -v --video <path-to-your-video> --num-gpus 3 --device cuda:3
- Run the run.ipynb file. This gives you an interactive execution of ALACen. Put your configuration parameters in the Configuration cell and run all the cells. If you are prompted that files already exist, remove those files and rerun the cell.
- Run the evaluate.ipynb file. This shows how ALACen's output videos for the user study were generated.
- Run the Gradio demo application with the following command. Then, you can access the application with your browser by visiting the specified URL.
python app.py -v -s --num-gpus 3 --device cuda:3
Note that the current version runs on 4 NVIDIA GeForce GTX 1070 GPUs. If you have fewer GPUs with more memory, set the number of GPUs and the default device in the command line arguments accordingly.
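As a rough illustration of adjusting these arguments, the helper below assembles the module invocation for a given GPU setup. `build_alacen_command` is a hypothetical convenience function, not part of ALACen; it only uses the `--video`, `--num-gpus`, and `--device` flags shown above and passes the values through unchanged, since how the two GPU flags interact is not specified here.

```python
import shlex


def build_alacen_command(video, num_gpus, device_index):
    """Build the argument list for running ALACen on `video`.

    The values are forwarded as given; only obviously invalid ones are rejected.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if device_index < 0:
        raise ValueError("device_index must be non-negative")
    return [
        "python", "-m", "src.alacen", "-v",
        "--video", video,
        "--num-gpus", str(num_gpus),
        "--device", f"cuda:{device_index}",
    ]


# Assumed single-GPU setup: one GPU, default device cuda:0.
single_gpu = build_alacen_command("clip.mp4", num_gpus=1, device_index=0)
print(shlex.join(single_gpu))
```

The resulting list can be passed directly to `subprocess.run`, or the printed line can be pasted into a terminal.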
The project structure is as follows.
ALACen/
├─install_mamba.sh # Script for installing Mamba
├─app.py # Gradio demo app
├─README.md # README file
├─run.ipynb # Notebook for running ALACen
├─src/ # Source directory
│ ├─datasets/ # Python scripts for constructing the Violent Speech dataset
│ └─alacen/ # ALACen implementation
│   ├─asr/ # Speech Recognition
│   ├─config.py # Configuration file
│   ├─tts/ # Text-to-Speech Synthesis
│   ├─lipsync/ # Lip Synchronization
│   ├─paraphrase/ # Paraphrase Generation
│   ├─__main__.py # Main ALACen script
│   ├─alacen.py # ALACen class
│   └─... # Other helper and utility functions
├─setup.sh # Script for setting up the environment
├─demo/ # Demo videos
├─assets/ # Images and other assets
├─datasets/ # Datasets
│ ├─violent_speech_list.txt # Extracted violent speeches
│ └─violent_speech_dataset.json # Violent Speech Dataset
├─requirements.txt # Dependencies
├─finetune_lm.ipynb # Notebook for fine-tuning the LM
└─evaluation.ipynb # Notebook for generating videos for the user study
This repository contains code from the diff2lip and VoiceCraft repositories. Visit them for more details.
