diff --git a/README.md b/README.md
index f3200dcc..35a04928 100644
--- a/README.md
+++ b/README.md
@@ -1,35 +1,53 @@
 [![Build Status](https://travis-ci.com/PurdueCAM2Project/Embedded2.svg?branch=master)](https://travis-ci.com/PurdueCAM2Project/Embedded2)
-# Embedded Computer Vision 2
-System is used to detect usage of Personal Protection Equipment (PPE), specifically goggles, in labs that require them. System is ran in real time on Jetson Nano and uses a Rasspberry Pi camera to record footage in the lab. To ensure individual privacy is protected, system obfuscates faces after detection and classification. Images are stored in a remote storage drive and image metadata are stored on a SQL database.
+# Embedded Computer Vision 2020
+The system is used to detect the usage of Personal Protective Equipment (PPE), specifically goggles, in labs that require them. The system runs in real time on a Jetson Nano and uses a Raspberry Pi camera to record footage in the lab. To ensure individual privacy is protected, the system obfuscates faces after detection and classification. Images are stored on a remote storage drive and image metadata are stored in a SQL database.
-#### Features
+### Features:
 * Retinaface based SSD performs face detection
 * CNN performs classification of detected faces to determine if PPE is being used
 * Faces are encrypted using AES
 * Image metadata is stored on a SQL database server
 * Images are transferred to a remote computer using SFTP
+* Runs on Jetson Nano in real time
+
 # Table of Contents
-- [Description](#Embedded-Computer-Vision-2)
-- [Table of Contents](#Table-of-Contents)
 - [Installation](#Installation)
 - [Usage](#Usage)
 - [Credits](#Credits)
+- [Builds](#Builds)
 - [License](#License)
-# Installation
+
+## Installation
 1. Clone the project and enter the folder
 ```shell
 $ git clone https://github.com/PurdueCAM2Project/Embedded2.git
 $ cd Embedded2
 ```
-2. The classiifier model (.pth file) can be found on [Drive](https://drive.google.com/drive/u/1/folders/1ZeKVygo-RyIDL_EnxeYJR8tk-xqzgi3Z). Downloadand place it in the ```Embedded2/src/jetson``` directory.
-3. There is a requirement.txt file with all the necessary dependencies. We, however, recommend using Conda for this project. Once you have conda installed, run the following command to setup the enviroment with necessary dependicies.
+2. The classifier model (.pth file) can be found on [Drive](https://drive.google.com/drive/folders/1QfS7YiuCxK-K93dnEYIMoeHAU65Cs14n). Download it and place it in the ```Embedded2/src/jetson``` directory.
+3. There is a requirement.txt file with all the necessary dependencies. However, we recommend using Conda for this project. Once you have Conda installed, run the following command to set up the environment with the necessary dependencies.
 ```shell
 $ conda env create -f environment.yml
 ```
 4. Add the Embedded2 folder to PYTHONPATH by adding the following line in your .bashrc file:
 ```export PYTHONPATH=/path/Embedded2```
-# Usage
-# Contributing
-# Credits
-# License
+## Usage:
+
+1. Make sure that the image folder follows the PyTorch [ImageFolder](https://pytorch.org/docs/stable/torchvision/datasets.html?highlight=imagefolder#torchvision.datasets.ImageFolder) structure.
+
+2. Run the following scripts.
+
+`scripts/goggle_classifier.py --directory=path/to/imagefolder`
+`scripts/face_extractor.py --trained_model=path/to/_model.pth --classifier=path/to/trained_classifier.pth --cuda`
+* goggle_classifier.py trains our goggle classifier. The model is saved into a .pth file that is loaded as the trained_model of face_extractor.py.
+* face_extractor.py detects the face and classifies whether the person is wearing goggles, glasses, or neither.
+* We have been using ssd300_WIDER_100455.pth as the SSD model. The classifier model is any other .pth file stored on the [Drive](https://drive.google.com/drive/folders/1QfS7YiuCxK-K93dnEYIMoeHAU65Cs14n).
+* Only include --cuda with face_extractor.py if you have a GPU.
+
+ 3. The image is sent to one of the three types of detector: blazeface, retinaface or ssd. Make sure that CUDA is enabled and the classifier is activated. The encrypted images are output after detection and classification.
+
+## Credits:
+* [Crontabs](https://github.com/robdmc/crontabs)
+
+## Builds:
+* Travis CI
diff --git a/scripts/README.md b/scripts/README.md
new file mode 100644
index 00000000..8c617206
--- /dev/null
+++ b/scripts/README.md
@@ -0,0 +1,43 @@
+Description of files in the 'scripts' folder and the functions used in the files
+
+# automatic_notification.py
+The file uses Cron.schedule() to run the send_email function every day.
+#### send_email
+The function first retrieves the current timestamp. Then, it forms an email message consisting of the sender, the receiver, the subject, the body and the timestamp. It connects to a mail server to send the message and quits the server session once the message is sent.
+
+# face_extractor.py
+The file parses arguments for the input directory, output directory, trained model, images, rate and horizontal orientation. It calls get_images or get_Videos to get the files. Then, it calls crop_faces_from_images or crop_faces_from_videos to crop and save the face images.
+#### get_images
+The function gets the filenames of images from the input directory and returns them as a list.
+#### get_Videos
+The function gets the filenames of videos from the input directory and returns them as a list.
+#### crop_and_save_img
+The function runs the frame through FaceDetector to create a bounding box around the face and crop the image. Then, it saves the cropped face image.
+#### crop_faces_from_images
+The function iterates through the image files and crops and saves face images.
+#### crop_faces_from_videos
+The function first iterates through the video files. For each video, the function iterates through the video frames and crops and saves face images. If the video is shot horizontally, the function flips it so that it is in the right orientation.
+
+# goggle_classifier.py
+The file first parses the arguments for training the MobileNet classifier. It then initializes the TensorBoard writer. It either loads a pretrained model or calls get_model to use a pretrained MobileNet model with layers frozen. Then, it applies the training and validation augmentations and outputs the results from the training.
+### MapDataset
+This class defines a custom dataset for applying different transforms to training and validation data. It consists of the __init__, __getitem__ and __len__ functions.
+#### __init__
+This function initializes variables regarding the dataset.
+#### __getitem__
+This function gets the item at the given index and returns the result of applying the mapping to it.
+#### __len__
+This function returns the number of elements in the dataset.
+
+
+#### classifier_transforms
+This is a dictionary of data augmentation options used to train and validate the classifier.
+
+#### get_model
+This function first initializes MobileNet and freezes the relevant layers. Then, it returns the pretrained MobileNet model with those layers frozen.
+#### load_data
+This function first loads the image data from the specified data location. Then, it uses MapDataset to perform data augmentations, creating variables for 'train' and 'val'. Finally, it returns these new variables: the 'train' and 'val' DataLoaders, the sizes of the 'train' and 'val' datasets and the names of the dataset classes.
+#### train_model
+This function first initializes the hyperparameters used to train the model. For each epoch, it runs a training phase and a validation phase. As it iterates through the data, it runs forward propagation in both phases; when the phase is 'train', it also runs backward propagation. It prints the loss and accuracy of train or val. There are checkpoints every 10 epochs; these checkpoints let us compare the final trained model against overfit and underfit models. Finally, it returns the trained model.
+#### get_metrics
+This function prints statistics from the final epoch of training.
diff --git a/src/jetson/models/Retinaface/README.md b/src/jetson/models/Retinaface/README.md
new file mode 100644
index 00000000..17d28453
--- /dev/null
+++ b/src/jetson/models/Retinaface/README.md
@@ -0,0 +1,89 @@
+## net.py
+This file defines functions that return batch normalized convolutional layers, as well as the classes SSH (Single Stage Headless face detector), FPN (Feature Pyramid Network) and MobileNetV1.
+#### conv_bn
+This function returns a batch normalized convolution layer (filter=3x3) followed by a relu activation.
+#### conv_bn_no_relu
+This function returns a batch normalized convolution layer (filter=3x3) without a relu activation.
+#### conv_bn1X1
+This function returns a batch normalized convolution layer (filter=1x1) followed by a relu activation.
+#### conv_dw
+This function returns a batch normalized depthwise convolution layer followed by a relu activation.
+#### SSH
+This class defines the single stage headless face detector.
+
+The __init__ function initializes batch normalized convolutional layers using the number of input and output channels, the stride and the leakiness.
+
+The forward function receives the tensor output by the backbone. Using this input, it returns the output tensor after applying the network layers (batch normalized convolutions) and operations.
+#### FPN
+This class defines the feature pyramid network.
+
+The __init__ function initializes batch normalized convolutional layers using the list of input channels, the number of output channels, the stride and the leakiness.
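As a side note on the conv_dw helper listed under net.py: depthwise convolutions are usually chosen because they need far fewer weights than a plain 3x3 convolution. The sketch below is only arithmetic, not code from this repository. It assumes conv_dw follows the usual MobileNetV1 pattern (a 3x3 depthwise convolution followed by a 1x1 pointwise convolution), ignores batch-norm parameters and biases, and uses the hypothetical function names `conv_weights` and `depthwise_separable_weights`.

```python
def conv_weights(in_ch, out_ch, kernel=3):
    """Weights in a standard kernel x kernel convolution (no bias)."""
    return in_ch * out_ch * kernel * kernel


def depthwise_separable_weights(in_ch, out_ch, kernel=3):
    """Weights in a depthwise conv followed by a 1x1 pointwise conv (no bias).

    Assumption: this mirrors what conv_dw builds; check net.py to confirm.
    """
    depthwise = in_ch * kernel * kernel  # one kernel x kernel filter per input channel
    pointwise = in_ch * out_ch           # 1x1 conv that mixes the channels
    return depthwise + pointwise


# Illustrative channel counts, not taken from the repository.
standard = conv_weights(64, 128)
separable = depthwise_separable_weights(64, 128)
print(standard, separable, standard / separable)
```

For these illustrative channel counts the separable variant needs roughly an eighth of the weights, which is the usual motivation for using it in a backbone that must run in real time on an embedded device.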
+
+The forward function first computes three outputs (output1, output2, output3) by applying a separate batch normalized convolutional layer to each input feature map; each layer uses the corresponding entry of the input channel list. Then, output2 is updated by combining it with an interpolated output3 and passing the result through a merge layer. Afterward, output1 is updated in the same way using the interpolated output2. Finally, the forward function returns output1, output2 and output3.
+#### MobileNetV1
+This class defines the MobileNetV1 model used as the backbone for face detection.
+
+The __init__ function first initializes the stages of sequential batch normalized convolution layers and batch normalized depthwise convolution layers. It also initializes operational functions.
+
+The forward function receives an input tensor x, which is either a single image tensor or a batch of them. It applies the stages of network layers and performs the operations. Finally, it returns the output tensor produced by the MobileNetV1 backbone.
+## retinaface.py
+This file describes the RetinaFace model and the classes and functions used to operate on the facial images.
+
+#### ClassHead
+This class describes functions for the classification of the image.
+
+The __init__ function adds layers on top of the feature extractor for classification.
+
+The forward function receives a tensor from the feature extractor and returns a reshaped tensor after passing it through a 1x1 conv layer.
+#### BboxHead
+This class describes functions for creating a bounding box around the face and calculating face coordinates.
+
+The __init__ function adds layers on top of the feature extractor for finding face coordinates.
+
+The forward function receives a tensor from the feature extractor and returns a reshaped output tensor after passing it through a 1x1 conv layer.
+#### LandmarkHead
+This class describes functions for calculating face landmark coordinates.
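The heads in retinaface.py (ClassHead and BboxHead above, LandmarkHead here) are each described as returning a "reshaped" tensor after a 1x1 conv. A common way to do this in PyTorch is `permute(0, 2, 3, 1)` followed by `view`, so that every row holds the values of one anchor. The sketch below reproduces that reordering on plain nested lists for a single image, so it runs without PyTorch. It assumes the channel axis is laid out as `num_anchors * values_per_anchor`; the function name `head_reshape` and the toy values are hypothetical.

```python
def head_reshape(feature, values_per_anchor):
    """Reorder a [C][H][W] nested list into rows of values_per_anchor entries.

    Rows are ordered by (row h, column w, anchor a), which matches a
    channels-last permute followed by a reshape to (-1, values_per_anchor).
    """
    channels = len(feature)
    height, width = len(feature[0]), len(feature[0][0])
    if channels % values_per_anchor != 0:
        raise ValueError("channel count must be a multiple of values_per_anchor")
    num_anchors = channels // values_per_anchor

    rows = []
    for h in range(height):
        for w in range(width):
            for a in range(num_anchors):
                start = a * values_per_anchor  # first channel of anchor a
                rows.append([feature[start + j][h][w]
                             for j in range(values_per_anchor)])
    return rows


# Toy 1x1-conv output: 8 channels (2 anchors x 4 box values) on a 1x2 map.
toy = [[[c * 10 + w for w in range(2)]] for c in range(8)]
for row in head_reshape(toy, values_per_anchor=4):
    print(row)
```

With 4 values per anchor this corresponds to a bounding-box head; a classification or landmark head would only change `values_per_anchor`.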
+
+The __init__ function adds layers on top of the feature extractor for finding face landmark coordinates.
+
+The forward function receives a tensor from the feature extractor and returns a reshaped output tensor after passing it through a 1x1 conv layer.
+#### RetinaFace
+This class defines the RetinaFace model.
+
+The __init__ function initializes variables and configurations used for training.
+
+The _make_class_head function adds a layer on top of retinaface for classification.
+
+The _make_bbox_head function adds a layer on top of retinaface for outputting bounding box coordinates.
+
+The _make_landmark_head function adds a layer on top of retinaface for outputting facial landmark coordinates.
+
+The forward function first receives the input image or batch of images and passes it through the FPN (feature pyramid network). Then, it operates on the facial images with the classes SSH (single stage headless face detector), ClassHead, BboxHead and LandmarkHead. Finally, it returns the bounding box coordinates, the landmark coordinates and the class confidences.
+
+#### load_model
+This function returns the trained model loaded onto the desired device.
+
+# data
+This folder consists of __init__.py and config.py.
+
+__init__.py imports the configurations from the models.Retinaface package.
+
+config.py defines a library of configurations, including the inference configuration and the MobileNetV1 and Resnet50 backbone configurations for training.
+
+# layers
+
+## functions
+### prior_box.py
+This file defines a class PriorBox.
+
+The __init__ function initializes the variables of the prior box according to the configuration of the training models. Furthermore, for each source feature map, it calculates the prior box coordinates.
+
+The forward function first loops through the feature maps to compute the anchors using dense face localizations, min_size and image_size. Then, it converts the anchors into a torch tensor and returns the prior box tensor.
+
+## modules
+### multibox_loss.py
+This file defines a class MultiBoxLoss.
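MultiBoxLoss works on the anchors produced by PriorBox in prior_box.py, so it helps to see what such anchors look like. The following is a minimal dependency-free sketch of the anchor loop, not the repository's implementation: it places one centre per feature-map cell (the actual class may use denser localizations), returns plain tuples instead of a torch tensor, and the function name `prior_boxes` as well as the sizes and strides in the example are illustrative assumptions.

```python
from itertools import product
from math import ceil


def prior_boxes(image_size, steps, min_sizes, clip=False):
    """Return normalized (cx, cy, w, h) anchors for every feature-map cell.

    image_size: (height, width) of the input image in pixels.
    steps:      stride of each feature map relative to the input image.
    min_sizes:  anchor side lengths in pixels, one list per feature map.
    """
    img_h, img_w = image_size
    anchors = []
    for step, sizes in zip(steps, min_sizes):
        # Size of the feature map produced at this stride.
        fm_h, fm_w = ceil(img_h / step), ceil(img_w / step)
        for i, j in product(range(fm_h), range(fm_w)):
            # The anchor centre sits in the middle of cell (i, j).
            cx = (j + 0.5) * step / img_w
            cy = (i + 0.5) * step / img_h
            for size in sizes:
                anchors.append((cx, cy, size / img_w, size / img_h))
    if clip:
        # Clamp every coordinate into [0, 1].
        anchors = [tuple(min(max(v, 0.0), 1.0) for v in a) for a in anchors]
    return anchors


# Illustrative configuration: three feature maps, two anchor sizes each.
boxes = prior_boxes((640, 640), steps=[8, 16, 32],
                    min_sizes=[[16, 32], [64, 128], [256, 512]])
print(len(boxes))  # 16800
```

Every anchor is expressed relative to the image size, which is what lets the loss compare anchors and ground truth boxes independently of the input resolution.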
+
+The __init__ function initializes variables related to comparing ground truth boxes and prior boxes.
+
+The forward function computes the SSD weighted loss: the calculations include confidence target indices, localization targets and hard negative mining.
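Hard negative mining is the least obvious of these steps: most prior boxes match no face, so only the negatives with the highest confidence loss are kept, in a fixed ratio to the positives. The sketch below shows that selection on plain Python lists. It is an illustration under assumptions, not the MultiBoxLoss code: the function name `hard_negative_indices`, the `neg_pos_ratio` parameter and its default of 3 are hypothetical, and the ratio actually used by the project should be read from its training configuration.

```python
def hard_negative_indices(conf_losses, is_positive, neg_pos_ratio=3):
    """Return the indices of the negatives kept for the confidence loss.

    conf_losses:   per-prior confidence loss values.
    is_positive:   per-prior flags, True where the prior matched a face.
    neg_pos_ratio: how many negatives to keep per positive (assumed value).
    """
    if len(conf_losses) != len(is_positive):
        raise ValueError("conf_losses and is_positive must have the same length")

    num_pos = sum(1 for flag in is_positive if flag)
    negatives = [i for i, flag in enumerate(is_positive) if not flag]
    num_neg = min(neg_pos_ratio * num_pos, len(negatives))

    # Hardest negatives first: a high loss means the model is most wrong there.
    negatives.sort(key=lambda i: conf_losses[i], reverse=True)
    return sorted(negatives[:num_neg])


losses = [0.1, 2.0, 0.5, 0.05, 1.5, 0.3, 0.9, 0.2]
matched = [False, True, False, False, False, False, False, False]
print(hard_negative_indices(losses, matched))  # [2, 4, 6]
```

Only the positives and the selected negatives would then contribute to the confidence term, which keeps the many easy background priors from dominating the loss.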