diff --git a/article-deep_belief_network/DBN_article.md b/article-deep_belief_network/DBN_article.md new file mode 100644 index 0000000..20b7dbf --- /dev/null +++ b/article-deep_belief_network/DBN_article.md @@ -0,0 +1,437 @@ +# Deep Belief Network + +   In the modern era of innovation and technology, Artificial Intelligence (AI) invaded most +transformative and captivating advancements of our life. Starting with predictions on data, +classifying things into categories, and ending with pictures and music generation, AI is just +everywhere. One of the most outstanding application of AI is creation (a.k.a. generation). +Today we will dive in to discover one of the generative networks: Deep Belief Network (DBN). + +## Content: +1. [Introduction](#introduction) +2. [What are Deep Belief Networks (DBNs)?](#what-are-deep-belief-networks-dbns) +3. [DBNs Overview](#dbns-overview) +4. [Math Base](#math-base) +5. [DBNs Training](#dbns-training) +6. [Use Cases](#use-cases) +7. [Drawbacks and limits](#drawbacks-and-limits) +8. [Conclusion](#conclusion) +9. [References](#references) + + +## Introduction + +   Before we get into more specifics of DBNs, let’s go briefly from +some general concepts for a better understanding. It is common +practice to divide Machine Learning models into discriminative +and generative ones [1]. As it might be concluded from the names, +discriminative models aim to separate data points into different +classes, while generative ones – to generate data points. +Generative models are usually used in unsupervised learning +problems as they are trained on inputs without their labels. +DBNs, that will be discussed today, are a type of deep learning +architecture that combine neural networks and unsupervised +learning. + +

+ Fast_Learning +
Source: https://medium.com/swlh/what-are-rbms-deep-belief-networks-and-why-are-they-important-to-deep-learning-491c7de8937a +

+ +## What are Deep Belief Networks (DBNs)? + +   Deep Belief Network is a deep learning stochastic architecture, +composed of layers of Restricted Boltzmann Machines (RBMs), which are trained +in an unsupervised manner [2]. \ +     A Restricted Boltzmann Machines is a generative unsupervised +model used for feature selection and feature reduction technique, +for dimensionality reduction, classification, regression, and +other tasks in Machine Learning and Deep Learning [3]. It is +learning on a probability distribution on a certain dataset +and uses the learnt distribution to come up with conclusions on +unexplored data. A typical RBM architecture is represented below +(where h represents hidden nodes and v – visible nodes). + +

+ RBM architecture +
Source: https://www.javatpoint.com/keras-restricted-boltzmann-machine +

+ +     All RBMs that are a part of a Deep Belief Network, are trained in an +unsupervised manner, one at a time [4]. Thus, the output of one of them +becomes the input for the next one. The output of the final machine +is used either in classification or regression tasks, making the +general DBN architecture look like the one represented below. +The prior reason that lead to the appearance of DBNs is to create +unbiased values stored in leaf nodes and to avoid being stuck in +the local minima [5]. + +

+ DBN architecture +
Source: https://www.sciencedirect.com/topics/engineering/deep-belief-network +

+ +## DBNs Overview + +   However, besides the Standard DBNs described earlier, there might be +distinguished several extensions that incorporate different +structures and components. Therefore, the following are the most +notorious variations of DBNs: +* Convolutional Deep Belief Networks (ConvDBNs) – in addition to RBMs, +incorporates convolutional layers [6]; +* Temporal Deep Belief Networks (Temporal DBNs) – extended Standard DBN +to model sequential and time-series data, incorporating +recurrent connections or temporal dependencies between layers [7]; +* Variational Deep Belief Networks (VDBNs) – use a probabilistic +modeling technique, variational interface, in DBMs, making +multiple layers of hidden units fully connected between +consecutive layers; +* Stacked Autoencoders - although not a DBN in the traditional +sense, when multiple autoencoders are stacked, they form a deep +architecture that shares a considerate amount of similarities +with Deep Belief Networks. + +## Math Base +   Despite the diversity of the existing DBNs, it shouldn’t be +forgotten the fact that all of the share similar “roots”. +Recall that DBN is a network assembled out of many single +networks. Except the first and last layers, others play dual +role serving at the same time as hidden layers that comes before +and as input for the following one [8] . The joint distribution +between the observed vector X and the hidden layers hk may be +expressed using the formula: + +

+$P(x, h^1, ..., h^l) = (\displaystyle\prod^{l-2}_{k=0} P(h^k|h^{k + 1})P(h^{l - 1}, h^l))$ +

+ +, where: +* $X$ = $h_0$, +* $P(h^k|h^{k + 1})$ – a conditional distribution for the visible units +conditioned on the hidden units of the RBM at level k, +* $P(h^{l - 1}, h^l)$ – visible-hidden joint distribution in the top-level RBM. + +## DBNs Training + +   Before training a DBN, it is necessary to remember that it is composed +of multiple Restricted Boltzmann Machines that should be individually trained. +We will analyze a typical structured DBN training using an example of a classification +task of handwritten digits. The data will be obtained from the MNIST dataset. + +**Step 1. Initiate the units and parameters of the Restricted Boltzmann Machine.** + +To implement an RBM, we create a corresponding class, with the `__init__` function +containing the number of visible and hidden layers, the visible and hidden biases, +and the weight matrix initialization. The Xavier is used to ensure the weights are +initialized properly, which leads to an improved training process. + +```python +class RBM(nn.Module): + def __init__(self, visible_units, hidden_units): + super(RBM, self).__init__() + self.visible_units = visible_units # number of visible units + self.hidden_units = hidden_units # number of hidden units + self.weights = nn.Parameter(torch.empty(hidden_units, visible_units)) # weight matrix + nn.init.xavier_uniform_(self.weights) # weights initialization + self.visible_bias = nn.Parameter(torch.zeros(visible_units)) # bias for the visible units + self.hidden_bias = nn.Parameter(torch.zeros(hidden_units)) # bias for the hidden units +``` + +**Step 2. Initialize the number of RBM layers and the size of each of +them, specifying the parameters (weights and biases) and +initialization strategy.** + +The second step focuses on deciding upon the number of RBM layers in the final model, +specifying the size of each visible and hidden layer, and appropriately initializing +the RBM parameters. + +For the digit classification, we will use a simple model with 784 visible units (as +the images in MNIST dataset are 28x28 pixels) and one hidden layer of 128 units. + +```python +input_dim = 784 +hidden_layers = [128] + +rbm_layers = [] +current_input = train_inputs +current_val_input = val_inputs +for h_units in hidden_layers: + rbm = RBM(input_dim, h_units).to(device) + rbm_layers.append(rbm) # store the RBM + input_dim = h_units # update the input size for the next RBM +``` + +

+ RBM training +
Source: https://icecreamlabs.medium.com/deep-belief-networks-all-you-need-to-know-68aa9a71cc53 +

+ +**Step 3. Pre-train RBM layers – using Greedy learning algorithm, that +implies layer-by-layer approach, determine the relationship +between variables in one layer and variables in layer above.** + +During this step, the RBMs are pre-trained, one by one. Each RBM, trained layer by +layer, is stored in a list for stacking later. + +```python +input_dim = 784 +hidden_layers = [128] + + +rbm_layers = [] +current_input = train_inputs +current_val_input = val_inputs +for h_units in hidden_layers: + rbm = RBM(input_dim, h_units).to(device) + rbm_train_loss, rbm_val_loss = rbm.pretrain(current_input, current_val_input, rbm_batch_size=batch_size) # train RBM + rbm_layers.append(rbm) # store the RBM + input_dim = h_units # update the input size for the next RBM +``` + +The images were reconstructed every five epochs to monitor the process of training +the RBM and how well the model learns new features. Below can be observed the results +after five epochs of training. + +

+ DBN architecture +
The Original and Recreated images after 5 training epochs +

+ +It can be noticed that while it keeps the main characteristics of the original images, +the recreated ones are still blurred, especially compared to the results obtained after +10 epochs of training. It can also be noticed that more training epochs lead to more +defined boundaries of digits, and overall better quality of reconstructions. + +

+ DBN architecture +
The Original and Recreated images after 10 training epochs +

+ +The improvement in RBM performance is also proven by the train and validation loss +dynamic, which shows a decreasing trend, close to 0 in the end. + +

+ DBN architecture +

+ +**Step 4. Feature extraction – use hidden activation of the final RBM +layer as features that can be used for fine-tuned for specific +tasks.** + +This step focuses on the features (hidden activations) extraction from the provided input. +After extracting them, the features will be used as input for the next RBMs. + + +```python +input_dim = 784 +hidden_layers = [128] + +rbm_layers = [] +current_input = train_inputs +current_val_input = val_inputs +for h_units in hidden_layers: + rbm = RBM(input_dim, h_units).to(device) + rbm_train_loss, rbm_val_loss = rbm.pretrain(current_input, current_val_input, rbm_batch_size=batch_size) # train RBM + current_input = rbm.extract_features(current_input) # extracting features + current_val_input = rbm.extract_features(current_val_input) + rbm_layers.append(rbm) # store the RBM + input_dim = h_units # update the input size for the next RBM +``` + +**Step 5. Initialize Supervisor Layer – add a supervised layer (usually +softmax) on top of the last RBM layer.** + +For the next step, we initialize the DBN class, specifying what classifier we will be +using. As we are dealing with a classification problem, we will be using Softmax as the +last activation layer. + +```python +class DBN(nn.Module): + def __init__(self, rbm_layers, output_classes): + super(DBN, self).__init__() + self.rbms = nn.ModuleList(rbm_layers) + self.classifier = nn.Sequential( + nn.Linear(rbm_layers[-1].hidden_units, output_classes), # fully connected layer + nn.Softmax(dim=1), # Softmax for classification + ) + + def forward(self, x): + for rbm in self.rbms: + x = torch.sigmoid(torch.matmul(x, rbm.weights.t()) + rbm.hidden_bias) + return self.classifier(x) +``` + +**Step 6. ~~Word-by-word, say loudly and clearly the spell.~~ Train the model.** + +During this step, the RBMs are used only for feature extraction, while the supervised +layer is trained on labelled data to classify the digits accordingly. + +```python +def train_dbn(dbn, dbn_train_data, dbn_train_labels, dbn_val_data, dbn_val_labels, epochs=25, lr=0.05): + criterion = nn.CrossEntropyLoss() + train_losses, val_losses = [], [] + + for epoch in range(epochs): + dbn.train() + outputs = dbn(dbn_train_data) + train_loss = criterion(outputs, dbn_train_labels) + train_loss.backward() + train_losses.append(train_loss.item()) + + dbn.eval() + with torch.no_grad(): + val_outputs = dbn(dbn_val_data) + val_loss = criterion(val_outputs, dbn_val_labels) + val_losses.append(val_loss.item()) + + print(f"Epoch {epoch + 1}/{epochs}, Train Loss: {train_loss.item():.4f}, Val Loss: {val_loss:.4f}") + + return train_losses, val_losses +``` + +The model was trained for 25 epochs, during which the loss was reduced considerably, +as it can be observed from the diagram provided below. + +

+ DBN architecture +

+ +**Step 7. Fine-tune the model.** + +Use labeled data and backpropagation to update entire network parameters, +adjusting them to minimize the loss, optimize the gradient descent, adjusting weights +and biases based on the gradients of the loss with respect to the parameters, as well +as the number of epochs, hidden layers, or the batch size +(you may perform this step in an iterative manner). + +```python +def train_dbn(dbn, dbn_train_data, dbn_train_labels, dbn_val_data, dbn_val_labels, epochs=25, lr=0.05): + criterion = nn.CrossEntropyLoss() + optimizer = optim.Adam(dbn.parameters(), lr=lr) + train_losses, val_losses = [], [] + + for epoch in range(epochs): + dbn.train() + optimizer.zero_grad() + outputs = dbn(dbn_train_data) + train_loss = criterion(outputs, dbn_train_labels) + train_loss.backward() + optimizer.step() + train_losses.append(train_loss.item()) + + dbn.eval() + with torch.no_grad(): + val_outputs = dbn(dbn_val_data) + val_loss = criterion(val_outputs, dbn_val_labels) + val_losses.append(val_loss.item()) + + print(f"Epoch {epoch + 1}/{epochs}, Train Loss: {train_loss.item():.4f}, Val Loss: {val_loss:.4f}") + + return train_losses, val_losses +``` + +**Step 8. Evaluate the trained DBN on a test set.** + +This step is intended to show the performance of the model, evaluating it against a test +set or, in other words, unseen data. Using the provided source code, it was possible to +achieve an accuracy of 0.94, which is a pretty solid result. + +```python +def evaluate(dbn_model, data, labels): + with torch.no_grad(): + outputs = dbn_model(data) + _, predicted = torch.max(outputs, 1) + accuracy = (predicted == labels).sum().item() / len(labels) + + print(f"Accuracy: {accuracy:.2f}") +``` + +**Step 9. Postprocessing.** + +After performing everything mentioned above you might want to improve the performance +of the model by adding some additional steps as thresholding or normalization. + +```python +# normalization +transform = transforms.Compose([ + transforms.ToTensor(), + transforms.Normalize((0.5,), (0.5,)), + transforms.Lambda(lambda x: (x + 1) / 2) +]) +``` + +## Use Cases + +   On broad terms, Deep Belief Networks can be described as more +efficient version of feedforward neural network. There follows a +vast applicability of this type of networks: image recognition, +recognizing, clustering and generating of images, video sequences +and motion-caption data. For example, ConvDBNs are good to use +for tasks that involve grid-like data, such as images, while +Temporal DBNs are better for speech recognition or natural +language processing tasks. To sum up, the fields they are widely +used are: + +* Computer Vision – object recognition and classification [6]; +* NLP – sentiment analysis and text classification; +* Speech recognition - transcribing speech into text; +* Recommendation Systems - giving suggestions based on previous inputs; +* Financial analysis – predicting stock market and how risky some actions are; +* Bioinformatics – predict the interactions between components like proteins, +find new drugs and predict how genes may express. + +## Drawbacks and limits + +   In spite of a lot advantages DBNs brings, they are still a quite +early version of a deep neural network and might be not so +effective as its newest aliases due to a considerable set of +drawbacks, such as: + +* high hardware requirements; +* complex data model that is difficult to train; +* difficult to use by unexperienced people; +* requires of a huge amount of data for a good performance; +* requires classifiers to grasp the output. + +## Conclusion + +   To summarize everything up, Deep Belief Network is an +early-deep-learning-days architecture, composed of multiple +Restricted Boltzmann Machines, aimed to perform classification, +clustering and generation tasks. Due to the presence of RBMs, +trained once at a time, it uses Greedy Algorithm to train each +layer until all RBM are trained and the output can be passed to +supervised learning model. Although it is quite old, it still +has its fields of applicability where it performs considerable +better than other known algorithms, that being one of the reasons +to know about its existence. Not the last, it is just an awesome +algorithm with a unique architecture that might bring you fun +while diving in! + +

+ Closing image +
Source: https://www.123rf.com/photo_166564781_that-s-all-folks-vintage-movie-ending-screen-background-the-end-vector-illustration-.html +

+ + +## References +[1] - E. ARGOUARC'H, F. DESBOUVRIES, E. BARAT, E. KAWASAKI, +_Generative vs. Discriminative modeling under the lens of uncertainty quantification_, +doi: arXiv.2406.09172. Access link: https://arxiv.org/pdf/2406.09172 \ +[2] - M. ZAMBRA, A. TESTOLIN, M. ZoORZI, _A developmental approach for training deep +belief networks_, doi: arXiv.2207.05473. Access link: https://arxiv.org/pdf/2207.05473 \ +[3] - MEDIUM, _What Are RBMs, Deep Belief Networks and Why Are They Important to Deep Learning?_. +Article. [quoted 29.07.2023]. Access link: https://medium.com/swlh/what-are-rbms-deep-belief-networks-and-why-are-they-important-to-deep-learning-491c7de8937a \ +[4] - JAVATPOINT, _Restricted Boltzmann Machine_. Article. [quoted 30.07.2023]. Access link: +https://www.javatpoint.com/keras-restricted-boltzmann-machine \ +[5] - AL-JABERY K. K., WUNSCH D. C., _Selected approaches to supervised learning_, +Computational Learning Approaches to Data Analytics in Biomedical Applications, 2020 \ +[6] - H. LEE, R. GROSSE, Ra. RANGANATH, A. Y. NG, +_Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations_, +doi: 10.1145/1553374.1553453, Access link: https://ai.stanford.edu/~ang/papers/icml09-ConvolutionalDeepBeliefNetworks.pdf? \ +[7] - F.Y ZHOU, J. Q YIN, Y. Yang, H.T. ZHANG, +_Online recognition of human actions based on temporal deep belief neural network_, +doi: 10.16383/j.aas.2016.c150629. +Access link: https://www.researchgate.net/publication/306168057_Online_recognition_of_human_actions_based_on_temporal_deep_belief_neural_network \ +[8] - ABIRAMI S., CHITRA P., _The Digital Twin Paradigm for +Smarter Systems and Environments: The Industry Use Cases_, +Advances in Computers, 2020 \ No newline at end of file diff --git a/article-deep_belief_network/README.md b/article-deep_belief_network/README.md new file mode 100644 index 0000000..3ccf044 --- /dev/null +++ b/article-deep_belief_network/README.md @@ -0,0 +1,59 @@ +# Deep Belief Network: The hidden hero behind the Deep Learning (r)evolution + +Welcome to an article dedicated to the Deep Belief Network discussion! Be ready to +discover some dark magic and impressive algorithms behind modern generative AI. +## Article Summary + +The proposed article focuses on the discussion of Deep Belief Networks (DBN), a type of +stochastic neural network, one of the first algorithms to prove the feasibility of +unsupervised learning and hidden layers. It highlights the extensive usage of DBNs in +different tasks, such as generation, recognition and feature extraction, focusing the +implementation on digit classification based on the MNIST dataset. + +## Getting Started + +Follow the following steps to check how Deep Belief Networks work for classification +of handwritten digits! + + +1. **Create a New Project**: Create a new empty Python Project on your device and navigate +to the project directory. + + ```sh + cd my-dbn-project + ``` + +2. **Prepare Your Environment**: Before you begin, make sure you have a virtual environment set up for your project. If not, create and activate a virtual environment: + + ```sh + python -m venv dbn_env + source dbn_env/bin/activate # On Windows: .\dbn_env\Scripts\activate + ``` + +3. **Copy the source code**: Inside the empty directory, add the file with the provided +source (`dbn_implementation.py`) code and the requirements (`requirements.txt`). + +4. **Install Requirements**: Inside your virtual environment, install the required packages from the `requirements.txt` file: + + ```sh + pip install -r requirements.txt + ``` + +5. **Run the source code**: To run the source code provided for the DBN, use the command +provided below. It can take some time, as the MNIST dataset should be installed. + + ```sh + python dbn_implementation.py + ``` + +## Next Steps + +Now that you have successfully run the provided source code, consider optimizing some +hyperparameters or making your own DBN implementation for a different task (you may +consider fraud detection or text classification based on topics). + +Make sure to share your achievements with us!😉 + +For any questions or further assistance, reach out to our community. + +Happy coding!✨💙 diff --git a/article-deep_belief_network/images/dbn_RBM_train_validation_loss.png b/article-deep_belief_network/images/dbn_RBM_train_validation_loss.png new file mode 100644 index 0000000..b81edd2 Binary files /dev/null and b/article-deep_belief_network/images/dbn_RBM_train_validation_loss.png differ diff --git a/article-deep_belief_network/images/dbn_actual_reconstructed_images_10_epochs.png b/article-deep_belief_network/images/dbn_actual_reconstructed_images_10_epochs.png new file mode 100644 index 0000000..2559653 Binary files /dev/null and b/article-deep_belief_network/images/dbn_actual_reconstructed_images_10_epochs.png differ diff --git a/article-deep_belief_network/images/dbn_actual_reconstructed_images_5_epochs.png b/article-deep_belief_network/images/dbn_actual_reconstructed_images_5_epochs.png new file mode 100644 index 0000000..6ddd1fc Binary files /dev/null and b/article-deep_belief_network/images/dbn_actual_reconstructed_images_5_epochs.png differ diff --git a/article-deep_belief_network/images/dbn_end_picture.jpg b/article-deep_belief_network/images/dbn_end_picture.jpg new file mode 100644 index 0000000..7350271 Binary files /dev/null and b/article-deep_belief_network/images/dbn_end_picture.jpg differ diff --git a/article-deep_belief_network/images/dbn_network.jpg b/article-deep_belief_network/images/dbn_network.jpg new file mode 100644 index 0000000..3202ba0 Binary files /dev/null and b/article-deep_belief_network/images/dbn_network.jpg differ diff --git a/article-deep_belief_network/images/dbn_rbm_layers.png b/article-deep_belief_network/images/dbn_rbm_layers.png new file mode 100644 index 0000000..81e45cf Binary files /dev/null and b/article-deep_belief_network/images/dbn_rbm_layers.png differ diff --git a/article-deep_belief_network/images/dbn_structure.png b/article-deep_belief_network/images/dbn_structure.png new file mode 100644 index 0000000..ba3696b Binary files /dev/null and b/article-deep_belief_network/images/dbn_structure.png differ diff --git a/article-deep_belief_network/images/dbn_suggestive_meme.jpg b/article-deep_belief_network/images/dbn_suggestive_meme.jpg new file mode 100644 index 0000000..697abe1 Binary files /dev/null and b/article-deep_belief_network/images/dbn_suggestive_meme.jpg differ diff --git a/article-deep_belief_network/images/dbn_train_validation_loss.png b/article-deep_belief_network/images/dbn_train_validation_loss.png new file mode 100644 index 0000000..ae37a18 Binary files /dev/null and b/article-deep_belief_network/images/dbn_train_validation_loss.png differ diff --git a/article-deep_belief_network/src/dbn_implementation.py b/article-deep_belief_network/src/dbn_implementation.py new file mode 100644 index 0000000..292e48d --- /dev/null +++ b/article-deep_belief_network/src/dbn_implementation.py @@ -0,0 +1,359 @@ +import matplotlib.pyplot as plt +import torch.optim as optim +from torchvision import datasets, transforms +from torch.utils.data import DataLoader, random_split + +import torch +import torch.nn as nn + + +class RBM(nn.Module): + """ + Restricted Boltzmann Machine implementation to extract the features from the input data in an unsupervised manner + """ + def __init__(self, visible_units, hidden_units): + """ + Initialize the Restricted Boltzmann Machine + + Args: + visible_units: the number of visible units + hidden_units: the number of hidden units + """ + super(RBM, self).__init__() + self.visible_units = visible_units + self.hidden_units = hidden_units + self.weights = nn.Parameter(torch.empty(hidden_units, visible_units)) # weight matrix + nn.init.xavier_uniform_(self.weights) # weights initialization + self.visible_bias = nn.Parameter(torch.zeros(visible_units)) # bias for the visible units + self.hidden_bias = nn.Parameter(torch.zeros(hidden_units)) # bias for the hidden units + + def sample_h(self, v): + """ + Computes the probability of hidden units given the visible units + Args: + v: activations for visible layers + Returns: + The probability of hidden layer activations + """ + h_prob = torch.sigmoid(torch.matmul(v, self.weights.t()) + self.hidden_bias) + return h_prob + + def sample_v(self, h): + """ + Computes the probability of visible units given the hidden units + Args: + h: activations for hidden layers + Returns: + The probability of visible layer activations + """ + v_prob = torch.sigmoid(torch.matmul(h, self.weights) + self.visible_bias) + return v_prob + + def pretrain(self, rbm_train_data, rbm_val_data, rbm_batch_size, epochs=10, lr=0.001): + """ + Pretrain the Restricted Boltzmann Machine using unsupervised learning, minimizing the reconstruction + errors using Contrastive Divergence + + Args: + rbm_train_data: training data + rbm_val_data: validation data + rbm_batch_size: the batch size for training + epochs: number of training epochs + lr: learning rate + + Returns: + The lists of training and validation losses for each epoch + """ + num_samples = rbm_train_data.size(0) + train_losses, val_losses = [], [] + for epoch in range(epochs): + epoch_loss = 0 + indices = torch.randperm(num_samples) + data = rbm_train_data[indices] + + for i in range(0, num_samples, rbm_batch_size): + batch = data[i:i + rbm_batch_size].to(next(self.parameters()).device) + loss = self.train_rbm(batch, lr) + epoch_loss += loss.item() * batch.size(0) + epoch_loss /= num_samples + train_losses.append(epoch_loss) + + # visualize the reconstructed images for each 5 epochs + if (epoch + 1) % 5 == 0: + print(f"Reconstruction on epoch {epoch + 1}:") + self.visualize_reconstructions(data) + with torch.no_grad(): + val_loss = self.evaluate(rbm_val_data) + val_losses.append(val_loss) + + print(f"Epoch {epoch + 1}/{epochs}, Loss: {epoch_loss:.4f}") + + return train_losses, val_losses + + def train_rbm(self, v, lr=0.01): + """ + Perform one training step of Contrastive Divergence + + Args: + v: the input batch of visible layer data, shape (batch_size, visible_units) + lr: learning rate for the parameter updates + + Returns: + Reconstruction loss for the input batch + """ + h_prob = self.sample_h(v) + v_reconstructed = self.sample_v(h_prob) + h_prob_reconstructed = self.sample_h(v_reconstructed) + + positive_grad = torch.matmul(v.t(), h_prob) + negative_grad = torch.matmul(v_reconstructed.t(), h_prob_reconstructed) + + with torch.no_grad(): + self.weights += lr * (positive_grad - negative_grad).t() + self.visible_bias += lr * torch.sum(v - v_reconstructed, dim=0) + self.hidden_bias += lr * torch.sum(h_prob - h_prob_reconstructed, dim=0) + + return torch.mean((v - v_reconstructed) ** 2) + + def evaluate(self, data): + """ + Compute the reconstruction error on the given dataset to evaluate the RBM performance + + Args: + data: the dataset to evaluate, shape (num_samples, visible_units) + + Returns: + Mean squared reconstruction error + """ + with torch.no_grad(): + v_reconstructed = self.sample_v(self.sample_h(data)) + return torch.mean((data - v_reconstructed) ** 2).item() + + @torch.no_grad() + def extract_features(self, data): + """ + Extract features from the hidden layer of the RBM + + Args: + data: input data to extract features from, shape (num_samples, visible_units) + + Returns: + The activations for the hidden layers, shape (num_samples, hidden_units) + """ + h_activations = self.sample_h(data) + return h_activations + + @torch.no_grad() + def visualize_reconstructions(self, data, num_samples=16): + """ + Visualize original and reconstructed images for the evaluated samples + + Args: + data: the input dataset to reconstruct. + num_samples (default: 16): the number of samples to visualize + """ + v = data[:num_samples] + v_reconstructed = self.sample_v(self.sample_h(v)) + fig, axes = plt.subplots(2, num_samples // 2, figsize=(12, 4)) + for i, ax in enumerate(axes.flatten()): + ax.imshow(v[i].view(28, 28).cpu(), cmap="gray") + ax.axis("off") + plt.suptitle("Original images:") + plt.show() + + fig, axes = plt.subplots(2, num_samples // 2, figsize=(12, 4)) + for i, ax in enumerate(axes.flatten()): + ax.imshow(v_reconstructed[i].view(28, 28).cpu(), cmap="gray") + ax.axis("off") + plt.suptitle("Reconstructed images:") + plt.show() + + +class DBN(nn.Module): + """ + Deep Belief Network implementation to combine multiple Restricted Boltzmann Machines and adding supervised training + for classification task + """ + def __init__(self, rbm_layers, output_classes): + """ + Initialize the Deep Belief Network + + Args: + rbm_layers: a list of pretrained RBMs stacked together + output_classes: the number of output classes for the classification task + """ + super(DBN, self).__init__() + self.rbms = nn.ModuleList(rbm_layers) + self.classifier = nn.Sequential( + nn.Linear(rbm_layers[-1].hidden_units, output_classes), # fully connected layer + nn.Softmax(dim=1), # Softmax for classification + ) + + def forward(self, x): + """ + Perform a forward pass through the DBN + + Args: + x: the input tensor, shape (batch_size, visible_units) + + Returns: + The logits for each class, shape (batch_size, output_classes) + """ + for rbm in self.rbms: + x = torch.sigmoid(torch.matmul(x, rbm.weights.t()) + rbm.hidden_bias) + return self.classifier(x) + + +def train_dbn(dbn, dbn_train_data, dbn_train_labels, dbn_val_data, dbn_val_labels, epochs=25, lr=0.05): + """ + Train the Deep Belief Network (DBN) using supervised learning + + Args: + dbn: the Deep Belief Network model to train + dbn_train_data: the training data + dbn_train_labels: the training data labels + dbn_val_data: the validation data + dbn_val_labels: the validation labels + epochs (default: 15): the number of training epochs + lr (default: 0.05): learning rate for the Adam optimizer + + Returns: + The lists of training and validation losses per epoch + """ + criterion = nn.CrossEntropyLoss() + optimizer = optim.Adam(dbn.parameters(), lr=lr) + train_losses, val_losses = [], [] + + for epoch in range(epochs): + dbn.train() + optimizer.zero_grad() + outputs = dbn(dbn_train_data) + train_loss = criterion(outputs, dbn_train_labels) + train_loss.backward() + optimizer.step() + train_losses.append(train_loss.item()) + + dbn.eval() + with torch.no_grad(): + val_outputs = dbn(dbn_val_data) + val_loss = criterion(val_outputs, dbn_val_labels) + val_losses.append(val_loss.item()) + + print(f"Epoch {epoch + 1}/{epochs}, Train Loss: {train_loss.item():.4f}, Val Loss: {val_loss:.4f}") + + return train_losses, val_losses + + +def evaluate(dbn_model, data, labels): + """ + Evaluate the trained DBN on a test dataset + + Args: + dbn_model: the trained Deep Belief Network model + data: the test data, shape (num_samples, visible_units) + labels: the test data labels, shape (num_samples) + """ + with torch.no_grad(): + outputs = dbn_model(data) + _, predicted = torch.max(outputs, 1) + accuracy = (predicted == labels).sum().item() / len(labels) + + print(f"Accuracy: {accuracy:.2f}") + + +def plot_losses(train_losses, val_losses, title): + """ + Plot the training and validation loss curves + + Args: + train_losses: the list of training losses per epoch + val_losses: the list of validation losses per epoch + title: the title of the plot + """ + plt.figure(figsize=(10, 6)) + plt.plot(train_losses, label="Train Loss") + plt.plot(val_losses, label="Validation Loss") + plt.xlabel("Epochs") + plt.ylabel("Loss") + plt.title(title) + plt.legend() + plt.grid(True) + plt.show() + + +def prepare_data(data_loader): + """ + Prepare data by flattening and batching + + Args: + data_loader: a dataLoader object + + Returns: + the flattened input data and corresponding labels + """ + inputs, labels = [], [] + for batch in data_loader: + images, targets = batch + images = images.view(images.size(0), -1) + inputs.append(images) + labels.append(targets) + inputs = torch.cat(inputs) + labels = torch.cat(labels) + return inputs, labels + + +# normalization +transform = transforms.Compose([ + transforms.ToTensor(), + transforms.Normalize((0.5,), (0.5,)), + transforms.Lambda(lambda x: (x + 1) / 2) +]) + +mnist_data = datasets.MNIST(root='./data', train=True, download=True, transform=transform) + +train_size = int(0.6 * len(mnist_data)) +val_size = int(0.25 * len(mnist_data)) +test_size = len(mnist_data) - train_size - val_size + +train_data, val_data, test_data = random_split(mnist_data, [train_size, val_size, test_size]) + +batch_size = 32 +train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True) +val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=False) +test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False) + +device = torch.device("cuda" if torch.cuda.is_available() else "cpu") + +train_inputs, train_labels = prepare_data(train_loader) +val_inputs, val_labels = prepare_data(val_loader) +test_inputs, test_labels = prepare_data(test_loader) + +train_inputs, train_labels = train_inputs.to(device), train_labels.to(device) +val_inputs, val_labels = val_inputs.to(device), val_labels.to(device) +test_inputs, test_labels = test_inputs.to(device), test_labels.to(device) + +input_dim = 784 +hidden_layers = [128] +output_classes = 10 + +rbm_layers = [] +current_input = train_inputs +current_val_input = val_inputs +for h_units in hidden_layers: + rbm = RBM(input_dim, h_units).to(device) + rbm_train_loss, rbm_val_loss = rbm.pretrain(current_input, current_val_input, rbm_batch_size=batch_size) + current_input = rbm.extract_features(current_input) + current_val_input = rbm.extract_features(current_val_input) + rbm_layers.append(rbm) + input_dim = h_units + + plot_losses(rbm_train_loss, rbm_val_loss, title="RBM training and validation losses") + +dbn = DBN(rbm_layers=rbm_layers, output_classes=output_classes) +dbn.to(device) + +dbn_train_loss, dbn_val_loss = train_dbn(dbn, train_inputs, train_labels, val_inputs, val_labels) + +plot_losses(dbn_train_loss, dbn_val_loss, title="DBN Training and Validation Loss") + +evaluate(dbn, test_inputs, test_labels) diff --git a/article-deep_belief_network/src/requirements.txt b/article-deep_belief_network/src/requirements.txt new file mode 100644 index 0000000..692974b --- /dev/null +++ b/article-deep_belief_network/src/requirements.txt @@ -0,0 +1,3 @@ +matplotlib==3.10.0 +torch==2.5.1 +torchvision==0.20.1