diff --git a/README.md b/README.md
index bedd52a..d4e5983 100644
--- a/README.md
+++ b/README.md
@@ -20,7 +20,7 @@
4. Evaluation & Downstream Analysis: The trained model is evaluated using the test dataset by calculating metrics such as precision, recall, f1-score, and accuracy. Various visualizations, such as ROC curve of class annotation, feature rank plots, heatmap of top genes per class, [DGE analysis](https://colab.research.google.com/github/infocusp/scaLR/blob/main/tutorials/analysis/differential_gene_expression/dge.ipynb), and [gene recall curves](https://colab.research.google.com/github/infocusp/scaLR/blob/main/tutorials/analysis/gene_recall_curve/gene_recall_curve.ipynb), are generated.
-The following flowchart explains the major steps of the scaLR platform.
+**The following flowchart illustrates the major steps of the scaLR platform.**

@@ -29,7 +29,6 @@ The following flowchart explains the major steps of the scaLR platform.
- ScaLR can be installed using git or pip. It is tested with Python 3.10, and using that environment is recommended.
-
```
conda create -n scaLR_env python=3.10
@@ -47,9 +46,9 @@ pip install -r requirements.txt
```
pip install pyscaLR
```
-*Note* If the user wants to run the entire pipeline via installing pip pyscalr, they should clone/download these files(`pipeline.py` and `config.yaml`) from the git repository.
+**Note:** To run the entire pipeline after installing `pyscaLR` via pip, clone/download `pipeline.py` and `config.yaml` from the git repository.
-## Input Data
+## Input data format
- Currently the pipeline expects all datasets in [anndata](https://anndata.readthedocs.io/en/latest/tutorials/notebooks/getting-started.html) formats (`.h5ad` files only).
- The anndata object should contain cell samples as `obs` and genes as `var`.
- `adata.X`: contains normalized gene counts/expression values (`log1p` normalization with range `0-10` expected).
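As a quick sanity check of the normalization expectation above, raw counts can be `log1p`-transformed into the expected `0-10` range. A minimal NumPy sketch with toy counts (not part of the scaLR API):

```python
import numpy as np

# Toy raw counts standing in for a cell-by-gene matrix (hypothetical data).
raw_counts = np.random.default_rng(0).poisson(5.0, size=(100, 50)).astype(np.float32)

# log1p normalization, as expected in adata.X by the pipeline.
normalized = np.log1p(raw_counts)

# Values should land in the expected ~0-10 range.
print(normalized.min(), normalized.max())
```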
@@ -60,15 +59,192 @@ pip install pyscaLR
## How to run
1. Modify the configuration file for each stage of the pipeline, available inside the config folder [config.yml], as per your requirements. Simply omit/comment out stages of the pipeline you do not wish to run.
-2. Refer config.yml & it's detailed config [README](https://github.com/infocusp/scaLR/blob/main/config/README.md) file on how to use different parameters and files.
+2. Refer to **config.yml** and **its detailed config** [README](https://github.com/infocusp/scaLR/blob/main/config/README.md) on how to use different parameters and files.
3. Then use the `pipeline.py` file to run the entire pipeline according to your configurations. It takes the config path as an argument (`-c | --config`), along with optional flags to log all parts of the pipeline (`-l | --log`) and to profile memory usage (`-m | --memoryprofiler`).
4. `python pipeline.py --config /path/to/config.yaml -l -m` to run scaLR.
-## Examples configs
+## Example configs
+
+### Config for cell type classification and biomarker identification
+
+NOTE: The model parameters below are only suggestions. Feel free to experiment with them to tune the model and improve the results.
+
+An example configuration file for the current dataset, incorporating the edits below, can be found at `scaLR/tutorials/pipeline/config_celltype.yaml`. Set the device to `cuda` or `cpu` as required.
+
+- **Device setup**
+    - Set device: 'cuda' for GPU-enabled runs, or device: 'cpu' for CPU-only runs.
+- **Experiment Config**
+    - The default exp_run number is 0. If not changed, the cell type classification experiment will be exp_run_0, containing all pipeline results.
+- **Data Config**
+    - Update the full_datapath to `data/modified_adata.h5ad` (as we will include GeneRecallCurve in the downstream analysis).
+ - Specify the num_workers value for effective parallelization.
+ - Set target to cell_type.
+- **Feature Selection**
+ - Specify the num_workers value for effective parallelization.
+ - Update the model layers to [5000, 10], as there are only 10 cell types in the dataset.
+ - Change epoch to 10.
+- **Final Model Training**
+ - Update the model layers to the same as for feature selection: [5000, 10].
+ - Change epoch to 100.
+- **Analysis**
+ - Downstream Analysis
+ - Uncomment the test_samples_downstream_analysis section.
+ - Update the reference_genes_path to `scaLR/tutorials/pipeline/grc_reference_gene.csv`.
+ - Refer to the section below:
+ ```
+ # Config file for pipeline run for cell type classification.
+
+ # DEVICE SETUP.
+ device: 'cuda'
+
+ # EXPERIMENT.
+ experiment:
+ dirpath: 'scalr_experiments'
+ exp_name: 'exp_name'
+ exp_run: 0
+
+ # DATA CONFIG.
+ data:
+ sample_chunksize: 20000
+
+ train_val_test:
+ full_datapath: 'data/modified_adata.h5ad'
+ num_workers: 2
+
+ splitter_config:
+ name: GroupSplitter
+ params:
+ split_ratio: [7, 1, 2.5]
+ stratify: 'donor_id'
+
+ # split_datapaths: ''
+
+ # preprocess:
+ # - name: SampleNorm
+ # params:
+ # **args
+
+ # - name: StandardScaler
+ # params:
+ # **args
+
+ target: cell_type
+
+ # FEATURE SELECTION.
+ feature_selection:
+
+ # score_matrix: '/path/to/matrix'
+ feature_subsetsize: 5000
+ num_workers: 2
+
+ model:
+ name: SequentialModel
+ params:
+ layers: [5000, 10]
+ weights_init_zero: True
+
+ model_train_config:
+ trainer: SimpleModelTrainer
+
+ dataloader:
+ name: SimpleDataLoader
+ params:
+ batch_size: 25000
+ padding: 5000
+
+ optimizer:
+ name: SGD
+ params:
+ lr: 1.0e-3
+ weight_decay: 0.1
+
+ loss:
+ name: CrossEntropyLoss
+
+ epochs: 10
+
+ scoring_config:
+ name: LinearScorer
+
+ features_selector:
+ name: AbsMean
+ params:
+ k: 5000
+
+ # FINAL MODEL TRAINING.
+ final_training:
+
+ model:
+ name: SequentialModel
+ params:
+ layers: [5000, 10]
+ dropout: 0
+ weights_init_zero: False
+
+ model_train_config:
+ resume_from_checkpoint: null
+
+ trainer: SimpleModelTrainer
+
+ dataloader:
+ name: SimpleDataLoader
+ params:
+ batch_size: 15000
+
+ optimizer:
+ name: Adam
+ params:
+ lr: 1.0e-3
+ weight_decay: 0
+
+ loss:
+ name: CrossEntropyLoss
+
+ epochs: 100
+
+ callbacks:
+ - name: TensorboardLogger
+ - name: EarlyStopping
+ params:
+ patience: 3
+ min_delta: 1.0e-4
+ - name: ModelCheckpoint
+ params:
+ interval: 5
+ analysis:
+
+ model_checkpoint: ''
-### Config edits (For clinical condition-specific biomarker identification and DGE analysis)
+ dataloader:
+ name: SimpleDataLoader
+ params:
+ batch_size: 15000
+
+ gene_analysis:
+ scoring_config:
+ name: LinearScorer
+
+ features_selector:
+ name: ClasswisePromoters
+ params:
+ k: 100
+ test_samples_downstream_analysis:
+ - name: GeneRecallCurve
+ params:
+ reference_genes_path: 'scaLR/tutorials/pipeline/grc_reference_gene.csv'
+ top_K: 300
+ plots_per_row: 3
+ features_selector:
+ name: ClasswiseAbs
+ params: {}
+ - name: Heatmap
+ params: {}
+ - name: RocAucCurve
+ params: {}
+ ```
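For intuition, the `layers: [5000, 10]` setting maps the 5000 selected features to the 10 cell-type classes. Assuming `SequentialModel` builds a plain linear stack, the shapes work out as in this hypothetical NumPy sketch (not scaLR's implementation):

```python
import numpy as np

# layers: [5000, 10] ~ one linear layer: 5000 selected genes -> 10 class logits.
n_features, n_classes = 5000, 10
W = np.zeros((n_features, n_classes), dtype=np.float32)  # weights_init_zero: True
b = np.zeros(n_classes, dtype=np.float32)

x = np.random.default_rng(0).random((3, n_features), dtype=np.float32)  # 3 cells
logits = x @ W + b
print(logits.shape)  # one logit per class, per cell
```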
+### Config for clinical condition-specific biomarker identification and DGE analysis
-An example configuration file for the current dataset, incorporating the edits below, can be found at: scaLR/tutorials/pipeline/config_clinical.yaml.Please update the device as CUDA or CPU as per runtype
+An example configuration file is available at `scaLR/tutorials/pipeline/config_clinical.yaml`. Set the device to `cuda` or `cpu` as required.
- Experiment Config
  - Make sure to change the exp_run number if an earlier experiment (here, cell type classification) used the same number. Since we have already run one experiment, we'll change the number to '1'.
@@ -83,10 +259,10 @@ An example configuration file for the current dataset, incorporating the edits b
- epoch as 100.
- Analysis
- Downstream Analysis
- - Uncomment the full_samples_downstream_analysis section.
+  - Uncomment the full_samples_downstream_analysis section in the example config file.
  - We are not performing the 'gene_recall_curve' analysis in this case. It could be performed if COVID-19/normal-specific reference genes were available, but for the normal condition the set of candidate genes is too broad.
- - There are two options to perform differential gene expression (DGE) analysis: DgePseudoBulk and DgeLMEM. The parameters are updated as follows. Note that DgeLMEM may take a bit more time, as the multiprocessing is not very efficient with only 2 CPUs in the current Colab runtime.
- - Please refer to the section below:
+ - There are two options to perform differential gene expression (DGE) analysis: **DgePseudoBulk and DgeLMEM**. The parameters are updated as follows. Note that DgeLMEM may take a bit more time, as the multiprocessing is not very efficient with only 2 CPUs in the current Colab runtime.
+ - Refer to the section below:
```
analysis:
@@ -102,67 +278,6 @@ An example configuration file for the current dataset, incorporating the edits b
scoring_config:
name: LinearScorer
- features_selector:
- name: ClasswisePromoters
- params:
- k: 100
- full_samples_downstream_analysis:
- - name: Heatmap
- params:
- top_n_genes: 100
- - name: RocAucCurve
- params: {}
- - name: DgePseudoBulk
- params:
- celltype_column: 'cell_type'
- design_factor: 'disease'
- factor_categories: ['COVID-19', 'normal']
- sum_column: 'donor_id'
- cell_subsets: ['conventional dendritic cell', 'natural killer cell']
- - name: DgeLMEM
- params:
- fixed_effect_column: 'disease'
- fixed_effect_factors: ['COVID-19', 'normal']
- group: 'donor_id'
- celltype_column: 'cell_type'
- cell_subsets: ['conventional dendritic cell']
- gene_batch_size: 1000
- coef_threshold: 0.1
- ```
-### Config edits (For clinical condition-specific biomarker identification and DGE analysis)
- An example configuration file for the current dataset, incorporating the edits below, can be found at: scaLR/tutorials/pipeline/config_clinical.yaml.Please update the device as cuda or cpu as per runtype
-
-- Experiment Config
- - Make sure to change the exp_run number if you have an experiment with the same number earlier related to cell classification.As we have done one experiment earlier, we'll change the number now to '1'.
-- Data Config
- - The full_datapath remains the same as above.
- - Change the target to disease (this column contains data for clinical conditions, COVID-19/normal).
-- Feature Selection
- - Update the model layers to [5000, 2], as there are only two types of clinical conditions.
- - epoch as 10.
-- Final Model Training
- - Update the model layers to the same as for feature selection: [5000, 2].
- - epoch as 100.
-- Analysis
- - Downstream Analysis
- - Uncomment the full_samples_downstream_analysis section.
- - We are not performing the 'gene_recall_curve' analysis in this case. It can be performed if the COVID-19/normal specific genes are available, but there are many possibilities of genes in the case of normal conditions.
- - There are two options to perform differential gene expression (DGE) analysis: DgePseudoBulk and DgeLMEM. The parameters are updated as follows. Note that DgeLMEM may take a bit more time, as the multiprocessing is not very efficient with only 2 CPUs in the current Colab runtime.
- - Please refer to the section below:
- ```
- analysis:
-
- model_checkpoint: ''
-
- dataloader:
- name: SimpleDataLoader
- params:
- batch_size: 15000
-
- gene_analysis:
- scoring_config:
- name: LinearScorer
-
features_selector:
name: ClasswisePromoters
params:
@@ -192,16 +307,17 @@ An example configuration file for the current dataset, incorporating the edits b
```
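The pseudobulk idea behind `DgePseudoBulk` can be sketched in plain pandas: per-cell counts are summed per donor (the `sum_column`) within a cell type before testing the design factor. The toy data and column names below mirror the config above; this is not scaLR's implementation:

```python
import pandas as pd

# Toy per-cell counts for one cell type (hypothetical data).
cells = pd.DataFrame({
    "donor_id": ["d1", "d1", "d2", "d2", "d2"],
    "disease":  ["COVID-19", "COVID-19", "normal", "normal", "normal"],
    "GENE_A":   [3, 5, 2, 1, 4],
})

# Pseudobulk: sum counts per donor (sum_column='donor_id'),
# keeping the design factor (design_factor='disease') for the DGE test.
bulk = cells.groupby(["donor_id", "disease"], as_index=False)["GENE_A"].sum()
print(bulk)
```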
## Interactive tutorials
-Detailed tutorials have been made on how to use some functionalities as a scaLR library. Find the links below.
+Detailed tutorials on how to use some pipeline functionalities as a scaLR library are available via the links below.
- **scaLR pipeline** [](https://colab.research.google.com/github/infocusp/scaLR/blob/main/tutorials/pipeline/scalr_pipeline.ipynb)
- **Differential gene expression analysis** [](https://colab.research.google.com/github/infocusp/scaLR/blob/main/tutorials/analysis/differential_gene_expression/dge.ipynb)
- **Gene recall curve** [](https://colab.research.google.com/github/infocusp/scaLR/blob/main/tutorials/analysis/gene_recall_curve/gene_recall_curve.ipynb)
- **Normalization** [](https://colab.research.google.com/github/infocusp/scaLR/blob/main/tutorials/preprocessing/normalization.ipynb)
- **Batch correction** [](https://colab.research.google.com/github/infocusp/scaLR/blob/main/tutorials/preprocessing/batch_correction.ipynb)
-- **SHAP analysis** [](https://colab.research.google.com/github/infocusp/scaLR/blob/main/tutorials/analysis/shap_analysis/shap_heatmap.ipynb)
-## Experiment Output Structure
+- **An example Jupyter notebook to [run scaLR on a local machine](https://github.com/infocusp/scaLR/blob/main/tutorials/pipeline/scalr_pipeline_local_run.ipynb)**.
+
+## Experiment output structure
- **pipeline.py**:
The main script that performs an end-to-end run.
- `exp_dir`: root experiment directory for the storage of all step outputs of the platform specified in the config.
@@ -256,8 +372,6 @@ Performs evaluation of best model trained on user-defined metrics on the test se
- `lmemDGE_celltype.csv`: contains LMEM DGE results between selected factor categories for a celltype.
- `lmemDGE_fixed_effect_factor_X.svg`: volcano plot of coefficient vs -log10(p-value) of genes.
-
-
## Citation
Jogani Saiyam, Anand Santosh Pol, Mayur Prajapati, Amit Samal, Kriti Bhatia, Jayendra Parmar, Urvik Patel, Falak Shah, Nisarg Vyas, and Saurabh Gupta. "scaLR: a low-resource deep neural network-based platform for single cell analysis and biomarker discovery." bioRxiv (2024): 2024-09.
diff --git a/tutorials/pipeline/scalr_pipeline_local_run.ipynb b/tutorials/pipeline/scalr_pipeline_local_run.ipynb
new file mode 100644
index 0000000..c9fe5ec
--- /dev/null
+++ b/tutorials/pipeline/scalr_pipeline_local_run.ipynb
@@ -0,0 +1,1766 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "dfGECxsGN9bo"
+ },
+ "source": [
+    "\n",
+ "\n",
+ "# Single-cell analysis using Low Resource (scaLR)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Xna7qg2PgjJm"
+ },
+ "source": [
+ "\n",
+ "\n",
+ "**Note:** \n",
+ "1. If scaLR is intended to be run on a local system, please ensure that an `ipy kernel` with Python version `3.10` is selected. Then, all the required installations can be performed as mentioned in the section below.\n",
+ "\n",
+ "2. If scaLR has already been installed as mentioned in [Pre-requisites and installation scaLR](https://github.com/infocusp/scaLR), the repository cloning and requirement installation steps below can be skipped. Selecting the `ipy kernel` can be done as follows:\n",
+ "\n",
+ " - Open the terminal and run: \n",
+ " \n",
+ " ```\n",
+ " conda install -c anaconda ipykernel\n",
+ " python -m ipykernel install --user --name=scaLR_env\n",
+ " ```\n",
+ " - Select `scaLR_env` as the `ipy kernel` in `scalr_pipeline.ipynb`. \n",
+ " - Finally, update the system path for scaLR, as mentioned in the shell before data download. e.g.: \n",
+ " ```\n",
+ " sys.path.append('path/to/scaLR/')\n",
+ " ``` \n",
+ "## Cloning scaLR"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "CdutIWiy8xJb"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Cloning into 'scaLR'...\n",
+ "remote: Enumerating objects: 3452, done.\u001b[K\n",
+ "remote: Counting objects: 100% (372/372), done.\u001b[K\n",
+ "remote: Compressing objects: 100% (181/181), done.\u001b[K\n",
+ "remote: Total 3452 (delta 243), reused 261 (delta 189), pack-reused 3080 (from 1)\u001b[K\n",
+ "Receiving objects: 100% (3452/3452), 170.03 MiB | 2.80 MiB/s, done.\n",
+ "Resolving deltas: 100% (2073/2073), done.\n"
+ ]
+ }
+ ],
+ "source": [
+ "!git clone https://github.com/infocusp/scaLR.git"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "MLJo_0EugjJq"
+ },
+ "source": [
+ "Install all requirements after cloning the repository, excluding packages that are pre-installed in Colab."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "9dQLPmLwPL0C"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Defaulting to user installation because normal site-packages is not writeable\n",
+ "Requirement already satisfied: anndata==0.10.9 in /home/amit.samal/.local/lib/python3.10/site-packages (0.10.9)\n",
+ "Requirement already satisfied: isort==5.13.2 in /home/amit.samal/.local/lib/python3.10/site-packages (5.13.2)\n",
+ "Collecting loky==3.4.1\n",
+ " Downloading loky-3.4.1-py3-none-any.whl.metadata (6.4 kB)\n",
+ "Requirement already satisfied: pillow==10.4.0 in /home/amit.samal/.local/lib/python3.10/site-packages (10.4.0)\n",
+ "Requirement already satisfied: pydeseq2==0.4.11 in /home/amit.samal/.local/lib/python3.10/site-packages (0.4.11)\n",
+ "Requirement already satisfied: pyparsing==3.2.0 in /home/amit.samal/.local/lib/python3.10/site-packages (3.2.0)\n",
+ "Requirement already satisfied: pytest==8.3.3 in /home/amit.samal/.local/lib/python3.10/site-packages (8.3.3)\n",
+ "Requirement already satisfied: PyYAML==6.0.2 in /home/amit.samal/.local/lib/python3.10/site-packages (6.0.2)\n",
+ "Requirement already satisfied: scanpy==1.10.3 in /home/amit.samal/.local/lib/python3.10/site-packages (1.10.3)\n",
+ "Requirement already satisfied: scikit-learn==1.5.2 in /home/amit.samal/.local/lib/python3.10/site-packages (1.5.2)\n",
+ "Requirement already satisfied: shap==0.46.0 in /home/amit.samal/.local/lib/python3.10/site-packages (0.46.0)\n",
+ "Requirement already satisfied: tensorboard==2.17.0 in /home/amit.samal/.local/lib/python3.10/site-packages (2.17.0)\n",
+ "Requirement already satisfied: toml==0.10.2 in /home/amit.samal/.local/lib/python3.10/site-packages (0.10.2)\n",
+ "Requirement already satisfied: tqdm==4.66.5 in /home/amit.samal/.local/lib/python3.10/site-packages (4.66.5)\n",
+ "Requirement already satisfied: yapf==0.40.2 in /home/amit.samal/.local/lib/python3.10/site-packages (0.40.2)\n",
+ "Requirement already satisfied: array-api-compat!=1.5,>1.4 in /home/amit.samal/.local/lib/python3.10/site-packages (from anndata==0.10.9) (1.5.1)\n",
+ "Requirement already satisfied: exceptiongroup in /home/amit.samal/.local/lib/python3.10/site-packages (from anndata==0.10.9) (1.2.0)\n",
+ "Requirement already satisfied: h5py>=3.1 in /home/amit.samal/.local/lib/python3.10/site-packages (from anndata==0.10.9) (3.10.0)\n",
+ "Requirement already satisfied: natsort in /home/amit.samal/.local/lib/python3.10/site-packages (from anndata==0.10.9) (8.4.0)\n",
+ "Requirement already satisfied: numpy>=1.23 in /home/amit.samal/.local/lib/python3.10/site-packages (from anndata==0.10.9) (1.26.3)\n",
+ "Requirement already satisfied: packaging>=20.0 in /home/amit.samal/.local/lib/python3.10/site-packages (from anndata==0.10.9) (24.0)\n",
+ "Requirement already satisfied: pandas!=2.1.0rc0,!=2.1.2,>=1.4 in /home/amit.samal/.local/lib/python3.10/site-packages (from anndata==0.10.9) (1.5.3)\n",
+ "Requirement already satisfied: scipy>1.8 in /home/amit.samal/.local/lib/python3.10/site-packages (from anndata==0.10.9) (1.12.0)\n",
+ "Requirement already satisfied: cloudpickle in /home/amit.samal/.local/lib/python3.10/site-packages (from loky==3.4.1) (3.0.0)\n",
+ "Requirement already satisfied: matplotlib>=3.6.2 in /home/amit.samal/.local/lib/python3.10/site-packages (from pydeseq2==0.4.11) (3.8.3)\n",
+ "Requirement already satisfied: iniconfig in /home/amit.samal/.local/lib/python3.10/site-packages (from pytest==8.3.3) (2.0.0)\n",
+ "Requirement already satisfied: pluggy<2,>=1.5 in /home/amit.samal/.local/lib/python3.10/site-packages (from pytest==8.3.3) (1.5.0)\n",
+ "Requirement already satisfied: tomli>=1 in /home/amit.samal/.local/lib/python3.10/site-packages (from pytest==8.3.3) (2.1.0)\n",
+ "Requirement already satisfied: joblib in /home/amit.samal/.local/lib/python3.10/site-packages (from scanpy==1.10.3) (1.3.2)\n",
+ "Requirement already satisfied: legacy-api-wrap>=1.4 in /home/amit.samal/.local/lib/python3.10/site-packages (from scanpy==1.10.3) (1.4)\n",
+ "Requirement already satisfied: networkx>=2.7 in /home/amit.samal/.local/lib/python3.10/site-packages (from scanpy==1.10.3) (3.2.1)\n",
+ "Requirement already satisfied: numba>=0.56 in /home/amit.samal/.local/lib/python3.10/site-packages (from scanpy==1.10.3) (0.59.1)\n",
+ "Requirement already satisfied: patsy in /home/amit.samal/.local/lib/python3.10/site-packages (from scanpy==1.10.3) (0.5.6)\n",
+ "Requirement already satisfied: pynndescent>=0.5 in /home/amit.samal/.local/lib/python3.10/site-packages (from scanpy==1.10.3) (0.5.11)\n",
+ "Requirement already satisfied: seaborn>=0.13 in /home/amit.samal/.local/lib/python3.10/site-packages (from scanpy==1.10.3) (0.13.2)\n",
+ "Requirement already satisfied: session-info in /home/amit.samal/.local/lib/python3.10/site-packages (from scanpy==1.10.3) (1.0.0)\n",
+ "Requirement already satisfied: statsmodels>=0.13 in /home/amit.samal/.local/lib/python3.10/site-packages (from scanpy==1.10.3) (0.14.1)\n",
+ "Requirement already satisfied: umap-learn!=0.5.0,>=0.5 in /home/amit.samal/.local/lib/python3.10/site-packages (from scanpy==1.10.3) (0.5.5)\n",
+ "Requirement already satisfied: threadpoolctl>=3.1.0 in /home/amit.samal/.local/lib/python3.10/site-packages (from scikit-learn==1.5.2) (3.4.0)\n",
+ "Requirement already satisfied: slicer==0.0.8 in /home/amit.samal/.local/lib/python3.10/site-packages (from shap==0.46.0) (0.0.8)\n",
+ "Requirement already satisfied: absl-py>=0.4 in /home/amit.samal/.local/lib/python3.10/site-packages (from tensorboard==2.17.0) (2.1.0)\n",
+ "Requirement already satisfied: grpcio>=1.48.2 in /home/amit.samal/.local/lib/python3.10/site-packages (from tensorboard==2.17.0) (1.70.0)\n",
+ "Requirement already satisfied: markdown>=2.6.8 in /home/amit.samal/.local/lib/python3.10/site-packages (from tensorboard==2.17.0) (3.7)\n",
+ "Requirement already satisfied: protobuf!=4.24.0,<5.0.0,>=3.19.6 in /home/amit.samal/.local/lib/python3.10/site-packages (from tensorboard==2.17.0) (4.25.6)\n",
+ "Requirement already satisfied: setuptools>=41.0.0 in /usr/lib/python3/dist-packages (from tensorboard==2.17.0) (59.6.0)\n",
+ "Requirement already satisfied: six>1.9 in /usr/lib/python3/dist-packages (from tensorboard==2.17.0) (1.16.0)\n",
+ "Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in /home/amit.samal/.local/lib/python3.10/site-packages (from tensorboard==2.17.0) (0.7.2)\n",
+ "Requirement already satisfied: werkzeug>=1.0.1 in /home/amit.samal/.local/lib/python3.10/site-packages (from tensorboard==2.17.0) (3.1.3)\n",
+ "Requirement already satisfied: importlib-metadata>=6.6.0 in /home/amit.samal/.local/lib/python3.10/site-packages (from yapf==0.40.2) (8.6.1)\n",
+ "Requirement already satisfied: platformdirs>=3.5.1 in /home/amit.samal/.local/lib/python3.10/site-packages (from yapf==0.40.2) (4.2.0)\n",
+ "Requirement already satisfied: zipp>=3.20 in /home/amit.samal/.local/lib/python3.10/site-packages (from importlib-metadata>=6.6.0->yapf==0.40.2) (3.21.0)\n",
+ "Requirement already satisfied: contourpy>=1.0.1 in /home/amit.samal/.local/lib/python3.10/site-packages (from matplotlib>=3.6.2->pydeseq2==0.4.11) (1.2.0)\n",
+ "Requirement already satisfied: cycler>=0.10 in /home/amit.samal/.local/lib/python3.10/site-packages (from matplotlib>=3.6.2->pydeseq2==0.4.11) (0.12.1)\n",
+ "Requirement already satisfied: fonttools>=4.22.0 in /home/amit.samal/.local/lib/python3.10/site-packages (from matplotlib>=3.6.2->pydeseq2==0.4.11) (4.50.0)\n",
+ "Requirement already satisfied: kiwisolver>=1.3.1 in /home/amit.samal/.local/lib/python3.10/site-packages (from matplotlib>=3.6.2->pydeseq2==0.4.11) (1.4.5)\n",
+ "Requirement already satisfied: python-dateutil>=2.7 in /home/amit.samal/.local/lib/python3.10/site-packages (from matplotlib>=3.6.2->pydeseq2==0.4.11) (2.9.0.post0)\n",
+ "Requirement already satisfied: llvmlite<0.43,>=0.42.0dev0 in /home/amit.samal/.local/lib/python3.10/site-packages (from numba>=0.56->scanpy==1.10.3) (0.42.0)\n",
+ "Requirement already satisfied: pytz>=2020.1 in /usr/lib/python3/dist-packages (from pandas!=2.1.0rc0,!=2.1.2,>=1.4->anndata==0.10.9) (2022.1)\n",
+ "Requirement already satisfied: MarkupSafe>=2.1.1 in /home/amit.samal/.local/lib/python3.10/site-packages (from werkzeug>=1.0.1->tensorboard==2.17.0) (3.0.2)\n",
+ "Requirement already satisfied: stdlib-list in /home/amit.samal/.local/lib/python3.10/site-packages (from session-info->scanpy==1.10.3) (0.10.0)\n",
+ "Downloading loky-3.4.1-py3-none-any.whl (54 kB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m54.6/54.6 kB\u001b[0m \u001b[31m1.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0ma \u001b[36m0:00:01\u001b[0m\n",
+ "\u001b[?25hInstalling collected packages: loky\n",
+ "Successfully installed loky-3.4.1\n",
+ "\n",
+ "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m24.0\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m25.0.1\u001b[0m\n",
+ "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n",
+ "Defaulting to user installation because normal site-packages is not writeable\n",
+ "Requirement already satisfied: memory-profiler==0.61.0 in /home/amit.samal/.local/lib/python3.10/site-packages (0.61.0)\n",
+ "Requirement already satisfied: psutil in /home/amit.samal/.local/lib/python3.10/site-packages (from memory-profiler==0.61.0) (5.9.8)\n",
+ "\n",
+ "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m24.0\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m25.0.1\u001b[0m\n",
+ "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n"
+ ]
+ }
+ ],
+ "source": [
+ "import sys\n",
+ "imported_packages = {pkg.split('.')[0] for pkg in sys.modules.keys()}\n",
+ "ignore_libraries = \"|\".join(imported_packages)\n",
+ "\n",
+ "!pip install $(grep -ivE \"$ignore_libraries\" scaLR/requirements.txt)\n",
+ "!pip install memory-profiler==0.61.0"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# # Uncomment and run the following if the scaLR pipeline is to be executed locally after installation, as explained in Note 2.\n",
+ "# import sys\n",
+ "# sys.path.append('path/to/scaLR/')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "0DvyBaoIPdnX"
+ },
+ "source": [
+ "## Downloading input anndata from `cellxgene`\n",
+ "- Currently the pipeline expects all datasets in [anndata](https://anndata.readthedocs.io/en/latest/tutorials/notebooks/getting-started.html) formats (`.h5ad` files only).\n",
+ "- The anndata object should contain cell samples as `obs` and genes as `var`.\n",
+ "- `adata.X`: contains normalized gene counts/expression values (Typically `log1p` normalized, data ranging from 0-10).\n",
+ "- `adata.obs`: contains any metadata regarding cells, including a column for `target` which will be used for classification. The index of `adata.obs` is cell_barcodes.\n",
+ "- `adata.var`: contains all gene_names as Index.\n",
+ "\n",
+    "The dataset we are about to download contains two clinical conditions (COVID-19 and normal) and links variations in immune response to disease severity and outcomes over time [(Liu et al. (2021))](https://doi.org/10.1016/j.cell.2021.02.018)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {
+ "id": "loCfvnwt9ei1"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "--2025-02-27 18:52:02-- https://datasets.cellxgene.cziscience.com/21ef2ea2-cbed-4b6c-a572-0ddd1d9020bc.h5ad\n",
+ "Resolving datasets.cellxgene.cziscience.com (datasets.cellxgene.cziscience.com)... 18.239.111.15, 18.239.111.109, 18.239.111.30, ...\n",
+ "Connecting to datasets.cellxgene.cziscience.com (datasets.cellxgene.cziscience.com)|18.239.111.15|:443... connected.\n",
+ "HTTP request sent, awaiting response... 200 OK\n",
+ "Length: 980103606 (935M) [binary/octet-stream]\n",
+ "Saving to: ‘data/21ef2ea2-cbed-4b6c-a572-0ddd1d9020bc.h5ad’\n",
+ "\n",
+ "21ef2ea2-cbed-4b6c- 100%[===================>] 934.70M 3.21MB/s in 4m 48s \n",
+ "\n",
+ "2025-02-27 18:56:51 (3.25 MB/s) - ‘data/21ef2ea2-cbed-4b6c-a572-0ddd1d9020bc.h5ad’ saved [980103606/980103606]\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+    "# This cell may take a few minutes to run (the download is ~935 MB).\n",
+ "!wget -P data https://datasets.cellxgene.cziscience.com/21ef2ea2-cbed-4b6c-a572-0ddd1d9020bc.h5ad"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "tSiYIOo8P_3b"
+ },
+ "source": [
+ "## Data exploration"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {
+ "id": "23C87j3PR9ox"
+ },
+ "outputs": [],
+ "source": [
+ "from IPython.display import SVG, display\n",
+ "import warnings\n",
+ "import anndata as ad\n",
+ "from anndata import AnnData\n",
+ "import numpy as np\n",
+ "import pandas as pd"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {
+ "id": "eDH3GxXr-er6"
+ },
+ "outputs": [],
+ "source": [
+    "adata = ad.read_h5ad(\"data/21ef2ea2-cbed-4b6c-a572-0ddd1d9020bc.h5ad\", backed='r')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {
+ "id": "SS4oTWW6Xn8c"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\n",
+ "The anndata has '125117' cells and '30695' genes\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(f\"\\nThe anndata has '{adata.n_obs}' cells and '{adata.n_vars}' genes\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {
+ "id": "z1u-kctbSStJ"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " dsm_severity_score_group disease_ontology_term_id \\\n",
+ "AAACCTGAGAAACCTA-1_1 DSM_low MONDO:0100096 \n",
+ "AAACCTGAGGGTTTCT-1_1 DSM_high MONDO:0100096 \n",
+ "AAACCTGCACCTGGTG-1_1 DSM_high MONDO:0100096 \n",
+ "AAACCTGGTCCGAGTC-1_1 DSM_high MONDO:0100096 \n",
+ "AAACCTGGTGCCTTGG-1_1 DSM_low MONDO:0100096 \n",
+ "\n",
+ " severity tissue_ontology_term_id timepoint outcome \\\n",
+ "AAACCTGAGAAACCTA-1_1 Moderate UBERON:0000178 T0 alive \n",
+ "AAACCTGAGGGTTTCT-1_1 Critical UBERON:0000178 T0 alive \n",
+ "AAACCTGCACCTGGTG-1_1 Critical UBERON:0000178 T0 alive \n",
+ "AAACCTGGTCCGAGTC-1_1 Critical UBERON:0000178 T0 deceased \n",
+ "AAACCTGGTGCCTTGG-1_1 Critical UBERON:0000178 T0 alive \n",
+ "\n",
+ " dsm_severity_score days_since_hospitalized age \\\n",
+ "AAACCTGAGAAACCTA-1_1 -1.950858 1.0 55.0 \n",
+ "AAACCTGAGGGTTTCT-1_1 -0.092375 13.0 40.0 \n",
+ "AAACCTGCACCTGGTG-1_1 2.954350 1.0 60.0 \n",
+ "AAACCTGGTCCGAGTC-1_1 3.276233 6.0 76.0 \n",
+ "AAACCTGGTGCCTTGG-1_1 -0.348888 1.0 70.0 \n",
+ "\n",
+ " donor_id ... tissue_type \\\n",
+ "AAACCTGAGAAACCTA-1_1 HGR0000083 ... tissue \n",
+ "AAACCTGAGGGTTTCT-1_1 HGR0000078 ... tissue \n",
+ "AAACCTGCACCTGGTG-1_1 HGR0000098 ... tissue \n",
+ "AAACCTGGTCCGAGTC-1_1 HGR0000141 ... tissue \n",
+ "AAACCTGGTGCCTTGG-1_1 HGR0000093 ... tissue \n",
+ "\n",
+ " cell_type \\\n",
+ "AAACCTGAGAAACCTA-1_1 non-classical monocyte \n",
+ "AAACCTGAGGGTTTCT-1_1 classical monocyte \n",
+ "AAACCTGCACCTGGTG-1_1 CD16-positive, CD56-dim natural killer cell, h... \n",
+ "AAACCTGGTCCGAGTC-1_1 classical monocyte \n",
+ "AAACCTGGTGCCTTGG-1_1 classical monocyte \n",
+ "\n",
+ " assay disease organism sex tissue \\\n",
+ "AAACCTGAGAAACCTA-1_1 10x 5' v1 COVID-19 Homo sapiens male blood \n",
+ "AAACCTGAGGGTTTCT-1_1 10x 5' v1 COVID-19 Homo sapiens female blood \n",
+ "AAACCTGCACCTGGTG-1_1 10x 5' v1 COVID-19 Homo sapiens male blood \n",
+ "AAACCTGGTCCGAGTC-1_1 10x 5' v1 COVID-19 Homo sapiens male blood \n",
+ "AAACCTGGTGCCTTGG-1_1 10x 5' v1 COVID-19 Homo sapiens male blood \n",
+ "\n",
+ " self_reported_ethnicity development_stage \\\n",
+ "AAACCTGAGAAACCTA-1_1 European 55-year-old stage \n",
+ "AAACCTGAGGGTTTCT-1_1 European 40-year-old stage \n",
+ "AAACCTGCACCTGGTG-1_1 European 60-year-old stage \n",
+ "AAACCTGGTCCGAGTC-1_1 European 76-year-old stage \n",
+ "AAACCTGGTGCCTTGG-1_1 European 70-year-old stage \n",
+ "\n",
+ " observation_joinid \n",
+ "AAACCTGAGAAACCTA-1_1 !9L}G4hgnw \n",
+ "AAACCTGAGGGTTTCT-1_1 YRcUzlVyg0 \n",
+ "AAACCTGCACCTGGTG-1_1 )*azge@M0l \n",
+ "AAACCTGGTCCGAGTC-1_1 E 10 or min_val < 0:\n",
+ " warnings.warn(f\"Warning: Expression Value out of range! Max: {max_val}, Min: {min_val}. Expected range is 0-10.\", UserWarning)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {
+ "id": "bd2fTv0gdluU"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " | \n",
+ " mvp.mean | \n",
+ " mvp.dispersion | \n",
+ " mvp.dispersion.scaled | \n",
+ " mvp.variable | \n",
+ " feature_is_filtered | \n",
+ " feature_name | \n",
+ " feature_reference | \n",
+ " feature_biotype | \n",
+ " feature_length | \n",
+ " feature_type | \n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " | ENSG00000168454 | \n",
+ " 0.000380 | \n",
+ " 1.168876 | \n",
+ " 0.181734 | \n",
+ " False | \n",
+ " False | \n",
+ " TXNDC2 | \n",
+ " NCBITaxon:9606 | \n",
+ " gene | \n",
+ " 1703 | \n",
+ " protein_coding | \n",
+ "
\n",
+ " \n",
+ " | ENSG00000197852 | \n",
+ " 0.035995 | \n",
+ " 1.634179 | \n",
+ " 0.886458 | \n",
+ " False | \n",
+ " False | \n",
+ " INKA2 | \n",
+ " NCBITaxon:9606 | \n",
+ " gene | \n",
+ " 1217 | \n",
+ " protein_coding | \n",
+ "
\n",
+ " \n",
+ " | ENSG00000196878 | \n",
+ " 0.008862 | \n",
+ " 1.617729 | \n",
+ " 0.861545 | \n",
+ " False | \n",
+ " False | \n",
+ " LAMB3 | \n",
+ " NCBITaxon:9606 | \n",
+ " gene | \n",
+ " 3931 | \n",
+ " protein_coding | \n",
+ "
\n",
+ " \n",
+ " | ENSG00000256540 | \n",
+ " 0.000022 | \n",
+ " 1.660993 | \n",
+ " 0.927070 | \n",
+ " False | \n",
+ " False | \n",
+ " IQSEC3-AS1 | \n",
+ " NCBITaxon:9606 | \n",
+ " gene | \n",
+ " 1065 | \n",
+ " lncRNA | \n",
+ "
\n",
+ " \n",
+ " | ENSG00000139180 | \n",
+ " 0.090100 | \n",
+ " 1.184720 | \n",
+ " 0.205731 | \n",
+ " False | \n",
+ " False | \n",
+ " NDUFA9 | \n",
+ " NCBITaxon:9606 | \n",
+ " gene | \n",
+ " 782 | \n",
+ " protein_coding | \n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
"
+ ],
+ "text/plain": [
+ " mvp.mean mvp.dispersion mvp.dispersion.scaled \\\n",
+ "ENSG00000168454 0.000380 1.168876 0.181734 \n",
+ "ENSG00000197852 0.035995 1.634179 0.886458 \n",
+ "ENSG00000196878 0.008862 1.617729 0.861545 \n",
+ "ENSG00000256540 0.000022 1.660993 0.927070 \n",
+ "ENSG00000139180 0.090100 1.184720 0.205731 \n",
+ "\n",
+ " mvp.variable feature_is_filtered feature_name \\\n",
+ "ENSG00000168454 False False TXNDC2 \n",
+ "ENSG00000197852 False False INKA2 \n",
+ "ENSG00000196878 False False LAMB3 \n",
+ "ENSG00000256540 False False IQSEC3-AS1 \n",
+ "ENSG00000139180 False False NDUFA9 \n",
+ "\n",
+ " feature_reference feature_biotype feature_length \\\n",
+ "ENSG00000168454 NCBITaxon:9606 gene 1703 \n",
+ "ENSG00000197852 NCBITaxon:9606 gene 1217 \n",
+ "ENSG00000196878 NCBITaxon:9606 gene 3931 \n",
+ "ENSG00000256540 NCBITaxon:9606 gene 1065 \n",
+ "ENSG00000139180 NCBITaxon:9606 gene 782 \n",
+ "\n",
+ " feature_type \n",
+ "ENSG00000168454 protein_coding \n",
+ "ENSG00000197852 protein_coding \n",
+ "ENSG00000196878 protein_coding \n",
+ "ENSG00000256540 lncRNA \n",
+ "ENSG00000139180 protein_coding "
+ ]
+ },
+ "execution_count": 15,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "#Gene metadata\n",
+ "adata.var.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "QLTg-WK-hTS7"
+ },
+ "source": [
+ "### Modifying `var` index (Optional)\n",
+ "- The `index` values in this AnnData object are the `gene_ids`. To retrieve the literature genes associated with a particular cell type, we need the gene symbols, which are present in `feature_name` column. Therefore, we'll replace the index values with gene symbols.\n",
+ "- This will be helpful when analyzing the `GeneRecallCurve` later.\n",
+ "- This step can be skipped if the `reference_genes.csv` already contains gene IDs corresponding to each cell type, or if the user does not want to perform the `GeneRecallCurve` analysis.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {
+ "id": "qoSHdJtwgPaA"
+ },
+ "outputs": [],
+ "source": [
+ "adata.var.set_index('feature_name',inplace=True)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {
+ "id": "p3LvDmZmhJ_c"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " | \n",
+ " mvp.mean | \n",
+ " mvp.dispersion | \n",
+ " mvp.dispersion.scaled | \n",
+ " mvp.variable | \n",
+ " feature_is_filtered | \n",
+ " feature_reference | \n",
+ " feature_biotype | \n",
+ " feature_length | \n",
+ " feature_type | \n",
+ "
\n",
+ " \n",
+ " | feature_name | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " | TXNDC2 | \n",
+ " 0.000380 | \n",
+ " 1.168876 | \n",
+ " 0.181734 | \n",
+ " False | \n",
+ " False | \n",
+ " NCBITaxon:9606 | \n",
+ " gene | \n",
+ " 1703 | \n",
+ " protein_coding | \n",
+ "
\n",
+ " \n",
+ " | INKA2 | \n",
+ " 0.035995 | \n",
+ " 1.634179 | \n",
+ " 0.886458 | \n",
+ " False | \n",
+ " False | \n",
+ " NCBITaxon:9606 | \n",
+ " gene | \n",
+ " 1217 | \n",
+ " protein_coding | \n",
+ "
\n",
+ " \n",
+ " | LAMB3 | \n",
+ " 0.008862 | \n",
+ " 1.617729 | \n",
+ " 0.861545 | \n",
+ " False | \n",
+ " False | \n",
+ " NCBITaxon:9606 | \n",
+ " gene | \n",
+ " 3931 | \n",
+ " protein_coding | \n",
+ "
\n",
+ " \n",
+ " | IQSEC3-AS1 | \n",
+ " 0.000022 | \n",
+ " 1.660993 | \n",
+ " 0.927070 | \n",
+ " False | \n",
+ " False | \n",
+ " NCBITaxon:9606 | \n",
+ " gene | \n",
+ " 1065 | \n",
+ " lncRNA | \n",
+ "
\n",
+ " \n",
+ " | NDUFA9 | \n",
+ " 0.090100 | \n",
+ " 1.184720 | \n",
+ " 0.205731 | \n",
+ " False | \n",
+ " False | \n",
+ " NCBITaxon:9606 | \n",
+ " gene | \n",
+ " 782 | \n",
+ " protein_coding | \n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
"
+ ],
+ "text/plain": [
+ " mvp.mean mvp.dispersion mvp.dispersion.scaled mvp.variable \\\n",
+ "feature_name \n",
+ "TXNDC2 0.000380 1.168876 0.181734 False \n",
+ "INKA2 0.035995 1.634179 0.886458 False \n",
+ "LAMB3 0.008862 1.617729 0.861545 False \n",
+ "IQSEC3-AS1 0.000022 1.660993 0.927070 False \n",
+ "NDUFA9 0.090100 1.184720 0.205731 False \n",
+ "\n",
+ " feature_is_filtered feature_reference feature_biotype \\\n",
+ "feature_name \n",
+ "TXNDC2 False NCBITaxon:9606 gene \n",
+ "INKA2 False NCBITaxon:9606 gene \n",
+ "LAMB3 False NCBITaxon:9606 gene \n",
+ "IQSEC3-AS1 False NCBITaxon:9606 gene \n",
+ "NDUFA9 False NCBITaxon:9606 gene \n",
+ "\n",
+ " feature_length feature_type \n",
+ "feature_name \n",
+ "TXNDC2 1703 protein_coding \n",
+ "INKA2 1217 protein_coding \n",
+ "LAMB3 3931 protein_coding \n",
+ "IQSEC3-AS1 1065 lncRNA \n",
+ "NDUFA9 782 protein_coding "
+ ]
+ },
+ "execution_count": 17,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Now the index values are the gene symbols.\n",
+ "adata.var.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {
+ "id": "6yCi6UQ-kh0Q"
+ },
+ "outputs": [],
+ "source": [
+ "# Saving file for further analysis\n",
+ "# This shell will take approximately 00:00:47 (hh:mm:ss) to run.\n",
+ "adata.obs.index = adata.obs.index.astype(str)\n",
+ "adata.var.index = adata.var.index.astype(str)\n",
+ "AnnData(X=adata.X,obs=adata.obs,var=adata.var).write('data/modified_adata.h5ad',compression='gzip')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "e1WBarmdY0h5"
+ },
+ "source": [
+ "## scaLR pipeline \n",
+ "\n",
+ "1. The **scaLR** pipeline consists of four stages:\n",
+ " - Data ingestion\n",
+ " - Feature selection\n",
+ " - Final model training\n",
+ " - Analysis\n",
+ "\n",
+ "2. The user needs to modify the configuration file (`config.yml`) available at `scaLR/config` for each stage of the pipeline according to the requirements. Simply omit or comment out the stages of the pipeline that you do not wish to run.\n",
+ "\n",
+ "3. Refer to `config.yml` and its detailed configuration [README](https://github.com/infocusp/scaLR/blob/main/config/README.md) file for instructions on how to use different parameters and files.\n",
+ "\n",
+ "### Config edits (For Cell Type Classification and Biomarker Identification)\n",
+ "\n",
+ "NOTE: Below are just suggestions for the model parameters. Feel free to play around with them for tuning the model & improving the results.\n",
+ "\n",
+ "*An example configuration file for the current dataset, incorporating the edits below, can be found at `scaLR/tutorials/pipeline/config_celltype.yaml`. Please update the device as `cuda` or `cpu` as per runtype.*\n",
+ "\n",
+ "- **Device setup**.\n",
+ " -Update `device: 'cuda'` for `GPU` enabled runtype, else `device: 'cpu'` for `CPU` enabled runtype.\n",
+ "- **Experiment Config**\n",
+ " - The default `exp_run` number is `0`.If not changed, the celltype classification experiment would be `exp_run_0` with all the pipeline results.\n",
+ "- **Data Config**\n",
+ " - Update the `full_datapath` to `data/modified_adata.h5ad` (as we will include `GeneRecallCurve` in the downstream).\n",
+ " - Specify the `num_workers` value for effective parallelization.\n",
+ " - Set `target` to `cell_type`.\n",
+ "- **Feature Selection**\n",
+ " - Specify the `num_workers` value for effective parallelization.\n",
+ " - Update the model layers to `[5000, 10]`, as there are only 10 cell types in the dataset.\n",
+ " - Change `epoch` to `10`.\n",
+ "- **Final Model Training**\n",
+ " - Update the model layers to the same as for feature selection: `[5000, 10]`.\n",
+ " - Change `epoch` to `100`.\n",
+ "- **Analysis**\n",
+ " - **Downstream Analysis**\n",
+ " - Uncomment the `test_samples_downstream_analysis` section.\n",
+ " - Update the `reference_genes_path` to `scaLR/tutorials/pipeline/grc_reference_gene.csv`.\n",
+ " - Please refer to the section below:\n",
+ "\n",
+ " ```\n",
+ " analysis:\n",
+ "\n",
+ " model_checkpoint: ''\n",
+ "\n",
+ " dataloader:\n",
+ " name: SimpleDataLoader\n",
+ " params:\n",
+ " batch_size: 15000\n",
+ "\n",
+ " gene_analysis:\n",
+ " scoring_config:\n",
+ " name: LinearScorer\n",
+ "\n",
+ " features_selector:\n",
+ " name: ClasswisePromoters\n",
+ " params:\n",
+ " k: 100\n",
+ " test_samples_downstream_analysis:\n",
+ " - name: GeneRecallCurve\n",
+ " params:\n",
+ " reference_genes_path: 'scaLR/tutorials/pipeline/grc_reference_gene.csv'\n",
+ " top_K: 300\n",
+ " plots_per_row: 3\n",
+ " features_selector:\n",
+ " name: ClasswiseAbs\n",
+ " params: {}\n",
+ " - name: Heatmap\n",
+ " params: {}\n",
+ " - name: RocAucCurve\n",
+ " params: {}\n",
+ "\n",
+ "\n",
+ "\n",
+ "### Config edits (For clinical condition specific biomarker identification and DGE analysis) \n",
+ "\n",
+ "*An example configuration file for the current dataset, incorporating the edits below, can be found at : `scaLR/tutorials/pipeline/config_clinical.yaml`.Please update the device as `cuda` or `cpu` as per runtype*\n",
+ "\n",
+ "- **Experiment Config**\n",
+ " - Make sure to change the `exp_run` number if you have an experiment with the same number earlier related to cell classification.As we have done one experiment earlier, we'll change the number now to '1'.\n",
+ "- **Data Config**\n",
+ " - The `full_datapath` remains the same as above.\n",
+ " - Change the `target` to `disease` (this column contains data for clinical conditions, `COVID-19/normal`).\n",
+ "- **Feature Selection**\n",
+ " - Update the model layers to `[5000, 2]`, as there are only two types of clinical conditions.\n",
+ " -`epoch` as 10.\n",
+ "- **Final Model Training**\n",
+ " - Update the model layers to the same as for feature selection: `[5000, 2]`.\n",
+ " - `epoch` as 100.\n",
+ "- **Analysis**\n",
+ " - **Downstream Analysis**\n",
+ " - Uncomment the `full_samples_downstream_analysis` section.\n",
+ " - We are not performing the 'gene_recall_curve' analysis in this case. It can be performed if the `COVID-19/normal` specific genes are available, but there are many possibilities of genes in the case of normal conditions.\n",
+ " - There are two options to perform differential gene expression (DGE) analysis: `DgePseudoBulk` and `DgeLMEM`. The parameters are updated as follows. Note that `DgeLMEM` may take a bit more time, as the multiprocessing is not very efficient with only 2 CPUs in the current Colab runtime.\n",
+ " - Please refer to the section below:\n",
+ " ```\n",
+ " analysis:\n",
+ "\n",
+ " model_checkpoint: ''\n",
+ "\n",
+ " dataloader:\n",
+ " name: SimpleDataLoader\n",
+ " params:\n",
+ " batch_size: 15000\n",
+ "\n",
+ " gene_analysis:\n",
+ " scoring_config:\n",
+ " name: LinearScorer\n",
+ "\n",
+ " features_selector:\n",
+ " name: ClasswisePromoters\n",
+ " params:\n",
+ " k: 100\n",
+ " full_samples_downstream_analysis:\n",
+ " - name: Heatmap\n",
+ " params:\n",
+ " top_n_genes: 100\n",
+ " - name: RocAucCurve\n",
+ " params: {}\n",
+ " - name: DgePseudoBulk\n",
+ " params:\n",
+ " celltype_column: 'cell_type'\n",
+ " design_factor: 'disease'\n",
+ " factor_categories: ['COVID-19', 'normal']\n",
+ " sum_column: 'donor_id'\n",
+ " cell_subsets: ['conventional dendritic cell', 'natural killer cell']\n",
+ " - name: DgeLMEM\n",
+ " params:\n",
+ " fixed_effect_column: 'disease'\n",
+ " fixed_effect_factors: ['COVID-19', 'normal']\n",
+ " group: 'donor_id'\n",
+ " celltype_column: 'cell_type'\n",
+ " cell_subsets: ['conventional dendritic cell']\n",
+ " gene_batch_size: 1000\n",
+ " coef_threshold: 0.1\n",
+ " "
+ ]
+ },
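+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Optional sketch: apply the config edits above programmatically instead of\n",
+ "# editing the YAML by hand. This assumes PyYAML is available in the runtime,\n",
+ "# and touches only keys used elsewhere in this tutorial\n",
+ "# (`device`, `experiment.exp_run`).\n",
+ "import yaml\n",
+ "\n",
+ "cfg_path = 'scaLR/tutorials/pipeline/config_celltype.yaml'\n",
+ "with open(cfg_path) as f:\n",
+ "    cfg = yaml.safe_load(f)\n",
+ "\n",
+ "cfg['device'] = 'cpu'  # or 'cuda' on a GPU runtime\n",
+ "cfg['experiment']['exp_run'] = 0  # bump for each new experiment\n",
+ "\n",
+ "with open(cfg_path, 'w') as f:\n",
+ "    yaml.safe_dump(cfg, f)"
+ ]
+ },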
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Wny28AQQm6xB"
+ },
+ "source": [
+ "### Run Pipeline "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {
+ "id": "uLgN7MDv7hV-"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "/bin/bash: line 1: python: command not found\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Possible flags using 'scaLR/pipeline.py'\n",
+ "!python scaLR/pipeline.py --help"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "kTAOOj1CgjJy"
+ },
+ "source": [
+ "#### Cell type classification"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "metadata": {
+ "id": "xqvT9AiQFVGq"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "2025-02-27 19:02:51,535 - ROOT - INFO : Experiment directory: `scalr_experiments/exp_name_0`\n",
+ "2025-02-27 19:02:51,544 - ROOT - INFO : Data Ingestion pipeline running\n",
+ "2025-02-27 19:02:51,544 - DataIngestion - INFO : Generating Train, Validation and Test sets\n",
+ "2025-02-27 19:03:35,769 - DataIngestion - INFO : Generate label mappings for all columns in metadata\n",
+ "2025-02-27 19:03:36,946 - ROOT - INFO : Feature Extraction pipeline running\n",
+ "2025-02-27 19:03:36,946 - File Utils - INFO : Data Loaded from Final datapaths\n",
+ "2025-02-27 19:03:37,467 - FeatureExtraction - INFO : Feature subset models training\n",
+ "2025-02-27 19:05:09,181 - ModelTraining - INFO : Building model training artifacts\n",
+ "2025-02-27 19:05:09,253 - ModelTraining - INFO : Building model training artifacts\n",
+ "2025-02-27 19:05:09,295 - ModelTraining - INFO : Building model training artifacts\n",
+ "2025-02-27 19:05:09,393 - ModelTraining - INFO : Building model training artifacts\n",
+ "2025-02-27 19:05:09,750 - ModelTraining - INFO : Training the model\n",
+ "2025-02-27 19:05:09,751 - ModelTraining - INFO : Training the model\n",
+ "2025-02-27 19:05:09,770 - ModelTraining - INFO : Training the model\n",
+ "2025-02-27 19:05:09,881 - ModelTraining - INFO : Training the model\n",
+ "2025-02-27 19:05:16,105 - ModelTraining - INFO : Building model training artifacts\n",
+ "2025-02-27 19:05:16,106 - ModelTraining - INFO : Training the model\n",
+ "2025-02-27 19:05:16,153 - ModelTraining - INFO : Building model training artifacts\n",
+ "2025-02-27 19:05:16,154 - ModelTraining - INFO : Training the model\n",
+ "2025-02-27 19:05:16,168 - ModelTraining - INFO : Building model training artifacts\n",
+ "2025-02-27 19:05:16,174 - ModelTraining - INFO : Training the model\n",
+ "2025-02-27 19:05:20,327 - FeatureExtraction - INFO : Feature scoring\n",
+ "2025-02-27 19:05:20,712 - FeatureExtraction - INFO : Top features extraction\n",
+ "2025-02-27 19:05:20,719 - FeatureExtraction - INFO : Writing feature-subset data onto disk\n",
+ "2025-02-27 19:05:51,902 - ROOT - INFO : Final Model Training pipeline running\n",
+ "2025-02-27 19:05:51,905 - File Utils - INFO : Data Loaded from Feature subset datapaths\n",
+ "2025-02-27 19:05:52,382 - ModelTraining - INFO : Building model training artifacts\n",
+ "2025-02-27 19:05:52,841 - ModelTraining - INFO : Training the model\n",
+ "2025-02-27 19:05:59,278 - ROOT - INFO : Analysis pipeline running\n",
+ "2025-02-27 19:05:59,281 - File Utils - INFO : Data Loaded from Feature subset datapaths\n",
+ "2025-02-27 19:05:59,676 - File Utils - INFO : Data Loaded from Feature subset datapaths\n",
+ "2025-02-27 19:05:59,805 - File Utils - INFO : Data Loaded from Feature subset datapaths\n",
+ "2025-02-27 19:06:00,379 - Eval&Analysis - INFO : Calculating accuracy and generating classification report on test set\n",
+ "/home/amit.samal/.local/lib/python3.10/site-packages/sklearn/metrics/_classification.py:1531: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.\n",
+ " _warn_prf(average, modifier, f\"{metric.capitalize()} is\", len(result))\n",
+ "/home/amit.samal/.local/lib/python3.10/site-packages/sklearn/metrics/_classification.py:1531: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.\n",
+ " _warn_prf(average, modifier, f\"{metric.capitalize()} is\", len(result))\n",
+ "/home/amit.samal/.local/lib/python3.10/site-packages/sklearn/metrics/_classification.py:1531: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.\n",
+ " _warn_prf(average, modifier, f\"{metric.capitalize()} is\", len(result))\n",
+ "2025-02-27 19:06:03,433 - Eval&Analysis - INFO : Performing gene analysis\n",
+ "2025-02-27 19:06:03,433 - FeatureExtraction - INFO : Feature scoring\n",
+ "2025-02-27 19:06:03,471 - FeatureExtraction - INFO : Top features extraction\n",
+ "2025-02-27 19:06:03,540 - Eval&Analysis - INFO : Performing Downstream Analysis on test samples\n",
+ "2025-02-27 19:06:03,540 - Eval&Analysis - INFO : Performing GeneRecallCurve\n",
+ "2025-02-27 19:06:04,781 - Eval&Analysis - INFO : Performing Heatmap\n",
+ "2025-02-27 19:06:09,548 - Eval&Analysis - INFO : Performing RocAucCurve\n",
+ "2025-02-27 19:06:09,929 - ROOT - INFO : Total time taken: 198.401921749115 s\n",
+ "2025-02-27 19:06:09,929 - ROOT - INFO : Maximum memory usage: 1915.5625 MB\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Command to run end to end pipeline.\n",
+ "# This shell will take approximately 00:21:15 (hh:mm:ss) on GPU to run.()\n",
+ "\n",
+ "!python3 scaLR/pipeline.py --config scaLR/tutorials/pipeline/config_celltype.yaml -l -m"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "0IRSOT64gjJy"
+ },
+ "source": [
+ "#### Clinical condition specific biomarker identification and differential gene expression analysis"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "e71LHxUvgjJy"
+ },
+ "outputs": [],
+ "source": [
+ "## It takes 01:16:58 (hh:mm:ss) to run on the CPU for clinical condition-specific biomarker identification.\n",
+ "## To reduce the runtime, please comment out the 'DgeLMEM' section under the 'full_samples_downstream_analysis.\n",
+ "\n",
+ "!python scaLR/pipeline.py --config scaLR/tutorials/pipeline/config_clinical.yaml -l -m"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "yviraKXXgjJy"
+ },
+ "source": [
+ "Pipeline logs can be found at `scalr_experiments/exp_name_0/logs.txt` (cell type classification)\n",
+ "\n",
+ "For clinical condition specific biomarker identification, the logs can be found at `scalr_experiments/exp_name_1/logs.txt`"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "oe4d74mjIcgW"
+ },
+ "source": [
+ "### Results \n",
+ "We have done the celltype classification and biomarker discovery with name `exp_name_0`.\n",
+ "\n",
+ "- The classification report can be found at `scalr_experiments/exp_name_0/analysis/classification_report.csv`\n",
+ "\n",
+ "- Top-5k Biomarkers can be found at `scalr_experiments/exp_name_0/analysis/gene_analysis/top_features.json`.\n",
+ "\n",
+ "- `Heatmaps` for each class(cell types) can be found at `scalr_experiments/exp_name_0/analysis/test_samples/heatmaps`\n",
+ "\n",
+ "- `Gene_recall_curve`, and `roc_auc` data can be found at `scalr_experiments/exp_name_0/analysis/test_samples/`.\n",
+ "\n",
+ "- `score_matrix.csv` with gene scores for all classes can be found at `scalr_experiments/exp_name_0/analysis/gene_analysis/score_matrix.csv`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "MM5v5OTcQocC"
+ },
+ "outputs": [],
+ "source": [
+ "#Classification report\n",
+ "pd.read_csv('/content/scalr_experiments/exp_name_0/analysis/classification_report.csv',index_col=0)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "rNZt8t-_gjJz"
+ },
+ "outputs": [],
+ "source": [
+ "#ROC_AUC\n",
+ "display(SVG('/content/scalr_experiments/exp_name_0/analysis/test_samples/roc_auc.svg'))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "JBYVFclUgjJz"
+ },
+ "outputs": [],
+ "source": [
+ "# Heatmap for cell type 'classical monocyte'\n",
+ "display(SVG('/content/scalr_experiments/exp_name_0/analysis/test_samples/heatmaps/classical monocyte.svg'))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "zbui27nxIh_J"
+ },
+ "outputs": [],
+ "source": [
+ "# Gene recall curve\n",
+ "display(SVG('scalr_experiments/exp_name_0/analysis/test_samples/gene_recall_curve.svg'))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "52n0PSr87FjJ"
+ },
+ "source": [
+ "\n",
+ "For clinical condition-specific biomarker identification and DGE analysis with the experiment name `exp_name_1`. All analysis results can be viewed in the `exp_name_1` directory, as explained above for cell type classification. The difference is that we have results for only two classes in `exp_name_1`, namely `COVID-19` and `normal`, along with the results for DGE analysis."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "Fgu3MIxggjJ3"
+ },
+ "outputs": [],
+ "source": [
+ "# DgePseudoBulk results for 'conventional dendritic cell' in 'COVID-19' w.r.t. 'normal' samples\n",
+ "pd.read_csv('/content/scalr_experiments/exp_name_1/analysis/full_samples/pseudobulk_dge_result/pbkDGE_conventionaldendriticcell_COVID-19_vs_normal.csv')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "7n_AczPkgjJ3"
+ },
+ "outputs": [],
+ "source": [
+ "# Volcano plot of `log2FoldChange` vs `-log10(pvalue)` in gene expression for\n",
+ "# 'conventional dendritic cell' in 'COVID-19' w.r.t. 'normal' samples.\n",
+ "display(SVG('/content/scalr_experiments/exp_name_1/analysis/full_samples/pseudobulk_dge_result/pbkDGE_conventionaldendriticcell_COVID-19_vs_normal.svg'))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Js1lFjQagjJ3"
+ },
+ "source": [
+ "*Note*: A `Fold Change (FC)` of 1.5 units in the figure above is equivalent to a `log2 Fold Change` of 0.584."
+ ]
+ },
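+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Verifying the conversion quoted above: a fold change of 1.5\n",
+ "# corresponds to log2(1.5) = 0.5849... on the log2 scale.\n",
+ "import numpy as np\n",
+ "print(np.log2(1.5))"
+ ]
+ },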
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "RL5n6rqzR4Sc"
+ },
+ "source": [
+ "## Running scaLR in modules"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "6jypX2axToza"
+ },
+ "source": [
+ "### Imports"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "yqnxGZnHIiJr"
+ },
+ "outputs": [],
+ "source": [
+ "import sys\n",
+ "sys.path.append('scaLR/')\n",
+ "import os\n",
+ "from os import path\n",
+ "\n",
+ "from scalr.data_ingestion_pipeline import DataIngestionPipeline\n",
+ "from scalr.eval_and_analysis_pipeline import EvalAndAnalysisPipeline\n",
+ "from scalr.feature_extraction_pipeline import FeatureExtractionPipeline\n",
+ "from scalr.model_training_pipeline import ModelTrainingPipeline\n",
+ "from scalr.utils import read_data\n",
+ "from scalr.utils import write_data"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "tObhEJKkT0Ew"
+ },
+ "source": [
+ "### Load Config\n",
+ "\n",
+ "Running with example config files with required edits. Make sure to change the experiment name if required."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "dbrUCh-LTxbl"
+ },
+ "outputs": [],
+ "source": [
+ "config = read_data('scaLR/tutorials/pipeline/config_celltype.yaml')\n",
+ "# config = read_data('scaLR/tutorials/pipeline/config_clinical.yaml')\n",
+ "config"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "XU-FLwPlULd1"
+ },
+ "outputs": [],
+ "source": [
+ "dirpath = config['experiment']['dirpath']\n",
+ "exp_name = config['experiment']['exp_name']\n",
+ "exp_run = config['experiment']['exp_run']\n",
+ "dirpath = os.path.join(dirpath, f'{exp_name}_{exp_run}')\n",
+ "os.makedirs(dirpath, exist_ok=True)\n",
+ "device = config['device']"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "C44uQoNiUe4M"
+ },
+ "source": [
+ "### Data Ingestion"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "JX5nB5gzUh7L"
+ },
+ "outputs": [],
+ "source": [
+ "# This shell will take approximately 00:01:23 (hh:mm:ss) to run.\n",
+ "\n",
+ "data_dirpath = path.join(dirpath, 'data')\n",
+ "os.makedirs(data_dirpath, exist_ok=True)\n",
+ "\n",
+ "# Initialize Data Ingestion object\n",
+ "ingest_data = DataIngestionPipeline(config['data'], data_dirpath)\n",
+ "\n",
+ "# Generate Train, Validation and Test Splits for pipeline\n",
+ "ingest_data.generate_train_val_test_split()\n",
+ "\n",
+ "# Apply pre-processing on data\n",
+ "# Fit on Train data, and then apply on the entire data\n",
+ "ingest_data.preprocess_data()\n",
+ "\n",
+ "# We generate label mapings from the metadata, which is used for\n",
+ "# labels, etc.\n",
+ "ingest_data.generate_mappings()\n",
+ "\n",
+ "# All the additional data generated (label mappings, data splits, etc.)\n",
+ "# are passed onto the config for future use in pipeline\n",
+ "config['data'] = ingest_data.get_updated_config()\n",
+ "write_data(config, path.join(dirpath, 'config.yaml'))\n",
+ "del ingest_data"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "qc76-jFSVmfY"
+ },
+ "source": [
+ "### Feature Selection"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "w4CfG8YQVoTJ"
+ },
+ "outputs": [],
+ "source": [
+ "# This shell will take approximately 00:19:02 (hh:mm:ss) to run.\n",
+ "\n",
+ "feature_extraction_dirpath = path.join(dirpath, 'feature_extraction')\n",
+ "os.makedirs(feature_extraction_dirpath, exist_ok=True)\n",
+ "\n",
+ "# Initialize Feature Extraction object\n",
+ "extract_features = FeatureExtractionPipeline(\n",
+ " config['feature_selection'], feature_extraction_dirpath, device)\n",
+ "extract_features.load_data_and_targets_from_config(config['data'])\n",
+ "\n",
+ "# Train feature subset models and get scores for each feature/genes\n",
+ "extract_features.feature_subsetted_model_training()\n",
+ "extract_features.feature_scoring()\n",
+ "\n",
+ "# Extract top features by some algorithm, and write a feature-subsetted\n",
+ "# dataset\n",
+ "extract_features.top_feature_extraction()\n",
+ "config['data'] = extract_features.write_top_features_subset_data(\n",
+ " config['data'])\n",
+ "\n",
+ "# All the additional data generated (subset data splits, etc.)\n",
+ "# are passed onto the config for future use in pipeline\n",
+ "config['feature_selection'] = extract_features.get_updated_config()\n",
+ "write_data(config, path.join(dirpath, 'config.yaml'))\n",
+ "del extract_features"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "z-Scub2RVtqi"
+ },
+ "source": [
+ "### Final Model Training"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "Roc1gACAVoY6"
+ },
+ "outputs": [],
+ "source": [
+ "# This shell will take approximately 00:06:20 (hh:mm:ss) to run.\n",
+ "\n",
+ "model_training_dirpath = path.join(dirpath, 'model')\n",
+ "os.makedirs(model_training_dirpath, exist_ok=True)\n",
+ "\n",
+ "# Initialize Final Model Training object\n",
+ "model_trainer = ModelTrainingPipeline(\n",
+ " config['final_training']['model'],\n",
+ " config['final_training']['model_train_config'],\n",
+ " model_training_dirpath, device)\n",
+ "model_trainer.load_data_and_targets_from_config(config['data'])\n",
+ "\n",
+ "# Build the training artifacts from config, and train the model\n",
+ "model_trainer.build_model_training_artifacts()\n",
+ "model_trainer.train()\n",
+ "\n",
+ "# All the additional data generated (model defaults filled, etc.)\n",
+ "# are passed onto the config for future use in pipeline\n",
+ "model_config, model_train_config = model_trainer.get_updated_config()\n",
+ "config['final_training']['model'] = model_config\n",
+ "config['final_training']['model_train_config'] = model_train_config\n",
+ "write_data(config, path.join(dirpath, 'config.yaml'))\n",
+ "del model_trainer"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "GZFd8R8QWpmS"
+ },
+ "source": [
+ "### Evaluation and Analysis"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "w71AS8mXVob9"
+ },
+ "outputs": [],
+ "source": [
+ "# This shell will take approximately 00:00:26 (hh:mm:ss) to run.\n",
+ "\n",
+ "analysis_dirpath = path.join(dirpath, 'analysis')\n",
+ "os.makedirs(analysis_dirpath, exist_ok=True)\n",
+ "\n",
+ "# Get path of the best trained model\n",
+ "config['analysis']['model_checkpoint'] = path.join(\n",
+ " model_training_dirpath, 'best_model')\n",
+ "\n",
+ "# Initialize Evaluation and Analysis Pipeline object\n",
+ "analyser = EvalAndAnalysisPipeline(config['analysis'], analysis_dirpath,\n",
+ " device)\n",
+ "analyser.load_data_and_targets_from_config(config['data'])\n",
+ "\n",
+ "# Perform evaluation of trained model on test data and generate\n",
+ "# classification report\n",
+ "analyser.evaluation_and_classification_report()\n",
+ "\n",
+ "# Perform gene analysis based on the trained model to get\n",
+ "# top genes / biomarker analysis\n",
+ "analyser.gene_analysis()\n",
+ "\n",
+ "# Perform downstream analysis on all samples / test samples\n",
+ "analyser.full_samples_downstream_anlaysis()\n",
+ "analyser.test_samples_downstream_anlaysis()\n",
+ "\n",
+ "# All the additional data generated\n",
+ "# are passed onto the config for future use in pipeline\n",
+ "config['analysis'] = analyser.get_updated_config()\n",
+ "write_data(config, path.join(dirpath, 'config.yaml'))\n",
+ "del analyser"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "XCThcOt8gjJ5"
+ },
+ "source": [
+ "Analysis results can be viewed inside `scalr_experiments` under the `exp_name` specified in the `config.yaml`, as mentioned above."
+ ]
+ }
+ ],
+ "metadata": {
+ "accelerator": "GPU",
+ "colab": {
+ "gpuType": "T4",
+ "provenance": []
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}