Cancer Systems Biology, Section of Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800, Lyngby, Denmark
This repository contains scripts for case studies related to the discovery of cancer driver genes using the Moonlight framework. The case studies are conducted on basal-like breast cancer, lung adenocarcinoma, and thyroid cancer using data from The Cancer Genome Atlas (TCGA). The associated publication is:
Revealing cancer driver genes through integrative transcriptomic and epigenomic analyses with Moonlight. Mona Nourbakhsh, Yuanning Zheng, Humaira Noor, Matteo Tiberti, Olivier Gevaert, Elena Papaleo. bioRxiv 2024.03.14.584946; doi: https://doi.org/10.1101/2024.03.14.584946
Please cite the above publication if you use the contents, scripts or results for your own research.
Below are instructions for reproducing the analyses.
This GitHub repository contains the scripts associated with the publication,
organized with a main folder for each cancer (sub)type. Within each cancer (sub)type folder,
a subfolder called scripts contains the associated scripts. The scripts are
numbered according to the order in which they are run.
The corresponding OSF repository contains data and results associated with
the scripts and is organized in the same way as the GitHub repository with a
main folder for each cancer (sub)type. Within each cancer (sub)type folder,
subfolders called data and results contain the associated data and results,
respectively. The results files in results are numbered according to the
script that generated them.
All the analyses have been performed on a GNU/Linux server.
NB: Because the GRN step of the Moonlight protocol involves stochastic processes, results obtained when reproducing the analyses are not expected to be identical to those of the case studies reported in the publication.
To reproduce the paper data, you will need to set up a conda environment
in which the expected version of R and the required packages will be installed;
this requires access to the conda executable.
If you don't have access to conda, please see the Miniconda installer page for instructions on how to install Miniconda.
Once you have access to conda, follow the instructions below:
- Clone our GitHub repository into a directory on your local machine:
git clone https://github.com/ELELAB/Moonlight2_GMA_case_studies.git
cd Moonlight2_GMA_case_studies
- Create a virtual environment using conda and activate it. The environment directory should be placed in the Moonlight2_GMA_case_studies folder:
conda env create --prefix ./methyl_case --file conda_environment.yml
conda activate ./methyl_case
- Download data from the COSMIC Cancer Gene Census.
This data can be downloaded from https://cancer.sanger.ac.uk/census by exporting it as
a csv file, or from https://cancer.sanger.ac.uk/cosmic/download/cosmic/v99/cancergenecensus by choosing the file for the GRCh38 genome and afterwards converting it to a csv file. Once the data from the Cancer Gene Census has been downloaded, it must be a csv file named cancer_gene_census.csv, and this file must be placed in the data folder of each cancer (sub)type:
breast_basal/data/cancer_gene_census.csv
lung/data/cancer_gene_census.csv
thyroid/data/cancer_gene_census.csv
- Run the analyses:
bash ./run_all.sh
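Before running the analyses, the same Cancer Gene Census file has to be present in all three data folders. The following is a minimal sketch of one way to do that from the repository root. The function name place_cgc and its arguments are hypothetical and not part of the repository, and creating the data folders with mkdir -p is an assumption in case they do not exist yet.

```shell
# Hypothetical helper (not part of the repository): copy a downloaded
# Cancer Gene Census csv file into the data folder of each cancer (sub)type.
place_cgc() {
    src="$1"          # path to the downloaded csv file
    repo="${2:-.}"    # repository root; defaults to the current directory
    if [ ! -f "$src" ]; then
        echo "Source file not found: $src" >&2
        return 1
    fi
    for cancer in breast_basal lung thyroid; do
        # Assumption: create the data folder if it is not there yet.
        mkdir -p "$repo/$cancer/data" || return 1
        cp "$src" "$repo/$cancer/data/cancer_gene_census.csv" || return 1
    done
}

# Example call (the source path is a placeholder):
# place_cgc "$HOME/Downloads/cancer_gene_census.csv"
```

The function only copies the file; converting a non-csv download to csv still has to be done beforehand.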
WARNING: our scripts use the renv R package to handle automatic dependency installation.
The renv package writes packages to its own cache folder, which by default is in the user's
home directory. This might not be desirable if free space in the home directory is limited.
You can change the location of the renv root folder by setting a system environment variable;
please see the comments in the run_all.sh script.
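As a rough illustration of relocating the renv root folder, the sketch below sets RENV_PATHS_ROOT, which is the environment variable renv documents for its root path. Whether run_all.sh expects exactly this variable is an assumption, so the comments in that script take precedence. The target directory is a placeholder.

```shell
# Assumption: RENV_PATHS_ROOT is the variable to set; check run_all.sh to confirm.
# The directory below is a placeholder; pick a location with enough free space.
export RENV_PATHS_ROOT="${RENV_PATHS_ROOT:-$PWD/renv_root}"
mkdir -p "$RENV_PATHS_ROOT"
echo "renv root folder: $RENV_PATHS_ROOT"
```

The variable has to be set in the same shell session that later runs bash ./run_all.sh, otherwise renv falls back to its default location.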
The run_all.sh script will perform the following steps to reproduce all results and data:
- Download data from the corresponding OSF repository, which contains the required data to run the analyses and all results associated with the analyses.
- Install in the environment all necessary packages to run the analyses.
- Perform all analyses for basal-like breast cancer.
- Perform all analyses for lung adenocarcinoma.
- Perform all analyses for thyroid cancer.
- Compare results across cancer (sub)types.
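Since run_all.sh goes through all of these steps in one go, it may help to confirm that the manually downloaded input is in place before starting. The sketch below is a hypothetical pre-flight check, not part of the repository; the function name check_inputs is illustrative. It only verifies that the Cancer Gene Census file exists in each data folder.

```shell
# Hypothetical pre-flight check (not part of the repository): succeeds only
# if cancer_gene_census.csv exists in the data folder of every cancer (sub)type.
check_inputs() {
    repo="${1:-.}"    # repository root; defaults to the current directory
    missing=0
    for cancer in breast_basal lung thyroid; do
        f="$repo/$cancer/data/cancer_gene_census.csv"
        if [ ! -f "$f" ]; then
            echo "Missing: $f" >&2
            missing=$((missing + 1))
        fi
    done
    # Return success (0) only when no file is missing.
    [ "$missing" -eq 0 ]
}

# Example call from the repository root:
# check_inputs . && bash ./run_all.sh
```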