This project aims to become a collection of standard analytical modules for genomic and transcriptomic data. Too often, we copy-paste from each other's pipelines, which has several pitfalls. Fortunately, standardized analytical modules can solve all of these problems, and they bring many benefits.
Always activate this environment before running any pipelines that use LCR-modules.
conda activate opv12
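A pipeline launcher can check for the environment before doing any work, so a forgotten activation fails fast. The sketch below is not part of LCR-modules: the function name `require_conda_env` is hypothetical, and it assumes the environment was activated by name, in which case conda sets `CONDA_DEFAULT_ENV` to that name.

```shell
#!/usr/bin/env bash
# Hypothetical guard (not part of LCR-modules): succeed only if the named
# conda environment is currently active.
# Assumption: the environment was activated by name, so conda has set
# CONDA_DEFAULT_ENV to that name.
require_conda_env() {
    local expected="$1"
    if [ "${CONDA_DEFAULT_ENV:-}" != "${expected}" ]; then
        echo "Expected conda environment '${expected}', found '${CONDA_DEFAULT_ENV:-none}'." >&2
        return 1
    fi
}
```

With this function sourced, a call such as `require_conda_env opv12 && ./dry-run.sh capture_Snakefile.smk` would only start the dry run when `opv12` is active.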
The demo project contains examples of how to use LCR-modules for each data type, for example to analyze capture (capture_Snakefile.smk) or mrna (mrna_Snakefile.smk) data.
cd demo
./dry-run.sh capture_Snakefile.smk
./dry-run.sh mrna_Snakefile.smk
Module levels overview
Level 1 modules perform low-level tasks such as adapter trimming, quality control, and alignment of sequencing files, and obtaining data from repositories such as the European Genome-phenome Archive (EGA). These modules also perform gene expression analyses, including alignment using STAR and calculating mRNA abundance using salmon.
Level 2 modules perform routine tasks for cancer analysis, such as detecting and annotating simple somatic mutations, copy-number alterations, and structural variations.
Level 3 modules perform analyses that rely on cohort-level aggregation. The cohorts and data sets can be flexibly defined based on different clinical characteristics through a set of configuration files. The modules at this level operate on the outputs of level 2 modules and perform tasks such as aggregation of individual files into cohort-level merges. Example workflows include analyses of mutation signatures, identification of significantly mutated genes, and sample classification into genetic subgroups.
Currently available modules
The tables below list the purpose of each module and supported sequencing types.
LCR-modules is not intended for installation and use on personal devices (phones, laptops, personal workstations). Because a number of its tools (GATK, STAR, hmftools, etc.) have high computational requirements, we recommend running it on high-performance computers with a Unix OS. To process large numbers of samples in parallel, we recommend computing clusters with scheduling manager support. We recommend using LCR-modules on Linux. Other operating systems are either not supported, when their file systems are case-insensitive (APFS, NTFS), or have not been tested.
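To find out whether a working directory sits on a case-sensitive file system, one option is to create a probe file and look it up under a different capitalization. This is a minimal sketch, not something LCR-modules ships. The function name `is_case_sensitive` and the probe file names are made up for illustration, and it assumes `mktemp -d` accepts a template path, as the GNU and BSD versions do.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of LCR-modules).
# Returns 0 if the file system holding the given directory is case-sensitive,
# 1 if it is case-insensitive, and 2 if the probe directory cannot be created.
is_case_sensitive() {
    local target_dir="${1:-.}"
    local probe_dir
    # Create a throwaway directory inside the target so the probe tests
    # the same file system the pipeline would write to.
    probe_dir="$(mktemp -d "${target_dir%/}/case_probe.XXXXXX")" || return 2
    touch "${probe_dir}/probe_file"
    local result=0
    # On a case-insensitive file system the upper-case name resolves
    # to the lower-case file that was just created.
    if [ -e "${probe_dir}/PROBE_FILE" ]; then
        result=1
    fi
    rm -rf "${probe_dir}"
    return "${result}"
}
```

Calling `is_case_sensitive /path/to/project` before launching a pipeline gives an early warning on file systems such as default APFS or NTFS volumes.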