lcr-modules: Standardizing genomic analyses

This project aims to become a collection of standard analytical modules for genomic and transcriptomic data. Too often we copy and paste from each other’s pipelines, which has several pitfalls. Standardized analytical modules address these problems and bring many benefits of their own.

Documentation: https://github.com/LCR-BCCRC/lcr-modules/wiki

License: LICENSE

Installing compatible Snakemake

Run the following commands in your terminal to create the opv12 environment with all necessary dependencies.

conda deactivate
git clone https://github.com/LCR-BCCRC/lcr-modules.git
git clone https://github.com/LCR-BCCRC/lcr-scripts.git
cd lcr-modules/
conda env create -f demo/env.yaml

Always activate this environment before running any pipeline that uses lcr-modules.

conda activate opv12
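
Because every pipeline depends on this environment being active, a launch script can check for it before doing anything else. The following is a minimal sketch, not part of lcr-modules: the function name `require_conda_env` is hypothetical, and it relies on conda exporting the `CONDA_DEFAULT_ENV` variable with the name of the active environment.

```shell
# Sketch: succeed only if the expected conda environment is active.
# Assumption: conda sets CONDA_DEFAULT_ENV to the active environment's name.
# require_conda_env is a hypothetical helper, not an lcr-modules command.
require_conda_env() {
    expected="$1"
    if [ "${CONDA_DEFAULT_ENV:-}" != "$expected" ]; then
        echo "Expected conda environment '$expected', found '${CONDA_DEFAULT_ENV:-none}'." >&2
        return 1
    fi
}
```

Placing `require_conda_env opv12 || exit 1` at the top of a launch script would stop it early when the environment has not been activated.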

The demo project contains examples of how to use lcr-modules for each data type, such as capture (capture_Snakefile.smk) or mrna (mrna_Snakefile.smk) data. The commands below run dry-run.sh on each example Snakefile.

cd demo
./dry-run.sh capture_Snakefile.smk
./dry-run.sh mrna_Snakefile.smk
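
To dry-run every example workflow in one go, the two commands above can be generalized with a small loop. This is only a sketch under assumptions: `dry_run_all` is a hypothetical helper, and the `*_Snakefile.smk` pattern is inferred from the two file names above rather than from a documented naming rule.

```shell
# Sketch: run a command (default ./dry-run.sh) on every *_Snakefile.smk file
# in the current directory, stopping at the first failure.
# dry_run_all is a hypothetical helper; the glob pattern is an assumption.
dry_run_all() {
    runner="${1:-./dry-run.sh}"
    for snakefile in ./*_Snakefile.smk; do
        [ -e "$snakefile" ] || continue   # the glob matched nothing
        echo "Dry run: $snakefile" >&2
        "$runner" "$snakefile" || return 1
    done
}
```

Run from inside the demo directory, `dry_run_all` would call `./dry-run.sh` once per matching Snakefile and return a non-zero status as soon as one of them fails.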

Module levels overview

Level 1 modules perform low-level tasks such as adapter trimming, quality control, and alignment of sequencing files, and obtaining data from repositories such as the European Genome-phenome Archive (EGA). These modules also perform gene expression analyses, including alignment using STAR and calculating mRNA abundance using salmon.

Level 2 modules perform routine tasks for cancer analysis, such as detecting and annotating simple somatic mutations, copy-number alterations, and structural variations.

Level 3 modules perform analyses that rely on cohort-level aggregation. The cohorts and data sets can be flexibly defined based on different clinical characteristics through a set of configuration files. The modules at this level operate on the outputs of level 2 modules and perform tasks such as aggregation of individual files into cohort-level merges. Example workflows include analyses of mutation signatures, identification of significantly mutated genes, and sample classification into genetic subgroups.

Currently available modules

The tables below list the purpose of each module and supported sequencing types.

Table of Contents

Level 1

Purpose	# modules
Alignment	2
Archive download	1
Fastq processing	2
Genome build conversion	1
Phasing long reads	1
QC	2

Level 2

Purpose	# modules
CNV calling	5
DNA modification analysis	1
Gene expression	2
Pathogen analysis	1
Phasing long reads	3
Structural variants	3
Structural variants long reads	2
TCR, IG, HLA analysis	3
Variant annotation	1
Variant calling	6
Variant calling long reads	4

Level 3

Purpose	# modules
Aggregation	3
Classifiers	2
Microenvironment	1
Mutation signatures	1
Mutation significance	8

Level 1

Alignment

module	seq_type	input_type	output_type	data_type
bwa_mem	capture; genome	FASTQ	BAM/CRAM	Illumina short reads
star	mrna	FASTQ	BAM	Illumina short reads

Fastq processing

module	seq_type	input_type	output_type	data_type
bam2fastq	capture; genome; mrna	BAM/CRAM	FASTQ	Illumina short reads
cutadapt	capture; genome	FASTQ	FASTQ	Illumina short reads

QC

module	seq_type	input_type	output_type	data_type
picard_qc	capture; genome; mrna	BAM/CRAM	TSV	Illumina short reads
qc	capture; genome	BAM/CRAM	TSV	Illumina short reads

Level 2

CNV calling

module	seq_type	input_type	output_type	data_type
battenberg	capture; genome	BAM/CRAM	SEG	Illumina short reads
cnvkit	capture; genome	BAM/CRAM	SEG	Illumina short reads
controlfreec	genome	BAM/CRAM	SEG	Illumina short reads
ichorcna	genome	BAM/CRAM	SEG	Illumina short reads
sequenza	capture; genome	BAM/CRAM	SEG	Illumina short reads

Gene expression

module	seq_type	input_type	output_type	data_type
salmon	mrna	FASTQ	TSV	Illumina short reads
stringtie	mrna	BAM/CRAM	GTF	Illumina short reads

Phasing long reads

module	seq_type	input_type	output_type	data_type
freebayes	capture; genome	BAM/CRAM	VCF	Illumina short reads
nanomethphase	promethION	BAM/CRAM	TSV	Long reads
whatshap	genome; promethION	BAM/CRAM	VCF	Illumina short reads

Structural variants

module	seq_type	input_type	output_type	data_type
gridss	capture; genome	BAM/CRAM	VCF	Illumina short reads
hmftools	genome	BAM/CRAM	VCF	Illumina short reads
manta	capture; genome; mrna	BAM/CRAM	VCF	Illumina short reads

Structural variants long reads

module	seq_type	input_type	output_type	data_type
cutesv	promethION	BAM	VCF	Long reads
sniffles	promethION	BAM/CRAM	VCF	Long reads

TCR, IG, HLA analysis

module	seq_type	input_type	output_type	data_type
igcaller	capture; genome	BAM/CRAM	TSV	Illumina short reads
mixcr	genome; mrna	BAM/CRAM	TSV	Illumina short reads
spechla	capture; genome; mrna	BAM/CRAM	TSV	Illumina short reads

Variant calling

module	seq_type	input_type	output_type	data_type
lofreq	capture; genome	BAM/CRAM	VCF	Illumina short reads
mutect2	capture; genome	BAM/CRAM	VCF	Illumina short reads
sage	capture; genome	BAM/CRAM	VCF	Illumina short reads
slms_3	capture; genome	BAM/CRAM	VCF	Illumina short reads
strelka	capture; genome	BAM/CRAM	VCF	Illumina short reads
varscan	capture; genome	BAM	VCF	Illumina short reads

Variant calling long reads

module	seq_type	input_type	output_type	data_type
clair3	promethION	BAM	VCF	Long reads
clairs	promethION	BAM/CRAM	VCF	Long reads
clairs_to	promethION	BAM/CRAM	VCF	Long reads
nanopolish	promethION	BAM/CRAM	VCF	Long reads

Level 3

Aggregation

module	seq_type	input_type	output_type	data_type
cnv_master	capture; genome	SEG	merged SEG	Illumina short reads
starfish	capture; genome; mrna	VCF	VCF	Illumina short reads
svar_master	capture; genome	BEDPE	merged BEDPE	Illumina short reads

Classifiers

module	seq_type	input_type	output_type	data_type
dlbclass	capture; genome	VARIOUS	TSV	Illumina short reads
lymphgen	capture; genome	VARIOUS	TSV	Illumina short reads

Mutation significance

module	seq_type	input_type	output_type	data_type
dnds	capture; genome	MAF	TSV	Illumina short reads
fishhook	capture; genome	MAF	TSV	Illumina short reads
gistic2	capture; genome	SEG	TSV	Illumina short reads
hotmaps	capture; genome	MAF	TSV	Illumina short reads
mutsig	capture; genome	MAF	TSV	Illumina short reads
oncodriveclustl	capture; genome	MAF	TSV	Illumina short reads
oncodrivefml	capture; genome	MAF	TSV	Illumina short reads
rainstorm	capture; genome	MAF	BED	Illumina short reads
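
The seq_type column lists every sequencing type a module supports, separated by semicolons, so finding the modules that apply to one data type is a simple filter. The sketch below assumes a module table has been saved as tab-separated text with the same columns as above; `modules_for_seq_type` is a hypothetical helper, not part of lcr-modules.

```shell
# Sketch: list modules whose seq_type column includes a given sequencing type.
# Reads a tab-separated table (header row first) on stdin.
# modules_for_seq_type is a hypothetical helper, not an lcr-modules command.
modules_for_seq_type() {
    awk -F '\t' -v wanted="$1" '
        NR == 1 { next }                       # skip the header row
        {
            n = split($2, types, "; *")        # entries are separated by "; "
            for (i = 1; i <= n; i++)
                if (types[i] == wanted) { print $1; break }
        }
    '
}

# Usage with the two Alignment rows from the table above.
printf 'module\tseq_type\tinput_type\toutput_type\tdata_type\n%s\n%s\n' \
    'bwa_mem	capture; genome	FASTQ	BAM/CRAM	Illumina short reads' \
    'star	mrna	FASTQ	BAM	Illumina short reads' |
    modules_for_seq_type mrna
# prints: star
```

Matching whole entries rather than substrings keeps a query for one sequencing type from accidentally matching another whose name merely contains it.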
