bofosu01/rna_seq_analysis

HCMV RNA-seq Analysis Pipeline

Overview

This repository contains a reproducible, Snakemake-based Python pipeline for analyzing Human cytomegalovirus (HCMV) transcriptomes from RNA-seq data.

Overview of Analysis

We compared HCMV transcriptomes from two donors at two time points (dpi = days post-infection):

| Donor   | Time point | SRA accession |
|---------|------------|---------------|
| Donor 1 | 2 dpi      | SRR5660030    |
| Donor 1 | 6 dpi      | SRR5660033    |
| Donor 3 | 2 dpi      | SRR5660044    |
| Donor 3 | 6 dpi      | SRR5660045    |

All results are summarized in a single output file, Ofosu_PipelineReport.txt.

What the pipeline does

For each sample, the pipeline performs the following steps:


  • Downloads SRA reads and converts them to FASTQ (documented only)
  • Extracts coding sequences (CDS) from the HCMV reference genome
  • Builds a kallisto transcriptome index
  • Quantifies transcript expression using kallisto
  • Runs sleuth differential expression analysis (R script)
  • Builds a Bowtie2 genome index
  • Maps reads to the HCMV genome
  • Counts reads before and after mapping
  • Assembles RNA reads using SPAdes
  • Extracts the longest contig from each assembly
  • Runs BLASTN of the longest contig against a Betaherpesvirinae nucleotide database
  • Generates the final report, PipelineReport.txt
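Two of the steps above, counting reads and extracting the longest contig, are simple enough to sketch in plain Python. This is a minimal illustration, not the code in SCRIPTS/: the function names are hypothetical, and it assumes uncompressed FASTQ input with four lines per record and a standard multi-line FASTA assembly file.

```python
def count_fastq_reads(fastq_path):
    """Count reads in an uncompressed FASTQ file (assumes 4 lines per record)."""
    with open(fastq_path) as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def longest_contig(fasta_path):
    """Return (header, sequence) for the longest record in a FASTA file."""
    best_header, best_seq = None, ""
    header, chunks = None, []
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new record starts: score the record we just finished.
                if header is not None:
                    seq = "".join(chunks)
                    if len(seq) > len(best_seq):
                        best_header, best_seq = header, seq
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    # Score the final record, which has no following header line.
    if header is not None:
        seq = "".join(chunks)
        if len(seq) > len(best_seq):
            best_header, best_seq = header, seq
    return best_header, best_seq
```

Calling `count_fastq_reads` on the input reads and again on the reads kept after mapping gives the before and after counts; `longest_contig` returns the sequence that would then be passed to BLASTN.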

Download SRA Data

# Example for one sample
fasterq-dump SRR5660030 --split-files

Repeat for each accession:

  • SRR5660030

  • SRR5660033

  • SRR5660044

  • SRR5660045
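The repeated download can be scripted rather than typed four times. The sketch below only wraps the `fasterq-dump` command shown above; the helper names and the `dry_run` switch are illustrative and not part of the repository.

```python
import subprocess

# The four accessions listed in this README.
ACCESSIONS = ["SRR5660030", "SRR5660033", "SRR5660044", "SRR5660045"]


def fasterq_dump_command(accession):
    """Build the same command as the one-sample example above."""
    return ["fasterq-dump", accession, "--split-files"]


def download_all(accessions=ACCESSIONS, dry_run=True):
    """Print (dry_run=True) or execute (dry_run=False) one command per accession."""
    commands = [fasterq_dump_command(acc) for acc in accessions]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # Requires the SRA Toolkit; stops at the first failed download.
            subprocess.run(cmd, check=True)
    return commands
```

With the default `dry_run=True` nothing is downloaded; the commands are only printed so they can be checked first.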

These files were subsampled, and the subsampled reads in TESTS/ are used as input to the pipeline.
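The README does not say how the subsampling was done. One simple possibility, shown here purely as an assumption, is to keep the first N records of each FASTQ file. The function name and the default of 10,000 reads are hypothetical.

```python
from itertools import islice


def subsample_fastq(in_path, out_path, n_reads=10000):
    """Copy the first n_reads records (4 lines each) from in_path to out_path.

    Returns the number of complete records written.
    """
    with open(in_path) as src, open(out_path, "w") as dst:
        lines = list(islice(src, 4 * n_reads))
        # Drop a trailing partial record if the input is truncated.
        usable = len(lines) - len(lines) % 4
        dst.writelines(lines[:usable])
    return usable // 4
```

For paired-end data, applying the same `n_reads` to both mate files keeps the pairs in sync only if both files list their reads in the same order.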

Dependencies

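The tools used in the steps above must be installed and available: Snakemake, Python, kallisto, R with sleuth, Bowtie2, SPAdes, BLAST+ (for BLASTN), and the SRA Toolkit (for fasterq-dump).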
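A quick way to confirm the tools are reachable is to look each one up on the `PATH`. The executable names below are assumptions inferred from the pipeline steps listed earlier, not a list taken from the Snakefile, so adjust them to match your installation.

```python
import shutil

# Assumed executable names, inferred from the pipeline steps in this README.
ASSUMED_EXECUTABLES = [
    "snakemake",
    "fasterq-dump",
    "kallisto",
    "Rscript",
    "bowtie2-build",
    "bowtie2",
    "spades.py",
    "blastn",
]


def find_missing(executables=ASSUMED_EXECUTABLES):
    """Return the executables that cannot be found on the current PATH."""
    return [name for name in executables if shutil.which(name) is None]
```

An empty list from `find_missing()` means every assumed executable was found. It does not check versions or whether the sleuth R package is installed.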

Downloading the repository

In a terminal, run the following from any directory:

git clone https://github.com/bofosu01/rna_seq_analysis.git

Move into the cloned repository directory:

cd rna_seq_analysis


total 24K
drwxr-xr-x 2 bofosu student 4.0K Feb 22 19:15 BLAST
drwxr-xr-x 2 bofosu student 4.0K Feb 22 19:15 GCF_000845245.1
-rw-r--r-- 1 bofosu student 2.2K Feb 22 19:15 README.md
drwxr-xr-x 2 bofosu student 4.0K Feb 22 19:15 SCRIPTS
drwxr-xr-x 3 bofosu student 4.0K Feb 22 19:16 SNAKEMAKE
drwxr-xr-x 2 bofosu student 4.0K Feb 22 19:15 TESTS

Move into the SNAKEMAKE directory, which contains the Snakefile:

cd SNAKEMAKE

total 8.0K
-rw-r--r-- 1 bofosu student 7.7K Feb 22 19:15 Snakefile

How to run the pipeline

Dry run (checks that all rules and their inputs are connected correctly without executing anything)

snakemake --dry-run

Run the pipeline

snakemake --cores 4
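The two commands can also be chained so that the real run starts only if the dry run succeeds. This is an optional convenience sketch, not part of the repository. The function names are hypothetical, and `workdir` assumes you start from the repository root.

```python
import subprocess


def snakemake_command(cores=4, dry_run=False):
    """Build the same snakemake invocations shown above."""
    if dry_run:
        return ["snakemake", "--dry-run"]
    return ["snakemake", "--cores", str(cores)]


def run_pipeline(cores=4, workdir="SNAKEMAKE"):
    """Run the dry run first, then the real run in the Snakefile directory."""
    # check=True raises CalledProcessError if the dry run fails,
    # so the real run is never started on a broken workflow.
    subprocess.run(snakemake_command(dry_run=True), cwd=workdir, check=True)
    subprocess.run(snakemake_command(cores=cores), cwd=workdir, check=True)
```

Calling `run_pipeline(cores=4)` from the repository root is equivalent to running the two commands above inside SNAKEMAKE/.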

Results


total 32K
drwxr-xr-x  2 bofosu student 4.0K Feb 22 19:15 BLAST
drwxr-xr-x  2 bofosu student 4.0K Feb 22 19:15 GCF_000845245.1
-rw-r--r--  1 bofosu student 2.6K Feb 22 19:25 PipelineReport.txt
drwxr-xr-x 11 bofosu student 4.0K Feb 22 19:25 OUTPUTS
-rw-r--r--  1 bofosu student 2.2K Feb 22 19:15 README.md
drwxr-xr-x  2 bofosu student 4.0K Feb 22 19:25 SCRIPTS
drwxr-xr-x  3 bofosu student 4.0K Feb 22 19:25 SNAKEMAKE
drwxr-xr-x  2 bofosu student 4.0K Feb 22 19:15 TESTS
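After a run, the report can be located programmatically. This README refers to the report as both PipelineReport.txt and Ofosu_PipelineReport.txt, so the sketch below checks for either name in the repository root. The helper name is hypothetical.

```python
from pathlib import Path

# Both report names that appear in this README.
REPORT_NAMES = ("PipelineReport.txt", "Ofosu_PipelineReport.txt")


def find_report(repo_root="."):
    """Return the path of the first report file found in repo_root, or None."""
    root = Path(repo_root)
    for name in REPORT_NAMES:
        candidate = root / name
        if candidate.is_file():
            return candidate
    return None
```

A return value of `None` suggests the pipeline did not finish, or that it was run from a different directory.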
