sgTiler

An ultra-fast tool to design tiling sgRNAs for any genomic region

sgTiler predicts efficient sgRNA spacer sequences and distributes them optimally across any input DNA sequence. It is designed for building tiling sgRNA libraries that minimize the number of sgRNAs needed to maximally cover the input DNA sequence. The tool gives users great flexibility in designing the library, at much greater speed than other currently available sgRNA design tools. sgTiler is best suited for designing tiling sgRNAs that target regulatory regions, including promoters and enhancers; however, it can also be used to target exons or any other part of the genome.

Installation

sgTiler requires Python 2.7+ and bowtie to be installed on the system.

Download sgTiler.py
Install bowtie 1.x if it is not already installed. Please refer to http://bowtie-bio.sourceforge.net/manual.shtml#obtaining-bowtie for bowtie installation. The easiest way is to download the appropriate prebuilt bowtie package listed below, unzip it and provide the path to the bowtie executable in the --bowtie-path argument. You do not need to provide --bowtie-path if bowtie is already installed on your system and can be run from the command line in any directory. Download bowtie-1.2.2-mingw-x86_64.zip for Windows, bowtie-1.2.2-macos-x86_64.zip for Mac and bowtie-1.2.2-linux-x86_64.zip for Linux.
Download the bowtie genome index files from ftp://ftp.ccb.jhu.edu/pub/data/bowtie_indexes/. For example, for hg19, download ftp://ftp.ccb.jhu.edu/pub/data/bowtie_indexes/hg19.ebwt.zip and unzip it in the current directory.
Download exons.zip into the current directory and unzip it.
Optionally, download wgEncodeRegDnaseClusteredV3.consensus.simplified.bed into the current directory.
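Whether --bowtie-path is needed depends only on whether a bowtie executable can be found from any directory. As a minimal sketch (Python 3, because it relies on shutil.which, which Python 2.7 lacks), a wrapper script could decide this automatically; the helper name bowtie_args and the default local path are assumptions for illustration, not part of sgTiler.

```python
import shutil


def bowtie_args(local_copy="./bowtie-1.2.2-macos-x86_64/bowtie"):
    """Return the extra sgTiler arguments needed to locate bowtie.

    Hypothetical helper: if a `bowtie` executable is already on the PATH,
    no extra argument is needed; otherwise point --bowtie-path at the
    unzipped local copy.
    """
    if shutil.which("bowtie") is not None:
        return []
    return ["--bowtie-path", local_copy]


if __name__ == "__main__":
    print(bowtie_args())
```

Adjust local_copy to the package you unzipped (mingw, macos or linux).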

Running

Run sgTiler.py in any command line environment.

Input files

sgTiler requires four input files:

A bowtie index for the desired genome. Pre-built bowtie indexes for several genomes are available here: ftp://ftp.ccb.jhu.edu/pub/data/bowtie_indexes/
Input fasta file
Genome exon regions in BED format (provided on the GitHub page, or you can use your own). The provided exon regions are curated from GENCODE v19 for hg19.
Open chromatin or histone mark regions in BED format. A list of consensus DNase hypersensitive (DHS) regions common to at least 113 cell lines is provided on the GitHub page (wgEncodeRegDnaseClusteredV3.consensus.simplified.bed). This file was downloaded from the ENCODE project and curated so that any DHS site present in fewer than 113 cell lines is removed. It is recommended to use your own open chromatin regions, e.g., DNase hypersensitive sites, H3K27ac marks, H3K4me1 marks or transcription factor binding sites relevant to the cell line or system you are performing the screen in. However, if you do not have such marks, you can use the provided file, which will likely only overestimate the off-target potential score.

(Please note that the provided files are in Hg19)
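Both annotation inputs are plain BED intervals. How sgTiler folds them into its off-target potential score is not described here, so the following is only an illustrative sketch of the underlying interval lookup: load a BED file (0-based, half-open coordinates, first three columns) and test whether a genomic hit overlaps any interval. The function names parse_bed and overlaps are hypothetical.

```python
def parse_bed(lines):
    """Parse BED lines into {chrom: [(start, end), ...]}.

    BED coordinates are 0-based and half-open; only the first three
    columns are used. Blank, comment, track and browser lines are skipped.
    """
    intervals = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals.setdefault(chrom, []).append((int(start), int(end)))
    return intervals


def overlaps(intervals, chrom, start, end):
    """True if the half-open range [start, end) overlaps any interval on chrom."""
    return any(s < end and start < e for s, e in intervals.get(chrom, []))
```

Usage: open the DHS or exon file with `open(...)` and pass the file handle to parse_bed. The linear scan keeps the sketch short; for genome-wide annotation sets, sorting the intervals and searching with bisect would be much faster.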

Example run command

python sgTiler.py -i input.fa --bowtie-index hg19.ebwt/hg19 --dhs wgEncodeRegDnaseClusteredV3.consensus.simplified.bed --gtf allExons.sorted.merged.gencodev19.hg19.bed --verbose --dir output_boxplots --output sgTiler_output

Command options

sgTiler.py has the following command options:

-i              Required. Input FASTA file
--bowtie-index  Required. Path to bowtie index file
--dhs           Required. Path to regulatory regions bed file 
--gtf           Required. Path to exon bed file 
--output        Required. Output prefix
--bowtie-path   Optional. If bowtie is not installed, you can download bowtie and provide the path to the bowtie file. E.g., ./bowtie-1.2.2-macos-x86_64/bowtie
--pam           PAM sequence. Default: NGG
--gc-min        Minimum GC content in percentage. Default: 20
--gc-max        Maximum GC content in percentage. Default: 80
--nthreads      No. of threads for parallel processing. Default: 4
--strand        Strand to find the sgRNAs. Options: positive, negative or both. Default: both
--length        Length of sgRNA without the PAM. Default: 19
--missmatch     Minimum number of mismatches allowed in off-targets. Default: 2
--sg-expected   Expected approximate # of sgRNAs per 100bp. Default: 7
--sg-flex       Room of flexibility for evenness in distribution. Default: 3
--dir           Directory to store plots
--plot-off      Do not generate plots
--optimize-off  Do not perform optimization
--distribution-off Do not filter for distribution
--save_tmp      Save all temporary files
--pam-off       Skips writing the PAM sequence to the output file
-v              Turn on verbosity
-h              Show command help
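To illustrate what --pam, --length, --gc-min, --gc-max and --strand control, here is a minimal Python 3 sketch of candidate spacer enumeration. It is not sgTiler's own code: it assumes a simple regex scan in which only the wildcard N of the PAM is expanded, scans both strands, and applies the GC filter in percent. The function name find_spacers is hypothetical.

```python
import re

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq):
    """Reverse complement; any non-ACGT base becomes N."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(seq))


def gc_percent(seq):
    """GC content of seq as a percentage."""
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def find_spacers(seq, pam="NGG", length=19, gc_min=20, gc_max=80):
    """Return (strand, start, spacer) tuples for spacers directly 5' of the PAM.

    Only the wildcard N is translated in the PAM; other IUPAC codes are not.
    For '-' hits, start is counted on the reverse complement, i.e. from the
    3' end of the input sequence.
    """
    seq = seq.upper()
    pam_pattern = pam.upper().replace("N", "[ACGT]")
    # Zero-width lookahead so that overlapping candidates are all reported.
    pattern = re.compile("(?=([ACGT]{%d})%s)" % (length, pam_pattern))
    hits = []
    for strand, strand_seq in (("+", seq), ("-", revcomp(seq))):
        for match in pattern.finditer(strand_seq):
            spacer = match.group(1)
            if gc_min <= gc_percent(spacer) <= gc_max:
                hits.append((strand, match.start(), spacer))
    return hits
```

Restricting the outer loop to one strand would mimic --strand positive or negative. Efficiency and off-target scoring, which sgTiler performs on these candidates, are not modelled here.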

It is recommended to leave all options at their default values when possible. However, two input parameters strongly influence the number of sgRNAs: --sg-expected and --sg-flex. --sg-expected determines the distance between two neighbouring sgRNAs, and --sg-flex determines how strictly that distance is enforced. For example, --sg-expected 10 indicates that 10 spacer sequences are desired per 100bp of input sequence. Accordingly, sgTiler will identify positions in the input sequence that are evenly dispersed and 10bp apart: 5, 15, 25, 35 ... 95. However, there is not always a candidate spacer sequence at these exact positions, so sgTiler builds a window spanning each position and picks the best sgRNA within each window. The size of these windows is determined by --sg-flex. For example, --sg-expected 10 --sg-flex 3 will make windows of 2-8, 12-18, 22-28 and so on. The tool implements a scoring method to pick the best sgRNA within each of these windows. A user can adjust these two parameters to decide how dense or dispersed the tiling should be. The tool also outputs figures showing the distribution of spacer sequences for each input sequence, which the user can check to choose the optimum values.

sgTiler automatically chooses the best sgRNAs by combining their efficiency scores and off-target potential. The user only has to decide how dense or dispersed the library should be.
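The exact formula sgTiler uses to combine efficiency and off-target potential is not given here. As a hedged stand-in only, the sketch below picks one candidate per window by lowest OTP first and highest efficiency second; the Candidate record and pick_per_window function are hypothetical names introduced for illustration.

```python
from collections import namedtuple

# Hypothetical candidate record: position in the input sequence, spacer,
# efficiency score (higher is better) and OTP score (lower is better).
Candidate = namedtuple("Candidate", ["position", "spacer", "efficiency", "otp"])


def pick_per_window(candidates, windows):
    """Pick at most one candidate per inclusive (start, end) window.

    Stand-in ranking (NOT sgTiler's own score): lowest OTP first,
    ties broken by highest efficiency. Empty windows are skipped.
    """
    picks = []
    for start, end in windows:
        inside = [c for c in candidates if start <= c.position <= end]
        if inside:
            picks.append(min(inside, key=lambda c: (c.otp, -c.efficiency)))
    return picks
```

A lexicographic ranking was chosen here because it needs no arbitrary weight between the two scores; a weighted sum would be an equally plausible alternative.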

Output

The tool outputs four text files and two PDF files:

.all.txt - list of all candidate sgRNAs
.sgRNAs.txt - list of filtered sgRNAs
.stats.txt - list of sgRNA details for the input sequences
.report.txt - a summary report with important statistics
.sgrna_count.pdf - graphical summary of the number of sgRNAs per input region
.bp_coverage.pdf - graphical summary of sequence coverage per input region

The main output file is .sgRNAs.txt. Its columns are: sequence id, sgRNA start position (from the 5' end of the input sequence for the + strand, or from the 3' end for the - strand), sgRNA end position, sgRNA id, sgRNA sequence, efficiency score and off-target potential (OTP) score. A higher efficiency score and a lower OTP score are better.
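For downstream filtering, the .sgRNAs.txt file can be loaded with a few lines of Python. This minimal sketch assumes the file is tab-separated with the seven columns in the order listed above, and skips a header line if one is present; the function name read_sgrnas and the column keys are hypothetical.

```python
import csv

# Column order as described in this README; key names are illustrative.
COLUMNS = ["seq_id", "start", "end", "sgrna_id", "sequence", "efficiency", "otp"]


def read_sgrnas(lines):
    """Parse .sgRNAs.txt rows into dicts, best sgRNAs first.

    Assumes tab-separated columns in COLUMNS order. Rows whose start/end
    columns are not integers (e.g. a header) are skipped.
    """
    rows = []
    for fields in csv.reader(lines, delimiter="\t"):
        if len(fields) < len(COLUMNS):
            continue
        try:
            start, end = int(fields[1]), int(fields[2])
        except ValueError:  # most likely a header line
            continue
        row = dict(zip(COLUMNS, fields))
        row.update(start=start, end=end,
                   efficiency=float(fields[5]), otp=float(fields[6]))
        rows.append(row)
    # Higher efficiency first, then lower OTP.
    rows.sort(key=lambda r: (-r["efficiency"], r["otp"]))
    return rows
```

With the example command above, the output would presumably be named sgTiler_output.sgRNAs.txt; pass an open handle to that file to read_sgrnas.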

Additionally, sgTiler generates graphical representations of the sgRNA distribution for each individual input region. Combined with the overall statistics, these help the user assess the likely success of the screen.

Additional help

Pre-built bowtie indexes for several genomes are available here: ftp://ftp.ccb.jhu.edu/pub/data/bowtie_indexes/

Please email musaddeque.ahmed@gmail.com for any further help.
