diff --git a/README.org b/README.md
similarity index 57%
rename from README.org
rename to README.md
index 432ff2b..f87e8a0 100644
--- a/README.org
+++ b/README.md
@@ -1,15 +1,21 @@
-* proovframe: frame-shift correction for long read (meta)genomics
+[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](http://bioconda.github.io/recipes/proovframe/README.html)
+[![Anaconda-Server Badge](https://img.shields.io/conda/dn/bioconda/proovframe.svg?style=flat)](https://anaconda.org/bioconda/proovframe)
+[![DOI](https://img.shields.io/badge/DOI-10.1101/2021.08.23.457338-blue)](https://doi.org/10.1101/2021.08.23.457338)
+[![Anaconda-Server Badge](https://anaconda.org/bioconda/proovframe/badges/license.svg)](https://anaconda.org/bioconda/proovframe)
+
+
+proovframe: frame-shift correction for long read (meta)genomics
+=========================================
 
 Gene prediction on long reads, aka PacBio and Nanopore, is often impaired by
 indels causing frameshift. Proovframe detects and corrects frameshifts in coding
 sequences from raw long reads or long-read derived assemblies.
 
-#+ATTR_HTML: :width 600px
-[[file:implementation.png]]
+
 
 Proovframe uses frameshift-aware alignments to reference proteins as guides, and
 conservatively restores frame-fidelity by 1/2-base deletions or insertions of
-"N/NN"s, and masking of premature stops ("NNN").
+`N/NN`s, and masking of premature stops (`NNN`).
 
 Good results can already be obtained with distantly related guide proteins-
 successfully tested with sets with <60% amino acid identity.
@@ -20,22 +26,40 @@ consensus-polishing approaches for assemblies. It can be used on raw reads
 directly, which means it can be used on data lacking sequencing depth for
 consensus polishing - a common problem for a lot of rare things from
 environmental metagenomic samples, for example.
-
-** Usage
+## Install
+
+### bioconda
 
-Requires [[https://github.com/bbuchfink/diamond][DIAMOND v2.0.3]] or newer for mapping.
+```
+conda install -c bioconda proovframe
+```
 
-#+begin_src sh
-# install
+### Manual
+
+Requires [DIAMOND v2.0.3](https://github.com/bbuchfink/diamond) or newer for mapping.
+
+```
 git clone https://github.com/thackl/proovframe
-# map proteins to reads
+```
+
+It is ready to be used. The tool lives in `proovframe/bin/proovframe`
+
+
+## Usage
+
+map proteins to reads:
+```
 proovframe/bin/proovframe map -a proteins.faa -o raw-seqs.tsv raw-seqs.fa
-# fix frameshifts in reads
+```
+
+fix frameshifts in reads:
+```
 proovframe/bin/proovframe fix -o corrected-seqs.fa raw-seqs.fa raw-seqs.tsv
-#+end_src
+```
+
 
-** Citing
+## Citing
 
 If you use proovframe and DIAMOND please cite: