SRA tutorial

mapostolides edited this page Sep 14, 2020 · 7 revisions

How to download files from Sequence Read Archive (SRA)

This document details how to download sequence data from NCBI's Sequence Read Archive (https://trace.ncbi.nlm.nih.gov/Traces/sra/).

We will use SRR1657561 as an example because it is a relatively small SRA file (~262 MB).

You can download this file from SRA via the web interface.

Note: SRA has changed where it stores .sra files, so downloading over FTP no longer works.
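As one possible alternative to the web interface, the sratoolkit includes a prefetch command that downloads a run by its accession. The sketch below assumes the sratoolkit is installed and on your PATH. The download_sra function and the PREFETCH override variable are hypothetical helpers introduced here for illustration, and where the downloaded .sra file ends up depends on your sratoolkit version and configuration.

```shell
# Hypothetical helper: download one run by accession using prefetch.
# Assumes the sratoolkit is installed; set PREFETCH to use a different binary.
download_sra() {
    accession="$1"
    if [ -z "$accession" ]; then
        echo "usage: download_sra <accession>" >&2
        return 1
    fi
    "${PREFETCH:-prefetch}" "$accession"
}

# Example (requires network access):
# download_sra SRR1657561
```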

Converting .sra to .fastq

Once you have downloaded the .sra file, you will need the sratoolkit software to convert it to a .fastq file.

Use the fastq-dump component of the sratoolkit to do the conversion:
$ fastq-dump --outdir <outdir_name> --gzip --split-3 SRR1657561.sra

  • For paired-end data, the --split-3 flag generates either 2 or 3 files: two paired-end read files, plus a third file containing singleton reads if there are any. In this case, two files are generated inside the specified output directory:
    SRR1657561_1.fastq.gz SRR1657561_2.fastq.gz
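The conversion step can be wrapped in a small script, followed by a quick sanity check that both paired-end files contain the same number of reads. This is a minimal sketch, assuming the sratoolkit is installed and the output is standard 4-line FASTQ. The convert_sra and count_reads functions and the FASTQ_DUMP override variable are hypothetical names chosen for this example.

```shell
# Hypothetical wrapper around the fastq-dump call shown above.
# Assumes the sratoolkit is installed; set FASTQ_DUMP to use a different binary.
convert_sra() {
    sra_file="$1"
    outdir="$2"
    # Create the output directory up front (harmless if it already exists).
    mkdir -p "$outdir" || return 1
    "${FASTQ_DUMP:-fastq-dump}" --outdir "$outdir" --gzip --split-3 "$sra_file"
}

# Count reads in a gzipped FASTQ file, assuming 4 lines per read.
count_reads() {
    lines=$(gzip -dc "$1" | wc -l)
    echo $((lines / 4))
}

# Example:
# convert_sra SRR1657561.sra fastq_out
# count_reads fastq_out/SRR1657561_1.fastq.gz
# count_reads fastq_out/SRR1657561_2.fastq.gz
```

For paired-end data, the two counts should be equal; a mismatch suggests the conversion was interrupted or the files are incomplete.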

fastq-dump accepts several other flags. If you are interested, or need your .fastq file in a different format, you can experiment with them:
https://ncbi.github.io/sra-tools/fastq-dump.html
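As one example of experimenting with other flags, the sketch below previews the first few spots of a run on the terminal instead of writing files. It uses -X (--maxSpotId) to limit how many spots are dumped and -Z (--stdout) to print to standard output. The preview_sra function and the FASTQ_DUMP override variable are hypothetical helpers for illustration, and the sketch assumes the sratoolkit is installed.

```shell
# Hypothetical helper: print the first N spots of a run to the terminal.
# -X (--maxSpotId) caps the number of spots dumped.
# -Z (--stdout) writes the output to standard output instead of files.
preview_sra() {
    sra_file="$1"
    n_spots="${2:-5}"   # default to the first 5 spots
    "${FASTQ_DUMP:-fastq-dump}" -X "$n_spots" -Z "$sra_file"
}

# Example:
# preview_sra SRR1657561.sra 5
```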
