SRA tutorial
This document details how to download sequence data from NCBI's Sequence Read Archive (https://trace.ncbi.nlm.nih.gov/Traces/sra/).
We will use SRR1657561 as an example because it is a relatively small SRA file (~262 MB).
You can download this file from SRA via the web interface.
Note: SRA has changed where it stores .sra files, so downloading over FTP no longer works.
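If you prefer the command line to the web interface, the prefetch tool that ships with the SRA Toolkit can fetch a run by accession. The following is a minimal sketch, assuming the toolkit is installed and on your PATH. fetch_sra is a hypothetical helper name, not part of the toolkit, and where the downloaded .sra file ends up depends on your toolkit version and configuration.

```shell
# Hypothetical helper: download one SRA run by accession using prefetch.
fetch_sra() {
    accession="$1"
    # Fail early with a readable message if the SRA Toolkit is not installed.
    if ! command -v prefetch >/dev/null 2>&1; then
        echo "prefetch not found; install the SRA Toolkit first" >&2
        return 1
    fi
    prefetch "$accession"
}

# Usage (requires network access):
#   fetch_sra SRR1657561
```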
Once you have downloaded the .sra file, use the fastq-dump component of the sratoolkit software to convert it to fastq format:
$ fastq-dump --outdir <outdir_name> --gzip --split-3 SRR1657561.sra
- The --split-3 flag will generate either 2 or 3 files: two paired-end read files, and a third file containing any singleton reads (if there are any). In this case, two files will be generated inside the specified output directory:
SRR1657561_1.fastq.gz SRR1657561_2.fastq.gz
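As a quick sanity check on the conversion, you can confirm that both paired-end files contain the same number of reads. The sketch below assumes each FASTQ record occupies exactly four lines, which is the usual layout but not guaranteed by the format. count_reads and check_pairs are hypothetical helper names introduced for illustration.

```shell
# Count reads in a gzipped FASTQ file, assuming 4 lines per record.
count_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Compare read counts of the two paired-end files.
check_pairs() {
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    if [ "$n1" = "$n2" ]; then
        echo "OK: $n1 read pairs"
    else
        echo "MISMATCH: $n1 vs $n2 reads" >&2
        return 1
    fi
}

# Usage, run from inside the output directory:
#   check_pairs SRR1657561_1.fastq.gz SRR1657561_2.fastq.gz
```

A mismatch, or a count that is not a whole number, suggests a truncated or incomplete conversion.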
It is possible to use several different flags for the fastq-dump program. If you are interested, or need your .fastq file to be in a different format, you can experiment with them:
https://ncbi.github.io/sra-tools/fastq-dump.html
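As one example of experimenting with other flags, the sketch below prints only the first few spots of a run to the terminal instead of writing files. It uses -X (--maxSpotId) to limit the number of spots and -Z (--stdout) to write to standard output. preview_reads is a hypothetical helper name, and the sketch assumes fastq-dump is installed and on your PATH.

```shell
# Hypothetical helper: preview the first N spots of a run on the terminal.
preview_reads() {
    run="$1"        # accession or path to a .sra file
    n="${2:-5}"     # number of spots to show (default: 5)
    if ! command -v fastq-dump >/dev/null 2>&1; then
        echo "fastq-dump not found; install the SRA Toolkit first" >&2
        return 1
    fi
    # -X limits output to the first N spots; -Z writes to stdout instead of a file.
    fastq-dump -X "$n" -Z "$run"
}

# Usage:
#   preview_reads SRR1657561.sra 5
```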