This project demonstrates how to download and parse FASTA sequence data using Biopython.
The dataset used here is the ls_orchid.fasta file from the Biopython documentation examples.
- `ls_orchid.fasta`: FASTA file containing orchid DNA sequences (downloaded from the Biopython GitHub examples).
- `parser.py`: Python script that parses and stores the sequences using Biopython's `SeqIO` module.
```python
from Bio import SeqIO
```

The `SeqIO` module reads and writes sequence file formats such as FASTA and GenBank.
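To illustrate both directions, the sketch below parses a small FASTA string held in memory and writes the records back out with `SeqIO.write()`, which returns the number of records written. The two records are made up for the example and are not taken from `ls_orchid.fasta`. `io.StringIO` stands in for a real file handle.

```python
from io import StringIO

from Bio import SeqIO

# Two made-up records, used only for illustration (not from ls_orchid.fasta).
fasta_text = ">seq1 first example\nACGTACGT\n>seq2 second example\nGGCCTTAA\n"

# SeqIO.parse accepts any text handle, not just a filename.
records = list(SeqIO.parse(StringIO(fasta_text), "fasta"))

# SeqIO.write takes records, an output handle and a format name,
# and returns how many records it wrote.
out = StringIO()
count = SeqIO.write(records, out, "fasta")

print(count)  # 2
print(out.getvalue())
```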
```python
sequences = []
```

We create an empty list called `sequences` to hold the DNA sequences extracted from the FASTA file.
```python
for seq_record in SeqIO.parse("ls_orchid.fasta", "fasta"):
    sequences.append(seq_record.seq)
```

- `SeqIO.parse()` reads the FASTA file one record at a time.
- Each record (`seq_record`) is a `SeqRecord` object that contains:
  - `seq_record.id`: the sequence identifier (the first word of the FASTA header line).
  - `seq_record.seq`: the actual DNA sequence, as a `Seq` object.
- We append only the sequence (`seq_record.seq`) to our `sequences` list.
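If you also need to know which sequence is which, a variant of the same loop can keep the identifier alongside the sequence. The sketch below uses a made-up two-record FASTA string via `io.StringIO` so it runs without the download; with the real data you would pass `"ls_orchid.fasta"` instead. `SeqIO.to_dict()` builds an id-to-record mapping in one call and raises a `ValueError` if two records share an identifier.

```python
from io import StringIO

from Bio import SeqIO

# Made-up records standing in for ls_orchid.fasta.
fasta_text = ">seq1 first example\nACGTACGT\n>seq2 second example\nGGCCTTAAGC\n"

# Keep the identifier next to each sequence instead of the sequence alone.
sequences_by_id = {}
for seq_record in SeqIO.parse(StringIO(fasta_text), "fasta"):
    sequences_by_id[seq_record.id] = seq_record.seq

for seq_id, seq in sequences_by_id.items():
    print(seq_id, len(seq))  # "seq1 8", then "seq2 10"

# One-call alternative: maps each id to its full SeqRecord (not just .seq).
records = SeqIO.to_dict(SeqIO.parse(StringIO(fasta_text), "fasta"))
```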
After running the script, the `sequences` list holds every DNA sequence from the FASTA file, in file order.
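Once `sequences` is populated, a few basic checks can confirm the file loaded as expected. The helper below, `summarize_lengths`, is a hypothetical name introduced for illustration; it relies only on `len()`, so it works for Biopython `Seq` objects as well as the plain strings used as stand-in data here.

```python
def summarize_lengths(sequences):
    """Return count, shortest, longest and mean length of the given sequences.

    Works with Biopython Seq objects or plain strings, since it only uses len().
    """
    if not sequences:
        raise ValueError("no sequences to summarize")
    lengths = [len(s) for s in sequences]
    return {
        "count": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": sum(lengths) / len(lengths),
    }


# Stand-in data; in the script above you would pass the parsed `sequences` list.
print(summarize_lengths(["ACGT", "ACGTACGT", "AC"]))
# {'count': 3, 'min': 2, 'max': 8, 'mean': 4.666666666666667}
```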
Example output of `print(sequences)` (first few sequences):

```
[Seq('MATTYGGTTGGA...'), Seq('CTTAGGCTCCTG...'), ...]
```

To run the project:

- Install Biopython:
  ```bash
  pip install biopython
  ```
- Download the FASTA file (a Python alternative to `wget`):
  ```python
  import urllib.request

  url = "https://raw.githubusercontent.com/biopython/biopython/master/Doc/examples/ls_orchid.fasta"
  urllib.request.urlretrieve(url, "ls_orchid.fasta")
  ```
- Run the parser script (`python parser.py`) to load the sequences.
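Because `urlretrieve` fetches the file again on every run, a small guard that skips the download when the file is already present can be convenient. `ensure_fasta` is a hypothetical helper, not part of Biopython or this project; it uses only the standard library and the same URL as above.

```python
import os
import urllib.request

URL = "https://raw.githubusercontent.com/biopython/biopython/master/Doc/examples/ls_orchid.fasta"


def ensure_fasta(path="ls_orchid.fasta", url=URL):
    """Download `url` to `path` unless a non-empty file is already there."""
    if os.path.exists(path) and os.path.getsize(path) > 0:
        return path  # already downloaded; skip the network request
    urllib.request.urlretrieve(url, path)
    return path
```

Calling `ensure_fasta()` once before the parsing loop lets the script be rerun without fetching the file again. Note that it does not check whether an existing file is complete or valid FASTA.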
Parsed sequences like these are a typical starting point for:

- DNA sequence analysis
- Motif finding
- Sequence alignment
- Bioinformatics pipelines
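As a small taste of the first two items, the sketch below computes GC content and finds motif positions, including overlapping matches. Both functions are hypothetical helpers written in plain Python rather than Biopython APIs; they accept a `Seq` or a string because they convert their input with `str()`. Biopython's `Bio.SeqUtils` module also provides GC-content utilities.

```python
def gc_content(seq):
    """Fraction of G and C bases, ignoring case; other letters count in the total."""
    s = str(seq).upper()
    if not s:
        return 0.0
    return (s.count("G") + s.count("C")) / len(s)


def find_motif(seq, motif):
    """Return 0-based start positions of `motif` in `seq`, including overlaps."""
    s, m = str(seq).upper(), motif.upper()
    if not m:
        raise ValueError("motif must be non-empty")
    positions = []
    start = s.find(m)
    while start != -1:
        positions.append(start)
        start = s.find(m, start + 1)  # step by one so overlapping hits are kept
    return positions


print(gc_content("ATGCGC"))         # 0.6666666666666666
print(find_motif("ATATAT", "ATA"))  # [0, 2]
```

Stepping the search by one position rather than by the motif length is what keeps overlapping hits such as the second `ATA` in `ATATAT`.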