Orchid FASTA File Parser with Biopython

This project demonstrates how to download and parse FASTA sequence data using Biopython. The dataset used here is the ls_orchid.fasta file from the Biopython documentation examples.


📂 Files in this Project

  • ls_orchid.fasta → FASTA file containing orchid DNA sequences (downloaded from Biopython GitHub examples).
  • parser.py → Python script to parse and store sequences using Biopython's SeqIO module.

▶️ Code Explanation

Step 1: Import Biopython's SeqIO

from Bio import SeqIO

The SeqIO module reads and writes sequence file formats such as FASTA and GenBank.
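
To illustrate the read/write side of SeqIO, here is a minimal sketch that parses FASTA text and writes it back out in the simple tab-separated format ("tab"). The two records are made-up toy data held in memory, not entries from ls_orchid.fasta, and the variable names are only for illustration.

```python
from io import StringIO

from Bio import SeqIO

# Toy FASTA text (hypothetical records, used instead of a real file).
fasta_text = (
    ">seq1 first toy record\n"
    "ACGTACGT\n"
    ">seq2 second toy record\n"
    "GGCCTTAA\n"
)

# Read: SeqIO.parse accepts a file name or an open handle.
records = list(SeqIO.parse(StringIO(fasta_text), "fasta"))

# Write: SeqIO.write returns the number of records written.
out = StringIO()
written = SeqIO.write(records, out, "tab")
tab_text = out.getvalue()

print(written)
print(tab_text)
```

The same pattern should work with file names in place of the StringIO handles.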


Step 2: Initialize a List to Store Sequences

sequences = []

We create an empty list called sequences to store the DNA sequences extracted from the FASTA file.


Step 3: Parse the FASTA File

for seq_record in SeqIO.parse("ls_orchid.fasta", "fasta"):
    sequences.append(seq_record.seq)
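  • SeqIO.parse() reads the FASTA file one record at a time, yielding a SeqRecord object for each entry.
  • Each record (seq_record) contains:
    • seq_record.id → Identifier of the sequence (the first word of the FASTA header line).
    • seq_record.seq → The DNA sequence itself, as a Seq object.
  • We append only the sequence (seq_record.seq) to our sequences list, so the identifiers are not kept.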
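
The record attributes described above can be inspected with a short sketch like the following. It uses two made-up records in memory rather than ls_orchid.fasta, so the IDs and sequences are hypothetical; seq_record.description, which holds the full header line, is shown in addition to the two attributes listed above.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical toy records standing in for the orchid file.
fasta_text = (
    ">toy1 hypothetical orchid-like record\n"
    "CGTAACAAGGTT\n"
    ">toy2 another made-up record\n"
    "CGTAACAAGG\n"
)

summary = []
for seq_record in SeqIO.parse(StringIO(fasta_text), "fasta"):
    # id is the first word of the header; description is the whole header line.
    summary.append((seq_record.id, seq_record.description, len(seq_record)))
    print(seq_record.id, len(seq_record), seq_record.seq)
```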

Step 4: Output

After running the script, the list sequences will hold all DNA sequences from the FASTA file.

Example output (first few sequences, abbreviated):

[Seq('CGTAACAAGGTT...'), Seq('CGTAACAAGGTT...'), ...]
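
Once the list is filled, it can be inspected like any other Python list. The sketch below, again on made-up in-memory records rather than the real file, shows sequence lengths and plain-string conversion. It also shows SeqIO.to_dict as one possible alternative when the identifiers should be kept; that is a suggestion on top of the original script, not part of it.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical toy records (not taken from ls_orchid.fasta).
fasta_text = (
    ">toy1 made-up record\n"
    "CGTAACAAGGTT\n"
    ">toy2 made-up record\n"
    "CGTAACAAGG\n"
)

# Same idea as the parser script: keep only the Seq objects.
sequences = [rec.seq for rec in SeqIO.parse(StringIO(fasta_text), "fasta")]

lengths = [len(s) for s in sequences]   # length of each sequence
as_text = [str(s) for s in sequences]   # Seq objects converted to plain str

# Alternative: a dict of SeqRecord objects keyed by record id.
by_id = SeqIO.to_dict(SeqIO.parse(StringIO(fasta_text), "fasta"))

print(len(sequences), lengths)
print(sorted(by_id))
```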

⚡ Usage

  1. Install Biopython:
pip install biopython
  2. Download the FASTA file (Python version of wget):
import urllib.request

url = "https://raw.githubusercontent.com/biopython/biopython/master/Doc/examples/ls_orchid.fasta"
urllib.request.urlretrieve(url, "ls_orchid.fasta")
  3. Run the parser script to load sequences.
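
If the download step is run repeatedly, it may be convenient to skip it when the file is already present. The following is a minimal sketch of that idea; the helper name ensure_fasta and its return convention are assumptions made for illustration and are not part of the original script.

```python
import os
import urllib.request

URL = "https://raw.githubusercontent.com/biopython/biopython/master/Doc/examples/ls_orchid.fasta"


def ensure_fasta(path="ls_orchid.fasta", url=URL):
    """Download url to path unless a non-empty file is already there.

    Hypothetical helper: returns True if a download happened, False otherwise.
    """
    if os.path.exists(path) and os.path.getsize(path) > 0:
        return False
    urllib.request.urlretrieve(url, path)
    return True
```

Calling ensure_fasta() before running the parser would then fetch the file only on the first run.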

✅ Applications

  • DNA sequence analysis
  • Motif finding
  • Sequence alignment
  • Bioinformatics pipelines
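
As a small taste of the motif-finding use case, the sketch below locates every occurrence of a short motif in a Seq object. The sequence, the motif (GAATTC) and the helper name find_motif are all chosen for illustration; this is a naive exact-match scan, not a statistical motif-discovery method.

```python
from Bio.Seq import Seq


def find_motif(seq, motif):
    """Return the 0-based start positions of every exact match of motif in seq."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        # Continue searching one base further so overlapping hits are found too.
        start = seq.find(motif, start + 1)
    return positions


toy_seq = Seq("AAGAATTCGGGAATTCTT")  # made-up sequence
hits = find_motif(toy_seq, "GAATTC")
print(hits)
```

The same function could be applied to each entry of the sequences list built by the parser script.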
