COVID-19 Genome Analysis using Biopython

This project demonstrates how to fetch, analyze, and manipulate the COVID-19 genome using Biopython.

Overview

Fetches the complete SARS-CoV-2 genome from the NCBI database.
Analyzes nucleotide sequences (length, composition, etc.).
Uses Biopython modules: Entrez (for data retrieval) and SeqIO (for parsing sequences).

Dataset

Accession ID: MN908947.3
Source: NCBI Nucleotide Database
Description: Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome.

Code Explanation

1. Fetching the Genome from NCBI

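from Bio import Entrez, SeqIO

Entrez.email = "your_email@example.com"  # Replace with your own address; it identifies you to NCBI

# Request the GenBank record for the versioned accession listed under Dataset.
handle = Entrez.efetch(db="nucleotide", id="MN908947.3", rettype="gb", retmode="text")
recs = list(SeqIO.parse(handle, "gb"))  # one SeqRecord per record in the response
handle.close()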

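Entrez.efetch(): Requests the record from NCBI and returns a file-like handle.
rettype="gb": Retrieves data in GenBank format; retmode="text" asks for plain text rather than XML.
SeqIO.parse(): Iterates over the GenBank records in the handle and yields SeqRecord objects; list() collects them, so recs[0] is the genome record.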
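
To make the request parameters more concrete, here is a minimal sketch of the kind of E-utilities URL that an efetch call with these arguments corresponds to. The helper build_efetch_url is hypothetical (it is not part of Biopython), and Biopython may add further parameters of its own; the sketch only builds the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities endpoint for efetch requests.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, email):
    """Hypothetical helper: build an efetch URL for a GenBank text record."""
    params = {
        "db": "nucleotide",   # same database as in the Biopython call
        "id": accession,      # accession, optionally with version suffix
        "rettype": "gb",      # GenBank format
        "retmode": "text",    # plain text rather than XML
        "email": email,       # contact address sent along with the request
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


url = build_efetch_url("MN908947.3", "your_email@example.com")
print(url)
```

Opening the printed URL in a browser should show the same GenBank text that SeqIO.parse() reads from the handle.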

2. Extracting the Genome Sequence

covid_dna = recs[0].seq
print(f"Length of the genome: {len(covid_dna)}")

Extracts the genome sequence from the first (and only) record as a Seq object.
Prints the number of nucleotides (29,903 for MN908947.3).

3. Analyzing the Genome

You can perform:

Length analysis (number of nucleotides).
Base composition: count of A, T, G, C.
Sub-sequence extraction for specific regions.

Example:

from Bio.SeqUtils import gc_fraction  # available in Biopython 1.80 and later

gc_content = gc_fraction(covid_dna) * 100  # gc_fraction returns a value between 0 and 1
print(f"GC Content: {gc_content:.2f}%")
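
Base composition and sub-sequence extraction from the list above can be sketched in plain Python. The sequence toy_dna below is made up for illustration and is not genome data; a Seq object such as covid_dna supports the same slicing, and str(covid_dna) can be passed to Counter in the same way.

```python
from collections import Counter

# Made-up toy sequence standing in for str(covid_dna).
toy_dna = "ATGCGATTACAGGCTTAA"

# Count every base in a single pass over the sequence.
counts = Counter(toy_dna)
for base in "ATGC":
    share = counts[base] / len(toy_dna) * 100
    print(f"{base}: {counts[base]} ({share:.1f}%)")

# Slices are 0-based and end-exclusive, so [3:9] covers the
# 4th through 9th nucleotides.
region = toy_dna[3:9]
print(f"Region 4-9: {region}")
```

Because GenBank feature coordinates are written 1-based and inclusive, a region given as 4..9 in a record maps to the slice [3:9] in Python.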

4. Possible Further Analyses

Translate coding regions of the genome into protein sequences.
Identify open reading frames (ORFs).
Perform BLAST analysis to compare with other viral genomes.
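
As a starting point for the ORF idea above, here is a minimal plain-Python sketch that scans the three forward reading frames for ATG...stop stretches. The function find_orfs, its min_codons parameter, and the toy sequence are illustrative assumptions, not part of this project or of Biopython; a fuller analysis would also scan the reverse complement and apply a realistic minimum length.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return sorted (start, end) pairs of forward-strand ORFs.

    Coordinates are 0-based and end-exclusive; the stop codon is included.
    min_codons counts the codons before the stop codon.
    """
    orfs = []
    for frame in range(3):
        start = None
        # Step through complete codons in this reading frame.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


# Made-up toy sequence containing two short ORFs in different frames.
toy_dna = "CCATGAAATTTGGGTAACCATGCCCTGA"
for start, end in find_orfs(toy_dna):
    print(start, end, toy_dna[start:end])
```

With a real genome, the same function could be called on str(covid_dna) with a much larger min_codons value to filter out short chance matches.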

Requirements

Python 3.x
Biopython 1.80 or later, needed for gc_fraction (pip install biopython)
Internet access (to fetch data from NCBI)

How to Run

Install dependencies:

pip install biopython

Set your email in the Entrez.email field. Biopython warns when it is unset, and NCBI uses the address to contact you before blocking excessive usage.
Run the notebook or script to fetch and analyze the genome.

References

NCBI GenBank Accession: MN908947.3
Biopython documentation: https://biopython.org/wiki/Documentation

License

Open-source and free to use for research and educational purposes.
