mtmap.py is a Python-based tool designed for the systematic extraction, normalization, and visualization of mitochondrial genome annotations from GenBank files. The pipeline integrates data parsing, canonical gene normalization, and multi-format visualization to support comparative mitochondrial genomics, evolutionary studies, and molecular systematics.
The tool automatically generates:
- Linear mitochondrial genome maps, displaying coding sequences (CDS), rRNAs, and tRNAs.
- Annotation tables (CSV) with standardized gene nomenclature.
- Binary gene presence/absence matrices across multiple records.
- Heatmaps representing gene presence across species.
- Multi-format outputs (PNG, SVG, PDF) suitable for publications or presentations.
Mitochondrial genomes (mtDNA) are a cornerstone in phylogenetics, population genetics, and molecular systematics. However, annotations from GenBank are often inconsistent due to synonym usage (COI, COX1, MT-CO1) and variation in feature naming. This tool addresses these issues by:
- Normalization of gene nomenclature: Standardizing synonyms to canonical forms (e.g., COI → COX1, COB → CYTB).
- Anchored genome linearization: Re-aligning the circular mitochondrial genome based on a selected reference gene (e.g., COX1, ND1, CYTB, or a specific tRNA).
- Comparative visualization: Producing presence/absence matrices and annotated maps for comparative mitochondrial genomics.
- Multi-format outputs: Allowing downstream use in both automated pipelines and human-readable figures.
- Flexible Input: Accepts single GenBank files or directories containing multiple entries.
- Robust Parsing: Extracts CDS, rRNA, and tRNA features, including metadata such as species and accession.
- Gene Anchoring: Linearization of circular genomes based on a user-defined anchor gene.
- Normalization Layer: Maps synonyms and heuristic product descriptions to canonical keys.
- Data Export:
  - Complete annotations (*_annotations.csv)
  - Gene presence/absence matrix (*_presence.csv)
- Visualization:
  - Linear mitochondrial genome maps
  - Heatmap of presence/absence for canonical mitochondrial genes
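The normalization layer listed above can be pictured as a small lookup followed by a product-name fallback. The sketch below is illustrative only and is not taken from mtmap.py: the names SYNONYMS, NADH_PATTERN, and normalize_gene are hypothetical, the synonym table contains only the pairs mentioned in this README, and the regular expression covers only the NADH dehydrogenase example.

```python
import re

# Hypothetical synonym table: only the pairs named in this README.
SYNONYMS = {
    "COI": "COX1",
    "COXI": "COX1",
    "MT-CO1": "COX1",
    "COB": "CYTB",
}

# Canonical keys evaluated by the presence/absence matrix.
CANONICAL = {
    "COX1", "COX2", "COX3",
    "ND1", "ND2", "ND3", "ND4", "ND4L", "ND5", "ND6",
    "CYTB", "ATP6", "ATP8", "12S", "16S",
}

# Assumed heuristic for one family of product descriptions.
NADH_PATTERN = re.compile(r"NADH dehydrogenase subunit\s+(\d+L?)", re.IGNORECASE)


def normalize_gene(gene=None, product=None):
    """Return a canonical key, or None when nothing can be resolved."""
    if gene:
        key = gene.strip().upper()
        key = SYNONYMS.get(key, key)  # synonym resolution
        if key in CANONICAL:
            return key
    if product:
        match = NADH_PATTERN.search(product)  # product-name heuristic
        if match:
            candidate = "ND" + match.group(1).upper()
            if candidate in CANONICAL:
                return candidate
    return None
```

Trying the gene qualifier first and the product description second keeps exact matches cheap and leaves the heuristic as a fallback for records with missing or unusual gene names.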
The presence/absence matrix evaluates canonical mitochondrial genes:
COX1, COX2, COX3,
ND1, ND2, ND3, ND4, ND4L, ND5, ND6,
CYTB, ATP6, ATP8, 12S, 16S
In addition, all tRNAs are included and represented with one-letter amino acid codes.
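As a rough illustration of how such a matrix can be assembled, the sketch below builds a binary table for the canonical genes and serializes it as CSV using only the standard library. It is an assumption-laden example rather than the tool's actual code: presence_matrix, matrix_to_csv, and the record labels are hypothetical, and tRNA columns are left out because this README does not specify how they are labeled in the output.

```python
import csv
import io

CANONICAL_GENES = [
    "COX1", "COX2", "COX3",
    "ND1", "ND2", "ND3", "ND4", "ND4L", "ND5", "ND6",
    "CYTB", "ATP6", "ATP8", "12S", "16S",
]


def presence_matrix(records, columns=CANONICAL_GENES):
    """Map each record label to a {gene: 0 or 1} row.

    `records` maps a label to the canonical gene keys found in that record.
    """
    matrix = {}
    for label, genes in records.items():
        found = set(genes)
        matrix[label] = {gene: int(gene in found) for gene in columns}
    return matrix


def matrix_to_csv(matrix, columns=CANONICAL_GENES):
    """Serialize the matrix as CSV text, one row per record."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["record"] + list(columns))
    for label, row in matrix.items():
        writer.writerow([label] + [row[gene] for gene in columns])
    return buffer.getvalue()


# Hypothetical input: two records with partial gene content.
demo = presence_matrix({
    "sp_A": ["COX1", "COX2", "ND1"],
    "sp_B": ["COX1", "ATP8"],
})
demo_csv = matrix_to_csv(demo)
```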
- Python ≥ 3.8
- Required libraries:
pip install biopython pandas matplotlib
mtmap.py # Main script
examples/ # Example GenBank files (user-provided)
html/ # MTMap — Mitochondrial Genome Mapper (Pyodide)
README.md # Documentation
python mtmap.py my_genome.gbk --out-prefix results/mtmap
python mtmap.py genomes_directory/ --out-prefix results/comparison
python mtmap.py genomes_directory/ --out-prefix results/anchored --anchor-gene COX1
Given --out-prefix results/mtmap, the tool will generate:
- Tables
  - mtmap_annotations.csv → Full annotation table
  - mtmap_presence.csv → Gene presence/absence matrix
  - mtmap_normposdist.csv → Distance matrix (normalized positions)
- Plots
  - mtmap_map.(png|svg|pdf) → Linear mitochondrial genome map
  - mtmap_presence.(png|svg|pdf) → Presence/absence heatmap
  - mtmap_normposdist.(png|svg|pdf) → Distance heatmap
python mtmap.py examples/ --out-prefix results/mtDNA --anchor-gene COX1
- results/mtDNA_map.png → Annotated mitochondrial genome maps.
- results/mtDNA_presence.csv → Presence/absence matrix of canonical genes.
- results/mtDNA_presence.png → Heatmap summarizing gene distribution.
- results/mtDNA_normposdist.csv → Distance matrix (normalized positions).
- results/mtDNA_normposdist.png → Distance heatmap.
- Normalization Strategy
  - Synonym resolution: e.g., COI, COXI, MT-CO1 → COX1.
  - Heuristic recognition based on product names (NADH dehydrogenase subunit 4L → ND4L).
- Visualization
  - Coding genes: pastel color scheme (deterministic mapping).
  - rRNAs: distinct category (12S, 16S).
  - tRNAs: pale yellow with one-letter amino acid labels.
- Robustness
  - Handles multi-entry GenBank files.
  - Fallback colors generated deterministically for unknown/non-target genes.
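One common way to obtain deterministic fallback colors is to derive the hue from a hash of the gene name, so that the same name always maps to the same color across runs and machines. The sketch below shows that idea under stated assumptions: fallback_color is a hypothetical helper, and the choice of SHA-256 and of the lightness and saturation values is arbitrary rather than taken from mtmap.py.

```python
import colorsys
import hashlib


def fallback_color(name, lightness=0.8, saturation=0.6):
    """Return a pastel hex color that depends only on `name`."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    # Use the first two bytes of the hash as a hue in [0, 1).
    hue = int.from_bytes(digest[:2], "big") / 65536.0
    red, green, blue = colorsys.hls_to_rgb(hue, lightness, saturation)
    return "#{:02x}{:02x}{:02x}".format(
        round(red * 255), round(green * 255), round(blue * 255)
    )
```

A cryptographic hash is used here instead of Python's built-in hash() because the latter is salted per process for strings and would not give stable colors between runs.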
- Comparative genomics of mitochondrial gene content.
- Phylogenetic and evolutionary studies.
- Validation of mitochondrial genome assemblies.
- Educational visualization of mtDNA architecture.
If you use this tool in scientific work, please cite as:
LaBiOmicS (2025). Mitochondrial Map (mtmap.py): a Python tool for comparative mitochondrial genome visualization and annotation normalization. GitHub Repository.
Distributed under the MIT License.
You are free to use, modify, and distribute this tool with appropriate credit.
This section reproduces a complete run of mtmap.py using a panel of publicly available vertebrate mitochondrial GenBank files (uploaded to this repository's /examples directory). We anchor the circular mtDNA on COX1 for consistent linearization across species.
python mtmap.py /path/to/examples --out-prefix mtmap_demo --anchor-gene COX1
- The demo uses the following GenBank files: NC_000861_Salvelinus_alpinus.gbk, NC_000890_Mustelus_manazo.gbk, NC_000893_Amblyraja_radiata.gbk, NC_001131_Lampetra_fluviatilis.gbk, NC_001606_Cyprinus_carpio.gbk, NC_001626_Petromyzon_marinus.gbk, NC_001708_Protopterus_dolloi.gbk, NC_001717_Oncorhynchus_mykiss.gbk, NC_001727_Formosania_lacustris.gbk, NC_001778_Polypterus_ornatipinnis.gbk
- Anchoring to COX1 aligns the 0 bp position to the start of COX1 (if present); records lacking the anchor remain unshifted.
- Figures are exported in publication-ready formats (PNG 300 dpi, SVG with text preserved, and PDF).
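The anchoring behaviour described above amounts to a modular shift of feature coordinates on the circular genome. The following sketch illustrates that arithmetic; it is not the tool's implementation. The function anchor_shift and the (name, start, end) tuple layout are hypothetical, coordinates are assumed to be 0-based with start < end, and strand is ignored.

```python
def anchor_shift(features, genome_length, anchor_gene="COX1"):
    """Shift features so that the anchor gene starts at position 0.

    `features` is a list of (name, start, end) tuples. If the anchor
    gene is absent, the features are returned unshifted.
    """
    anchor_start = next(
        (start for name, start, _ in features if name == anchor_gene), None
    )
    if anchor_start is None:
        return list(features)

    shifted = []
    for name, start, end in features:
        new_start = (start - anchor_start) % genome_length
        # Keep the feature length; the end may exceed genome_length
        # when a feature wraps past the new origin.
        new_end = new_start + (end - start)
        shifted.append((name, new_start, new_end))
    return sorted(shifted, key=lambda feature: feature[1])


# Hypothetical coordinates on a 16,000 bp genome.
demo_features = [("ND1", 100, 1000), ("COX1", 5000, 6500), ("CYTB", 14000, 15100)]
demo_shifted = anchor_shift(demo_features, 16000)
```

With these made-up coordinates, COX1 moves to position 0 and the genes upstream of it in the original record wrap around to the end of the linear map.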