ZipStrain

Strain-resolution metagenomics at scale.

ZipStrain is a Python package for strain-level metagenomic analysis. It profiles mapped reads into per-position nucleotide counts and compares metagenomic samples at the genome and gene level. It is designed for large datasets and comes with an accompanying Nextflow pipeline (see the documentation).

Developed by Parsa Ghadermazi and the Olm Lab, University of Colorado Boulder.

Overview

[Figure: ZipStrain workflow]

ZipStrain provides a full workflow for strain-level metagenomic analysis:

  • Profile BAM files into nucleotide-resolution A/C/G/T count tables
  • Generate companion genome and gene coverage summaries during profiling
  • Compare samples with popANI, conANI, cosANI, IBS, and identical-gene metrics
  • Run pairwise comparisons with Polars or DuckDB engines
  • Scale out with resumable local or Slurm-backed task execution
  • Run end-to-end Nextflow pipelines from local reads or SRA accessions
  • Build reference bundles directly from Sylph abundance tables
  • Use the CLI for production workflows or the Python API for custom analysis
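
To make two of the comparison metrics above more concrete, here is a toy sketch of popANI and conANI as they are commonly defined (following the inStrain convention): conANI counts a position as different when the most frequent bases differ, while popANI counts it as different only when the two samples share no allele at all. This is an illustration under those assumptions, not ZipStrain's implementation. The function names, the dict-based count layout, and the `min_count` allele cutoff are all hypothetical.

```python
BASES = "ACGT"

def alleles(counts, min_count=1):
    """Bases observed at least min_count times at one position."""
    return {b for b in BASES if counts.get(b, 0) >= min_count}

def consensus(counts):
    """Most frequent base at one position (ties go to the earliest of A, C, G, T)."""
    return max(BASES, key=lambda b: counts.get(b, 0))

def toy_ani(profile_a, profile_b, min_count=1):
    """Return (popANI, conANI) over positions with alleles in both profiles.

    Each profile maps position -> {"A": n, "C": n, "G": n, "T": n}.
    This layout is an assumption made for the example only.
    """
    shared = [
        pos for pos in profile_a
        if pos in profile_b
        and alleles(profile_a[pos], min_count)
        and alleles(profile_b[pos], min_count)
    ]
    if not shared:
        return None
    # popANI: a position differs only if the two allele sets do not overlap.
    pop_diff = sum(
        1 for pos in shared
        if not (alleles(profile_a[pos], min_count) & alleles(profile_b[pos], min_count))
    )
    # conANI: a position differs whenever the consensus bases differ.
    con_diff = sum(
        1 for pos in shared
        if consensus(profile_a[pos]) != consensus(profile_b[pos])
    )
    n = len(shared)
    return 1 - pop_diff / n, 1 - con_diff / n

sample_a = {0: {"A": 10}, 1: {"A": 6, "G": 4}, 2: {"C": 10}, 3: {"T": 8}}
sample_b = {0: {"A": 9, "G": 1}, 1: {"G": 10}, 2: {"T": 10}, 3: {"T": 8}}
pop_ani, con_ani = toy_ani(sample_a, sample_b)
print(pop_ani, con_ani)  # 0.75 0.5
```

Position 1 shows the difference between the two metrics: the consensus bases differ (A versus G), but both samples carry G, so only conANI counts it as a substitution.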

Installation

Install from PyPI:

pip install zipstrain
zipstrain --version
zipstrain test

ZipStrain requires Python 3.12+. If you install with pip, install samtools separately.

Other supported installation paths:

  • Conda: conda install -c conda-forge -c bioconda -c defaults zipstrain
  • Docker: docker run -it parsaghadermazi/zipstrain:<version> zipstrain test
  • Apptainer: apptainer run docker://parsaghadermazi/zipstrain:<version> zipstrain test

More details: Installation Guide

Command Layout

Command                      Purpose
zipstrain profile            Batch-profile multiple BAM files
zipstrain compare genomes    Batch genome-level comparisons
zipstrain compare genes      Batch gene-level comparisons
zipstrain utilities ...      Single-sample tools, preparation helpers, format conversion, and database builders
zipstrain test               Validate the local installation

Quick Start

1. Prepare profiling assets

zipstrain utilities prepare_profiling \
  --reference-fasta reference_genomes.fna \
  --gene-fasta reference_genomes_gene.fasta \
  --stb-file reference_genomes.stb \
  --output-dir profiling_assets

2. Build a null model

zipstrain utilities build-null-model \
  --error-rate 0.001 \
  --max-total-reads 10000 \
  --p-threshold 0.05 \
  --output-file null_model.parquet
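
The three parameters above suggest a binomial-style model of sequencing error, in the spirit of the null model used by inStrain: for each total read depth, find the smallest base count that is unlikely to arise from error alone. The sketch below illustrates that idea in plain Python. It is an assumption about what such a table might contain, not a description of how `build-null-model` works or of the Parquet schema it writes. The names `tail_probability`, `min_count_for_depth`, and `toy_null_model` are hypothetical.

```python
from math import comb

def tail_probability(depth, k, error_rate):
    """P(X >= k) for X ~ Binomial(depth, error_rate)."""
    below = sum(
        comb(depth, i) * error_rate**i * (1 - error_rate) ** (depth - i)
        for i in range(k)
    )
    return max(0.0, 1.0 - below)

def min_count_for_depth(depth, error_rate=0.001, p_threshold=0.05):
    """Smallest count whose probability under pure error is below p_threshold."""
    for k in range(1, depth + 1):
        if tail_probability(depth, k, error_rate) < p_threshold:
            return k
    # No count at this depth can be told apart from error.
    return depth + 1

def toy_null_model(max_total_reads, error_rate=0.001, p_threshold=0.05):
    """Map every depth from 1 to max_total_reads to its minimum credible count."""
    return {
        depth: min_count_for_depth(depth, error_rate, p_threshold)
        for depth in range(1, max_total_reads + 1)
    }

# A small depth range keeps the demo fast; the command above uses 10000.
model = toy_null_model(200)
print(model[10], model[100])  # 1 2
```

At depth 10 a single read of a base is already unlikely to be error at this rate, while at depth 100 a single read is plausible as error and two reads are needed.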

3. Profile BAM files in batch

samples.csv must contain sample_name and bamfile columns.

zipstrain profile \
  --input-table samples.csv \
  --stb-file reference_genomes.stb \
  --null-model null_model.parquet \
  --gene-range-table profiling_assets/gene_range_table.tsv \
  --bed-file profiling_assets/genomes_bed_file.bed \
  --genome-length-file profiling_assets/genome_lengths.parquet \
  --run-dir profile_run
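
A sample table with the two required columns can be generated from a list of BAM paths. The following is a minimal sketch using only the standard library. Taking the file stem as `sample_name` is an assumption made for illustration, the helper names are hypothetical, and the BAM paths are placeholders. Whether ZipStrain expects absolute paths or accepts additional columns is not stated here, so check the CLI reference.

```python
import csv
import tempfile
from pathlib import Path

REQUIRED_COLUMNS = ("sample_name", "bamfile")

def write_samples_table(bam_paths, out_csv):
    """Write a CSV with sample_name and bamfile columns, one row per BAM."""
    rows = [{"sample_name": Path(p).stem, "bamfile": str(p)} for p in bam_paths]
    names = [row["sample_name"] for row in rows]
    if len(set(names)) != len(names):
        raise ValueError("sample names derived from file stems are not unique")
    with open(out_csv, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=REQUIRED_COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
    return rows

def check_samples_table(csv_path):
    """Read the table back and fail early if a required column is missing."""
    with open(csv_path, newline="") as fh:
        reader = csv.DictReader(fh)
        missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
        if missing:
            raise ValueError(f"missing required columns: {missing}")
        return list(reader)

# Demo in a temporary directory; the BAM paths are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    table = Path(tmp) / "samples.csv"
    write_samples_table(["bams/sampleA.bam", "bams/sampleB.bam"], table)
    rows_back = check_samples_table(table)

print(rows_back)
```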

4. Run genome comparisons

After preparing a comparison object, launch batched genome comparisons:

zipstrain compare genomes \
  --genome-comparison-object genome_compare.json \
  --run-dir genome_compare_run \
  --calculate all

For ANI-only runs:

zipstrain compare genomes \
  --genome-comparison-object genome_compare.json \
  --run-dir genome_compare_run \
  --calculate ani

Set --engine duckdb to switch the genome-compare backend from the default Polars engine.

Comparison-object creation and single-pair helpers live under zipstrain utilities. The full command reference is linked below.
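
When genome comparisons are launched from a script, the command line can be assembled programmatically. The sketch below builds the argument list using only the flags shown in this section. The helper name `genome_compare_argv` is hypothetical, and `--engine` is added only when requested because `duckdb` is the only engine value given here.

```python
import shlex

def genome_compare_argv(comparison_object, run_dir, calculate="all", engine=None):
    """Build the argv for `zipstrain compare genomes` from the flags shown above."""
    argv = [
        "zipstrain", "compare", "genomes",
        "--genome-comparison-object", str(comparison_object),
        "--run-dir", str(run_dir),
        "--calculate", calculate,
    ]
    if engine is not None:
        # Omitting the flag keeps the default Polars engine.
        argv += ["--engine", engine]
    return argv

cmd = genome_compare_argv(
    "genome_compare.json", "genome_compare_run", calculate="ani", engine="duckdb"
)
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` rather than a single shell string avoids quoting problems when paths contain spaces.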

Nextflow Workflows

ZipStrain ships with Nextflow workflows for:

  • read mapping
  • BAM profiling
  • SRA-to-profile processing
  • genome comparisons
  • gene comparisons

Example:

nextflow run zipstrain.nf \
  --mode profile \
  --input_table bams.csv \
  --reference_genome reference_genomes.fna \
  --gene_file reference_genomes_gene.fasta \
  --stb reference_genomes.stb \
  --output_dir out_profile \
  -c conf.config \
  -profile docker \
  -resume

See: Nextflow Pipeline Guide

Documentation

Full documentation is available at:

OlmLab.github.io/ZipStrain

Page                 What it covers
Installation         PyPI, Conda, Docker, and Apptainer setup
CLI Reference        Commands, options, and workflow layout
Tutorial             End-to-end usage examples
Nextflow Pipelines   Cluster-ready workflows
Build Genome DB      Build reference bundles from Sylph output
Python API           Programmatic usage

Citation

If you use ZipStrain in your research, please cite the project and check back here for the formal citation entry.

License

ZipStrain is distributed under the terms described in LICENSE.
