ZipStrain

Strain-resolution metagenomics at scale.

ZipStrain is a Python package for strain-level metagenomic analysis. It profiles mapped reads into per-position nucleotide counts and compares metagenomic samples at genome and gene scope. ZipStrain is designed for large datasets and comes with an accompanying Nextflow pipeline (see the documentation).

Developed by Parsa Ghadermazi and the Olm Lab, University of Colorado Boulder.

Overview

ZipStrain provides a full workflow for strain-level metagenomic analysis:

Profile BAM files into nucleotide-resolution A/C/G/T count tables
Generate companion genome and gene coverage summaries during profiling
Compare samples with popANI, conANI, cosANI, IBS, and identical-gene metrics
Run pairwise comparisons with Polars or DuckDB engines
Scale out with resumable local or Slurm-backed task execution
Run end-to-end Nextflow pipelines from local reads or SRA accessions
Build reference bundles directly from Sylph abundance tables
Use the CLI for production workflows or the Python API for custom analysis
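
To make the count tables and ANI metrics above more concrete, here is a minimal sketch of how conANI and popANI could be derived from two per-position A/C/G/T count profiles. It assumes the definitions popularized by inStrain (conANI compares consensus bases; popANI counts a difference only when the two samples share no allele at a position). The function names, the in-memory dictionary layout, and the `min_cov` cutoff are all hypothetical and are not ZipStrain's API or its exact filtering rules.

```python
BASES = "ACGT"


def consensus(counts):
    """Most frequent base at a position (ties go to the first base in ACGT order)."""
    return BASES[max(range(4), key=lambda i: counts[i])]


def alleles(counts, min_count=1):
    """Set of bases observed at least min_count times (hypothetical cutoff)."""
    return {BASES[i] for i in range(4) if counts[i] >= min_count}


def compare_profiles(profile_a, profile_b, min_cov=5):
    """Compare two {position: (A, C, G, T)} count tables.

    Only positions covered by at least min_cov reads in BOTH samples are used.
    Returns None when no position passes the coverage filter.
    """
    compared = con_diff = pop_diff = 0
    for pos in profile_a.keys() & profile_b.keys():
        a, b = profile_a[pos], profile_b[pos]
        if sum(a) < min_cov or sum(b) < min_cov:
            continue
        compared += 1
        if consensus(a) != consensus(b):
            con_diff += 1  # consensus bases disagree
        if not alleles(a) & alleles(b):
            pop_diff += 1  # no allele shared at all
    if compared == 0:
        return None
    return {
        "compared": compared,
        "conANI": 1 - con_diff / compared,
        "popANI": 1 - pop_diff / compared,
    }
```

A position where both samples carry A and C but with opposite majorities lowers conANI while leaving popANI untouched, which is why popANI is the more conservative signal of a strain difference.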

Installation

Install from PyPI:

pip install zipstrain
zipstrain --version
zipstrain test

ZipStrain requires Python 3.12+. If you install with pip, install samtools separately.
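
The two prerequisites above can be checked before installing. The snippet below is a small, hypothetical preflight helper (`check_environment` is not part of ZipStrain) that relies only on the Python standard library.

```python
import shutil
import sys


def check_environment(version=None, which=shutil.which):
    """Return a list of problems; an empty list means the prerequisites look fine.

    `version` and `which` are injectable only to make the helper easy to test.
    """
    version = tuple(version or sys.version_info[:2])
    problems = []
    if version < (3, 12):
        problems.append(f"Python 3.12+ required, found {version[0]}.{version[1]}")
    if which("samtools") is None:
        problems.append("samtools not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

This only confirms that the interpreter version and a `samtools` executable are present; `zipstrain test` remains the way to validate the installation itself.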

Other supported installation paths:

Conda: conda install -c conda-forge -c bioconda -c defaults zipstrain
Docker: docker run -it parsaghadermazi/zipstrain:<version> zipstrain test
Apptainer: apptainer run docker://parsaghadermazi/zipstrain:<version> zipstrain test

More details: Installation Guide

Command Layout

| Command | Purpose |
| --- | --- |
| `zipstrain profile` | Batch-profile multiple BAM files |
| `zipstrain compare genomes` | Batch genome-level comparisons |
| `zipstrain compare genes` | Batch gene-level comparisons |
| `zipstrain utilities ...` | Single-sample tools, preparation helpers, format conversion, and database builders |
| `zipstrain test` | Validate the local installation |

Quick Start

1. Prepare profiling assets

zipstrain utilities prepare_profiling \
  --reference-fasta reference_genomes.fna \
  --gene-fasta reference_genomes_gene.fasta \
  --stb-file reference_genomes.stb \
  --output-dir profiling_assets
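
The format of the `.stb` file is not described in this README. Assuming it follows the scaffold-to-bin convention used by tools such as dRep and inStrain (one `scaffold<TAB>genome` line per contig), a file like `reference_genomes.stb` could be produced from per-genome FASTA files roughly as follows. `write_stb` is a hypothetical helper, and naming each genome after its FASTA file name is likewise an assumption.

```python
from pathlib import Path


def write_stb(genome_fastas, stb_path):
    """Write one 'scaffold<TAB>genome' line per FASTA record; return the line count.

    Assumption: the scaffold name is the first whitespace-delimited token of the
    header, and the genome is identified by the FASTA file name.
    """
    written = 0
    with open(stb_path, "w") as out:
        for fasta in map(Path, genome_fastas):
            with open(fasta) as handle:
                for line in handle:
                    if not line.startswith(">"):
                        continue
                    fields = line[1:].split()
                    if not fields:
                        continue  # skip malformed, empty headers
                    out.write(f"{fields[0]}\t{fasta.name}\n")
                    written += 1
    return written
```

Check the ZipStrain documentation for the exact format it expects before relying on this layout.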

2. Build a null model

zipstrain utilities build-null-model \
  --error-rate 0.001 \
  --max-total-reads 10000 \
  --p-threshold 0.05 \
  --output-file null_model.parquet
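
This README does not spell out how the null model is computed. One plausible reading of the three flags, offered purely as an assumption and not as ZipStrain's actual method, is a binomial error model: for each read depth up to `--max-total-reads`, find the smallest base count that would be unlikely (probability below `--p-threshold`) to arise from sequencing error alone at rate `--error-rate`. The function names below are hypothetical.

```python
def min_significant_count(total_reads, error_rate, p_threshold):
    """Smallest k with P(X >= k) < p_threshold for X ~ Binomial(total_reads, error_rate).

    Assumes 0 < error_rate < 1. The probability mass is updated iteratively to
    avoid huge binomial coefficients.
    """
    pmf = (1.0 - error_rate) ** total_reads  # P(X = 0)
    tail = 1.0                               # P(X >= 0)
    k = 0
    while tail >= p_threshold and k <= total_reads:
        tail -= pmf                          # tail is now P(X >= k + 1)
        pmf *= (total_reads - k) / (k + 1) * error_rate / (1.0 - error_rate)
        k += 1
    return k


def build_threshold_table(max_total_reads, error_rate, p_threshold):
    """Map each depth 1..max_total_reads to its minimum significant count."""
    return {
        depth: min_significant_count(depth, error_rate, p_threshold)
        for depth in range(1, max_total_reads + 1)
    }
```

Under this reading, with an error rate of 0.001 and a threshold of 0.05, a single non-reference read is already significant at depth 10, whereas at depth 100 at least two reads are needed.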

3. Profile BAM files in batch

samples.csv must contain sample_name and bamfile columns.

zipstrain profile \
  --input-table samples.csv \
  --stb-file reference_genomes.stb \
  --null-model null_model.parquet \
  --gene-range-table profiling_assets/gene_range_table.tsv \
  --bed-file profiling_assets/genomes_bed_file.bed \
  --genome-length-file profiling_assets/genome_lengths.parquet \
  --run-dir profile_run
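
The `samples.csv` input table used above can be generated from a directory of BAM files. The sketch below writes the two required columns and then reports any rows whose BAM path is missing. Both helpers are hypothetical; deriving `sample_name` from the file stem and storing absolute paths are assumptions rather than ZipStrain requirements.

```python
import csv
from pathlib import Path


def write_sample_table(bam_dir, table_path="samples.csv"):
    """Write a sample_name,bamfile table for every *.bam in bam_dir; return row count."""
    bams = sorted(Path(bam_dir).glob("*.bam"))
    with open(table_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["sample_name", "bamfile"])
        for bam in bams:
            # Assumption: the file stem is a suitable sample name.
            writer.writerow([bam.stem, str(bam.resolve())])
    return len(bams)


def missing_bams(table_path):
    """Return sample names whose bamfile does not exist; raise if columns are missing."""
    with open(table_path, newline="") as handle:
        reader = csv.DictReader(handle)
        absent = {"sample_name", "bamfile"} - set(reader.fieldnames or [])
        if absent:
            raise ValueError(f"missing required columns: {sorted(absent)}")
        return [row["sample_name"] for row in reader
                if not Path(row["bamfile"]).is_file()]
```

For example, `write_sample_table("bams")` followed by `missing_bams("samples.csv")` should return an empty list before the table is passed to `zipstrain profile`.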

4. Run genome comparisons

After preparing a comparison object, launch batched genome comparisons:

zipstrain compare genomes \
  --genome-comparison-object genome_compare.json \
  --run-dir genome_compare_run \
  --calculate all

For ANI-only runs:

zipstrain compare genomes \
  --genome-comparison-object genome_compare.json \
  --run-dir genome_compare_run \
  --calculate ani

Set --engine duckdb to switch the genome-compare backend from the default Polars engine.

Comparison-object creation and single-pair helpers live under zipstrain utilities. The full command reference is linked below.

Nextflow Workflows

ZipStrain ships with Nextflow workflows for:

read mapping
BAM profiling
SRA-to-profile processing
genome comparisons
gene comparisons

Example:

nextflow run zipstrain.nf \
  --mode profile \
  --input_table bams.csv \
  --reference_genome reference_genomes.fna \
  --gene_file reference_genomes_gene.fasta \
  --stb reference_genomes.stb \
  --output_dir out_profile \
  -c conf.config \
  -profile docker \
  -resume

See: Nextflow Pipeline Guide

Documentation

Full documentation is available at:

OlmLab.github.io/ZipStrain

| Page | What it covers |
| --- | --- |
| Installation | PyPI, Conda, Docker, and Apptainer setup |
| CLI Reference | Commands, options, and workflow layout |
| Tutorial | End-to-end usage examples |
| Nextflow Pipelines | Cluster-ready workflows |
| Build Genome DB | Build reference bundles from Sylph output |
| Python API | Programmatic usage |

Citation

If you use ZipStrain in your research, please cite the project and check back here for the formal citation entry.

License

ZipStrain is distributed under the terms described in LICENSE.
