Developed by Parsa Ghadermazi and the Olm Lab, University of Colorado Boulder.
ZipStrain provides a full workflow for strain-level metagenomic analysis:
- Profile BAM files into nucleotide-resolution A/C/G/T count tables
- Generate companion genome and gene coverage summaries during profiling
- Compare samples with popANI, conANI, cosANI, IBS, and identical-gene metrics
- Run pairwise comparisons with Polars or DuckDB engines
- Scale out with resumable local or Slurm-backed task execution
- Run end-to-end Nextflow pipelines from local reads or SRA accessions
- Build reference bundles directly from Sylph abundance tables
- Use the CLI for production workflows or the Python API for custom analysis
Install from PyPI:
pip install zipstrain
zipstrain --version
zipstrain test
ZipStrain requires Python 3.12+.
If you install with pip, install samtools separately.
Other supported installation paths:
- Conda:
conda install -c conda-forge -c bioconda -c defaults zipstrain
- Docker:
docker run -it parsaghadermazi/zipstrain:<version> zipstrain test
- Apptainer:
apptainer run docker://parsaghadermazi/zipstrain:<version> zipstrain test
More details: Installation Guide
| Command | Purpose |
|---|---|
| zipstrain profile | Batch-profile multiple BAM files |
| zipstrain compare genomes | Batch genome-level comparisons |
| zipstrain compare genes | Batch gene-level comparisons |
| zipstrain utilities ... | Single-sample tools, preparation helpers, format conversion, and database builders |
| zipstrain test | Validate the local installation |
zipstrain utilities prepare_profiling \
--reference-fasta reference_genomes.fna \
--gene-fasta reference_genomes_gene.fasta \
--stb-file reference_genomes.stb \
--output-dir profiling_assets
zipstrain utilities build-null-model \
--error-rate 0.001 \
--max-total-reads 10000 \
--p-threshold 0.05 \
--output-file null_model.parquet
samples.csv must contain sample_name and bamfile columns.
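One way to produce such a table is sketched below in plain Python. It is an illustration, not part of ZipStrain: the helper name `write_sample_table` is hypothetical, and it assumes a comma-delimited file, sample names taken from the BAM file names, and absolute BAM paths. Adjust these to match your own layout.

```python
import csv
from pathlib import Path


def write_sample_table(bam_dir, output_csv="samples.csv"):
    """Write a CSV with sample_name and bamfile columns, one row per BAM file.

    Hypothetical helper (not a ZipStrain function). Returns the number of rows written.
    """
    bam_paths = sorted(Path(bam_dir).glob("*.bam"))
    if not bam_paths:
        raise FileNotFoundError(f"No .bam files found in {bam_dir}")

    with open(output_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        # Column names required by the sample table.
        writer.writerow(["sample_name", "bamfile"])
        for bam in bam_paths:
            # Assumption: the sample name is the file name without its .bam suffix,
            # and paths are written as absolute paths.
            writer.writerow([bam.stem, str(bam.resolve())])
    return len(bam_paths)
```

Calling `write_sample_table("bams")` on a directory of BAM files writes samples.csv in the working directory, ready to pass to --input-table.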
zipstrain profile \
--input-table samples.csv \
--stb-file reference_genomes.stb \
--null-model null_model.parquet \
--gene-range-table profiling_assets/gene_range_table.tsv \
--bed-file profiling_assets/genomes_bed_file.bed \
--genome-length-file profiling_assets/genome_lengths.parquet \
--run-dir profile_run
After preparing a comparison object, launch batched genome comparisons:
zipstrain compare genomes \
--genome-comparison-object genome_compare.json \
--run-dir genome_compare_run \
--calculate all
For ANI-only runs:
zipstrain compare genomes \
--genome-comparison-object genome_compare.json \
--run-dir genome_compare_run \
--calculate ani
Set --engine duckdb to switch the genome-compare backend from the default Polars engine.
Comparison-object creation and single-pair helpers live under zipstrain utilities.
The full command reference is linked below.
ZipStrain ships with Nextflow workflows for:
- read mapping
- BAM profiling
- SRA-to-profile processing
- genome comparisons
- gene comparisons
Example:
nextflow run zipstrain.nf \
--mode profile \
--input_table bams.csv \
--reference_genome reference_genomes.fna \
--gene_file reference_genomes_gene.fasta \
--stb reference_genomes.stb \
--output_dir out_profile \
-c conf.config \
-profile docker \
-resume
Full documentation is available at:
| Page | What it covers |
|---|---|
| Installation | PyPI, Conda, Docker, and Apptainer setup |
| CLI Reference | Commands, options, and workflow layout |
| Tutorial | End-to-end usage examples |
| Nextflow Pipelines | Cluster-ready workflows |
| Build Genome DB | Build reference bundles from Sylph output |
| Python API | Programmatic usage |
If you use ZipStrain in your research, please cite the project and check back here for the formal citation entry.
ZipStrain is distributed under the terms described in LICENSE.
