Releases · fiberseq/FiberHMM

01 Apr 14:46

mtcicero26

v2.4.0

5e41e49

v2.4.0 Latest


What's New

`fiberhmm-daf-encode` — new CLI tool

Preprocessor for plain aligned DAF-seq BAMs (e.g. from minimap2). Identifies C→T / G→A deamination mismatches via the MD tag, encodes them as IUPAC Y/R in the query sequence, and adds st:Z strand tags — making the BAM ready for fiberhmm-apply --mode daf.

Usage:

# Basic
fiberhmm-daf-encode -i aligned.bam -o encoded.bam

# Streaming pipeline
fiberhmm-daf-encode -i aligned.bam -o - | \
    fiberhmm-apply --mode daf --streaming -i - -o output/

# Full pipeline from alignment
minimap2 --MD -a ref.fa reads.fq | samtools view -b | \
    fiberhmm-daf-encode -i - -o - | \
    fiberhmm-apply --mode daf --streaming -i - -o output/

Features:

Automatic per-read strand detection (or force with --strand CT/GA)
Reference FASTA fallback when MD tags are missing (--reference)
Streaming stdin/stdout support for piping
Preserves all existing tags (MM/ML for dual-labeling, etc.)
Sort + index on file output
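The MD-tag step described above can be sketched in pure Python. This is a minimal illustration, not the tool's actual code: the function name `encode_daf` and the majority-vote strand rule are assumptions, and it only handles reads without insertions or soft clips, so that MD offsets map directly onto query positions.

```python
import re

# One MD token: a run of matches, a deletion (^ACG), or a mismatched reference base.
MD_TOKEN = re.compile(r"(\d+)|(\^[A-Za-z]+)|([A-Za-z])")

def encode_daf(seq, md, strand=None):
    """Mark C->T mismatches as Y or G->A mismatches as R (hypothetical helper).

    Assumes the read has no insertions or soft clips.
    Returns (encoded_sequence, strand), where strand is "CT" or "GA".
    """
    seq = seq.upper()
    ct_sites, ga_sites = [], []
    pos = 0  # current position in the query sequence
    for matches, deletion, ref_base in MD_TOKEN.findall(md):
        if matches:
            pos += int(matches)
        elif deletion:
            continue  # deleted reference bases consume no query bases
        else:
            ref, read = ref_base.upper(), seq[pos]
            if ref == "C" and read == "T":
                ct_sites.append(pos)
            elif ref == "G" and read == "A":
                ga_sites.append(pos)
            pos += 1
    if strand is None:
        # Assumption: pick the strand with more candidate deamination sites.
        strand = "CT" if len(ct_sites) >= len(ga_sites) else "GA"
    sites, code = (ct_sites, "Y") if strand == "CT" else (ga_sites, "R")
    encoded = list(seq)
    for site in sites:
        encoded[site] = code
    return "".join(encoded), strand
```

For example, a read `ATTGTACT` aligned to reference `ACTGCACT` has MD `1C2C3`; the sketch marks both mismatches as Y and reports strand `CT`.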

Full Changelog: v2.3.1...v2.4.0


30 Mar 15:34

mtcicero26

v2.3.1

35010ab

v2.3.1

IUPAC R/Y DAF-seq Support

DAF-seq BAMs that encode deamination events as IUPAC ambiguity codes (R/Y) in the sequence instead of MM/ML tags are now auto-detected and processed under --mode daf.

Y in sequence marks deaminated C (+ strand, st:Z:CT)
R in sequence marks deaminated G (- strand, st:Z:GA)
No new CLI flags needed — auto-detection handles both encodings transparently
Both paths produce identical encoder output (verified on 3,929 real reads)
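As a rough illustration of the R/Y convention listed above, the sketch below scans a query sequence for IUPAC codes and reports the implied strand tag value and the deaminated positions. The function name `detect_iupac_daf` and the tie-breaking rule are assumptions made for this example; the actual auto-detection logic is not described in these notes.

```python
def detect_iupac_daf(seq):
    """Return (strand, positions) for an IUPAC-encoded DAF-seq read.

    strand is "CT" when Y codes dominate, "GA" when R codes dominate,
    and None when the sequence carries no R/Y codes (hypothetical helper).
    """
    seq = seq.upper()
    y_sites = [i for i, base in enumerate(seq) if base == "Y"]
    r_sites = [i for i, base in enumerate(seq) if base == "R"]
    if not y_sites and not r_sites:
        return None, []
    # Assumption: a read carries one kind of code; ties resolve to CT.
    if len(y_sites) >= len(r_sites):
        return "CT", y_sites
    return "GA", r_sites
```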


28 Mar 05:51

mtcicero26

v2.3.0

81e4878

v2.3.0

v2.3.0: Streaming Pipeline & Production Integration

New Features

Streaming producer-consumer pipeline — sliding-window architecture keeps multiple compute chunks in flight, enabling overlap of I/O and HMM inference. Works with unaligned/unindexed BAMs and stdin.
Stdout output (-o -) — pipe BAM directly to downstream tools (e.g., fiberhmm-apply -o - | ft fire)
pysam multithreaded I/O (--io-threads) — htslib decompression/compression threading for all processing modes
Auto-detection — stdin or missing BAM index automatically triggers streaming mode
Unaligned read processing (--process-unmapped) — process reads without alignment coordinates
Headerless BAM support (check_sq=False) — handle BAMs without sequence dictionaries

CLI Flags

--streaming — explicitly use streaming pipeline mode
--io-threads N — htslib decompression threads (default: 4)
--chunk-size N — reads per compute chunk (default: 500)
--process-unmapped — process unmapped reads with sequences
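The sliding-window idea behind the streaming mode can be sketched with the standard library. This is not FiberHMM's implementation: `stream_process`, `max_in_flight`, and `workers` are hypothetical names, and a thread pool stands in for whatever worker model the tool uses. The sketch keeps a bounded number of chunks in flight and yields results in input order.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

def chunked(reads, chunk_size):
    """Yield successive lists of up to chunk_size items from an iterable."""
    it = iter(reads)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk

def stream_process(reads, process_chunk, chunk_size=500, max_in_flight=4, workers=4):
    """Process reads chunk by chunk, with at most max_in_flight pending chunks."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        window = deque()  # futures in submission order
        for chunk in chunked(reads, chunk_size):
            window.append(pool.submit(process_chunk, chunk))
            if len(window) >= max_in_flight:
                # Drain the oldest chunk first, so output order matches input order.
                yield from window.popleft().result()
        while window:
            yield from window.popleft().result()
```

Because only the window of pending chunks is held at any time, memory use depends on `chunk_size * max_in_flight` rather than on the size of the input.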

Bug Fixes

Fix region-parallel worker never writing footprint tags (indentation bug in _process_region_to_bam)
Skip sort/index for unaligned output

Test Suite

16 streaming correctness tests (order preservation, tag validity, determinism, edge cases)
4 cross-mode equivalence tests (streaming == region-parallel == legacy, r=1.0)
Benchmark suite: throughput, scaling, I/O vs compute, memory, pysam threads
Synthetic BAM generation fixtures with valid MM/ML tags

Performance

~350 reads/s on 200GB unaligned BAM over NAS (network storage)
~5,700 reads/s on local SSD with 4 cores
Memory bounded at ~30MB regardless of input size
Full fiberhmm | ft fire pipeline validated on 2.8M reads


20 Mar 20:58

mtcicero26

v2.2.0

f19c21d

v2.2.0

FiberHMM v2.2.0

Bug Fixes

Fix BAM tag types: as/ns/al/nl tags now correctly use B:I (unsigned 32-bit int arrays) per the Fiber-seq BAM format spec. Previously, pysam inferred B:C or B:S from value magnitudes, breaking compatibility with pyft and other fibertools ecosystem tools.
Remove unreliable BAM mode auto-detection: Mode is now resolved as command line > model metadata > pacbio-fiber default. The auto-detect heuristic misidentified BAMs with both 5mC and m6A tags as DAF-seq, causing confusing warnings.
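The mode precedence described above (command line, then model metadata, then the pacbio-fiber default) amounts to a short fallback chain. The sketch below is illustrative only; the function name `resolve_mode` and the `"mode"` metadata key are assumptions.

```python
DEFAULT_MODE = "pacbio-fiber"

def resolve_mode(cli_mode=None, model_metadata=None):
    """Pick the processing mode: CLI value, else model metadata, else default."""
    if cli_mode:
        return cli_mode
    # Assumption: the model file stores its mode under a "mode" key.
    if model_metadata and model_metadata.get("mode"):
        return model_metadata["mode"]
    return DEFAULT_MODE
```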

Improvements

MSPs now match fibertools convention: Only nucleosome-sized footprints (>= 85bp by default) act as MSP boundaries. Small footprints are absorbed into surrounding MSPs, consistent with how fibertools defines MSPs.
Skip reason reporting: Both region-parallel and standard processing paths now track and print a summary of why reads were skipped (low MAPQ, too short, no modifications, no footprints, etc.)
Improved CLI help text: --min-mapq and --min-read-length descriptions now explain filtering behavior and how to override defaults.
Read filtering documentation: README now includes a table of all skip reasons, default thresholds, and override flags.

New Flags

--no-msps — Suppress as/al/aq MSP tag output. Useful for Fiber-seq workflows where MSPs are computed separately by fibertools.
--nuc-min-size (default: 85) — Minimum footprint size to count as nucleosome-sized for MSP boundary detection.
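To make the MSP boundary rule concrete, here is a minimal sketch in which only footprints of at least `nuc_min_size` bp delimit MSPs and smaller footprints are ignored, so they fall inside the surrounding MSP. The function name `call_msps` and the `(start, length)` tuple layout are assumptions, and the sketch only reports gaps between two nucleosome-sized footprints; how the tool treats read ends is not stated here.

```python
def call_msps(footprints, nuc_min_size=85):
    """Return MSPs as (start, length) gaps between nucleosome-sized footprints.

    footprints: iterable of (start, length) in read coordinates.
    """
    # Small footprints are dropped, which absorbs them into the surrounding MSP.
    nucleosomes = sorted((s, l) for s, l in footprints if l >= nuc_min_size)
    msps = []
    for (start, length), (next_start, _) in zip(nucleosomes, nucleosomes[1:]):
        gap_start = start + length
        if next_start > gap_start:
            msps.append((gap_start, next_start - gap_start))
    return msps
```

With footprints `(100, 147)`, `(300, 40)`, `(400, 150)`, the 40 bp footprint does not split the region, and the single MSP spans positions 247 to 400.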


06 Mar 15:46

mtcicero26

v2.1.0

be05da2

FiberHMM v2.1.0

What's New

Posteriors Export (`fiberhmm-posteriors`)

New standalone CLI for exporting per-position HMM posterior probabilities (P(footprint) per position per read).

Two output formats:

Gzipped TSV — no extra dependencies, base64-encoded uint8 posteriors
HDF5 — streaming batched writes, requires pip install h5py

Format is auto-detected from file extension (.tsv.gz → TSV, .h5/.hdf5 → HDF5), or set explicitly with --format.

# TSV (no extra deps)
fiberhmm-posteriors -i tagged.bam -m model.json -o posteriors.tsv.gz -c 4

# HDF5 (requires h5py)
fiberhmm-posteriors -i tagged.bam -m model.json -o posteriors.h5 -c 4
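The TSV format stores posteriors as base64-encoded uint8 values. The sketch below shows one plausible way to round-trip such a column. The scaling (probability times 255) and the function names are assumptions; check the README for the actual quantization before decoding real output.

```python
import base64

def encode_posteriors(probs):
    """Quantize probabilities in [0, 1] to uint8 and base64-encode them."""
    # Assumption: linear scaling to 0..255, clamped to the valid byte range.
    quantized = bytes(min(255, max(0, round(p * 255))) for p in probs)
    return base64.b64encode(quantized).decode("ascii")

def decode_posteriors(text):
    """Invert encode_posteriors, returning floats in [0, 1]."""
    return [byte / 255 for byte in base64.b64decode(text)]
```

Under this scaling, the round trip loses at most about 1/510 per value, which is the cost of storing one byte per position.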

Other Changes

New optional dependency group: pip install fiberhmm[posteriors] (installs h5py)
pip install fiberhmm[all] now includes h5py
Removed auxiliary DAF-seq preprocessing examples
--output-posteriors flag on fiberhmm-apply now auto-activates when posteriors package is present

Install / Upgrade

pip install --upgrade fiberhmm


22 Feb 21:05

mtcicero26

v2.0.0

34a2d4b

FiberHMM v2.0.0

Complete rewrite of FiberHMM as a proper Python package.

What's New

Python package: pip install fiberhmm with CLI entry points (fiberhmm-apply, fiberhmm-train, fiberhmm-probs, fiberhmm-extract, fiberhmm-utils)
Native HMM: No hmmlearn dependency; optional Numba JIT for ~10x speedup
Region-parallel processing: Scales linearly with cores (-c 8)
Fibertools-compatible output: Tagged BAM with ns/nl/as/al tags
Pre-trained models: Hia5 (PacBio + Nanopore), DddA (PacBio), DddB (Nanopore)
Consolidated utilities: fiberhmm-utils with convert, inspect, transfer, adjust subcommands
JSON model format: Portable, human-readable; legacy pickle/NPZ still supported for loading

Pre-trained Models

Model	Enzyme	Platform	Mode
`hia5_pacbio.json`	Hia5 (m6A)	PacBio	`pacbio-fiber`
`hia5_nanopore.json`	Hia5 (m6A)	Nanopore	`nanopore-fiber`
`ddda_pacbio.json`	DddA (deamination)	PacBio	`daf`
`dddb_nanopore.json`	DddB (deamination)	Nanopore	`daf`

Quick Start

pip install fiberhmm

# Call footprints with a pre-trained model
fiberhmm-apply -i experiment.bam -m models/hia5_pacbio.json -o output/ -c 8

See the README for full documentation.


19 Feb 15:05

mtcicero26

v1.4

67dcb70

v1.4

Updated the main scripts to handle unusual chromosome names in the genome assembly. Characters that are not allowed in the h5 file are now replaced with underscores, and downstream steps encode the chromosome names the same way so that they match. Chromosome-name parsing is also simplified to match existing tools: the leading '>' is removed and the name is read up to the first whitespace. Began implementing an option to exclude specific chromosomes or to use only certain ones.
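A small sketch of the two steps described above, parsing a FASTA header and sanitizing the name for use as an h5 key. The function names and the exact set of allowed characters are assumptions; the release note does not say which characters the scripts replace.

```python
import re

def parse_chrom_name(header_line):
    """Drop the leading '>' and keep everything up to the first whitespace."""
    return header_line.lstrip(">").split()[0]

def sanitize_for_h5(name):
    """Replace characters outside [A-Za-z0-9_] with underscores.

    Assumption: this character set is illustrative, not the scripts' actual rule.
    """
    return re.sub(r"[^A-Za-z0-9_]", "_", name)
```

Distinct names can collapse to the same sanitized key under a rule like this, so the same function has to be applied wherever names are looked up.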


12 Nov 17:39

mtcicero26

v1.3.2

04815f7

Version 1.3.2

The "starting_it" parameter now defaults to -1. With the previous default, the first chunk of the bed file was skipped.


12 Nov 04:21

mtcicero26

v1.3.1

7127a6a

Version 1.3.1

Added a -d parameter to apply_model_multiprocess.py. This script has a tendency to hang unexpectedly, especially when using many CPU cores. With -d, you can specify the existing temporary directory (in your original outdir) that holds the footprint bed file chunks from a previous, failed run. The script then skips quickly past the chunks of the m6a bed that were already footprint-called and resumes where it left off.


11 Nov 18:39

mtcicero26

v1.3

4e53d7d

FiberHMM v1.3

Added a new parameter, -e, to the train and apply model scripts. It sets the minimum level of methylation required for a read to be used in training or to be kept after model application. This is helpful when a subset of reads has very low methylation due to experimental issues.

Adjusted the apply and train model scripts to use the reference-based position of m6a instead of the read-based position. This resolves issues where poorly aligned reads had methylations and footprints outside the expected range.
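The difference between read-based and reference-based positions comes down to walking the CIGAR string. The sketch below is a generic illustration of that mapping, not the scripts' code; the function name `read_to_ref` is hypothetical, and it takes the CIGAR as text rather than reading it from a BAM record.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def read_to_ref(cigar, ref_start):
    """Map 0-based read positions to 0-based reference positions.

    Read bases in insertions or soft clips have no reference position
    and are left out of the returned dict.
    """
    mapping = {}
    query_pos, ref_pos = 0, ref_start
    for count, op in CIGAR_OP.findall(cigar):
        count = int(count)
        if op in "M=X":  # consumes both read and reference
            for offset in range(count):
                mapping[query_pos + offset] = ref_pos + offset
            query_pos += count
            ref_pos += count
        elif op in "IS":  # consumes read only
            query_pos += count
        elif op in "DN":  # consumes reference only
            ref_pos += count
        # H and P consume neither read nor reference
    return mapping
```

A modification at read position 8 in a read with CIGAR `2S3M1I2M2D2M` starting at reference position 100 lands at reference position 107, not 108, which is the kind of shift the change above accounts for.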
