Add new nf-core module wrapping `modkit extract full`, which transforms the MM/ML tags in a modBAM into a tab-separated per-read-per-position probability table. Output can be BGZF-compressed via `--bgzf` in `ext.args`. Useful for downstream custom filtering, plotting, and ML training on read-level methylation probabilities.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR checklist
Summary
Adds a new nf-core module wrapping `modkit extract full`, which transforms the MM/ML tags in a modBAM into a tab-separated per-read-per-position probability table. Emits one row for every modified-base probability call in every read.
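To make the shape of the table concrete, here is a minimal Python sketch of consuming it downstream. It assumes the output has a header row and that the probability column is named `mod_qual`. That column name, the reduced set of example columns, and the `min_prob` threshold are illustrative assumptions, so check them against the header of your own output before relying on them.

```python
import csv
import io


def filter_rows(handle, prob_column="mod_qual", min_prob=0.9):
    """Yield rows (as dicts) whose probability is at least min_prob.

    prob_column is an assumed column name; pass the one from your header.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        if float(row[prob_column]) >= min_prob:
            yield row


# Tiny in-memory stand-in for a real table (made-up values, reduced columns).
example = io.StringIO(
    "read_id\tref_position\tmod_qual\n"
    "read1\t100\t0.98\n"
    "read1\t150\t0.12\n"
    "read2\t100\t0.91\n"
)
kept = list(filter_rows(example))
```

Because the function streams row by row, it does not need to hold the whole table in memory, which matters when there is one row per call per read.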
The module auto-detects `--bgzf` in `ext.args` and adjusts the output filename suffix accordingly (`.tsv` vs `.tsv.gz`), so users don't get a misleading extension when enabling compression.
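The suffix selection described above can be sketched as follows. The module itself is written in Nextflow, so this Python version only mirrors the described logic and is not the module's code. The function name `output_filename` and the `prefix` parameter are hypothetical.

```python
import shlex


def output_filename(prefix, ext_args):
    """Return '<prefix>.tsv.gz' if --bgzf is among the args, else '<prefix>.tsv'."""
    # Tokenise like a shell would, so '--bgzf' is matched as a whole flag
    # and not as a substring of some other argument.
    tokens = shlex.split(ext_args or "")
    suffix = "tsv.gz" if "--bgzf" in tokens else "tsv"
    return f"{prefix}.{suffix}"
```

For example, `output_filename("sample", "--bgzf")` gives `sample.tsv.gz`, while an empty or missing argument string gives `sample.tsv`.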
Why
`modkit extract full` is the source of truth for read-level methylation probabilities and is essential for custom downstream filtering, phased methylation plots, and ML training on raw probability distributions. It pairs with the companion PR for `modkit extract calls`, which emits thresholded categorical calls.
Test data
Uses the existing `test.sorted.phased.bam` from nf-core/test-datasets (modules branch). No new test data required.
🤖 Generated with Claude Code