A lightweight, reproducible workflow for processing Whole-Genome Bisulfite Sequencing (WGBS) data and performing DMR analysis, designed for epigenomic analysis and downstream integrative/regulatory modeling.
This repo provides a clean and transparent pipeline for:
- Bisulfite read alignment (Bismark)
- Deduplication (deduplicate_bismark)
- Cytosine methylation extraction (bismark_methylation_extractor)
- DMR calling (methylKit)
- DMR annotation (GENCODE + CpG islands)
- Visualization (Manhattan plots)
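The first three steps above map onto three Bismark commands. The following is a minimal sketch of how they could be chained for one paired-end sample, not the contents of `01_alignment.sh`. The function names `run` and `align_sample`, the `DRY_RUN` variable, the argument order, and the chosen extractor options are assumptions made for illustration. The BAM file names follow what is understood to be Bismark's default paired-end naming and should be checked against the actual output.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# align_sample GENOME_DIR R1_FASTQ R2_FASTQ OUT_DIR
# (hypothetical helper; GENOME_DIR must already be indexed with
#  bismark_genome_preparation)
align_sample() {
  local genome="$1" r1="$2" r2="$3" out="$4"
  local stem
  stem="$(basename "$r1")"
  stem="${stem%%.*}"   # sample_R1.fastq.gz -> sample_R1

  # Assumed default Bismark output names for paired-end Bowtie2 runs.
  local bam="$out/${stem}_bismark_bt2_pe.bam"
  local dedup="$out/${stem}_bismark_bt2_pe.deduplicated.bam"

  # 1. Bisulfite-aware alignment.
  run bismark --genome "$genome" -1 "$r1" -2 "$r2" -o "$out"
  # 2. Remove PCR duplicates.
  run deduplicate_bismark --paired --bam --output_dir "$out" "$bam"
  # 3. Extract per-cytosine methylation calls.
  run bismark_methylation_extractor --paired-end --gzip --bedGraph \
      --cytosine_report --genome_folder "$genome" -o "$out" "$dedup"
}
```

Calling `DRY_RUN=1 align_sample refs/genome data/s1_R1.fastq.gz data/s1_R2.fastq.gz results/01_alignment` prints the three commands without running them, which is a cheap way to check paths before committing compute time. The genome folder and sample names in that call are placeholders.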
WGBS enables base-resolution DNA methylation measurement genome-wide, but turning raw FASTQs into analysis-ready methylation tables and interpretable DMR results can be tedious and error-prone.
This workflow is intentionally minimal (4 scripts, no heavy framework), but aims to be:
- reproducible (explicit inputs/outputs)
- portable (no hard-coded cluster paths in scripts)
- transparent (each step is a standalone script)
- compatible with downstream epigenomic / multi-omics integration
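1. Alignment (`scripts/01_alignment.sh`)
2. DMR calling (`scripts/02_dmr_calling.R`)
3. Annotation (`scripts/03_annotation.R`)
4. Plot (`scripts/04_plot_manhattan.R`)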
project/
├── data/                   # input FASTQs (not tracked)
├── refs/                   # reference files (not tracked)
│   ├── gencode.v38.annotation.gtf
│   └── hg38.cpgs.island.txt
├── scripts/                # pipeline scripts (tracked)
│   ├── 01_alignment.sh
│   ├── 02_dmr_calling.R
│   ├── 03_annotation.R
│   └── 04_plot_manhattan.R
└── results/                # outputs (not tracked)
    ├── 01_alignment/
    ├── 02_dmr/
    ├── 03_annotation/
    └── 04_plots/
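The untracked directories in this layout have to be created locally before the first run. A minimal sketch, assuming a hypothetical helper `make_layout` that is not part of the repository:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Create the directory skeleton shown above under a root folder.
# make_layout is a hypothetical helper; the root defaults to ./project.
make_layout() {
  local root="${1:-project}"
  mkdir -p "$root"/data "$root"/refs "$root"/scripts \
           "$root"/results/01_alignment "$root"/results/02_dmr \
           "$root"/results/03_annotation "$root"/results/04_plots
}
```

Running `make_layout project` is safe to repeat because `mkdir -p` leaves existing directories untouched. The reference files under `refs/` still have to be downloaded separately.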
---
## Requirements
- **Bismark** ≥ v0.22.3
- **Bowtie2**
- **samtools**
- **methylKit**
- Linux environment
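Before running the pipeline, it can help to confirm that the command-line tools are on `PATH`. The sketch below uses a hypothetical helper, `missing_tools`, which is not part of the repository. It only checks that each executable can be found and does not check versions.

```bash
# Print the name of every listed tool that is not found on PATH.
# missing_tools is a hypothetical helper written for this example.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

For example, `missing_tools bismark bowtie2 samtools Rscript` prints nothing when all four are available. `Rscript` is included on the assumption that methylKit is run through R. The Bismark version can be compared against the minimum by hand with `bismark --version`, and methylKit can be checked with `Rscript -e 'packageVersion("methylKit")'`.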