In this workshop we will analyze shotgun-sequenced human gut metagenomes from Parkinson's Disease (PD) and Control subjects. We will introduce the Nextflow pipeline TOFU-MAaPO as a tool that automates metagenomic profiling at both the taxonomic and functional levels, starting from raw sequencing reads. We will then move on to downstream analysis in R: comparing (dis)similarities between gut microbiomes, investigating their alpha diversity, and performing multivariable differential abundance tests to detect changes in the gut microbiome of subjects with PD. Participants will gain hands-on experience with state-of-the-art microbiome analysis tools that cover an end-to-end workflow, from reads to biological insights.
To prepare for this workshop, we recommend reviewing the following resources:
- Starting a Metagenomics Project – Data Processing and Visualization for Metagenomics: Overview of metagenomic data processing and visualization approaches
- NFDI4 microbiota KMC workshop: Our materials from the NFDI4 microbiota KMC workshop, covering amplicon sequencing analysis
- Introduction to R: Basic R programming for microbiome analysis
- Seqera Command Line Interface Documentation: Understanding workflow execution with Nextflow/Seqera
This hands-on workshop uses Jupyter notebooks with an R kernel for interactive microbiome analysis. It is divided into sequential parts, each building on the previous one.
Before starting the workshop, you need to install the required R environment:
- Install the conda environment from the provided YAML file, then activate it:

      conda env create -f env_workshopDK2026.yml
      conda activate r_workshopDK2026

- Launch Jupyter notebook:

      jupyter notebook

- Open the notebooks in the Scripts/ folder sequentially.
The Scripts/ folder contains the following Jupyter notebooks (R kernel):
File: 2_setup_and_data_loading.ipynb
- Load required R packages (tidyverse, phyloseq, vegan, ggpubr, DESeq2)
- Import pre-processed MetaPhlAn4 taxonomic abundance data
- Create phyloseq object from taxonomy and metadata
- Filter low-abundance taxa for downstream analysis
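The notebook performs this filtering in R on the phyloseq object. As a language-neutral illustration of the idea, the Python sketch below keeps a taxon only if it reaches a minimum relative abundance in a minimum fraction of samples. The function name, both thresholds, and the toy table are assumptions made for illustration and are not the workshop's actual settings.

```python
def filter_low_abundance(table, min_abundance=0.01, min_prevalence=0.1):
    """Keep taxa that reach `min_abundance` in at least `min_prevalence` of samples.

    `table` maps taxon name -> list of relative abundances (one value per sample).
    Both thresholds are hypothetical defaults, not the workshop's settings.
    """
    kept = {}
    for taxon, values in table.items():
        n_present = sum(1 for v in values if v >= min_abundance)
        if n_present / len(values) >= min_prevalence:
            kept[taxon] = values
    return kept


# Toy relative-abundance table with 4 samples, invented for illustration.
toy = {
    "s__Bacteroides_uniformis": [0.30, 0.25, 0.40, 0.10],
    "s__Rare_taxon": [0.001, 0.0, 0.0, 0.002],
    "s__Sporadic_taxon": [0.0, 0.05, 0.0, 0.0],
}
filtered = filter_low_abundance(toy, min_abundance=0.01, min_prevalence=0.5)
print(sorted(filtered))
```

Raising `min_prevalence` removes taxa that appear in only a few samples; lowering it keeps sporadic taxa at the cost of a sparser table.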
File: 3.1_explore_data.ipynb
- Inspect phyloseq object structure (samples, taxa, metadata)
- Examine taxonomy table (Kingdom → Species)
- Check abundance matrix dimensions and read counts
- Summarize sample metadata (age, sex, disease status)
- Calculate unique taxa per taxonomic rank
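Counting unique taxa per rank can be sketched outside of R as follows. The example assumes MetaPhlAn-style lineage strings, where ranks are joined by `|` and carry prefixes such as `k__` and `s__`; in the notebook the same information lives in the phyloseq taxonomy table. The function name and the three toy lineages are illustrative only.

```python
# Rank prefixes as used in MetaPhlAn-style lineage strings.
RANK_PREFIXES = {
    "k": "Kingdom", "p": "Phylum", "c": "Class", "o": "Order",
    "f": "Family", "g": "Genus", "s": "Species",
}


def unique_taxa_per_rank(lineages):
    """Count distinct names at each taxonomic rank across all lineages."""
    seen = {rank: set() for rank in RANK_PREFIXES.values()}
    for lineage in lineages:
        for level in lineage.split("|"):
            prefix, _, name = level.partition("__")
            rank = RANK_PREFIXES.get(prefix)
            if rank and name:  # skip unknown prefixes and empty names
                seen[rank].add(name)
    return {rank: len(names) for rank, names in seen.items()}


# Toy lineages, invented for illustration.
toy_lineages = [
    "k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Lachnospiraceae|g__Roseburia|s__Roseburia_intestinalis",
    "k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Lachnospiraceae|g__Roseburia|s__Roseburia_hominis",
    "k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Bacteroidaceae|g__Bacteroides|s__Bacteroides_uniformis",
]
rank_counts = unique_taxa_per_rank(toy_lineages)
print(rank_counts)
```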
File: 3.2-composition_barplot.ipynb
- Visualize taxonomic composition of PD and Control samples with stacked barplots
- Compare raw counts versus relative abundances to account for sequencing depth differences
- Explore broad taxonomic patterns at the Phylum level across disease groups
- Practice adapting the same approach to the Family level as an exercise
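To see why relative abundances are used alongside raw counts, consider two samples with the same composition but a tenfold difference in sequencing depth. The Python sketch below (the notebook itself works in R) divides each count by the sample total; the function name and the phylum counts are made up for illustration.

```python
def to_relative_abundance(counts):
    """Convert raw counts for one sample into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: count / total for taxon, count in counts.items()}


# Same composition, 10x difference in depth (toy numbers).
shallow = {"Firmicutes": 600, "Bacteroidota": 300, "Proteobacteria": 100}
deep = {"Firmicutes": 6000, "Bacteroidota": 3000, "Proteobacteria": 1000}

rel_shallow = to_relative_abundance(shallow)
rel_deep = to_relative_abundance(deep)
print(rel_shallow)
print(rel_deep)
```

A barplot of the raw counts would show the second sample as ten times taller, while the relative abundances of the two samples are the same.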
File: 3.3-heatmap_top_species.ipynb
- Identify the most prevalent and most variable microbial species across samples
- Apply Z-score transformation to selected species abundances for cross-sample comparison
- Build annotated heatmaps with sample metadata and taxonomic labels
- Use clustering to reveal species and sample patterns associated with PD and Control groups
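The Z-score step rescales each species so that its values across samples have mean 0 and standard deviation 1, which puts abundant and rare species on a comparable color scale in the heatmap. A minimal Python sketch follows, assuming the sample standard deviation (n - 1 denominator) is used; the function name and the toy abundances are illustrative.

```python
from statistics import mean, stdev


def zscore(values):
    """Z-score one species' abundances across samples (sample SD, n - 1)."""
    center = mean(values)
    spread = stdev(values)
    if spread == 0:
        # A species with constant abundance carries no contrast between samples.
        return [0.0 for _ in values]
    return [(v - center) / spread for v in values]


# Toy abundances of one species in three samples.
z = zscore([1.0, 2.0, 3.0])
print(z)
```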
File: 4.1_alpha_diversity.ipynb
- Calculate alpha diversity metrics:
  - Observed species (richness)
  - Shannon index (diversity with evenness)
  - Simpson index (dominance)
- Compare filtered vs unfiltered data
- Statistical testing:
  - Wilcoxon rank-sum test (non-parametric)
  - Linear models with covariates (Age, Sex, Case_status)
- Visualize diversity differences with boxplots
- Interpret clinical significance of diversity changes
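The three alpha diversity metrics listed above reduce to short formulas over the taxon proportions p_i of one sample. The Python sketch below is only meant to show the arithmetic; the notebook computes these in R. It assumes the Shannon index uses the natural logarithm and that Simpson is reported as 1 - sum(p_i^2), which is one of several definitions in use. The function name and the toy counts are illustrative.

```python
import math


def alpha_diversity(counts):
    """Return observed richness, Shannon, and Simpson (1 - sum p^2) for one sample."""
    present = [c for c in counts if c > 0]
    total = sum(present)
    if total == 0:
        raise ValueError("sample has no observed taxa")
    proportions = [c / total for c in present]
    return {
        "observed": len(present),
        "shannon": -sum(p * math.log(p) for p in proportions),
        "simpson": 1.0 - sum(p * p for p in proportions),
    }


# Two toy samples with the same richness but different evenness.
even = alpha_diversity([25, 25, 25, 25])
skewed = alpha_diversity([97, 1, 1, 1])
print(even)
print(skewed)
```

Both toy samples contain four taxa, but the sample dominated by a single taxon scores lower on Shannon and Simpson, which is why richness alone does not capture diversity.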
File: 4.2-beta_diversity.ipynb
- Compute between-sample dissimilarities using Bray-Curtis distance
- Visualize sample relationships with distance heatmaps and NMDS ordination plots
- Test whether PD and Control microbiomes differ in overall community composition
- Perform PERMANOVA models with and without covariate adjustment
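Bray-Curtis dissimilarity between two samples is the sum of absolute abundance differences divided by the sum of all abundances, giving 0 for identical samples and 1 for samples with no shared taxa. A minimal Python sketch of that formula is shown below (the notebook uses R for this step); the function name and the toy vectors are assumptions for illustration.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("samples must cover the same taxa in the same order")
    denominator = sum(a) + sum(b)
    if denominator == 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator


# Toy abundance vectors over the same three taxa.
sample_1 = [1, 2, 3]
sample_2 = [3, 2, 1]
sample_3 = [0, 0, 6]

print(bray_curtis(sample_1, sample_1))
print(bray_curtis(sample_1, sample_2))
print(bray_curtis(sample_2, sample_3))
```

Computing this value for every pair of samples yields the distance matrix that the heatmap, the NMDS ordination, and PERMANOVA all take as input.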
File: 4.3-differential_abundance.ipynb
- Run MaAsLin2 to identify taxa associated with PD versus Control status
- Include clinical covariates in multivariable differential abundance models
- Inspect effect sizes, p-values, and q-values from the association results
- Summarize candidate PD-associated species with tables, volcano plots, and a follow-up heatmap exercise
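When many taxa are tested at once, p-values are adjusted for multiple testing and reported as q-values. The Python sketch below assumes the Benjamini-Hochberg procedure, a common choice for this adjustment; check the notebook for the correction method actually configured. The function name and the toy p-values are illustrative.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Toy p-values for four taxa, invented for illustration.
toy_p = [0.01, 0.04, 0.03, 0.005]
toy_q = benjamini_hochberg(toy_p)
print(toy_q)
```

A q-value is never smaller than the p-value it was derived from, so a taxon that looks significant on its raw p-value may no longer pass the threshold after adjustment.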
- Study: PRJNA834801
- Subset: 40 samples (20 Parkinson's Disease, 20 Healthy Controls)
- Sequencing: Shotgun metagenomics (Illumina paired-end)
- Sample type: Human gut microbiome (stool)
- Metadata: Age, Sex, Disease status
- Pipeline: TOFU-MAaPO (Nextflow)
- R ≥ 4.2
- Jupyter notebook
- Conda (for environment management)
For questions or issues, please open an issue on this repository or contact the instructors:
- o.brovkina@ikmb.uni-kiel.de
- a.quevedo@ikmb.uni-kiel.de
This workshop material is provided for educational purposes. The dataset is from PRJNA834801.
- Pipeline: TOFU-MAaPO (Taxonomic and Functional Microbiome Analysis Pipeline)
- Data: Parkinson's Disease gut microbiome study (PRJNA834801)