Hands-on Metagenome Analysis: A Parkinson's Disease Case Study

Workshop Description

This workshop analyzes shotgun-sequenced human gut metagenomes from Parkinson's Disease and Control subjects. We introduce the Nextflow pipeline TOFU-MAaPO as a tool for automated metagenomic profiling, at both the taxonomic and functional levels, starting from raw sequencing reads. We then move to downstream analysis in the R environment: we compare the (dis)similarities between gut microbiomes, investigate their alpha diversity, and perform multivariable differential abundance tests to detect changes in the gut microbiome of diseased subjects. Participants gain hands-on experience in microbiome analysis with state-of-the-art tools that cover an end-to-end workflow from reads to biological insights.

Pre-Workshop Materials

To prepare for this workshop, we recommend reviewing the following resources:

Essential Reading

Starting a Metagenomics Project – Data Processing and Visualization for Metagenomics
Overview of metagenomic data processing and visualization approaches
NFDI4 microbiota KMC workshop
Our materials from the NFDI4 microbiota KMC workshop, covering amplicon sequencing analysis
Introduction to R
Basic R programming for microbiome analysis

Additional Resources

Seqera Command Line Interface Documentation
Understanding workflow execution with Nextflow/Seqera

Workshop Structure

This hands-on workshop uses Jupyter notebooks with an R kernel for interactive microbiome analysis. The workshop is divided into sequential parts, each building on the previous one.

Environment Setup

Before starting the workshop, you need to install the required R environment:

Install the conda environment from the provided YAML file:

conda env create -f env_workshopDK2026.yml
conda activate r_workshopDK2026

Launch Jupyter notebook:
```
jupyter notebook
```
Open the notebooks in the Scripts/ folder sequentially.

Scripts Overview

The Scripts/ folder contains the following Jupyter notebooks (R kernel):

Part 2: Setup and Data Loading

File: 2_setup_and_data_loading.ipynb

Load required R packages (tidyverse, phyloseq, vegan, ggpubr, DESeq2)
Import pre-processed MetaPhlAn4 taxonomic abundance data
Create phyloseq object from taxonomy and metadata
Filter low-abundance taxa for downstream analysis
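
The filtering step can be pictured with a minimal base R sketch. The toy matrix, the 1% abundance cutoff, and the two-sample prevalence cutoff below are illustrative assumptions, not the values used in the notebook.

```r
# Toy relative-abundance matrix (%), taxa in rows and samples in columns.
# All values are made up for illustration.
abund <- matrix(
  c(60.0, 55.0, 70.0, 65.0,
    39.9, 44.9,  0.0, 35.0,
     0.1,  0.1, 30.0,  0.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("taxon_A", "taxon_B", "taxon_C"), paste0("S", 1:4))
)

# Hypothetical thresholds: keep taxa above 1% abundance in at least 2 samples.
min_abundance <- 1
min_samples <- 2

# For each taxon, count the samples in which it exceeds the abundance cutoff.
keep <- rowSums(abund > min_abundance) >= min_samples
filtered <- abund[keep, , drop = FALSE]

rownames(filtered)
```

Here taxon_C is dropped because it exceeds the cutoff in only one sample. The same logical vector could be used to subset a phyloseq object, but how the notebook applies its filter is not specified here.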

Part 3: Taxonomic Composition & Visualization

3.1 Explore the Data

File: 3.1_explore_data.ipynb

Inspect phyloseq object structure (samples, taxa, metadata)
Examine taxonomy table (Kingdom → Species)
Check abundance matrix dimensions and read counts
Summarize sample metadata (age, sex, disease status)
Calculate unique taxa per taxonomic rank

3.2 Composition Barplots

File: 3.2-composition_barplot.ipynb

Visualize taxonomic composition of PD and Control samples with stacked barplots
Compare raw counts versus relative abundances to account for sequencing depth differences
Explore broad taxonomic patterns at the Phylum level across disease groups
Practice adapting the same approach to the Family level as an exercise
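
To see why relative abundances help when sequencing depth differs, consider this small base R sketch. The phylum names and counts are toy values chosen for illustration only.

```r
# Toy count matrix: two samples with the same composition but a
# tenfold difference in sequencing depth (values are made up).
counts <- matrix(
  c(100, 300, 600,   # S1, 1000 reads in total
     10,  30,  60),  # S2, 100 reads in total
  nrow = 3,
  dimnames = list(c("Firmicutes", "Bacteroidota", "Proteobacteria"),
                  c("S1", "S2"))
)

# Divide each column by its total and express the result in percent.
rel <- sweep(counts, 2, colSums(counts), "/") * 100

rel
colSums(rel)
```

After the transformation both samples show identical profiles, which the raw counts obscure.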

3.3 Heatmap of Top Species

File: 3.3-heatmap_top_species.ipynb

Identify the most prevalent and most variable microbial species across samples
Apply Z-score transformation to selected species abundances for cross-sample comparison
Build annotated heatmaps with sample metadata and taxonomic labels
Use clustering to reveal species and sample patterns associated with PD and Control groups
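
The Z-score step can be sketched in base R as follows, assuming the scaling is applied per species across samples. The species names and values are placeholders.

```r
# Toy abundance matrix: species in rows, samples in columns (made-up values).
abund <- matrix(
  c( 1,  2,  3,
    10, 20, 30),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("species_A", "species_B"), c("S1", "S2", "S3"))
)

# scale() standardizes columns, so transpose to put species in columns,
# standardize, then transpose back to the original orientation.
z <- t(scale(t(abund)))

z
```

Both species end up on the same scale even though their raw abundances differ tenfold, which keeps abundant species from dominating the heatmap colors.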

Part 4: Diversity Analysis

4.1 Alpha Diversity

File: 4.1_alpha_diversity.ipynb

Calculate alpha diversity metrics:
- Observed species (richness)
- Shannon index (diversity with evenness)
- Simpson index (dominance)
Compare filtered vs unfiltered data
Statistical testing:
- Wilcoxon rank-sum test (non-parametric)
- Linear models with covariates (Age, Sex, Case_status)
Visualize diversity differences with boxplots
Interpret clinical significance of diversity changes
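
As a rough illustration of the metrics and the group test listed above, here is a base R sketch. The helper function names and all numbers are hypothetical, and the Simpson variant shown is the Gini-Simpson form (1 minus the sum of squared proportions), which may differ from the variant reported in the notebook.

```r
# Alpha diversity metrics for a single sample's abundance vector.
observed_richness <- function(x) sum(x > 0)

shannon_index <- function(x) {
  p <- x[x > 0] / sum(x)  # drop zeros so log(0) never occurs
  -sum(p * log(p))
}

gini_simpson <- function(x) {
  p <- x / sum(x)
  1 - sum(p^2)
}

# An even community is more diverse than one dominated by a single taxon.
even <- c(25, 25, 25, 25)
skewed <- c(97, 1, 1, 1)
shannon_index(even)
shannon_index(skewed)

# Toy per-sample Shannon values for two groups (made-up numbers).
alpha <- data.frame(
  Case_status = factor(rep(c("Control", "PD"), each = 4)),
  shannon = c(3.0, 3.2, 2.9, 3.1, 2.1, 2.3, 2.0, 2.4)
)

# Wilcoxon rank-sum test comparing the two groups.
result <- wilcox.test(shannon ~ Case_status, data = alpha)
result$p.value
```

A covariate-adjusted version would replace the rank-sum test with a linear model such as `lm(shannon ~ Case_status + Age + Sex)`, given a data frame that contains those columns.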

4.2 Beta Diversity

File: 4.2-beta_diversity.ipynb

Compute between-sample dissimilarities using the Bray-Curtis dissimilarity
Visualize sample relationships with distance heatmaps and NMDS ordination plots
Test whether PD and Control microbiomes differ in overall community composition
Perform PERMANOVA models with and without covariate adjustment
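
The Bray-Curtis calculation itself is short enough to write out in base R. The sketch below uses a hypothetical helper, `bray_curtis`, and toy abundances. In practice the vegan package provides `vegdist()` for dissimilarities and `adonis2()` for PERMANOVA, though whether the notebook uses exactly these calls is an assumption.

```r
# Bray-Curtis dissimilarity between two abundance vectors:
# sum of absolute differences divided by the sum of all abundances.
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Toy abundance matrix: taxa in rows, samples in columns (made-up values).
abund <- cbind(
  S1 = c(50, 50, 0),
  S2 = c(50, 50, 0),
  S3 = c(0, 0, 100),
  S4 = c(25, 25, 50)
)

# Fill a symmetric sample-by-sample dissimilarity matrix.
n <- ncol(abund)
d <- matrix(0, n, n, dimnames = list(colnames(abund), colnames(abund)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- bray_curtis(abund[, i], abund[, j])
  }
}

d
```

Identical samples score 0, samples with no shared taxa score 1, and partially overlapping samples fall in between.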

4.3 Differential Abundance

File: 4.3-differential_abundance.ipynb

Run MaAsLin2 to identify taxa associated with PD versus Control status
Include clinical covariates in multivariable differential abundance models
Inspect effect sizes, p-values, and q-values from the association results
Summarize candidate PD-associated species with tables, volcano plots, and a follow-up heatmap exercise
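
To clarify how q-values relate to p-values, the following base R sketch applies a Benjamini-Hochberg correction to a made-up set of p-values. The taxon labels, the p-values, and the 0.25 cutoff are illustrative assumptions and are not output from the workshop data.

```r
# Toy association results: one p-value per taxon (made-up numbers).
results <- data.frame(
  taxon = paste0("species_", 1:5),
  pval = c(0.001, 0.01, 0.02, 0.04, 0.5)
)

# Benjamini-Hochberg adjustment controls the false discovery rate
# across all taxa tested.
results$qval <- p.adjust(results$pval, method = "BH")

# Hypothetical significance cutoff on the q-value.
q_cutoff <- 0.25
candidates <- results[results$qval <= q_cutoff, ]

results
candidates$taxon
```

Each q-value is at least as large as its p-value, so a taxon that looks significant on its own may no longer pass once all tests are taken into account.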

Dataset Information

Study: PRJNA834801
Subset: 40 samples (20 Parkinson's Disease, 20 Healthy Controls)
Sequencing: Shotgun metagenomics (Illumina paired-end)
Sample type: Human gut microbiome (stool)
Metadata: Age, Sex, Disease status

Pipeline: TOFU-MAaPO (Nextflow)

Requirements

Software

R ≥ 4.2
Jupyter notebook
Conda (for environment management)

Support

For questions or issues, please open an issue on this repository or contact the instructors: o.brovkina@ikmb.uni-kiel.de, a.quevedo@ikmb.uni-kiel.de

License

This workshop material is provided for educational purposes. The dataset is from PRJNA834801.

Acknowledgments

Pipeline: TOFU-MAaPO (Taxonomic and Functional Microbiome Analysis Pipeline)
Data: Parkinson's Disease gut microbiome study (PRJNA834801)
