ikmb/workshopDK2026

Hands-on Metagenome Analysis: A Parkinson's Disease Case Study

Workshop Description

The workshop will analyze shotgun-sequenced human gut metagenomes from Parkinson's Disease and Control subjects. We will introduce the Nextflow pipeline TOFU-MAaPO as a tool for automated metagenomic profiling, at both the taxonomic and functional levels, from raw sequencing reads. We will then move on to downstream analysis in the R environment: we will compare the (dis-)similarities between gut microbiomes, investigate their alpha diversity, and perform multivariable differential abundance tests to detect changes in the gut microbiome of diseased subjects. Participants will gain hands-on experience in microbiome analysis with state-of-the-art tools that cover an end-to-end workflow from reads to biological insights.

Pre-Workshop Materials

To prepare for this workshop, we recommend reviewing the following resources:

Essential Reading

Additional Resources


Workshop Structure

This hands-on workshop uses Jupyter notebooks with an R kernel to perform interactive microbiome analysis. The workshop is divided into sequential parts, each building on the previous one.

Environment Setup

Before starting the workshop, you need to install the required R environment:

  1. Install the conda environment from the provided YAML file:

    conda env create -f env_workshopDK2026.yml
    conda activate r_workshopDK2026
  2. Launch Jupyter notebook:

    jupyter notebook
  3. Open the notebooks in the Scripts/ folder sequentially.


Scripts Overview

The Scripts/ folder contains the following Jupyter notebooks (R kernel):

Part 2: Setup and Data Loading

File: 2_setup_and_data_loading.ipynb

  • Load required R packages (tidyverse, phyloseq, vegan, ggpubr, DESeq2)
  • Import pre-processed MetaPhlAn4 taxonomic abundance data
  • Create phyloseq object from taxonomy and metadata
  • Filter low-abundance taxa for downstream analysis
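As a rough illustration of the filtering step, the sketch below keeps only taxa that reach a minimum abundance in a minimum fraction of samples. It is written in plain Python rather than taken from the notebook; the toy table, the function name `filter_taxa`, and both thresholds are hypothetical and may differ from what the workshop uses.

```python
# Hypothetical toy table: taxon -> relative abundance (%) in each of 4 samples.
abundance = {
    "Taxon_A": [12.0, 8.5, 15.2, 9.9],
    "Taxon_B": [0.0, 0.002, 0.0, 0.0],
    "Taxon_C": [0.5, 0.0, 0.3, 0.0],
}


def filter_taxa(table, min_abundance=0.01, min_prevalence=0.5):
    """Keep taxa that reach min_abundance in at least min_prevalence of samples.

    Both thresholds are illustrative assumptions, not the workshop's settings.
    """
    kept = {}
    for taxon, values in table.items():
        # Prevalence: fraction of samples in which the taxon passes the cutoff.
        prevalence = sum(v >= min_abundance for v in values) / len(values)
        if prevalence >= min_prevalence:
            kept[taxon] = values
    return kept


filtered = filter_taxa(abundance)
print(sorted(filtered))
```

Filtering on prevalence rather than mean abundance avoids keeping a taxon that is abundant in a single sample but absent everywhere else.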

Part 3: Taxonomic Composition & Visualization

3.1 Explore the Data

File: 3.1_explore_data.ipynb

  • Inspect phyloseq object structure (samples, taxa, metadata)
  • Examine taxonomy table (Kingdom → Species)
  • Check abundance matrix dimensions and read counts
  • Summarize sample metadata (age, sex, disease status)
  • Calculate unique taxa per taxonomic rank

3.2 Composition Barplots

File: 3.2-composition_barplot.ipynb

  • Visualize taxonomic composition of PD and Control samples with stacked barplots
  • Compare raw counts versus relative abundances to account for sequencing depth differences
  • Explore broad taxonomic patterns at the Phylum level across disease groups
  • Practice adapting the same approach to the Family level as an exercise
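To see why relative abundances are compared alongside raw counts, consider the minimal Python sketch below. Two hypothetical samples have the same composition but differ tenfold in sequencing depth; the phylum counts are invented for illustration and are not workshop data.

```python
def to_relative(counts):
    """Convert raw counts for one sample into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no counts")
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical samples: identical composition, tenfold difference in depth.
sample_deep = {"Firmicutes": 6000, "Bacteroidetes": 3000, "Actinobacteria": 1000}
sample_shallow = {"Firmicutes": 600, "Bacteroidetes": 300, "Actinobacteria": 100}

rel_deep = to_relative(sample_deep)
rel_shallow = to_relative(sample_shallow)

for taxon in sample_deep:
    print(taxon, round(rel_deep[taxon], 3), round(rel_shallow[taxon], 3))
```

On the raw scale the two barplots would look very different; after normalization they are identical, which is the intended behavior when only composition matters.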

3.3 Heatmap of Top Species

File: 3.3-heatmap_top_species.ipynb

  • Identify the most prevalent and most variable microbial species across samples
  • Apply Z-score transformation to selected species abundances for cross-sample comparison
  • Build annotated heatmaps with sample metadata and taxonomic labels
  • Use clustering to reveal species and sample patterns associated with PD and Control groups
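The Z-score transformation can be sketched in a few lines of Python. This example assumes the scaling is applied per species across samples, using the sample standard deviation; the abundance values are invented, and the notebook itself may scale differently.

```python
from statistics import mean, stdev


def zscore(values):
    """Center one species' abundances on 0 and scale them to unit sample SD."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        # A species with constant abundance carries no pattern; map it to zeros.
        return [0.0] * len(values)
    return [(v - m) / s for v in values]


# Hypothetical abundances of one species in three samples.
species_abundance = [2.0, 4.0, 6.0]
scaled = zscore(species_abundance)
print(scaled)
```

After scaling, abundant and rare species share a common color scale in the heatmap, so the display reflects each species' variation across samples rather than its overall abundance.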

Part 4: Diversity Analysis

4.1 Alpha Diversity

File: 4.1_alpha_diversity.ipynb

  • Calculate alpha diversity metrics:
    • Observed species (richness)
    • Shannon index (diversity with evenness)
    • Simpson index (dominance)
  • Compare filtered vs unfiltered data
  • Statistical testing:
    • Wilcoxon rank-sum test (non-parametric)
    • Linear models with covariates (Age, Sex, Case_status)
  • Visualize diversity differences with boxplots
  • Interpret clinical significance of diversity changes
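The three alpha diversity metrics listed above can be written out directly. The Python sketch below is an illustration only, not the notebook code: it assumes the natural logarithm for Shannon and reports Simpson as 1 minus the sum of squared proportions, and the two example communities are invented.

```python
import math


def alpha_diversity(abundances):
    """Return (observed richness, Shannon index, Simpson index) for one sample."""
    present = [a for a in abundances if a > 0]
    if not present:
        raise ValueError("sample contains no taxa")
    total = sum(present)
    props = [a / total for a in present]
    observed = len(props)                            # number of taxa detected
    shannon = -sum(p * math.log(p) for p in props)   # richness plus evenness
    simpson = 1 - sum(p * p for p in props)          # 1 - dominance
    return observed, shannon, simpson


# Hypothetical communities with the same richness but different evenness.
even = alpha_diversity([25, 25, 25, 25])
dominated = alpha_diversity([97, 1, 1, 1])
print(even)
print(dominated)
```

Both communities have four observed taxa, but the dominated one scores lower on Shannon and Simpson, which is why richness alone can hide differences between groups.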

4.2 Beta Diversity

File: 4.2-beta_diversity.ipynb

  • Compute between-sample dissimilarities using Bray-Curtis distance
  • Visualize sample relationships with distance heatmaps and NMDS ordination plots
  • Test whether PD and Control microbiomes differ in overall community composition
  • Perform PERMANOVA models with and without covariate adjustment
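For reference, the Bray-Curtis dissimilarity between two samples is the sum of absolute abundance differences divided by the sum of all abundances. A minimal Python sketch with invented abundance vectors follows; in the workshop this calculation is presumably delegated to an R package.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two equally ordered abundance vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must cover the same taxa in the same order")
    denominator = sum(x) + sum(y)
    if denominator == 0:
        raise ValueError("both samples are empty")
    return sum(abs(a - b) for a, b in zip(x, y)) / denominator


# Hypothetical samples described over the same three taxa.
sample_1 = [10, 0, 5]
sample_2 = [5, 5, 5]

d_partial = bray_curtis(sample_1, sample_2)    # some taxa shared
d_same = bray_curtis(sample_1, sample_1)       # identical samples
d_disjoint = bray_curtis([4, 0], [0, 7])       # no taxa in common
print(d_partial, d_same, d_disjoint)
```

The value ranges from 0 for identical samples to 1 for samples with no taxa in common, which makes the distance heatmap and ordination plots straightforward to read.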

4.3 Differential Abundance

File: 4.3-differential_abundance.ipynb

  • Run MaAsLin2 to identify taxa associated with PD versus Control status
  • Include clinical covariates in multivariable differential abundance models
  • Inspect effect sizes, p-values, and q-values from the association results
  • Summarize candidate PD-associated species with tables, volcano plots, and a follow-up heatmap exercise
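The q-values in the association results are p-values adjusted for multiple testing. The Python sketch below assumes the Benjamini-Hochberg procedure, a common choice for this adjustment; the correction method actually configured in the notebook should be checked there, and the p-values shown are invented.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    n = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical p-values for four taxa.
pvals = [0.01, 0.04, 0.03, 0.20]
qvals = benjamini_hochberg(pvals)
print([round(q, 4) for q in qvals])
```

A taxon can have a p-value below 0.05 and still have a q-value above it, as the second and third taxa do here, so candidate species are best selected on the q-value.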

Dataset Information

Study: PRJNA834801
Subset: 40 samples (20 Parkinson's Disease, 20 Healthy Controls)
Sequencing: Shotgun metagenomics (Illumina paired-end)
Sample type: Human gut microbiome (stool)
Metadata: Age, Sex, Disease status

Pipeline: TOFU-MAaPO (Nextflow)


Requirements

Software

  • R ≥ 4.2
  • Jupyter notebook
  • Conda (for environment management)

Support

For questions or issues, please open an issue on this repository or contact the instructors: o.brovkina@ikmb.uni-kiel.de, a.quevedo@ikmb.uni-kiel.de

License

This workshop material is provided for educational purposes. The dataset is from PRJNA834801.


Acknowledgments

  • Pipeline: TOFU-MAaPO (Taxonomic and Functional Microbiome Analysis Pipeline)
  • Data: Parkinson's Disease gut microbiome study (PRJNA834801)
