SMLMdataAnalyser

Automatic tool for analysing SMLM data

Overview

SMLMdataAnalyser is a pipeline for automated analysis of pre-clustered single-molecule localization microscopy (SMLM) datasets.

The tool processes multiple experimental conditions and protein targets simultaneously and extracts quantitative information from localization data. The pipeline requires cluster-center HDF5 files as input; these can be generated with the Picasso software.

The resulting metrics can be used to assess protein density, oligomerization states, spatial organization, and interaction probabilities.

Installation

The required Python environment can be installed using conda and the provided environment file (environment.yaml).

  1. Clone the repository:
git clone https://github.com/HeilemannLab/SMLMdataAnalyser.git
cd SMLMdataAnalyser
  2. Create the conda environment and install the required packages:
conda env create -f environment.yaml
  3. Activate the environment:
conda activate SMLMdataAnalyser

The environment was tested with Python 3.8.

Usage

  1. Download the scripts and create the conda environment.
  2. Organize the input data.
  3. Adjust the config.yaml file.
  4. Run main.py to perform the analysis.
  5. Run plots.py to plot the analyzed data.

Data Structure

Input Directory Organization

Organize your data in the following structure:

input_directory/
├── condition_1/
│   ├── protein_A/
│   │   ├── file1.hdf5
│   │   ├── file1_mask.png
│   │   ├── file2.hdf5
│   │   └── file2_mask.png
│   └── protein_B/
│       ├── file1.hdf5
│       ├── file1_mask.png
│       ├── file2.hdf5
│       └── file2_mask.png
└── condition_2/
    ├── protein_A/
    └── protein_B/
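
The layout above can be traversed with a few lines of Python. The following is a minimal sketch, not the pipeline's own loader: the function name find_datasets is hypothetical, and it assumes that every mask is named after its HDF5 file with a _mask.png suffix, as in the tree shown.

```python
from pathlib import Path


def find_datasets(input_dir):
    """Yield (condition, protein, hdf5_path, mask_path) for every HDF5 file.

    mask_path is None when no '<stem>_mask.png' sits next to the HDF5 file.
    (Hypothetical helper; the naming rule is inferred from the tree above.)
    """
    root = Path(input_dir)
    # Expected depth: input_directory/<condition>/<protein>/<file>.hdf5
    for hdf5_path in sorted(root.glob("*/*/*.hdf5")):
        protein_dir = hdf5_path.parent
        condition_dir = protein_dir.parent
        mask_path = protein_dir / f"{hdf5_path.stem}_mask.png"
        yield (
            condition_dir.name,
            protein_dir.name,
            hdf5_path,
            mask_path if mask_path.exists() else None,
        )
```

Running list(find_datasets("path/to/input/directory")) before starting the analysis is a quick way to spot HDF5 files that have no matching mask.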

File Formats

  • HDF5 files: Cluster-center files containing localization data with the fields x, y, n, area, convexhull, group
  • YAML files: Contain the metadata of the corresponding HDF5 file
  • Mask files: PNG images defining cell boundaries (255 = inside cell, 0 = outside). Can be generated in ImageJ/Fiji or Picasso software
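
To illustrate the mask convention (255 = inside cell, 0 = outside), the sketch below selects the cluster centers that fall on an inside-cell pixel. It is an illustration under stated assumptions, not the pipeline's code: the function name inside_mask is hypothetical, the mask is assumed to be already loaded as a 2D array, and x and y are assumed to be expressed in mask-pixel units with x indexing columns and y indexing rows.

```python
import numpy as np


def inside_mask(x_px, y_px, mask):
    """Return a boolean array that is True where (x, y) lies on a 255-valued pixel.

    Assumes coordinates are in mask-pixel units (x -> column, y -> row).
    Points outside the image are treated as outside the cell.
    """
    x_px = np.asarray(x_px, dtype=float)
    y_px = np.asarray(y_px, dtype=float)
    cols = np.floor(x_px).astype(int)
    rows = np.floor(y_px).astype(int)
    in_bounds = (
        (rows >= 0) & (rows < mask.shape[0]) & (cols >= 0) & (cols < mask.shape[1])
    )
    keep = np.zeros(x_px.shape, dtype=bool)
    # Only index the mask with coordinates that are inside the image.
    keep[in_bounds] = mask[rows[in_bounds], cols[in_bounds]] == 255
    return keep
```

If the coordinates are stored in nanometres instead, they would first need to be divided by the pixel size set in config.yaml.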

Configuration File (config.yaml)

  • Set input and output directories:
input_dir: 'path/to/input/directory'
output_dir: "path/to/output/directory"
output_plots: "path/to/plots/directory"
  • Define analysis steps that will be performed
    • Dark-time analysis requires dark_mean and Td columns (these will be calculated automatically in a future release).
    • Cross-colocalization requires at least two proteins; self-colocalization is always performed.
dark_time_analysis: True
cross_coloc: True
patching: True
simulation: True
  • Set parameters for analysis
analysis:
  pixel_size: 158
  coloc_threshold: 100   # in nm
  patch_size: 3000      # in nm
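
A quick sanity check of the parsed configuration can catch mistakes before a long run. The sketch below is hypothetical and not part of the repository: check_config, REQUIRED_KEYS and ANALYSIS_KEYS are illustrative names, treating every key as mandatory is an assumption, and the config is assumed to be already parsed into a dict (for example with PyYAML's yaml.safe_load).

```python
# Key names are taken from the config.yaml excerpt above; treating all of
# them as mandatory is an assumption made for this sketch.
REQUIRED_KEYS = ("input_dir", "output_dir", "output_plots", "analysis")
ANALYSIS_KEYS = ("pixel_size", "coloc_threshold", "patch_size")


def check_config(config, n_proteins):
    """Return a list of human-readable problems found in a parsed config dict."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]
    analysis = config.get("analysis") or {}
    for key in ANALYSIS_KEYS:
        value = analysis.get(key)
        if not isinstance(value, (int, float)) or value <= 0:
            problems.append(f"analysis.{key} must be a positive number")
    # Cross-colocalization needs at least two protein targets.
    if config.get("cross_coloc") and n_proteins < 2:
        problems.append("cross_coloc requires at least two proteins")
    return problems
```

An empty list means that no problem was detected by these particular checks; it does not guarantee that the pipeline accepts the file.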

Analysis Pipeline

1. Data Loading and Preprocessing

Cluster-center HDF5 files are loaded for each condition and protein target. Mask files are used to restrict the analysis to cell boundaries. Additional masks can be provided for dark-time analysis to calibrate the signal from nonspecific antibody binding outside the cell.

2. Colocalization Analysis

Both self- and cross-colocalization analyses use KDTree algorithms from the scikit-learn library.

For each localization, the algorithm:

  • identifies neighboring localizations within the specified distance threshold
  • extracts their distances
  • records the indices of neighboring molecules

These results are used to quantify molecular proximity and interaction probabilities.
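
The three steps listed above map onto a radius query against a KD-tree. The following is a minimal sketch using scikit-learn's KDTree.query_radius, not the pipeline's actual implementation: the function name neighbors_within and the exclude_self flag are hypothetical, and the threshold is assumed to be in the same units as the coordinates.

```python
import numpy as np
from sklearn.neighbors import KDTree


def neighbors_within(query_xy, target_xy, threshold, exclude_self=False):
    """For each query point, return indices of and distances to nearby targets.

    Set exclude_self=True for self-colocalization, where query_xy and
    target_xy are the same array and each point would otherwise match itself.
    """
    query_xy = np.asarray(query_xy, dtype=float)
    target_xy = np.asarray(target_xy, dtype=float)
    tree = KDTree(target_xy)
    # With return_distance=True, query_radius returns (indices, distances).
    indices, distances = tree.query_radius(
        query_xy, r=threshold, return_distance=True
    )
    result_idx, result_dist = [], []
    for i, (idx, dist) in enumerate(zip(indices, distances)):
        if exclude_self:
            keep = idx != i
            idx, dist = idx[keep], dist[keep]
        result_idx.append(idx)
        result_dist.append(dist)
    return result_idx, result_dist
```

For cross-colocalization, the query points would be the centers of one protein and the targets those of the other; the fraction of query points with a non-empty neighbor list is one simple proximity measure.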

3. Simulated Data Generation

To validate the statistical significance of observed spatial patterns, the pipeline generates simulated datasets by randomly placing localizations within cell boundaries. The same colocalization analysis is then applied to the simulated data, enabling comparison between experimental and randomized distributions.
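
One simple way to place points at random within cell boundaries is to sample inside-cell mask pixels uniformly and add a sub-pixel offset. The sketch below shows that approach under stated assumptions; it is not necessarily how the pipeline generates its simulated data. The function name simulate_in_mask is hypothetical, and the returned coordinates are in mask-pixel units.

```python
import numpy as np


def simulate_in_mask(mask, n_points, rng=None):
    """Draw n_points positions uniformly over the mask pixels with value 255.

    rng may be None, an integer seed, or a numpy Generator.
    Returns (x, y) in mask-pixel units (x -> column, y -> row).
    """
    rng = np.random.default_rng(rng)
    rows, cols = np.nonzero(mask == 255)
    if rows.size == 0:
        raise ValueError("mask contains no pixels with value 255")
    # Pick inside-cell pixels with replacement, then jitter within each pixel.
    picks = rng.integers(0, rows.size, size=n_points)
    x = cols[picks] + rng.random(n_points)
    y = rows[picks] + rng.random(n_points)
    return x, y
```

Simulating the same number of points as were observed in a cell keeps the density matched, so that differences in colocalization reflect spatial organization instead of abundance.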

Data Visualization

A separate visualization script is provided to generate summary plots from the analysis output. Currently, visualization must be executed independently from the main analysis pipeline. In future releases, it will be integrated into the main workflow.

The visualization module can generate plots including:

  • cluster and localization density distributions
  • colocalization distance distributions
  • dark-time histograms
  • cluster size and oligomerization statistics
  • comparison between experimental and simulated datasets
  • statistical comparison between different conditions and protein targets

Plots are saved to the directory specified in output_plots.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

Citation

If you use this pipeline in your research, please cite:

Alexandra Kaminer, Yunqing Li, Hans-Dieter Barth, Marina S. Dietz, Mike Heilemann.
Quantitative mapping of nanoscale EGFR–Grb2 assemblies by DNA-PAINT
bioRxiv 2026.02.16.706070; doi: https://doi.org/10.64898/2026.02.16.706070
