SMLMdataAnalyser

Automatic tool for analysing SMLM data

Overview

SMLMdataAnalyser is a pipeline for the automated analysis of pre-clustered single-molecule localization microscopy (SMLM) datasets.

The tool processes multiple experimental conditions and protein targets simultaneously and extracts quantitative information from localization data. As input, the pipeline requires cluster-center HDF5 files, which can be generated with the Picasso software.

The resulting metrics can be used to assess protein density, oligomerization states, spatial organization, and interaction probabilities.

Installation

The required Python environment can be installed using conda and the provided environment file (environment.yaml).

Clone the repository:

git clone https://github.com/HeilemannLab/SMLMdataAnalyser.git
cd SMLMdataAnalyser

Create the conda environment and install the required packages:

conda env create -f environment.yaml

Activate the environment:

conda activate SMLMdataAnalyser

The environment was tested with Python 3.8.

Usage

The workflow consists of five steps:

Download the scripts and create the conda environment
Organize the input data
Adjust the config.yaml file
Run main.py to perform the analysis
Run plots.py to plot the analyzed data

Data Structure

Input Directory Organization

Organize your data in the following structure:

input_directory/
├── condition_1/
│   ├── protein_A/
│   │   ├── file1.hdf5
│   │   ├── file1_mask.png
│   │   ├── file2.hdf5
│   │   └── file2_mask.png
│   └── protein_B/
│       ├── file1.hdf5
│       ├── file1_mask.png
│       ├── file2.hdf5
│       └── file2_mask.png
└── condition_2/
    ├── protein_A/
    └── protein_B/
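
The layout above can be walked with a few lines of standard-library Python. The sketch below is illustrative and not taken from the repository: the function name `find_inputs` is hypothetical, and it assumes that each mask sits next to its HDF5 file and is named with a `_mask.png` suffix, as in the tree above.

```python
from pathlib import Path


def find_inputs(input_dir):
    """Return (condition, protein, hdf5_path, mask_path) tuples.

    Hypothetical helper: assumes the condition/protein/file layout shown
    above and a `<name>_mask.png` mask next to each `<name>.hdf5` file.
    """
    records = []
    for hdf5_path in sorted(Path(input_dir).glob("*/*/*.hdf5")):
        protein_dir = hdf5_path.parent
        mask_path = protein_dir / f"{hdf5_path.stem}_mask.png"
        records.append((
            protein_dir.parent.name,  # condition folder name
            protein_dir.name,         # protein folder name
            hdf5_path,
            mask_path if mask_path.exists() else None,  # None if no mask found
        ))
    return records
```

Returning `None` for a missing mask makes incomplete file pairs easy to spot before an analysis run is started.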

File Formats

HDF5 files: Cluster-center files containing localization data with the fields x, y, n, area, convexhull, group
YAML files: Contain the metadata of the corresponding HDF5 file
Mask files: PNG images defining cell boundaries (255 = inside cell, 0 = outside); can be generated in ImageJ/Fiji or Picasso

Configuration File (`config.yaml`)

Set input and output directories:

input_dir: "path/to/input/directory"
output_dir: "path/to/output/directory"
output_plots: "path/to/plots/directory"

Define which analysis steps will be performed:
- Dark-time analysis requires dark_mean and Td columns (these will be calculated automatically in a future release).
- Cross-colocalization requires at least two proteins; self-colocalization is always performed.

dark_time_analysis: True
cross_coloc: True
patching: True
simulation: True
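
The two requirements listed above can be expressed as a small pre-flight check. This is only a sketch of the stated rules, not the repository's code: `resolve_steps` and its arguments are hypothetical names, and the config is assumed to have already been parsed into a plain dictionary.

```python
def resolve_steps(config, proteins, columns):
    """Decide which optional steps can run (hypothetical helper).

    config   -- dict holding the boolean toggles from config.yaml
    proteins -- protein names found for one condition
    columns  -- column names available in the loaded cluster table
    """
    steps = {"self_coloc": True}  # self-colocalization is always performed
    # Cross-colocalization needs at least two distinct proteins.
    steps["cross_coloc"] = (
        bool(config.get("cross_coloc")) and len(set(proteins)) >= 2
    )
    # Dark-time analysis needs the dark_mean and Td columns.
    steps["dark_time_analysis"] = (
        bool(config.get("dark_time_analysis"))
        and {"dark_mean", "Td"} <= set(columns)
    )
    return steps
```

A step that is switched on in the configuration but cannot run with the available data comes back as `False`, so it can be skipped or reported instead of failing midway.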

Set the analysis parameters:

analysis:
  pixel_size: 158
  coloc_threshold: 100   # in nm
  patch_size: 3000      # in nm

Analysis Pipeline

1. Data Loading and Preprocessing

Mask files are used to restrict the analysis to the area inside the cell boundaries. Additional masks can be provided for dark-time analysis to calibrate against the signal from nonspecific antibody binding outside the cell.
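
As a rough illustration of mask-based filtering, the sketch below keeps only the points that fall on mask pixels with value 255. It is not the repository's implementation: `inside_mask` and `scale` are hypothetical names, and it assumes that x and y are given in camera pixels and that the mask grid is aligned with the camera grid up to a constant scale factor.

```python
import numpy as np


def inside_mask(x, y, mask, scale=1.0):
    """Boolean array: True where (x, y) lands on a mask pixel equal to 255.

    Assumptions (not confirmed by the repository): x and y are in camera
    pixels, and `scale` is the number of mask pixels per camera pixel.
    """
    mask = np.asarray(mask)
    cols = np.floor(np.atleast_1d(x) * scale).astype(int)
    rows = np.floor(np.atleast_1d(y) * scale).astype(int)
    # Points that fall outside the image can never be inside the cell.
    in_image = (
        (rows >= 0) & (rows < mask.shape[0])
        & (cols >= 0) & (cols < mask.shape[1])
    )
    keep = np.zeros(cols.shape, dtype=bool)
    keep[in_image] = mask[rows[in_image], cols[in_image]] == 255
    return keep
```

The returned boolean array can be used to index the coordinate arrays (and any other per-cluster fields) so that all of them stay aligned.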

2. Colocalization Analysis

Both self- and cross-colocalization analyses use the KDTree implementation from the scikit-learn library.

For each localization, the algorithm:

identifies neighboring localizations within the specified distance threshold
extracts their distances
records the indices of neighboring molecules

These results are used to quantify molecular proximity and interaction probabilities.
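
The neighbor search described above can be sketched with scikit-learn's `KDTree.query_radius`. The wrapper below is an illustration under assumptions rather than the pipeline's actual code: `neighbors_within` and `exclude_self` are hypothetical names, and the coordinates are assumed to be in the same units as the threshold.

```python
import numpy as np
from sklearn.neighbors import KDTree


def neighbors_within(query_xy, target_xy, threshold, exclude_self=False):
    """For each query point, return indices and distances of target points
    within `threshold` (same units as the coordinates).

    Set exclude_self=True when query_xy and target_xy are the same array
    (self-colocalization) so a point is not reported as its own neighbor.
    """
    query_xy = np.asarray(query_xy, dtype=float)
    tree = KDTree(np.asarray(target_xy, dtype=float))
    # With return_distance=True, query_radius returns (indices, distances);
    # sort_results=True orders each neighbor list by increasing distance.
    indices, distances = tree.query_radius(
        query_xy, r=threshold, return_distance=True, sort_results=True
    )
    if exclude_self:
        for i in range(len(indices)):
            keep = indices[i] != i
            indices[i], distances[i] = indices[i][keep], distances[i][keep]
    return indices, distances
```

For cross-colocalization the two arrays would hold the cluster centers of two different proteins; for self-colocalization the same array is passed twice with `exclude_self=True`. Since `coloc_threshold` is given in nm, coordinates stored in camera pixels would first need converting to nm, for example by multiplying by `pixel_size` if that value is the pixel size in nm (an assumption here).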

3. Simulated Data Generation

To validate the statistical significance of observed spatial patterns, the pipeline generates simulated datasets by randomly placing localizations within cell boundaries. The same colocalization analysis is then applied to the simulated data, enabling comparison between experimental and randomized distributions.
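
One simple way to place points at random within a cell mask is to pick inside pixels uniformly and jitter within each pixel. The sketch below shows that approach as an assumption; the repository may randomize differently, and `simulate_in_mask` is a hypothetical name.

```python
import numpy as np


def simulate_in_mask(mask, n_points, rng=None):
    """Draw n_points uniformly at random inside the mask (pixels equal to 255).

    Returns (x, y) in mask-pixel units. Assumption: uniform placement with
    sub-pixel jitter; the repository may randomize differently.
    """
    rng = np.random.default_rng(rng)  # accepts None, an int seed, or a Generator
    rows, cols = np.nonzero(np.asarray(mask) == 255)
    if rows.size == 0:
        raise ValueError("mask contains no pixels with value 255")
    # Pick inside pixels with replacement, then jitter uniformly within each pixel.
    picks = rng.integers(0, rows.size, size=n_points)
    x = cols[picks] + rng.random(n_points)
    y = rows[picks] + rng.random(n_points)
    return x, y
```

Drawing as many simulated points as there are experimental cluster centers in the same mask keeps the density matched, so that differences in the colocalization results reflect spatial organization rather than the number of points.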

Data Visualization

A separate visualization script (plots.py) generates summary plots from the analysis output. Currently, it must be run independently of the main analysis pipeline; integration into the main workflow is planned for a future release.

The visualization module can generate plots including:

cluster and localization density distributions
colocalization distance distributions
dark-time histograms
cluster size and oligomerization statistics
comparison between experimental and simulated datasets
statistical comparison between different conditions and protein targets

Plots are saved to the directory specified in output_plots.

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests if applicable
Submit a pull request

Citation

If you use this pipeline in your research, please cite:

Alexandra Kaminer, Yunqing Li, Hans-Dieter Barth, Marina S. Dietz, Mike Heilemann.
Quantitative mapping of nanoscale EGFR–Grb2 assemblies by DNA-PAINT
bioRxiv 2026.02.16.706070; doi: https://doi.org/10.64898/2026.02.16.706070
