Spatio_DARLIN

This is a Snakemake pipeline for automated preprocessing of spatial lineage tracing data from SpatioDARLIN. The repository is a fork of snakemake_DARLIN, modified to support spatial lineage tracing data generated with the DARLIN mouse and the BMKMANU S3000 platform.

The preprocessing pipeline includes:

  • Lineage barcode identification and quality control
  • Spatial barcode parsing
  • Allele annotation
  • Grouping spots into segmented cells
  • Generating final clone-by-spots and clone-by-cells matrices

Requirements

  • Conda (for environment management)
  • MATLAB (must be available on the command line)
  • Python 3.9
  • Snakemake 7.24.0
  • BSTMatrix (quantification pipeline for BMKMANU S3000)

Update: We provide an alternative for allele annotation when MATLAB is not available. See darlinpy for details.
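
As a quick pre-flight check, the sketch below reports which of the command-line tools listed above can be found on PATH. It is only an illustration: the executable names (conda, snakemake, matlab) are assumptions about a typical installation, and MATLAB is treated as optional because darlinpy can replace it.

```python
import shutil

# Executable names are assumptions for illustration; adjust them to your installation.
REQUIRED_TOOLS = ["conda", "snakemake"]
OPTIONAL_TOOLS = ["matlab"]  # darlinpy can replace MATLAB for allele annotation


def check_tools(names):
    """Map each executable name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in check_tools(REQUIRED_TOOLS + OPTIONAL_TOOLS).items():
        print(f"{name}: {path if path else 'NOT FOUND'}")
```

Run it inside the activated conda environment, since snakemake is only on PATH there.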

Installation

1. Install BSTMatrix

cd /path/to/tools
# wget http://www.bmkmanu.com/wp-content/uploads/2024/07/BSTMatrix_v2.4.f.1.zip
## download latest version
wget http://www.bmkmanu.com/wp-content/uploads/2025/09/BSTMatrix_v2.4.f.4_release_20250902.zip -O BSTMatrix.zip
unzip BSTMatrix.zip
## conda env for BSTMatrix
cd BSTMatrix
conda env create -n BST-env -f environment.yaml

export PATH=/path/to/tools/BSTMatrix:$PATH

2. Create a conda environment for spatio_darlin

kernel_name='spatio_darlin'
conda create -n $kernel_name python=3.9 --yes
conda activate $kernel_name
conda install -c conda-forge -c bioconda snakemake=7.24.0 --yes
pip install jupyterlab umi_tools seaborn papermill biopython cutadapt
pip install numpy==1.24.4
python -m ipykernel install --user --name=$kernel_name
code_directory='.' # change it to the directory where you want to put the packages
cd $code_directory

# Install darlin (this repository)
# If you haven't already cloned this repo, run:
git clone https://github.com/JarningGau/spatio_DARLIN --depth=1
cd spatio_DARLIN  # or navigate to where you cloned this repository
python setup.py develop
cd ..

3. Install MATLAB code

Install the following dependencies in your desired code directory:

# Download MATLAB code Custom_CARLIN for allele annotation
mkdir -p CARLIN_pipeline
cd CARLIN_pipeline
git clone https://github.com/ShouWenWang-Lab/Custom_CARLIN --depth=1
cd ..

Note:

  1. Ensure MATLAB is installed and available on the command line (accessible via the matlab command).
  2. If MATLAB is not available, darlinpy is an alternative for allele calling:
pip install git+https://github.com/JarningGau/darlinpy.git

Usage

Quick Start (Run Test)

To test the pipeline with example data:

conda activate $kernel_name
cd test
bash download_bmk.sh

This will download the test data. After downloading, you can run the test pipeline:

# if matlab is accessible
bash test_bmk_matlab.sh 
# else
bash test_bmk.sh

Input Data Structure

The pipeline expects the following input data structure:

data/BMKS3000/
├── fastq/                    # Sequencing reads
│   ├── <sample>_<locus>_R1.fastq.gz
│   └── <sample>_<locus>_R2.fastq.gz
├── images/                   # Image files for BSTMatrix pipeline
│   ├── <sample>_FL.tif       # ssDNA; not necessary when segmentation results are provided
│   ├── <sample>_HE.tif       # HE
│   └── <sample>_HE.txt       # Encoding positions of spatial barcodes
└── segmentation/             # Cell segmentation results from BSTMatrix
    └── <sample>/
        ├── all_barcode_num.txt      # Spot-to-cellbin mapping, produced during spatial mRNA-seq data preprocessing
        └── barcodes_pos.tsv.gz      # Spatial barcode positions

Input file descriptions:

  • FASTQ files: Paired-end sequencing reads. Naming convention: <sample>_<locus>_R1.fastq.gz and <sample>_<locus>_R2.fastq.gz, where <locus> can be CA, RA, or TA.
  • Image files: Required for BSTMatrix pipeline
    • <sample>_FL.tif: Fluorescence image (ssDNA)
    • <sample>_HE.tif: H&E stained image
    • <sample>_HE.txt: Image metadata
  • Segmentation files: Generated from BSTMatrix on mRNA data
    • all_barcode_num.txt: Maps spots to cell bins
    • barcodes_pos.tsv.gz: Spatial coordinates of barcodes
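
The FASTQ naming convention above can be checked before launching a run. The following is a minimal sketch, not part of the pipeline: the helper names parse_fastq_name and missing_fastqs are hypothetical, and the regular expression assumes that the sample name may itself contain underscores (as in L0927_Brain).

```python
import re
from pathlib import Path

# <sample>_<locus>_R1.fastq.gz or <sample>_<locus>_R2.fastq.gz, with locus in CA, RA, TA
FASTQ_PATTERN = re.compile(
    r"^(?P<sample>.+)_(?P<locus>CA|RA|TA)_R(?P<read>[12])\.fastq\.gz$"
)


def parse_fastq_name(filename):
    """Split a FASTQ file name into (sample, locus, read number)."""
    match = FASTQ_PATTERN.match(Path(filename).name)
    if match is None:
        raise ValueError(f"Unexpected FASTQ name: {filename}")
    return match["sample"], match["locus"], int(match["read"])


def missing_fastqs(fastq_dir, sample, locus):
    """List the expected R1/R2 files that are absent from fastq_dir."""
    expected = [
        Path(fastq_dir) / f"{sample}_{locus}_R{read}.fastq.gz" for read in (1, 2)
    ]
    return [path for path in expected if not path.exists()]
```

For example, parse_fastq_name("L0927_Brain_CA_R1.fastq.gz") yields the sample L0927_Brain, the locus CA, and read number 1.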

Configuration Files

Each analysis requires a YAML configuration file. The test directory contains example configs:

test_BMKS3000/
├── config-CA.yaml    # Configuration for CA locus
├── config-RA.yaml    # Configuration for RA locus
└── config-TA.yaml    # Configuration for TA locus

Configuration File Example

Below is an example configuration file with explanations:

# Sample list to process
SampleList: ['L0927_Brain']
# Template type: 'Tigre_2022_v2' (TA), 'Rosa_v2' (RA), or 'cCARLIN' (CA)
template: 'cCARLIN'
# Directory paths (relative to the config file location)
raw_fastq_dir: '../data/BMKS3000/fastq'
image_dir: '../data/BMKS3000/images'
segmentation_dir: '../data/BMKS3000/segmentation'
# Cutadapt parameters
cutadapt:
  base_quality_cutoff: 10
  threads: 8
# BSTMatrix parameters
BSTMatrix:
  threads: 8
# QC parameters
QC:
  ## Step1. Correct sequencing errors (erroneous nucleotides)
  LB_error_rate: 0.02
  ## Step2. Remove amplification artifacts (chimeric molecules)
  major_fraction_threshold_molecule: 0.8
  ## Step3. Remove capture-oligo carryover artifacts (fake spots)
  ## (SR) spots with k = reads/UMIs >= this value
  slope_cutoff: 10
  ## (SR+UR+LR) molecules with supported reads >= this value
  reads_cutoff: 10
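
A configuration can be sanity-checked before it is handed to Snakemake. The sketch below works on a plain dict, such as the one obtained by loading the YAML file, and is not the pipeline's own validation. The function validate_config is hypothetical, the list of required keys simply mirrors the example above, and the 0 to 1 range for LB_error_rate is an assumption.

```python
# Template names per locus, taken from the comment in the example config
TEMPLATE_BY_LOCUS = {"CA": "cCARLIN", "RA": "Rosa_v2", "TA": "Tigre_2022_v2"}
# Assumption: keys shown in the example config are treated as required
REQUIRED_KEYS = ["SampleList", "template", "raw_fastq_dir", "image_dir", "segmentation_dir"]


def validate_config(config):
    """Return a list of problems; an empty list means none were found."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]
    template = config.get("template")
    if template is not None and template not in TEMPLATE_BY_LOCUS.values():
        problems.append(f"unknown template: {template}")
    if "SampleList" in config and not config["SampleList"]:
        problems.append("SampleList is empty")
    # Assumption: an error rate is a fraction between 0 and 1
    error_rate = (config.get("QC") or {}).get("LB_error_rate")
    if error_rate is not None and not 0 <= error_rate <= 1:
        problems.append("QC.LB_error_rate should be between 0 and 1")
    return problems


EXAMPLE_CONFIG = {
    "SampleList": ["L0927_Brain"],
    "template": "cCARLIN",
    "raw_fastq_dir": "../data/BMKS3000/fastq",
    "image_dir": "../data/BMKS3000/images",
    "segmentation_dir": "../data/BMKS3000/segmentation",
    "QC": {"LB_error_rate": 0.02},
}
print(validate_config(EXAMPLE_CONFIG))
```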

Output Files

After a successful run (using the bundled test configs or your own), the workspace will resemble:

test_BMKS3000/
├── BST_config/      # BSTMatrix configuration files
├── BST_output/      # Outputs from BSTMatrix
├── config-*.yaml    # Input configs (CA/RA/TA)
├── cutadapt/        # Primer-trimmed FASTQs (read 1: spatial barcode + UMI; read 2: lineage barcode)
├── DARLIN/          # Intermediate DARLIN pipeline products
├── outs/            # Aggregated results
└── slim_fastq/      # FASTQs for allele annotation

The final results live in test_BMKS3000/outs/:

test_BMKS3000/outs/
└── L0927_Brain_CA/
    ├── all.done
    ├── cellbin/        # Cell-bin level matrices
    ├── level_1         # spots-bin, level 1 matrices (3μm)
    ├── ...
    └── level_18        # spots-bin, level 18 matrices (99μm)

Level            18  9   7   6   5   4   3   2   1
Resolution (μm)  99  48  37  31  25  20  14  8   3
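
When working with the binned outputs downstream, it can help to keep the level-to-resolution table in code. The sketch below is illustrative only: it contains just the levels listed in the table, the helpers level_dir and closest_level are hypothetical, and the <sample>_<locus> output directory name is inferred from the L0927_Brain_CA example.

```python
from pathlib import Path

# Bin level -> resolution in micrometres, copied from the table above
LEVEL_RESOLUTION_UM = {1: 3, 2: 8, 3: 14, 4: 20, 5: 25, 6: 31, 7: 37, 9: 48, 18: 99}


def level_dir(outs_dir, sample, locus, level):
    """Build the path of one spot-bin level directory, e.g. outs/L0927_Brain_CA/level_9."""
    return Path(outs_dir) / f"{sample}_{locus}" / f"level_{level}"


def closest_level(target_um):
    """Pick the listed level whose resolution is nearest to target_um."""
    return min(
        LEVEL_RESOLUTION_UM,
        key=lambda level: abs(LEVEL_RESOLUTION_UM[level] - target_um),
    )
```

For instance, closest_level(50) selects level 9 (48 μm), and level_dir("test_BMKS3000/outs", "L0927_Brain", "CA", 9) points at the matching directory.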

Running the Pipeline

To run the pipeline with your own data:

  1. Create a configuration file following the example above
  2. Ensure your input data follows the expected structure
  3. Run Snakemake:
conda activate $kernel_name
## When MATLAB is available
snakemake --snakefile snakefiles/BMKS3000_matlab.smk --configfile <your_config.yaml> -c <cores>
## Otherwise
snakemake --snakefile snakefiles/BMKS3000.smk --configfile <your_config.yaml> -c <cores>

Replace <your_config.yaml> with the path to your configuration file and <cores> with the number of CPU cores to use.
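
To process several loci in one go, the two commands above can be wrapped in a small driver. This is a sketch under stated assumptions rather than part of the repository: snakemake_command and run_loci are hypothetical helpers, MATLAB availability is guessed from whether a matlab executable is on PATH, and the snakefile paths are relative exactly as in the commands above.

```python
import shutil
import subprocess


def snakemake_command(config_path, cores, use_matlab=None):
    """Build the snakemake argument list for one config file."""
    if use_matlab is None:
        # Assumption: MATLAB is usable if a 'matlab' executable is on PATH
        use_matlab = shutil.which("matlab") is not None
    snakefile = "snakefiles/BMKS3000_matlab.smk" if use_matlab else "snakefiles/BMKS3000.smk"
    return [
        "snakemake",
        "--snakefile", snakefile,
        "--configfile", str(config_path),
        "-c", str(cores),
    ]


def run_loci(config_paths, cores, use_matlab=None):
    """Run the pipeline once per config file, stopping at the first failure."""
    for config_path in config_paths:
        subprocess.run(snakemake_command(config_path, cores, use_matlab), check=True)
```

A call such as run_loci(["config-CA.yaml", "config-RA.yaml", "config-TA.yaml"], cores=8) would then run the three loci sequentially. Passing use_matlab=False forces the darlinpy-based snakefile.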

Additional Resources

For upstream analysis of BMKMANU S3000 spatial transcriptomics data, see the upstream analysis documentation.
