Spatio_DARLIN

This is a Snakemake pipeline for automated preprocessing of spatial lineage tracing data from SpatioDARLIN. This repository is a fork of snakemake_DARLIN, modified to support spatial lineage tracing data generated with DARLIN mice on the BMKMANU S3000 platform.

The preprocessing pipeline includes:

Lineage barcode identification and quality control
Spatial barcode parsing
Allele annotation
Grouping spots into segmented cells
Generating final clone-by-spots and clone-by-cells matrices

Requirements

Conda (for environment management)
MATLAB (must be available on the command line)
Python 3.9
Snakemake 7.24.0
BSTMatrix (quantification pipeline for BMKMANU S3000)

Update: darlinpy is provided as an alternative for allele annotation when MATLAB is not accessible. See darlinpy for details.
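The requirements above can be checked before installation with a short script. This is a minimal sketch, not part of the repository: the executable names (`conda`, `snakemake`, `matlab`) are assumed to be the usual command names, and the helper `missing_tools` is hypothetical.

```python
import shutil

# Assumed executable names; adjust them to match your installation.
REQUIRED = ["conda", "snakemake"]
OPTIONAL = ["matlab"]  # darlinpy can be used for allele annotation instead


def missing_tools(names, which=shutil.which):
    """Return the entries of `names` that cannot be found on PATH."""
    return [name for name in names if which(name) is None]
```

Calling `missing_tools(REQUIRED)` returns an empty list when every required tool is found. If `missing_tools(OPTIONAL)` reports `matlab`, the darlinpy route described in this README applies.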

Installation

1. Install BSTMatrix

cd /path/to/tools
# wget http://www.bmkmanu.com/wp-content/uploads/2024/07/BSTMatrix_v2.4.f.1.zip
## download latest version
wget http://www.bmkmanu.com/wp-content/uploads/2025/09/BSTMatrix_v2.4.f.4_release_20250902.zip -O BSTMatrix.zip
unzip BSTMatrix.zip
## conda env for BSTMatrix
cd BSTMatrix
conda env create -n BST-env -f environment.yaml

export PATH=/path/to/tools/BSTMatrix:$PATH

2. Create a conda environment for spatio_darlin

kernel_name='spatio_darlin'
conda create -n $kernel_name python=3.9 --yes
conda activate $kernel_name
conda install -c conda-forge -c bioconda snakemake=7.24.0 --yes
pip install jupyterlab umi_tools seaborn papermill biopython cutadapt
pip install numpy==1.24.4
python -m ipykernel install --user --name=$kernel_name

code_directory='.' # change it to the directory where you want to put the packages
cd $code_directory

# Install darlin (this repository)
# If you haven't already cloned this repo, run:
git clone https://github.com/JarningGau/spatio_DARLIN --depth=1
cd spatio_DARLIN  # or navigate to where you cloned this repository
python setup.py develop
cd ..

3. Install MATLAB code

Install the following dependencies in your desired code directory:

# Download MATLAB code Custom_CARLIN for allele annotation
mkdir -p CARLIN_pipeline
cd CARLIN_pipeline
git clone https://github.com/ShouWenWang-Lab/Custom_CARLIN --depth=1
cd ..

Note:

Ensure MATLAB is installed and available on your command line (accessible via the matlab command).
If MATLAB is not accessible, darlinpy is an alternative for allele calling:

pip install git+https://github.com/JarningGau/darlinpy.git

Usage

Quick Start (Run Test)

To test the pipeline with example data:

conda activate $kernel_name
cd test
bash download_bmk.sh

This will download the test data. After downloading, you can run the test pipeline:

# if matlab is accessible
bash test_bmk_matlab.sh 
# else
bash test_bmk.sh

Input Data Structure

The pipeline expects the following input data structure:

data/BMKS3000/
├── fastq/                    # Sequencing reads
│   ├── <sample>_<locus>_R1.fastq.gz
│   └── <sample>_<locus>_R2.fastq.gz
├── images/                   # Image files for BSTMatrix pipeline
│   ├── <sample>_FL.tif       # ssDNA, not necessary when segmentation results are provided.
│   ├── <sample>_HE.tif       # HE
│   └── <sample>_HE.txt       # Encoding positions of spatial barcodes
└── segmentation/             # Cell segmentation results from BSTMatrix
    └── <sample>/
        ├── all_barcode_num.txt      # Spots -> cellbin relationship, obtained during spatial mRNA-seq data preprocessing.
        └── barcodes_pos.tsv.gz      # Spatial barcode positions

Input file descriptions:

FASTQ files: Paired-end sequencing reads. Naming convention: <sample>_<locus>_R1.fastq.gz and <sample>_<locus>_R2.fastq.gz, where <locus> can be CA, RA, or TA.
Image files: Required for BSTMatrix pipeline
- <sample>_FL.tif: Fluorescence image (ssDNA)
- <sample>_HE.tif: H&E stained image
- <sample>_HE.txt: Image metadata
Segmentation files: Generated from BSTMatrix on mRNA data
- all_barcode_num.txt: Maps spots to cell bins
- barcodes_pos.tsv.gz: Spatial coordinates of barcodes
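The FASTQ naming convention described above can be checked programmatically before a run. The sketch below is illustrative only: the regular expression encodes the convention as stated in this README, and the helper name `parse_fastq_name` is hypothetical rather than a function of the pipeline.

```python
import re

# Encodes '<sample>_<locus>_R1.fastq.gz' / '..._R2.fastq.gz' with locus in CA, RA, TA.
FASTQ_RE = re.compile(
    r"^(?P<sample>.+)_(?P<locus>CA|RA|TA)_(?P<read>R[12])\.fastq\.gz$"
)


def parse_fastq_name(filename):
    """Split a FASTQ file name into (sample, locus, read)."""
    match = FASTQ_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected FASTQ name: {filename}")
    return match["sample"], match["locus"], match["read"]
```

For example, `parse_fastq_name("L0927_Brain_CA_R1.fastq.gz")` splits the name into the sample `L0927_Brain`, the locus `CA`, and the read `R1`. A name with an unknown locus raises `ValueError`.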

Configuration Files

Each analysis requires a YAML configuration file. The test directory contains example configs:

test_BMKS3000/
├── config-CA.yaml    # Configuration for CA locus
├── config-RA.yaml    # Configuration for RA locus
└── config-TA.yaml    # Configuration for TA locus

Configuration File Example

Below is an example configuration file with explanations:

# Sample list to process
SampleList: ['L0927_Brain']
# Template type: 'Tigre_2022_v2' (TA), 'Rosa_v2' (RA), or 'cCARLIN' (CA)
template: 'cCARLIN'
# Directory paths (relative to the config file location)
raw_fastq_dir: '../data/BMKS3000/fastq'
image_dir: '../data/BMKS3000/images'
segmentation_dir: '../data/BMKS3000/segmentation'
# Cutadapt parameters
cutadapt:
  base_quality_cutoff: 10
  threads: 8
# BSTMatrix parameters
BSTMatrix:
  threads: 8
# QC parameters
QC:
  ## Step1. Correct sequencing errors (erroneous nucleotides)
  LB_error_rate: 0.02
  ## Step2. Remove amplification artifacts (chimeric molecules)
  major_fraction_threshold_molecule: 0.8
  ## Step3. Remove capture-oligo carryover artifacts (fake spots)
  ## (SR) spots with k = reads/UMIs >= this value
  slope_cutoff: 10
  ## (SR+UR+LR) molecules with supported reads >= this value
  reads_cutoff: 10
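A configuration like the one above can be sanity-checked once it has been parsed into a dictionary (for instance with PyYAML's `yaml.safe_load`). The following is a minimal sketch under stated assumptions: the accepted template names come from the comment in the example, while the value ranges (fractions in [0, 1], positive cutoffs) and the helper `check_config` are assumptions made for illustration, not rules enforced by the pipeline.

```python
# Locus-to-template mapping taken from the comment in the example config.
TEMPLATE_BY_LOCUS = {"CA": "cCARLIN", "RA": "Rosa_v2", "TA": "Tigre_2022_v2"}


def check_config(config):
    """Return a list of human-readable problems found in a parsed config dict."""
    problems = []
    if not config.get("SampleList"):
        problems.append("SampleList is empty or missing")
    if config.get("template") not in TEMPLATE_BY_LOCUS.values():
        problems.append(f"unknown template: {config.get('template')!r}")
    qc = config.get("QC", {})
    # Assumed range: both values are fractions, so they should lie in [0, 1].
    for key in ("LB_error_rate", "major_fraction_threshold_molecule"):
        value = qc.get(key)
        if not isinstance(value, (int, float)) or not 0 <= value <= 1:
            problems.append(f"QC.{key} should be a number between 0 and 1")
    # Assumed range: the two cutoffs are positive thresholds.
    for key in ("slope_cutoff", "reads_cutoff"):
        value = qc.get(key)
        if not isinstance(value, (int, float)) or value <= 0:
            problems.append(f"QC.{key} should be a positive number")
    return problems


example = {
    "SampleList": ["L0927_Brain"],
    "template": "cCARLIN",
    "QC": {
        "LB_error_rate": 0.02,
        "major_fraction_threshold_molecule": 0.8,
        "slope_cutoff": 10,
        "reads_cutoff": 10,
    },
}
print(check_config(example))
```

An empty list means no problems were found. The check covers only the keys shown in the example, so directory paths and thread counts are left to the pipeline itself.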

Output Files

After a successful run (using the bundled test configs or your own), the workspace will resemble:

test_BMKS3000/
├── BST_config/      # BSTMatrix configuration files
├── BST_output/      # Outputs from BSTMatrix
├── config-*.yaml    # Input configs (CA/RA/TA)
├── cutadapt/        # Primer-trimmed FASTQs: reads1, spatial barcode + UMI; reads2, lineage barcode
├── DARLIN/          # Intermediate DARLIN pipeline products
├── outs/            # Aggregated results
└── slim_fastq/      # FASTQs for allele annotation

The final results live in test_BMKS3000/outs/:

test_BMKS3000/outs/
└── L0927_Brain_CA/
    ├── all.done
    ├── cellbin/        # Cell-bin level matrices
    ├── level_1         # spots-bin, level 1 matrices (3μm)
    ├── ...
    └── level_18        # spots-bin, level 18 matrices (99μm)

Level            18  9   7   6   5   4   3   2   1
Resolution (μm)  99  48  37  31  25  20  14  8   3
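When working with the outputs in a script, the level table above can be kept as a small lookup. This is a sketch, not repository code: the `<sample>_<locus>` directory pattern is inferred from the `L0927_Brain_CA` example, and the helper `level_dir` and its default `outs` path are hypothetical.

```python
# Values copied from the level/resolution table; levels not listed there
# are left out rather than interpolated.
RESOLUTION_UM = {18: 99, 9: 48, 7: 37, 6: 31, 5: 25, 4: 20, 3: 14, 2: 8, 1: 3}


def level_dir(sample, locus, level, outs="test_BMKS3000/outs"):
    """Return the expected output directory and its bin size in micrometres."""
    if level not in RESOLUTION_UM:
        raise KeyError(f"level {level} is not in the documented table")
    return f"{outs}/{sample}_{locus}/level_{level}", RESOLUTION_UM[level]
```

`level_dir("L0927_Brain", "CA", 18)` points at the level 18 directory of the test sample together with its 99 μm bin size.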

Running the Pipeline

To run the pipeline with your own data:

Create a configuration file following the example above
Ensure your input data follows the expected structure
Run Snakemake:

conda activate $kernel_name
## When MATLAB is available
snakemake --snakefile snakefiles/BMKS3000_matlab.smk --configfile <your_config.yaml> -c <cores>
## Otherwise
snakemake --snakefile snakefiles/BMKS3000.smk --configfile <your_config.yaml> -c <cores>

Replace <your_config.yaml> with the path to your configuration file and <cores> with the number of CPU cores to use.
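For scripted or batch runs, the same invocation can be assembled in Python. The sketch below only builds the argument list from the two commands shown above. The helper `snakemake_command` is hypothetical, and the config path in the usage line is a placeholder.

```python
import shlex


def snakemake_command(config_path, cores, matlab_available):
    """Build the Snakemake invocation as an argument list."""
    snakefile = (
        "snakefiles/BMKS3000_matlab.smk"
        if matlab_available
        else "snakefiles/BMKS3000.smk"
    )
    return [
        "snakemake",
        "--snakefile", snakefile,
        "--configfile", str(config_path),
        "-c", str(cores),
    ]


# Placeholder config path; replace with your own file.
cmd = snakemake_command("config-CA.yaml", 8, matlab_available=False)
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Setting `matlab_available` from `shutil.which("matlab") is not None` selects the snakefile automatically.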

Additional Resources

For upstream analysis of BMKMANU S3000 spatial transcriptomics data, see the upstream analysis documentation.
