Home

Tutorial and development test dataset

This wiki will explain how to successfully run snakemake-run-cellranger and acts as a test dataset for any developers who want to contribute to the project. Please, read this tutorial carefully and if any problems arise, create a new issue so we can improve it for all users. If you have not installed our tool yet, please refer to the installation instructions on the main page.

Tutorial and test case for development for GEX, ATAC, and ARC workflow

In this section, we will run cellranger-snakemake using a test dataset from Cell Ranger (derived from fasta files are used by the internal testing tool for cellranger called cellranger testrun) to get new users ready to go as well as developers who need test cases for each of the workflow modes. Follow the steps to set up the test dataset, and run basic commands.

📌 Note: This section works the same for either GEX or ATAC and has a few modifications for ARC which we note below.

1. Launch the conda environment

If you haven't installed cellranger-snakemake already, please refer to the installation instructions on the main page or see our detailed conda environment setup guide below.

Activate the cellranger-snakemake conda environment with the following command:

conda activate snakemake8

2. Explore the help menus

Here is how you can check out the help menu for all positional arugments:

# Read about positional arguments
snakemake-run-cellranger --help

To learn more about a specific positional argument, include the argument and --help like this:

# Help menu for run positional argument
snakemake-run-cellranger run --help

3. Generate input files for test

The command snakemake-run-cellranger generate-test-data conveniently creates a directory containing all the input files necessary you need to run the test dataset.

cd cellranger-snakemake/tests

# Read about test data set
snakemake-run-cellranger generate-test-data -h

snakemake-run-cellranger generate-test-data GEX --output-dir 00_TEST_DATA_GEX
snakemake-run-cellranger generate-test-data ATAC --output-dir 00_TEST_DATA_ATAC
snakemake-run-cellranger generate-test-data ARC --output-dir 00_TEST_DATA_ARC

This should have produced the following file structure:

$ tree 00_TEST_DATA_GEX
00_TEST_DATA_GEX
├── HPC_profiles
│   └── config.yaml
├── libraries_list_gex.tsv
├── reference_gex.txt
└── test_config_gex.yaml

4. Input files

Let's walk through the input files necessary to run the workflow!

`config.yaml`

This YAML file contains all the bells and whistles needed to run the underlying snakemake workflow!

To generate the config files for a workflow customized to your data run this command below. It will interactively ask you which steps you plan on running and automatically produce a config file.

snakemake-run-cellranger init-config

You can also run this command to generate a default config yaml with every configuration available:

snakemake-run-cellranger init-config --get-default-config

For this tutorial, here is the test config yaml file:

$ cat 00_TEST_DATA_GEX/test_config_gex.yaml
project_name: test_gex
output_dir: test_output_gex
resources:
  mem_gb: 64
  tmpdir: ''
directories_suffix: none
cellranger_gex:
  enabled: true
  reference: /path/to/cellranger-9.0.1/external/cellranger_tiny_ref
  libraries: 00_TEST_DATA_GEX/libraries_list_gex.tsv
  chemistry: auto
  normalize: none
  create-bam: false
  threads: 10
  mem_gb: 64
demultiplexing:
  enabled: true
  method: vireo
  vireo:
    donors: 2
    cellsnp:
      vcf: /path/to/vcf/file.vcf.gz
      threads: 4
      min_maf: 0.0
      min_count: 1
      umi_tag: Auto
      cell_tag: CB
      gzip: true
doublet_detection:
  enabled: false
  method: scrublet
  scrublet:
    expected_doublet_rate: 0.06
    min_counts: 2
    min_cells: 3
celltype_annotation:
  enabled: false
  method: celltypist
  celltypist:
    model: Immune_All_Low.pkl
    majority_voting: false

`libraries_list.tsv`

This input file is a TSV file that contains the metadata and paths for your cellranger libraries. Here is the format:

batch	capture	sample	fastqs
A	1	ABC-A-1	path/to/data/GEX/fastqs/
A	2	IJK-A-2	path/to/data/GEX/fastqs/
B	1	XYZ-A-1	path/to/data/GEX/fastqs/

Column descriptions:

batch: batch ID for grouping captures
capture: capture identifier or lane on the 10X chip
sample: prefix of the filenames of FASTQs to select
fastqs: full path(s) to where the input FASTQ files are located - if providing multiple paths, separate them with commas.

Note: For the ARC workflow, the input file is a little bit different. You will need to create a tab-separated file that contains the metadata and paths for to cellranger ARC library csv files (files that contain paths the ATAC and GEX FASTQ files). This file, which we will call libraries_list_ARC.tsv during this tutorial, follows the following format:

batch	capture	CSV
A	1	path/to/data/ATAC/ARC_library.csv/
A	2	path/to/data/ATAC/ARC_library.csv/
A	3	path/to/data/ATAC/ARC_library.csv/

Column descriptions:

batch: batch ID for grouping captures
capture: capture identifier or lane on the 10X chip
CSV: path to ARC library CSV (contains paths to fastas for both GEX and ATAC)

`HPC_profiles/`

The HPC_profiles/ directory contains another config.yaml that configures the cloud computing and HPC infrastructure settings to help snakemake launch parallel jobs. This config would be the argument for snakemake --profile HPC_profiles. You can read more about it here. See section 7 for detailed usage.

For this test dataset, we made the default HPC profile config to be compatible with SLURM. However, you can install another executor to match you local HPC/cloud computing infrastructure.

$ cat 00_TEST_DATA_GEX/HPC_profiles/config.yaml
executor: slurm
jobs: 10
default-resources:
- slurm_account=pi-lbarreiro
- slurm_partition=lbarreiro-hm
- runtime=720
retries: 2
latency-wait: 60
printshellcmds: true
keep-going: true
rerun-incomplete: true

5. Run a dry run

Before you run the workflow it's a good idea to see how many jobs will be run to make sure your input files contain all the paths.

# Read about this command
snakemake-run-cellranger run -h

# Dry run
snakemake-run-cellranger run --config-file 00_TEST_DATA_GEX/test_config_gex.yaml --cores 1 --dry-run

You can also visualize this with a dag file:

# Generate workflow DAG
snakemake-run-cellranger run --config-file 00_TEST_DATA_GEX/test_config_gex.yaml --cores 1 --dag | dot -Tpng > dag.png

6. Run the tool!

# Remove previous test runs
rm -rf 1_L00*
rm -r test_output_gex

# Local execution
snakemake-run-cellranger run --config-file 00_TEST_DATA_GEX/test_config_gex.yaml --cores 1

The flag --snakemake-args passes and arguments after it directly to snakemake. Please note that this flag has to be the very last flag in the command:

# Local execution - add more arguments to snakemake
snakemake-run-cellranger run --config-file 00_TEST_DATA_GEX/test_config_gex.yaml --cores 1 --snakemake-args --jobs 2

7. Launching on HPC

To launch on the HPC, we will use the --snakemake-args command to pass additional arguments to snakemake to let it know we are going to use an HPC. The --snakemake-args must be the LAST argument and anything after it will be snakemake arguments passed directly to snakemake.

Note: If the directory gets locked, see the unlocking instructions in the FAQ section.

The argument we will be passing straight to snakemake will be --profile. The provides snakemake with a path to a configurgation file that contains parameters fro runnign the is workflow on an HPC or cloud computing environment. Run snakemake -h to read more detail.

The command snakemake-run-cellranger generate-test-data you ran above already produced a boiler plate config yaml file filled out for SLURM here:

$ cat 00_TEST_DATA_GEX/profiles/config.yaml
executor: 'slurm'
jobs: 1
default-resources:
- slurm_account=
- slurm_partition=
- mem_mb=
- runtime=
retries: 2
latency-wait: 60

You read about HPC executor functionality here. Fill out this config with HPC/cloud computing info that works for you! We made autogenerated an example for Slurm.

What is the difference between --cores and --jobs? The --cores command assigns the number of CPUs per jobs while the --jobs argument controls how many parallel jobs can be run at the same time.

snakemake-run-cellranger run --config-file 00_TEST_DATA_GEX/test_config_gex.yaml --cores 1 --snakemake-args --unlock

# HPC execution - `--cores all` tell snakemake to use the `threads` assigned to each rule.
snakemake-run-cellranger run --config-file 00_TEST_DATA_GEX/test_config_gex.yaml \
                             --cores all \
                             --snakemake-args --profile 00_TEST_DATA_GEX/HPC_profiles

FAQs

Unlocking a directory

Sometimes if a snakemake workflow ends prematurely, snakemake will lock the directory and you will get a message like this:

Error: Directory cannot be locked. This usually means that another Snakemake instance is running on this directory. Another possibility is that a previous run exited unexpectedly.

You can fix this by using the --additional-params command, which passes snakemake arguments directly to snakemake.

Run the following command to unlock the directory so you can restart the workflow.

snakemake-run-cellranger run --config-file 00_TEST_DATA_GEX/test_config_gex.yaml --cores 1 --snakemake-args --unlock

`

Developers notes

Input files for testing this workflow can be derived from the files used in cellranger testrun described here. You can extract these paths and generate a libraries_list.tsv for cellranger, cellranger-atac, or cellranger-arc like this:

For a complete tutorial on running these test datasets, see the tutorial section above.

snakemake-run-cellranger generate-test-data GEX --output-dir 00_TEST_DATA_GEX
snakemake-run-cellranger generate-test-data ATAC --output-dir 00_TEST_DATA_ATAC
snakemake-run-cellranger generate-test-data ARC --output-dir 00_TEST_DATA_ARC

Install conda environment from scratch!

1. Set up conda

You will need Miniconda to install the package.

To check if conda is installed properly, run this:

$ conda --version
conda 23.7.4

Once you have confirmed you have conda installed, run this command to make sure you are up-to-date:

conda update conda

Create the new environment

conda env remove --name snakemake8
conda create -n snakemake8 -c conda-forge -c bioconda -y python=3.12 mamba

Activate environment!

conda activate snakemake8

Install all the necessary packages!

mamba install -c conda-forge -c bioconda -y \
    snakemake \
    pandas \
    numpy \
    pyyaml \
    graphviz \
    bcftools \
    samtools

Install some packages via pip

pip install snakemake-executor-plugin-slurm

Install `cellranger-snakemake` into the environment

pip install -e .

Verify snakemake-run-cellranger installation:

snakemake-run-cellranger --help  # or your CLI entry point

Install and link Cell Ranger to the conda environment

Once installed, link Cell Ranger, Cell Ranger ATAC, Cell Ranger ARC to the conda env:

# Link Cell Ranger
SOURCE="/path/to/cellranger"
ENV_NAME="snakemake8"
TARGET="$(conda info --base)/envs/${ENV_NAME}/bin/cellranger"

ln -s "$SOURCE" "$TARGET"

# Link Cell Ranger ATAC
SOURCE="/path/to/cellranger-atac"
ENV_NAME="snakemake8"
TARGET="$(conda info --base)/envs/${ENV_NAME}/bin/cellranger-atac"

ln -s "$SOURCE" "$TARGET"

# Link Cell Ranger ARC
SOURCE="/path/to/cellranger-arc"
ENV_NAME="snakemake8"
TARGET="$(conda info --base)/envs/${ENV_NAME}/bin/cellranger-arc"

ln -s "$SOURCE" "$TARGET"

Check installations

# Check all Cell Ranger tools
snakemake-run-cellranger check-versions

# OR check specific workflow requirements
snakemake-run-cellranger check-versions --workflow GEX
snakemake-run-cellranger check-versions --workflow ATAC
snakemake-run-cellranger check-versions --workflow ARC

Export env yaml

# Export with exact versions
conda env export --name snakemake8 | grep -v "^prefix:" > environment.yaml

# Export without builds (more portable)
conda env export --name snakemake8 --no-builds | grep -v "^prefix:" > environment_portable.yaml

Generate consensus VCF from test data

cd /project/lbarreiro/USERS/mschechter/github/cellranger-snakemake/tests

# Make consensus VCF with bcftools

# Demux with cellsnp-lite + vireo
cellsnp-lite \
    -s test_output_gex/01_CELLRANGERGEX_COUNT/1_L001/outs/possorted_genome_bam.bam \
    -b test_output_gex/01_CELLRANGERGEX_COUNT/1_L001/outs/filtered_feature_bc_matrix/barcodes.tsv.gz \
    -O cellsnp_output_L001 \
    -R /project/lbarreiro/USERS/daniel/cellranger-snakemake/tests/L001.vcf.gz \
    -p 4 \
    --minMAF 0.0 \
    --minCOUNT 1 \
    --UMItag Auto \
    --cellTAG CB \
    --gzip

pip install -U vireoSNP
vireo -c cellsnp_output_L001 -N 2 -o vireo_output_L001_k2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Home

Tutorial and development test dataset

Table of Contents

Tutorial and test case for development for GEX, ATAC, and ARC workflow

1. Launch the conda environment

2. Explore the help menus

3. Generate input files for test

4. Input files

`config.yaml`

`libraries_list.tsv`

`HPC_profiles/`

5. Run a dry run

6. Run the tool!

7. Launching on HPC

FAQs

Unlocking a directory

Developers notes

Install conda environment from scratch!

1. Set up conda

Create the new environment

Activate environment!

Install all the necessary packages!

Install some packages via pip

Install `cellranger-snakemake` into the environment

Install and link Cell Ranger to the conda environment

Check installations

Export env yaml

Generate consensus VCF from test data

Clone this wiki locally

Home

Tutorial and development test dataset

Table of Contents

Tutorial and test case for development for GEX, ATAC, and ARC workflow

1. Launch the conda environment

2. Explore the help menus

3. Generate input files for test

4. Input files

config.yaml

libraries_list.tsv

HPC_profiles/

5. Run a dry run

6. Run the tool!

7. Launching on HPC

FAQs

Unlocking a directory

Developers notes

Install conda environment from scratch!

1. Set up conda

Create the new environment

Activate environment!

Install all the necessary packages!

Install some packages via pip

Install cellranger-snakemake into the environment

Install and link Cell Ranger to the conda environment

Check installations

Export env yaml

Generate consensus VCF from test data

Clone this wiki locally

`config.yaml`

`libraries_list.tsv`

`HPC_profiles/`

Install `cellranger-snakemake` into the environment