mschecht · danielsarj · Apr 16, 2026
diff --git a/docs/source/PBMC_1K_ATAC.md b/docs/source/PBMC_1K_ATAC.md
@@ -116,6 +116,8 @@ cellranger_atac:
   threads: 10
   mem_gb: 64
   runtime_minutes: 720  # max SLURM job runtime in minutes (default: 720 = 12 hours)
+  anndata_threads: 1
+  anndata_mem_gb: 16
   directories:
     LOGS_DIR: 00_LOGS
 doublet_detection:
@@ -199,19 +201,13 @@ sc-preprocess run --config-file pipeline_config.yaml --cores 1 --dag | dot -Tpng
 
 Here we will break down the meaning of each rule so you can keep track of what's going on. If you want more detail please refer to the [Pipeline Rules Reference](pipeline_rules.md).
 
-**cellranger_atac_count**: Runs the command [cellranger-atac count](https://www.10xgenomics.com/support/software/cell-ranger-atac/latest/analysis/running-pipelines/command-line-arguments#count) per capture, aligning ATAC reads to the reference genome and producing a peak-barcode matrix.
-
-**create_atac_anndata**: Converts data from the Cell Ranger ATAC output to a per-capture [AnnData object](https://anndata.readthedocs.io/en/latest/) (`.h5ad`), adding traceability metadata (`batch_id`, `capture_id`, `cell_id`).
-
-**cellranger_atac_aggr**: Runs [cellranger-atac aggr](https://www.10xgenomics.com/support/software/cell-ranger-atac/latest/analysis/running-pipelines/command-line-arguments#aggr) which aggregates all per-capture Cell Ranger ATAC outputs within a batch into a single normalized count matrix.
-
-**aggregate_atac_batch**: Merges all per-capture AnnData objects into a single batch-level `.h5ad` file, verifying `cell_id` uniqueness across captures.
-
-**run_scrublet**: Runs Scrublet doublet detection on each per-capture AnnData object, adding doublet scores and predictions to cell metadata.
-
-**enrich_atac_metadata**: Joins all downstream preprocessing metadata from doublet detection into the batch-level AnnData object.
-
-**all**: Final Snakemake rule that collects all expected outputs to ensure the full workflow is completed.
+* **cellranger_atac_count**: Runs the command [cellranger-atac count](https://www.10xgenomics.com/support/software/cell-ranger-atac/latest/analysis/running-pipelines/command-line-arguments#count) per capture, aligning ATAC reads to the reference genome and producing a peak-barcode matrix.
+* **create_atac_anndata**: Converts data from the Cell Ranger ATAC output to a per-capture [AnnData object](https://anndata.readthedocs.io/en/latest/) (`.h5ad`), adding traceability metadata (`batch_id`, `capture_id`, `cell_id`).
+* **cellranger_atac_aggr**: Runs [cellranger-atac aggr](https://www.10xgenomics.com/support/software/cell-ranger-atac/latest/analysis/running-pipelines/command-line-arguments#aggr) which aggregates all per-capture Cell Ranger ATAC outputs within a batch into a single normalized count matrix.
+* **aggregate_atac_batch**: Merges all per-capture AnnData objects into a single batch-level `.h5ad` file, verifying `cell_id` uniqueness across captures.
+* **run_scrublet**: Runs Scrublet doublet detection on each per-capture AnnData object, adding doublet scores and predictions to cell metadata.
+* **enrich_atac_metadata**: Joins all downstream preprocessing metadata from doublet detection into the batch-level AnnData object.
+* **all**: Final Snakemake rule that collects all expected outputs to ensure the full workflow is completed.
 
 ### Local Execution
 
@@ -479,27 +475,27 @@ grep -R "error" 1K_PBMC_ATAC_PROCESSED/00_LOGS
 
 The `.done` files are an internal checklist to keep track of a subset of rules that finished (don't worry about it unless you are a developer and want to contribute to the code base).
 
-`01_CELLRANGERATAC_COUNT/`
+* `01_CELLRANGERATAC_COUNT/`
 
 Here you will find all of the `Cell Ranger ATAC count` outputs for each individual capture.
 
-`02_CELLRANGERATAC_AGGR/`
+* `02_CELLRANGERATAC_AGGR/`
 
 This will be the aggregated count matrices across batches. In this tutorial there is only one capture so you won't find any processed data here.
 
-`03_ANNDATA/`
+* `03_ANNDATA/`
 
 Here you will find an `AnnData` object for every capture.
 
-`04_BATCH_OBJECTS/`
+* `04_BATCH_OBJECTS/`
 
 Batch-level `AnnData` object created by merging all per-capture objects from `03_ANNDATA/`. This is the aggregated, pre-metadata-enriched object — all cells from all captures in the batch are present, and `cell_id` uniqueness is verified. It does not yet contain doublet scores or demultiplexing results.
 
-`06_DOUBLET_DETECTION/`
+* `06_DOUBLET_DETECTION/`
 
 Doublet detection outputs from `Scrublet`.
 
-`07_FINAL/`
+* `07_FINAL/`
 
 The final enriched `AnnData` object with all preprocessing metadata joined in, ready for downstream analysis.
 

diff --git a/docs/source/PBMC_3k_multiome.md b/docs/source/PBMC_3k_multiome.md
@@ -94,7 +94,7 @@ The [3-column library CSV file](https://www.10xgenomics.com/support/software/cel
 
 ```bash
 FASTQ_DIR=$(realpath pbmc_unsorted_3k)
-sed -i "s|pbmc_unsorted_3k/gex|${FASTQ_DIR}/gex|; s|pbmc_unsorted_3k/atac|${FASTQ_DIR}/atac|" pbmc_unsorted_3k_library.csv
+sed -i "s|/path/to/fastqs/pbmc_unsorted_3k/gex|${FASTQ_DIR}/gex|; s|/path/to/fastqs/pbmc_unsorted_3k/atac|${FASTQ_DIR}/atac|" pbmc_unsorted_3k_library.csv
 ```
 
 Finish filling out the `pipeline_config.yaml` with paths to necessary files e.g. `libraries_list.tsv` and reference genome path:
@@ -112,6 +112,8 @@ cellranger_arc:
   reference: /path/to/refdata-cellranger-arc-GRCh38-2024-A # <- add the correct path!
   libraries: libraries_list.tsv
   normalize: none
+  anndata_threads: 1
+  anndata_mem_gb: 16
   directories:
     LOGS_DIR: 00_LOGS
   threads: 10
@@ -199,19 +201,13 @@ sc-preprocess run --config-file pipeline_config.yaml --cores 1 --dag | dot -Tpng
 
 Here we will break down the meaning of each rule so you can keep track of what's going on. If you want more detail please refer to the [Pipeline Rules Reference](pipeline_rules.md).
 
-**cellranger_arc_count**: Runs the command [cellranger-arc count](https://www.10xgenomics.com/support/software/cell-ranger-arc/latest/analysis/running-pipelines/command-line-arguments#count) per capture, aligning GEX and ATAC reads to the reference genome and producing a joint feature-barcode matrix.
-
-**create_arc_mudata**: Converts data from the Cell Ranger ARC output to per-capture [MuData object](https://mudata.readthedocs.io/stable/) (`.h5mu`) using the command [mu.read_10x_mtx()](https://muon.readthedocs.io/en/latest/api/generated/muon.read_10x_mtx.html), adding traceability metadata (`batch_id`, `capture_id`, `cell_id`).
-
-**cellranger_arc_aggr**: Runs [cellranger-arc aggr](https://www.10xgenomics.com/support/software/cell-ranger-arc/latest/analysis/running-pipelines/command-line-arguments#aggr) which aggregates all per-capture Cell Ranger ARC outputs within a batch into a single normalized count matrix.
-
-**aggregate_arc_batch**: Merges all per-capture MuData objects into a single batch-level `.h5mu` file, verifying `cell_id` uniqueness across captures.
-
-**run_scrublet**: Runs Scrublet doublet detection on the GEX modality of each per-capture MuData object, adding doublet scores and predictions to cell metadata.
-
-**enrich_arc_metadata**: Joins all downstream preprocessing metadata from demultiplexing and doublet detection into the batch-level MuData object.
-
-**all**: Final Snakemake rule that collects all expected outputs to ensure the full workflow is completed.
+* **cellranger_arc_count**: Runs the command [cellranger-arc count](https://www.10xgenomics.com/support/software/cell-ranger-arc/latest/analysis/running-pipelines/command-line-arguments#count) per capture, aligning GEX and ATAC reads to the reference genome and producing a joint feature-barcode matrix.
+* **create_arc_mudata**: Converts data from the Cell Ranger ARC output to per-capture [MuData object](https://mudata.readthedocs.io/stable/) (`.h5mu`) using the command [mu.read_10x_mtx()](https://muon.readthedocs.io/en/latest/api/generated/muon.read_10x_mtx.html), adding traceability metadata (`batch_id`, `capture_id`, `cell_id`).
+* **cellranger_arc_aggr**: Runs [cellranger-arc aggr](https://www.10xgenomics.com/support/software/cell-ranger-arc/latest/analysis/running-pipelines/command-line-arguments#aggr) which aggregates all per-capture Cell Ranger ARC outputs within a batch into a single normalized count matrix.
+* **aggregate_arc_batch**: Merges all per-capture MuData objects into a single batch-level `.h5mu` file, verifying `cell_id` uniqueness across captures.
+* **run_scrublet**: Runs Scrublet doublet detection on the GEX modality of each per-capture MuData object, adding doublet scores and predictions to cell metadata.
+* **enrich_arc_metadata**: Joins all downstream preprocessing metadata from demultiplexing and doublet detection into the batch-level MuData object.
+* **all**: Final Snakemake rule that collects all expected outputs to ensure the full workflow is completed.
 
 ### Local Execution
 
@@ -504,27 +500,27 @@ grep -R "error" 3K_PBMC_MULTIOME_PROCESSED/00_LOGS
 
 The `.done` files are an internal checklist to keep track of a subset of rules that finished (don't worry about it unless you are a developer and want to contribute to the code base).
 
-`01_CELLRANGERARC_COUNT/`
+* `01_CELLRANGERARC_COUNT/`
 
 Here you will find all of the `Cell Ranger count` outputs for each individual capture.
 
-`02_CELLRANGERARC_AGGR/`
+* `02_CELLRANGERARC_AGGR/`
 
 This will be the aggregated count matrices across batches. In this tutorial there is only one capture so you won't find any processed data here.
 
-`03_ANNDATA/`
+* `03_ANNDATA/`
 
 Here you will find `MuData` objects for every capture. In this case it will be Muon because multiome.
 
-`04_BATCH_OBJECTS/`
+* `04_BATCH_OBJECTS/`
 
 Batch-level `MuData` object created by merging all per-capture objects from `03_ANNDATA/`. This is the aggregated, pre-metadata-enriched object — all cells from all captures in the batch are present, and `cell_id` uniqueness is verified. It does not yet contain doublet scores or demultiplexing results.
 
-`06_DOUBLET_DETECTION/`
+* `06_DOUBLET_DETECTION/`
 
 Doublet detection outputs from `Scrublet`
 
-`07_FINAL/`
+* `07_FINAL/`
 
 Next, we print the Snakemake command running under the hood for convenient debugging. The `--snakefile` path will reflect where `sc-preprocess` is installed in your environment — this is expected and you don't need to use this path directly.
 

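Review note on the `sed` change in `PBMC_3k_multiome.md`: the new pattern only matches if the downloaded `pbmc_unsorted_3k_library.csv` ships with the `/path/to/fastqs/` placeholder prefix. A minimal sketch of a dry run in a throwaway directory follows. The CSV contents and the `WORKDIR` and `LIBRARY_CSV` names are assumptions for illustration, not taken from the real file, and `-i.bak` is used instead of `-i` only so the same line also works with BSD `sed`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Work in a temporary directory so no real tutorial files are touched.
WORKDIR=$(mktemp -d)
mkdir -p "${WORKDIR}/pbmc_unsorted_3k/gex" "${WORKDIR}/pbmc_unsorted_3k/atac"
cd "${WORKDIR}"

# Assumed contents of the library CSV (the placeholder prefix is the assumption).
LIBRARY_CSV=pbmc_unsorted_3k_library.csv
cat > "${LIBRARY_CSV}" <<'EOF'
fastqs,sample,library_type
/path/to/fastqs/pbmc_unsorted_3k/gex,pbmc_unsorted_3k,Gene Expression
/path/to/fastqs/pbmc_unsorted_3k/atac,pbmc_unsorted_3k,Chromatin Accessibility
EOF

# Same substitution as in the tutorial, applied to the sample file.
FASTQ_DIR=$(realpath pbmc_unsorted_3k)
sed -i.bak "s|/path/to/fastqs/pbmc_unsorted_3k/gex|${FASTQ_DIR}/gex|; s|/path/to/fastqs/pbmc_unsorted_3k/atac|${FASTQ_DIR}/atac|" "${LIBRARY_CSV}"

# Fail loudly if any placeholder path survived the substitution.
if grep -q "/path/to/fastqs" "${LIBRARY_CSV}"; then
  echo "placeholder paths remain in ${LIBRARY_CSV}" >&2
  exit 1
fi

cat "${LIBRARY_CSV}"
```

If the real CSV uses relative paths (as the old pattern assumed), the `grep` check passes without `sed` having changed anything, so it is worth eyeballing the printed file as well.
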
diff --git a/docs/source/PBMC_GEX.md b/docs/source/PBMC_GEX.md
@@ -117,6 +117,8 @@ cellranger_gex:
   create-bam: true
   threads: 10
   mem_gb: 64
+  anndata_threads: 1
+  anndata_mem_gb: 32
   directories:
     LOGS_DIR: 00_LOGS
 doublet_detection:
@@ -201,19 +203,13 @@ sc-preprocess run --config-file pipeline_config.yaml --cores 1 --dag | dot -Tpng
 
 Here we will break down the meaning of each rule so you can keep track of what's going on. If you want more detail please refer to the [Pipeline Rules Reference](pipeline_rules.md).
 
-**cellranger_gex_count**: Runs the command [cellranger count](https://www.10xgenomics.com/support/software/cell-ranger/latest/analysis/running-pipelines/command-line-arguments#count) per capture, aligning GEX reads to the reference genome and producing a gene-barcode matrix.
-
-**create_gex_anndata**: Converts data from the Cell Ranger GEX output to a per-capture [AnnData object](https://anndata.readthedocs.io/en/latest/) (`.h5ad`) using `sc.read_10x_h5()`, adding traceability metadata (`batch_id`, `capture_id`, `cell_id`).
-
-**cellranger_gex_aggr**: Runs [cellranger aggr](https://www.10xgenomics.com/support/software/cell-ranger/latest/analysis/running-pipelines/command-line-arguments#aggr) which aggregates all per-capture Cell Ranger GEX outputs within a batch into a single normalized count matrix.
-
-**aggregate_gex_batch**: Merges all per-capture AnnData objects into a single batch-level `.h5ad` file, verifying `cell_id` uniqueness across captures.
-
-**run_scrublet**: Runs Scrublet doublet detection on each per-capture AnnData object, adding doublet scores and predictions to cell metadata.
-
-**enrich_gex_metadata**: Joins all downstream preprocessing metadata from demultiplexing and doublet detection into the batch-level AnnData object.
-
-**all**: Final Snakemake rule that collects all expected outputs to ensure the full workflow is completed.
+* **cellranger_gex_count**: Runs the command [cellranger count](https://www.10xgenomics.com/support/software/cell-ranger/latest/analysis/running-pipelines/command-line-arguments#count) per capture, aligning GEX reads to the reference genome and producing a gene-barcode matrix.
+* **create_gex_anndata**: Converts data from the Cell Ranger GEX output to a per-capture [AnnData object](https://anndata.readthedocs.io/en/latest/) (`.h5ad`) using `sc.read_10x_h5()`, adding traceability metadata (`batch_id`, `capture_id`, `cell_id`).
+* **cellranger_gex_aggr**: Runs [cellranger aggr](https://www.10xgenomics.com/support/software/cell-ranger/latest/analysis/running-pipelines/command-line-arguments#aggr) which aggregates all per-capture Cell Ranger GEX outputs within a batch into a single normalized count matrix.
+* **aggregate_gex_batch**: Merges all per-capture AnnData objects into a single batch-level `.h5ad` file, verifying `cell_id` uniqueness across captures.
+* **run_scrublet**: Runs Scrublet doublet detection on each per-capture AnnData object, adding doublet scores and predictions to cell metadata.
+* **enrich_gex_metadata**: Joins all downstream preprocessing metadata from demultiplexing and doublet detection into the batch-level AnnData object.
+* **all**: Final Snakemake rule that collects all expected outputs to ensure the full workflow is completed.
 
 ### Local Execution
 
@@ -494,27 +490,27 @@ grep -R "error" 1K_PBMC_GEX_PROCESSED/00_LOGS
 
 The `.done` files are an internal checklist to keep track of a subset of rules that finished (don't worry about it unless you are a developer and want to contribute to the code base).
 
-`01_CELLRANGERGEX_COUNT/`
+* `01_CELLRANGERGEX_COUNT/`
 
 Here you will find all of the `Cell Ranger count` outputs for each individual capture.
 
-`02_CELLRANGERGEX_AGGR/`
+* `02_CELLRANGERGEX_AGGR/`
 
 This will be the aggregated count matrices across batches. In this tutorial there is only one capture so you won't find any processed data here.
 
-`03_ANNDATA/`
+* `03_ANNDATA/`
 
 Here you will find an `AnnData` object for every capture.
 
-`04_BATCH_OBJECTS/`
+* `04_BATCH_OBJECTS/`
 
 Batch-level `AnnData` object created by merging all per-capture objects from `03_ANNDATA/`. This is the aggregated, pre-metadata-enriched object — all cells from all captures in the batch are present, and `cell_id` uniqueness is verified. It does not yet contain doublet scores or demultiplexing results.
 
-`06_DOUBLET_DETECTION/`
+* `06_DOUBLET_DETECTION/`
 
 Doublet detection outputs from `Scrublet`.
 
-`07_FINAL/`
+* `07_FINAL/`
 
 The final enriched `AnnData` object with all preprocessing metadata joined in, ready for downstream analysis.
 
@@ -577,3 +573,32 @@ Immediately visualize QC metrics:
 ```python
 sc.pl.violin(adata, ['total_counts', 'n_genes_by_counts', 'pct_counts_mt'], jitter=0.4, multi_panel=True)
 ```
+
+### Seurat
+
+The easiest way to load the final AnnData object from `07_FINAL/` into R for analysis with Seurat is the SeuratDisk package:
+
+```R
+library(Seurat)
+library(SeuratDisk)
+
+# Convert .h5ad to .h5seurat format
+Convert("1K_PBMC_GEX_PROCESSED/07_FINAL/1_gex.h5ad", dest = "h5seurat", overwrite = TRUE)
+
+# Load the converted file into a Seurat object
+seurat_obj <- LoadH5Seurat("1K_PBMC_GEX_PROCESSED/07_FINAL/1_gex.h5seurat")
+```
+
+A second option uses the zellkonverter and SingleCellExperiment packages to load the AnnData file directly, without writing an intermediate file:
+
+```R
+library(Seurat)
+library(zellkonverter)
+library(SingleCellExperiment)
+
+# Read the .h5ad file as a SingleCellExperiment object
+sce <- readH5AD("1K_PBMC_GEX_PROCESSED/07_FINAL/1_gex.h5ad")
+
+# Convert to a Seurat object: AnnData's .X arrives as the assay "X"; data = NULL skips the absent logcounts assay
+seurat_obj <- as.Seurat(sce, counts = "X", data = NULL)
+```