Getting started

This guide takes you from a fresh clone to a running pipeline.

Prerequisites

  • Linux host (the pipeline has been developed and tested on Linux; it will not run on macOS or Windows as-is).
  • Conda or Miniforge.
  • Nextflow >=23.04 on your PATH.
  • A ProSeg binary built for your architecture. MerXen calls it as an external subprocess.
  • Ample RAM. Defaults target a 75-CPU / 600 GB machine; segmentation alone reserves 500 GB by default. See Configuration to dial this down.
  • GPU strongly recommended for the Cellpose-SAM step (CPU fallback works but is very slow on full sections).
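
The tool and version requirements above can be checked before installing anything. The following is a minimal sketch, not part of MerXen: the function names are illustrative, and the parsing of `nextflow -v` output assumes it prints a dotted version number somewhere on its first matching line.

```shell
# have_cmd NAME: succeed if NAME resolves to a command on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# version_ge HAVE WANT: succeed if dotted version HAVE >= WANT.
# Relies on GNU `sort -V`, which is available on Linux hosts.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# check_prereqs: print a warning for each unmet prerequisite; return 1 if any.
check_prereqs() {
    prereq_status=0
    [ "$(uname -s)" = "Linux" ] || { echo "WARN: not a Linux host"; prereq_status=1; }
    have_cmd conda || { echo "WARN: conda not found on PATH"; prereq_status=1; }
    if have_cmd nextflow; then
        # Assumption: the version is the first dotted number in the output.
        nf_version=$(nextflow -v 2>/dev/null | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n1)
        version_ge "${nf_version:-0}" 23.04 ||
            { echo "WARN: Nextflow ${nf_version:-unknown} is older than 23.04"; prereq_status=1; }
    else
        echo "WARN: nextflow not found on PATH"; prereq_status=1
    fi
    return "$prereq_status"
}
```

Calling `check_prereqs || echo "resolve the warnings above first"` gives a quick pass/fail summary. The ProSeg binary and RAM are not covered here because they depend on values set later in this guide.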

1. Install the Python environment

git clone <repo-url> MerXen
cd MerXen

conda env create -f environment.yml
conda activate merxen

The environment file installs Python 3.12 and then runs pip install -e ".[dev]", which pulls every runtime and development dependency from pyproject.toml and registers the merxen CLI entry point.

2. Install the pre-commit hooks

pre-commit install
pre-commit install --hook-type pre-push

The pre-commit hook runs ruff on every commit; the pre-push hook runs the test suite. See Development workflow.

3. Set environment variables

cp .env.example .env

Fill in at least:

Variable            Description
PROSEG_BINARY       Absolute path to the ProSeg binary.
MERXEN_OUTPUT_ROOT  Directory to write pipeline outputs into.
MERXEN_MAX_RAM_GB   System RAM in GB the pipeline is allowed to use (default 600).

.env is git-ignored. See Configuration for the full list and how these are consumed.
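
To make the three variables above available in your shell and catch obvious mistakes early, something like the following may help. It is a sketch under two assumptions: that .env contains plain shell-compatible KEY=value lines, and that MERXEN_MAX_RAM_GB is a whole number. How the pipeline itself consumes .env is described in Configuration and is not reproduced here; `load_env` and `check_env` are hypothetical helper names.

```shell
# load_env FILE: export every KEY=value assignment in FILE into the current shell.
load_env() {
    case "$1" in
        */*) env_file=$1 ;;
        *)   env_file=./$1 ;;   # avoid a PATH lookup by the `.` builtin
    esac
    [ -f "$env_file" ] || { echo "no such file: $env_file" >&2; return 1; }
    set -a                      # auto-export variables assigned while sourcing
    . "$env_file"
    set +a
}

# check_env: verify the variables named in this guide.
check_env() {
    env_status=0
    for name in PROSEG_BINARY MERXEN_OUTPUT_ROOT; do
        eval "value=\${$name:-}"
        [ -n "$value" ] || { echo "unset: $name" >&2; env_status=1; }
    done
    # PROSEG_BINARY is documented as an absolute path.
    case "${PROSEG_BINARY:-}" in
        /*) ;;
        ?*) echo "PROSEG_BINARY must be an absolute path" >&2; env_status=1 ;;
    esac
    # Falls back to the documented default of 600 when unset.
    case "${MERXEN_MAX_RAM_GB:-600}" in
        *[!0-9]*) echo "MERXEN_MAX_RAM_GB must be a whole number" >&2; env_status=1 ;;
    esac
    return "$env_status"
}
```

Typical use is `load_env .env && check_env` from the repository root.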

4. Sanity check: run the tests

pytest                          # fast tests
pytest -m "not slow"            # alias for the same thing
pytest --run-slow               # include integration tests

If pytest passes, your Python install is healthy.

5. Create a samplesheet

Copy the template and fill it in with your own dataset paths:

cp workflows/samplesheet.example.csv workflows/samplesheet.csv

Each row pairs one MERSCOPE folder with one Xenium folder. See the full schema in Samplesheet format.
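
Because each row points at data folders, a mistyped path is an easy error to make. The sketch below flags absolute paths in the samplesheet that are not existing directories. It deliberately assumes no column names (see Samplesheet format for the real schema). It does assume that the first line is a header, that fields are comma-separated with no quoted commas, and that folders are given as absolute paths; relative paths are skipped. `check_samplesheet_paths` is a hypothetical helper.

```shell
# check_samplesheet_paths FILE: report absolute paths in data rows
# that do not exist as directories. Returns 1 if any are missing.
check_samplesheet_paths() {
    missing_paths=$(
        tail -n +2 "$1" |        # skip the header row
        tr -d '\r' |             # tolerate Windows line endings
        tr ',' '\n' |            # one field per line
        awk '/^\//' |            # keep fields that look like absolute paths
        while IFS= read -r path; do
            [ -d "$path" ] || printf '%s\n' "$path"
        done
    )
    if [ -n "$missing_paths" ]; then
        printf 'Not a directory:\n%s\n' "$missing_paths" >&2
        return 1
    fi
}
```

Run it as `check_samplesheet_paths workflows/samplesheet.csv` before launching the pipeline.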

6. Run the pipeline

nextflow run workflows/main.nf \
    --samplesheet workflows/samplesheet.csv \
    --outdir ./results \
    --proseg_binary "$PROSEG_BINARY"

Outputs land in ./results/<pair_id>/.... Nextflow also writes an HTML report, execution timeline, and trace TSV under ./results/nextflow/.

More on invocation options (resume, caching, force rebuild) in Running the pipeline. For an explanation of every directory and file produced, see Outputs.
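
If you launch the pipeline often, a small wrapper around the command above can fail fast on missing inputs. This is only a sketch: `run_merxen` and the SAMPLESHEET, OUTDIR, and MERXEN_DRY_RUN variables are hypothetical names introduced here, not part of MerXen. Extra arguments are passed through to Nextflow unchanged.

```shell
# run_merxen [extra nextflow args...]
# Checks inputs, then runs the command from step 6.
run_merxen() {
    samplesheet="${SAMPLESHEET:-workflows/samplesheet.csv}"
    outdir="${OUTDIR:-./results}"
    [ -f "$samplesheet" ] || { echo "samplesheet not found: $samplesheet" >&2; return 1; }
    [ -n "${PROSEG_BINARY:-}" ] || { echo "PROSEG_BINARY is not set" >&2; return 1; }

    # Rebuild the positional parameters as the full command line.
    set -- nextflow run workflows/main.nf \
        --samplesheet "$samplesheet" \
        --outdir "$outdir" \
        --proseg_binary "$PROSEG_BINARY" \
        "$@"

    if [ "${MERXEN_DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"      # show the command instead of running it
    else
        "$@"
    fi
}
```

For example, `run_merxen -resume` appends Nextflow's standard `-resume` option, and `MERXEN_DRY_RUN=1 run_merxen` prints the command without executing it.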

Troubleshooting

merxen: command not found — activate the conda env: conda activate merxen. The merxen CLI is registered by pyproject.toml:44.

Proseg binary '...' not found or not executable — the --proseg_binary path is wrong, or the binary is not executable. chmod +x it, or rebuild from github.com/dcjones/proseg.
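
To tell a wrong path apart from a missing execute bit, a check along these lines can help. `diagnose_binary` is a hypothetical helper, not something MerXen ships.

```shell
# diagnose_binary PATH: print the first reason PATH cannot be run, or "ok".
diagnose_binary() {
    if [ -z "$1" ]; then
        echo "empty path"; return 1
    elif [ ! -e "$1" ]; then
        echo "missing: $1"; return 1
    elif [ -d "$1" ]; then
        echo "is a directory: $1"; return 1
    elif [ ! -x "$1" ]; then
        echo "not executable (try chmod +x): $1"; return 1
    fi
    echo "ok: $1"
}
```

Run it as `diagnose_binary "$PROSEG_BINARY"`. An "ok" result only confirms the file is executable, not that it was built for your architecture.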

Missing required parameter: --samplesheet — you invoked nextflow run without --samplesheet or --proseg_binary. Both are required.

Out of memory — lower the per-process memory requests in workflows/nextflow.config and set MERXEN_MAX_RAM_GB accordingly.
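
One way to pick a value for MERXEN_MAX_RAM_GB is to read the machine's total RAM and leave some headroom for the operating system. The sketch below does that on Linux. The 10% default headroom is an arbitrary choice for illustration, and both function names are hypothetical.

```shell
# total_ram_gb: total physical RAM in whole GB, read from /proc/meminfo (Linux only).
total_ram_gb() {
    awk '/^MemTotal:/ { print int($2 / 1024 / 1024) }' /proc/meminfo
}

# suggest_max_ram_gb TOTAL_GB [HEADROOM_PERCENT]
# Prints TOTAL_GB reduced by the headroom share (integer arithmetic, rounds down).
suggest_max_ram_gb() {
    echo $(( $1 * (100 - ${2:-10}) / 100 ))
}
```

For example, `export MERXEN_MAX_RAM_GB=$(suggest_max_ram_gb "$(total_ram_gb)")`. The per-process requests in workflows/nextflow.config still need to be lowered by hand to fit under that figure.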

Cellpose GPU errors — set --cellpose_gpu false to force CPU mode.
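
If the same launch script runs on machines with and without a GPU, the flag can be chosen automatically. This sketch assumes an NVIDIA GPU, detected through `nvidia-smi -L`. Other GPU vendors are not handled, and `cellpose_gpu_args` is a hypothetical helper.

```shell
# cellpose_gpu_args: print extra pipeline arguments for the Cellpose step.
# Prints nothing when a GPU is visible, so the pipeline default applies.
cellpose_gpu_args() {
    if command -v nvidia-smi >/dev/null 2>&1 &&
       nvidia-smi -L 2>/dev/null | grep -q '^GPU '; then
        :                               # GPU visible: keep the default
    else
        echo "--cellpose_gpu false"     # no usable GPU: force CPU mode
    fi
}
```

Append the result unquoted to the launch command, for example `nextflow run workflows/main.nf ... $(cellpose_gpu_args)`, so that the two words are passed as separate arguments.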