Getting started

This guide takes you from a fresh clone to a running pipeline.

Prerequisites

Linux host (the pipeline has been developed and tested on Linux; it will not run on macOS or Windows as-is).
Conda or Miniforge.
Nextflow >=23.04 on your PATH.
A ProSeg binary built for your architecture. MerXen calls it as an external subprocess.
Ample RAM. Defaults target a 75-CPU / 600 GB machine; segmentation alone reserves 500 GB by default. See Configuration to dial this down.
GPU strongly recommended for the Cellpose-SAM step (CPU fallback works but is very slow on full sections).

1. Install the Python environment

git clone <repo-url> MerXen
cd MerXen

conda env create -f environment.yml
conda activate merxen

The environment installs Python 3.12 and then pip install -e ".[dev]", which pulls every runtime and dev dependency from pyproject.toml and registers the merxen CLI entry point.

2. Install the pre-commit hooks

pre-commit install
pre-commit install --hook-type pre-push

The pre-commit hook runs ruff on every commit; the pre-push hook runs the test suite. See Development workflow.

3. Set environment variables

cp .env.example .env

Fill in at least:

Variable	Description
`PROSEG_BINARY`	Absolute path to the ProSeg binary.
`MERXEN_OUTPUT_ROOT`	Directory to write pipeline outputs into.
`MERXEN_MAX_RAM_GB`	System RAM in GB the pipeline is allowed to use (default 600).

.env is git-ignored. See Configuration for the full list and how these are consumed.

4. Sanity check: run the tests

pytest                          # fast tests
pytest -m "not slow"            # alias for the same thing
pytest --run-slow               # include integration tests

If pytest passes, your Python install is healthy.

5. Create a samplesheet

Copy the template and fill it in with your own dataset paths:

cp workflows/samplesheet.example.csv workflows/samplesheet.csv

Each row pairs one MERSCOPE folder with one Xenium folder. See the full schema in Samplesheet format.

6. Run the pipeline

nextflow run workflows/main.nf \
    --samplesheet workflows/samplesheet.csv \
    --outdir ./results \
    --proseg_binary "$PROSEG_BINARY"

Outputs land in ./results/<pair_id>/.... Nextflow also writes an HTML report, execution timeline, and trace TSV under ./results/nextflow/.

More on invocation options (resume, caching, force rebuild) in Running the pipeline. For an explanation of every directory and file produced, see Outputs.

Troubleshooting

merxen: command not found — activate the conda env: conda activate merxen. The merxen CLI is registered by pyproject.toml:44.

Proseg binary '...' not found or not executable — the --proseg_binary path is wrong, or the binary is not executable. chmod +x it, or rebuild from github.com/dcjones/proseg.

Missing required parameter: --samplesheet — you invoked nextflow run without --samplesheet or --proseg_binary. Both are required.

Out of memory — lower the per-process memory requests in workflows/nextflow.config and set MERXEN_MAX_RAM_GB accordingly.

Cellpose GPU errors — set --cellpose_gpu false to force CPU mode.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Getting started

Prerequisites

1. Install the Python environment

2. Install the pre-commit hooks

3. Set environment variables

4. Sanity check: run the tests

5. Create a samplesheet

6. Run the pipeline

Troubleshooting

FilesExpand file tree

getting-started.md

Latest commit

History

getting-started.md

File metadata and controls

Getting started

Prerequisites

1. Install the Python environment

2. Install the pre-commit hooks

3. Set environment variables

4. Sanity check: run the tests

5. Create a samplesheet

6. Run the pipeline

Troubleshooting