This guide takes you from a fresh clone to a running pipeline. You will need:
- Linux host (the pipeline has been developed and tested on Linux; it will not run on macOS or Windows as-is).
- Conda or Miniforge.
- Nextflow >= 23.04 on your `PATH`.
- A ProSeg binary built for your architecture. MerXen calls it as an external subprocess.
- Ample RAM. Defaults target a 75-CPU / 600 GB machine; segmentation alone reserves 500 GB by default. See Configuration to dial this down.
- GPU strongly recommended for the Cellpose-SAM step (CPU fallback works but is very slow on full sections).
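Some of these prerequisites can be spot-checked with a short Python script before you install anything else. This is a minimal sketch, not part of MerXen: the names `preflight`, `version_at_least` and `MIN_NEXTFLOW` are hypothetical. It does not call `nextflow` itself, so pass the version number that `nextflow -version` reports to `version_at_least` by hand.

```python
import re
import shutil
import sys

# Minimum Nextflow release from the prerequisites list (>= 23.04).
MIN_NEXTFLOW = (23, 4)


def version_at_least(version: str, minimum: tuple[int, int]) -> bool:
    """Return True if a dotted version such as '23.10.1' is >= (major, minor)."""
    match = re.match(r"\s*v?(\d+)\.(\d+)", version)
    if match is None:
        return False  # unparseable input is treated as "not new enough"
    return (int(match.group(1)), int(match.group(2))) >= minimum


def preflight() -> list[str]:
    """Collect human-readable host problems; an empty list means every check passed."""
    problems = []
    if not sys.platform.startswith("linux"):
        problems.append(f"platform is {sys.platform!r}; the pipeline is only tested on Linux")
    # Miniforge ships both conda and mamba, so either one is accepted here.
    if shutil.which("conda") is None and shutil.which("mamba") is None:
        problems.append("neither conda nor mamba found on PATH (install Conda or Miniforge)")
    if shutil.which("nextflow") is None:
        problems.append("nextflow not found on PATH")
    return problems


if __name__ == "__main__":
    issues = preflight()
    for issue in issues:
        print("WARN:", issue)
    if not issues:
        print("platform, conda and nextflow checks passed")
```

For example, `version_at_least("23.10.1", MIN_NEXTFLOW)` is `True`, while `version_at_least("22.10.8", MIN_NEXTFLOW)` is `False`. RAM, GPU and the ProSeg binary are not covered by this sketch.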
Clone the repository and create the conda environment:

```bash
git clone <repo-url> MerXen
cd MerXen
conda env create -f environment.yml
conda activate merxen
```

The environment installs Python 3.12 and then runs `pip install -e ".[dev]"`, which pulls every runtime and dev dependency from `pyproject.toml` and registers the `merxen` CLI entry point.
Install the git hooks:

```bash
pre-commit install
pre-commit install --hook-type pre-push
```

The pre-commit hook runs ruff on every commit; the pre-push hook runs the test suite. See Development workflow.
Create your local environment file:

```bash
cp .env.example .env
```

Fill in at least:

| Variable | Description |
|---|---|
| `PROSEG_BINARY` | Absolute path to the ProSeg binary. |
| `MERXEN_OUTPUT_ROOT` | Directory to write pipeline outputs into. |
| `MERXEN_MAX_RAM_GB` | System RAM in GB the pipeline is allowed to use (default 600). |

`.env` is git-ignored. See Configuration for the full list and how these variables are consumed.
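A quick sanity check on `.env` catches empty or malformed values before a long run. The sketch below is hypothetical and is not how MerXen itself reads the file: it assumes `.env` holds plain `KEY=VALUE` lines, and the rule that `MERXEN_MAX_RAM_GB` must be a positive integer is an assumption based on its default of 600.

```python
import os
from pathlib import Path

# The three variables this guide asks you to fill in.
REQUIRED = ("PROSEG_BINARY", "MERXEN_OUTPUT_ROOT", "MERXEN_MAX_RAM_GB")


def parse_env(text: str) -> dict[str, str]:
    """Parse plain KEY=VALUE lines, skipping blanks and '#' comments (assumed .env style)."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("export "):
            line = line[len("export "):]
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding quote characters, e.g. "/opt/proseg" -> /opt/proseg.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def env_problems(values: dict[str, str]) -> list[str]:
    """List missing or suspicious values; an empty list means the file looks usable."""
    problems = [f"{key} is not set" for key in REQUIRED if not values.get(key)]
    binary = values.get("PROSEG_BINARY")
    if binary and not os.path.isabs(binary):
        problems.append("PROSEG_BINARY should be an absolute path")
    ram = values.get("MERXEN_MAX_RAM_GB")
    if ram and not (ram.isdigit() and int(ram) > 0):
        problems.append("MERXEN_MAX_RAM_GB should be a positive integer")
    return problems


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.exists():
        for problem in env_problems(parse_env(env_file.read_text())):
            print("WARN:", problem)
    else:
        print(".env not found; run: cp .env.example .env")
```

Run it from the repository root, where `.env` lives.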
Run the test suite to check the install:

```bash
pytest                 # fast tests
pytest -m "not slow"   # same selection as above, made explicit
pytest --run-slow      # include integration tests
```

If `pytest` passes, your Python environment is set up correctly.
Copy the template and fill it in with your own dataset paths:
```bash
cp workflows/samplesheet.example.csv workflows/samplesheet.csv
```

Each row pairs one MERSCOPE folder with one Xenium folder. See Samplesheet format for the full schema.
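Launch the pipeline:

```bash
nextflow run workflows/main.nf \
  --samplesheet workflows/samplesheet.csv \
  --outdir ./results \
  --proseg_binary "$PROSEG_BINARY"
```

`"$PROSEG_BINARY"` is expanded by your shell, so the variable must be set in the current session; if you only defined it in `.env`, export it first (for example with `set -a; source .env; set +a`).

Outputs land in `./results/<pair_id>/...`. Nextflow also writes an HTML report, execution timeline, and trace TSV under `./results/nextflow/`.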
See Running the pipeline for more invocation options (resume, caching, force rebuild), and Outputs for an explanation of every directory and file produced.
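When a run fails, the trace TSV is a quick way to see which tasks failed and with what exit status. This is an illustrative sketch, not a MerXen tool: `failed_tasks` is a hypothetical helper, and it assumes the trace keeps Nextflow's default `name`, `status` and `exit` columns and that the file sits under `./results/nextflow/` with `trace` in its file name.

```python
import csv
import io
from pathlib import Path

# Statuses treated as success: COMPLETED ran now, CACHED was reused from an earlier run.
OK_STATUSES = {"COMPLETED", "CACHED"}


def failed_tasks(trace_tsv: str) -> list[tuple[str, str]]:
    """Return (task name, exit status) for every row whose status is not a success."""
    reader = csv.DictReader(io.StringIO(trace_tsv), delimiter="\t")
    return [
        (row["name"], row.get("exit", ""))
        for row in reader
        if row["status"] not in OK_STATUSES
    ]


if __name__ == "__main__":
    trace_dir = Path("results/nextflow")  # assumes --outdir ./results
    if trace_dir.is_dir():
        for path in sorted(trace_dir.glob("*trace*")):
            for name, code in failed_tasks(path.read_text()):
                print(f"{path.name}: {name} exited with {code}")
```

An exit status of 137 (128 + signal 9, `SIGKILL`) commonly means the kernel's OOM killer stopped the task, which points to the out-of-memory fix listed below.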
Common problems:

- `merxen: command not found`: activate the conda env with `conda activate merxen`. The `merxen` CLI entry point is registered in `pyproject.toml:44`.
- `Proseg binary '...' not found or not executable`: the `--proseg_binary` path is wrong, or the binary is not executable. Make it executable with `chmod +x`, or rebuild it from github.com/dcjones/proseg.
- `Missing required parameter: --samplesheet`: you invoked `nextflow run` without `--samplesheet` or `--proseg_binary`. Both are required.
- Out of memory: lower the per-process memory requests in `workflows/nextflow.config` and set `MERXEN_MAX_RAM_GB` to match.
- Cellpose GPU errors: set `--cellpose_gpu false` to force CPU mode.
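Two of the problems above, a bad ProSeg path and an unrealistic memory ceiling, can be diagnosed from Python on the host. This is a hypothetical helper script, not part of MerXen: `proseg_problem` checks the same two conditions the error message names (the file exists and is executable), and `mem_total_gib` reads `MemTotal` from `/proc/meminfo`, which only exists on Linux. It reports GiB; since 1 GiB is about 1.07 GB, the GiB figure is the smaller number and therefore a conservative upper bound for `MERXEN_MAX_RAM_GB`.

```python
from __future__ import annotations

import os
from pathlib import Path


def proseg_problem(path: str) -> str | None:
    """Describe what is wrong with the ProSeg path, or return None if it looks usable."""
    if not path:
        return "PROSEG_BINARY is empty or unset"
    if not os.path.isfile(path):
        return f"{path} does not exist or is not a regular file"
    if not os.access(path, os.X_OK):
        return f"{path} is not executable (try: chmod +x {path})"
    return None


def mem_total_gib(meminfo_text: str) -> float:
    """Extract MemTotal (listed in kB, which are KiB) from /proc/meminfo text, in GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) / (1024 * 1024)
    raise ValueError("MemTotal line not found")


if __name__ == "__main__":
    print(proseg_problem(os.environ.get("PROSEG_BINARY", "")) or "ProSeg binary looks usable")
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        total = mem_total_gib(meminfo.read_text())
        print(f"MemTotal is {total:.0f} GiB; keep MERXEN_MAX_RAM_GB at or below this")
```

Because it reads `PROSEG_BINARY` from the process environment, export the variable (or source `.env`) before running the script.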