
Mussel

This is a fork of Faisal Mahmood's CLAM repository (GPL v3 license), with a handful of modifications:

  • Added additional foundation models for generating embeddings
  • Added zero-shot tissue-type annotation of tiles
  • Added caching of tile images so that inference can run directly on the tiles (rather than on embeddings)
  • Added microns per pixel (MPP) as a tiling parameter, supported regardless of native slide resolution
  • Made usable for job submission (one script run processes one slide)
  • Removed modeling
  • Updated the tiling algorithm

Installation

System requirements

Supported systems:

  • macOS (x86 and ARM) (CPU only)
  • Linux (x86) (CPU and GPU)

Supported slide formats

Mussel reads whole-slide images via tiffslide (backed by tifffile). The following formats are supported:

| Extension | Format | Scanner / Vendor | Tiffslide support |
| --- | --- | --- | --- |
| .svs | Aperio SVS | Leica (Aperio) | ✅ Full |
| .scn | Leica SCN | Leica | ✅ Full |
| .tif / .tiff | TIFF, BigTIFF, OME-TIFF | Generic / various | ✅ Full |
| .ndpi | Hamamatsu NDPI | Hamamatsu | ⚠️ Partial — see notes |
| .bif | Ventana BIF | Roche (Ventana) | ⚠️ Partial — see notes |
| .mrxs | MIRAX | 3DHISTECH | ⚠️ Generic TIFF — see notes |
| .vms / .vmu | Hamamatsu VMS / VMU | Hamamatsu | ⚠️ Generic TIFF — see notes |
| .qptiff | PerkinElmer / Akoya QPTIFF | PerkinElmer / Akoya | ⚠️ Generic TIFF — see notes |
| .czi | Carl Zeiss CZI | Zeiss | ⚠️ Generic TIFF — see notes |

Format support notes:

  • SVS, SCN, TIFF/BigTIFF/OME-TIFF — fully supported; tiffslide parses vendor metadata and reliably populates tiffslide.mpp-x.
  • NDPI — tiffslide's Hamamatsu parser is marked "only partially implemented"; MPP is read from standard TIFF resolution tags (tiff.XResolution / tiff.ResolutionUnit). Most Hamamatsu scanners embed resolution in TIFF tags, so this works in practice. If MPP is wrong or missing, use seg_config.slide_mpp_override.
  • BIF — tiffslide has no special Ventana parser; falls back to generic TIFF tag reading. Use seg_config.slide_mpp_override if MPP is not found automatically.
  • MRXS — tiffslide uses generic TIFF parsing. MRXS is a multi-file format: the .mrxs file must be accompanied by its sidecar directory (same name, no extension) in the same location; moving only the .mrxs file will cause a read error.
  • VMS / VMU — older Hamamatsu pyramid formats; treated as generic TIFF. These formats are uncommon on modern scanners; test before relying on them in production.
  • QPTIFF — PerkinElmer/Akoya format; treated as generic TIFF. Multiplex (multi-channel) QPTIFF files are supported for tiling but feature extraction uses the first channel only.
  • CZI — Zeiss format; tifffile provides CZI support. Multi-series CZI files (multiple acquisitions in one file) are supported but only the first series (index 0) is used.

MPP (microns per pixel) retrieval — Mussel reads MPP from slide metadata using the following fallback chain:

  1. slide_mpp_override CLI parameter — if provided, used directly; all metadata reading is skipped
  2. tiffslide.mpp-x — standard property populated by tiffslide for all supported formats
  3. aperio.MPP / openslide.mpp-x — legacy vendor property names
  4. tiff.XResolution + tiff.ResolutionUnit — raw TIFF resolution tags converted to µm/px; tiffslide exposes these for partially-supported formats (NDPI, BIF, MRXS, QPTIFF, CZI) even when it cannot normalize them to tiffslide.mpp-x
  5. Magnification-based estimate: scans aperio.AppMag, openslide.objective-power, and tiffslide.objective-power; computes MPP as 10.0 / magnification
  6. Configurable default (0.5 µm/px, typical for 20× TCGA slides) with a warning log

When slides lack MPP metadata and the default 0.5 µm/px doesn't match the actual scanner resolution, pass the known value explicitly:

tessellate slide_path=slide.svs seg_config.slide_mpp_override=1.0 ...
tessellate_extract_features slide_path=slide.svs seg_config.slide_mpp_override=0.25 ...
export_tiles slide_path=slide.svs slide_mpp_override=0.5 ...
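The fallback chain above can be sketched as a plain function over tiffslide's properties mapping. This is an illustrative sketch, not Mussel's actual implementation; the TIFF-tag conversion assumes ResolutionUnit is inch or centimeter (the two values defined by the TIFF spec):

```python
def resolve_mpp(properties, slide_mpp_override=None, default_mpp=0.5):
    """Resolve microns-per-pixel from a slide's property mapping."""
    # 1. Explicit override wins; all metadata reading is skipped.
    if slide_mpp_override is not None:
        return float(slide_mpp_override)

    # 2./3. Normalized tiffslide property, then legacy vendor names.
    for key in ("tiffslide.mpp-x", "aperio.MPP", "openslide.mpp-x"):
        value = properties.get(key)
        if value is not None:
            return float(value)

    # 4. Raw TIFF resolution tags: pixels per unit -> microns per pixel.
    xres = properties.get("tiff.XResolution")
    unit = str(properties.get("tiff.ResolutionUnit", "")).upper()
    if xres:
        microns_per_unit = {"INCH": 25_400.0, "2": 25_400.0,
                            "CENTIMETER": 10_000.0, "3": 10_000.0}.get(unit)
        if microns_per_unit:
            return microns_per_unit / float(xres)

    # 5. Estimate from objective magnification (e.g. 20x -> 0.5 um/px).
    for key in ("aperio.AppMag", "openslide.objective-power",
                "tiffslide.objective-power"):
        mag = properties.get(key)
        if mag:
            return 10.0 / float(mag)

    # 6. Last resort: configurable default (logged with a warning).
    return default_mpp
```

For example, a slide whose only resolution metadata is tiff.XResolution = 40000 pixels/cm resolves to 10000 / 40000 = 0.25 µm/px.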

Prerequisites

  • uv
    curl -LsSf https://astral.sh/uv/install.sh | sh

Create virtual environment and install packages

Model inference may require either PyTorch or TensorFlow, depending on which foundation models you wish to use. Because it can be difficult to satisfy the dependencies of both at the same time, you must choose whether to install the module with PyTorch or with TensorFlow support.

In addition, you can choose to install Mussel with or without GPU support. A GPU is needed to run model inference (feature extraction or generating class embeddings) at practical speed; other operations run fine on CPUs. (Technically, model inference can also run on CPUs, but it is very slow.)

PyTorch

Install PyTorch support first, then models are downloaded automatically on first use:

uv sync --extra torch-gpu   # GPU (CUDA) — recommended
uv sync --extra torch-cpu   # CPU only (Mac or CPU-only Linux)

PyTorch is required for the following patch encoders:

| Model | model_type | Access | HuggingFace |
| --- | --- | --- | --- |
| ResNet-50 | RESNET50 | public | built-in (torchvision) |
| TransPath | CTRANSPATH | public | Xiyue-Wang/TransPath |
| OpenCLIP | CLIP | public | wisdomik/QuiltNet-B-16-PMB |
| Phikon | PHIKON | public | owkin/phikon |
| Phikon-v2 | PHIKON_V2 | public | owkin/phikon-v2 |
| Midnight-12k | MIDNIGHT12K | public | kaiko-ai/midnight |
| Prov-GigaPath | GIGAPATH | 🔒 gated | prov-gigapath/prov-gigapath |
| Virchow | VIRCHOW | 🔒 gated | paige-ai/Virchow |
| Virchow2 | VIRCHOW2 | 🔒 gated | paige-ai/Virchow2 |
| H-Optimus-0 | OPTIMUS | 🔒 gated | bioptimus/H-optimus-0 |
| H-Optimus-1 | H_OPTIMUS_1 | 🔒 gated | bioptimus/H-optimus-1 |
| H0-mini | H0_MINI | 🔒 gated | bioptimus/H0-mini |
| UNI | UNI | 🔒 gated | MahmoodLab/UNI |
| UNI2 | UNI2 | 🔒 gated | MahmoodLab/UNI2-h |
| CONCH v1.5 | CONCH1_5 | 🔒 gated | MahmoodLab/TITAN |
| GPFM | GPFM | public | majiabo/GPFM |
| Hibou-L | HIBOU_L | 🔒 gated | histai/hibou-L |
| CONCH v1.0 | CONCH_V1 | 🔒 gated | MahmoodLab/CONCH |
| Kaiko ViT-S/8 | KAIKO_VITS8 | public | 1aurent/vit_small_patch8_224.kaiko_ai_towards_large_pathology_fms |
| Kaiko ViT-S/16 | KAIKO_VITS16 | public | 1aurent/vit_small_patch16_224.kaiko_ai_towards_large_pathology_fms |
| Kaiko ViT-B/8 | KAIKO_VITB8 | public | 1aurent/vit_base_patch8_224.kaiko_ai_towards_large_pathology_fms |
| Kaiko ViT-B/16 | KAIKO_VITB16 | public | 1aurent/vit_base_patch16_224.kaiko_ai_towards_large_pathology_fms |
| Kaiko ViT-L/14 | KAIKO_VITL14 | public | 1aurent/vit_large_patch14_reg4_224.kaiko_ai_towards_large_pathology_fms |
| Lunit DINO ViT-S/8 | LUNIT_VITS8 | public | 1aurent/vit_small_patch8_224.lunit_dino |
| Lunit DINO ViT-S/16 | LUNIT_VITS16 | public | 1aurent/vit_small_patch16_224.lunit_dino |
| OpenMidnight | OPENMIDNIGHT | 🔒 gated | SophontAI/OpenMidnight |
| GenBio-PathFM | GENBIO_PATHFM | 🔒 gated | genbio-ai/genbio-pathfm |

And the following slide encoders (aggregate patch features into a single slide embedding):

| Model | model_type | Patch encoder | Access | HuggingFace |
| --- | --- | --- | --- | --- |
| Prov-GigaPath | GIGAPATH_SLIDE | GIGAPATH | 🔒 gated | prov-gigapath/prov-gigapath |
| TITAN | TITAN_SLIDE | CONCH1_5 | 🔒 gated | MahmoodLab/TITAN |
| PRISM | PRISM_SLIDE | VIRCHOW | 🔒 gated | paige-ai/Prism |
| FEATHER | FEATHER_SLIDE | CONCH1_5 | 🔒 gated | MahmoodLab/abmil.base.conch_v15.pc108-24k |
| MADELEINE | MADELEINE_SLIDE | CLIP | 🔒 gated | MahmoodLab/madeleine |
| CHIEF | CHIEF_SLIDE | CTRANSPATH | ⬇ access req. | hms-dbmi/CHIEF |

🔒 Gated models require signing an access agreement on the HuggingFace model page and setting your token:

export HF_TOKEN=hf_...

⬇ Models requiring access request are downloaded automatically once access is granted:

| Model | Request access | Notes |
| --- | --- | --- |
| CHIEF (CHIEF_SLIDE) | Google Drive folder | Request via hms-dbmi/CHIEF; gdown downloads automatically on first use |

TransPath (CTRANSPATH) and CHIEF (CHIEF_SLIDE) are downloaded automatically via gdown on first use (cached in the HuggingFace hub cache directory).

GenBio-PathFM (GENBIO_PATHFM) downloads its model architecture code from GitHub on first use and caches it at ~/.cache/mussel/genbio_pathfm/. The model weights are downloaded from HuggingFace (requires a token with access to genbio-ai/genbio-pathfm).

OpenMidnight (OPENMIDNIGHT) uses the DINOv2 ViT-G/14 architecture from the facebookresearch/dinov2 torch.hub repository. On first use, Mussel downloads the repository code and caches it at ~/.cache/torch/hub/facebookresearch_dinov2_main/. The model weights are downloaded from HuggingFace (requires a token with access to SophontAI/OpenMidnight).

TensorFlow

TensorFlow is required for GooglePath only:

| Model | model_type | Access | HuggingFace |
| --- | --- | --- | --- |
| GooglePath | GOOGLEPATH | 🔒 gated | google/path-foundation |

uv sync --extra tensorflow-gpu   # GPU (CUDA)
uv sync --extra tensorflow-cpu   # CPU only (e.g. Mac)

Neural segmentation (seg_model="neural")

Mussel includes built-in neural tissue segmentation using a DeepLabV3-ResNet50 model (2-class: tissue vs background) trained on histopathology slides as part of the HEST project at the Mahmood Lab, Harvard Medical School.

Pre-trained weights are hosted at MahmoodLab/hest-tissue-seg on HuggingFace and are downloaded automatically on first use (no account or token required). The model operates at 1 µm/px; Mussel handles resampling automatically.

Reference: Chan et al., "A Pathology Foundation Model for Cancer Diagnosis and Prognosis Prediction", Nature 2025. [paper] [GitHub] [HuggingFace model card]

No extra packages are required — it works with any torch-gpu or torch-cpu install:

uv sync --extra torch-gpu

Then pass seg_config.seg_model=neural to tessellate or tessellate_extract_features. A CUDA GPU is recommended for practical performance but CPU inference is supported.

Development Notes

  • Any commands executed using uv run <command...> are automatically executed in the project environment.
  • You can also explicitly activate the virtual environment created by uv by executing
source .venv/bin/activate
  • To install Mussel into an existing environment, activate that environment and use uv pip or conda to install one of Mussel[torch-gpu], Mussel[tensorflow-gpu], Mussel[torch-cpu], or Mussel[tensorflow-cpu] into that environment. (Here, Mussel would be replaced with the path to the Mussel repo you've checked out.)

(The example commands in README-commands.md all assume an activated Python environment, so that uv run isn't necessary.)

Modifying package requirements

  • Use uv sync --extra <extra-deps> to install this project and its dependencies into the project's virtual environment, where <extra-deps> is one of torch-gpu, tensorflow-gpu, torch-cpu, or tensorflow-cpu.
  • Execute uv sync --extra <extra-deps> again after making any changes to the requirements:
uv sync --extra torch-gpu

Cloud/Remote slide processing

Mussel can process slides stored in cloud or remote object stores via the tiffslide and fsspec packages. To configure Mussel properly for this use case, ensure that you:

  • Install additional packages via uv sync --extra remote
  • Have a valid cloud profile set up on your machine (e.g. you have an access key and secret key for your profile stored in your ~/.aws/credentials)
  • Have a valid fsspec configuration in your ~/.config/fsspec/ directory (e.g. a ~/.config/fsspec/s3.json file with the profile set to the profile defined in ~/.aws/credentials and all required client_kwargs specified)
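For orientation, the sketch below builds a minimal fsspec configuration of the kind that would be saved as ~/.config/fsspec/s3.json; fsspec reads each JSON file in that directory as a {protocol: storage_options} mapping. The profile name and region here are placeholders — substitute your own values from ~/.aws/credentials:

```python
import json

# Sketch of a minimal fsspec config for S3 (placeholder values).
s3_config = {
    "s3": {
        "profile": "my-aws-profile",  # must match a profile in ~/.aws/credentials
        "client_kwargs": {"region_name": "us-east-1"},
    }
}

# Normally you would write this to ~/.config/fsspec/s3.json.
print(json.dumps(s3_config, indent=2))
```
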

Run unit tests

Make sure that the dev dependencies are installed. (They should be installed by default.)

uv run pytest tests

Create conda environment

To install this module into an existing Python environment, activate that environment and install Mussel and its extra dependencies with, for example:

uv pip install .[torch-gpu]

Command-line interface

Mussel provides a set of CLI tools for tiling whole-slide images, working with tiled slides, and generating feature embeddings with pathology foundation models. The tools currently available are:

  • tessellate - tiling and foreground detection of whole-slide images
  • tessellate_extract_features - combined tiling + feature extraction pipeline; supports batch processing from a directory
  • extract_features - extract features from whole-slide images (WSIs) using a foundation model
  • create_class_embeddings - generate tissue-type embeddings for classifying tiles
  • annotate - annotate tiles with tissue types
  • cache_tiles - save tile information in an efficient form for training
  • export_tiles - export tiles as individual .png files using an HDF5 tile-coordinate manifest
  • filter_features - filter features using a classifier model
  • merge_annotation_features - merge tile features with annotations from a BMP file
  • linear_probe_benchmark - benchmark a linear probe classifier on features extracted from a slide
  • save_model - download and save a foundation model locally
  • convert - convert whole-slide images to pyramidal TIFF format (single file or batch)

These are described, with examples, in the accompanying document README-commands.md.

License

This code is made available under the GPLv3 license for non-commercial academic purposes. Forked from CLAM, © Mahmood Lab.

Reference

Please cite the original CLAM paper:

Lu, M.Y., Williamson, D.F.K., Chen, T.Y. et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat Biomed Eng 5, 555–570 (2021). https://doi.org/10.1038/s41551-020-00682-w

@article{lu2021data,
  title={Data-efficient and weakly supervised computational pathology on whole-slide images},
  author={Lu, Ming Y and Williamson, Drew FK and Chen, Tiffany Y and Chen, Richard J and Barbieri, Matteo and Mahmood, Faisal},
  journal={Nature Biomedical Engineering},
  volume={5},
  number={6},
  pages={555--570},
  year={2021},
  publisher={Nature Publishing Group}
}
