POLARIX: POLE Analysis through Rapid Image-based Screening in Endometrial Cancer

POLARIX is a deep learning model that predicts POLE mutation status directly from hematoxylin and eosin (H&E) whole slide images (WSIs) in endometrial cancer. It combines calibrated decision thresholds with explainable AI methods to enable molecular screening for endometrial cancer, even in resource-constrained environments.

The model has been validated across multiple independent cohorts with AUROC > 0.95.

Model Overview

Architecture: Attention-based Multiple Instance Learning (MIL)

Training data: Three randomized trials, six retrospective clinical cohorts, and two public datasets—the largest EC database compiled to date

Task: Binary classification (POLE-mutant vs POLE-wildtype)

Explainability: Attention heatmaps pinpoint morphologic features associated with POLE mutations

Output: Calibrated prediction scores with LOW, MID, and HIGH decision thresholds for flexible deployment

Installation

Set up a conda environment and install dependencies:

conda create -n polarix python=3.10
conda activate polarix
pip install -r requirements.txt

Usage

Preprocessing Whole-Slide Images with H-optimus-1

To extract tile-level features from WSIs for training and inference, use extract_features.py.

The model weights are downloaded directly from Hugging Face (bioptimus/H-optimus-1). You'll need to accept the model terms on Hugging Face and log in using huggingface-cli login, or provide a token via the HUGGINGFACE_TOKEN environment variable or --hf_token flag.

python -u ./extract_features.py \
  --slide <slide> \
  --output_dir <output_dir> \
  --batch_size 16 \
  --workers 4

Arguments:

--slide: Path to the input WSI (.mrxs, .tiff, .svs, etc.)
--output_dir: Directory to save .h5 feature files
--hf_token: Hugging Face token (optional; use if you don't have a cached login)
--tile_size: Tile size in micrometers or pixels
--batch_size: Number of tiles processed simultaneously
--workers: Number of data loading threads

Output: Feature files named slideID_features.h5

Training POLARIX

Train the model using extracted feature bags and a manifest CSV. Feature embeddings must be 1536-dimensional (matching H-optimus-1 output).

Your manifest should look like:

slide_id,label,split
case001,1,train
case002,0,train
case003,1,val

The train.py script is set up so that you can evaluate multiple hyperparameter sets in parallel simply by running the script with a different --hp parameter.

Example command:

python train.py \
  --manifest manifest.csv \
  --data_dir data/hoptimus1_features/ \
  --workers 4 \
  --hp 1 \
  --output_dir runs/polarix

Key arguments:

--manifest: CSV with slides, labels, and splits
--data_dir: Folder containing .h5 feature files
--hp: Hyperparameter ID
--output_dir: Destination for checkpoints, predictions, and TensorBoard logs (defaults to ./runs/final)

Model checkpoints, Platt scaler artifacts, predictions, and TensorBoard logs are saved to --output_dir.

Inference

Generate POLE mutation predictions and calibrated probabilities:

python ./inference.py \
  --manifest_test manifest_test.csv \
  --checkpoint checkpoints/POLARIX.pt \
  --checkpoint_platt_model checkpoints/POLARIX_PlattScaler.pkl \
  --data_features_dir data/hoptimus1_features_180um_rawweight \
  --workers 4

Arguments:

--manifest_test: CSV listing slides for testing
--checkpoint: Path to trained model checkpoint
--checkpoint_platt_model: Platt scaler for calibration
--data_features_dir: Directory with feature .h5 files
--workers: Number of data-loading workers

Output is saved as predictions.csv with probabilities and calibrated scores.

Explainability

POLARIX uses attention-based interpretability to ensure predictions are grounded in biologically and histologically meaningful regions.

Visualize attention heatmaps for a specific slide (requires the corresponding feature file and model checkpoint):

python heatmap.py \
  --slide data/slides/<slide_name>.svs \
  --features data/hoptimus1_features/<slide_name>_features.h5 \
  --checkpoint checkpoints/polarix.pt \
  --output_dir results/heatmaps/

Set --features to the .h5 feature bag produced by extract_features.py for the same slide.

This command now also emits <slide_id>_tiles.jsonl and <slide_id>_tissue.geojson beside the rendered heatmap. Each line in <slide_id>_tiles.jsonl is a GeoJSON Feature containing the tile polygon and its attention score, and <slide_id>_tissue.geojson contains the tissue mask as a Polygon or MultiPolygon. These files can be loaded into e.g. QuPath-like tooling or used for downstream analyses.

Or use the demo script, which generates a composite image with the slide ID, prediction, clinical recommendation (NGS or Rule-Out), and attention heatmap:

python demo.py \
  --slide data/slides/<slide_name>.svs \
  --features data/hoptimus1_features/<slide_name>_features.h5 \
  --checkpoint checkpoints/polarix.pt \
  --checkpoint_platt_model checkpoints/polarix_platt.pkl \
  --output_dir results/heatmaps/

For the demo script, set --features to the .h5 feature bag produced by extract_features.py for the same slide.

Citation

van den Berg et al., "Deep Learning-Based Screening for POLE mutations on Histopathology Slides in Endometrial Cancer", medRxiv 2026.02.06.26345335; doi: https://doi.org/10.64898/2026.02.06.26345335.

Acknowledgements

POLARIX was developed through an international collaboration. We're grateful to all participating centers, patients, and colleagues who contributed to model development and validation.

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
figures		figures
.gitignore		.gitignore
LICENSE		LICENSE
POLARIX.py		POLARIX.py
README.md		README.md
demo.py		demo.py
earlystop.py		earlystop.py
extract_features.py		extract_features.py
heatmap.py		heatmap.py
heatmap_utils.py		heatmap_utils.py
inference.py		inference.py
requirements.txt		requirements.txt
train.py		train.py
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

POLARIX: POLE Analysis through Rapid Image-based Screening in Endometrial Cancer

Model Overview

Installation

Usage

Preprocessing Whole-Slide Images with H-optimus-1

Training POLARIX

Inference

Explainability

Citation

Acknowledgements

About

Uh oh!

Contributors

Uh oh!

Languages

License

AIRMEC/POLARIX

Folders and files

Latest commit

History

Repository files navigation

POLARIX: POLE Analysis through Rapid Image-based Screening in Endometrial Cancer

Model Overview

Installation

Usage

Preprocessing Whole-Slide Images with H-optimus-1

Training POLARIX

Inference

Explainability

Citation

Acknowledgements

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages