feat: add model= parameter to get_features for pre-loaded model reuse#114

Merged

raylim merged 3 commits intomainfrom

feat/get-features-preloaded-model

Apr 3, 2026

Contributor

raylim commented Apr 3, 2026

Summary

Adds an optional model parameter to get_features(). When provided, the pre-loaded model instance is used directly and model_type/model_path are ignored. When model=None (default), behaviour is completely unchanged.

Motivation

Batch workflows that process many slides need to load a feature extractor once and reuse it. Without this parameter, get_features() reloads the model from disk on every call — for 10 slides with CTransPath + Optimus that's 20 unnecessary model loads.

The parameter was present in the mosaic-dev branch but was dropped when v1.3.0 was cut from main, breaking downstream callers (mosaic) that relied on it.

Changes

mussel/utils/feature_extract.py — get_features():

Add model=None parameter after model_path
Wrap model-loading logic in if model is None: block
Add else: logger.info("using pre-loaded model") branch
Move preprocessing = model.get_preprocessing_fun() outside the conditional (needed in both paths)
Document the new parameter in the docstring

Backward compatibility

Default is None, so all existing call-sites are unaffected.

Example usage

# Load once
factory = get_model_factory(ModelType.CTRANSPATH)
model = factory.get_model(model_path, use_gpu=True)

# Reuse across slides — no repeated disk I/O
for slide_path in slides:
    features, _ = get_features(coords, slide_path, attrs, model=model)


          feat: add model= parameter to get_features for pre-loaded model reuse

e6316e9

Adds an optional `model` parameter to `get_features()`. When provided,
the pre-loaded model instance is used directly and `model_type`/`model_path`
are ignored. When `model=None` (default), behaviour is unchanged.

This allows batch workflows to load a model once and reuse it across
many slides, avoiding repeated disk I/O and model initialisation overhead.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings

April 3, 2026 15:21

Copilot started reviewing on behalf of raylim

April 3, 2026 15:22

Copilot AI reviewed

View reviewed changes

Contributor

Copilot AI left a comment

Pull request overview

This PR adds support for reusing a pre-loaded feature-extractor model by introducing an optional model parameter to get_features(), enabling batch workflows to avoid repeated model loads.

Changes:

Add model=None parameter to get_features() to allow passing a pre-loaded model instance.
Conditionalize model-loading logic so it only runs when model is None.
Update get_features() docstring to document the new parameter.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

mussel/utils/feature_extract.py Show resolved Hide resolved

mussel/utils/feature_extract.py Outdated

Comment on lines 872 to 883

    
                      # Auto-set aggregation_method to "model" if slide_model_type is specified

                      if (

                          use_slide_encoder

                          and slide_model_type is not None

                          and aggregation_method != "model"

                      ):

                          logger.info(

                              f"Auto-selecting patch encoder {required_patch_encoder} "

                              f"as required by slide encoder {slide_model_type}"

                              f"Auto-setting aggregation_method to 'model' since slide_model_type "

                              f"({slide_model_type}) is specified"

                          )

                          model_type = required_patch_encoder

                      # Validate compatibility

                      validate_slide_encoder_compatibility(model_type, slide_model_type)

                          aggregation_method = "model"

Copilot AI Apr 3, 2026

aggregation_method auto-setting based on slide_model_type is now only executed when model is None. If a caller passes a pre-loaded model with use_slide_encoder=True and slide_model_type set, the function will no longer auto-switch aggregation_method to "model", changing behavior compared to the non-preloaded path. Consider running this auto-setting logic regardless of whether the patch model is preloaded.

Copilot uses AI. Check for mistakes.

Contributor Author

raylim Apr 3, 2026

Fixed — the aggregation_method auto-set block is now outside the if model is None: guard and runs unconditionally.

mussel/utils/feature_extract.py Outdated

Comment on lines +884 to +899

+                      # Auto-infer patch encoder from slide encoder if using model-based aggregation
+                      if (
+                          use_slide_encoder
+                          and aggregation_method == "model"
+                          and slide_model_type is not None
+                      ):
+                          required_patch_encoder = get_required_patch_encoder(slide_model_type)
+                          if model_type != required_patch_encoder:
+                              logger.info(
+                                  f"Auto-selecting patch encoder {required_patch_encoder} "
+                                  f"as required by slide encoder {slide_model_type}"
+                              )
+                              model_type = required_patch_encoder
+                          # Validate compatibility
+                          validate_slide_encoder_compatibility(model_type, slide_model_type)

Copilot AI Apr 3, 2026

Slide-encoder compatibility validation and required patch-encoder inference are also skipped when a pre-loaded model is provided, which can allow an incompatible patch encoder to slip through (leading to harder-to-debug failures later). Consider still calling validate_slide_encoder_compatibility(...) when use_slide_encoder + slide_model_type indicates a required patch encoder, even if you don't auto-load/override the patch model.

Copilot uses AI. Check for mistakes.

Contributor Author

raylim Apr 3, 2026

Fixed — validate_slide_encoder_compatibility is now called unconditionally whenever use_slide_encoder + aggregation_method="model" + slide_model_type are set, regardless of whether the patch model is pre-loaded. When loading from disk we still auto-infer model_type first; with a pre-loaded model we validate against the caller-supplied model_type.

mussel/utils/feature_extract.py Show resolved Hide resolved

mussel/utils/feature_extract.py Outdated

Comment on lines 866 to 906

    
                  if model is None:

                      logger.info("loading model checkpoint")

                  # Auto-set aggregation_method to "model" if slide_model_type is specified

                  if (

                      use_slide_encoder

                      and slide_model_type is not None

                      and aggregation_method != "model"

                  ):

                      logger.info(

                          f"Auto-setting aggregation_method to 'model' since slide_model_type "

                          f"({slide_model_type}) is specified"

                      )

                      aggregation_method = "model"

                      if gpu_device_ids:

                          gpu_device_id = gpu_device_ids

                  # Auto-infer patch encoder from slide encoder if using model-based aggregation

                  if (

                      use_slide_encoder

                      and aggregation_method == "model"

                      and slide_model_type is not None

                  ):

                      required_patch_encoder = get_required_patch_encoder(slide_model_type)

                      if model_type != required_patch_encoder:

                      # Auto-set aggregation_method to "model" if slide_model_type is specified

                      if (

                          use_slide_encoder

                          and slide_model_type is not None

                          and aggregation_method != "model"

                      ):

                          logger.info(

                              f"Auto-selecting patch encoder {required_patch_encoder} "

                              f"as required by slide encoder {slide_model_type}"

                              f"Auto-setting aggregation_method to 'model' since slide_model_type "

                              f"({slide_model_type}) is specified"

                          )

                          model_type = required_patch_encoder

                      # Validate compatibility

                      validate_slide_encoder_compatibility(model_type, slide_model_type)

                          aggregation_method = "model"

                  model_factory = get_model_factory(model_type)

                  if model_factory is None:

                      raise ValueError("model not recognized")

                  model = model_factory.get_model(model_path, use_gpu, gpu_device_id)

                      # Auto-infer patch encoder from slide encoder if using model-based aggregation

                      if (

                          use_slide_encoder

                          and aggregation_method == "model"

                          and slide_model_type is not None

                      ):

                          required_patch_encoder = get_required_patch_encoder(slide_model_type)

                          if model_type != required_patch_encoder:

                              logger.info(

                                  f"Auto-selecting patch encoder {required_patch_encoder} "

                                  f"as required by slide encoder {slide_model_type}"

                              )

                              model_type = required_patch_encoder

                          # Validate compatibility

                          validate_slide_encoder_compatibility(model_type, slide_model_type)

                      model_factory = get_model_factory(model_type)

                      if model_factory is None:

                          raise ValueError("model not recognized")

                      model = model_factory.get_model(model_path, use_gpu, gpu_device_id)

                  else:

                      logger.info("using pre-loaded model")

                  preprocessing = model.get_preprocessing_fun()

Copilot AI Apr 3, 2026

There are existing unit tests for mussel.utils.feature_extract (e.g., tests/mussel/utils/test_feature_extract.py), but none cover the new model reuse path in get_features(). Please add tests that (1) ensure get_model_factory(...).get_model(...) is not called when model is provided and (2) confirm positional argument behavior remains correct for existing call patterns.

Copilot uses AI. Check for mistakes.

Contributor Author

raylim Apr 3, 2026

Added tests/mussel/utils/test_feature_extract.py with 7 new tests covering: get_model_factory not called when model= provided; get_model_factory called when it isn't; positional args unchanged (batch_size not mis-interpreted); TypeError on bad model/slide_model interface; slide encoder factory not called when slide_model= provided; validate_slide_encoder_compatibility called even with pre-loaded patch model.

raylim and others added 2 commits

April 3, 2026 09:03


          feat: add model/slide_model params, fix compat validation, keyword-on…

9200db3

…ly placement

- Move model= and new slide_model= to end of get_features() signature so
  positional callers are unaffected (addresses review comment)
- Add slide_model= to _apply_slide_aggregation() for consistency
- Move aggregation_method auto-set and compatibility validation outside
  the model-loading branch so they run for pre-loaded models too
- Validate pre-loaded model interfaces up-front with clear TypeError
- validate_slide_encoder_compatibility() now always called when
  use_slide_encoder + slide_model_type are set

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>


          test: add tests for model/slide_model pre-loaded params in get_features

5314bb8

Covers:
- get_model_factory not called when model= provided
- get_model_factory called when model= not provided
- positional args unchanged (batch_size not interpreted as model)
- TypeError on invalid model/slide_model interface
- slide encoder factory not called when slide_model= provided
- validate_slide_encoder_compatibility called even with pre-loaded model

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

raylim merged commit 8578f52 into main

4 checks passed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet