Skip to content

feat: add model= parameter to get_features for pre-loaded model reuse#114

Merged
raylim merged 3 commits intomainfrom
feat/get-features-preloaded-model
Apr 3, 2026
Merged

feat: add model= parameter to get_features for pre-loaded model reuse#114
raylim merged 3 commits intomainfrom
feat/get-features-preloaded-model

Conversation

@raylim
Copy link
Copy Markdown
Contributor

@raylim raylim commented Apr 3, 2026

Summary

Adds an optional model parameter to get_features(). When provided, the pre-loaded model instance is used directly and model_type/model_path are ignored. When model=None (default), behaviour is completely unchanged.

Motivation

Batch workflows that process many slides need to load a feature extractor once and reuse it. Without this parameter, get_features() reloads the model from disk on every call — for 10 slides with CTransPath + Optimus that's 20 unnecessary model loads.

The parameter was present in the mosaic-dev branch but was dropped when v1.3.0 was cut from main, breaking downstream callers (mosaic) that relied on it.

Changes

mussel/utils/feature_extract.pyget_features():

  • Add model=None parameter after model_path
  • Wrap model-loading logic in if model is None: block
  • Add else: logger.info("using pre-loaded model") branch
  • Move preprocessing = model.get_preprocessing_fun() outside the conditional (needed in both paths)
  • Document the new parameter in the docstring

Backward compatibility

Default is None, so all existing call-sites are unaffected.

Example usage

# Load once
factory = get_model_factory(ModelType.CTRANSPATH)
model = factory.get_model(model_path, use_gpu=True)

# Reuse across slides — no repeated disk I/O
for slide_path in slides:
    features, _ = get_features(coords, slide_path, attrs, model=model)

Adds an optional `model` parameter to `get_features()`. When provided,
the pre-loaded model instance is used directly and `model_type`/`model_path`
are ignored. When `model=None` (default), behaviour is unchanged.

This allows batch workflows to load a model once and reuse it across
many slides, avoiding repeated disk I/O and model initialisation overhead.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 3, 2026 15:21
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds support for reusing a pre-loaded feature-extractor model by introducing an optional model parameter to get_features(), enabling batch workflows to avoid repeated model loads.

Changes:

  • Add model=None parameter to get_features() to allow passing a pre-loaded model instance.
  • Conditionalize model-loading logic so it only runs when model is None.
  • Update get_features() docstring to document the new parameter.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 872 to 883
# Auto-set aggregation_method to "model" if slide_model_type is specified
if (
use_slide_encoder
and slide_model_type is not None
and aggregation_method != "model"
):
logger.info(
f"Auto-selecting patch encoder {required_patch_encoder} "
f"as required by slide encoder {slide_model_type}"
f"Auto-setting aggregation_method to 'model' since slide_model_type "
f"({slide_model_type}) is specified"
)
model_type = required_patch_encoder
# Validate compatibility
validate_slide_encoder_compatibility(model_type, slide_model_type)
aggregation_method = "model"

Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aggregation_method auto-setting based on slide_model_type is now only executed when model is None. If a caller passes a pre-loaded model with use_slide_encoder=True and slide_model_type set, the function will no longer auto-switch aggregation_method to "model", changing behavior compared to the non-preloaded path. Consider running this auto-setting logic regardless of whether the patch model is preloaded.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — the aggregation_method auto-set block is now outside the if model is None: guard and runs unconditionally.

Comment on lines +884 to +899
# Auto-infer patch encoder from slide encoder if using model-based aggregation
if (
use_slide_encoder
and aggregation_method == "model"
and slide_model_type is not None
):
required_patch_encoder = get_required_patch_encoder(slide_model_type)
if model_type != required_patch_encoder:
logger.info(
f"Auto-selecting patch encoder {required_patch_encoder} "
f"as required by slide encoder {slide_model_type}"
)
model_type = required_patch_encoder
# Validate compatibility
validate_slide_encoder_compatibility(model_type, slide_model_type)

Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slide-encoder compatibility validation and required patch-encoder inference are also skipped when a pre-loaded model is provided, which can allow an incompatible patch encoder to slip through (leading to harder-to-debug failures later). Consider still calling validate_slide_encoder_compatibility(...) when use_slide_encoder + slide_model_type indicates a required patch encoder, even if you don't auto-load/override the patch model.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — validate_slide_encoder_compatibility is now called unconditionally whenever use_slide_encoder + aggregation_method="model" + slide_model_type are set, regardless of whether the patch model is pre-loaded. When loading from disk we still auto-infer model_type first; with a pre-loaded model we validate against the caller-supplied model_type.

Comment on lines 866 to 906
if model is None:
logger.info("loading model checkpoint")

# Auto-set aggregation_method to "model" if slide_model_type is specified
if (
use_slide_encoder
and slide_model_type is not None
and aggregation_method != "model"
):
logger.info(
f"Auto-setting aggregation_method to 'model' since slide_model_type "
f"({slide_model_type}) is specified"
)
aggregation_method = "model"
if gpu_device_ids:
gpu_device_id = gpu_device_ids

# Auto-infer patch encoder from slide encoder if using model-based aggregation
if (
use_slide_encoder
and aggregation_method == "model"
and slide_model_type is not None
):
required_patch_encoder = get_required_patch_encoder(slide_model_type)
if model_type != required_patch_encoder:
# Auto-set aggregation_method to "model" if slide_model_type is specified
if (
use_slide_encoder
and slide_model_type is not None
and aggregation_method != "model"
):
logger.info(
f"Auto-selecting patch encoder {required_patch_encoder} "
f"as required by slide encoder {slide_model_type}"
f"Auto-setting aggregation_method to 'model' since slide_model_type "
f"({slide_model_type}) is specified"
)
model_type = required_patch_encoder
# Validate compatibility
validate_slide_encoder_compatibility(model_type, slide_model_type)
aggregation_method = "model"

model_factory = get_model_factory(model_type)
if model_factory is None:
raise ValueError("model not recognized")
model = model_factory.get_model(model_path, use_gpu, gpu_device_id)
# Auto-infer patch encoder from slide encoder if using model-based aggregation
if (
use_slide_encoder
and aggregation_method == "model"
and slide_model_type is not None
):
required_patch_encoder = get_required_patch_encoder(slide_model_type)
if model_type != required_patch_encoder:
logger.info(
f"Auto-selecting patch encoder {required_patch_encoder} "
f"as required by slide encoder {slide_model_type}"
)
model_type = required_patch_encoder
# Validate compatibility
validate_slide_encoder_compatibility(model_type, slide_model_type)

model_factory = get_model_factory(model_type)
if model_factory is None:
raise ValueError("model not recognized")
model = model_factory.get_model(model_path, use_gpu, gpu_device_id)
else:
logger.info("using pre-loaded model")
preprocessing = model.get_preprocessing_fun()
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are existing unit tests for mussel.utils.feature_extract (e.g., tests/mussel/utils/test_feature_extract.py), but none cover the new model reuse path in get_features(). Please add tests that (1) ensure get_model_factory(...).get_model(...) is not called when model is provided and (2) confirm positional argument behavior remains correct for existing call patterns.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added tests/mussel/utils/test_feature_extract.py with 7 new tests covering: get_model_factory not called when model= provided; get_model_factory called when it isn't; positional args unchanged (batch_size not mis-interpreted); TypeError on bad model/slide_model interface; slide encoder factory not called when slide_model= provided; validate_slide_encoder_compatibility called even with pre-loaded patch model.

raylim and others added 2 commits April 3, 2026 09:03
…ly placement

- Move model= and new slide_model= to end of get_features() signature so
  positional callers are unaffected (addresses review comment)
- Add slide_model= to _apply_slide_aggregation() for consistency
- Move aggregation_method auto-set and compatibility validation outside
  the model-loading branch so they run for pre-loaded models too
- Validate pre-loaded model interfaces up-front with clear TypeError
- validate_slide_encoder_compatibility() now always called when
  use_slide_encoder + slide_model_type are set

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Covers:
- get_model_factory not called when model= provided
- get_model_factory called when model= not provided
- positional args unchanged (batch_size not interpreted as model)
- TypeError on invalid model/slide_model interface
- slide encoder factory not called when slide_model= provided
- validate_slide_encoder_compatibility called even with pre-loaded model

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@raylim raylim merged commit 8578f52 into main Apr 3, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants