Skip to content

feat: aggregate_sample_features saves .pt tensor alongside H5#112

Merged
raylim merged 1 commit intomainfrom
feat/aggregate-save-pt
Apr 3, 2026
Merged

feat: aggregate_sample_features saves .pt tensor alongside H5#112
raylim merged 1 commit intomainfrom
feat/aggregate-save-pt

Conversation

@raylim
Copy link
Copy Markdown
Contributor

@raylim raylim commented Apr 2, 2026

  • Update desc_doc in aggregate_sample_features.py to mention H5 and PT tensor outputs
  • Add opt-out example (save_pt=false) to CLI help text
  • Move new parameters (save_pt, output_pt_suffix) to end of signature for backward compatibility (resolved)
  • Handle stale PT file when save_pt=False (resolved)

Copilot AI review requested due to automatic review settings April 2, 2026 21:45
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends aggregate_sample_features to save a PyTorch .pt tensor alongside the existing per-sample HDF5 output, enabling downstream consumers (e.g., mussel-nf slide-level inputs) to consume features without an extra conversion step.

Changes:

  • Add optional .pt output to aggregate_sample_features via save_pt (default True) and output_pt_suffix.
  • Expose save_pt / output_pt_suffix through the aggregate_sample_features CLI config and pass-through.
  • Expand tests to assert .pt outputs exist and have expected shapes, plus add an opt-out test for save_pt=False.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
mussel/utils/feature_extract.py Add optional .pt writing for aggregated sample features.
mussel/cli/aggregate_sample_features.py Surface new .pt-output options in the CLI config and wire them into the call.
tests/mussel/cli/test_aggregate_sample_features.py Add .pt existence/shape assertions and a save_pt=False test.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +24 to +30
output_dir (str): Directory where output files are written.
output_h5_suffix (str): Filename suffix for H5 output files (default "features.h5").
Each sample writes to "{output_dir}/{sample_id}.{output_h5_suffix}".
output_pt_suffix (str): Filename suffix for PT output files (default "features.pt").
Only used when save_pt=True.
save_pt (bool): Whether to also save a PyTorch .pt tensor alongside the H5
(default True).
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CLI help text in desc_doc below still says it “writes one output H5 per unique sample”, but the default behavior now also writes a .pt tensor unless save_pt=false. Please update the CLI description/help output to mention the additional PT file and how to opt out, so the CLI’s usage docs match the actual outputs.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot apply changes based on this feedback

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in a761faa. The desc_doc now says "writes one output H5 and one output PT tensor per unique sample" and includes an H5-only example using save_pt=false.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already addressed during the rebase onto #115desc_doc now reads "writes one output H5 and one output PT tensor per unique sample. To write only the H5 (no PT file), pass save_pt=false" and includes an H5-only example.

@raylim raylim force-pushed the feat/aggregate-save-pt branch from a761faa to 7381074 Compare April 3, 2026 17:04
@raylim raylim changed the base branch from main to refactor/aggregate-sample-features-use-objects April 3, 2026 17:04
Add save_pt=True (default) and output_pt_suffix='features.pt' parameters
to aggregate_sample_features and AggregateSampleFeaturesConfig.

Motivation: downstream pipelines (mussel-nf MERGE_SAMPLE_FEATURES) need
a .pt tensor for slide-level model compatibility. Previously callers had
to manually convert H5 → PT; this makes it native.

Changes:
- mussel/utils/feature_extract.py: add save_pt, output_pt_suffix params;
  save torch.from_numpy(features) via save_torch_tensor when save_pt=True
- mussel/cli/aggregate_sample_features.py: expose save_pt, output_pt_suffix
  in AggregateSampleFeaturesConfig and pass through to the function
- tests: add PT output assertions to existing tests; add
  test_aggregate_sample_features_save_pt_false for opt-out path

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@raylim raylim changed the base branch from refactor/aggregate-sample-features-use-objects to main April 3, 2026 17:06
@raylim raylim force-pushed the feat/aggregate-save-pt branch from 7381074 to 88d5881 Compare April 3, 2026 17:06
@raylim raylim merged commit 46e4b23 into main Apr 3, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants