feat: aggregate_sample_features saves .pt tensor alongside H5#112
feat: aggregate_sample_features saves .pt tensor alongside H5#112
Conversation
There was a problem hiding this comment.
Pull request overview
This PR extends aggregate_sample_features to save a PyTorch .pt tensor alongside the existing per-sample HDF5 output, enabling downstream consumers (e.g., mussel-nf slide-level inputs) to consume features without an extra conversion step.
Changes:
- Add optional
.ptoutput toaggregate_sample_featuresviasave_pt(defaultTrue) andoutput_pt_suffix. - Expose
save_pt/output_pt_suffixthrough theaggregate_sample_featuresCLI config and pass-through. - Expand tests to assert
.ptoutputs exist and have expected shapes, plus add an opt-out test forsave_pt=False.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
mussel/utils/feature_extract.py |
Add optional .pt writing for aggregated sample features. |
mussel/cli/aggregate_sample_features.py |
Surface new .pt-output options in the CLI config and wire them into the call. |
tests/mussel/cli/test_aggregate_sample_features.py |
Add .pt existence/shape assertions and a save_pt=False test. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| output_dir (str): Directory where output files are written. | ||
| output_h5_suffix (str): Filename suffix for H5 output files (default "features.h5"). | ||
| Each sample writes to "{output_dir}/{sample_id}.{output_h5_suffix}". | ||
| output_pt_suffix (str): Filename suffix for PT output files (default "features.pt"). | ||
| Only used when save_pt=True. | ||
| save_pt (bool): Whether to also save a PyTorch .pt tensor alongside the H5 | ||
| (default True). |
There was a problem hiding this comment.
The CLI help text in desc_doc below still says it “writes one output H5 per unique sample”, but the default behavior now also writes a .pt tensor unless save_pt=false. Please update the CLI description/help output to mention the additional PT file and how to opt out, so the CLI’s usage docs match the actual outputs.
There was a problem hiding this comment.
@copilot apply changes based on this feedback
There was a problem hiding this comment.
Done in a761faa. The desc_doc now says "writes one output H5 and one output PT tensor per unique sample" and includes an H5-only example using save_pt=false.
There was a problem hiding this comment.
Already addressed during the rebase onto #115 — desc_doc now reads "writes one output H5 and one output PT tensor per unique sample. To write only the H5 (no PT file), pass save_pt=false" and includes an H5-only example.
a761faa to
7381074
Compare
Add save_pt=True (default) and output_pt_suffix='features.pt' parameters to aggregate_sample_features and AggregateSampleFeaturesConfig. Motivation: downstream pipelines (mussel-nf MERGE_SAMPLE_FEATURES) need a .pt tensor for slide-level model compatibility. Previously callers had to manually convert H5 → PT; this makes it native. Changes: - mussel/utils/feature_extract.py: add save_pt, output_pt_suffix params; save torch.from_numpy(features) via save_torch_tensor when save_pt=True - mussel/cli/aggregate_sample_features.py: expose save_pt, output_pt_suffix in AggregateSampleFeaturesConfig and pass through to the function - tests: add PT output assertions to existing tests; add test_aggregate_sample_features_save_pt_false for opt-out path Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
7381074 to
88d5881
Compare
desc_docinaggregate_sample_features.pyto mention H5 and PT tensor outputssave_pt=false) to CLI help textsave_pt,output_pt_suffix) to end of signature for backward compatibility (resolved)save_pt=False(resolved)