Support channel selection for multichannel CZI files#95
Conversation
Support channel selection for multichannel CZI files
|
|
||
| num_frames = len(time_stamps) | ||
|
|
||
| # CZI files may contain singleton dimensions and/or channel dimensions. | ||
| # The MPS convention is frames.shape == (x, y, time). | ||
| # | ||
| # For example, some CZI files are loaded as: | ||
| # (y, channel, x, time) | ||
| # or: | ||
| # (time, channel, y, x) | ||
| # | ||
| # This block detects the time axis from the number of timestamps, | ||
| # removes a channel axis if present, and then moves the time axis last. | ||
|
|
||
| if images.ndim < 3: | ||
| raise ValueError( | ||
| f"Unexpected CZI image shape {images.shape}. Expected at least a 3D movie." | ||
| ) | ||
|
|
||
| # Detect time axis using timestamps | ||
| time_axes = [i for i, size in enumerate(images.shape) if size == num_frames] | ||
|
|
||
| if len(time_axes) == 0: | ||
| raise ValueError( | ||
| f"Could not identify time axis in CZI file. " | ||
| f"Image shape is {images.shape}, but number of timestamps is {num_frames}." | ||
| ) | ||
|
|
||
| time_axis = time_axes[-1] | ||
|
|
||
| channel_axes = [ | ||
| i for i, size in enumerate(images.shape) if i != time_axis and size in (2, 3, 4) | ||
| ] | ||
|
|
||
| if images.ndim == 4: | ||
| if len(channel_axes) == 0: | ||
| raise ValueError( | ||
| f"Detected 4D CZI data with shape {images.shape}, " | ||
| "but could not identify a channel axis." | ||
| ) | ||
|
|
||
| channel_axis = channel_axes[0] | ||
| num_channels = images.shape[channel_axis] | ||
|
|
||
| if channel is None: | ||
| raise ValueError( | ||
| f"This CZI file contains {num_channels} channels. " | ||
| f"Available channels are 0 to {num_channels - 1}. " | ||
| "Please select one channel using the --channel option, for example: " | ||
| "mps2mp4 Control_1.czi --channel 1" | ||
| ) | ||
|
|
||
| if channel < 0 or channel >= num_channels: | ||
| raise ValueError( | ||
| f"Requested channel {channel}, but available channels are 0 to {num_channels - 1}." | ||
| ) | ||
|
|
||
| logger.info( | ||
| f"Detected multichannel CZI data with shape {images.shape}. " | ||
| f"Selecting channel {channel} from axis {channel_axis}." | ||
| ) | ||
|
|
||
| images = np.take(images, indices=channel, axis=channel_axis) | ||
|
|
||
| # Recompute time axis after removing the channel axis | ||
| time_axes = [i for i, size in enumerate(images.shape) if size == num_frames] | ||
| if len(time_axes) == 0: | ||
| raise ValueError( | ||
| f"Could not identify time axis after selecting channel. " | ||
| f"Image shape is {images.shape}, num_frames is {num_frames}." | ||
| ) | ||
| time_axis = time_axes[-1] | ||
|
|
||
| elif images.ndim != 3: | ||
| raise ValueError( | ||
| f"Unexpected CZI image shape {images.shape}. " | ||
| "Expected 3D data after squeezing or 4D data with channels." | ||
| ) | ||
|
|
||
| # Move time axis to the last axis | ||
| if time_axis != images.ndim - 1: | ||
| images = np.moveaxis(images, time_axis, -1) | ||
|
|
||
| # Now images should be spatial_x/spatial_y/time, but CZI often gives y/x/time. | ||
| # Match the MPS convention: frames.shape == (x, y, time) | ||
| frames = np.swapaxes(images, 0, 1) |
There was a problem hiding this comment.
This seems to be repeating the code in _format_czi_frames. Should we use this function instead?
|
|
||
|
|
||
| def load_file(fname, ext): | ||
| def load_file(fname, ext, channel=None): |
There was a problem hiding this comment.
Feel free to add some type annotations here (i.e fname should be Union[str, Path], ext should be str and channel should be Optional[int]).
|
|
||
|
|
||
| def test_format_czi_frames_selects_channel_from_4d_data(): | ||
| num_frames = 5 |
There was a problem hiding this comment.
This is a good example because your implementation for _format_czi_frames assumes that the channel has length between 0 and 10. A potential corner case here would be when one e.g have (y, channel, x, time) with num_frames = num_channels. Then the second axis will be treated as the time axis. I think the chances for this to happen is quite small, but perhaps one should do a check for this? At least a tests should be implemented that tests this corner case.
|
|
||
| from mps.load import _format_czi_frames | ||
|
|
||
|
|
There was a problem hiding this comment.
Also add a test for the standard 3D data, and it would also be good to add a test for the full load_czi function. Here you could e.g use a Mock for czifile.CziFile
|
Thanks @finsberg I have addressed the review comments, added the extra CZI tests, fixed the duplicated load_czi logic, and updated the type annotations/path handling. I have also run pytest and pre-commit locally. I requested a re-review now. |
| fname = Path(fname) | ||
| if ext == ".czi": | ||
| data = load_czi(fname) | ||
| data = load_czi(str(fname), channel=channel) |
There was a problem hiding this comment.
Why do we need to convert the filename to a string?
There was a problem hiding this comment.
Because load_file now converts fname to a Path object for type consistency, but the bundled czifile.CziFile does not seem to accept Path objects directly.
When I tested the real command:
mps2mp4 Control_1.czi --channel 1
passing the Path object to load_czi caused:
ValueError: The first parameter must be a file name, seekable binary stream, or FileHandle
Converting only the CZI path back to str before calling load_czi fixes this runtime issue while keeping the Path conversion for the other loaders.
|
|
||
| return MPSData( | ||
| frames=np.swapaxes(images, 0, -1), | ||
| frames=frames, |
There was a problem hiding this comment.
You are missing the swapaxes are you sure this is the same behavior as before?
There was a problem hiding this comment.
Good point. The swapaxes is now handled inside _format_czi_frames.
For the standard 3D CZI case (time, y, x), the previous code:
np.swapaxes(images, 0, -1)
produced (x, y, time).
The new helper first moves the detected time axis to the last position and then applies:
frames = np.swapaxes(images, 0, 1)
so the final output is still (x, y, time). I also added a standard 3D test to check this behavior.
| def test_load_czi_with_mocked_czifile(): | ||
| num_frames = 5 | ||
| y = 20 | ||
| x = 30 |
If the CZI file contains multiple channels and the user does not specify which channel to use, the loader raises a clear error message:
ValueError: This CZI file contains 2 channels. Available channels are 0 to 1. Please select one channel using the --channel option, for example: mps2mp4 Control_1.czi --channel 1
The user can then select any available channel explicitly, for example:
For single-channel CZI files, no channel argument is required, and the file is loaded normally without raising an error.
Issue #94