
Function determining dataset store backing incorrectly assumes new numpy array is always created by reader #693

@yousefmoazzam

Description


For any section that is not the last section, the reader for section n+1 is created from the writer used in section n. Creating the reader only allocates a new numpy array if non-zero padding is required for section n+1:

```python
self._padding = (0, 0) if padding is None else padding
if self._padding != (0, 0) and not source.is_file_based:
    self._exchange_neighbourhoods()
```

```python
def _exchange_neighbourhoods(self):
    # we have the core of the chunk in RAM, but without the padding areas,
    # so we construct the full area with padding in RAM and exchange with MPI
    self._data = self._extend_data_for_padding(self._data)

def _extend_data_for_padding(self, core_data: np.ndarray) -> np.ndarray:
    padded_shape = list(self._chunk_shape)
    padded_shape[self.slicing_dim] += self._padding[0] + self._padding[1]
    padded_data = np.empty(padded_shape, self._data.dtype)
```
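To make the allocation behaviour concrete, here is a minimal, hypothetical sketch (the function `make_reader_data` and its parameters are illustrative, not the actual httomo API) of the two paths the reader can take: zero padding hands back the source array itself, while non-zero padding allocates a new, larger array.

```python
import numpy as np


def make_reader_data(source_data, chunk_shape, slicing_dim, padding):
    """Hypothetical sketch of the reader's handling of padding.

    With zero padding the source array is used as-is (no allocation);
    with non-zero padding a new, larger array is allocated.
    """
    if padding == (0, 0):
        return source_data  # plain reference, no copy made
    padded_shape = list(chunk_shape)
    padded_shape[slicing_dim] += padding[0] + padding[1]
    return np.empty(padded_shape, source_data.dtype)


chunk = np.zeros((8, 16, 16), dtype=np.float32)

unpadded = make_reader_data(chunk, chunk.shape, 0, (0, 0))
padded = make_reader_data(chunk, chunk.shape, 0, (2, 2))

print(unpadded is chunk)  # True: same array, no extra memory
print(padded.shape)       # (12, 16, 16): a new allocation
```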

However, the determine_store_backing() function assumes that creating the reader always allocates a new numpy array, so it always accounts for that array's size even though the array only exists in the non-zero padding case:

```python
return reduce_decorator(_non_last_section_in_pipeline)(
    memory_limit_bytes=memory_limit_bytes,
    write_chunk_bytes=current_chunk_bytes,
    read_chunk_bytes=next_chunk_bytes,
)
```
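The consequence is easy to see with a toy calculation. The sketch below (the function name and parameters are hypothetical, not httomo's actual signature) shows a padding-aware variant of the check: when padding is zero, only the writer's chunk occupies memory, so counting the reader's chunk as well can wrongly push the estimate over the limit.

```python
def non_last_section_fits(memory_limit_bytes, write_chunk_bytes,
                          read_chunk_bytes, padding):
    """Hypothetical padding-aware check: only count the reader's
    chunk when non-zero padding forces a new allocation."""
    required = write_chunk_bytes
    if padding != (0, 0):
        required += read_chunk_bytes  # padded copy allocated by the reader
    return required <= memory_limit_bytes


# with a 6 GB limit and 4 GB chunks:
limit = 6 * 2**30
chunk = 4 * 2**30

print(non_last_section_fits(limit, chunk, chunk, (0, 0)))  # True: one array in RAM
print(non_last_section_fits(limit, chunk, chunk, (2, 2)))  # False: padded copy needed too
```

Under the current code's assumption, the zero-padding case above would also be rejected, even though only one 4 GB array would actually exist.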

Extra info

For some more info on why I think no new numpy array is created for the reader of section n+1 when there is zero padding: in that case, the reader's self._data attribute is assigned the writer's self._data attribute (which is a numpy array), and nothing further happens to the reader's self._data:

```python
self._data = source_data
```

Meaning, I think that in the zero-padding case the reader of section n+1 simply gets a reference to the numpy array from the writer of section n and nothing else (ie, no copy is made, no new array is created), so there's no reason for more memory to be allocated when creating the reader for section n+1.
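This reference-sharing behaviour is just ordinary Python/numpy assignment semantics, and can be checked directly (the `writer_data`/`reader_data` names below are illustrative stand-ins for the two self._data attributes):

```python
import numpy as np

writer_data = np.arange(12, dtype=np.float32).reshape(3, 4)

# zero-padding case: plain assignment, the reader just takes a reference
reader_data = writer_data

print(reader_data is writer_data)             # True: same object
print(np.shares_memory(reader_data, writer_data))  # True: same buffer

# a write through one name is visible through the other
reader_data[0, 0] = 99.0
print(writer_data[0, 0])  # 99.0
```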

Note that the above excludes the case of a reslice: a reslice will of course cause allocations inside the reslice algorithm, but that is separate from what the writer and reader themselves do with the numpy arrays that represent the chunks associated with a section.
