In the case of a section that is not the last section, a reader for section n+1 will be created from the writer used in section n. The creation of a reader only causes a new numpy array to be created if there is non-zero padding required for section n+1:
`httomo/httomo/data/dataset_store.py`, lines 301 to 303 in `f9bbccb`:

```python
self._padding = (0, 0) if padding is None else padding
if self._padding != (0, 0) and not source.is_file_based:
    self._exchange_neighbourhoods()
```

`httomo/httomo/data/dataset_store.py`, lines 514 to 518 in `f9bbccb`:

```python
def _exchange_neighbourhoods(self):
    # we have the core of the chunk in RAM, but without the padding areas,
    # so we construct the full area with padding in RAM and exchange with MPI

    self._data = self._extend_data_for_padding(self._data)
```

`httomo/httomo/data/dataset_store.py`, lines 503 to 506 in `f9bbccb`:

```python
def _extend_data_for_padding(self, core_data: np.ndarray) -> np.ndarray:
    padded_shape = list(self._chunk_shape)
    padded_shape[self.slicing_dim] += self._padding[0] + self._padding[1]
    padded_data = np.empty(padded_shape, self._data.dtype)
```
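To illustrate the behaviour, here is a minimal, self-contained sketch (not the real `DataSetStoreReader`; `make_reader_data` is a hypothetical stand-in) showing that a new array is only allocated in the non-zero-padding branch:

```python
import numpy as np

# Hypothetical stand-in for the reader-creation logic described above:
# with zero padding the "reader" just keeps a reference to the writer's
# array; with non-zero padding it allocates an extended array via np.empty.
def make_reader_data(writer_data: np.ndarray, padding: tuple, slicing_dim: int = 0) -> np.ndarray:
    if padding == (0, 0):
        return writer_data  # no copy, no new allocation
    padded_shape = list(writer_data.shape)
    padded_shape[slicing_dim] += padding[0] + padding[1]
    padded_data = np.empty(padded_shape, writer_data.dtype)
    # copy the core into the padded area (the padding slots would be
    # filled via the MPI exchange in the real code)
    core = [slice(None)] * writer_data.ndim
    core[slicing_dim] = slice(padding[0], padding[0] + writer_data.shape[slicing_dim])
    padded_data[tuple(core)] = writer_data
    return padded_data

chunk = np.ones((4, 8, 8), dtype=np.float32)
print(make_reader_data(chunk, (0, 0)) is chunk)   # True: same array, no extra memory
print(make_reader_data(chunk, (2, 2)).shape)      # (8, 8, 8): new, larger allocation
```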
However, the `determine_store_backing()` function assumes that creating the reader will always create a new numpy array, so it accounts for the size of that array even though the array will only exist in the case of non-zero padding:
`httomo/httomo/runner/dataset_store_backing.py`, lines 174 to 178 in `f9bbccb`:

```python
return reduce_decorator(_non_last_section_in_pipeline)(
    memory_limit_bytes=memory_limit_bytes,
    write_chunk_bytes=current_chunk_bytes,
    read_chunk_bytes=next_chunk_bytes,
)
```
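The effect of that assumption can be sketched in a few lines. This is not the real `determine_store_backing()` logic (the function names and the comparison are hypothetical); it only demonstrates how always counting the reader's chunk can wrongly conclude that the data does not fit in memory when padding is zero:

```python
# Current behaviour (sketch): the reader's chunk is always counted.
def fits_in_memory_current(memory_limit_bytes: int,
                           write_chunk_bytes: int,
                           read_chunk_bytes: int) -> bool:
    return write_chunk_bytes + read_chunk_bytes <= memory_limit_bytes

# Padding-aware variant (sketch): only count the reader's chunk when a
# new array will actually be allocated, i.e. when padding is non-zero.
def fits_in_memory_padding_aware(memory_limit_bytes: int,
                                 write_chunk_bytes: int,
                                 read_chunk_bytes: int,
                                 padding: tuple) -> bool:
    reader_bytes = read_chunk_bytes if padding != (0, 0) else 0
    return write_chunk_bytes + reader_bytes <= memory_limit_bytes

# With zero padding, the current estimate can report "does not fit"
# even though no second array would ever be allocated:
print(fits_in_memory_current(100, 60, 60))                # False
print(fits_in_memory_padding_aware(100, 60, 60, (0, 0)))  # True
print(fits_in_memory_padding_aware(100, 60, 60, (2, 2)))  # False
```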
Extra info
For some more info on why I think that no new numpy array is created for the reader of section n+1 when there is zero padding: in that case, the reader's `self._data` attribute is assigned the writer's `self._data` attribute (which is a numpy array), and nothing else happens to the reader's `self._data` (`httomo/httomo/data/dataset_store.py`, line 284 in `f9bbccb`).
Meaning, I think that in the case of zero padding, the reader of section n+1 simply gets a reference to the numpy array from the writer of section n and nothing else (i.e., no copy is made, no new array is created), so there's no reason for more memory to be allocated when creating the reader for section n+1.
Note that the above excludes the case of a reslice: when a reslice occurs, the reslice algorithm will of course perform allocations of its own, but that is separate from purely what the writer and reader are doing with the numpy arrays that represent the chunks associated with a section.
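The reference-vs-copy claim is easy to verify in plain numpy: assigning one array variable to another only binds a second name to the same buffer, so no new memory is allocated and writes through either name are visible via the other:

```python
import numpy as np

writer_data = np.arange(12, dtype=np.float32).reshape(3, 4)
reader_data = writer_data  # plain assignment: a reference, not a copy

print(reader_data is writer_data)                  # True
print(np.shares_memory(reader_data, writer_data))  # True

# A write through one name is visible through the other, confirming
# that no new array was created by the assignment.
writer_data[0, 0] = 42.0
print(reader_data[0, 0])                           # 42.0
```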