Multiple tmp dirs created but only one used for file-backed dataset store #699

@yousefmoazzam

Description

This only applies when the dataset store is backed by an hdf5 file during data processing.

In such a case, each process creates a temp dir to hold an hdf5 file:

httomo/httomo/cli.py

Lines 428 to 438 in 2acd1b2

if reslice_dir is None:
    ctx = tempfile.TemporaryDirectory()
with ctx as tmp_dir:
    runner = TaskRunner(
        pipeline,
        Path(tmp_dir),
        global_comm,
        monitor=mon,
        memory_limit_bytes=memory_limit,
        save_snapshots=save_snapshots,
    )

but when the writer actually defines the hdf5 filepath, only rank 0's temp dir is used:

filename = self.comm.bcast(filename, root=0)

As a result, every process creates a temp dir even though only rank 0 needs one.

This has no impact on functionality; it's just a bit confusing to see multiple temp dirs created when only one is actually used.
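One possible way to avoid the extra directories, as a minimal sketch rather than a proposed patch: create the temp dir on rank 0 only and broadcast its path to the other ranks. The helper name `rank0_temp_dir` and the `SingleProcessComm` stand-in are hypothetical and not part of httomo; the communicator is assumed to expose `rank`, `bcast` and `Barrier` as mpi4py communicators do. It also assumes rank 0's path is visible to all ranks, which the existing `bcast` of the filename already relies on.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


class SingleProcessComm:
    """Hypothetical stand-in for an MPI communicator, so the sketch runs without MPI."""

    rank = 0

    def bcast(self, obj, root=0):
        # With a single process there is nothing to send; return the object unchanged.
        return obj

    def Barrier(self):
        # Nothing to synchronise with a single process.
        pass


@contextmanager
def rank0_temp_dir(comm) -> Iterator[Path]:
    """Yield a temp dir path that is created (and later removed) by rank 0 only."""
    # Only rank 0 creates a directory; other ranks hold no TemporaryDirectory object.
    ctx = tempfile.TemporaryDirectory() if comm.rank == 0 else None
    try:
        # Every rank receives rank 0's path, mirroring the existing filename bcast.
        path = comm.bcast(ctx.name if ctx is not None else None, root=0)
        yield Path(path)
    finally:
        # Wait for all ranks before rank 0 deletes the directory.
        # Note: error handling across ranks is not addressed in this sketch.
        comm.Barrier()
        if ctx is not None:
            ctx.cleanup()


if __name__ == "__main__":
    with rank0_temp_dir(SingleProcessComm()) as tmp_dir:
        print(tmp_dir.is_dir())
```

With something like this, the `with ctx as tmp_dir:` block could wrap the helper instead of a per-process `tempfile.TemporaryDirectory()`, so a run would leave only one temp dir to inspect.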

Labels: framework (Data-handling framework related), minor (Nice to do but not vital)