WSI inference crashes at ~76-100% of predicted tiles due to SlideLoader subprocess teardown destabilizing the multiprocessing Manager #22

@camillaelbaek

Environment
Python 3.13.13
Platform: Linux HPC (SLURM)
Single-GPU inference

Description
When running classpose-predict-wsi on large WSI files (~8600 tiles), the process consistently crashes after tile loading has completed but while the worker is still processing tiles. The crash produces no Python traceback, and the process exits with code 143 (SIGTERM) or 1, always at roughly the same relative progress point (~76-85% of predicted tiles).
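For reference when reading job logs: a shell such as bash reports a process killed by signal N as exit status 128 + N, so 143 is signal 15 (SIGTERM). Python's `subprocess` and `multiprocessing` report the same event as a negative code (-15). The helper below is a hypothetical sketch, not part of classpose, that decodes both conventions.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Turn an exit status into a readable description (hypothetical helper)."""
    if code < 0:
        signum = -code  # Popen.returncode / Process.exitcode: -N means signal N
    elif code > 128:
        signum = code - 128  # shell convention: 128 + N means signal N
    else:
        return f"exited with status {code}"
    try:
        return f"killed by {signal.Signals(signum).name}"
    except ValueError:  # not a known signal number on this platform
        return f"exited with status {code}"


print(describe_exit_code(143))
print(describe_exit_code(1))
```

Checking `Process.exitcode` of each subprocess this way helps distinguish an external SIGTERM from an ordinary error exit.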

Root cause
The SlideLoader runs as a subprocess that shares a multiprocessing.Manager server with the main process and the PostProcessor. When SlideLoader finishes filling the queue and its subprocess exits normally, Python's cleanup of the Manager proxies held by that subprocess destabilizes the shared Manager server. At that point the worker (running in the main process) is still actively using Manager-proxied objects (slide.n, predicted_tiles_value, pp.q), which causes a silent crash.
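To make the shared-Manager layout concrete, here is a minimal sketch of the pattern: one Manager server, a producer subprocess standing in for SlideLoader that fills a managed queue and then exits, and a consumer loop in the main process that keeps using managed objects in the meantime. All names (`slide_loader`, `run_pipeline`, the `(coords, tile)` item layout) are hypothetical and do not come from classpose. The sketch is not claimed to reproduce the crash; it only shows where the producer's exit overlaps with the consumer's use of the proxies. It uses the POSIX-only `fork` start method to match the Linux HPC environment.

```python
import multiprocessing as mp


def slide_loader(q, n_loaded, n_tiles):
    """Hypothetical stand-in for SlideLoader: fill a managed queue, then exit."""
    for i in range(n_tiles):
        q.put((i, f"tile-{i}"))  # (coords, tile) placeholder item
        n_loaded.value += 1      # single writer, so no lock is needed here
    q.put((None, None))          # sentinel: first element None marks the end
    # Returning ends the child process; its Manager proxies are finalized
    # while the main process may still be consuming (the overlap in question).


def run_pipeline(n_tiles=100):
    ctx = mp.get_context("fork")  # POSIX-only start method, as on the Linux HPC
    with ctx.Manager() as manager:  # one Manager server shared by both processes
        q = manager.Queue()
        n_loaded = manager.Value("i", 0)
        predicted = manager.Value("i", 0)
        loader = ctx.Process(target=slide_loader, args=(q, n_loaded, n_tiles))
        loader.start()
        # Worker loop in the main process: it keeps talking to the Manager
        # (q.get, predicted.value) while the loader may already be exiting.
        while True:
            coords, _tile = q.get()
            if coords is None:
                break
            predicted.value += 1  # stand-in for running the model on the tile
        loader.join()
        return predicted.value


if __name__ == "__main__":
    print(run_pipeline(n_tiles=200))
```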

Workaround
Pre-loading all tiles into a local list before starting inference eliminates the concurrency between SlideLoader and the worker:

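# After slide initialization, drain the SlideLoader queue completely
# before starting inference.
all_tiles = []
while True:
    item = slide.q.get()  # blocks until SlideLoader produces the next item
    if item[0] is None:   # end-of-stream sentinel: first element is None
        break
    all_tiles.append(item)
slide.p.join()  # SlideLoader subprocess has fully exited before the worker starts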

Then feed all_tiles into a plain queue.Queue for the worker. This ensures the SlideLoader subprocess has completely finished, including its teardown, before inference begins, so that teardown can no longer overlap with the worker's use of the Manager.
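A minimal sketch of that second step follows. It assumes the worker recognizes the end of input by an item whose first element is `None`, the same convention as the drain loop. The names `to_local_queue`, `consume`, `SENTINEL` and the `predict` callback are hypothetical; classpose's actual worker may expect a different interface.

```python
import queue

# Assumption: end-of-input marker follows the drain loop's "first element is None" rule.
SENTINEL = (None, None)


def to_local_queue(all_tiles):
    """Copy pre-loaded tiles into an in-process queue.Queue, ending with a sentinel."""
    local_q = queue.Queue()
    for item in all_tiles:
        local_q.put(item)
    local_q.put(SENTINEL)
    return local_q


def consume(local_q, predict):
    """Hypothetical worker loop: take tiles until the sentinel, apply `predict`."""
    results = []
    while True:
        item = local_q.get()
        if item[0] is None:
            break
        results.append(predict(item))
    return results
```

Because `queue.Queue` lives entirely in the main process, tile input no longer goes through the Manager at all. The trade-off is that every tile is held in RAM at once: for a hypothetical 512 × 512 RGB uint8 tile (786,432 bytes), 8600 tiles come to 6,763,315,200 bytes, roughly 6.3 GiB.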

Additional notes

The pp.polygons.empty() check in the polygon collection loop is also unreliable for managed queues on large slides: it can return True prematurely with ~250k cells. Draining by count (pp.value.value) or writing directly to a file from the PostProcessor subprocess is more robust.
The crash is reproducible across multiple SVS files and hardware configurations.
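Draining by count could look roughly like the sketch below. `empty()` is only a snapshot (the Python docs explicitly call `multiprocessing.Queue.empty()` unreliable), so a loop that stops on `empty()` can exit while the PostProcessor is still producing. Here `expected_count` would come from something like `pp.value.value`; the function name, the timeout value, and the error handling are assumptions. It should work with both `queue.Queue` and Manager queue proxies, since both accept `get(timeout=...)` and raise `queue.Empty` on timeout.

```python
import queue


def drain_by_count(q, expected_count, timeout=60.0):
    """Collect exactly `expected_count` items from `q` instead of polling q.empty()."""
    items = []
    while len(items) < expected_count:
        try:
            # Wait up to `timeout` seconds for each item so a dead producer
            # surfaces as an error instead of a silent hang.
            items.append(q.get(timeout=timeout))
        except queue.Empty:
            raise RuntimeError(
                f"timed out after receiving {len(items)} of {expected_count} items"
            ) from None
    return items
```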
