fix(vit): load the TRT session lazily, not in start() (PyCUDA thread-locality SIGABRT) by gwbischof · Pull Request #48 · NSLS2/holoptycho

Garrett Bischof (gwbischof) · 2026-06-09T20:18:37Z

Found by a detailed main-vs-#37 behavioral audit: the decomposition reproduced all of #37's behavior except one fix.

The gap

PtychoViTInferenceOp.start() on main eagerly loads the TRT/PyCUDA session (pre-#37 behavior, added for first-batch latency). PyCUDA contexts are thread-local, so a context created in start() (on the framework startup thread) is not available in compute() (on a MultiThreadScheduler worker thread), and predict() aborts with SIGABRT.
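To see why the thread matters, the sketch below uses Python's threading.local as a stand-in for PyCUDA's per-thread context stack. This is an analogy only: it does not touch PyCUDA or CUDA, and the names start() and compute() here are plain hypothetical functions, not the Holoscan operator methods. State created on the startup thread is simply invisible from a worker thread.

```python
import threading

# Stand-in for PyCUDA's per-thread "current context".
# Analogy only: real CUDA contexts are managed by the driver, not threading.local.
_ctx = threading.local()


def start():
    """Runs on the startup thread: 'creates' a context visible to this thread only."""
    _ctx.current = "trt-session-context"


def compute():
    """Looks up the context for whichever thread calls it."""
    return getattr(_ctx, "current", None)


start()  # the main (startup) thread sets its own context

result = {}
worker = threading.Thread(target=lambda: result.setdefault("ctx", compute()))
worker.start()
worker.join()

print("startup thread sees:", compute())     # trt-session-context
print("worker thread sees:", result["ctx"])  # None; in the real operator this surfaced as a SIGABRT in predict()
```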

#37 deliberately reverted this to a lazy load (start() = pass; _compute_inner loads the session on the first batch, in the worker thread). That change lives in PtychoViTInferenceOp, which the decomposition's H2/H5b edited; the start() revert was simply missed.

Fix

Make start() a no-op. _compute_inner already has the `if self._session is None: self._init_session()` guard, so the engine loads on batch 0 in the correct (compute) thread. start() now matches #37 verbatim.
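The lazy-load pattern can be sketched with a simplified, hypothetical stand-in for PtychoViTInferenceOp. It reuses the method names from the PR (start, _compute_inner, _init_session, _session) but has no Holoscan or TensorRT dependency: _init_session only records which thread ran it, where the real operator would deserialize the TRT engine and create its PyCUDA context.

```python
import threading


class LazySessionOp:
    """Hypothetical, simplified stand-in for PtychoViTInferenceOp (no Holoscan/TRT)."""

    def __init__(self):
        self._session = None
        self.init_thread = None  # records where the session was created

    def start(self):
        # No-op: do NOT create the session on the framework startup thread.
        pass

    def _init_session(self):
        # Placeholder for the real engine load (assumption for illustration).
        self.init_thread = threading.current_thread().name
        self._session = object()

    def _compute_inner(self, batch):
        # Lazy load: the first batch triggers init on the calling (worker) thread.
        if self._session is None:
            self._init_session()
        return batch  # the real operator would run predict() here


op = LazySessionOp()
op.start()  # startup thread: nothing is created yet
assert op._session is None

worker = threading.Thread(target=op._compute_inner, args=([0],), name="worker-0")
worker.start()
worker.join()
print(op.init_thread)  # worker-0
```

Later calls find `_session` already set and skip the init, so the one-time load cost lands on batch 0 only.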

Not the same as #43

#43 fixed the vit-only-mode SIGABRT by not creating the iterative engine's CuPy context on the ViT GPU. This is a distinct crash: the ViT session's own PyCUDA context is created on the wrong thread, which affects both-mode and vit-mode runs regardless.

Testing

This is compose/runtime Holoscan behavior and is not unit-testable in CI: the smoke import already fails on a missing TILED_BASE_URL. The start() body is byte-identical to #37's, and the lazy-load path in _compute_inner is the existing mechanism. The suite is unaffected (25 of the fast tests pass; the full suite shows the usual 2 pre-existing TILED smoke failures).

With this merged, main reproduces #37's behavior in full.

…locality)

PtychoViTInferenceOp.start() eagerly created the TRT/PyCUDA session on the framework startup thread. PyCUDA contexts are thread-local, so the context is then unavailable in compute() (a MultiThreadScheduler worker thread), causing a SIGABRT in predict().

Make start() a no-op and let _compute_inner()'s existing `if self._session is None` guard load the engine on the first batch, in the worker thread. The cost is the ~1-2 s deserialize on batch 0.

This restores a fix from #37 that was missed when decomposing it: main retained the pre-#37 eager-preload behavior (added for first-batch latency, before the multithread SIGABRT was discovered). It is a separate crash from the vit-only-mode CuPy SIGABRT already handled in #43: that one was the iterative engine on the ViT GPU, while this one is the ViT session's own PyCUDA context.

Found by a main-vs-#37 behavioral audit; the start() body now matches #37 verbatim.

Co-authored-by: Himanshu Goel <4122621+himanshugoel2797@users.noreply.github.com>

Garrett Bischof (gwbischof) merged commit c6285cf into main Jun 9, 2026
5 checks passed

Garrett Bischof (gwbischof) deleted the fix/vit-lazy-session-load branch June 9, 2026 20:27

Garrett Bischof (gwbischof) merged 1 commit into main from fix/vit-lazy-session-load
