
fix(vit): load the TRT session lazily, not in start() (PyCUDA thread-locality SIGABRT) #48

Merged
Garrett Bischof (gwbischof) merged 1 commit into main from fix/vit-lazy-session-load on Jun 9, 2026

Conversation

@gwbischof

Collaborator

Found by a detailed main-vs-#37 behavioral audit: the decomposition reproduced all of #37's behavior except one fix.

The gap

PtychoViTInferenceOp.start() on main eagerly loads the TRT/PyCUDA session (pre-#37 behavior, added for first-batch latency). PyCUDA contexts are thread-local, so a context created in start() (framework startup thread) is unavailable in compute() (MultiThreadScheduler worker thread) → SIGABRT in predict().
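The thread-locality problem can be illustrated without a GPU. The sketch below is only an analogy: it uses Python's `threading.local` as a stand-in for per-thread driver state, and `create_context`, `current_context`, and `run_in_worker` are hypothetical helpers, not PyCUDA or Holoscan calls. It shows that state bound on the startup thread is simply absent when a worker thread looks for it.

```python
import threading

# Stand-in for per-thread driver state. This is NOT the PyCUDA API;
# it only mimics "the context belongs to the thread that created it".
_tls = threading.local()


def create_context(name):
    """Bind a (fake) context to the calling thread only."""
    _tls.context = name


def current_context():
    """Return the calling thread's context, or None if it never made one."""
    return getattr(_tls, "context", None)


def run_in_worker(fn):
    """Run fn in a fresh thread and return its result."""
    result = []
    worker = threading.Thread(target=lambda: result.append(fn()))
    worker.start()
    worker.join()
    return result[0]


# What an eager start() does: create the context on the startup thread.
create_context("startup-thread-context")
seen_in_startup = current_context()

# What compute() sees on a scheduler worker thread: no context at all.
seen_in_worker = run_in_worker(current_context)
```

In this analogy the worker just gets `None`; in the real operator the missing context surfaces as the SIGABRT in predict().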

#37 deliberately reverted this to a lazy load (start() = pass; _compute_inner loads on the first batch in the worker thread). That change lives in PtychoViTInferenceOp, which the decomposition's H2/H5b edited — the start() revert was simply missed.

Fix

Make start() a no-op. _compute_inner already has the if self._session is None: self._init_session() guard, so the engine loads on batch 0 in the correct (compute) thread. start() now matches #37 verbatim.
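A minimal sketch of the lazy-load shape, for readers who don't have the operator open. Only `start()`, `_compute_inner`, `_session`, and `_init_session` come from this PR; `LazyInferenceOp` and `FakeSession` are hypothetical stand-ins (no Holoscan, TensorRT, or PyCUDA involved), and the thread-ownership check is an assumption used to mimic a thread-bound context.

```python
import threading


class FakeSession:
    """Hypothetical stand-in for the TRT/PyCUDA session."""

    def __init__(self):
        # Remember which thread created the session, mimicking a
        # context that is only valid on its creating thread.
        self.owner_thread = threading.get_ident()

    def predict(self, batch):
        if threading.get_ident() != self.owner_thread:
            raise RuntimeError("session used off its owning thread")
        return [x * 2 for x in batch]


class LazyInferenceOp:
    """Hypothetical operator showing the start()-is-a-no-op pattern."""

    def __init__(self):
        self._session = None

    def start(self):
        # Deliberately empty: nothing thread-bound is created here.
        pass

    def _init_session(self):
        self._session = FakeSession()

    def _compute_inner(self, batch):
        # Guard from the PR: load on the first batch, in the calling thread.
        if self._session is None:
            self._init_session()
        return self._session.predict(batch)


op = LazyInferenceOp()
op.start()                          # runs on the "startup" thread
session_after_start = op._session   # still None, nothing was loaded

outputs = []
worker = threading.Thread(
    target=lambda: outputs.append(op._compute_inner([1, 2, 3]))
)
worker.start()
worker.join()
```

The session ends up owned by the worker thread because that is where the first `_compute_inner` call ran. The sketch uses a single worker and says nothing about how the real scheduler assigns batches to threads.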

Not the same as #43

#43 fixed the vit-only-mode SIGABRT by not creating the iterative engine's CuPy context on the ViT GPU. This is a distinct crash: the ViT session's own PyCUDA context is created on the wrong thread, which affects both/vit runs regardless.

Testing

This is Compose+runtime Holoscan behavior, so it is not unit-testable in CI (the smoke import already fails on a missing TILED_BASE_URL). The start() body is byte-identical to #37's, and the lazy-load path in _compute_inner is the existing mechanism. The suite is unaffected: 25 of the fast tests pass, and the full suite shows the usual 2 pre-existing TILED smoke failures.

With this merged, main reproduces #37's behavior in full.

…locality)

PtychoViTInferenceOp.start() eagerly created the TRT/PyCUDA session on the
framework startup thread. PyCUDA contexts are thread-local, so the context is
then unavailable in compute() (a MultiThreadScheduler worker thread), causing
a SIGABRT in predict(). Make start() a no-op and let _compute_inner()'s
existing `if self._session is None` guard load the engine on the first batch,
in the worker thread — paying the ~1-2 s deserialize on batch 0.

This restores a fix from #37 that was missed when decomposing
it: main retained the pre-#37 eager-preload behavior (added for first-batch
latency, before the multithread SIGABRT was discovered). It's a separate crash
from the vit-only-mode CuPy SIGABRT already handled in #43 — that one was the
iterative engine on the ViT GPU; this is the ViT session's own PyCUDA context.

Found by a main-vs-#37 behavioral audit; start() body now matches #37 verbatim.

Co-authored-by: Himanshu Goel <4122621+himanshugoel2797@users.noreply.github.com>
@gwbischof Garrett Bischof (gwbischof) merged commit c6285cf into main Jun 9, 2026
5 checks passed
@gwbischof Garrett Bischof (gwbischof) deleted the fix/vit-lazy-session-load branch June 9, 2026 20:27