fix(vit): load the TRT session lazily, not in start() (PyCUDA thread-locality SIGABRT)#48
Merged
Conversation
…locality) PtychoViTInferenceOp.start() eagerly created the TRT/PyCUDA session on the framework startup thread. PyCUDA contexts are thread-local, so the context is then unavailable in compute() (a MultiThreadScheduler worker thread), causing a SIGABRT in predict(). Make start() a no-op and let _compute_inner()'s existing `if self._session is None` guard load the engine on the first batch, in the worker thread — paying the ~1-2 s deserialize on batch 0. This restores a fix from #37 that was missed when decomposing it: main retained the pre-#37 eager-preload behavior (added for first-batch latency, before the multithread SIGABRT was discovered). It's a separate crash from the vit-only-mode CuPy SIGABRT already handled in #43 — that one was the iterative engine on the ViT GPU; this is the ViT session's own PyCUDA context. Found by a main-vs-#37 behavioral audit; start() body now matches #37 verbatim. Co-authored-by: Himanshu Goel <4122621+himanshugoel2797@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Found by a detailed
main-vs-#37 behavioral audit: the decomposition reproduced all of #37's behavior except one fix.The gap
PtychoViTInferenceOp.start()on main eagerly loads the TRT/PyCUDA session (pre-#37 behavior, added for first-batch latency). PyCUDA contexts are thread-local, so a context created instart()(framework startup thread) is unavailable incompute()(MultiThreadScheduler worker thread) → SIGABRT inpredict().#37 deliberately reverted this to a lazy load (
start()=pass;_compute_innerloads on the first batch in the worker thread). That change lives inPtychoViTInferenceOp, which the decomposition's H2/H5b edited — thestart()revert was simply missed.Fix
Make
start()a no-op._compute_inneralready has theif self._session is None: self._init_session()guard, so the engine loads on batch 0 in the correct (compute) thread.start()now matches #37 verbatim.Not the same as #43
#43 fixed the vit-only-mode SIGABRT — not creating the iterative engine's CuPy context on the ViT GPU. This is a distinct crash: the ViT session's own PyCUDA context being created on the wrong thread, which affects
both/vitruns regardless.Testing
Compose+runtime Holoscan behavior (not unit-testable in CI — the smoke import already fails on a missing TILED_BASE_URL).
start()body is byte-identical to #37's; the lazy-load path in_compute_inneris the existing mechanism. Suite unaffected (25 of the fast tests pass; full suite = the usual 2 pre-existing TILED smoke failures).With this merged,
mainreproduces #37's behavior in full.