Fix CUDA OOM on pipeline switching #403
Conversation
Force-pushed from 5e0c0b2 to 8449455
- Add cleanup() to Pipeline base class (gc.collect + empty_cache)
- Call cleanup on pipeline unload to free GPU memory
- Log GPU memory before/after unload and after load
- Clear pinned buffer cache on frame processor stop
- Release pipeline reference on pipeline processor stop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: emranemran <emran.mah@gmail.com>
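For reference, a minimal sketch of what the base-class `cleanup()` plus the before/after memory logging described in this commit could look like. Only `gc.collect()` and `torch.cuda.empty_cache()` come from the commit itself; the `log_gpu_memory` helper and its format are assumptions.

```python
import gc
import logging

import torch

logger = logging.getLogger(__name__)


def log_gpu_memory(tag: str) -> None:
    # Hypothetical helper: compare allocated/reserved CUDA memory before/after unload.
    if torch.cuda.is_available():
        allocated = torch.cuda.memory_allocated() / 2**30
        reserved = torch.cuda.memory_reserved() / 2**30
        logger.info("%s: allocated=%.2f GiB, reserved=%.2f GiB", tag, allocated, reserved)


class Pipeline:
    def cleanup(self) -> None:
        # Best-effort release of GPU memory; subclasses may override and call super().
        gc.collect()
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
```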
This seems to be working. Tried switching from longlive (default settings) to krea (+ vace + v2v + RIFE postprocessor):

EDIT: oops, hang on, I read these logs incorrectly. Seems like it's not freeing much mem at all:

UPDATE: if I switch from longlive -> krea (+ vace + t2v) w/o RIFE, then I see the following, so the issue could be that the RIFE post-processor doesn't unload properly.
Add unload callback mechanism so PipelineManager can notify FrameProcessors to release pipeline references before calling cleanup(). This allows gc.collect() + empty_cache() to actually free GPU memory during pipeline switches, not just on session end.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: emranemran <emran.mah@gmail.com>
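A rough sketch of the callback mechanism this commit describes, under assumed names; the simplified `PipelineManager`/`FrameProcessor` classes and methods like `register_unload_callback` are illustrative, not the actual scope API.

```python
import gc
from typing import Callable, Dict, List

import torch


class PipelineManager:
    def __init__(self) -> None:
        self._pipelines: Dict[str, object] = {}
        self._unload_callbacks: List[Callable[[str], None]] = []

    def register_unload_callback(self, cb: Callable[[str], None]) -> None:
        self._unload_callbacks.append(cb)

    def unload(self, pipeline_id: str) -> None:
        pipeline = self._pipelines.pop(pipeline_id, None)
        # Let frame processors drop their references first ...
        for cb in self._unload_callbacks:
            cb(pipeline_id)
        # ... so gc.collect() / empty_cache() in cleanup() can actually free the memory.
        if pipeline is not None:
            pipeline.cleanup()
        del pipeline
        gc.collect()
        if torch.cuda.is_available():
            torch.cuda.empty_cache()


class FrameProcessor:
    def __init__(self, manager: PipelineManager) -> None:
        self.pipeline = None
        manager.register_unload_callback(self._on_pipeline_unload)

    def _on_pipeline_unload(self, pipeline_id: str) -> None:
        # Release our reference so the pipeline's refcount can reach zero.
        self.pipeline = None
```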
Signed-off-by: Varshith B <varshith15@gmail.com>
Just deleting the pipeline and calling gc.collect is not enough; the references to objects that hold GPU data need to be deleted too. For the internal pipelines this should work fine (I've added components and state to be deleted, just to be extra careful), but for plugins it's an issue, so plugins should also have a cleanup method that deletes their GPU object references.

From my testing locally: longlive (18 GB) -> flashvsr (14 GB) -> streamdiffusion v2 (16 GB -- where it's supposed to be taking 14 GB).

cc: @yondonfu
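Roughly what the plugin-side cleanup being asked for here could look like, assuming the `Pipeline.cleanup()` base method sketched earlier; `components`, `state`, and `model` are placeholder attribute names, not a required interface.

```python
class MyPluginPipeline(Pipeline):  # Pipeline as sketched above
    def __init__(self) -> None:
        self.components: dict = {}  # e.g. model parts living on the GPU
        self.state: dict = {}       # e.g. cached latents / internal buffers
        self.model = None

    def cleanup(self) -> None:
        # Drop every reference that keeps CUDA tensors alive ...
        self.components.clear()
        self.state.clear()
        self.model = None
        # ... then let the base class run gc.collect() + torch.cuda.empty_cache().
        super().cleanup()
```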
@varshith15 I think from a DX POV it would be a pain to require a plugin pipeline to implement a cleanup function, so curious if that can be avoided...
Why? If the pipeline is the only one that contains references to the underlying data structures that are actually consuming GPU mem, shouldn't it be the case that as long as there is no remaining ref to the pipeline, a gc.collect() would free everything?

EDIT: Ok, to answer my own question, I think the issue is that a) if CUDA tensors "escape" the pipeline and are not freed, that would keep refs around, b) hooks/closures that hold on to refs of vars, c) a variety of other ways, it seems...
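As a concrete illustration of case (b), a hook whose closure captures an output keeps that CUDA tensor alive even after the model itself is deleted. This is a standalone toy example, not code from this repo.

```python
import gc

import torch
import torch.nn as nn

captured = []  # escapes the "pipeline": anything appended here stays referenced

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(1024, 1024).to(device)

def hook(module, inputs, output):
    captured.append(output)  # the closure/list now pins this activation

model.register_forward_hook(hook)
model(torch.randn(8, 1024, device=device))

del model
gc.collect()
if torch.cuda.is_available():
    torch.cuda.empty_cache()
# The activation held in `captured` still occupies GPU memory until it is cleared too.
```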
It seems like the only guaranteed clean way to make sure all GPU mem is freed when unloading a pipeline, regardless of how it is implemented, is to isolate it in a subprocess. @leszko looked into this previously, but we tabled it because we didn't want to take on the complexity (particularly inter-process comms for an already latency-sensitive code path). I would treat that path as a separate thing to consider and discuss.

This leaves the question - what are the practical steps that can be taken to minimize the chance of this type of OOM during pipeline switching in the short term? A few ideas came out of a chat with Claude that are a superset of what is already done in this PR and @varshith15's suggestions.

I think what feels reasonable is for there to be a generic cleanup fn that does a best-effort cleanup for all pipelines by introspecting the pipeline's attributes and releasing anything that holds GPU tensors.
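One possible shape for such a generic best-effort cleanup, introspecting only the pipeline's top-level attributes. This is a sketch under the assumption that GPU state lives in tensors, `nn.Module`s, or containers of them; it is not the actual implementation.

```python
import gc

import torch
import torch.nn as nn


def best_effort_cleanup(pipeline: object) -> None:
    # Drop top-level references to tensors/modules and empty containers that may hold them.
    for name, value in list(vars(pipeline).items()):
        if isinstance(value, (torch.Tensor, nn.Module)):
            setattr(pipeline, name, None)
        elif isinstance(value, (dict, list, set)):
            value.clear()
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
```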
Signed-off-by: Varshith B <varshith15@gmail.com>
I think we definitely need to go a couple of levels deep, 2 at least (for instance, RIFE); I've added a BFS-kinda crawler.
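A depth-limited BFS over attributes and containers, in the spirit of what is described here; this is illustrative only and the actual crawler in the PR may differ.

```python
import gc
from collections import deque

import torch
import torch.nn as nn


def release_gpu_refs(root: object, max_depth: int = 2) -> None:
    # Breadth-first walk that nulls out tensor/module references up to max_depth levels deep.
    queue = deque([(root, 0)])
    seen = set()
    while queue:
        obj, depth = queue.popleft()
        if id(obj) in seen or depth > max_depth:
            continue
        seen.add(id(obj))
        if isinstance(obj, dict):
            for key, value in list(obj.items()):
                if isinstance(value, (torch.Tensor, nn.Module)):
                    obj[key] = None
                else:
                    queue.append((value, depth + 1))
        elif isinstance(obj, (list, tuple)):
            for value in obj:
                queue.append((value, depth + 1))
        elif hasattr(obj, "__dict__"):
            for name, value in list(vars(obj).items()):
                if isinstance(value, (torch.Tensor, nn.Module)):
                    setattr(obj, name, None)
                else:
                    queue.append((value, depth + 1))
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
```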
But, if you remove all refs in the top-level pipeline object, shouldn't gc be able to collect everything nested underneath once the ref counts hit 0?
Signed-off-by: Varshith B <varshith15@gmail.com>
Yeah, this makes sense and is also a lot cleaner; updated it. I was thinking of only deleting tensors and tensor modules, but yeah, gc should be able to collect them if the ref count goes to 0 -- this should get most of the refs.
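A quick way to convince yourself of the refcount argument (toy snippet, requires a CUDA device): once the top-level holder is gone and nothing else references the nested module, `gc.collect()` frees the allocation and `empty_cache()` returns it to the driver.

```python
import gc

import torch
import torch.nn as nn


class Holder:
    def __init__(self) -> None:
        # Nested GPU object, only reachable through this holder.
        self.components = {"model": nn.Linear(4096, 4096).cuda()}


holder = Holder()
print("loaded:   ", torch.cuda.memory_allocated() // 2**20, "MiB")

del holder  # refcount of the nested module drops to 0 along with its parent
gc.collect()
torch.cuda.empty_cache()
print("after del:", torch.cuda.memory_allocated() // 2**20, "MiB")
```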
On the backend-fal-v6 branch I can't seem to repro the issue of memory not being freed with the following combos:

- video-depth-anything + longlive + rife -> passthrough

In all cases, when the switch to passthrough is completed I see the VRAM drop down to baseline levels. I did not get a chance to try with Krea though, so I wonder if it is Krea-specific.

If testing on an H100, one thing to try that comes to mind is changing this line: scope/src/scope/server/pipeline_manager.py, line 759 in 0febd6a
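When checking whether VRAM returns to baseline, it can also help to watch the transient peak rather than just the steady state, since the OOM typically happens while the old and new pipelines briefly coexist. A small illustrative check, not from the repo:

```python
import torch

if torch.cuda.is_available():
    torch.cuda.reset_peak_memory_stats()

    # ... trigger the pipeline switch here ...

    peak = torch.cuda.max_memory_allocated() / 2**30
    now = torch.cuda.memory_allocated() / 2**30
    print(f"peak during switch: {peak:.2f} GiB, after switch: {now:.2f} GiB")
```

Note that `nvidia-smi` reports memory held by the process (including the caching allocator's reserve), so comparing `torch.cuda.memory_allocated()` (live tensors) with `torch.cuda.memory_reserved()` is often the more telling check.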
Summary
Fixes CUDA out-of-memory errors when switching between pipelines (e.g. longlive → krea).
Root Cause
When `_unload_pipeline_by_id_unsafe()` removes a pipeline from the manager's `_pipelines` dict, active WebRTC sessions still hold references to the old pipeline through their frame/pipeline processors.

Because `PipelineProcessor` stores a direct reference (`self.pipeline = pipeline`), the old pipeline and all its GPU tensors survive `gc.collect()`. Additionally, pinned memory buffers in `FrameProcessor` were never released during pipeline switching.

Changes
- Add `cleanup()` to Pipeline base class — calls `gc.collect()` and `torch.cuda.empty_cache()` to free GPU memory. Does NOT clear component/state dicts to avoid race conditions with in-flight frame processing.
- Call `cleanup()` during pipeline unload — in `_unload_pipeline_by_id_unsafe()`, invoke cleanup before removing references.
- Set `self.pipeline = None` to break the reference chain and allow GC to reclaim GPU memory (see the sketch after this list).
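A minimal sketch of the processor-side changes (the reference release and the pinned-buffer clearing mentioned in the root cause); class, method, and attribute names here are placeholders rather than the actual scope classes.

```python
import torch


class PipelineProcessor:
    def __init__(self, pipeline) -> None:
        self.pipeline = pipeline

    def stop(self) -> None:
        # Break the reference chain so the unloaded pipeline's GPU tensors
        # become collectible instead of surviving gc.collect().
        self.pipeline = None


class FrameProcessor:
    def __init__(self) -> None:
        # Page-locked host buffers reused for fast host<->device frame copies.
        self._pinned_buffers: dict = {}

    def _get_pinned(self, shape: tuple, dtype: torch.dtype = torch.uint8) -> torch.Tensor:
        key = (shape, dtype)
        if key not in self._pinned_buffers:
            self._pinned_buffers[key] = torch.empty(shape, dtype=dtype, pin_memory=True)
        return self._pinned_buffers[key]

    def stop(self) -> None:
        # Release pinned host memory when the processor stops.
        self._pinned_buffers.clear()
```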
Test plan

- No `KeyError: 'vae'` during pipeline switching
- `uv run pytest` passes
- `uv run daydream-scope` starts without errors

🤖 Generated with Claude Code