
Fix CUDA OOM on pipeline switching #403

Open

emranemran wants to merge 6 commits into backend-fal-v6 from cuda-oom

Conversation


@emranemran emranemran commented Feb 5, 2026

Summary

Fixes CUDA out-of-memory errors when switching between pipelines (e.g. longlive → krea).

Root Cause

When _unload_pipeline_by_id_unsafe() removes a pipeline from the manager's _pipelines dict, active WebRTC sessions still hold references to the old pipeline through:

WebRTCManager.sessions → Session → VideoProcessingTrack → FrameProcessor
  → PipelineProcessor.pipeline → old pipeline object (still on GPU)

Because PipelineProcessor stores a direct reference (self.pipeline = pipeline), the old pipeline and all its GPU tensors survive gc.collect(). Additionally, pinned memory buffers in FrameProcessor were never released during pipeline switching.

Changes

  • Add cleanup() to Pipeline base class — calls gc.collect() and torch.cuda.empty_cache() to free GPU memory. Does NOT clear component/state dicts, to avoid race conditions with in-flight frame processing (a simplified sketch follows this list).
  • Call cleanup() during pipeline unload — in _unload_pipeline_by_id_unsafe(), invoke cleanup() before removing references.
  • Add GPU memory logging — log free GPU memory before/after unload and after load, wrapped in try/except so logging failures cannot disrupt pipeline switching.
  • Clear pinned buffer cache on frame processor stop — release DMA transfer buffers that were never freed during pipeline switching.
  • Release pipeline reference on processor stop — set self.pipeline = None to break the reference chain and let GC reclaim GPU memory.
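
For illustration, a minimal sketch of the two ends of this change (the class layout and method placement below are simplified; the real code lives on the Pipeline base class and PipelineProcessor):

    import gc
    import torch

    class Pipeline:
        def cleanup(self) -> None:
            # Best-effort GPU cleanup. Intentionally does NOT clear the
            # component/state dicts, to avoid racing with in-flight frames.
            gc.collect()
            if torch.cuda.is_available():
                torch.cuda.empty_cache()

    class PipelineProcessor:
        def __init__(self, pipeline: Pipeline):
            self.pipeline = pipeline  # direct ref that previously kept old pipelines alive

        def stop(self) -> None:
            # Break the reference chain so gc.collect() can actually reclaim GPU memory.
            self.pipeline = None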

Test plan

  • Deploy to fal.ai staging and verify GPU memory logs appear
  • Switch pipelines (longlive → krea) without crashes or KeyError: 'vae'
  • Confirm GPU memory is reclaimed between pipeline switches
  • uv run pytest passes
  • uv run daydream-scope starts without errors

🤖 Generated with Claude Code

emranemran force-pushed the cuda-oom branch 2 times, most recently from 5e0c0b2 to 8449455 on February 5, 2026 at 20:25
- Add cleanup() to Pipeline base class (gc.collect + empty_cache)
- Call cleanup on pipeline unload to free GPU memory
- Log GPU memory before/after unload and after load
- Clear pinned buffer cache on frame processor stop
- Release pipeline reference on pipeline processor stop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: emranemran <emran.mah@gmail.com>

emranemran commented Feb 6, 2026

This seems to be working. Tried switching from longlive (default settings) to krea (+ vace + v2v + RIFE postprocessor):


    2026-02-06 08:15:00,865 - scope.server.pipeline_manager - INFO - GPU memory free after load: 23.73 GiB
    2026-02-06 08:14:58,524 - scope.server.pipeline_manager - INFO - GPU memory free after load: 23.79 GiB
    2026-02-06 08:12:14,335 - scope.server.pipeline_manager - INFO - GPU memory free after unload: 64.74 GiB
    2026-02-06 08:12:13,528 - scope.server.pipeline_manager - INFO - GPU memory free before unload: 64.76 GiB
    2026-02-06 08:11:19,612 - scope.server.pipeline_manager - INFO - GPU memory free after load: 67.41 GiB
    2026-02-06 08:11:16,302 - scope.server.pipeline_manager - INFO - GPU memory free after load: 67.41 GiB

EDIT: oops, hang on, I read these logs incorrectly. It seems it's not freeing much memory at all:

  - Longlive used ~12.6 GiB on load (80 - 67.41)
  - Before → after unload: 64.76 → 64.74 GiB — only 0.02 GiB freed
  - Krea then loaded on top, using ~41 GiB (64.74 → 23.79)

UPDATE: if I switch from longlive -> krea (+ vace + t2v) w/o RIFE, then I see the following, so the issue could be that the RIFE post-processor doesn't unload properly.

    2026-02-06 09:13:04,519 - scope.server.pipeline_manager - INFO - GPU memory free after unload: 78.37 GiB
    2026-02-06 09:13:03,747 - scope.server.pipeline_manager - INFO - GPU memory free before unload: 64.73 GiB
    2026-02-06 09:12:20,520 - scope.server.pipeline_manager - INFO - GPU memory free after load: 67.20 GiB
    2026-02-06 09:12:18,902 - scope.server.pipeline_manager - INFO - GPU memory free after load: 67.20 GiB
    2026-02-06 09:12:18,901 - scope.server.pipeline_manager - INFO - GPU memory free after unload: 67.20 GiB
    2026-02-06 09:12:16,948 - scope.server.pipeline_manager - INFO - GPU memory free before unload: 26.34 GiB
    2026-02-06 09:08:32,936 - scope.server.pipeline_manager - INFO - GPU memory free after load: 26.34 GiB
    2026-02-06 09:08:22,872 - scope.server.pipeline_manager - INFO - GPU memory free after load: 26.65 GiB
    2026-02-06 09:02:04,712 - scope.server.pipeline_manager - INFO - GPU memory free after unload: 64.66 GiB
    2026-02-06 09:02:04,291 - scope.server.pipeline_manager - INFO - GPU memory free before unload: 64.74 GiB
    2026-02-06 09:01:22,428 - scope.server.pipeline_manager - INFO - GPU memory free after load: 67.41 GiB
    2026-02-06 09:01:20,619 - scope.server.pipeline_manager - INFO - GPU memory free after load: 67.41 GiB

emranemran and others added 2 commits February 6, 2026 00:47
Add unload callback mechanism so PipelineManager can notify
FrameProcessors to release pipeline references before calling
cleanup(). This allows gc.collect() + empty_cache() to actually
free GPU memory during pipeline switches, not just on session end.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: emranemran <emran.mah@gmail.com>
Signed-off-by: Varshith B <varshith15@gmail.com>

varshith15 commented Feb 6, 2026

Just deleting the pipeline and calling gc.collect() is not enough; the references to objects that hold GPU data need to be deleted too. For the internal pipelines this should work fine (I've added components and state to be deleted, just to be extra careful), but for plugins it's an issue.

So plugins should also, in their cleanup method, delete their GPU object references.

From my testing locally: longlive (18 GB) -> flashvsr (14 GB) -> streamdiffusion v2 (16 GB, where it's supposed to be taking 14 GB).

cc: @yondonfu


yondonfu commented Feb 6, 2026

@varshith15 I think from a DX POV it would be a pain to require a plugin pipeline to implement a cleanup function, so I'm curious if that can be avoided...

Just deleting the pipeline and calling gc.collect() is not enough; the references to objects that hold GPU data need to be deleted too

Why? If the pipeline is the only thing that contains references to the underlying data structures that are actually consuming GPU mem, shouldn't it be the case that, as long as there is no remaining ref to the pipeline, a gc.collect() and a CUDA cache clear would wipe any GPU mem that was consumed by the pipeline and its underlying data structures?

EDIT: OK, to answer my own question, I think the issue is that a) CUDA tensors can "escape" the pipeline and, if not freed, keep refs around, b) hooks/closures can hold on to refs of vars, c) there are a variety of other ways, it seems...


yondonfu commented Feb 6, 2026

It seems like the only guaranteed clean way to make sure all GPU mem is freed when unloading a pipeline, regardless of how it is implemented, is to isolate it in a subprocess. @leszko looked into this previously, but we tabled it because we didn't want to take on the complexity (particularly inter-process comms for an already latency-sensitive code path). I would treat that path as a separate thing to consider and discuss.

This leaves the question: what are the practical steps that can be taken to minimize the chance of this type of OOM during pipeline switching in the short term?

A few ideas from a chat with Claude, forming a superset of what is already done in this PR and @varshith15's suggestions:

Step 1: Stop active processors and break ref chains BEFORE gc.collect

 The single highest-impact change. Currently _unload_pipeline_by_id_unsafe does:
 1. del self._pipelines[pipeline_id] — drops manager's ref
 2. gc.collect() — but processors still hold refs, so nothing is freed

 Reorder to:
 1. Notify all active FrameProcessors that use this pipeline_id to stop their PipelineProcessors
 2. Each PipelineProcessor.stop() should set self.pipeline = None and drain queues (freeing GPU tensors)
 3. FrameProcessor should clear _pinned_buffer_cache
 4. THEN del self._pipelines[pipeline_id]
 5. THEN gc.collect() + torch.cuda.empty_cache()

 This is similar to PR #403's callback approach, but must be synchronous/blocking — the unload must wait until all processors have actually released their refs.

 Files: pipeline_manager.py, frame_processor.py, pipeline_processor.py
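
 As a rough sketch of this ordering (the helper names _processors_for and clear_pinned_buffers below are hypothetical, not existing APIs):

    import gc
    import torch

    class PipelineManager:
        def _unload_pipeline_by_id_unsafe(self, pipeline_id: str) -> None:
            # Steps 1-3: synchronously stop every processor using this pipeline so all
            # external refs (pipeline object, queued tensors, pinned buffers) are released.
            for processor in self._processors_for(pipeline_id):   # hypothetical helper
                processor.stop()                  # processor drops its pipeline ref and drains queues
                processor.clear_pinned_buffers()  # hypothetical; clears _pinned_buffer_cache

            # Step 4: only now drop the manager's own reference.
            del self._pipelines[pipeline_id]

            # Step 5: with no remaining refs, these calls can actually free VRAM.
            gc.collect()
            torch.cuda.empty_cache()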

 Step 2: Explicit teardown on Pipeline base class (generic, no plugin work needed)

 Add a teardown() method to Pipeline base class in interface.py that:
 - Iterates self.__dict__ and deletes any torch.nn.Module instances found (catches RIFE's self.rife_interpolator, standard self.components, etc.)
 - Clears self.components, self.state, self.blocks if they exist
 - This is generic — handles any pipeline without requiring custom cleanup
 - Called by pipeline_manager after all external refs are broken (Step 1) but before gc.collect

 This addresses the RIFE-specific problem without requiring each pipeline to implement its own cleanup. The base class just introspects __dict__ for GPU-holding objects.

 Files: interface.py, pipeline_manager.py
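
 A best-effort sketch of such a teardown (exact attribute handling in interface.py may differ):

    import torch

    class Pipeline:
        def teardown(self) -> None:
            # Drop any attribute that is a torch.nn.Module so its parameters lose their last ref here.
            for name, value in list(self.__dict__.items()):
                if isinstance(value, torch.nn.Module):
                    setattr(self, name, None)
            # Clear the well-known containers if they exist.
            for attr in ("components", "state", "blocks"):
                container = getattr(self, attr, None)
                if hasattr(container, "clear"):
                    container.clear()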

 Step 3: Properly drain GPU tensors from queues

 PipelineProcessor.stop() currently drains queues with get_nowait() but the tensors go out of scope one by one. For deterministic cleanup:
 - Drain all tensors into a list, then del the list explicitly
 - Alternatively, just ensure self.pipeline = None is set in stop() (as PR #403 proposes)

 The queue draining in stop() already exists (lines 171-185 of pipeline_processor.py) — just ensure that after draining, there's no lingering ref.

 Files: pipeline_processor.py
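
 For example, a deterministic drain could look roughly like this (the queue attribute name is assumed):

    import queue

    class PipelineProcessor:
        def stop(self) -> None:
            # Drain everything into one list, then drop the whole batch at once so no
            # local variable keeps the last GPU tensor alive.
            drained = []
            try:
                while True:
                    drained.append(self._output_queue.get_nowait())  # assumed queue attribute
            except queue.Empty:
                pass
            del drained
            self.pipeline = None  # as this PR proposes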

 Step 4: Clear pinned buffer cache on stop

 FrameProcessor._pinned_buffer_cache holds pinned CUDA memory buffers that are never released during pipeline switching. Add to FrameProcessor.stop():

 with self._pinned_buffer_lock:
     self._pinned_buffer_cache.clear()

 This is small but pinned memory can be non-trivial.

I think what feels reasonable is for there to be a generic cleanup fn that does a best-effort cleanup for all pipelines by introspecting __dict__, and it just leaves the possibility of edge cases for now.

Signed-off-by: Varshith B <varshith15@gmail.com>

varshith15 commented Feb 6, 2026

I think what feels reasonable is for there to be a generic cleanup fn that does a best-effort cleanup for all pipelines by introspecting __dict__, and it just leaves the possibility of edge cases for now.

I think we definitely need to go a couple of levels deep, at least 2. For instance with RIFE, the RIFEInterpolator isn't a torch module, but the model inside it is.

Added a BFS-style crawler.
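
For reference, that kind of crawler could look roughly like this (a sketch, not the exact code added in this PR):

    from collections import deque
    import torch

    def release_gpu_refs(root, max_depth: int = 2) -> None:
        # Breadth-first walk over object attributes, nulling out anything that holds
        # GPU data (nn.Module or Tensor), so a wrapper like RIFEInterpolator that is
        # not itself a module still has its inner model released.
        seen = set()
        pending = deque([(root, 0)])
        while pending:
            obj, depth = pending.popleft()
            if id(obj) in seen or depth > max_depth or not hasattr(obj, "__dict__"):
                continue
            seen.add(id(obj))
            for name, value in list(vars(obj).items()):
                if isinstance(value, (torch.nn.Module, torch.Tensor)):
                    setattr(obj, name, None)
                else:
                    pending.append((value, depth + 1))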


yondonfu commented Feb 6, 2026

I think we definitely need to go a couple of levels deep, at least 2. For instance with RIFE, the RIFEInterpolator isn't a torch module, but the model inside it is.

But if you remove all refs in __dict__, then even if RIFEInterpolator is nested in some field called self.my_field, shouldn't a subsequent cache clear + collect handle that, because if self.my_field is gone then RIFEInterpolator has no refs?

Signed-off-by: Varshith B <varshith15@gmail.com>
@varshith15

But if you remove all refs in __dict__, then even if RIFEInterpolator is nested in some field called self.my_field, shouldn't a subsequent cache clear + collect handle that, because if self.my_field is gone then RIFEInterpolator has no refs?

Yeah, this makes sense and is also a lot cleaner; updated it. I was thinking of only deleting tensors and tensor modules, but yeah, GC should be able to collect if the ref count goes to 0. This should get most of the refs.


yondonfu commented Feb 6, 2026

On the backend-fal-v6 branch I can't seem to repro the issue of memory not being freed with the following combos:

video-depth-anything + longlive + rife -> passthrough
reward-forcing -> passthrough
memflow -> passthrough

In all cases, when the switch to passthrough is completed I see the VRAM drop down to baseline levels.

I did not get a chance to try with Krea though. So, I wonder if it is Krea specific.

If testing on an H100, one thing to try that comes to mind is changing this line to compile=False, as this is one of the bigger differences relative to other pipelines that is specific to H100s. Perhaps the torch.compile happening under the hood is causing some caching that prevents GPU mem from being freed...
