fix: MPS memory reporting and cache synchronization #13287
snwchd71 wants to merge 2 commits into Comfy-Org:master
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings.
📝 Walkthrough
get_total_memory() and get_free_memory() now treat CPU and Apple MPS separately: CPU reads psutil values; MPS derives totals from the torch.mps APIs.
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@execution.py`:
- Around line 780-787: The MPS cache flush branch is currently unreachable when
CacheType.RAM_PRESSURE is set and its threshold is inverted; change the `elif
comfy.model_management.mps_mode():` to an independent `if
comfy.model_management.mps_mode():` so the MPS path runs even when
`free_memory(...)` was called, then flip the comparison to flush when the
allocator cache is large (use `if mem_free_torch > 0 and mem_free_torch >
mem_free_total * 0.25:`) so `comfy.model_management.get_free_memory(...)`
triggers `comfy.model_management.soft_empty_cache()` only when the reusable
torch cache is high (and avoid triggering on a zero fallback).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8004979d-50a4-4946-a801-d7291c61da16
📒 Files selected for processing (2)
comfy/model_management.py, execution.py
```python
if self.cache_type == CacheType.RAM_PRESSURE:
    comfy.model_management.free_memory(0, None, pins_required=ram_headroom, ram_required=ram_headroom)
    comfy.memory_management.extra_ram_release(ram_headroom)
elif comfy.model_management.mps_mode():
    mem_free_total, mem_free_torch = comfy.model_management.get_free_memory(
        comfy.model_management.get_torch_device(), torch_free_too=True)
    if mem_free_torch < mem_free_total * 0.25:
        comfy.model_management.soft_empty_cache()
```
Make the new MPS cache flush reachable and flip the threshold.
Line 783 makes this branch unreachable whenever CacheType.RAM_PRESSURE is enabled, even though the preceding free_memory(0, None, ...) call will not hit the MPS cache flush in comfy/model_management.py, Lines 717-721 because that path requires device is not None. Line 786 is also reversed: mem_free_torch is the allocator's reusable cache, so the flush should happen when that value is high, not low. As written, the new cleanup misses the hoarding case, and if the MPS stats fallback in comfy/model_management.py, Lines 1497-1505 ever returns mem_free_torch = 0, this condition will sync+empty after every node.
Suggested fix
```diff
 if self.cache_type == CacheType.RAM_PRESSURE:
     comfy.model_management.free_memory(0, None, pins_required=ram_headroom, ram_required=ram_headroom)
     comfy.memory_management.extra_ram_release(ram_headroom)
-elif comfy.model_management.mps_mode():
+if comfy.model_management.mps_mode():
     mem_free_total, mem_free_torch = comfy.model_management.get_free_memory(
         comfy.model_management.get_torch_device(), torch_free_too=True)
-    if mem_free_torch < mem_free_total * 0.25:
+    if mem_free_torch > 0 and mem_free_torch > mem_free_total * 0.25:
         comfy.model_management.soft_empty_cache()
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@execution.py` around lines 780 - 787, The MPS cache flush branch is currently
unreachable when CacheType.RAM_PRESSURE is set and its threshold is inverted;
change the `elif comfy.model_management.mps_mode():` to an independent `if
comfy.model_management.mps_mode():` so the MPS path runs even when
`free_memory(...)` was called, then flip the comparison to flush when the
allocator cache is large (use `if mem_free_torch > 0 and mem_free_torch >
mem_free_total * 0.25:`) so `comfy.model_management.get_free_memory(...)`
triggers `comfy.model_management.soft_empty_cache()` only when the reusable
torch cache is high (and avoid triggering on a zero fallback).
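The corrected flush condition is small enough to capture as a standalone predicate. This is an illustrative sketch, not code from the PR; `should_flush_mps_cache` is a hypothetical helper name:

```python
def should_flush_mps_cache(mem_free_torch: int, mem_free_total: int) -> bool:
    """Decide whether to flush the MPS caching allocator between nodes.

    mem_free_torch is memory the torch allocator holds but could reuse;
    the error fallback reports 0, which must not trigger a flush.
    Flush only when the reusable cache exceeds 25% of total free memory.
    """
    return mem_free_torch > 0 and mem_free_torch > mem_free_total * 0.25
```

With this shape, the zero fallback is treated as "unknown" and skipped, while a hoarding allocator (cache above a quarter of free memory) would trigger `soft_empty_cache()`.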
Force-pushed fc33bfe to 9026aad
Actionable comments posted: 2
♻️ Duplicate comments (1)
execution.py (1)
783-787: ⚠️ Potential issue | 🟠 Major: Make the MPS cache flush reachable and flip the threshold.
Line 781's `free_memory(0, None, ...)` cannot hit the MPS cache path in `comfy.model_management.free_memory()` because that branch only runs when `device is not None`. With `elif`, the new cleanup is skipped whenever `CacheType.RAM_PRESSURE` is active, and the comparison is backwards: `mem_free_torch` is the reusable allocator cache returned by `get_free_memory(..., torch_free_too=True)`, so the hoarding case is `> 25%`, not `< 25%`. (docs.pytorch.org)
Suggested fix
```diff
 if self.cache_type == CacheType.RAM_PRESSURE:
     comfy.model_management.free_memory(0, None, pins_required=ram_headroom, ram_required=ram_headroom)
     comfy.memory_management.extra_ram_release(ram_headroom)
-elif comfy.model_management.mps_mode():
+if comfy.model_management.mps_mode():
     mem_free_total, mem_free_torch = comfy.model_management.get_free_memory(
         comfy.model_management.get_torch_device(), torch_free_too=True)
-    if mem_free_torch < mem_free_total * 0.25:
+    if mem_free_torch > 0 and mem_free_torch > mem_free_total * 0.25:
         comfy.model_management.soft_empty_cache()
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@execution.py` around lines 783 - 787, The MPS cache flush branch is unreachable under the current elif and uses the wrong comparison; change the `elif comfy.model_management.mps_mode():` to an independent `if comfy.model_management.mps_mode():` so it still runs when RAM_PRESSURE logic executes, and invert the threshold check to `if mem_free_torch > mem_free_total * 0.25:` (using the values returned by `comfy.model_management.get_free_memory(comfy.model_management.get_torch_device(), torch_free_too=True)`) and call `comfy.model_management.soft_empty_cache()` when that condition is true.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@comfy/model_management.py`:
- Around line 447-449: The code sets VRAMState.SHARED for CPUState.MPS but later
a block force-assigns VRAMState.DISABLED for every non-CPUState.GPU backend,
which overwrites the MPS shared setting; update the logic that sets
VRAMState.DISABLED (the branch checking for non-CPUState.GPU) to skip or
preserve VRAMState when cpu_state == CPUState.MPS or when vram_state is already
VRAMState.SHARED so the MPS path can reach the shared-memory handling code
instead of being overwritten.
- Around line 912-920: The MPS memory check uses
torch.mps.recommended_max_memory() as if it were free budget; change the
calculation to subtract torch.mps.current_allocated_memory() so remaining
headroom is used. In the block guarded by vram_state == VRAMState.SHARED (where
model_size is computed from dtype_size(dtype) * parameters), compute mem_budget
as recommended_max_memory() - current_allocated_memory() (falling back to
get_free_memory(torch_dev) if the torch.mps calls fail) and then compare
model_size against mem_budget * 0.4 to decide between torch_dev and cpu_dev.
Ensure you reference torch.mps.recommended_max_memory() and
torch.mps.current_allocated_memory() and keep the existing fallback to
get_free_memory(torch_dev).
---
Duplicate comments:
In `@execution.py`:
- Around line 783-787: The MPS cache flush branch is unreachable under the
current elif and uses the wrong comparison; change the `elif
comfy.model_management.mps_mode():` to an independent `if
comfy.model_management.mps_mode():` so it still runs when RAM_PRESSURE logic
executes, and invert the threshold check to `if mem_free_torch > mem_free_total
* 0.25:` (using the values returned by
`comfy.model_management.get_free_memory(comfy.model_management.get_torch_device(),
torch_free_too=True)`) and call `comfy.model_management.soft_empty_cache()` when
that condition is true.
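The assignment-ordering bug called out for comfy/model_management.py lines 447-449 can be reproduced with stand-in enums. The names below mirror the symbols in the review but this is a sketch of the control flow, not the actual module:

```python
from enum import Enum

class CPUState(Enum):
    GPU = 0
    CPU = 1
    MPS = 2

class VRAMState(Enum):
    DISABLED = 0
    NORMAL_VRAM = 1
    SHARED = 2

def pick_vram_state(cpu_state: CPUState, fixed: bool) -> VRAMState:
    vram_state = VRAMState.NORMAL_VRAM
    if cpu_state == CPUState.MPS:
        vram_state = VRAMState.SHARED  # the new MPS assignment
    if fixed:
        # suggested fix: only the pure-CPU path disables VRAM handling
        if cpu_state == CPUState.CPU:
            vram_state = VRAMState.DISABLED
    else:
        # original code: any non-GPU backend is disabled, clobbering SHARED
        if cpu_state != CPUState.GPU:
            vram_state = VRAMState.DISABLED
    return vram_state
```

Running the unfixed path with `CPUState.MPS` yields `DISABLED`, which is exactly why the `VRAMState.SHARED` branch further down never executes.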
♻️ Duplicate comments (2)
comfy/model_management.py (1)
447-449: ⚠️ Potential issue | 🟠 Major: `VRAMState.SHARED` is immediately overwritten to `DISABLED`.
The new assignment at Lines 447-449 sets `VRAMState.SHARED` for MPS, but the existing code at Lines 468-469 (`if cpu_state != CPUState.GPU: vram_state = VRAMState.DISABLED`) runs afterwards and overwrites it. This makes the `VRAMState.SHARED` branch at Lines 912-920 unreachable.
Suggested fix
Change Line 468 to only target CPU, preserving SHARED for MPS:
```diff
-if cpu_state != CPUState.GPU:
+if cpu_state == CPUState.CPU:
     vram_state = VRAMState.DISABLED
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@comfy/model_management.py` around lines 447 - 449, The VRAMState.SHARED assignment for CPUState.MPS is being clobbered later by the unconditional check "if cpu_state != CPUState.GPU: vram_state = VRAMState.DISABLED"; update that later condition so it does not overwrite the MPS case — e.g., change the check to only set DISABLED for the pure CPU path (use cpu_state == CPUState.CPU or explicitly exclude CPUState.MPS) so VRAMState.SHARED assigned when cpu_state == CPUState.MPS remains effective; look for symbols cpu_state, CPUState.MPS, VRAMState.SHARED, and the "if cpu_state != CPUState.GPU" branch to modify.
execution.py (1)
783-787: ⚠️ Potential issue | 🟠 Major: MPS cache flush branch is unreachable under RAM_PRESSURE and has inverted threshold logic.
The `elif` makes this branch unreachable when `CacheType.RAM_PRESSURE` is enabled. Additionally, the comparison at Line 786 is inverted compared to the analogous CUDA logic in `free_memory()` (Lines 719-721 of `model_management.py`), which flushes when `mem_free_torch > mem_free_total * 0.25`. The current `<` check would flush when the cache is small (not hoarding), which contradicts the stated PR goal.
Suggested fix
```diff
 if self.cache_type == CacheType.RAM_PRESSURE:
     comfy.model_management.free_memory(0, None, pins_required=ram_headroom, ram_required=ram_headroom)
     comfy.memory_management.extra_ram_release(ram_headroom)
-elif comfy.model_management.mps_mode():
+if comfy.model_management.mps_mode():
     mem_free_total, mem_free_torch = comfy.model_management.get_free_memory(
         comfy.model_management.get_torch_device(), torch_free_too=True)
-    if mem_free_torch < mem_free_total * 0.25:
+    if mem_free_torch > mem_free_total * 0.25:
         comfy.model_management.soft_empty_cache()
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@execution.py` around lines 783 - 787, The MPS branch is currently unreachable under CacheType.RAM_PRESSURE because it uses elif and its threshold logic is inverted; change the `elif comfy.model_management.mps_mode()` to an independent `if` so it runs when MPS is active even under RAM_PRESSURE, and invert the comparison in the check that calls `comfy.model_management.soft_empty_cache()` from `mem_free_torch < mem_free_total * 0.25` to `mem_free_torch > mem_free_total * 0.25` so it matches the CUDA logic in `free_memory()` and only flushes when the Torch memory is hoarding; keep the calls to `comfy.model_management.get_free_memory()` and `get_torch_device()` unchanged.
🧹 Nitpick comments (1)
comfy/model_management.py (1)
912-920: Memory budget should account for current allocation.
Once the `VRAMState.SHARED` overwrite issue is fixed, this logic will become reachable. However, `torch.mps.recommended_max_memory()` represents the total recommended budget, not remaining headroom. A device that already has 80% allocated could still pass the 40% check. Consider subtracting current allocation for accurate headroom:
Suggested improvement
```diff
 if vram_state == VRAMState.SHARED:
     model_size = dtype_size(dtype) * parameters
     try:
-        mem_budget = torch.mps.recommended_max_memory()
+        mem_budget = max(
+            torch.mps.recommended_max_memory() - torch.mps.current_allocated_memory(),
+            0,
+        )
     except Exception:
         mem_budget = get_free_memory(torch_dev)
     if model_size < mem_budget * 0.4:
         return torch_dev
     return cpu_dev
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@comfy/model_management.py` around lines 912 - 920, The check using torch.mps.recommended_max_memory() must account for already-allocated memory before comparing to model_size; update the VRAMState.SHARED branch so mem_budget represents remaining headroom (either by calling get_free_memory(torch_dev) directly or by subtracting current allocation from torch.mps.recommended_max_memory(), e.g. current_alloc = get_used_memory(torch_dev) or similar helper and then headroom = mem_budget - current_alloc) and then compare model_size against headroom * 0.4; modify the block around VRAMState.SHARED (references: dtype_size, parameters, torch.mps.recommended_max_memory, get_free_memory, torch_dev, cpu_dev) to use that headroom value.
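The headroom arithmetic behind that suggestion can be sketched independently of torch. The function names here are illustrative; on a real device the two counters would come from `torch.mps.recommended_max_memory()` and `torch.mps.current_allocated_memory()`:

```python
def mps_headroom(recommended_max: int, current_allocated: int) -> int:
    """Remaining MPS budget, clamped at zero."""
    return max(recommended_max - current_allocated, 0)

def pick_device(model_size: int, recommended_max: int, current_allocated: int) -> str:
    """Place the model on MPS only if it fits within 40% of the headroom."""
    if model_size < mps_headroom(recommended_max, current_allocated) * 0.4:
        return "mps"
    return "cpu"
```

For example, a device with a 16 GiB recommended budget and 13 GiB already allocated has 3 GiB of headroom: a 1 GiB model (under the 1.2 GiB threshold) goes to MPS, a 2 GiB model falls back to CPU, whereas comparing against the raw 16 GiB budget would have accepted both.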
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@comfy/model_management.py`:
- Around line 447-449: The VRAMState.SHARED assignment for CPUState.MPS is being
clobbered later by the unconditional check "if cpu_state != CPUState.GPU:
vram_state = VRAMState.DISABLED"; update that later condition so it does not
overwrite the MPS case — e.g., change the check to only set DISABLED for the
pure CPU path (use cpu_state == CPUState.CPU or explicitly exclude CPUState.MPS)
so VRAMState.SHARED assigned when cpu_state == CPUState.MPS remains effective;
look for symbols cpu_state, CPUState.MPS, VRAMState.SHARED, and the "if
cpu_state != CPUState.GPU" branch to modify.
In `@execution.py`:
- Around line 783-787: The MPS branch is currently unreachable under
CacheType.RAM_PRESSURE because it uses elif and its threshold logic is inverted;
change the `elif comfy.model_management.mps_mode()` to an independent `if` so it
runs when MPS is active even under RAM_PRESSURE, and invert the comparison in
the check that calls `comfy.model_management.soft_empty_cache()` from
`mem_free_torch < mem_free_total * 0.25` to `mem_free_torch > mem_free_total *
0.25` so it matches the CUDA logic in `free_memory()` and only flushes when the
Torch memory is hoarding; keep the calls to
`comfy.model_management.get_free_memory()` and `get_torch_device()` unchanged.
---
Nitpick comments:
In `@comfy/model_management.py`:
- Around line 912-920: The check using torch.mps.recommended_max_memory() must
account for already-allocated memory before comparing to model_size; update the
VRAMState.SHARED branch so mem_budget represents remaining headroom (either by
calling get_free_memory(torch_dev) directly or by subtracting current allocation
from torch.mps.recommended_max_memory(), e.g. current_alloc =
get_used_memory(torch_dev) or similar helper and then headroom = mem_budget -
current_alloc) and then compare model_size against headroom * 0.4; modify the
block around VRAMState.SHARED (references: dtype_size, parameters,
torch.mps.recommended_max_memory, get_free_memory, torch_dev, cpu_dev) to use
that headroom value.
Force-pushed 9026aad to 8eb4c4d
♻️ Duplicate comments (1)
execution.py (1)
783-787: ⚠️ Potential issue | 🟠 Major: MPS cache flush logic has reachability and threshold issues.
Two concerns with this branch:
1. Unreachable with RAM_PRESSURE: Using `elif` means this never executes when `CacheType.RAM_PRESSURE` is enabled, even though MPS users might want both behaviors.
2. Inverted threshold: `mem_free_torch` represents cached/reclaimable memory in the allocator. The flush should trigger when this is high (allocator hoarding), not low. Current logic flushes when there's little to reclaim.
Additionally, the exception fallback in `get_free_memory()` sets `mem_free_torch = 0`, which would make `0 < mem_free_total * 0.25` always true, triggering a flush after every node.
Suggested fix
```diff
 if self.cache_type == CacheType.RAM_PRESSURE:
     comfy.model_management.free_memory(0, None, pins_required=ram_headroom, ram_required=ram_headroom)
     comfy.memory_management.extra_ram_release(ram_headroom)
-elif comfy.model_management.mps_mode():
+if comfy.model_management.mps_mode():
     mem_free_total, mem_free_torch = comfy.model_management.get_free_memory(
         comfy.model_management.get_torch_device(), torch_free_too=True)
-    if mem_free_torch < mem_free_total * 0.25:
+    if mem_free_torch > 0 and mem_free_torch > mem_free_total * 0.25:
         comfy.model_management.soft_empty_cache()
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@execution.py` around lines 783 - 787, The MPS branch is currently unreachable when RAM_PRESSURE runs (because it uses elif) and uses an inverted threshold and false-positive behavior when get_free_memory() returns 0; change the branch to an independent if (use comfy.model_management.mps_mode()) so it can run alongside RAM_PRESSURE, invert the comparison to flush when mem_free_torch is high (use mem_free_torch > mem_free_total * 0.25), and guard against the exception fallback by treating mem_free_torch == 0 as unknown/skip (only compare/flush when mem_free_torch > 0); use the existing calls comfy.model_management.get_free_memory(comfy.model_management.get_torch_device(), torch_free_too=True) and call comfy.model_management.soft_empty_cache() when the adjusted condition is met.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@execution.py`:
- Around line 783-787: The MPS branch is currently unreachable when RAM_PRESSURE
runs (because it uses elif) and uses an inverted threshold and false-positive
behavior when get_free_memory() returns 0; change the branch to an independent
if (use comfy.model_management.mps_mode()) so it can run alongside RAM_PRESSURE,
invert the comparison to flush when mem_free_torch is high (use mem_free_torch >
mem_free_total * 0.25), and guard against the exception fallback by treating
mem_free_torch == 0 as unknown/skip (only compare/flush when mem_free_torch >
0); use the existing calls
comfy.model_management.get_free_memory(comfy.model_management.get_torch_device(),
torch_free_too=True) and call comfy.model_management.soft_empty_cache() when the
adjusted condition is met.
- Split MPS from CPU in get_total_memory() and get_free_memory() to use torch.mps APIs (recommended_max_memory, driver_allocated_memory, current_allocated_memory) instead of relying solely on psutil
- Add torch.mps.synchronize() to synchronize() and soft_empty_cache()
- Add inter-node MPS cache flush when torch free memory is low
- Move MPS SHARED vram assignment before --lowvram/--novram checks so users can override it for large models
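One plausible way the three torch.mps counters combine into the (total free, torch free) pair the PR reports, assuming they behave like their CUDA analogues. This is a hedged sketch; `mps_memory_report` is a hypothetical helper, not the PR's actual implementation:

```python
def mps_memory_report(recommended_max: int, driver_allocated: int,
                      current_allocated: int) -> tuple[int, int]:
    """Derive free-memory figures from the three torch.mps counters.

    mem_free_total: headroom against Metal's recommended process budget.
    mem_free_torch: memory torch reserved from the driver but not handed
                    out to live tensors, i.e. the reusable allocator cache.
    """
    mem_free_total = max(recommended_max - driver_allocated, 0)
    mem_free_torch = max(driver_allocated - current_allocated, 0)
    return mem_free_total, mem_free_torch
```

On a device reporting a 16 GiB budget with 12 GiB driver-allocated and 9 GiB live, this yields 4 GiB of headroom and a 3 GiB reusable cache, which is the kind of figure the inter-node flush heuristic compares against.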
Force-pushed 8eb4c4d to 33b967d
- Add torch.mps.synchronize() to synchronize() function, mirroring the existing CUDA and XPU paths
- Add torch.mps.synchronize() before torch.mps.empty_cache() in soft_empty_cache() to ensure pending MPS operations complete before releasing cached memory
- Allow --lowvram and --novram flags to take effect on MPS devices. Previously, MPS unconditionally set vram_state to SHARED regardless of user flags. Now respects set_vram_to when LOW_VRAM or NO_VRAM is requested, while defaulting to SHARED otherwise.
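The synchronize-before-empty_cache ordering described above can be expressed as a small contract. The recording backend below is a stand-in used only to illustrate call order; it is not torch:

```python
class RecordingBackend:
    """Stand-in for torch.mps that records which calls happen in what order."""
    def __init__(self):
        self.calls = []

    def synchronize(self):
        # on real MPS this would drain the pending Metal command queue
        self.calls.append("synchronize")

    def empty_cache(self):
        # on real MPS this would release cached allocator blocks to the driver
        self.calls.append("empty_cache")

def soft_empty_cache(backend) -> None:
    # mirror the PR's ordering: finish pending work, then release the cache
    backend.synchronize()
    backend.empty_cache()
```

Releasing cached blocks while kernels are still in flight is the failure mode the ordering guards against, so the synchronize call must always come first.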
Summary
MPS memory management currently treats MPS identically to CPU, using only `psutil.virtual_memory()`. This causes ComfyUI to make poor memory decisions on Apple Silicon: it doesn't know how much MPS-tracked memory is actually in use, and `synchronize()` skips MPS entirely.
This PR:
- updates `get_total_memory()` and `get_free_memory()` to use `torch.mps.recommended_max_memory()`, `torch.mps.driver_allocated_memory()`, and `torch.mps.current_allocated_memory()` for accurate reporting
- adds `torch.mps.synchronize()` to `synchronize()` (it was missing entirely) and before `empty_cache()` in `soft_empty_cache()`
- adds an inter-node MPS cache flush in `execution.py`: when torch free memory drops below 25% of total free between node executions, the MPS caching allocator is flushed to prevent memory hoarding during long denoising loops

All `torch.mps.*` APIs are available since PyTorch 2.1 (the minimum requirement is 2.5). Fallbacks are provided via `except Exception` for safety.
Context
- `torch.mps.current_allocated_memory()`
- `soft_empty_cache()` only fires between nodes
Test plan
- `torch.mps.recommended_max_memory()`, `driver_allocated_memory()`, `current_allocated_memory()` return correct values on M5 Pro
- `soft_empty_cache()` now properly synchronizes before clearing
- `dev.type == 'mps'` / `mps_mode()`