Dynamic VRAM support by rattus128 · Pull Request #427 · city96/ComfyUI-GGUF

rattus128 · 2026-03-05T19:23:40Z

The new dynamic VRAM system in the comfy-core enhances both RAM and VRAM management. Models are no longer offloader from VRAM to RAM (which has a habit of becoming swap) and are now loadable asynchronously on the sampler first iteration. This gives significant speedup to big multi-model workflows on low-resource systems. VRAM offloading is managed by demand offloading, such there is no need to have VRAM usage esitmates anymore.

The core has already upstreamed several of the resource saving features of GGUF in various forms.

The core linear layers are now inited un-allocated to avoid the naked commit charge for the empty tensor.
Models are loaded with assign=True to avoid deep copy and committed memory on model load (GGUF does similar but with _load_state_dict hooking)
the sft file is mmaped read only to avoid that commit charge. GGUF does this

So this implements a QuantizedTensor backend and subclasses the new ModelPatcherDynamic to bring GGUF+dynamic without needed custom ops.

The patcher subclass is needed to unhook the lora into on-the-fly. Otherwise its just load the state-dict into the new QuantizedTensor and go.

This brings the full feature-set of the core comfy caster to GGUF including, async-offload (and async primary load), pinned-memory and now the dynamic management.

There's some boilerplate to implement downgrade back to ModelPatcher. This is needed for things like torch compiler and hooks where Dynamic VRAM is TBD.

Still drafing and will post some more performance results. I am going to pull a RAM stick and go for some 16GB RAM flows with GGUF.

Example Test conditions:

WAN2.2 14B Q8 GGUF, 640x640x81f, RTX5090, Linux, 96GB, 2x Runs (disk caches warm with model first runs)

Before

Prompt executed in 60.31 seconds
Prompt executed in 55.99 seconds

After

Prompt executed in 48.75 seconds
Prompt executed in 43.35 seconds

Vibe code. To be reviewed.

If in dynamic mode, load GGUF as a QT.

Refactor this to support the new reconstructability protocol in the comfy core. This is needed for DynamicVRAM (to support legacy demotion for fallbacks). Add the logic for dynamic_vram construction. This is also needed for worksplit multi-gpu branch where the model is deep-cloned via reconstruction to put the model on two parallel GPUs.

Factor this out to a helper and implement the new core reconstruction protocol. Consider the mmap_released flag 1:1 with the underlying model such that it moves with the base model in model_override.

m8rr · 2026-03-06T05:21:20Z

https://github.com/rattus128/ComfyUI-GGUF/tree/dynamic-vram

Is this the same thing?
I used the above and an error occurs when using CLIPLoader (GGUF) with GGUF


D:\AI\ComfyUI_windows_portable>.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --disable-api-nodes --output-directory E:\output --temp-directory E:\output
Setting output directory to: E:\output
Found comfy_kitchen backend triton: {'available': True, 'disabled': True, 'unavailable_reason': None, 'capabilities': ['apply_rope', 'apply_rope1', 'dequantize_nvfp4', 'dequantize_per_tensor_fp8', 'quantize_nvfp4', 'quantize_per_tensor_fp8']}
Found comfy_kitchen backend eager: {'available': True, 'disabled': False, 'unavailable_reason': None, 'capabilities': ['apply_rope', 'apply_rope1', 'dequantize_nvfp4', 'dequantize_per_tensor_fp8', 'quantize_nvfp4', 'quantize_per_tensor_fp8', 'scaled_mm_nvfp4']}
Found comfy_kitchen backend cuda: {'available': True, 'disabled': False, 'unavailable_reason': None, 'capabilities': ['apply_rope', 'apply_rope1', 'dequantize_nvfp4', 'dequantize_per_tensor_fp8', 'quantize_nvfp4', 'quantize_per_tensor_fp8', 'scaled_mm_nvfp4']}
Checkpoint files will always be loaded safely.
Total VRAM 12282 MB, total RAM 32085 MB
pytorch version: 2.10.0+cu130
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 4070 SUPER : cudaMallocAsync
Using async weight offloading with 2 streams
Enabled pinned memory 14438.0
working around nvidia conv3d memory bug.
Using pytorch attention
aimdo: src-win/cuda-detour.c:77:INFO:aimdo_setup_hooks: found driver at 00007FFB60C00000, installing 4 hooks
aimdo: src-win/cuda-detour.c:61:DEBUG:install_hook_entrys: hooks successfully installed
aimdo: src/control.c:66:INFO:comfy-aimdo inited for GPU: NVIDIA GeForce RTX 4070 SUPER (VRAM: 12281 MB)
DynamicVRAM support detected and enabled
Python version: 3.13.9 (tags/v3.13.9:8183fa5, Oct 14 2025, 14:09:13) [MSC v.1944 64 bit (AMD64)]
ComfyUI version: 0.16.3
Setting temp directory to: E:\output\temp
ComfyUI frontend version: 1.39.19
[Prompt Server] web root: D:\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\comfyui_frontend_package\static
ComfyUI-GGUF: Allowing full torch compile

Import times for custom nodes:
   0.0 seconds: D:\AI\ComfyUI_windows_portable\ComfyUI\custom_nodes\websocket_image_save.py
   0.0 seconds: D:\AI\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-KJNodes
   0.1 seconds: D:\AI\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-GGUF

Context impl SQLiteImpl.
Will assume non-transactional DDL.
Assets scan(roots=['models']) completed in 0.056s (created=0, skipped_existing=81, orphans_pruned=0, total_seen=85)
Starting server

To see the GUI go to: http://127.0.0.1:8188
got prompt
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
gguf qtypes: F32 (289), Q6_K (337)
Attempting to recreate sentencepiece tokenizer from GGUF file metadata...
Created tokenizer with vocab size of 262208
Dequantizing token_embd.weight to prevent runtime OOM.
clip missing: ['multi_modal_projector.mm_input_projection_weight', 
....
....
'vision_model.post_layernorm.weight', 'vision_model.post_layernorm.bias']
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
Requested to load LTXAVTEModel_
Model LTXAVTEModel_ prepared for dynamic VRAM loading. 50881MB Staged. 0 patches attached. Force pre-loaded 290 weights: 2995 KB.
!!! Exception during processing !!! shape '[4096, 3840]' is invalid for input of size 12902400
Traceback (most recent call last):
  File "D:\AI\ComfyUI_windows_portable\ComfyUI\execution.py", line 524, in execute
    output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\AI\ComfyUI_windows_portable\ComfyUI\execution.py", line 333, in get_output_data
    return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\AI\ComfyUI_windows_portable\ComfyUI\execution.py", line 307, in _async_map_node_over_list
    await process_inputs(input_dict, i)
  File "D:\AI\ComfyUI_windows_portable\ComfyUI\execution.py", line 295, in process_inputs
    result = f(**inputs)
  File "D:\AI\ComfyUI_windows_portable\ComfyUI\nodes.py", line 80, in encode
    return (clip.encode_from_tokens_scheduled(tokens), )
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
  File "D:\AI\ComfyUI_windows_portable\ComfyUI\comfy\sd.py", line 313, in encode_from_tokens_scheduled
    pooled_dict = self.encode_from_tokens(tokens, return_pooled=return_pooled, return_dict=True)
  File "D:\AI\ComfyUI_windows_portable\ComfyUI\comfy\sd.py", line 377, in encode_from_tokens
    o = self.cond_stage_model.encode_token_weights(tokens)
  File "D:\AI\ComfyUI_windows_portable\ComfyUI\comfy\text_encoders\lt.py", line 167, in encode_token_weights
    out, pooled, extra = self.gemma3_12b.encode_token_weights(token_weight_pairs)
                         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
  File "D:\AI\ComfyUI_windows_portable\ComfyUI\comfy\sd1_clip.py", line 45, in encode_token_weights
    o = self.encode(to_encode)
  File "D:\AI\ComfyUI_windows_portable\ComfyUI\comfy\sd1_clip.py", line 306, in encode
    return self(tokens)
  File "D:\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "D:\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\AI\ComfyUI_windows_portable\ComfyUI\comfy\sd1_clip.py", line 279, in forward
    outputs = self.transformer(None, attention_mask_model, embeds=embeds, num_tokens=num_tokens, intermediate_output=intermediate_output, final_layer_norm_intermediate=self.layer_norm_hidden_state, dtype=torch.float32, embeds_info=embeds_info)
  File "D:\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "D:\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\AI\ComfyUI_windows_portable\ComfyUI\comfy\text_encoders\llama.py", line 794, in forward
    return self.model(input_ids, *args, **kwargs)
           ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "D:\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\AI\ComfyUI_windows_portable\ComfyUI\comfy\text_encoders\llama.py", line 719, in forward
    x, current_kv = layer(
                    ~~~~~^
        x=x,
        ^^^^
    ...<3 lines>...
        past_key_value=past_kv,
        ^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "D:\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "D:\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\AI\ComfyUI_windows_portable\ComfyUI\comfy\text_encoders\llama.py", line 605, in forward
    x, present_key_value = self.self_attn(
                           ~~~~~~~~~~~~~~^
        hidden_states=x,
        ^^^^^^^^^^^^^^^^
    ...<4 lines>...
        sliding_window=sliding_window,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "D:\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "D:\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\AI\ComfyUI_windows_portable\ComfyUI\comfy\text_encoders\llama.py", line 466, in forward
    xq = self.q_proj(hidden_states)
  File "D:\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "D:\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\AI\ComfyUI_windows_portable\ComfyUI\comfy\ops.py", line 373, in forward
    return self.forward_comfy_cast_weights(*args, **kwargs)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "D:\AI\ComfyUI_windows_portable\ComfyUI\comfy\ops.py", line 365, in forward_comfy_cast_weights
    weight, bias, offload_stream = cast_bias_weight(self, input, offloadable=True)
                                   ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\AI\ComfyUI_windows_portable\ComfyUI\comfy\ops.py", line 228, in cast_bias_weight
    return cast_bias_weight_with_vbar(s, dtype, device, bias_dtype, non_blocking, compute_dtype, want_requant)
  File "D:\AI\ComfyUI_windows_portable\ComfyUI\comfy\ops.py", line 148, in cast_bias_weight_with_vbar
    comfy.model_management.cast_to_gathered(xfer_source, pin)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
  File "D:\AI\ComfyUI_windows_portable\ComfyUI\comfy\model_management.py", line 1204, in cast_to_gathered
    dest_views = comfy.memory_management.interpret_gathered_like(tensors, r)
  File "D:\AI\ComfyUI_windows_portable\ComfyUI\comfy\memory_management.py", line 71, in interpret_gathered_like
    actuals[attr] = gathered[offset:offset+size].view(dtype=template.dtype).view(template.shape)
                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
RuntimeError: shape '[4096, 3840]' is invalid for input of size 12902400

m8rr · 2026-03-15T09:03:15Z

This version definitely has a speed boost. However, if you're getting errors with the GGUF text encoder like me, try modifying the code as follows. Only the text encoder is operating the old way. it should serve as a good temporary workaround until the update.

nodes.py line 206~ (False->True)

def _load_gguf_clip_patcher(clip_paths, clip_type, disable_dynamic=True):
    return _load_gguf_clip(clip_paths, clip_type, disable_dynamic=disable_dynamic).patcher

def _load_gguf_clip(clip_paths, clip_type, disable_dynamic=True):

kingp0dd · 2026-03-16T23:50:44Z

This version definitely has a speed boost. However, if you're getting errors with the GGUF text encoder like me, try modifying the code as follows. Only the text encoder is operating the old way. it should serve as a good temporary workaround until the update.

nodes.py line 206~ (False->True)
def _load_gguf_clip_patcher(clip_paths, clip_type, disable_dynamic=True):
    return _load_gguf_clip(clip_paths, clip_type, disable_dynamic=disable_dynamic).patcher

def _load_gguf_clip(clip_paths, clip_type, disable_dynamic=True):

That means it's already working. How much% did it save you

m8rr · 2026-03-17T00:20:22Z

without --disable-dynamic-vram

Requested to load LTXAVTEModel_
loaded partially; 8523.00 MB usable, 556.58 MB loaded, 13574.77 MB offloaded, 7966.42 MB buffer reserved, lowvram patches: 0
Attempting to release mmap (267)
loaded partially; 8457.88 MB usable, 491.46 MB loaded, 13639.97 MB offloaded, 7966.42 MB buffer reserved, lowvram patches: 0
gguf qtypes: F32 (2672), BF16 (28), Q6_K (1744)
model weight dtype torch.bfloat16, manual cast: None
model_type FLUX
Requested to load LTXAV
Model LTXAV prepared for dynamic VRAM loading. 16918MB Staged. 0 patches attached.
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:29<00:00,  5.93s/it]
0 models unloaded.
Model LTXAV prepared for dynamic VRAM loading. 16918MB Staged. 0 patches attached.
100%|████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:35<00:00, 11.72s/it]
Requested to load AudioVAE
loaded completely; 1968.11 MB usable, 693.46 MB loaded, full load: True
Requested to load VideoVAE
0 models unloaded.
Model VideoVAE prepared for dynamic VRAM loading. 1384MB Staged. 0 patches attached.
Prompt executed in 164.28 seconds
Model LTXAV prepared for dynamic VRAM loading. 16918MB Staged. 0 patches attached.
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:24<00:00,  4.81s/it]
0 models unloaded.
Model LTXAV prepared for dynamic VRAM loading. 16918MB Staged. 0 patches attached.
100%|████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:35<00:00, 11.69s/it]
Requested to load AudioVAE
loaded completely; 1934.00 MB usable, 693.46 MB loaded, full load: True
0 models unloaded.
Model VideoVAE prepared for dynamic VRAM loading. 1384MB Staged. 0 patches attached.
Prompt executed in 90.59 seconds
Model LTXAV prepared for dynamic VRAM loading. 16918MB Staged. 0 patches attached.
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:23<00:00,  4.63s/it]
0 models unloaded.
Model LTXAV prepared for dynamic VRAM loading. 16918MB Staged. 0 patches attached.
100%|████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:35<00:00, 11.68s/it]
Requested to load AudioVAE
loaded completely; 1966.00 MB usable, 693.46 MB loaded, full load: True
0 models unloaded.
Model VideoVAE prepared for dynamic VRAM loading. 1384MB Staged. 0 patches attached.
Prompt executed in 77.65 seconds

with --disable-dynamic-vram

Requested to load LTXAVTEModel_
loaded partially; 8523.00 MB usable, 556.58 MB loaded, 13574.77 MB offloaded, 7966.42 MB buffer reserved, lowvram patches: 0
Attempting to release mmap (267)
loaded partially; 8457.88 MB usable, 491.46 MB loaded, 13639.97 MB offloaded, 7966.42 MB buffer reserved, lowvram patches: 0
gguf qtypes: F32 (2672), BF16 (28), Q6_K (1744)
model weight dtype torch.bfloat16, manual cast: None
model_type FLUX
Requested to load LTXAV
loaded partially; 9564.67 MB usable, 9525.25 MB loaded, 7689.63 MB offloaded, 39.42 MB buffer reserved, lowvram patches: 0
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:30<00:00,  6.09s/it]
Unloaded partially: 620.48 MB freed, 8904.77 MB remains loaded, 39.42 MB buffer reserved, lowvram patches: 0
0 models unloaded.
Unloaded partially: 1287.84 MB freed, 7616.93 MB remains loaded, 39.47 MB buffer reserved, lowvram patches: 0
100%|████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:39<00:00, 13.33s/it]
Requested to load AudioVAE
loaded completely; 2233.75 MB usable, 693.46 MB loaded, full load: True
Requested to load VideoVAE
0 models unloaded.
loaded partially; 0.00 MB usable, 0.00 MB loaded, 1384.94 MB offloaded, 378.02 MB buffer reserved, lowvram patches: 0
Prompt executed in 193.22 seconds
Requested to load LTXAV
loaded partially; 9560.67 MB usable, 9521.25 MB loaded, 7693.63 MB offloaded, 39.42 MB buffer reserved, lowvram patches: 0
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:32<00:00,  6.43s/it]
Unloaded partially: 616.48 MB freed, 8904.77 MB remains loaded, 39.42 MB buffer reserved, lowvram patches: 0
0 models unloaded.
Unloaded partially: 1301.00 MB freed, 7603.77 MB remains loaded, 39.47 MB buffer reserved, lowvram patches: 0
100%|████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:42<00:00, 14.15s/it]
Requested to load AudioVAE
loaded completely; 2244.90 MB usable, 693.46 MB loaded, full load: True
Requested to load VideoVAE
0 models unloaded.
loaded partially; 0.00 MB usable, 0.00 MB loaded, 1384.94 MB offloaded, 378.02 MB buffer reserved, lowvram patches: 0
Prompt executed in 97.68 seconds
Requested to load LTXAV
loaded partially; 9560.67 MB usable, 9521.25 MB loaded, 7693.63 MB offloaded, 39.42 MB buffer reserved, lowvram patches: 0
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:31<00:00,  6.25s/it]
Unloaded partially: 616.48 MB freed, 8904.77 MB remains loaded, 39.42 MB buffer reserved, lowvram patches: 0
0 models unloaded.
Unloaded partially: 1301.00 MB freed, 7603.77 MB remains loaded, 39.47 MB buffer reserved, lowvram patches: 0
100%|████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:41<00:00, 13.87s/it]
Requested to load AudioVAE
loaded completely; 2244.90 MB usable, 693.46 MB loaded, full load: True
Requested to load VideoVAE
0 models unloaded.
loaded partially; 0.00 MB usable, 0.00 MB loaded, 1384.94 MB offloaded, 378.02 MB buffer reserved, lowvram patches: 0
Prompt executed in 95.82 seconds

kingp0dd · 2026-03-17T11:58:37Z

are there other cli flags needed to enable it? im on v16.4, my startup logs have: DynamicVRAM support detected and enabled but when model is loaded, i don't get the same as yours: Model LTXAV prepared for dynamic VRAM loading. 16918MB Staged. 0 patches attached. I'm using GGUF Wan2.2

…

On Tue, Mar 17, 2026 at 8:20 AM m8rr ***@***.***> wrote: *m8rr* left a comment (city96/ComfyUI-GGUF#427) <#427 (comment)> without --disable-dynamic-vram Requested to load LTXAVTEModel_ loaded partially; 8523.00 MB usable, 556.58 MB loaded, 13574.77 MB offloaded, 7966.42 MB buffer reserved, lowvram patches: 0 Attempting to release mmap (267) loaded partially; 8457.88 MB usable, 491.46 MB loaded, 13639.97 MB offloaded, 7966.42 MB buffer reserved, lowvram patches: 0 gguf qtypes: F32 (2672), BF16 (28), Q6_K (1744) model weight dtype torch.bfloat16, manual cast: None model_type FLUX Requested to load LTXAV Model LTXAV prepared for dynamic VRAM loading. 16918MB Staged. 0 patches attached. 100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:29<00:00, 5.93s/it] 0 models unloaded. Model LTXAV prepared for dynamic VRAM loading. 16918MB Staged. 0 patches attached. 100%|████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:35<00:00, 11.72s/it] Requested to load AudioVAE loaded completely; 1968.11 MB usable, 693.46 MB loaded, full load: True Requested to load VideoVAE 0 models unloaded. Model VideoVAE prepared for dynamic VRAM loading. 1384MB Staged. 0 patches attached. Prompt executed in 164.28 seconds Model LTXAV prepared for dynamic VRAM loading. 16918MB Staged. 0 patches attached. 100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:24<00:00, 4.81s/it] 0 models unloaded. Model LTXAV prepared for dynamic VRAM loading. 16918MB Staged. 0 patches attached. 100%|████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:35<00:00, 11.69s/it] Requested to load AudioVAE loaded completely; 1934.00 MB usable, 693.46 MB loaded, full load: True 0 models unloaded. Model VideoVAE prepared for dynamic VRAM loading. 1384MB Staged. 0 patches attached. Prompt executed in 90.59 seconds Model LTXAV prepared for dynamic VRAM loading. 16918MB Staged. 0 patches attached. 100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:23<00:00, 4.63s/it] 0 models unloaded. Model LTXAV prepared for dynamic VRAM loading. 16918MB Staged. 0 patches attached. 100%|████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:35<00:00, 11.68s/it] Requested to load AudioVAE loaded completely; 1966.00 MB usable, 693.46 MB loaded, full load: True 0 models unloaded. Model VideoVAE prepared for dynamic VRAM loading. 1384MB Staged. 0 patches attached. Prompt executed in 77.65 seconds with --disable-dynamic-vram Requested to load LTXAVTEModel_ loaded partially; 8523.00 MB usable, 556.58 MB loaded, 13574.77 MB offloaded, 7966.42 MB buffer reserved, lowvram patches: 0 Attempting to release mmap (267) loaded partially; 8457.88 MB usable, 491.46 MB loaded, 13639.97 MB offloaded, 7966.42 MB buffer reserved, lowvram patches: 0 gguf qtypes: F32 (2672), BF16 (28), Q6_K (1744) model weight dtype torch.bfloat16, manual cast: None model_type FLUX Requested to load LTXAV loaded partially; 9564.67 MB usable, 9525.25 MB loaded, 7689.63 MB offloaded, 39.42 MB buffer reserved, lowvram patches: 0 100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:30<00:00, 6.09s/it] Unloaded partially: 620.48 MB freed, 8904.77 MB remains loaded, 39.42 MB buffer reserved, lowvram patches: 0 0 models unloaded. Unloaded partially: 1287.84 MB freed, 7616.93 MB remains loaded, 39.47 MB buffer reserved, lowvram patches: 0 100%|████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:39<00:00, 13.33s/it] Requested to load AudioVAE loaded completely; 2233.75 MB usable, 693.46 MB loaded, full load: True Requested to load VideoVAE 0 models unloaded. loaded partially; 0.00 MB usable, 0.00 MB loaded, 1384.94 MB offloaded, 378.02 MB buffer reserved, lowvram patches: 0 Prompt executed in 193.22 seconds Requested to load LTXAV loaded partially; 9560.67 MB usable, 9521.25 MB loaded, 7693.63 MB offloaded, 39.42 MB buffer reserved, lowvram patches: 0 100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:32<00:00, 6.43s/it] Unloaded partially: 616.48 MB freed, 8904.77 MB remains loaded, 39.42 MB buffer reserved, lowvram patches: 0 0 models unloaded. Unloaded partially: 1301.00 MB freed, 7603.77 MB remains loaded, 39.47 MB buffer reserved, lowvram patches: 0 100%|████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:42<00:00, 14.15s/it] Requested to load AudioVAE loaded completely; 2244.90 MB usable, 693.46 MB loaded, full load: True Requested to load VideoVAE 0 models unloaded. loaded partially; 0.00 MB usable, 0.00 MB loaded, 1384.94 MB offloaded, 378.02 MB buffer reserved, lowvram patches: 0 Prompt executed in 97.68 seconds Requested to load LTXAV loaded partially; 9560.67 MB usable, 9521.25 MB loaded, 7693.63 MB offloaded, 39.42 MB buffer reserved, lowvram patches: 0 100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:31<00:00, 6.25s/it] Unloaded partially: 616.48 MB freed, 8904.77 MB remains loaded, 39.42 MB buffer reserved, lowvram patches: 0 0 models unloaded. Unloaded partially: 1301.00 MB freed, 7603.77 MB remains loaded, 39.47 MB buffer reserved, lowvram patches: 0 100%|████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:41<00:00, 13.87s/it] Requested to load AudioVAE loaded completely; 2244.90 MB usable, 693.46 MB loaded, full load: True Requested to load VideoVAE 0 models unloaded. loaded partially; 0.00 MB usable, 0.00 MB loaded, 1384.94 MB offloaded, 378.02 MB buffer reserved, lowvram patches: 0 Prompt executed in 95.82 seconds — Reply to this email directly, view it on GitHub <#427 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ACGD6KSN4OMENNNQ5OMLYXL4RCK5ZAVCNFSM6AAAAACWIURSQWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHM2DANZRGQ4DEMJTHA> . You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>

m8rr · 2026-03-17T13:08:07Z

Check if the quant_ops.py file exists inside the ComfyUI-GGUF folder. If it’s not there, the installation wasn't done correctly.

I installed like this.
git clone -b dynamic-vram https://github.com/rattus128/ComfyUI-GGUF

rattus128 added 6 commits March 5, 2026 21:28

quant_ops: Implement GGUF as a QT

5cea0fa

Vibe code. To be reviewed.

loader changes for Dynamic VRAM

1ef0118

If in dynamic mode, load GGUF as a QT.

nodes: factor out clone() core of GGUFModelPatcher

5e905a1

Factor this out to a helper and implement the new core reconstruction protocol. Consider the mmap_released flag 1:1 with the underlying model such that it moves with the base model in model_override.

GGUFModelPatcherDynamic

4c76351

m8rr mentioned this pull request Mar 15, 2026

My entire Comfy UI is broken since I updated to 0.17. Nothing works Comfy-Org/ComfyUI#12949

Open

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dynamic VRAM support#427

Dynamic VRAM support#427
rattus128 wants to merge 6 commits intocity96:mainfrom
rattus128:dynamic-vram

rattus128 commented Mar 5, 2026 •

edited

Loading

Uh oh!

m8rr commented Mar 6, 2026 •

edited

Loading

Uh oh!

m8rr commented Mar 15, 2026 •

edited

Loading

Uh oh!

kingp0dd commented Mar 16, 2026

Uh oh!

m8rr commented Mar 17, 2026

Uh oh!

kingp0dd commented Mar 17, 2026 via email

Uh oh!

m8rr commented Mar 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

rattus128 commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

m8rr commented Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

m8rr commented Mar 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kingp0dd commented Mar 16, 2026

Uh oh!

m8rr commented Mar 17, 2026

Uh oh!

kingp0dd commented Mar 17, 2026 via email

Uh oh!

m8rr commented Mar 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

rattus128 commented Mar 5, 2026 •

edited

Loading

m8rr commented Mar 6, 2026 •

edited

Loading

m8rr commented Mar 15, 2026 •

edited

Loading