"ChatterBoxVoiceVC: The size of tensor a (2861) must match the size of tensor b (2048) at non-singleton dimension 1" #4

@rikuddo91

When using the Voice Converter, I get the error in the title. I tried using the same audio file (1m55s long) for both source and target, and it still gave the same error.
I'm posting the entire console output:

Checkpoint files will always be loaded safely.
Total VRAM 8188 MB, total RAM 31969 MB
pytorch version: 2.6.0+cu126
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 4070 Laptop GPU : cudaMallocAsync
Using pytorch attention
Python version: 3.11.8 (tags/v3.11.8:db85d51, Feb  6 2024, 22:03:32) [MSC v.1937 64 bit (AMD64)]
ComfyUI version: 0.3.39
ComfyUI frontend version: 1.21.6
[Prompt Server] web root: E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\comfyui_frontend_package\static
E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\kornia\feature\lightglue.py:44: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  @torch.cuda.amp.custom_fwd(cast_inputs=torch.float32)
🔍 Attempting to import ChatterBox modules...
📁 Node directory: E:\DeepFake\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_ChatterBox
📁 Looking for bundled ChatterBox at: E:\DeepFake\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_ChatterBox\chatterbox
📁 Looking for bundled models at: E:\DeepFake\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_ChatterBox\models\chatterbox
✅ Using BUNDLED ChatterBox from node folder
🔍 Defining ChatterboxTTSNode class with enhanced chunking...
✅ ChatterboxTTSNode class defined
🔍 Defining ChatterboxVCNode class...
✅ ChatterboxVCNode class defined

============================================================
🎉 CHATTERBOX VOICE NODES LOADED SUCCESSFULLY!
============================================================
📦 Using BUNDLED ChatterBox (self-contained)
📁 Node directory: E:\DeepFake\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_ChatterBox
📁 Bundled ChatterBox: True
📁 Bundled models: False
============================================================

✅ Audio Recorder with Volume Control loaded!
🎯 Loading ChatterBox Voice Extension...
✅ ChatterBox TTS package found!
✅ SoundDevice available - Audio recording enabled!
🚀 ChatterBox Voice Extension v1.1.0 loaded with 3 nodes:
   • 🎤 ChatterBox Voice TTS
   • 🔄 ChatterBox Voice Conversion
   • 🎙️ ChatterBox Voice Capture

Import times for custom nodes:
   0.0 seconds: E:\DeepFake\ComfyUI_windows_portable\ComfyUI\custom_nodes\websocket_image_save.py
   0.0 seconds: E:\DeepFake\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_Fill-ChatterBox
   2.0 seconds: E:\DeepFake\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_ChatterBox

Starting server

To see the GUI go to: http://127.0.0.1:8188
got prompt
Loading ChatterboxVC model on cuda...
📁 Found ComfyUI models at: E:\DeepFake\ComfyUI_windows_portable\ComfyUI\models\TTS\chatterbox
📁 Loading VC from ComfyUI models: E:\DeepFake\ComfyUI_windows_portable\ComfyUI\models\TTS\chatterbox
E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\diffusers\models\lora.py:393: FutureWarning: `LoRACompatibleLinear` is deprecated and will be removed in version 1.0.0. Use of `LoRACompatibleLinear` is deprecated. Please switch to PEFT backend by installing PEFT: `pip install peft`.
  deprecate("LoRACompatibleLinear", "1.0.0", deprecation_message)
input frame rate=25
loaded PerthNet (Implicit) at step 250,000
✅ ChatterboxVC model loaded from comfyui!
!!! Exception during processing !!! The size of tensor a (2861) must match the size of tensor b (2048) at non-singleton dimension 1
Traceback (most recent call last):
  File "E:\DeepFake\ComfyUI_windows_portable\ComfyUI\execution.py", line 349, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\ComfyUI\execution.py", line 224, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\ComfyUI\execution.py", line 196, in _map_node_over_list
    process_inputs(input_dict, i)
  File "E:\DeepFake\ComfyUI_windows_portable\ComfyUI\execution.py", line 185, in process_inputs
    results.append(getattr(obj, func)(**inputs))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_ChatterBox\nodes.py", line 598, in convert_voice
    raise e
  File "E:\DeepFake\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_ChatterBox\nodes.py", line 576, in convert_voice
    wav = self.model.generate(
          ^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_ChatterBox\chatterbox\vc.py", line 81, in generate
    s3_tokens, _ = self.s3gen.tokenizer(audio_16)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_ChatterBox\chatterbox\models\s3tokenizer\s3tokenizer.py", line 122, in forward
    speech_tokens, speech_token_lens = tokenizer.quantize(mels, mel_lens.to(self.device))
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\s3tokenizer\model_v2.py", line 387, in quantize
    hidden, code_len = self.encoder(mel, mel_len)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\s3tokenizer\model_v2.py", line 349, in forward
    x = block(x, mask.unsqueeze(1), mask_pad, freqs_cis[:x.size(1)])
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\s3tokenizer\model_v2.py", line 282, in forward
    x = x + self.attn(
            ^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\s3tokenizer\model_v2.py", line 247, in forward
    wv, qk, fsm_memory = self.qkv_attention(q, k, v, mask, mask_pad,
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\s3tokenizer\model_v2.py", line 205, in qkv_attention
    q, k = apply_rotary_emb(q, k, freqs_cis=freqs_cis)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\s3tokenizer\model_v2.py", line 70, in apply_rotary_emb
    return xq * cos + xq_r * sin, xk * cos + xk_r * sin
           ~~~^~~~~
RuntimeError: The size of tensor a (2861) must match the size of tensor b (2048) at non-singleton dimension 1

Prompt executed in 7.88 seconds
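
A possible reading of the numbers in the traceback, offered as a rough sketch rather than a confirmed diagnosis: the failing line slices `freqs_cis[:x.size(1)]`, and slicing a table past its end silently returns the shorter table, so a 2861-step input would end up multiplied against only 2048 rotary positions. If the `input frame rate=25` log line is the tokenizer's token rate (an assumption), 2048 positions correspond to roughly 82 seconds of audio, and a 1m55s clip would need about 2875 positions, close to the 2861 reported. The helper names, the 16 kHz sample rate (guessed from the variable name `audio_16`), and the 60-second chunk size below are all hypothetical and not part of ChatterBox or s3tokenizer.

```python
import math

TOKEN_RATE_HZ = 25      # assumption: read from the "input frame rate=25" log line
MAX_POSITIONS = 2048    # from the error message (size of tensor b)
SAMPLE_RATE = 16_000    # assumption: inferred from the variable name `audio_16`


def sliced_table_length(table_len, requested):
    """Length you actually get when slicing a table past its end."""
    return len(range(table_len)[:requested])


def max_duration_seconds(max_positions=MAX_POSITIONS, token_rate=TOKEN_RATE_HZ):
    """Longest clip that fits in the position table, under the assumed token rate."""
    return max_positions / token_rate


def estimated_tokens(duration_s, token_rate=TOKEN_RATE_HZ):
    """Approximate number of positions needed for a clip of the given duration."""
    return math.ceil(duration_s * token_rate)


def chunk_bounds(num_samples, sample_rate=SAMPLE_RATE, chunk_seconds=60.0):
    """Split a clip into (start, end) sample ranges no longer than chunk_seconds."""
    chunk_len = int(chunk_seconds * sample_rate)
    return [
        (start, min(start + chunk_len, num_samples))
        for start in range(0, num_samples, chunk_len)
    ]


if __name__ == "__main__":
    print(sliced_table_length(2048, 2861))    # table stays at 2048 entries
    print(max_duration_seconds())             # seconds that fit in 2048 positions
    print(estimated_tokens(115))              # positions needed for 1m55s
    print(chunk_bounds(115 * SAMPLE_RATE))    # two ranges, each under the limit
```

If this reading is right, trimming or splitting the source audio to under about 80 seconds before conversion might avoid the error; I have not verified that against the node itself.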
