"ChatterBoxVoiceVC: The size of tensor a (2861) must match the size of tensor b (2048) at non-singleton dimension 1"

When using the Voice Conversion node, I get the error in the title. I tried using the same audio file (1m55s long) for both source and target, and it still gave the same error.
I'm posting the entire console output:

```
E:\DeepFake\ComfyUI_windows_portable>.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build
Checkpoint files will always be loaded safely.
Total VRAM 8188 MB, total RAM 31969 MB
pytorch version: 2.6.0+cu126
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 4070 Laptop GPU : cudaMallocAsync
Using pytorch attention
Python version: 3.11.8 (tags/v3.11.8:db85d51, Feb  6 2024, 22:03:32) [MSC v.1937 64 bit (AMD64)]
ComfyUI version: 0.3.39
ComfyUI frontend version: 1.21.6
[Prompt Server] web root: E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\comfyui_frontend_package\static
E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\kornia\feature\lightglue.py:44: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  @torch.cuda.amp.custom_fwd(cast_inputs=torch.float32)
🔍 Attempting to import ChatterBox modules...
📁 Node directory: E:\DeepFake\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_ChatterBox
📁 Looking for bundled ChatterBox at: E:\DeepFake\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_ChatterBox\chatterbox
📁 Looking for bundled models at: E:\DeepFake\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_ChatterBox\models\chatterbox
✅ Using BUNDLED ChatterBox from node folder
🔍 Defining ChatterboxTTSNode class with enhanced chunking...
✅ ChatterboxTTSNode class defined
🔍 Defining ChatterboxVCNode class...
✅ ChatterboxVCNode class defined

============================================================
🎉 CHATTERBOX VOICE NODES LOADED SUCCESSFULLY!
============================================================
📦 Using BUNDLED ChatterBox (self-contained)
📁 Node directory: E:\DeepFake\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_ChatterBox
📁 Bundled ChatterBox: True
📁 Bundled models: False
============================================================

✅ Audio Recorder with Volume Control loaded!
🎯 Loading ChatterBox Voice Extension...
✅ ChatterBox TTS package found!
✅ SoundDevice available - Audio recording enabled!
🚀 ChatterBox Voice Extension v1.1.0 loaded with 3 nodes:
   • 🎤 ChatterBox Voice TTS
   • 🔄 ChatterBox Voice Conversion
   • 🎙️ ChatterBox Voice Capture

Import times for custom nodes:
   0.0 seconds: E:\DeepFake\ComfyUI_windows_portable\ComfyUI\custom_nodes\websocket_image_save.py
   0.0 seconds: E:\DeepFake\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_Fill-ChatterBox
   2.0 seconds: E:\DeepFake\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_ChatterBox

Starting server

To see the GUI go to: http://127.0.0.1:8188
got prompt
Loading ChatterboxVC model on cuda...
📁 Found ComfyUI models at: E:\DeepFake\ComfyUI_windows_portable\ComfyUI\models\TTS\chatterbox
📁 Loading VC from ComfyUI models: E:\DeepFake\ComfyUI_windows_portable\ComfyUI\models\TTS\chatterbox
E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\diffusers\models\lora.py:393: FutureWarning: `LoRACompatibleLinear` is deprecated and will be removed in version 1.0.0. Use of `LoRACompatibleLinear` is deprecated. Please switch to PEFT backend by installing PEFT: `pip install peft`.
  deprecate("LoRACompatibleLinear", "1.0.0", deprecation_message)
input frame rate=25
loaded PerthNet (Implicit) at step 250,000
✅ ChatterboxVC model loaded from comfyui!
!!! Exception during processing !!! The size of tensor a (2861) must match the size of tensor b (2048) at non-singleton dimension 1
Traceback (most recent call last):
  File "E:\DeepFake\ComfyUI_windows_portable\ComfyUI\execution.py", line 349, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\ComfyUI\execution.py", line 224, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\ComfyUI\execution.py", line 196, in _map_node_over_list
    process_inputs(input_dict, i)
  File "E:\DeepFake\ComfyUI_windows_portable\ComfyUI\execution.py", line 185, in process_inputs
    results.append(getattr(obj, func)(**inputs))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_ChatterBox\nodes.py", line 598, in convert_voice
    raise e
  File "E:\DeepFake\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_ChatterBox\nodes.py", line 576, in convert_voice
    wav = self.model.generate(
          ^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_ChatterBox\chatterbox\vc.py", line 81, in generate
    s3_tokens, _ = self.s3gen.tokenizer(audio_16)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_ChatterBox\chatterbox\models\s3tokenizer\s3tokenizer.py", line 122, in forward
    speech_tokens, speech_token_lens = tokenizer.quantize(mels, mel_lens.to(self.device))
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\s3tokenizer\model_v2.py", line 387, in quantize
    hidden, code_len = self.encoder(mel, mel_len)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\s3tokenizer\model_v2.py", line 349, in forward
    x = block(x, mask.unsqueeze(1), mask_pad, freqs_cis[:x.size(1)])
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\s3tokenizer\model_v2.py", line 282, in forward
    x = x + self.attn(
            ^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\s3tokenizer\model_v2.py", line 247, in forward
    wv, qk, fsm_memory = self.qkv_attention(q, k, v, mask, mask_pad,
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\s3tokenizer\model_v2.py", line 205, in qkv_attention
    q, k = apply_rotary_emb(q, k, freqs_cis=freqs_cis)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\s3tokenizer\model_v2.py", line 70, in apply_rotary_emb
    return xq * cos + xq_r * sin, xk * cos + xk_r * sin
           ~~~^~~~~
RuntimeError: The size of tensor a (2861) must match the size of tensor b (2048) at non-singleton dimension 1

Prompt executed in 7.88 seconds
```
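
One possible reading of the traceback, offered as a guess rather than a confirmed diagnosis: the log reports `input frame rate=25`, and a 1m55s clip at 25 frames per second comes to roughly 2875 frames, close to the 2861 in the error. The failing line slices `freqs_cis[:x.size(1)]`, so the 2048 may be the length of a precomputed rotary-embedding table, which would cap the input at about 2048 / 25 ≈ 82 seconds. The sketch below works through that arithmetic and computes chunk boundaries for splitting a long clip before conversion. The names `estimated_frames` and `chunk_bounds`, the 60-second default, and the 16 kHz sample rate in the usage lines are all assumptions for illustration and are not part of ChatterBox or s3tokenizer.

```python
import math

FRAME_RATE_HZ = 25    # taken from the log line "input frame rate=25"
MAX_POSITIONS = 2048  # size of "tensor b" in the error; ASSUMED to be the rotary table length


def estimated_frames(duration_s: float) -> int:
    """Rough frame count the tokenizer would see for a clip of this duration."""
    return math.ceil(duration_s * FRAME_RATE_HZ)


def chunk_bounds(num_samples: int, sample_rate: int, max_seconds: float = 60.0):
    """Return (start, end) sample indices that split a clip into pieces
    no longer than max_seconds. Hypothetical helper, not a ChatterBox API."""
    if num_samples < 0 or sample_rate <= 0 or max_seconds <= 0:
        raise ValueError("num_samples must be >= 0; sample_rate and max_seconds must be > 0")
    step = int(max_seconds * sample_rate)
    return [(start, min(start + step, num_samples))
            for start in range(0, num_samples, step)]


# A 1m55s clip exceeds the assumed limit, while 60 s chunks stay under it.
clip_seconds = 115
print(estimated_frames(clip_seconds), ">", MAX_POSITIONS)
print(chunk_bounds(clip_seconds * 16000, 16000))
```

If this reading is right, each chunk would have to be converted separately and the results concatenated, which may leave audible seams at the boundaries. Trimming the source clip to under about 80 seconds would be a quicker way to check whether length is really the cause.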


