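When using the Voice Converter (🔄 ChatterBox Voice Conversion) node, I get the error `RuntimeError: The size of tensor a (2861) must match the size of tensor b (2048) at non-singleton dimension 1`. I tried using the same audio file (1m55s long) for both source and target, and it still gave the same error.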
Here is the entire console output:
```
Checkpoint files will always be loaded safely.
Total VRAM 8188 MB, total RAM 31969 MB
pytorch version: 2.6.0+cu126
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 4070 Laptop GPU : cudaMallocAsync
Using pytorch attention
Python version: 3.11.8 (tags/v3.11.8:db85d51, Feb 6 2024, 22:03:32) [MSC v.1937 64 bit (AMD64)]
ComfyUI version: 0.3.39
ComfyUI frontend version: 1.21.6
[Prompt Server] web root: E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\comfyui_frontend_package\static
E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\kornia\feature\lightglue.py:44: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
@torch.cuda.amp.custom_fwd(cast_inputs=torch.float32)
🔍 Attempting to import ChatterBox modules...
📁 Node directory: E:\DeepFake\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_ChatterBox
📁 Looking for bundled ChatterBox at: E:\DeepFake\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_ChatterBox\chatterbox
📁 Looking for bundled models at: E:\DeepFake\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_ChatterBox\models\chatterbox
✅ Using BUNDLED ChatterBox from node folder
🔍 Defining ChatterboxTTSNode class with enhanced chunking...
✅ ChatterboxTTSNode class defined
🔍 Defining ChatterboxVCNode class...
✅ ChatterboxVCNode class defined
============================================================
🎉 CHATTERBOX VOICE NODES LOADED SUCCESSFULLY!
============================================================
📦 Using BUNDLED ChatterBox (self-contained)
📁 Node directory: E:\DeepFake\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_ChatterBox
📁 Bundled ChatterBox: True
📁 Bundled models: False
============================================================
✅ Audio Recorder with Volume Control loaded!
🎯 Loading ChatterBox Voice Extension...
✅ ChatterBox TTS package found!
✅ SoundDevice available - Audio recording enabled!
🚀 ChatterBox Voice Extension v1.1.0 loaded with 3 nodes:
• 🎤 ChatterBox Voice TTS
• 🔄 ChatterBox Voice Conversion
• 🎙️ ChatterBox Voice Capture
Import times for custom nodes:
0.0 seconds: E:\DeepFake\ComfyUI_windows_portable\ComfyUI\custom_nodes\websocket_image_save.py
0.0 seconds: E:\DeepFake\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_Fill-ChatterBox
2.0 seconds: E:\DeepFake\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_ChatterBox
Starting server
To see the GUI go to: http://127.0.0.1:8188
got prompt
Loading ChatterboxVC model on cuda...
📁 Found ComfyUI models at: E:\DeepFake\ComfyUI_windows_portable\ComfyUI\models\TTS\chatterbox
📁 Loading VC from ComfyUI models: E:\DeepFake\ComfyUI_windows_portable\ComfyUI\models\TTS\chatterbox
E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\diffusers\models\lora.py:393: FutureWarning: `LoRACompatibleLinear` is deprecated and will be removed in version 1.0.0. Use of `LoRACompatibleLinear` is deprecated. Please switch to PEFT backend by installing PEFT: `pip install peft`.
deprecate("LoRACompatibleLinear", "1.0.0", deprecation_message)
input frame rate=25
loaded PerthNet (Implicit) at step 250,000
✅ ChatterboxVC model loaded from comfyui!
!!! Exception during processing !!! The size of tensor a (2861) must match the size of tensor b (2048) at non-singleton dimension 1
Traceback (most recent call last):
File "E:\DeepFake\ComfyUI_windows_portable\ComfyUI\execution.py", line 349, in execute
output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\DeepFake\ComfyUI_windows_portable\ComfyUI\execution.py", line 224, in get_output_data
return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\DeepFake\ComfyUI_windows_portable\ComfyUI\execution.py", line 196, in _map_node_over_list
process_inputs(input_dict, i)
File "E:\DeepFake\ComfyUI_windows_portable\ComfyUI\execution.py", line 185, in process_inputs
results.append(getattr(obj, func)(**inputs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\DeepFake\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_ChatterBox\nodes.py", line 598, in convert_voice
raise e
File "E:\DeepFake\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_ChatterBox\nodes.py", line 576, in convert_voice
wav = self.model.generate(
^^^^^^^^^^^^^^^^^^^^
File "E:\DeepFake\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_ChatterBox\chatterbox\vc.py", line 81, in generate
s3_tokens, _ = self.s3gen.tokenizer(audio_16)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "E:\DeepFake\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_ChatterBox\chatterbox\models\s3tokenizer\s3tokenizer.py", line 122, in forward
speech_tokens, speech_token_lens = tokenizer.quantize(mels, mel_lens.to(self.device))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\s3tokenizer\model_v2.py", line 387, in quantize
hidden, code_len = self.encoder(mel, mel_len)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\s3tokenizer\model_v2.py", line 349, in forward
x = block(x, mask.unsqueeze(1), mask_pad, freqs_cis[:x.size(1)])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\s3tokenizer\model_v2.py", line 282, in forward
x = x + self.attn(
^^^^^^^^^^
File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\s3tokenizer\model_v2.py", line 247, in forward
wv, qk, fsm_memory = self.qkv_attention(q, k, v, mask, mask_pad,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\s3tokenizer\model_v2.py", line 205, in qkv_attention
q, k = apply_rotary_emb(q, k, freqs_cis=freqs_cis)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\DeepFake\ComfyUI_windows_portable\python_embeded\Lib\site-packages\s3tokenizer\model_v2.py", line 70, in apply_rotary_emb
return xq * cos + xq_r * sin, xk * cos + xk_r * sin
~~~^~~~~
RuntimeError: The size of tensor a (2861) must match the size of tensor b (2048) at non-singleton dimension 1
Prompt executed in 7.88 seconds
```
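Looking at the last frames, `model_v2.py` passes `freqs_cis[:x.size(1)]` into each block, so my guess is that the rotary table only covers 2048 positions while my clip produces a 2861-step sequence. The sketch below is my own plain-Python illustration, not the s3tokenizer or ChatterBox code, and `MAX_POSITIONS`, `sliced_length`, `max_clip_seconds` and `split_into_chunks` are hypothetical names. It shows why the slice comes back too short, estimates a duration limit (assuming sequence length grows linearly with clip length), and splits audio into shorter pieces to test the theory (assuming 16 kHz mono input, since `vc.py` calls it `audio_16`):

```python
# Hypothetical illustration only; not the s3tokenizer or ChatterBox source.
MAX_POSITIONS = 2048  # "tensor b" in the error: assumed size of the rotary table
SEQ_LEN = 2861        # "tensor a" in the error: sequence length for this clip
CLIP_SECONDS = 115    # the 1m55s source clip from this report


def sliced_length(max_positions: int, seq_len: int) -> int:
    """Length of table[:seq_len]; Python slicing silently stops at the table end."""
    table = list(range(max_positions))
    return len(table[:seq_len])


def max_clip_seconds(max_positions: int, seq_len: int, clip_seconds: float) -> float:
    """Estimate the longest clip that fits, assuming positions scale linearly with duration."""
    return max_positions * clip_seconds / seq_len


def split_into_chunks(samples: list, sample_rate: int, max_seconds: float) -> list:
    """Split a mono sample list into consecutive chunks no longer than max_seconds."""
    chunk_len = int(sample_rate * max_seconds)
    if chunk_len <= 0:
        raise ValueError("max_seconds is too small for this sample rate")
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]


if __name__ == "__main__":
    # The slice is shorter than the sequence, so an elementwise multiply
    # like the one in apply_rotary_emb cannot match 2861 against 2048.
    print(sliced_length(MAX_POSITIONS, SEQ_LEN))  # 2048
    limit = max_clip_seconds(MAX_POSITIONS, SEQ_LEN, CLIP_SECONDS)
    print(f"~{limit:.0f} s estimated limit")  # ~82 s estimated limit
    # Assumed 16 kHz mono; 60 s chunks stay under the estimate.
    fake_audio = [0.0] * (16_000 * CLIP_SECONDS)
    chunks = split_into_chunks(fake_audio, 16_000, 60)
    print([len(c) / 16_000 for c in chunks])  # [60.0, 55.0]
```

If a clip trimmed to well under ~80 seconds converts without the error, that would support the length theory. Converting chunks separately and joining them may leave audible seams at the boundaries.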