
Distortion and vibration in output after tokenizer → Flow → HiFiGAN #1879

@MHussain04

I fine-tuned Fun-CosyVoice3-0.5B. At inference time, the generated audio contains significant distortion, noise, and vibration.

To isolate the issue, I performed the following tests:

1. HiFiGAN-only test

  • Regenerated audio directly from an input audio chunk using HiFiGAN (no tokenizer or Flow)
  • Output matches the original clean audio
  • Suggests HiFiGAN is not the source of the issue

2. Full pipeline test (tokenizer → Flow → HiFiGAN)

  • Passed clean audio samples from my dataset through the full pipeline
  • Output contains noticeable vibration and distortion, despite clean input

3. Base vs fine-tuned Flow

  • Ran the pipeline with both the base Flow model and the fine-tuned Flow model
  • Both produce similar vibration artifacts
  • Suggests the artifacts are not introduced by my fine-tuning

Additional observation:

  • A short click (similar to a mouse click) appears at the start and end of the generated audio
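
To put a number on the clicks at the start and end, one option is to compare the RMS level of the first and last few milliseconds against the rest of the clip. The following is a minimal sketch, not part of CosyVoice: the helper names `rms` and `edge_to_body_ratio`, the 20 ms window, and the 16000 Hz demo rate are all illustrative assumptions, and the demo signal is synthetic.

```python
import math

def rms(samples):
    """Root-mean-square level of a sequence of float samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def edge_to_body_ratio(samples, sample_rate, edge_ms=20.0):
    """Compare the RMS of the first/last edge_ms of audio with the middle part.

    Returns (start_ratio, end_ratio). Values well above 1.0 hint at a burst
    of energy at that edge. The 20 ms default is an arbitrary assumption.
    """
    n_edge = max(1, int(sample_rate * edge_ms / 1000.0))
    if len(samples) < 3 * n_edge:
        raise ValueError("audio too short for the chosen edge window")
    body = rms(samples[n_edge:-n_edge])
    if body == 0.0:
        raise ValueError("body of the signal is silent")
    return rms(samples[:n_edge]) / body, rms(samples[-n_edge:]) / body

# Synthetic demo: a quiet 220 Hz tone with an artificial click at the start.
sr = 16000  # demo value only, not a claim about the model's sample rate
tone = [0.1 * math.sin(2 * math.pi * 220 * i / sr) for i in range(sr)]
clicked = [0.9] * 10 + tone[10:]

start_ratio, end_ratio = edge_to_body_ratio(clicked, sr)
print(f"start ratio: {start_ratio:.2f}, end ratio: {end_ratio:.2f}")
```

Running the same check on the input clip and on the pipeline output would show whether the edge energy is added by the pipeline or is already present in the source audio.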

What I’ve tried:

  • Applied several audio normalization techniques before feeding data to the tokenizer
  • None of them improved the output
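
For reference, peak normalization is one common technique of this kind. The sketch below is only an illustration of that idea: the function name `peak_normalize` and the 0.9 target peak are assumptions, and it does not reproduce any specific preprocessing from the CosyVoice code.

```python
def peak_normalize(samples, target_peak=0.9):
    """Scale float samples so the largest absolute value equals target_peak.

    Silent input is returned unchanged to avoid dividing by zero.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]

# Tiny demo on hand-written sample values.
normalized = peak_normalize([0.1, -0.5, 0.25])
print(normalized)
```

Peak normalization only changes the overall gain, so it would not be expected to help if the artifacts come from something other than input level.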

Questions:

  • Has anyone encountered similar vibration/distortion artifacts in the tokenizer → Flow → HiFiGAN pipeline?
  • Could this be related to tokenizer encoding/decoding mismatch or preprocessing?
  • Any suggestions for debugging this further?
