
Questions about model inference #6

@gdpgxy


Hello! This is very impressive work!

I have a few questions regarding the model's bitrate:

1. If I want to achieve a bitrate of 0.125 kbps, would it be sufficient to configure the decoder to use only the first two layers of the RVQ, like this:

```python
dec_rvq2 = model.decode(enc.audio_codes[:2], return_dict=True)
wav_rvq2 = dec_rvq2.audio.squeeze(0)
torchaudio.save("demo/demo_rec_rvq2.wav", wav_rvq2, sample_rate=model.sampling_rate)
```
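As a sanity check on the target, here is how I computed the bitrate from codec parameters. The codebook size (1024) and token rate (6.25 Hz) below are just hypothetical placeholders, not values I found in the repo; the real numbers would come from the model config:

```python
import math

def bitrate_bps(n_codebooks, codebook_size, frame_rate_hz):
    # bits/s = codebooks kept × bits per codebook index × frames per second
    return n_codebooks * math.log2(codebook_size) * frame_rate_hz

# Hypothetical: two 1024-entry codebooks (10 bits each) at a 6.25 Hz token rate
print(bitrate_bps(2, 1024, 6.25))  # → 125.0 bits/s, i.e. 0.125 kbps
```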

2. For streaming inference, what is the minimum algorithmic latency of the model? Specifically, for frame-by-frame inference, how many past (or future) frames must be available before the current frame can be processed? In other words, if the current frame is fed in while earlier frames are still being processed, its output cannot be produced until the results for those preceding frames are ready. I mean the fixed latency inherent to the model architecture itself, independent of the inference platform.
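To make the question concrete, this is the quantity I am asking about. The lookahead of 1 frame, hop of 320 samples, and 16 kHz sample rate are purely hypothetical examples, not values taken from this model:

```python
def algorithmic_latency_ms(lookahead_frames, hop_size, sample_rate):
    # Algorithmic latency = audio the model must buffer beyond the
    # current frame before it can emit that frame's output, in ms.
    return 1000.0 * lookahead_frames * hop_size / sample_rate

# Hypothetical: 1 frame of lookahead, 320-sample hop, 16 kHz audio
print(algorithmic_latency_ms(1, 320, 16000))  # → 20.0 ms
```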

Looking forward to your response!
Best regards!
