Questions about model inference #6
Description
Hello! This is very impressive work!
I have a few questions regarding the model's bitrate:
1. If I want to achieve a bitrate of 0.125 kbps, is it sufficient to configure the decoder to use only the first two layers of the RVQ, like this:
dec_rvq2 = model.decode(enc.audio_codes[:2], return_dict=True)
wav_rvq2 = dec_rvq2.audio.squeeze(0)
torchaudio.save("demo/demo_rec_rvq2.wav", wav_rvq2, sample_rate=model.sampling_rate)
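As a side note on the slicing above, here is a minimal sketch of what I am unsure about. Assuming (and this is an assumption, not the model's documented API) that audio_codes has a layout of (batch, num_quantizers, num_frames), then codes[:2] would slice the batch axis rather than the quantizer axis, and the per-layer bitrate would follow from the codebook size and frame rate. All numbers below (codebook size 1024, 12.5 frames/s) are illustrative:

```python
import torch

# Hypothetical layout: (batch, num_quantizers, num_frames).
# The real layout depends on the model's encode() output.
batch, n_q, n_frames = 1, 8, 100
audio_codes = torch.randint(0, 1024, (batch, n_q, n_frames))

# If the quantizer axis is dim 1, keeping the first 2 RVQ layers is:
codes_2q = audio_codes[:, :2, :]
assert codes_2q.shape == (1, 2, 100)

# Bitrate with k quantizers: k * log2(codebook_size) * frame_rate.
bits_per_code = 10    # log2(1024), assumed codebook size
frame_rate = 12.5     # assumed frames per second
kbps = 2 * bits_per_code * frame_rate / 1000
print(kbps)           # 0.25 kbps under these assumed numbers
```

Whether 0.125 kbps corresponds to two layers (or one) depends on the model's actual frame rate and codebook size, which is part of what I am asking.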
2. For streaming inference, what is the minimum processing latency of the model? Specifically, for frame-by-frame inference, how many past frames must be available before the current frame can be processed? In other words, if the current frame is fed in while earlier frames are still being processed, inference for the current frame cannot complete until the results for those preceding frames are ready. I understand this as a fixed algorithmic latency inherent to the model architecture, independent of the inference platform.
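To make question 2 concrete, here is one way I would reason about the fixed algorithmic latency, assuming a convolutional encoder: causal layers add no lookahead, while a non-causal layer adds future context proportional to the cumulative stride below it. The layer configuration here is entirely made up for illustration and is not the model's actual architecture:

```python
# Sketch: algorithmic lookahead of a hypothetical conv stack.
# Each layer is (kernel, stride, causal), listed input-to-output.

def lookahead_samples(layers):
    """Return how many future input samples the output depends on."""
    look, jump = 0, 1          # jump = cumulative stride so far
    for k, s, causal in layers:
        if not causal:
            # A symmetrically padded layer looks (k-1)//2 steps ahead,
            # each step covering `jump` input samples.
            look += (k - 1) // 2 * jump
        jump *= s
    return look

# Made-up example config, not the real model:
layers = [(7, 1, False), (4, 2, False), (4, 2, False)]
sr = 16000
samples = lookahead_samples(layers)
print(samples, "samples =", samples / sr * 1000, "ms")
```

Is this the right way to think about it for your model, or does the architecture (e.g. attention or non-causal blocks) impose a larger fixed lookahead?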
Looking forward to your response!
Best regards!