Questions about model inference #6
Description
Hello! This is very impressive work!
I have a few questions regarding the model's bitrate:
1. If I want to achieve a bitrate of 0.125 kbps, is it sufficient to configure the decoder to use only the first two layers of the RVQ, like this:
dec_rvq2 = model.decode(enc.audio_codes[:2], return_dict=True)
wav_rvq2 = dec_rvq2.audio.squeeze(0)
torchaudio.save("demo/demo_rec_rvq2.wav", wav_rvq2, sample_rate=model.sampling_rate)
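As a side note on the slicing above, here is a minimal sketch of what I am unsure about. Assuming (and this is an assumption, not the model's documented API) that audio_codes has a layout of (batch, num_quantizers, num_frames), then codes[:2] would slice the batch axis rather than the quantizer axis, and the per-layer bitrate would follow from the codebook size and frame rate. All numbers below (codebook size 1024, 12.5 frames/s) are illustrative:

```python
import torch

# Hypothetical layout: (batch, num_quantizers, num_frames).
# The real layout depends on the model's encode() output.
batch, n_q, n_frames = 1, 8, 100
audio_codes = torch.randint(0, 1024, (batch, n_q, n_frames))

# If the quantizer axis is dim 1, keeping the first 2 RVQ layers is:
codes_2q = audio_codes[:, :2, :]
assert codes_2q.shape == (1, 2, 100)

# Bitrate with k quantizers: k * log2(codebook_size) * frame_rate.
bits_per_code = 10    # log2(1024), assumed codebook size
frame_rate = 12.5     # assumed frames per second
kbps = 2 * bits_per_code * frame_rate / 1000
print(kbps)           # 0.25 kbps under these assumed numbers
```

Whether 0.125 kbps corresponds to two layers (or one) depends on the model's actual frame rate and codebook size, which is part of what I am asking.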
2. For streaming inference, what is the minimum processing latency of the model? Specifically, for frame-by-frame inference, how many past frames must be available before the current frame can be processed? In other words, if the current frame is fed in while earlier frames are still being processed, inference for the current frame cannot complete until the results for those preceding frames are ready. I understand this as a fixed algorithmic latency inherent to the model architecture, independent of the inference platform.
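To make question 2 concrete, here is one way I would reason about the fixed algorithmic latency, assuming a convolutional encoder: causal layers add no lookahead, while a non-causal layer adds future context proportional to the cumulative stride below it. The layer configuration here is entirely made up for illustration and is not the model's actual architecture:

```python
# Sketch: algorithmic lookahead of a hypothetical conv stack.
# Each layer is (kernel, stride, causal), listed input-to-output.

def lookahead_samples(layers):
    """Return how many future input samples the output depends on."""
    look, jump = 0, 1          # jump = cumulative stride so far
    for k, s, causal in layers:
        if not causal:
            # A symmetrically padded layer looks (k-1)//2 steps ahead,
            # each step covering `jump` input samples.
            look += (k - 1) // 2 * jump
        jump *= s
    return look

# Made-up example config, not the real model:
layers = [(7, 1, False), (4, 2, False), (4, 2, False)]
sr = 16000
samples = lookahead_samples(layers)
print(samples, "samples =", samples / sr * 1000, "ms")
```

Is this the right way to think about it for your model, or does the architecture (e.g. attention or non-causal blocks) impose a larger fixed lookahead?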
Looking forward to your response!
Best regards!