Open
Description
Hello,
I implemented support for decoding_input_ids and tested that it works. It's not merged yet, but I'm working towards contributing Canary with StreamAtt to the repo.
I have a question about the implementation of audio history. From the base_streamatt implementation, I see that the audio history is supposed to be stored as mel features. Although it is possible to extract the mel features first in the NeMo framework, it's much easier to work with a raw waveform history. I was thinking of a few ways to tweak the implementation so that updating a raw waveform history is supported:
- Set self.audio_subsampling_factor to subsampling_factor * MEL_HOP_SAMPLES, so that one encoder frame maps to the corresponding number of raw waveform samples. I'm worried this might muddy the semantics of the code a bit, since the attribute would no longer count mel frames.
- Override _update_audio_history completely, but that would duplicate code.
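
To make the first option concrete, here is a rough sketch of the arithmetic I have in mind. The constant values (16 kHz audio, a 160-sample mel hop, 8x encoder subsampling) and the `trim_history` helper are assumptions for illustration only; this is not the actual `_update_audio_history` code, and the real values would come from the model's preprocessor and encoder config.

```python
# Illustrative values only; the real ones depend on the model configuration.
SAMPLE_RATE = 16_000      # assumed sample rate in Hz
MEL_HOP_SAMPLES = 160     # assumed mel hop length (10 ms at 16 kHz)
SUBSAMPLING_FACTOR = 8    # assumed encoder subsampling factor


def trim_history(history, frames_to_drop, units_per_encoder_frame):
    """Drop the oldest `frames_to_drop` encoder frames from a history buffer.

    `history` is any sequence indexed along time (hypothetical helper):
    - mel frames, if units_per_encoder_frame == SUBSAMPLING_FACTOR
    - raw samples, if units_per_encoder_frame == SUBSAMPLING_FACTOR * MEL_HOP_SAMPLES
    """
    return history[frames_to_drop * units_per_encoder_frame:]


# One encoder frame expressed in each unit.
mel_frames_per_encoder_frame = SUBSAMPLING_FACTOR
raw_samples_per_encoder_frame = SUBSAMPLING_FACTOR * MEL_HOP_SAMPLES

# One second of history in each representation (ignoring edge padding).
mel_history = list(range(SAMPLE_RATE // MEL_HOP_SAMPLES))
raw_history = list(range(SAMPLE_RATE))

# Dropping 5 encoder frames removes the same duration from both buffers.
mel_left = trim_history(mel_history, 5, mel_frames_per_encoder_frame)
raw_left = trim_history(raw_history, 5, raw_samples_per_encoder_frame)
```

With these assumed values, the same trimming logic works on either buffer and only the per-encoder-frame factor changes, which is why the first option is tempting despite the naming concern.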
Which of these would you prefer, or is there a better way to implement this?