Canary-v2 streamatt speech processor #28

@azziko

Hello,

I implemented support for decoding_input_ids and tested that it works. It isn't merged yet, but I'm working towards contributing Canary with streamatt to the repo.

I have a question about how the audio history is implemented. From the base_streamatt implementation, it looks like the audio history is meant to be stored as mel features. Although the mel features can be extracted first in the NeMo framework, it is much easier to work with a raw waveform history. I have been considering two ways to adapt the implementation so that updating a raw-waveform history is supported:

  1. Set self.audio_subsampling_factor to subsampling_factor * MEL_HOP_SAMPLES, so that one encoder frame maps to the corresponding number of raw waveform samples. My concern is that this muddies the meaning of that attribute.
  2. Override _update_audio_history completely, at the cost of duplicating code from the base class.
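
To make option 1 concrete, here is a rough sketch of the arithmetic it relies on. The numeric values (a 160-sample mel hop, i.e. 10 ms at 16 kHz, and 8x encoder subsampling) and the helper names `samples_per_encoder_frame` and `trim_raw_history` are assumptions for illustration only. They are not taken from the repo, and the real `_update_audio_history` may trim differently.

```python
# Assumed values for illustration, not read from the repo or the model config:
MEL_HOP_SAMPLES = 160      # 10 ms hop at 16 kHz
SUBSAMPLING_FACTOR = 8     # encoder frames per mel frame ratio


def samples_per_encoder_frame(subsampling_factor=SUBSAMPLING_FACTOR,
                              mel_hop_samples=MEL_HOP_SAMPLES):
    """Number of raw waveform samples covered by one encoder frame."""
    return subsampling_factor * mel_hop_samples


def trim_raw_history(waveform, frames_to_drop, samples_per_frame):
    """Hypothetical helper: drop the oldest encoder frames from a raw history.

    `waveform` is a flat sequence of samples; the cut is clamped so that
    dropping more frames than are stored simply empties the history.
    """
    cut = min(frames_to_drop * samples_per_frame, len(waveform))
    return waveform[cut:]


# 4 seconds of (silent) audio at 16 kHz as the stored raw history.
history = [0.0] * (16000 * 4)
factor = samples_per_encoder_frame()
trimmed = trim_raw_history(history, frames_to_drop=10, samples_per_frame=factor)
```

If the base class uses the factor only to convert encoder frames into history positions, option 1 reduces to this single multiplication. Option 2 would reimplement the trimming step itself.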

How would you prefer to see this implemented?
