Came across https://github.com/SesameAILabs/whisperX
and I was wondering if we could add:
Word-level timestamps and multi-speaker ASR using speaker diarization from pyannote-audio (speaker ID labels)
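To make the idea concrete, here is a rough sketch of the merge step: given word timestamps (e.g. from whisperX forced alignment) and speaker turns (e.g. from pyannote-audio), label each word with the speaker whose turn overlaps it most. The data shapes and function names here are illustrative, not the libraries' actual types.

```python
# Sketch: merge word-level timestamps with diarization output.
# Assumes word timestamps and speaker turns are already computed;
# dict shapes are illustrative, not whisperX/pyannote's real types.

def overlap(a_start, a_end, b_start, b_end):
    """Length of the overlap between two time intervals, in seconds."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def label_words_with_speakers(words, turns):
    """Attach to each word the speaker whose turn overlaps it most."""
    labeled = []
    for w in words:
        best, best_ov = None, 0.0
        for t in turns:
            ov = overlap(w["start"], w["end"], t["start"], t["end"])
            if ov > best_ov:
                best, best_ov = t["speaker"], ov
        labeled.append({**w, "speaker": best})
    return labeled

words = [{"word": "hello", "start": 0.1, "end": 0.4},
         {"word": "there", "start": 0.5, "end": 0.9},
         {"word": "hi",    "start": 1.2, "end": 1.4}]
turns = [{"speaker": "SPEAKER_00", "start": 0.0, "end": 1.0},
         {"speaker": "SPEAKER_01", "start": 1.0, "end": 2.0}]
print(label_words_with_speakers(words, turns))
```

whisperX already ships a similar merge step internally, so in practice this would mostly be wiring its output into the agent rather than reimplementing it.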
Currently we talk to the agents directly via push-to-talk, but soon I imagine them being active in the background, like a rolling window where the last x minutes are passively processed. That audio could be used if I call the AI retroactively, with the agent only responding if I invoked its name or implicitly addressed it. (With the sliding window, nothing is saved if I'm not interacting.)
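The rolling window could be as simple as an in-memory ring buffer of audio chunks that evicts anything older than x minutes, with nothing ever written to disk; a snapshot is only taken when the wake word fires. A minimal sketch, with illustrative names:

```python
import time
from collections import deque

class RollingAudioBuffer:
    """Keep only the last `window_s` seconds of audio chunks in memory.
    Old chunks are discarded as new ones arrive; nothing is persisted.
    Chunk format and names are illustrative."""

    def __init__(self, window_s=120.0):
        self.window_s = window_s
        self.chunks = deque()  # (timestamp, chunk) pairs, oldest first

    def push(self, chunk, now=None):
        now = time.monotonic() if now is None else now
        self.chunks.append((now, chunk))
        # Evict everything older than the window.
        while self.chunks and now - self.chunks[0][0] > self.window_s:
            self.chunks.popleft()

    def snapshot(self):
        """Called only when the agent is invoked by name: hand the
        recent audio to the ASR pipeline retroactively."""
        return [c for _, c in self.chunks]

buf = RollingAudioBuffer(window_s=10.0)
buf.push("chunk-a", now=0.0)
buf.push("chunk-b", now=5.0)
buf.push("chunk-c", now=20.0)  # evicts chunk-a and chunk-b
print(buf.snapshot())
```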
I might also be in a conversation with someone in the presence of the AI, and both of us could be asking it questions. If you can detect me and my friend as separate speakers, you could give him guest permissions and give me master/root permissions.
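The permission part could work by comparing each detected speaker's voice embedding against enrolled ones: a match above some threshold gets root, everyone else defaults to guest. A sketch under assumed names (a real system would use e.g. pyannote or SpeechBrain speaker embeddings, not 3-dimensional toy vectors, and the threshold would need tuning):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Illustrative enrolled voiceprints; real embeddings are high-dimensional.
ENROLLED = {"owner": [0.9, 0.1, 0.2]}
THRESHOLD = 0.8  # assumed cutoff, would need tuning in practice

def permissions_for(embedding):
    """Map a speaker embedding to (identity, permission level):
    matches an enrolled voice -> root, anyone else -> guest."""
    for name, ref in ENROLLED.items():
        if cosine_similarity(embedding, ref) >= THRESHOLD:
            return name, "root"
    return None, "guest"

print(permissions_for([0.9, 0.1, 0.2]))  # enrolled voice
print(permissions_for([0.0, 1.0, 0.0]))  # unknown voice
```

Voice ID alone is spoofable, so root-level actions would probably still want a secondary confirmation.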
What do you think? Detecting users by voice ID + passive background listening
I think it could be very useful
Thanks a lot, and have a good one!