Came across https://github.com/SesameAILabs/whisperX
and I was wondering if we could add:
Word-level timestamps and multi-speaker ASR using speaker diarization from pyannote-audio (speaker ID labels)
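To make the idea concrete, here is a rough sketch of the merge step: given word timestamps (e.g. from whisperX forced alignment) and speaker turns (e.g. from pyannote-audio), label each word with the speaker whose turn overlaps it most. The data shapes and function names here are illustrative, not the libraries' actual types.

```python
# Sketch: merge word-level timestamps with diarization output.
# Assumes word timestamps and speaker turns are already computed;
# dict shapes are illustrative, not whisperX/pyannote's real types.

def overlap(a_start, a_end, b_start, b_end):
    """Length of the overlap between two time intervals, in seconds."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def label_words_with_speakers(words, turns):
    """Attach to each word the speaker whose turn overlaps it most."""
    labeled = []
    for w in words:
        best, best_ov = None, 0.0
        for t in turns:
            ov = overlap(w["start"], w["end"], t["start"], t["end"])
            if ov > best_ov:
                best, best_ov = t["speaker"], ov
        labeled.append({**w, "speaker": best})
    return labeled

words = [{"word": "hello", "start": 0.1, "end": 0.4},
         {"word": "there", "start": 0.5, "end": 0.9},
         {"word": "hi",    "start": 1.2, "end": 1.4}]
turns = [{"speaker": "SPEAKER_00", "start": 0.0, "end": 1.0},
         {"speaker": "SPEAKER_01", "start": 1.0, "end": 2.0}]
print(label_words_with_speakers(words, turns))
```

whisperX already ships a similar merge step internally, so in practice this would mostly be wiring its output into the agent rather than reimplementing it.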
Currently we talk to the agents directly via push-to-talk, but soon I imagine them being active in the background, like a rolling window where the last x minutes are passively processed. That audio could be used if I call the AI retroactively, with the agent only responding if I invoked its name or implicitly addressed it. (With the sliding window, nothing is saved if I'm not interacting.)
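The rolling window could be as simple as an in-memory ring buffer of audio chunks that evicts anything older than x minutes, with nothing ever written to disk; a snapshot is only taken when the wake word fires. A minimal sketch, with illustrative names:

```python
import time
from collections import deque

class RollingAudioBuffer:
    """Keep only the last `window_s` seconds of audio chunks in memory.
    Old chunks are discarded as new ones arrive; nothing is persisted.
    Chunk format and names are illustrative."""

    def __init__(self, window_s=120.0):
        self.window_s = window_s
        self.chunks = deque()  # (timestamp, chunk) pairs, oldest first

    def push(self, chunk, now=None):
        now = time.monotonic() if now is None else now
        self.chunks.append((now, chunk))
        # Evict everything older than the window.
        while self.chunks and now - self.chunks[0][0] > self.window_s:
            self.chunks.popleft()

    def snapshot(self):
        """Called only when the agent is invoked by name: hand the
        recent audio to the ASR pipeline retroactively."""
        return [c for _, c in self.chunks]

buf = RollingAudioBuffer(window_s=10.0)
buf.push("chunk-a", now=0.0)
buf.push("chunk-b", now=5.0)
buf.push("chunk-c", now=20.0)  # evicts chunk-a and chunk-b
print(buf.snapshot())
```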
I might also be in a conversation with someone in the presence of the AI, and both of us could be asking it questions. If you can detect me and my friend as separate speakers, you could give him guest permissions and give me master/root permissions.
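The permission part could work by comparing each detected speaker's voice embedding against enrolled ones: a match above some threshold gets root, everyone else defaults to guest. A sketch under assumed names (a real system would use e.g. pyannote or SpeechBrain speaker embeddings, not 3-dimensional toy vectors, and the threshold would need tuning):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Illustrative enrolled voiceprints; real embeddings are high-dimensional.
ENROLLED = {"owner": [0.9, 0.1, 0.2]}
THRESHOLD = 0.8  # assumed cutoff, would need tuning in practice

def permissions_for(embedding):
    """Map a speaker embedding to (identity, permission level):
    matches an enrolled voice -> root, anyone else -> guest."""
    for name, ref in ENROLLED.items():
        if cosine_similarity(embedding, ref) >= THRESHOLD:
            return name, "root"
    return None, "guest"

print(permissions_for([0.9, 0.1, 0.2]))  # enrolled voice
print(permissions_for([0.0, 1.0, 0.0]))  # unknown voice
```

Voice ID alone is spoofable, so root-level actions would probably still want a secondary confirmation.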
What do you think? Detecting users by voice ID + passive background listening
I think it could be very useful
Thanks a lot, and have a good one!