
Multispeaker ASR using speaker diarization + Background Continuous Smart Listen #5

@fire17

Description


Came across https://github.com/SesameAILabs/whisperX
and was wondering if we can add:
word-level timestamps and multispeaker ASR using speaker diarization from pyannote-audio (speaker ID labels).
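To illustrate the idea, here is a minimal pure-Python sketch of how diarization output (speaker-labeled time segments) could be merged with word-level timestamps by maximum temporal overlap. Function and field names are illustrative, not the actual whisperX or pyannote-audio API.

```python
# Sketch: attach diarization speaker labels to word-level timestamps
# by picking the diarization segment with the largest temporal overlap.
# Data shapes and names are hypothetical, for illustration only.

def assign_word_speakers(diarize_segments, words):
    """diarize_segments: list of (start, end, speaker_label) tuples;
    words: list of dicts with 'word', 'start', 'end' (seconds)."""
    labeled = []
    for w in words:
        best, best_overlap = None, 0.0
        for seg_start, seg_end, speaker in diarize_segments:
            overlap = min(w["end"], seg_end) - max(w["start"], seg_start)
            if overlap > best_overlap:
                best, best_overlap = speaker, overlap
        labeled.append({**w, "speaker": best})
    return labeled

segments = [(0.0, 2.5, "SPEAKER_00"), (2.5, 5.0, "SPEAKER_01")]
words = [
    {"word": "hello", "start": 0.2, "end": 0.6},
    {"word": "there", "start": 2.8, "end": 3.1},
]
print(assign_word_speakers(segments, words))
```

Words that fall in a gap between segments simply get no label (`None`), which the caller can treat as "unknown speaker".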

Currently we talk to the agents directly, like push-to-talk, but soon I imagine them being active in the background, like a rolling window where the last x minutes are passively processed and can be used if I call the AI retroactively. The agent would only respond if I invoked its name or implicitly addressed it. (With the sliding window, nothing is saved if there is no interaction.)
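The rolling-window behavior described above could be sketched as a time-bounded buffer: audio chunks older than the window are dropped continuously, and the buffer is only read out when the agent is actually invoked. Class and parameter names here are hypothetical.

```python
import time
from collections import deque

# Sketch of the proposed passive listening window: keep only the last
# `window_seconds` of audio chunks. Older chunks are dropped on every
# push, so nothing persists unless the agent is invoked in time.

class RollingAudioWindow:
    def __init__(self, window_seconds=120):
        self.window_seconds = window_seconds
        self.chunks = deque()  # (timestamp, chunk) pairs

    def push(self, chunk, now=None):
        now = time.monotonic() if now is None else now
        self.chunks.append((now, chunk))
        self._trim(now)

    def _trim(self, now):
        # Drop everything older than the window.
        while self.chunks and now - self.chunks[0][0] > self.window_seconds:
            self.chunks.popleft()

    def snapshot(self, now=None):
        """Called only when the agent is addressed; returns retained chunks."""
        now = time.monotonic() if now is None else now
        self._trim(now)
        return [c for _, c in self.chunks]

win = RollingAudioWindow(window_seconds=60)
win.push(b"old", now=0.0)
win.push(b"recent", now=90.0)   # trimming drops the chunk from t=0
print(win.snapshot(now=100.0))  # only the recent chunk survives
```

Because trimming happens on every push, the privacy property holds by construction: unaddressed audio older than the window is gone before anyone asks for it.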

Also, I might be in a conversation with someone in the presence of the AI, and both of us could be asking it questions. If it can detect me and my friend as separate speakers, it could give him guest permissions and give me master/root permissions.
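The permission idea could look something like the sketch below: diarization (or voice-ID) speaker labels map to roles, with unknown speakers defaulting to guest. The enrollment step (matching a voice embedding to a known user) is out of scope here; all labels, roles, and actions are illustrative.

```python
# Sketch: map speaker labels from diarization/voice ID to permission
# levels. Labels, roles, and actions are hypothetical examples.

PERMISSIONS = {
    "master": {"read", "write", "admin"},
    "guest": {"read"},
}

# e.g. the owner's enrolled voice resolves to SPEAKER_00
enrolled = {"SPEAKER_00": "master"}

def role_for(speaker_label):
    # Anyone not enrolled (e.g. a friend) defaults to guest.
    return enrolled.get(speaker_label, "guest")

def allowed(speaker_label, action):
    return action in PERMISSIONS[role_for(speaker_label)]

print(allowed("SPEAKER_00", "admin"))  # owner: True
print(allowed("SPEAKER_01", "admin"))  # friend/guest: False
print(allowed("SPEAKER_01", "read"))   # friend/guest: True
```

Defaulting unknown voices to the least-privileged role is the safe choice: misidentification then fails closed rather than granting root access.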

What do you think about detecting the user by voice ID plus passive background listening?
I think it could be very useful.

Thanks a lot and a good one!
