Update starter to use new turn detection model #85
base: main
```diff
@@ -12,7 +12,7 @@ The starter project includes:
 - A voice AI pipeline built on [LiveKit Inference](https://docs.livekit.io/agents/models/inference)
   with [models](https://docs.livekit.io/agents/models) from OpenAI, Cartesia, and Deepgram. More than 50 other model providers are supported, including [Realtime models](https://docs.livekit.io/agents/models/realtime)
 - Eval suite based on the LiveKit Agents [testing & evaluation framework](https://docs.livekit.io/agents/start/testing/)
-- [LiveKit Turn Detector](https://docs.livekit.io/agents/logic/turns/turn-detector/) for contextually-aware speaker detection, with multilingual support
+- [LiveKit Turn Detector](https://docs.livekit.io/agents/logic/turns/turn-detector/), a multimodal end-of-turn model that listens to the user's audio directly, combining semantic understanding with acoustic cues for state-of-the-art accuracy across 14 languages
 - [Background voice cancellation](https://docs.livekit.io/transport/media/noise-cancellation/)
 - Deep session insights from LiveKit [Agent Observability](https://docs.livekit.io/deploy/observability/)
 - A Dockerfile ready for [production deployment to LiveKit Cloud](https://docs.livekit.io/deploy/agents/)
```
````diff
@@ -92,12 +92,14 @@ lk app env -w -d .env.local

 ## Run the agent

-Before your first run, you must download certain models such as [Silero VAD](https://docs.livekit.io/agents/logic/turns/vad/) and the [LiveKit turn detector](https://docs.livekit.io/agents/logic/turns/turn-detector/):
+Before your first run, download the [ai-coustics noise cancellation](https://docs.livekit.io/transport/media/noise-cancellation/) model used by the agent:

 ```console
-uv run python src/agent.py download-files
+uv run --module livekit.agents download-files
````
Technically, this is no longer needed, because Silero and the turn detector are now bundled in the core package.
Author
I think I left this in because the noise cancellation needs it. I'll look again, but do you know offhand whether that's right?
````diff
 ```

+The [LiveKit turn detector](https://docs.livekit.io/agents/logic/turns/turn-detector/) and the agent's voice activity detection both run on [LiveKit Inference](https://docs.livekit.io/agents/models/inference) and are built into the Agents SDK, so they don't require a separate download.
+
 Next, run this command to speak to your agent directly in your terminal:

 ```console
````
```diff
@@ -7,13 +7,12 @@
     AgentServer,
     AgentSession,
     JobContext,
-    JobProcess,
+    TurnHandlingOptions,
     cli,
     inference,
     room_io,
 )
-from livekit.plugins import ai_coustics, silero
-from livekit.plugins.turn_detector.multilingual import MultilingualModel
+from livekit.plugins import ai_coustics

 logger = logging.getLogger("agent")

```
```diff
@@ -92,13 +91,6 @@ def __init__(self) -> None:
 server = AgentServer()


-def prewarm(proc: JobProcess):
-    proc.userdata["vad"] = silero.VAD.load()
-
-
-server.setup_fnc = prewarm
-
-
 @server.rtc_session(agent_name="my-agent")
 async def my_agent(ctx: JobContext):
     # Logging setup
```
```diff
@@ -117,10 +109,14 @@ async def my_agent(ctx: JobContext):
         tts=inference.TTS(
             model="cartesia/sonic-3", voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc"
         ),
-        # VAD and turn detection are used to determine when the user is speaking and when the agent should respond
+        # The LiveKit turn detector determines when the user is done speaking and the agent should respond.
+        # AudioTurnDetector is a multimodal model that listens to the user's audio directly, combining
+        # semantic understanding with acoustic cues (intonation, pitch, rhythm) for state-of-the-art accuracy.
+        # AgentSession supplies the required VAD automatically.
         # See more at https://docs.livekit.io/agents/build/turns
-        turn_detection=MultilingualModel(),
-        vad=ctx.proc.userdata["vad"],
+        turn_handling=TurnHandlingOptions(
+            turn_detection=inference.AudioTurnDetector(),
```
renamed to
```diff
+        ),
         # allow the LLM to generate a response while waiting for the end of turn
         # See more at https://docs.livekit.io/agents/build/audio/#preemptive-generation
         preemptive_generation=True,
```
we should drop the word "multimodal"
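
The word appears in at least two places in this diff: the README bullet for the turn detector and the comment above `turn_handling` in the agent source. To check that none are missed when acting on this comment, a minimal sketch like the following could list every occurrence. The helper name `find_word` and the choice to scan only `.md` and `.py` files are assumptions made for illustration; neither is part of the starter project or the LiveKit SDK.

```python
from pathlib import Path


def find_word(root: Path, word: str, suffixes=(".md", ".py")):
    """Yield (path, line_number, line) for each line containing `word`.

    Hypothetical helper for illustration only. Matching is
    case-insensitive and limited to files whose suffix is in `suffixes`.
    """
    needle = word.lower()
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                yield path, number, line.strip()


if __name__ == "__main__":
    # Run from the repository root; prints one "path:line: text" entry per hit.
    for path, number, line in find_word(Path("."), "multimodal"):
        print(f"{path}:{number}: {line}")
```

Because the scan walks the whole tree, it will also descend into directories such as a local virtual environment, so pointing `root` at a narrower directory may give cleaner output.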