Merged
4 changes: 0 additions & 4 deletions .env.example
@@ -1,7 +1,3 @@
LIVEKIT_URL=
LIVEKIT_API_KEY=
LIVEKIT_API_SECRET=

OPENAI_API_KEY=
DEEPGRAM_API_KEY=
CARTESIA_API_KEY=
4 changes: 3 additions & 1 deletion .github/workflows/tests.yml
@@ -28,5 +28,7 @@ jobs:

- name: Run tests
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
LIVEKIT_URL: ${{ secrets.LIVEKIT_URL }}
LIVEKIT_API_KEY: ${{ secrets.LIVEKIT_API_KEY }}
LIVEKIT_API_SECRET: ${{ secrets.LIVEKIT_API_SECRET }}
run: uv run pytest -v
2 changes: 1 addition & 1 deletion .gitignore
@@ -9,4 +9,4 @@ KMS
.vscode
*.egg-info
.pytest_cache
.ruff_cache
.ruff_cache
27 changes: 15 additions & 12 deletions README.md
@@ -4,17 +4,18 @@

# LiveKit Agents Starter - Python

A complete starter project for building voice AI apps with [LiveKit Agents for Python](https://github.com/livekit/agents).
A complete starter project for building voice AI apps with [LiveKit Agents for Python](https://github.com/livekit/agents) and [LiveKit Cloud](https://cloud.livekit.io/).

The starter project includes:

- A simple voice AI assistant based on the [Voice AI quickstart](https://docs.livekit.io/agents/start/voice-ai/)
- Voice AI pipeline based on [OpenAI](https://docs.livekit.io/agents/integrations/llm/openai/), [Cartesia](https://docs.livekit.io/agents/integrations/tts/cartesia/), and [Deepgram](https://docs.livekit.io/agents/integrations/llm/deepgram/)
- Easily integrate your preferred [LLM](https://docs.livekit.io/agents/integrations/llm/), [STT](https://docs.livekit.io/agents/integrations/stt/), and [TTS](https://docs.livekit.io/agents/integrations/tts/) instead, or swap to a realtime model like the [OpenAI Realtime API](https://docs.livekit.io/agents/integrations/realtime/openai)
- A simple voice AI assistant, ready for extension and customization
- A voice AI pipeline with [models](https://docs.livekit.io/agents/models) from OpenAI, Cartesia, and AssemblyAI served through LiveKit Cloud
- Easily integrate your preferred [LLM](https://docs.livekit.io/agents/models/llm/), [STT](https://docs.livekit.io/agents/models/stt/), and [TTS](https://docs.livekit.io/agents/models/tts/) instead, or swap to a realtime model like the [OpenAI Realtime API](https://docs.livekit.io/agents/models/realtime/openai)
- Eval suite based on the LiveKit Agents [testing & evaluation framework](https://docs.livekit.io/agents/build/testing/)
- [LiveKit Turn Detector](https://docs.livekit.io/agents/build/turns/turn-detector/) for contextually-aware speaker detection, with multilingual support
- [LiveKit Cloud enhanced noise cancellation](https://docs.livekit.io/home/cloud/noise-cancellation/)
- [Background voice cancellation](https://docs.livekit.io/home/cloud/noise-cancellation/)
- Integrated [metrics and logging](https://docs.livekit.io/agents/build/metrics/)
- A Dockerfile ready for [production deployment](https://docs.livekit.io/agents/ops/deployment/)

This starter app is compatible with any [custom web/mobile frontend](https://docs.livekit.io/agents/start/frontend/) or [SIP-based telephony](https://docs.livekit.io/agents/start/telephony/).

@@ -27,19 +28,17 @@ cd agent-starter-python
uv sync
```

Set up the environment by copying `.env.example` to `.env.local` and filling in the required values:
Sign up for [LiveKit Cloud](https://cloud.livekit.io/) then set up the environment by copying `.env.example` to `.env.local` and filling in the required keys:

- `LIVEKIT_URL`: Use [LiveKit Cloud](https://cloud.livekit.io/) or [run your own](https://docs.livekit.io/home/self-hosting/)
- `LIVEKIT_URL`
- `LIVEKIT_API_KEY`
- `LIVEKIT_API_SECRET`
- `OPENAI_API_KEY`: [Get a key](https://platform.openai.com/api-keys) or use your [preferred LLM provider](https://docs.livekit.io/agents/integrations/llm/)
- `DEEPGRAM_API_KEY`: [Get a key](https://console.deepgram.com/) or use your [preferred STT provider](https://docs.livekit.io/agents/integrations/stt/)
- `CARTESIA_API_KEY`: [Get a key](https://play.cartesia.ai/keys) or use your [preferred TTS provider](https://docs.livekit.io/agents/integrations/tts/)

You can load the LiveKit environment automatically using the [LiveKit CLI](https://docs.livekit.io/home/cli/cli-setup):

```bash
lk app env -w .env.local
lk cloud auth
lk app env -w -d .env.local

Contributor review comment:

Just my preference, but I like to add the long form so it's more obvious what each does:

lk app env --write --destination .env.local

```
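
As an illustration of the environment setup described above (not part of this PR), a small Python check could confirm that the three LiveKit keys are present before the agent starts. The helper name `missing_livekit_env` is hypothetical, and the sketch only inspects the environment; it does not validate the values against LiveKit Cloud.

```python
import os

# The three keys that remain in .env.example after this change.
REQUIRED_KEYS = ("LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET")


def missing_livekit_env(env=None):
    """Return the required keys that are unset or empty.

    `env` is any mapping of names to values; it defaults to os.environ.
    (Hypothetical helper, for illustration only.)
    """
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    missing = missing_livekit_env()
    if missing:
        print("Missing in environment:", ", ".join(missing))
    else:
        print("All LiveKit keys are set")
```

Passing a plain dictionary instead of relying on `os.environ` keeps the check easy to exercise in tests or CI without touching real credentials.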

## Run the agent
@@ -100,12 +99,16 @@ Once you've started your own project based on this repo, you should:

2. **Remove the git tracking test**: Delete the "Check files not tracked in git" step from `.github/workflows/tests.yml` since you'll now want this file to be tracked. These are just there for development purposes in the template repo itself.

3. **Add your own repository secrets**: You must [add secrets](https://docs.github.com/en/actions/how-tos/writing-workflows/choosing-what-your-workflow-does/using-secrets-in-github-actions) for `OPENAI_API_KEY` or your other LLM provider so that the tests can run in CI.
3. **Add your own repository secrets**: You must [add secrets](https://docs.github.com/en/actions/how-tos/writing-workflows/choosing-what-your-workflow-does/using-secrets-in-github-actions) for `LIVEKIT_URL`, `LIVEKIT_API_KEY`, and `LIVEKIT_API_SECRET` so that the tests can run in CI.

## Deploying to production

This project is production-ready and includes a working `Dockerfile`. To deploy it to LiveKit Cloud or another environment, see the [deploying to production](https://docs.livekit.io/agents/ops/deployment/) guide.

## Self-hosted LiveKit

You can also self-host LiveKit instead of using LiveKit Cloud. See the [self-hosting](https://docs.livekit.io/home/self-hosting/) guide for more information. If you choose to self-host, you'll need to also use [model plugins](https://docs.livekit.io/agents/models/#plugins) instead of LiveKit Inference and will need to remove the [LiveKit Cloud noise cancellation](https://docs.livekit.io/home/cloud/noise-cancellation/) plugin.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -9,7 +9,7 @@ description = "Simple voice AI assistant built with LiveKit Agents for Python"
requires-python = ">=3.9"

dependencies = [
"livekit-agents[openai,turn-detector,silero,cartesia,deepgram]~=1.2",
"livekit-agents[silero,turn-detector]~=1.2",
"livekit-plugins-noise-cancellation~=0.2",
"python-dotenv",
]
78 changes: 35 additions & 43 deletions src/agent.py
@@ -2,21 +2,17 @@

from dotenv import load_dotenv
from livekit.agents import (
NOT_GIVEN,
Agent,
AgentFalseInterruptionEvent,
AgentSession,
JobContext,
JobProcess,
MetricsCollectedEvent,
RoomInputOptions,
RunContext,
WorkerOptions,
cli,
metrics,
)
from livekit.agents.llm import function_tool
from livekit.plugins import cartesia, deepgram, noise_cancellation, openai, silero
from livekit.plugins import noise_cancellation, silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel

logger = logging.getLogger("agent")
@@ -27,27 +23,28 @@
class Assistant(Agent):
def __init__(self) -> None:
super().__init__(
instructions="""You are a helpful voice AI assistant.
instructions="""You are a helpful voice AI assistant. The user is interacting with you via voice, even if you perceive the conversation as text.
You eagerly assist users with their questions by providing information from your extensive knowledge.
Your responses are concise, to the point, and without any complex formatting or punctuation including emojis, asterisks, or other symbols.
You are curious, friendly, and have a sense of humor.""",
)

# all functions annotated with @function_tool will be passed to the LLM when this
# agent is active
@function_tool
async def lookup_weather(self, context: RunContext, location: str):
"""Use this tool to look up current weather information in the given location.

If the location is not supported by the weather service, the tool will indicate this. You must tell the user the location's weather is unavailable.

Args:
location: The location to look up weather information for (e.g. city name)
"""

logger.info(f"Looking up weather for {location}")

return "sunny with a temperature of 70 degrees."
# To add tools, use the @function_tool decorator.
# Here's an example that adds a simple weather tool.
# You also have to add `from livekit.agents.llm import function_tool, RunContext` to the top of this file
# @function_tool
# async def lookup_weather(self, context: RunContext, location: str):
# """Use this tool to look up current weather information in the given location.
#
# If the location is not supported by the weather service, the tool will indicate this. You must tell the user the location's weather is unavailable.
#
# Args:
# location: The location to look up weather information for (e.g. city name)
# """
#
# logger.info(f"Looking up weather for {location}")
#
# return "sunny with a temperature of 70 degrees."
Comment on lines +32 to +47

Contributor review comment:

It might be clearer to link to the tool call docs page here and to state that the return value is passed to the LLM node.



def prewarm(proc: JobProcess):
@@ -61,17 +58,17 @@ async def entrypoint(ctx: JobContext):
"room": ctx.room.name,
}

# Set up a voice AI pipeline using OpenAI, Cartesia, Deepgram, and the LiveKit turn detector
# Set up a voice AI pipeline using OpenAI, Cartesia, AssemblyAI, and the LiveKit turn detector
session = AgentSession(
# A Large Language Model (LLM) is your agent's brain, processing user input and generating a response
# See all providers at https://docs.livekit.io/agents/integrations/llm/
llm=openai.LLM(model="gpt-4o-mini"),
# Speech-to-text (STT) is your agent's ears, turning the user's speech into text that the LLM can understand
# See all providers at https://docs.livekit.io/agents/integrations/stt/
stt=deepgram.STT(model="nova-3", language="multi"),
# See all available models at https://docs.livekit.io/agents/models/stt/
stt="assemblyai/universal-streaming:en",
# A Large Language Model (LLM) is your agent's brain, processing user input and generating a response
# See all available models at https://docs.livekit.io/agents/models/llm/
llm="openai/gpt-4.1-mini",
# Text-to-speech (TTS) is your agent's voice, turning the LLM's text into speech that the user can hear
# See all providers at https://docs.livekit.io/agents/integrations/tts/
tts=cartesia.TTS(voice="6f84f4b8-58a2-430c-8c79-688dad597532"),
# See all available models as well as voice selections at https://docs.livekit.io/agents/models/tts/
tts="cartesia/sonic-2:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
# VAD and turn detection are used to determine when the user is speaking and when the agent should respond
# See more at https://docs.livekit.io/agents/build/turns
turn_detection=MultilingualModel(),
@@ -81,19 +78,16 @@ async def entrypoint(ctx: JobContext):
preemptive_generation=True,
)

# To use a realtime model instead of a voice pipeline, use the following session setup instead:
# To use a realtime model instead of a voice pipeline, use the following session setup instead.
# (Note: This is for the OpenAI Realtime API. For other providers, see https://docs.livekit.io/agents/models/realtime/)
# 1. Install livekit-agents[openai]
# 2. Set OPENAI_API_KEY in .env.local
# 3. Add `from livekit.plugins import openai` to the top of this file
# 4. Use the following session setup instead of the version above
# session = AgentSession(
# # See all providers at https://docs.livekit.io/agents/integrations/realtime/
# llm=openai.realtime.RealtimeModel(voice="marin")
# )

# sometimes background noise could interrupt the agent session, these are considered false positive interruptions
# when it's detected, you may resume the agent's speech
@session.on("agent_false_interruption")
def _on_agent_false_interruption(ev: AgentFalseInterruptionEvent):
logger.info("false positive interruption, resuming")
session.generate_reply(instructions=ev.extra_instructions or NOT_GIVEN)

# Metrics collection, to measure pipeline performance
# For more information, see https://docs.livekit.io/agents/build/metrics/
usage_collector = metrics.UsageCollector()
@@ -110,9 +104,9 @@ async def log_usage():
ctx.add_shutdown_callback(log_usage)

# # Add a virtual avatar to the session, if desired
# # For other providers, see https://docs.livekit.io/agents/integrations/avatar/
# # For other providers, see https://docs.livekit.io/agents/models/avatar/
# avatar = hedra.AvatarSession(
# avatar_id="...", # See https://docs.livekit.io/agents/integrations/avatar/hedra
# avatar_id="...", # See https://docs.livekit.io/agents/models/avatar/plugins/hedra
# )
# # Start the avatar and wait for it to join
# await avatar.start(session, room=ctx.room)
@@ -122,9 +116,7 @@ async def log_usage():
agent=Assistant(),
room=ctx.room,
room_input_options=RoomInputOptions(
# LiveKit Cloud enhanced noise cancellation
# - If self-hosting, omit this parameter
# - For telephony applications, use `BVCTelephony` for best results
# For telephony applications, use `BVCTelephony` for best results
noise_cancellation=noise_cancellation.BVC(),
),
)
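
The session setup in this diff replaces plugin objects with plain strings such as `"assemblyai/universal-streaming:en"` and `"cartesia/sonic-2:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc"`. The sketch below shows one way to read those strings, assuming they follow a `provider/model[:suffix]` layout where the suffix is a language code or voice ID. That layout is inferred only from the strings visible here; `ModelDescriptor` and `parse_model_string` are hypothetical names, and this is not LiveKit's own parser.

```python
from typing import NamedTuple, Optional


class ModelDescriptor(NamedTuple):
    """Parts of a model string (hypothetical type, for illustration only)."""

    provider: str
    model: str
    variant: Optional[str]  # language code or voice ID, if one is present


def parse_model_string(descriptor: str) -> ModelDescriptor:
    """Split an assumed 'provider/model[:suffix]' string into its parts."""
    provider, slash, rest = descriptor.partition("/")
    if not slash or not provider or not rest:
        raise ValueError(f"expected 'provider/model', got {descriptor!r}")
    # Anything after the first ':' is treated as the optional suffix.
    model, _, variant = rest.partition(":")
    if not model:
        raise ValueError(f"missing model name in {descriptor!r}")
    return ModelDescriptor(provider, model, variant or None)


if __name__ == "__main__":
    for text in (
        "assemblyai/universal-streaming:en",
        "openai/gpt-4.1-mini",
        "cartesia/sonic-2:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
    ):
        print(parse_model_string(text))
```

Under this reading, the STT string carries a language (`en`), the LLM string has no suffix, and the TTS string carries a voice ID.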
117 changes: 2 additions & 115 deletions tests/test_agent.py
@@ -1,12 +1,11 @@
import pytest
from livekit.agents import AgentSession, llm, mock_tools
from livekit.plugins import openai
from livekit.agents import AgentSession, inference, llm

from agent import Assistant


def _llm() -> llm.LLM:
return openai.LLM(model="gpt-4o-mini")
return inference.LLM(model="openai/gpt-4.1-mini")


@pytest.mark.asyncio
@@ -41,118 +40,6 @@ async def test_offers_assistance() -> None:
result.expect.no_more_events()


@pytest.mark.asyncio
async def test_weather_tool() -> None:
"""Unit test for the weather tool combined with an evaluation of the agent's ability to incorporate its results."""
async with (
_llm() as llm,
AgentSession(llm=llm) as session,
):
await session.start(Assistant())

# Run an agent turn following the user's request for weather information
result = await session.run(user_input="What's the weather in Tokyo?")

# Test that the agent calls the weather tool with the correct arguments
result.expect.next_event().is_function_call(
name="lookup_weather", arguments={"location": "Tokyo"}
)

# Test that the tool invocation works and returns the correct output
# To mock the tool output instead, see https://docs.livekit.io/agents/build/testing/#mock-tools
result.expect.next_event().is_function_call_output(
output="sunny with a temperature of 70 degrees."
)

# Evaluate the agent's response for accurate weather information
await (
result.expect.next_event()
.is_message(role="assistant")
.judge(
llm,
intent="""
Informs the user that the weather is sunny with a temperature of 70 degrees.

Optional context that may or may not be included (but the response must not contradict these facts)
- The location for the weather report is Tokyo
""",
)
)

# Ensures there are no function calls or other unexpected events
result.expect.no_more_events()


@pytest.mark.asyncio
async def test_weather_unavailable() -> None:
"""Evaluation of the agent's ability to handle tool errors."""
async with (
_llm() as llm,
AgentSession(llm=llm) as sess,
):
await sess.start(Assistant())

# Simulate a tool error
with mock_tools(
Assistant,
{"lookup_weather": lambda: RuntimeError("Weather service is unavailable")},
):
result = await sess.run(user_input="What's the weather in Tokyo?")
result.expect.skip_next_event_if(type="message", role="assistant")
result.expect.next_event().is_function_call(
name="lookup_weather", arguments={"location": "Tokyo"}
)
result.expect.next_event().is_function_call_output()
await result.expect.next_event(type="message").judge(
llm,
intent="""
Acknowledges that the weather request could not be fulfilled and communicates this to the user.

The response should convey that there was a problem getting the weather information, but can be expressed in various ways such as:
- Mentioning an error, service issue, or that it couldn't be retrieved
- Suggesting alternatives or asking what else they can help with
- Being apologetic or explaining the situation

The response does not need to use specific technical terms like "weather service error" or "temporary".
""",
)

# leaving this commented, some LLMs may occasionally try to retry.
# result.expect.no_more_events()


@pytest.mark.asyncio
async def test_unsupported_location() -> None:
"""Evaluation of the agent's ability to handle a weather response with an unsupported location."""
async with (
_llm() as llm,
AgentSession(llm=llm) as sess,
):
await sess.start(Assistant())

with mock_tools(Assistant, {"lookup_weather": lambda: "UNSUPPORTED_LOCATION"}):
result = await sess.run(user_input="What's the weather in Tokyo?")

# Evaluate the agent's response for an unsupported location
await result.expect.next_event(type="message").judge(
llm,
intent="""
Communicates that the weather request for the specific location could not be fulfilled.

The response should indicate that weather information is not available for the requested location, but can be expressed in various ways such as:
- Saying they can't get weather for that location
- Explaining the location isn't supported or available
- Suggesting alternatives or asking what else they can help with
- Being apologetic about the limitation

The response does not need to explicitly state "unsupported" or discourage retrying.
""",
)

# Ensures there are no function calls or other unexpected events
result.expect.no_more_events()


@pytest.mark.asyncio
async def test_grounding() -> None:
"""Evaluation of the agent's ability to refuse to answer when it doesn't know something."""