1 change: 1 addition & 0 deletions README.md
@@ -23,6 +23,7 @@ The following examples include a template configuration or manifest file for each
| [Cerebrium](/cerebrium) | `cerebrium.toml` example for [Cerebrium](https://cerebrium.ai) |
| [Fly.io](/fly.io) | `fly.toml` example for [Fly.io](https://fly.io) |
| [Kubernetes](/kubernetes) | Example manifest file for any Kubernetes environment |
| [Modal](/modal) | Example based on the `agent-starter-python` project, ready to deploy on [Modal](https://modal.com) (no `Dockerfile` or config file necessary) |
| [Render](/render.com) | `render.yaml` example for [Render](https://render.com) |

## Missing a provider?
104 changes: 104 additions & 0 deletions modal/README.md
@@ -0,0 +1,104 @@
# Modal LiveKit Agents Deployment Example

This directory contains a [LiveKit](https://livekit.com) voice AI agent deployed on [Modal](https://www.modal.com?utm_source=partner&utm_medium=github&utm_campaign=livekit), a serverless platform for running Python applications. The agent is based on [LiveKit's `agent-starter-python` project](https://github.com/livekit-examples/agent-starter-python).

## Getting Started

Before deploying, ensure you have:

- **Modal Account**: Sign up at [modal.com](https://www.modal.com?utm_source=partner&utm_medium=github&utm_campaign=livekit) and get $30/month of free compute.
- **LiveKit Account**: Set up a [LiveKit](https://livekit.com) account.
- **API Keys** for the following providers:
  - [OpenAI](https://openai.com)
  - [Cartesia](https://cartesia.com)
  - [Deepgram](https://deepgram.com)

### Install Dependencies

The project uses `uv` for dependency management. That said, the only local dependency you need is `modal`. To set up the environment, run:

```bash
uv sync
```

### Authenticate with Modal

```bash
modal setup
```

### Set Up Secrets on Modal

**Using the Modal dashboard:**

Navigate to the Secrets section in the Modal dashboard and add the following secrets:

- `LIVEKIT_URL` - Your LiveKit WebRTC server URL
- `LIVEKIT_API_KEY` - API key for authenticating LiveKit requests
- `LIVEKIT_API_SECRET` - API secret for LiveKit authentication
- `OPENAI_API_KEY` - API key for OpenAI's GPT-based processing
- `CARTESIA_API_KEY` - API key for Cartesia's TTS services
- `DEEPGRAM_API_KEY` - API key for Deepgram's STT services

You can find your LiveKit URL and API keys under **Settings** > **Project** and **Settings** > **Keys** in the LiveKit dashboard.

![Modal Secrets](https://modal-cdn.com/cdnbot/modal-livekit-secretsndip6awa_78ed94b0.webp)

**Using the Modal CLI:**

```bash
modal secret create livekit-voice-agent \
  LIVEKIT_URL=your_livekit_url \
  LIVEKIT_API_KEY=your_api_key \
  LIVEKIT_API_SECRET=your_api_secret \
  OPENAI_API_KEY=your_openai_key \
  DEEPGRAM_API_KEY=your_deepgram_key \
  CARTESIA_API_KEY=your_cartesia_key
```

Once added, you can reference these secrets in your Modal functions.
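For example, a Modal function can attach the secret by name, which makes each key available as an environment variable inside the container. A minimal sketch (the app and function below are illustrative, not part of this project):

```python
import os

import modal

app = modal.App("livekit-voice-agent-example")

# Attaching the secret injects each of its keys as an environment
# variable in the container that runs this function.
@app.function(secrets=[modal.Secret.from_name("livekit-voice-agent")])
def check_secrets() -> None:
    # The LiveKit and provider SDKs read these variables automatically;
    # printing one here just confirms the secret is wired up.
    print("LiveKit URL:", os.environ["LIVEKIT_URL"])
```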

### Configure LiveKit Webhooks

In your LiveKit project dashboard, create a new webhook that points to the URL generated when you deploy your Modal app. The URL is printed to stdout during deployment and is also available in your Modal dashboard. It will look something like the URL in the screenshot below:

![settings webhooks](https://modal-cdn.com/cdnbot/livekit-webhooksiceyins6_203427cc.webp)
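Under the hood, the deployed app exposes a web endpoint that LiveKit calls with these events. A minimal sketch of such an endpoint (the names below are illustrative; see `src/server.py` for the actual implementation):

```python
import modal

app = modal.App("livekit-webhook-example")

# Web endpoints need FastAPI available in the container image.
image = modal.Image.debian_slim().pip_install("fastapi[standard]")

@app.function(image=image, secrets=[modal.Secret.from_name("livekit-voice-agent")])
@modal.fastapi_endpoint(method="POST")
async def webhook(event: dict) -> dict:
    # LiveKit POSTs JSON events such as {"event": "room_started", ...}.
    # A production handler should verify the Authorization header
    # before acting on the event.
    if event.get("event") == "room_started":
        room_name = event["room"]["name"]
        ...  # spawn an agent worker for room_name
    return {"status": "ok"}
```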

## Deployment

Run the following command to deploy your Modal app:
```bash
modal deploy -m src.server
```
You can interact with your agent using the hosted [LiveKit Agents Playground](https://docs.livekit.io/agents/start/playground/). When you connect to a room, the `room_started` webhook event will dispatch your agent to that room.
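If you would rather connect from your own client than the playground, you can mint a room join token with the `livekit-api` package. A sketch, assuming `LIVEKIT_API_KEY` and `LIVEKIT_API_SECRET` are set in your environment:

```python
from livekit import api

# AccessToken reads LIVEKIT_API_KEY and LIVEKIT_API_SECRET from the
# environment when constructed without arguments.
token = (
    api.AccessToken()
    .with_identity("test-user")
    .with_grants(api.VideoGrants(room_join=True, room="test-room"))
    .to_jwt()
)
print(token)  # use this JWT to join "test-room" from any LiveKit client
```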

## Developing

During development, it can be helpful to launch the application with
```bash
modal serve -m src.server
```
which reloads the app whenever you change the source code.

## Testing

### Test the Agent

Use the following command to launch your app remotely and execute the tests using `pytest`:
```bash
modal run -m src.server
```
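`modal run` invokes the app's local entrypoint. The general pattern looks something like the sketch below (illustrative only; see `src/server.py` for the project's actual entrypoint):

```python
import subprocess

import modal

app = modal.App("livekit-test-runner-example")

@app.function()
def run_tests() -> int:
    # Run the test suite inside the remote container.
    return subprocess.run(["pytest", "tests"], cwd="/root").returncode

@app.local_entrypoint()
def main() -> None:
    # `modal run -m src.server` executes this locally, which in turn
    # runs pytest remotely on Modal.
    raise SystemExit(run_tests.remote())
```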

### Test the Webhook Endpoint

Test the webhook endpoint with a sample LiveKit event from the command line:

```bash
curl -X POST {MODAL_AGENT_WEB_ENDPOINT_URL} \
-H "Authorization: Bearer your_livekit_token" \
-H "Content-Type: application/json" \
-d '{"event": "room_started", "room": {"name": "test-room"}}'
```

Alternatively, you can trigger webhook events from the LiveKit webhooks settings page (the same place where you created the webhook).
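Note that genuine LiveKit webhooks carry a signed JWT in the `Authorization` header. If you want to verify it server-side, the `livekit-api` package provides a receiver for that; a sketch, assuming your LiveKit credentials are in the environment:

```python
import os

from livekit.api import TokenVerifier, WebhookReceiver

receiver = WebhookReceiver(
    TokenVerifier(
        api_key=os.environ["LIVEKIT_API_KEY"],
        api_secret=os.environ["LIVEKIT_API_SECRET"],
    )
)

def handle_webhook(body: str, auth_header: str):
    # receive() validates the JWT signature and the body hash,
    # raising if either check fails.
    event = receiver.receive(body, auth_header)
    print(event.event, event.room.name)
```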

42 changes: 42 additions & 0 deletions modal/pyproject.toml
@@ -0,0 +1,42 @@
[build-system]
requires = ["setuptools>=61.0", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "agent-starter-python"
version = "1.0.0"
description = "Simple voice AI assistant built with LiveKit Agents for Python"
requires-python = ">=3.9"

dependencies = [
    "modal",
]

[dependency-groups]
dev = [
    "pytest",
    "pytest-asyncio",
    "ruff",
]

[tool.setuptools.packages.find]
where = ["src"]

[tool.setuptools.package-dir]
"" = "src"

[tool.pytest.ini_options]
asyncio_mode = "auto"
asyncio_default_fixture_loop_scope = "function"

[tool.ruff]
line-length = 88
target-version = "py39"

[tool.ruff.lint]
select = ["E", "F", "W", "I", "N", "B", "A", "C4", "UP", "SIM", "RUF"]
ignore = ["E501"] # Line too long (handled by formatter)

[tool.ruff.format]
quote-style = "double"
indent-style = "space"
1 change: 1 addition & 0 deletions modal/src/__init__.py
@@ -0,0 +1 @@
# This file makes the src directory a Python package
138 changes: 138 additions & 0 deletions modal/src/agent.py
@@ -0,0 +1,138 @@
import logging

from fastapi import FastAPI, Request, Response
from livekit import api
from livekit.agents import (
    NOT_GIVEN,
    Agent,
    AgentFalseInterruptionEvent,
    AgentSession,
    JobContext,
    JobProcess,
    MetricsCollectedEvent,
    RoomInputOptions,
    RunContext,
    WorkerOptions,
    cli,
    metrics,
)
from livekit.agents.llm import function_tool
from livekit.plugins import cartesia, deepgram, noise_cancellation, openai, silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel

logger = logging.getLogger("agent")


def download_files():
    # Pre-download model weights (e.g. VAD and the turn detector) into the
    # image so cold starts don't pay the download cost at runtime.
    import subprocess

    subprocess.run(
        ["uv", "run", "src/agent.py", "download-files"], cwd="/root", check=True
    )


class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="""You are a helpful voice AI assistant.
            You eagerly assist users with their questions by providing information from your extensive knowledge.
            Your responses are concise, to the point, and without any complex formatting or punctuation including emojis, asterisks, or other symbols.
            You are curious, friendly, and have a sense of humor.""",
        )

    # all functions annotated with @function_tool will be passed to the LLM when this
    # agent is active
    @function_tool
    async def lookup_weather(self, context: RunContext, location: str):
        """Use this tool to look up current weather information in the given location.

        If the location is not supported by the weather service, the tool will indicate this. You must tell the user the location's weather is unavailable.

        Args:
            location: The location to look up weather information for (e.g. city name)
        """

        logger.info(f"Looking up weather for {location}")

        return "sunny with a temperature of 70 degrees."


def prewarm(proc: JobProcess):
    proc.userdata["vad"] = silero.VAD.load()


async def entrypoint(ctx: JobContext):
    # Logging setup
    # Add any other context you want in all log entries here
    ctx.log_context_fields = {
        "room": ctx.room.name,
    }

    # Set up a voice AI pipeline using OpenAI, Cartesia, Deepgram, and the LiveKit turn detector
    session = AgentSession(
        # A Large Language Model (LLM) is your agent's brain, processing user input and generating a response
        # See all providers at https://docs.livekit.io/agents/integrations/llm/
        llm=openai.LLM(model="gpt-4o-mini"),
        # Speech-to-text (STT) is your agent's ears, turning the user's speech into text that the LLM can understand
        # See all providers at https://docs.livekit.io/agents/integrations/stt/
        stt=deepgram.STT(model="nova-3", language="multi"),
        # Text-to-speech (TTS) is your agent's voice, turning the LLM's text into speech that the user can hear
        # See all providers at https://docs.livekit.io/agents/integrations/tts/
        tts=cartesia.TTS(voice="6f84f4b8-58a2-430c-8c79-688dad597532"),
        # VAD and turn detection are used to determine when the user is speaking and when the agent should respond
        # See more at https://docs.livekit.io/agents/build/turns
        turn_detection=MultilingualModel(),
        vad=ctx.proc.userdata["vad"],
        # allow the LLM to generate a response while waiting for the end of turn
        # See more at https://docs.livekit.io/agents/build/audio/#preemptive-generation
        preemptive_generation=True,
    )

    # To use a realtime model instead of a voice pipeline, use the following session setup instead:
    # session = AgentSession(
    #     # See all providers at https://docs.livekit.io/agents/integrations/realtime/
    #     llm=openai.realtime.RealtimeModel()
    # )

    # sometimes background noise could interrupt the agent session, these are considered false positive interruptions
    # when it's detected, you may resume the agent's speech
    @session.on("agent_false_interruption")
    def _on_agent_false_interruption(ev: AgentFalseInterruptionEvent):
        logger.info("false positive interruption, resuming")
        session.generate_reply(instructions=ev.extra_instructions or NOT_GIVEN)

    # Metrics collection, to measure pipeline performance
    # For more information, see https://docs.livekit.io/agents/build/metrics/
    usage_collector = metrics.UsageCollector()

    @session.on("metrics_collected")
    def _on_metrics_collected(ev: MetricsCollectedEvent):
        metrics.log_metrics(ev.metrics)
        usage_collector.collect(ev.metrics)

    async def log_usage():
        summary = usage_collector.get_summary()
        logger.info(f"Usage: {summary}")

    ctx.add_shutdown_callback(log_usage)

    # # Add a virtual avatar to the session, if desired
    # # For other providers, see https://docs.livekit.io/agents/integrations/avatar/
    # avatar = hedra.AvatarSession(
    #     avatar_id="...",  # See https://docs.livekit.io/agents/integrations/avatar/hedra
    # )
    # # Start the avatar and wait for it to join
    # await avatar.start(session, room=ctx.room)

    # Start the session, which initializes the voice pipeline and warms up the models
    await session.start(
        agent=Assistant(),
        room=ctx.room,
        room_input_options=RoomInputOptions(
            # LiveKit Cloud enhanced noise cancellation
            # - If self-hosting, omit this parameter
            # - For telephony applications, use `BVCTelephony` for best results
            noise_cancellation=noise_cancellation.BVC(),
        ),
    )

    # Join the room and connect to the user
    await ctx.connect()


if __name__ == "__main__":
    # Running this module directly starts a LiveKit worker and enables the
    # `download-files` subcommand used by download_files() above.
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint, prewarm_fnc=prewarm))