Voice AI Grade Assistant · LiveKit Agents · Multi-Provider Avatar + Voice · Persona System
A voice AI agent with a lip-synced avatar that answers questions about a student's grades, assignments, and changes. Users interact via a floating widget embedded in a companion web app, speaking to an animated avatar with a cloned voice. The agent reads snapshot data from daily SIS (student information system) portal scrapes, performs deterministic analysis in Python, and narrates results conversationally.
Built to demonstrate:
- LiveKit Agents SDK with tool-calling, avatar, and voice pipeline integration
- Multi-provider architecture: swappable avatar (Hedra, LemonSlice) and TTS (ElevenLabs, Cartesia) per persona
- Persona system with templated instructions, per-character catchphrases, and runtime config merging
- Browser navigation via LiveKit RPC — agent tools auto-navigate the frontend as a side effect of data lookups
- User profiles and session memory via Supabase — onboarding, incremental message persistence, LLM-generated session summaries
- Deterministic analysis layer — tools do the computation, LLM only narrates
User (voice/text via LiveKit React SDK)
│
▼
LiveKit Cloud (STT → LLM → TTS, room management)
│
▼
Agent (this repo)
├── 15 @function_tools — call deterministic Python, return human-readable strings
├── _navigate_browser() — RPC to frontend, auto-navigates on data lookups
├── Persona system — base.md (templated) + persona.md (per-character, gitignored)
├── User profiles — Supabase onboarding, session memory, incremental messages
└── Data layer — local clone of snapshot repo, filesystem reads, no API at runtime
Frontend (separate repo: table-mutation-tracker)
├── AgentWidget — floating voice/video widget with connect/disconnect
├── NavigationHandler — receives RPC, calls router.push()
└── Calendar/diff UI — agent navigates to relevant views automatically
Voice pipeline: Deepgram Nova 3 (STT) → GPT-4.1 (LLM) → ElevenLabs or Cartesia (TTS)
Avatar: Hedra (photorealistic from headshot, 512×512 lip-synced video) or LemonSlice (cartoon/stylized, 368×560). Published as a standard LiveKit video track.
Data flow: Agent clones a private GitHub repo of daily SIS portal scrapes at startup, git-pulls per session. All snapshot reads are filesystem I/O — no API calls at runtime. Rolling index provides pre-computed change counts; individual diffs computed on demand.
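The snapshot read path can be sketched as plain filesystem I/O over the cloned repo. The file name `index.json` and the `change_counts` key below are illustrative, not the snapshot repo's actual schema:

```python
import json
from pathlib import Path

def load_rolling_index(snapshot_dir: Path) -> dict:
    """Read the pre-computed change index; no API calls, just file I/O."""
    index_path = snapshot_dir / "index.json"  # assumed filename
    if not index_path.exists():
        return {}
    return json.loads(index_path.read_text())

def changes_for_class(index: dict, class_name: str) -> int:
    """Look up a pre-computed count; individual diffs are computed on demand."""
    return index.get("change_counts", {}).get(class_name, 0)
```

Because lookups hit a pre-computed index, per-turn latency stays bounded by disk reads rather than diff computation.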
Each persona plays "Sally Schoolwork" with a distinct voice, appearance, and personality. Architecture separates committed config from private assets:
personas/
base.md ← Committed. Templated with {{STUDENT_NAME}}, {{SCHOOL_NAME}}.
config.json ← Committed. Provider choices, temperature. No secrets.
config.local.json ← Gitignored. Real names, service IDs.
example/persona.md ← Committed. Template for new personas.
<pseudonym>/ ← Gitignored. persona.md + source media.
load_persona() merges config.json + config.local.json, concatenates base.md + persona.md, and templates placeholders at runtime. Adding a new persona requires zero code changes — just a subdirectory, a markdown file, and a config entry.
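A minimal sketch of that merge-and-template step, assuming a `placeholders` key in the config (the key name is an assumption; the file layout matches the tree above):

```python
import json
from pathlib import Path

def load_persona(name: str, personas_dir: Path) -> dict:
    """Merge committed + gitignored config, concatenate base.md + persona.md,
    and fill {{PLACEHOLDER}} values at runtime."""
    config = json.loads((personas_dir / "config.json").read_text())
    local = personas_dir / "config.local.json"
    if local.exists():
        # Gitignored local values (real names, service IDs) win over defaults.
        config.update(json.loads(local.read_text()))

    instructions = (
        (personas_dir / "base.md").read_text()
        + "\n\n"
        + (personas_dir / name / "persona.md").read_text()
    )
    for key, value in config.get("placeholders", {}).items():
        instructions = instructions.replace("{{" + key + "}}", str(value))
    return {"config": config, "instructions": instructions}
```

Keeping real names only in the gitignored layer means the committed repo never contains student-identifying data.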
- Tools do the analysis, LLM narrates. 15 function tools call deterministic Python. The LLM never sees raw JSON — it receives pre-computed summaries. This keeps responses accurate, fast, and testable.
- Navigation as side effect. Class-specific tools auto-navigate the browser via RPC when they run — no separate "show me" step, no waiting for speech to finish. Aggregate tools (overview, trends) skip navigation since they don't map to a single page.
- Session memory without auth. Device UUID in localStorage → passed as participant identity → keyed in Supabase. Messages saved incrementally per turn; session summary generated by LLM on disconnect with topic/class extraction.
- Persona inheritance. Shared base (student context, guardrails, onboarding) + persona-specific (voice style, catchphrases). Onboarding is profile-level, not persona-level — switching personas doesn't re-onboard.
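The tools-compute/LLM-narrates and navigation-as-side-effect patterns can be sketched together. Everything here is illustrative: the route map, function names, and the `navigate` callable (standing in for the LiveKit RPC to the frontend's NavigationHandler) are not the repo's actual identifiers:

```python
import asyncio
from typing import Awaitable, Callable

# Illustrative class-to-route map; the real frontend routes may differ.
CLASS_ROUTES = {
    "Math": "/classes/math",
    "Science": "/classes/science",
}

async def grades_for_class(
    class_name: str,
    navigate: Callable[[str], Awaitable[None]],
) -> str:
    """Class-specific tool: navigation fires as a side effect of the lookup."""
    route = CLASS_ROUTES.get(class_name)
    if route is not None:
        # Navigate immediately; don't wait for the agent to finish speaking.
        await navigate(route)
    # Deterministic analysis would run here; the LLM only narrates the
    # human-readable string this returns, never raw JSON.
    return f"Here is the latest for {class_name}."

async def overview(navigate: Callable[[str], Awaitable[None]]) -> str:
    """Aggregate tool: no single page maps to it, so it skips navigation."""
    return "Across all classes, nothing new today."
```

In the real agent these would be registered as LiveKit `@function_tool`s; injecting `navigate` as a parameter here just keeps the side effect visible and testable.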
uv sync # Install dependencies
uv run python src/agent.py download-files # Download ML models (first run)
cp personas/config.local.example.json personas/config.local.json # Add your names + service IDs
cp .env.example .env.local # Add your API keys
uv run python src/agent.py console # Run in terminal

Environment variables (.env.local):

- LIVEKIT_URL, LIVEKIT_API_KEY, LIVEKIT_API_SECRET — LiveKit Cloud
- DATA_REPO_URL — Git URL for snapshot data repo
- HEDRA_API_KEY — Hedra avatar (optional)
- ELEVEN_API_KEY — ElevenLabs TTS/voice cloning (optional)
- LEMONSLICE_API_KEY — LemonSlice avatar (optional)
- SUPABASE_URL, SUPABASE_KEY — Supabase (optional, for user profiles)
uv run pytest # All tests (88 non-LLM + 11 LLM-dependent)
uv run pytest tests/test_analysis.py # Data layer only (no API keys needed)
uv run pytest tests/test_navigation.py # Navigation logic (no API keys needed)
uv run pytest tests/test_user_store.py # UserStore with mocked Supabase (no API keys needed)

lk agent deploy # Build and deploy to LiveKit Cloud
lk agent status # Check deployment
lk agent logs # Tail runtime logs

Persona files are baked into the Docker image at build time. The Dockerfile installs git for runtime data repo cloning.
Built using AI-assisted development tooling while maintaining human ownership of architectural decisions, provider selection, persona design, and privacy controls. AI accelerated implementation; system decomposition, multi-provider abstraction, and data governance were deliberate and human-directed.
The focus throughout:
- Deterministic analysis over LLM reasoning. Tools do the computation; the model only narrates. Accurate, fast, testable.
- Provider abstraction over vendor lock-in. Avatar and voice providers are swappable per persona via config — no code changes.
- Persona inheritance over duplication. Shared base context + per-character personality. Adding a persona is a markdown file and a config entry.
- Navigation as side effect over explicit commands. The agent shows you what it's talking about without being asked.
- table-mutation-tracker — Frontend calendar UI + agent widget (branch feature/livekit-agent-widget)
- PLAN.md — Full implementation plan, provider decisions, future phases
- AVATAR_PROVIDERS.md — Avatar and voice provider research
MIT