
Sally Schoolwork

Voice AI Grade Assistant · LiveKit Agents · Multi-Provider Avatar + Voice · Persona System

A voice AI agent with a lip-synced avatar that answers questions about a student's grades, assignments, and changes. Users interact via a floating widget embedded in a companion web app, speaking to an animated avatar with a cloned voice. The agent reads snapshot data from daily SIS (student information system) portal scrapes, performs deterministic analysis in Python, and narrates results conversationally.

Built to demonstrate:

  • LiveKit Agents SDK with tool-calling, avatar, and voice pipeline integration
  • Multi-provider architecture: swappable avatar (Hedra, LemonSlice) and TTS (ElevenLabs, Cartesia) per persona
  • Persona system with templated instructions, per-character catchphrases, and runtime config merging
  • Browser navigation via LiveKit RPC — agent tools auto-navigate the frontend as a side effect of data lookups
  • User profiles and session memory via Supabase — onboarding, incremental message persistence, LLM-generated session summaries
  • Deterministic analysis layer — tools do the computation, LLM only narrates

Architecture

User (voice/text via LiveKit React SDK)
  │
  ▼
LiveKit Cloud (STT → LLM → TTS, room management)
  │
  ▼
Agent (this repo)
  ├── 15 @function_tools — call deterministic Python, return human-readable strings
  ├── _navigate_browser() — RPC to frontend, auto-navigates on data lookups
  ├── Persona system — base.md (templated) + persona.md (per-character, gitignored)
  ├── User profiles — Supabase onboarding, session memory, incremental messages
  └── Data layer — local clone of snapshot repo, filesystem reads, no API at runtime

Frontend (separate repo: table-mutation-tracker)
  ├── AgentWidget — floating voice/video widget with connect/disconnect
  ├── NavigationHandler — receives RPC, calls router.push()
  └── Calendar/diff UI — agent navigates to relevant views automatically

Voice pipeline: Deepgram Nova-3 (STT) → GPT-4.1 (LLM) → ElevenLabs or Cartesia (TTS)

Avatar: Hedra (photorealistic from headshot, 512×512 lip-synced video) or LemonSlice (cartoon/stylized, 368×560). Published as a standard LiveKit video track.

Data flow: Agent clones a private GitHub repo of daily SIS portal scrapes at startup, git-pulls per session. All snapshot reads are filesystem I/O — no API calls at runtime. Rolling index provides pre-computed change counts; individual diffs computed on demand.
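The diff-on-demand step can be sketched in plain Python. This is an illustrative sketch, not the repo's actual code: the one-file-per-day JSON layout and the helper names are assumptions for the example.

```python
import json
from pathlib import Path

# Hypothetical layout: one JSON file per day mapping assignment -> grade.
# The real snapshot schema in the data repo may differ.
def load_snapshot(repo_dir: Path, date: str) -> dict:
    """Read one day's scrape from the local clone (pure filesystem I/O)."""
    return json.loads((repo_dir / f"{date}.json").read_text())

def diff_snapshots(old: dict, new: dict) -> dict:
    """Deterministically compute per-assignment changes between two days."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }
```

Because diffs are pure functions of two snapshots, only the rolling change counts need to be pre-computed; any individual diff can be reconstructed on demand.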

Persona System

Each persona plays "Sally Schoolwork" with a distinct voice, appearance, and personality. The architecture separates committed config from private assets:

personas/
  base.md              ← Committed. Templated with {{STUDENT_NAME}}, {{SCHOOL_NAME}}.
  config.json          ← Committed. Provider choices, temperature. No secrets.
  config.local.json    ← Gitignored. Real names, service IDs.
  example/persona.md   ← Committed. Template for new personas.
  <pseudonym>/         ← Gitignored. persona.md + source media.

load_persona() merges config.json + config.local.json, concatenates base.md + persona.md, and templates placeholders at runtime. Adding a new persona requires zero code changes — just a subdirectory, a markdown file, and a config entry.
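A minimal sketch of that merge-and-template flow, assuming the file layout above. The function body is illustrative; the repo's actual load_persona() may differ in details such as merge depth and return shape.

```python
import json
from pathlib import Path

def load_persona(personas_dir: Path, name: str, context: dict) -> dict:
    """Merge committed + local config, concatenate base.md + persona.md,
    and fill {{PLACEHOLDER}} values at runtime. Illustrative sketch only."""
    config = json.loads((personas_dir / "config.json").read_text())
    local_path = personas_dir / "config.local.json"
    if local_path.exists():
        # Gitignored local values (real names, service IDs) override
        # committed defaults. Shallow merge for brevity.
        config.update(json.loads(local_path.read_text()))

    instructions = (
        (personas_dir / "base.md").read_text()
        + "\n"
        + (personas_dir / name / "persona.md").read_text()
    )
    for key, value in context.items():
        instructions = instructions.replace("{{" + key + "}}", value)
    return {"config": config, "instructions": instructions}
```

Since everything is driven by files and config entries, a new persona is just a new subdirectory plus a config stanza.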

Key Design Decisions

  • Tools do the analysis, LLM narrates. 15 function tools call deterministic Python. The LLM never sees raw JSON — it receives pre-computed summaries. This keeps responses accurate, fast, and testable.
  • Navigation as side effect. Class-specific tools auto-navigate the browser via RPC when they run — no separate "show me" step, no waiting for speech to finish. Aggregate tools (overview, trends) skip navigation since they don't map to a single page.
  • Session memory without auth. Device UUID in localStorage → passed as participant identity → keyed in Supabase. Messages saved incrementally per turn; session summary generated by LLM on disconnect with topic/class extraction.
  • Persona inheritance. Shared base (student context, guardrails, onboarding) + persona-specific (voice style, catchphrases). Onboarding is profile-level, not persona-level — switching personas doesn't re-onboard.
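The "tools compute, LLM narrates" split means each tool returns a finished, human-readable sentence rather than raw data. A hedged sketch of what one such tool might look like; the function name, signature, and thresholds here are hypothetical, not taken from the repo:

```python
from statistics import mean

# Illustrative shape of one deterministic tool: Python does the math,
# and the return value is a sentence the LLM only has to narrate.
# The LLM never sees the raw grades list.
def grade_trend(class_name: str, grades: list[float]) -> str:
    """Summarize whether grades in a class are trending up, down, or flat."""
    if len(grades) < 2:
        return f"Not enough data yet to see a trend in {class_name}."
    recent = mean(grades[-3:])
    earlier = mean(grades[:-3] or grades[:1])  # fall back for short histories
    delta = recent - earlier
    direction = "up" if delta > 1 else "down" if delta < -1 else "holding steady"
    return (
        f"{class_name} is {direction}: recent average "
        f"{recent:.1f} vs earlier {earlier:.1f}."
    )
```

Because the output is a plain string computed by plain Python, the tool is unit-testable without any API keys, matching the test layout below.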

Setup

uv sync                                    # Install dependencies
uv run python src/agent.py download-files   # Download ML models (first run)
cp personas/config.local.example.json personas/config.local.json  # Add your names + service IDs
cp .env.example .env.local                  # Add your API keys
uv run python src/agent.py console          # Run in terminal

Environment variables (.env.local):

  • LIVEKIT_URL, LIVEKIT_API_KEY, LIVEKIT_API_SECRET — LiveKit Cloud
  • DATA_REPO_URL — Git URL for snapshot data repo
  • HEDRA_API_KEY — Hedra avatar (optional)
  • ELEVEN_API_KEY — ElevenLabs TTS/voice cloning (optional)
  • LEMONSLICE_API_KEY — LemonSlice avatar (optional)
  • SUPABASE_URL, SUPABASE_KEY — Supabase (optional, for user profiles)

Testing

uv run pytest                               # All tests (88 non-LLM + 11 LLM-dependent)
uv run pytest tests/test_analysis.py        # Data layer only (no API keys needed)
uv run pytest tests/test_navigation.py      # Navigation logic (no API keys needed)
uv run pytest tests/test_user_store.py      # UserStore with mocked Supabase (no API keys needed)

Deployment

lk agent deploy    # Build and deploy to LiveKit Cloud
lk agent status    # Check deployment
lk agent logs      # Tail runtime logs

Persona files are baked into the Docker image at build time. The Dockerfile installs git for runtime data repo cloning.

Development Philosophy

Built using AI-assisted development tooling while maintaining human ownership of architectural decisions, provider selection, persona design, and privacy controls. AI accelerated implementation; system decomposition, multi-provider abstraction, and data governance were deliberate and human-directed.

The focus throughout:

  • Deterministic analysis over LLM reasoning. Tools do the computation; the model only narrates. Accurate, fast, testable.
  • Provider abstraction over vendor lock-in. Avatar and voice providers are swappable per persona via config — no code changes.
  • Persona inheritance over duplication. Shared base context + per-character personality. Adding a persona is a markdown file and a config entry.
  • Navigation as side effect over explicit commands. The agent shows you what it's talking about without being asked.

License

MIT
