Uschi

A self-hosted, playful voice assistant powered by Google Gemini, LiveKit, and OpenClaw. One install script. One API key. Yours.

Uschi runs on your laptop, your home server, or your Raspberry Pi. She talks, she listens, and she has a quiet partner named Klaus who does the actual work in the background — research, sending messages, calling APIs, that sort of thing. They work as a team.

This is an open-source project. No accounts, no telemetry, no cloud lock-in. The only piece you have to plug in for Uschi is a free Google API key for Gemini; Klaus uses his own LLM access, which the installer helps you set up.


Meet the team

Uschi is the voice. She listens to you over a microphone, talks back through speakers, and runs entirely on Google's Gemini Live model — one model for speech-to-text, language understanding, and speech synthesis. Low latency, one API key, no provider juggling.

Klaus is the brain. He runs on OpenClaw, a sandboxed agent runtime that handles the real work: searching the web, sending messages, ordering things, calling external APIs. When Uschi gets a request that needs more than a quick answer, she hands it off to Klaus and gets back to talking to you. Klaus drops a status note in her inbox when he's done.

You can talk to either one independently — Uschi over voice, Klaus over chat — or you can let them work together as a team.
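
The handoff described above (Uschi passes a task along, keeps talking, and later finds a status note in her inbox) can be pictured with two queues. This is only a toy sketch of the pattern: the function names, the queue-based inbox, and the message strings are made up for illustration and are not how Uschi and OpenClaw actually talk to each other.

```python
import asyncio


async def klaus_worker(tasks: asyncio.Queue, inbox: asyncio.Queue) -> None:
    """Background agent: take one task at a time, post a status note when done."""
    while True:
        task = await tasks.get()
        if task is None:  # sentinel value: shut the worker down
            break
        await asyncio.sleep(0.01)  # stand-in for slow work (search, API call, ...)
        await inbox.put(f"done: {task}")


async def demo() -> list[str]:
    tasks: asyncio.Queue = asyncio.Queue()
    inbox: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(klaus_worker(tasks, inbox))

    # The voice side hands the request off and carries on right away.
    await tasks.put("look up train times")
    spoken = ["Sure, Klaus is on it."]

    # Later, the voice side picks the status note out of its inbox.
    spoken.append(await inbox.get())

    await tasks.put(None)
    await worker
    return spoken
```

Running `asyncio.run(demo())` returns the two things the voice side would say, in order. The point of the pattern is that the voice loop never blocks on the slow work.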


Before you start

You need:

  • Docker Desktop (or Docker Engine + Compose v2) — install guide
  • git and curl (pre-installed on macOS and most Linux distros)
  • A free Google Gemini API key: get one here (no credit card required)
  • About 2 GB of free RAM and 10 GB of disk space
  • About 10 minutes

If something goes wrong during the install, jump to docs/05-troubleshooting.md.
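
If you want to sanity-check the list above before running anything, a few lines of Python will do. This is a rough sketch, not what install.sh does: the function name and the 10 GB default are taken from the list or invented here, and it skips the RAM check and the Compose v2 check because the standard library has no portable way to do either.

```python
import shutil


def check_prerequisites(path: str = ".", min_free_gb: float = 10.0) -> list[str]:
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    # The command-line tools the requirements list asks for.
    for tool in ("docker", "git", "curl"):
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    # Free disk space on the filesystem that holds `path`.
    free_gb = shutil.disk_usage(path).free / 1e9
    if free_gb < min_free_gb:
        problems.append(f"only {free_gb:.1f} GB free, want {min_free_gb:.0f} GB")
    return problems
```

Call `check_prerequisites()` from the directory you plan to clone into and print whatever comes back.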


Quickstart

git clone https://github.com/tomgoeck/uschi_ai.git
cd uschi_ai
./install.sh

The wizard will:

  1. Check your prerequisites
  2. Walk you through OpenClaw's own setup (for Klaus)
  3. Ask for a free Google Gemini API key
  4. Auto-generate the LiveKit and OpenClaw secrets
  5. Let you pick a voice, name, and language for your assistant
  6. Build the containers and start everything

When it's done, open http://localhost:8484 and click the talk button.
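
Step 4 of the wizard generates secrets for you, so you never have to do this by hand. For the curious, generating credentials of this kind usually looks something like the sketch below. The variable names, the `API` prefix, and the lengths are assumptions for illustration; the real install script may name and format them differently.

```python
import secrets


def generate_secrets() -> dict[str, str]:
    """Generate random credentials; the variable names here are hypothetical."""
    return {
        "LIVEKIT_API_KEY": "API" + secrets.token_hex(6),
        "LIVEKIT_API_SECRET": secrets.token_urlsafe(32),
        "OPENCLAW_TOKEN": secrets.token_urlsafe(32),
    }


def to_env_lines(values: dict[str, str]) -> str:
    """Render the values in KEY=value form, one per line."""
    return "\n".join(f"{k}={v}" for k, v in values.items()) + "\n"
```

The `secrets` module draws from the operating system's cryptographic random source, which is what you want for anything that guards access.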


Three ways to install

| Mode | Best for | What you get |
|---|---|---|
| Local | Trying it out, dev, single-user setups | Everything on localhost. Fastest path. |
| Server | Sharing with your household, real deployment | Caddy reverse proxy + auto-TLS via Let's Encrypt. |
| Raspberry Pi | A real smart speaker on a shelf | Pi client with wake-word detection, systemd autostart. |

All three start from the same ./install.sh — the wizard asks which one you want.

See docs/02-install.md for the full guide.


Customizing Uschi

Once she's running, you can change just about everything:

  • Personality — edit openclaw/workspace/VOICE.md (or use the dashboard). Sarcasm allowed.
  • Memory: openclaw/workspace/USCHI_MEMORY.md is what she remembers about you between sessions.
  • Voice — pick from Gemini's voices (Aoede, Charon, Fenrir, Kore, Puck, ...) in the dashboard settings.
  • Tools — drop a JSON file in openclaw/workspace/uschi_tools/ and Uschi picks it up on the next session. See docs/03-customize.md.
  • Smart home — edit openclaw/workspace/devices.yaml to add your rooms and lights. The built-in control_light tool just publishes a LiveKit data message; your client decides how to actually flip the switch.
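
Since control_light only publishes a data message, the client side needs a small handler that decodes it and does the actual switching. Here is a minimal sketch of that idea. The payload shape (`type`, `room`, `on`) is a guess for illustration only, so check what your install really sends before relying on it; `switch` stands for whatever function talks to your lights.

```python
import json
from typing import Callable

# Hypothetical payload shape: {"type": "control_light", "room": "kitchen", "on": true}


def handle_data_message(payload: bytes, switch: Callable[[str, bool], None]) -> bool:
    """Decode one data message and, if it is a light command, call `switch`.

    Returns True when the message was handled, False when it was ignored.
    """
    try:
        msg = json.loads(payload.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False  # not JSON text, so not meant for this handler
    if not isinstance(msg, dict) or msg.get("type") != "control_light":
        return False
    room, on = msg.get("room"), msg.get("on")
    if not isinstance(room, str) or not isinstance(on, bool):
        return False  # malformed command: ignore rather than guess
    switch(room, on)
    return True
```

In a real client you would call this from the data-received callback of whichever LiveKit SDK you use, and pass in a `switch` that drives your relay, bridge, or smart plug.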

Build your own client

Uschi runs on plain LiveKit, which means any device that can speak WebRTC can talk to her: a browser tab, a Raspberry Pi, a custom mobile app, an embedded device. The dashboard's web UI is one client; the included Pi client is another.

If you want to build your own — say, an Android companion app — see docs/04-build-your-client.md. It walks through token issuance, the audio loop, and sample code in JavaScript, Python, and Kotlin.
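
To give a feel for the token issuance part: a LiveKit access token is a JWT signed with the API secret, carrying a `video` grant that says which room the participant may join. The sketch below builds one with nothing but the standard library. Treat it as an illustration of the token's shape, not as how Uschi's dashboard issues tokens; the function name and defaults are made up here, and in practice the official LiveKit server SDKs are the safer choice.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(raw: bytes) -> str:
    """Base64url without padding, as used in JWTs."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def make_join_token(api_key: str, api_secret: str, identity: str,
                    room: str, ttl_seconds: int = 3600) -> str:
    """Build an HS256-signed JWT granting `identity` permission to join `room`."""
    now = int(time.time())
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {
        "iss": api_key,            # the API key this token is issued under
        "sub": identity,           # participant identity
        "nbf": now,                # not valid before now
        "exp": now + ttl_seconds,  # expiry time
        "video": {"room": room, "roomJoin": True},  # LiveKit room grant
    }
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(api_secret.encode("utf-8"),
                         signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"
```

Tokens like this should be minted on the server and handed to the client; the API secret itself should never ship inside an app.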


Honest disclaimers

  • Gemini Live is in preview. The realtime model is fast and good but it has rough edges: certain runtime instruction updates don't take effect, and long-running tool calls can get retried. We document the workarounds in docs/01-how-it-works.md. If you hit something weird, that's the first place to check.
  • Klaus needs his own LLM access. OpenClaw runs as a separate process with its own setup. The first time you run ./install.sh, you'll be handed off to OpenClaw's installer to log into your LLM subscription.
  • No telemetry. Nothing in this repo phones home. The only outbound traffic is to Google for Gemini and whatever APIs Klaus calls on your behalf.
  • This is not a product. It's a self-hosted toy that we use ourselves. PRs welcome, but expect rough edges.
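
On the retried tool calls mentioned above: if you write your own tools, one generic way to stay safe is to make them idempotent, so a repeated call returns the earlier result instead of sending the message twice. The sketch below shows that idea in its simplest form. It is not necessarily the workaround described in docs/01-how-it-works.md, and the class name and the choice of key are assumptions.

```python
from typing import Any, Callable


class OnceOnly:
    """Cache results by call key so a retried call does not repeat side effects."""

    def __init__(self) -> None:
        self._results: dict[str, Any] = {}

    def run(self, key: str, action: Callable[[], Any]) -> Any:
        # Run the action only the first time this key is seen.
        if key not in self._results:
            self._results[key] = action()
        return self._results[key]
```

The key could be a call ID if your runtime provides one, or a hash of the tool name and arguments. This version keeps results in memory forever, so a long-running service would want some form of expiry.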

How does it actually work?

Read docs/01-how-it-works.md for the full picture: where audio flows, how Uschi and Klaus communicate, what the dashboard shows you, and what the known limits are.


License

MIT. Take it, fork it, ship it, change everything. Just keep the license header.
