🐟 Billy Bass AI

An AI-powered animatronic Big Mouth Billy Bass that listens, thinks, and talks back — with synchronized lip movements, head gestures, and a sarcastic personality.

Built on an ESP32 microcontroller with a Python relay server that bridges to ElevenLabs Conversational AI for real-time voice conversations. Also works as a standalone Bluetooth speaker with automatic dance choreography.

Features

AI Conversations — Say "Hey Jarvis" (or press the button) and Billy responds with real-time voice via ElevenLabs Conversational AI (ASR + LLM + TTS in one pipeline)
Bluetooth Speaker — Pairs as a standard A2DP receiver with real-time lip-sync to any music you stream
Adaptive Lip-Sync — RMS-based envelope tracking with fast attack / smooth release for natural mouth movement on both speech and music
Dance Mode — Detects sustained music playback and triggers head-bobbing dance bursts (10s on, 15s rest) — only after 30s of actual audio energy, not silence
Web Setup Portal — Captive portal (Billy_Setup AP) for WiFi config, mode selection, and server settings from any phone
Button Override — Short press = wake/interrupt; long press (4s) = factory reset back to setup portal
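The short/long-press behaviour can be modelled host-side as a debounce filter followed by a hold timer. The Python sketch below is not the firmware's `button_handler.cpp`. The 50 ms debounce window, the `classify_presses` name, and firing the long press while the button is still held are assumptions; only the 4 s threshold comes from this README.

```python
from typing import Iterable, List, Tuple

DEBOUNCE_MS = 50      # assumption: debounce window not given in the README
LONG_PRESS_MS = 4000  # long press (4 s) = factory reset


def classify_presses(samples: Iterable[Tuple[int, bool]]) -> List[str]:
    """Turn (timestamp_ms, is_pressed) readings into "short"/"long" events.

    With INPUT_PULLUP and the switch wired to GND (assumed), a press reads LOW;
    callers pass is_pressed=True for that case.
    """
    events: List[str] = []
    stable = False        # debounced button state
    candidate = False     # most recent raw reading
    candidate_since = 0   # timestamp at which the raw reading last changed
    pressed_at = 0
    long_fired = False
    for t, raw in samples:
        if raw != candidate:
            candidate, candidate_since = raw, t
        # Accept a new level only after it has held steady for DEBOUNCE_MS.
        if candidate != stable and t - candidate_since >= DEBOUNCE_MS:
            stable = candidate
            if stable:
                pressed_at, long_fired = candidate_since, False
            elif not long_fired:
                events.append("short")  # released before the long-press mark
        # Fire the long press while still held, so a reset needs no release.
        if stable and not long_fired and t - pressed_at >= LONG_PRESS_MS:
            events.append("long")
            long_fired = True
    return events
```

For example, a bouncy tap `[(0, True), (5, False), (10, True), (70, True), (200, False), (260, False)]` yields `["short"]`: the 5 ms glitch never survives the debounce window.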

Architecture

┌──────────────────────── ESP32 ─────────────────────────┐
│                                                        │
│  Mic (INMP441) ──I2S──► Audio Task ──WS──► Server      │
│                                                        │
│  Server ──WS──► Audio Task ──I2S──► Amp (MAX98357A)    │
│                                                        │
│  Motor Task (60Hz) ──► Mouth PWM (lip-sync)            │
│                    ──► Head GPIO (dance/gestures)      │
│                                                        │
│  State: PORTAL │ BT_STREAMING │ AI_IDLE/LISTEN/REPLY   │
└────────────────────────────────────────────────────────┘
          │ WebSocket (16kHz PCM + JSON)
          ▼
┌───────────────────── Relay Server ─────────────────────┐
│                                                        │
│  openWakeWord ──► ElevenLabs ConvAI (WebSocket)        │
│                    │ ASR + LLM + TTS                   │
│                    ▼                                   │
│             PCM audio ─────────► ESP32                 │
└────────────────────────────────────────────────────────┘
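The WebSocket link in the diagram carries two kinds of payload: raw 16 kHz PCM and JSON. One common way to multiplex them, assumed here rather than taken from the firmware, is to send audio as binary frames and control messages as text frames. Python WebSocket libraries such as `websockets` hand binary frames to the application as `bytes` and text frames as `str`, so a receiver can dispatch on type alone. The `type` field and the 16-bit little-endian mono sample format are assumptions.

```python
import json
import struct
from typing import Tuple, Union

SAMPLE_RATE_HZ = 16000  # from the diagram: 16 kHz PCM
BYTES_PER_SAMPLE = 2    # assumption: signed 16-bit little-endian, mono


def dispatch(message: Union[bytes, bytearray, str]) -> Tuple[str, object]:
    """Route one WebSocket message: binary -> PCM samples, text -> JSON control."""
    if isinstance(message, (bytes, bytearray)):
        if len(message) % BYTES_PER_SAMPLE:
            raise ValueError("PCM frame ends in half a sample")
        count = len(message) // BYTES_PER_SAMPLE
        return "audio", struct.unpack(f"<{count}h", message)
    payload = json.loads(message)
    if not isinstance(payload, dict):
        raise ValueError("control message must be a JSON object")
    # Hypothetical schema: each control message names itself in a "type" field.
    return payload.get("type", "unknown"), payload


def frame_duration_ms(num_bytes: int) -> float:
    """Playback time represented by num_bytes of PCM at the rates above."""
    return 1000 * num_bytes / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
```

At these rates a 640-byte binary frame carries 20 ms of audio, a handy sanity check when sizing send buffers on either end.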

Hardware

Component	Part	Notes
MCU	ESP32-WROOM-32D	No PSRAM — 520KB SRAM, carefully budgeted
Microphone	INMP441	I2S input, 3.3V
Speaker Amp	MAX98357A	I2S output, 5V
Motor Driver	L298N H-Bridge	Mouth (PWM lip-sync) + Head (directional)
Power	3.7V Li-Po → TP4056 → MT3608	Boosted to 5V; star grounding + decoupling caps
Button	Tactile switch on GPIO 33	INPUT_PULLUP, debounced

Pin Map

Microphone (INMP441)           Amplifier (MAX98357A)
  SCK  = GPIO 14                 BCLK = GPIO 26
  WS   = GPIO 15                 LRC  = GPIO 25
  SD   = GPIO 32                 DIN  = GPIO 22

Motor Driver (L298N)           Button
  Mouth IN1 = GPIO 18 (PWM)      GPIO 33
  Mouth IN2 = GPIO 19 (PWM)
  Head  IN3 = GPIO 21
  Head  IN4 = GPIO 23

Getting Started

Prerequisites

PlatformIO (VS Code extension or CLI)
Python 3.10+
No wake-word API key is needed: openWakeWord runs locally on the relay server
An ElevenLabs API key and a Conversational AI agent configured in the ElevenLabs dashboard

1. Flash the Firmware

git clone https://github.com/serplay/FishAI.git
cd FishAI

pio run --target upload    # Build and flash
pio device monitor         # Serial output (115200 baud)

2. Configure the Fish

On first boot the ESP32 creates an open WiFi AP called Billy_Setup.

Connect to Billy_Setup from your phone — a captive portal appears
Pick your WiFi network and enter the password
Choose mode: AI Agent or Bluetooth Speaker
For AI mode, enter the LAN IP address of the machine running the relay server (default port 8765)
Save & Reboot
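The portal only asks for an IP address; the port defaults to 8765, and the server section of this README gives the endpoint path as `/ws`. A hypothetical helper (`server_url` is not from the firmware) that validates what the user typed and assembles the relay URL could look like this:

```python
import ipaddress

DEFAULT_PORT = 8765  # portal default


def server_url(ip_text: str, port: int = DEFAULT_PORT) -> str:
    """Validate a LAN IPv4 address and port, then build the relay WebSocket URL."""
    ip = ipaddress.IPv4Address(ip_text.strip())  # ValueError if malformed
    if ip.is_loopback or ip.is_unspecified:
        # 127.x.x.x or 0.0.0.0 cannot reach a server on another machine.
        raise ValueError(f"{ip} is not a usable server address")
    if not 1 <= port <= 65535:
        raise ValueError(f"port {port} is out of range")
    return f"ws://{ip}:{port}/ws"
```

Rejecting `0.0.0.0` matters because it is the address the server binds to, not one a client can dial.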

3. Start the Server (AI Agent Mode)

cd server
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt

cp .env.example .env
# Fill in your ElevenLabs API key and Agent ID in .env

python main.py

The server listens on ws://0.0.0.0:8765/ws by default. The ESP32 connects automatically after reboot.
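If the fish never connects, a quick first check is whether port 8765 on the server machine is reachable from the LAN at all. The stdlib-only helper below (a hypothetical `port_open`, not part of the repo) opens and closes a plain TCP connection. It does not perform a WebSocket handshake, so it only rules out firewall, wrong-IP, and server-not-running problems.

```python
import socket


def port_open(host: str, port: int = 8765, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

Run it from another device on the same network, for example `port_open("192.168.1.42")` with the server machine's address substituted in.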

4. Bluetooth Speaker Mode

No server is needed: pair your phone with "Billy Bass" and play music. The fish lip-syncs automatically and starts dancing after 30 seconds of continuous audio; silence does not count toward the timer.
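The dance timing can be modelled as a counter of continuous audio plus a repeating burst/rest cycle. The Python sketch below uses the `BT_DANCE_*` defaults and `RMS_SILENCE_THRESHOLD` from `include/config.h` as listed in this README. Resetting the counter on any silent tick, and using the silence threshold to decide what counts as audio, are assumptions about how the firmware behaves.

```python
from dataclasses import dataclass

SILENCE_RMS = 80          # RMS_SILENCE_THRESHOLD
DANCE_START_MS = 30_000   # BT_DANCE_START_MS
DANCE_BURST_MS = 10_000   # BT_DANCE_BURST_MS
DANCE_REST_MS = 15_000    # BT_DANCE_REST_MS


@dataclass
class DanceScheduler:
    audio_ms: int = 0  # continuous audio seen so far (assumed to reset on silence)

    def update(self, rms: float, dt_ms: int) -> bool:
        """Advance one tick of dt_ms; return True if the head should be dancing."""
        if rms < SILENCE_RMS:
            self.audio_ms = 0  # silence: stop dancing and restart the 30 s count
            return False
        self.audio_ms += dt_ms
        if self.audio_ms < DANCE_START_MS:
            return False
        # After unlocking, repeat: DANCE_BURST_MS dancing, then DANCE_REST_MS resting.
        cycle_pos = (self.audio_ms - DANCE_START_MS) % (DANCE_BURST_MS + DANCE_REST_MS)
        return cycle_pos < DANCE_BURST_MS
```

With 100 ms ticks of steady music, `update` first returns `True` on the 300th tick (30 s), stays `True` through 39.9 s, and resumes at 55 s. At the diagram's 60 Hz motor rate, `dt_ms` would be about 17. Resetting on a single quiet tick is the strictest reading of "continuous"; a small gap tolerance would make it less twitchy between tracks.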

Project Structure

FishAI/
├── include/                   # ESP32 headers
│   ├── config.h               #   Pin map, timing, enums, constants
│   ├── audio_manager.h        #   I2S mic/amp driver
│   ├── motor_control.h        #   Lip-sync + head motor controller
│   ├── network_manager.h      #   WiFi, portal, WebSocket client
│   └── button_handler.h       #   Debounced button (short/long press)
├── src/                       # ESP32 sources
│   ├── main.cpp               #   State machine, FreeRTOS tasks, A2DP
│   ├── audio_manager.cpp      #   I2S init, read/write
│   ├── motor_control.cpp      #   RMS engine, PWM control, dance logic
│   ├── network_manager.cpp    #   Web server routes, WS events, NVS
│   └── button_handler.cpp     #   ISR-based polling
├── server/                    # Python relay server
│   ├── main.py                #   WS server, session manager
│   ├── pipeline.py            #   ElevenLabs ConvAI bridge
│   ├── wake_word.py           #   openWakeWord wake word engine
│   ├── mock_client.py         #   Desktop mock ESP32 for testing
│   ├── config.py              #   Env vars and constants
│   ├── requirements.txt       #   websockets, openwakeword, aiohttp
│   └── .env.example           #   API key template
└── platformio.ini             # Build config, libs, partition table

Configuration

Firmware Constants

Key tuning parameters in include/config.h:

Constant	Default	What it does
`RMS_SILENCE_THRESHOLD`	`80`	Audio energy floor — below this, mouth stays closed
`LIPSYNC_PEAK_RATIO`	`1.6`	Signal must exceed baseline by 60% to open
`LIPSYNC_ATTACK_ALPHA`	`0.7`	Mouth open speed (higher = snappier)
`LIPSYNC_RELEASE_ALPHA`	`0.45`	Mouth close speed (higher = faster close)
`BT_DANCE_START_MS`	`30000`	Continuous audio before dance unlocks
`BT_DANCE_BURST_MS`	`10000`	Dance duration per burst
`BT_DANCE_REST_MS`	`15000`	Rest between bursts
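The lip-sync constants describe an RMS envelope follower with asymmetric smoothing. One plausible reading, sketched here in Python with the table's defaults: compute the RMS of each audio block, follow it with an exponential moving average that uses `LIPSYNC_ATTACK_ALPHA` when the level rises and `LIPSYNC_RELEASE_ALPHA` when it falls, track a slow baseline, and open the mouth only when the envelope is above `RMS_SILENCE_THRESHOLD` and at least `LIPSYNC_PEAK_RATIO` times the baseline. The baseline tracker and its hypothetical `BASELINE_ALPHA` of 0.02 are assumptions, not taken from `motor_control.cpp`.

```python
import math
from typing import Sequence

RMS_SILENCE_THRESHOLD = 80
LIPSYNC_PEAK_RATIO = 1.6
LIPSYNC_ATTACK_ALPHA = 0.7
LIPSYNC_RELEASE_ALPHA = 0.45
BASELINE_ALPHA = 0.02  # hypothetical: slow tracker for the "baseline"


def block_rms(samples: Sequence[int]) -> float:
    """Root-mean-square of one block of 16-bit PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class LipSync:
    def __init__(self) -> None:
        self.envelope = 0.0
        self.baseline = 0.0

    def update(self, samples: Sequence[int]) -> bool:
        """Feed one audio block; return True if the mouth should be open."""
        level = block_rms(samples)
        # Asymmetric EMA: react quickly to rises, ease off on falls.
        alpha = LIPSYNC_ATTACK_ALPHA if level > self.envelope else LIPSYNC_RELEASE_ALPHA
        self.envelope += alpha * (level - self.envelope)
        # The baseline lags far behind, so only peaks above it open the mouth.
        self.baseline += BASELINE_ALPHA * (self.envelope - self.baseline)
        gate = max(RMS_SILENCE_THRESHOLD, self.baseline * LIPSYNC_PEAK_RATIO)
        return self.envelope > gate
```

Fed one loud block followed by silence, the mouth stays open for the next three silent blocks and closes on the fourth, which is the smooth release at work. A steady tone eventually closes the mouth too, because the baseline catches up and the envelope no longer exceeds it by 60%.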

Server Environment

See server/.env.example for all options. Key variables:

Variable	Required	Description
`ELEVENLABS_API_KEY`	✅	ElevenLabs API key
`ELEVENLABS_AGENT_ID`	✅	Conversational AI agent ID (from dashboard)
`OWW_MODEL_NAMES`		Wake word model (default: `hey_jarvis_v0.1`)
`OWW_THRESHOLD`		Wake-word score cutoff (0 to 1); higher means fewer false triggers but more missed wake words (default: `0.5`)
`SERVER_PORT`		WebSocket listen port (default: `8765`)
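A minimal sketch of how a server might read these settings, assuming they are already in the process environment (loading the `.env` file itself, for example with a dotenv library, is left out). The `ServerConfig` and `load_config` names are hypothetical and need not match the repo's `config.py`; the defaults are the ones in the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED = ("ELEVENLABS_API_KEY", "ELEVENLABS_AGENT_ID")


@dataclass(frozen=True)
class ServerConfig:
    api_key: str
    agent_id: str
    wake_model: str = "hey_jarvis_v0.1"
    wake_threshold: float = 0.5
    port: int = 8765


def load_config(env: Optional[Mapping[str, str]] = None) -> ServerConfig:
    """Read the documented variables, failing fast if a required one is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing required settings: " + ", ".join(missing))
    threshold = float(env.get("OWW_THRESHOLD", "0.5"))
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("OWW_THRESHOLD must be between 0 and 1")
    return ServerConfig(
        api_key=env["ELEVENLABS_API_KEY"],
        agent_id=env["ELEVENLABS_AGENT_ID"],
        wake_model=env.get("OWW_MODEL_NAMES", "hey_jarvis_v0.1"),
        wake_threshold=threshold,
        port=int(env.get("SERVER_PORT", "8765")),
    )
```

Failing at startup with the names of the missing keys is friendlier than a 401 from ElevenLabs mid-conversation.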

Firmware Dependencies

Managed automatically by PlatformIO:

Library	Version	Purpose
ESP32-A2DP	v1.8.3	Bluetooth A2DP Sink
arduino-audio-tools	v1.0.2	I2S / A2DP integration
ESPAsyncWebServer	latest	Captive portal web server
WebSockets	≥2.6.1	WebSocket client (AI mode)
ArduinoJson	≥7.3.0	JSON parsing

Constraints & Design Decisions

No PSRAM — all buffers are statically sized or stack-allocated. No malloc in hot paths.
WiFi ⊕ Bluetooth — mutually exclusive due to RAM. Mode is stored in NVS and requires reboot to switch.
~400KB usable heap after FreeRTOS + radio stack. The firmware uses huge_app.csv partitions (~3MB app, no OTA).
ElevenLabs ConvAI — the server is a thin bridge; ASR, LLM, TTS, and turn-taking are all handled by ElevenLabs. Agent personality is configured in the dashboard.
Star grounding + decoupling caps — prevents motor-induced brownouts from crashing the ESP32 on battery power.
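With every buffer sized at compile time, the RAM budget is plain arithmetic: bytes = sample rate × duration × bytes per sample × channels. The sketch below totals each radio mode separately, since only one is ever active. The 16 kHz figure comes from the architecture diagram; the 16-bit samples, the 44.1 kHz stereo A2DP format, and all buffer lengths are assumptions for illustration.

```python
def pcm_bytes(duration_ms: int, rate_hz: int = 16_000,
              bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Bytes needed to hold duration_ms of interleaved PCM, rounded up to whole frames."""
    frames, remainder = divmod(rate_hz * duration_ms, 1000)
    if remainder:
        frames += 1  # a partial frame still needs a full slot
    return frames * bytes_per_sample * channels


# Illustrative per-mode budgets: WiFi and Bluetooth never run together,
# so each mode is totalled separately. Buffer lengths are hypothetical.
AI_MODE = {
    "mic block, 20 ms @ 16 kHz mono": pcm_bytes(20),
    "reply jitter buffer, 250 ms @ 16 kHz mono": pcm_bytes(250),
}
BT_MODE = {
    "A2DP block, 100 ms @ 44.1 kHz stereo": pcm_bytes(100, 44_100, 2, 2),
}

for name, buffers in (("AI", AI_MODE), ("BT", BT_MODE)):
    print(f"{name} mode: {sum(buffers.values())} bytes of audio buffers")
```

This prints 8640 bytes for AI mode and 17640 for BT mode. Cost scales linearly with duration, so buffer length is the main lever when the heap gets tight.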
