A complete voice conversation pipeline for the Unitree G1 humanoid robot ("Aletta"), running fully on-robot with no cloud dependency. The robot listens, understands, and responds using a smooth English female voice through its built-in speaker.
- Hold F1 on the Unitree remote → robot listens → release F1 → robot responds
- Press F3 → toggle continuous conversation mode (robot keeps listening after each response)
- Aletta knows her battery level, orientation, network status, uptime, and software stack
git clone --recurse-submodules https://github.com/SaxionMechatronics/unitree_converse.git
cd unitree_converse
./setup.shThe setup script will ask whether you are setting up on the Unitree G1 Jetson or a dev machine, then handle everything automatically:
- Downloads Piper voice model (
en_US-lessac-medium) - Downloads faster-whisper base model
- Installs Piper binary (Jetson) or piper-tts Python package (dev machine)
- Installs and starts Ollama, pulls LLaMA 3.2 3B
- Builds the ROS2 workspace
- Installs and enables systemd services (Jetson only)
Note: If
setup.shfails at any step, see the Manual Installation section below for step-by-step instructions.
| Component | Details |
|---|---|
| Robot | Unitree G1 Edu (29-DOF + Dex3-L hands) |
| Onboard compute | NVIDIA Jetson Orin NX 16GB (192.168.123.164) |
| Dev machine | Ubuntu 22.04, RTX Pro 5000 Blackwell |
| Microphone | G1 built-in mic via RockChip UDP multicast 239.168.123.161:5555 |
| Speaker | G1 built-in speaker via Unitree AudioHub API |
| Remote | Unitree wireless controller (/wirelesscontroller) |
| Network | Ethernet: dev 192.168.123.100 ↔ Jetson 192.168.123.164 |
| Component | Technology |
|---|---|
| ROS2 | Foxy (Jetson) / Humble (dev machine) |
| DDS | CycloneDDS via ~/cyclonedds_ws |
| LLM | Ollama + LLaMA 3.2 3B |
| LLM ROS2 node | bob_llm |
| Speech-to-text | faster-whisper (base, CPU) |
| Text-to-speech | Piper TTS (en_US-lessac-medium) |
| Audio output | Unitree AudioHub API via g1_piper_tts C++ binary |
| Wake word | openWakeWord (hey_jarvis) — disabled in production |
unitree_converse/
├── setup.sh # Interactive setup script
├── unitree_converse.service # Systemd service file
├── ollama.service # Ollama systemd service reference
├── README.md
└── src/
├── g1_voice/
│ ├── g1_voice/
│ │ ├── wake_word_node.py # openWakeWord + keyboard trigger (sim)
│ │ ├── stt_node.py # faster-whisper + UDP mic + stop signal
│ │ ├── tts_node.py # Piper TTS via g1_piper_tts binary
│ │ ├── button_trigger_node.py # F1/F3 remote button mapping
│ │ └── robot_state_node.py # Live robot state injection into LLM
│ ├── launch/
│ │ ├── voice_sim.launch.py # Dev machine (sounddevice mic, Python TTS)
│ │ └── voice_real.launch.py # On-robot (UDP mic, g1_piper_tts)
│ └── config/
│ ├── voice_params.yaml # Dev machine params
│ └── voice_params_real.yaml # Jetson robot params
└── bob_llm/ # LLM ROS2 node (submodule, Foxy-patched fork)
The G1's audio is not handled by the Jetson's ALSA/PulseAudio. It is managed by a separate RockChip MCU at 192.168.123.161.
The RockChip streams raw 16-bit mono 16kHz PCM via UDP multicast. The stt_node joins the multicast group to receive mic audio.
sock.bind(('', 5555))
mreq = struct.pack('4s4s',
socket.inet_aton('239.168.123.161'),
socket.inet_aton('192.168.123.164'))
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)The G1 speaker is controlled via the Unitree AudioClient::PlayStream() API which requires 16kHz mono PCM. A custom C++ binary g1_piper_tts handles this:
tts_node → subprocess: g1_piper_tts eth0 < text
│
├── piper → /tmp/tts_raw.wav (22050Hz)
├── sox → /tmp/tts_16k.wav (16000Hz mono)
└── AudioClient::PlayStream() → RockChip → Speaker
Source: ~/unitree_sdk2_latest/example/g1/audio/g1_piper_tts.cpp
Binary: ~/unitree_sdk2_latest/build/bin/g1_piper_tts
| Topic | Type | Purpose |
|---|---|---|
/wirelesscontroller |
unitree_go/msg/WirelessController |
Remote button input |
/g1/voice/trigger |
std_msgs/Bool |
Start recording |
/g1/voice/stop_recording |
std_msgs/Bool |
Stop recording (F1 released) |
/g1/voice/continuous_start |
std_msgs/Bool |
Enable continuous mode |
/g1/voice/continuous_stop |
std_msgs/Bool |
Disable continuous mode |
/g1/stt/transcript |
std_msgs/String |
Raw transcribed speech |
/g1/stt/status |
std_msgs/String |
recording/transcribing/idle |
/g1/robot_state |
std_msgs/String |
Live robot state |
llm_prompt |
std_msgs/String |
Enriched prompt (state + transcript) |
llm_response |
std_msgs/String |
LLM reply |
/g1/tts/status |
std_msgs/String |
speaking/idle |
/g1/button/status |
std_msgs/String |
Button node status |
| Button | Keys Value | Action |
|---|---|---|
| Hold F1 | 64 | Push-to-talk: records while held, transcribes on release |
| F3 | 128 | Toggle continuous conversation mode on/off |
Hold F1 while speaking → release → robot responds → done.
Press F3 to toggle on → robot listens → responds → automatically listens again. Press F3 again to stop.
The pipeline uses two systemd services that auto-start at boot:
ollama.service ← runs LLaMA 3.2 3B on Jetson GPU
unitree_converse.service ← voice pipeline (depends on ollama)
# Check status
sudo systemctl status unitree_converse.service
sudo systemctl status ollama.service
# Follow live logs
journalctl -u unitree_converse.service -f
# Restart after config changes
sudo systemctl restart unitree_converse.service
# Stop before manual launch
sudo systemctl stop unitree_converse.serviceCritical: Never run
ros2 launchmanually while the service is running. Both instances compete for CycloneDDS shared memory and causebad_alloccrashes. Always stop the service first:sudo systemctl stop unitree_converse.service sleep 2 sudo rm -f /dev/shm/fastrtps_* ros2 launch g1_voice voice_real.launch.py
# Manual transcript — bypasses mic, tests LLM + TTS
ros2 topic pub --once /g1/stt/transcript std_msgs/msg/String "data: 'What is your battery level?'"
# Manual trigger — tests full pipeline including mic
ros2 topic pub --once /g1/voice/trigger std_msgs/msg/Bool "data: true"
# Enable continuous mode
ros2 topic pub --once /g1/voice/continuous_start std_msgs/msg/Bool "data: true"
# Check live robot state
ros2 topic echo /g1/robot_state
# Test TTS binary directly
echo "Hello, I am Aletta." | ~/unitree_sdk2_latest/build/bin/g1_piper_tts eth0/llm_node:
ros__parameters:
api_url: "http://localhost:11434/v1"
api_model: "llama3.2"
system_prompt: "You are Aletta, a friendly humanoid robot by Unitree Robotics at Saxion University. You have access to your live robot state in [ROBOT STATE] blocks. Use this to answer questions about your battery, orientation, network, and software. Keep ALL responses under 2 sentences. Be concise."
/stt_node:
ros__parameters:
use_udp_mic: true
udp_multicast_group: "239.168.123.161"
udp_port: 5555
udp_local_ip: "192.168.123.164"
silence_threshold: 0.008
silence_duration: 2.0
recording_duration: 8.0
continuous_mode: false
/tts_node:
ros__parameters:
tts_mode: "binary"
continuous_mode: false| Parameter | Simulation (dev machine) | Real Robot (Jetson) |
|---|---|---|
use_udp_mic |
false (sounddevice) |
true (UDP multicast) |
tts_mode |
python (piper-tts lib) |
binary (g1_piper_tts) |
| ROS2 distro | Humble | Foxy |
| Network interface | lo or wlp132s0f0 |
eth0 |
| CycloneDDS interface | wlan0 (dev) |
eth0 (robot) |
Use these steps if
setup.shfails at any point.
# Python deps
pip3 install faster-whisper sounddevice soundfile tqdm filelock openwakeword
sudo apt-get install -y sox portaudio19-dev
# Piper binary
wget https://github.com/rhasspy/piper/releases/download/2023.11.14-2/piper_linux_aarch64.tar.gz
tar -xzf piper_linux_aarch64.tar.gz
sudo cp piper/piper /usr/local/bin/piper
# Piper voice model
mkdir -p ~/.local/share/piper && cd ~/.local/share/piper
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json
# faster-whisper base model
python3 -c "from faster_whisper import WhisperModel; WhisperModel('base', device='cpu', compute_type='int8')"
# Ollama
curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.2
# Build g1_piper_tts C++ binary
git clone https://github.com/unitreerobotics/unitree_sdk2.git unitree_sdk2_latest
# Add g1_piper_tts.cpp to example/g1/audio/ and update CMakeLists.txt
cd unitree_sdk2_latest/build && cmake .. && make g1_piper_tts -j$(nproc)
# unitree_ros2 messages
git clone https://github.com/unitreerobotics/unitree_ros2.git
cd unitree_ros2/cyclonedds_ws
source /opt/ros/foxy/setup.bash
colcon build --packages-select unitree_go unitree_api unitree_hg
# Build workspace
cd ~/unitree_converse
source /opt/ros/foxy/setup.bash
source ~/cyclonedds_ws/install/setup.bash
source ~/unitree_ros2/cyclonedds_ws/install/setup.bash
colcon build --symlink-install
source install/setup.bash
# Install services
sudo cp unitree_converse.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable ollama.service unitree_converse.service
sudo systemctl start ollama.service unitree_converse.servicepip install faster-whisper sounddevice soundfile tqdm filelock openwakeword piper-tts
sudo apt-get install -y sox portaudio19-dev
mkdir -p ~/.local/share/piper && cd ~/.local/share/piper
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json
python3 -c "from faster_whisper import WhisperModel; WhisperModel('base', device='cpu', compute_type='int8')"
curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.2
source /opt/ros/humble/setup.bash
cd ~/UNITREE/unitree_converse
colcon build --symlink-install
source install/setup.bash
ros2 launch g1_voice voice_sim.launch.py| Issue | Fix |
|---|---|
bad_alloc on launch |
Service already running. Stop it, clear /dev/shm/fastrtps_*, relaunch. Or reboot. |
cyclonedds.xml must use eth0 |
If set to wlan0 and WiFi not up at boot, DDS fails. Change NetworkInterface to eth0. |
| openWakeWord uses ~10GB RAM | Disable with use_wake_word: false. Use F1 button instead. |
| Battery showing unknown | BmsState uses unitree_hg (not unitree_go) and needs QoS RELIABLE. |
| Piper Python unavailable on aarch64 | Use standalone binary + sox pipeline (g1_piper_tts). |
| Saxion WiFi AP isolation | Cannot SSH over WiFi. Use ethernet (192.168.123.164) or portable router. |
- Custom "Aletta" wake word model
- Voice-controlled motion commands
- Persistent conversation history across reboots
- Upgrade Jetson to ROS2 Humble
- Map additional remote buttons to actions
- bob_llm — ROS2 LLM node
- Piper TTS — Fast local TTS
- faster-whisper — Efficient Whisper implementation
- Unitree Robotics — G1 SDK and AudioHub API
SMART Research Group — Saxion University of Applied Sciences, Enschede, Netherlands