unitree_converse

A complete voice conversation pipeline for the Unitree G1 humanoid robot ("Aletta"), running fully on-robot with no cloud dependency. The robot listens, understands, and responds using a smooth English female voice through its built-in speaker.

Demo

Hold F1 on the Unitree remote → robot listens → release F1 → robot responds
Press F3 → toggle continuous conversation mode (robot keeps listening after each response)
Aletta knows her battery level, orientation, network status, uptime, and software stack

Quick Start

git clone --recurse-submodules https://github.com/SaxionMechatronics/unitree_converse.git
cd unitree_converse
./setup.sh

The setup script will ask whether you are setting up on the Unitree G1 Jetson or a dev machine, then handle everything automatically:

Downloads Piper voice model (en_US-lessac-medium)
Downloads faster-whisper base model
Installs Piper binary (Jetson) or piper-tts Python package (dev machine)
Installs and starts Ollama, pulls LLaMA 3.2 3B
Builds the ROS2 workspace
Installs and enables systemd services (Jetson only)

Note: If setup.sh fails at any step, see the Manual Installation section below for step-by-step instructions.

Architecture

Hardware

Component	Details
Robot	Unitree G1 Edu (29-DOF + Dex3-L hands)
Onboard compute	NVIDIA Jetson Orin NX 16GB (`192.168.123.164`)
Dev machine	Ubuntu 22.04, RTX Pro 5000 Blackwell
Microphone	G1 built-in mic via RockChip UDP multicast `239.168.123.161:5555`
Speaker	G1 built-in speaker via Unitree AudioHub API
Remote	Unitree wireless controller (`/wirelesscontroller`)
Network	Ethernet: dev `192.168.123.100` ↔ Jetson `192.168.123.164`

Software Stack

Component	Technology
ROS2	Foxy (Jetson) / Humble (dev machine)
DDS	CycloneDDS via `~/cyclonedds_ws`
LLM	Ollama + LLaMA 3.2 3B
LLM ROS2 node	bob_llm
Speech-to-text	faster-whisper (base, CPU)
Text-to-speech	Piper TTS (`en_US-lessac-medium`)
Audio output	Unitree AudioHub API via `g1_piper_tts` C++ binary
Wake word	openWakeWord (`hey_jarvis`) — disabled in production

Package Structure

unitree_converse/
├── setup.sh                        # Interactive setup script
├── unitree_converse.service        # Systemd service file
├── ollama.service                  # Ollama systemd service reference
├── README.md
└── src/
    ├── g1_voice/
    │   ├── g1_voice/
    │   │   ├── wake_word_node.py       # openWakeWord + keyboard trigger (sim)
    │   │   ├── stt_node.py             # faster-whisper + UDP mic + stop signal
    │   │   ├── tts_node.py             # Piper TTS via g1_piper_tts binary
    │   │   ├── button_trigger_node.py  # F1/F3 remote button mapping
    │   │   └── robot_state_node.py     # Live robot state injection into LLM
    │   ├── launch/
    │   │   ├── voice_sim.launch.py     # Dev machine (sounddevice mic, Python TTS)
    │   │   └── voice_real.launch.py    # On-robot (UDP mic, g1_piper_tts)
    │   └── config/
    │       ├── voice_params.yaml       # Dev machine params
    │       └── voice_params_real.yaml  # Jetson robot params
    └── bob_llm/                        # LLM ROS2 node (submodule, Foxy-patched fork)

Key Discovery: Audio Routing

The G1's audio is not handled by the Jetson's ALSA/PulseAudio. It is managed by a separate RockChip MCU at 192.168.123.161.

Microphone

The RockChip streams raw 16-bit mono 16kHz PCM via UDP multicast. The stt_node joins the multicast group to receive mic audio.

sock.bind(('', 5555))
mreq = struct.pack('4s4s',
    socket.inet_aton('239.168.123.161'),
    socket.inet_aton('192.168.123.164'))
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

Speaker

The G1 speaker is controlled via the Unitree AudioClient::PlayStream() API which requires 16kHz mono PCM. A custom C++ binary g1_piper_tts handles this:

tts_node → subprocess: g1_piper_tts eth0 < text
               │
               ├── piper → /tmp/tts_raw.wav  (22050Hz)
               ├── sox → /tmp/tts_16k.wav    (16000Hz mono)
               └── AudioClient::PlayStream() → RockChip → Speaker

Source: ~/unitree_sdk2_latest/example/g1/audio/g1_piper_tts.cpp Binary: ~/unitree_sdk2_latest/build/bin/g1_piper_tts

ROS2 Topics

Topic	Type	Purpose
`/wirelesscontroller`	`unitree_go/msg/WirelessController`	Remote button input
`/g1/voice/trigger`	`std_msgs/Bool`	Start recording
`/g1/voice/stop_recording`	`std_msgs/Bool`	Stop recording (F1 released)
`/g1/voice/continuous_start`	`std_msgs/Bool`	Enable continuous mode
`/g1/voice/continuous_stop`	`std_msgs/Bool`	Disable continuous mode
`/g1/stt/transcript`	`std_msgs/String`	Raw transcribed speech
`/g1/stt/status`	`std_msgs/String`	recording/transcribing/idle
`/g1/robot_state`	`std_msgs/String`	Live robot state
`llm_prompt`	`std_msgs/String`	Enriched prompt (state + transcript)
`llm_response`	`std_msgs/String`	LLM reply
`/g1/tts/status`	`std_msgs/String`	speaking/idle
`/g1/button/status`	`std_msgs/String`	Button node status

Button Mapping

Button	Keys Value	Action
Hold F1	64	Push-to-talk: records while held, transcribes on release
F3	128	Toggle continuous conversation mode on/off

Conversation Modes

Push-to-Talk (default)

Hold F1 while speaking → release → robot responds → done.

Continuous Mode

Press F3 to toggle on → robot listens → responds → automatically listens again. Press F3 again to stop.

Systemd Services

The pipeline uses two systemd services that auto-start at boot:

ollama.service              ← runs LLaMA 3.2 3B on Jetson GPU
unitree_converse.service    ← voice pipeline (depends on ollama)

# Check status
sudo systemctl status unitree_converse.service
sudo systemctl status ollama.service

# Follow live logs
journalctl -u unitree_converse.service -f

# Restart after config changes
sudo systemctl restart unitree_converse.service

# Stop before manual launch
sudo systemctl stop unitree_converse.service

Critical: Never run ros2 launch manually while the service is running. Both instances compete for CycloneDDS shared memory and cause bad_alloc crashes. Always stop the service first:
sudo systemctl stop unitree_converse.service
sleep 2
sudo rm -f /dev/shm/fastrtps_*
ros2 launch g1_voice voice_real.launch.py

Testing

# Manual transcript — bypasses mic, tests LLM + TTS
ros2 topic pub --once /g1/stt/transcript std_msgs/msg/String "data: 'What is your battery level?'"

# Manual trigger — tests full pipeline including mic
ros2 topic pub --once /g1/voice/trigger std_msgs/msg/Bool "data: true"

# Enable continuous mode
ros2 topic pub --once /g1/voice/continuous_start std_msgs/msg/Bool "data: true"

# Check live robot state
ros2 topic echo /g1/robot_state

# Test TTS binary directly
echo "Hello, I am Aletta." | ~/unitree_sdk2_latest/build/bin/g1_piper_tts eth0

Configuration (voice_params_real.yaml)

/llm_node:
  ros__parameters:
    api_url: "http://localhost:11434/v1"
    api_model: "llama3.2"
    system_prompt: "You are Aletta, a friendly humanoid robot by Unitree Robotics at Saxion University. You have access to your live robot state in [ROBOT STATE] blocks. Use this to answer questions about your battery, orientation, network, and software. Keep ALL responses under 2 sentences. Be concise."

/stt_node:
  ros__parameters:
    use_udp_mic: true
    udp_multicast_group: "239.168.123.161"
    udp_port: 5555
    udp_local_ip: "192.168.123.164"
    silence_threshold: 0.008
    silence_duration: 2.0
    recording_duration: 8.0
    continuous_mode: false

/tts_node:
  ros__parameters:
    tts_mode: "binary"
    continuous_mode: false

Sim-to-Real Differences

Parameter	Simulation (dev machine)	Real Robot (Jetson)
`use_udp_mic`	`false` (sounddevice)	`true` (UDP multicast)
`tts_mode`	`python` (piper-tts lib)	`binary` (g1_piper_tts)
ROS2 distro	Humble	Foxy
Network interface	`lo` or `wlp132s0f0`	`eth0`
CycloneDDS interface	`wlan0` (dev)	`eth0` (robot)

Manual Installation

Use these steps if setup.sh fails at any point.

Jetson Orin NX (Ubuntu 20.04, ROS2 Foxy)

# Python deps
pip3 install faster-whisper sounddevice soundfile tqdm filelock openwakeword
sudo apt-get install -y sox portaudio19-dev

# Piper binary
wget https://github.com/rhasspy/piper/releases/download/2023.11.14-2/piper_linux_aarch64.tar.gz
tar -xzf piper_linux_aarch64.tar.gz
sudo cp piper/piper /usr/local/bin/piper

# Piper voice model
mkdir -p ~/.local/share/piper && cd ~/.local/share/piper
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json

# faster-whisper base model
python3 -c "from faster_whisper import WhisperModel; WhisperModel('base', device='cpu', compute_type='int8')"

# Ollama
curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.2

# Build g1_piper_tts C++ binary
git clone https://github.com/unitreerobotics/unitree_sdk2.git unitree_sdk2_latest
# Add g1_piper_tts.cpp to example/g1/audio/ and update CMakeLists.txt
cd unitree_sdk2_latest/build && cmake .. && make g1_piper_tts -j$(nproc)

# unitree_ros2 messages
git clone https://github.com/unitreerobotics/unitree_ros2.git
cd unitree_ros2/cyclonedds_ws
source /opt/ros/foxy/setup.bash
colcon build --packages-select unitree_go unitree_api unitree_hg

# Build workspace
cd ~/unitree_converse
source /opt/ros/foxy/setup.bash
source ~/cyclonedds_ws/install/setup.bash
source ~/unitree_ros2/cyclonedds_ws/install/setup.bash
colcon build --symlink-install
source install/setup.bash

# Install services
sudo cp unitree_converse.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable ollama.service unitree_converse.service
sudo systemctl start ollama.service unitree_converse.service

Dev Machine (Ubuntu 22.04, ROS2 Humble)

pip install faster-whisper sounddevice soundfile tqdm filelock openwakeword piper-tts
sudo apt-get install -y sox portaudio19-dev

mkdir -p ~/.local/share/piper && cd ~/.local/share/piper
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json

python3 -c "from faster_whisper import WhisperModel; WhisperModel('base', device='cpu', compute_type='int8')"

curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.2

source /opt/ros/humble/setup.bash
cd ~/UNITREE/unitree_converse
colcon build --symlink-install
source install/setup.bash
ros2 launch g1_voice voice_sim.launch.py

Known Issues

Issue	Fix
`bad_alloc` on launch	Service already running. Stop it, clear `/dev/shm/fastrtps_*`, relaunch. Or reboot.
`cyclonedds.xml` must use `eth0`	If set to `wlan0` and WiFi not up at boot, DDS fails. Change `NetworkInterface` to `eth0`.
openWakeWord uses ~10GB RAM	Disable with `use_wake_word: false`. Use F1 button instead.
Battery showing unknown	`BmsState` uses `unitree_hg` (not `unitree_go`) and needs QoS `RELIABLE`.
Piper Python unavailable on aarch64	Use standalone binary + sox pipeline (`g1_piper_tts`).
Saxion WiFi AP isolation	Cannot SSH over WiFi. Use ethernet (`192.168.123.164`) or portable router.

Future Work

Custom "Aletta" wake word model
Voice-controlled motion commands
Persistent conversation history across reboots
Upgrade Jetson to ROS2 Humble
Map additional remote buttons to actions

Credits

bob_llm — ROS2 LLM node
Piper TTS — Fast local TTS
faster-whisper — Efficient Whisper implementation
Unitree Robotics — G1 SDK and AudioHub API

SMART Research Group — Saxion University of Applied Sciences, Enschede, Netherlands

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
src		src
.gitignore		.gitignore
.gitmodules		.gitmodules
README.md		README.md
ollama.service		ollama.service
setup.sh		setup.sh
unitree_converse.service		unitree_converse.service
unitree_converse_architecture.svg		unitree_converse_architecture.svg

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

unitree_converse

Demo

Quick Start

Architecture

Hardware

Software Stack

Package Structure

Key Discovery: Audio Routing

Microphone

Speaker

ROS2 Topics

Button Mapping

Conversation Modes

Push-to-Talk (default)

Continuous Mode

Systemd Services

Testing

Configuration (voice_params_real.yaml)

Sim-to-Real Differences

Manual Installation

Jetson Orin NX (Ubuntu 20.04, ROS2 Foxy)

Dev Machine (Ubuntu 22.04, ROS2 Humble)

Known Issues

Future Work

Credits

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

unitree_converse

Demo

Quick Start

Architecture

Hardware

Software Stack

Package Structure

Key Discovery: Audio Routing

Microphone

Speaker

ROS2 Topics

Button Mapping

Conversation Modes

Push-to-Talk (default)

Continuous Mode

Systemd Services

Testing

Configuration (voice_params_real.yaml)

Sim-to-Real Differences

Manual Installation

Jetson Orin NX (Ubuntu 20.04, ROS2 Foxy)

Dev Machine (Ubuntu 22.04, ROS2 Humble)

Known Issues

Future Work

Credits

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages