An intelligent ROS 2 Humble robot that translates vague AAC (Augmentative and Alternative Communication) phrases into precise robot behaviors using Google Gemini AI.
TalkrBot is an advanced assistive robotics system designed to help users with communication difficulties. It uses natural language processing to interpret vague AAC input (like "water yummy more") and translates it into structured robot actions (like "serve more water").
- 🧠 LLM-Powered Intent Understanding: Uses Google Gemini AI to interpret vague AAC phrases
- 🎯 Context-Aware Planning: Generates step-by-step robot skill plans
- 🤖 Real Robot Integration: Nav2 navigation and arm control capabilities
- 👤 Personalized User Profiles: Remembers user preferences and communication styles
- 🔄 Failure Recovery: Intelligent replanning when tasks fail
- 💬 Natural Dialogue: Robot can ask clarifying questions and process responses
- 📊 Task History & Memory: Persistent storage of user interactions and preferences
- 🔍 Semantic Search: Query task history using natural language
```
┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│    AAC Input     │     │    LLM Layer     │     │  Robot Control   │
│                  │     │                  │     │                  │
│ • WebSocket      │────▶│ • Intent Refiner │────▶│ • Nav2           │
│ • REST API       │     │ • GROOT Planner  │     │ • Arm Control    │
│ • ROS Topics     │     │ • Memory Node    │     │ • Skill Executor │
└──────────────────┘     └──────────────────┘     └──────────────────┘
         │                        │                        │
         ▼                        ▼                        ▼
┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│    Perception    │     │     Feedback     │     │  User Profiles   │
│                  │     │                  │     │                  │
│ • YOLOv8         │     │ • Failure Handler│     │ • Preferences    │
│ • Depth Fusion   │     │ • Speech Node    │     │ • History        │
│ • Object Detect  │     │ • Task History   │     │ • Patterns      │
└──────────────────┘     └──────────────────┘     └──────────────────┘
```
| Package | Purpose | Key Components |
|---|---|---|
| `talkrbot_aac` | AAC Input Processing | WebSocket server, REST API, input normalization |
| `talkrbot_llm` | AI/LLM Processing | Intent refinement, planning, memory management |
| `talkrbot_perception` | Computer Vision | YOLOv8 object detection, depth fusion |
| `talkrbot_planner` | Robot Control | Skill execution, Nav2 integration |
| `talkrbot_nav` | Navigation | Mock robot simulation, Nav2 bridge |
| `talkrbot_feedback` | User Interaction | Failure handling, speech, user profiles |
| `talkrbot_msgs` | Custom Messages | AACInput, TaskCommand, DetectedObject, etc. |
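As a rough sketch of what the two core custom messages carry, here is a hypothetical Python mirror inferred from the examples later in this README; it is not the actual `.msg` files, and any field not shown in those examples is an assumption.

```python
# Hypothetical mirror of the core talkrbot_msgs types, inferred from the
# README examples (NOT the real .msg definitions).
from dataclasses import dataclass, field

@dataclass
class AACInput:
    text: str                                     # raw AAC phrase, e.g. "water yummy more"
    user_id: str                                  # which user profile to apply
    metadata: dict = field(default_factory=dict)  # e.g. {"source": "web"}

@dataclass
class TaskCommand:
    task: str          # e.g. "serve"
    object: str        # e.g. "water"
    location: str      # e.g. "None" when unspecified
    parameters: str    # JSON string, e.g. '{"quantity": "more"}'
    confidence: float  # e.g. 0.9

cmd = TaskCommand(task="serve", object="water", location="None",
                  parameters='{"quantity": "more"}', confidence=0.9)
```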
```bash
# Install ROS 2 Humble
sudo apt update
sudo apt install ros-humble-desktop

# Install Python dependencies
pip install google-generativeai ultralytics opencv-python websockets

# Install ROS 2 packages
sudo apt install ros-humble-nav2-* ros-humble-control-msgs ros-humble-trajectory-msgs
```

```bash
# Clone the repository
git clone https://github.com/yourusername/TalkrBot.git
cd TalkrBot

# Build the workspace
colcon build

# Source the workspace
source install/setup.bash
```

```bash
# Create a .env file with your Gemini API key
echo "GEMINI_API_KEY=your_actual_gemini_api_key_here" > .env

# Load environment variables
source .env
```

```bash
# Terminal 1: AAC Node (WebSocket + REST API)
ros2 run talkrbot_aac aac_node

# Terminal 2: LLM Intent Refiner
ros2 run talkrbot_llm intent_refiner_node

# Terminal 3: GROOT Planner
ros2 run talkrbot_llm groot_planner_node

# Terminal 4: Skill Executor
ros2 run talkrbot_planner skill_executor_node_simple

# Terminal 5: Mock Robot (for testing)
ros2 run talkrbot_nav mock_robot_node
```

```bash
# Send a test AAC input via WebSocket
python3 test_websocket.py

# Or publish directly to a ROS topic
ros2 topic pub /aac_input talkrbot_msgs/AACInput "{text: 'I am cold', user_id: 'vedant'}"
```

Example input:

```json
{
  "text": "water yummy more",
  "user_id": "vedant",
  "metadata": {"source": "web"}
}
```

Expected output:

```
[INFO] [llm_node]: Published TaskCommand:
  task='serve', object='water', location='None',
  parameters='{"quantity": "more"}', confidence=0.9
```
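A minimal sketch of the refinement step the output above illustrates: turning the LLM's JSON reply into `TaskCommand` fields. The prompt wording, the `parse_task_command` helper, and the fallback behavior are illustrative assumptions, not the actual `intent_refiner_node` code.

```python
import json

def parse_task_command(llm_reply: str) -> dict:
    """Parse an LLM JSON reply into TaskCommand-style fields.

    Falls back to a low-confidence 'clarify' task when the reply is not
    valid JSON (hypothetical error handling, not the node's real logic).
    """
    try:
        data = json.loads(llm_reply)
    except json.JSONDecodeError:
        return {"task": "clarify", "confidence": 0.0}
    return {
        "task": data.get("task", "unknown"),
        "object": data.get("object", "None"),
        "location": data.get("location", "None"),   # README uses 'None' for unspecified
        "parameters": json.dumps(data.get("parameters", {})),
        "confidence": float(data.get("confidence", 0.5)),
    }

# Simulated Gemini reply for the phrase "water yummy more"
reply = '{"task": "serve", "object": "water", "parameters": {"quantity": "more"}, "confidence": 0.9}'
cmd = parse_task_command(reply)
```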
Example input:

```json
{
  "text": "bring blanket",
  "user_id": "sarah"
}
```

Expected skill plan:

```
[
  "go_to('closet')",
  "grasp('blanket')",
  "go_to('user')",
  "place('blanket')"
]
```

The system maintains user profiles with preferences:

```json
{
  "vedant": {
    "likes": ["cold water", "quiet environment"],
    "dislikes": ["loud noises", "bright lights"],
    "preferred_drinks": ["water", "coffee"],
    "default_location": "desk",
    "communication_style": "direct"
  }
}
```

When tasks fail, the robot can:

- Ask clarifying questions: "I couldn't find the blanket. Is it in the bedroom closet?"
- Suggest alternatives: "Would you like a different blanket or a jacket instead?"
- Replan based on user responses

```bash
# Query task history
ros2 topic pub /history_query std_msgs/String "data: 'How many times did I ask for water today?'"

# Semantic search
ros2 topic pub /semantic_history_query std_msgs/String "data: 'Find all tasks related to getting blankets'"
```

The robot supports these skills:
| Skill | Description | Example |
|---|---|---|
| `go_to(location)` | Navigate to a location | `go_to('kitchen')` |
| `grasp(object)` | Pick up an object | `grasp('water_bottle')` |
| `place(object)` | Put down an object | `place('water_bottle')` |
| `wait(duration)` | Wait for a duration in seconds | `wait(5.0)` |
| `speak(text)` | Speak a message | `speak('I found your water')` |
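A toy dispatcher showing how skill strings like those above could be parsed and routed to handlers; the regex and the handler bodies are illustrative, not the actual skill executor implementation.

```python
import re

def execute_skill(step: str, handlers: dict) -> str:
    """Parse a skill string like "go_to('kitchen')" and dispatch it."""
    m = re.fullmatch(r"(\w+)\('?([^')]*)'?\)", step)
    if not m:
        raise ValueError(f"Unparseable skill step: {step}")
    name, arg = m.groups()
    return handlers[name](arg)

# Hypothetical handlers; the real ones would drive Nav2 / the arm.
handlers = {
    "go_to": lambda loc: f"navigating to {loc}",
    "grasp": lambda obj: f"grasping {obj}",
    "place": lambda obj: f"placing {obj}",
    "wait":  lambda sec: f"waiting {float(sec)}s",
    "speak": lambda txt: f"saying: {txt}",
}

plan = ["go_to('closet')", "grasp('blanket')", "go_to('user')", "place('blanket')"]
log = [execute_skill(step, handlers) for step in plan]
```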
```javascript
// Connect and send AAC input
const ws = new WebSocket('ws://localhost:8765');
ws.send(JSON.stringify({
  text: "I'm thirsty",
  user_id: "vedant",
  metadata: {source: "tablet"}
}));
```

```bash
curl -X POST http://localhost:8080/aac_input \
  -H "Content-Type: application/json" \
  -d '{"text": "bring water", "user_id": "vedant"}'
```

```bash
# Publish AAC input
ros2 topic pub /aac_input talkrbot_msgs/AACInput "{text: 'cold', user_id: 'vedant'}"

# Monitor task commands
ros2 topic echo /task_command

# Check execution feedback
ros2 topic echo /execution_feedback
```

```bash
# Core data flow
ros2 topic echo /aac_input           # Raw AAC input
ros2 topic echo /refined_intent      # LLM-refined intent
ros2 topic echo /skill_plan          # Generated skill plan
ros2 topic echo /execution_feedback  # Execution status

# User interaction
ros2 topic echo /speak_command       # Robot speech
ros2 topic echo /speech_response     # User responses
ros2 topic echo /current_user        # Active user profile
```

```bash
# Set debug logging for specific nodes
ros2 run talkrbot_llm intent_refiner_node --ros-args --log-level DEBUG
```

To add a new skill:

- Define the skill in `skill_executor_node.py`
- Add it to the GROOT planner prompt
- Update the user documentation

Edit the prompt templates in:

- `intent_refiner_node.py`: intent refinement
- `groot_planner_node.py`: skill planning
- `failure_handler_node.py`: failure recovery
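A prompt template for the intent refiner might be structured along these lines; the template text, placeholder names, and requested JSON shape are assumptions, not the actual prompts shipped in those files.

```python
# Hypothetical prompt template in the style intent_refiner_node.py might use.
INTENT_PROMPT = """You are an assistive robot's intent refiner.
User profile: {profile}
AAC phrase: "{phrase}"
Reply with JSON: {{"task": ..., "object": ..., "location": ..., "confidence": ...}}"""

prompt = INTENT_PROMPT.format(
    profile={"preferred_drinks": ["water"], "default_location": "desk"},
    phrase="water yummy more",
)
```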
Add new fields to `user_profiles.json`:

```json
{
  "new_field": "value",
  "preferences": {...},
  "accessibility_needs": {...}
}
```

- API Key Management: Uses environment variables, never hardcoded
- Input Validation: All AAC input is sanitized and validated
- User Data: Stored locally in SQLite, not transmitted externally
- Error Handling: Graceful degradation when services are unavailable
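The input-validation step could look something like this minimal sketch; the length cap and the character whitelist are assumptions, not the project's actual sanitizer.

```python
import re

MAX_LEN = 200  # assumed cap on AAC phrase length

def sanitize_aac_text(text: str) -> str:
    """Trim, cap, and whitelist AAC input before it reaches the LLM."""
    text = text.strip()[:MAX_LEN]
    # Keep only letters, digits, spaces, and basic punctuation
    return re.sub(r"[^A-Za-z0-9 ,.?!'-]", "", text)

cleaned = sanitize_aac_text("  water <script>alert(1)</script> yummy  ")
```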
- Latency: < 2 seconds from AAC input to robot action
- Accuracy: > 90% intent recognition for common phrases
- Reliability: Automatic retry logic for failed operations
- Scalability: Modular design supports multiple users and robots
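The retry behavior behind the reliability bullet could be implemented with a generic helper like this sketch; the attempt count and backoff delays are illustrative assumptions.

```python
import time

def with_retries(fn, attempts=3, base_delay=0.1):
    """Call fn, retrying with exponential backoff on failure."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * (2 ** i))

# Simulated flaky operation that succeeds on the third call
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

result = with_retries(flaky)
```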
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Follow ROS 2 Python style guide
- Add comprehensive logging
- Include error handling
- Write tests for new features
- Update documentation
This project is licensed under the MIT License - see the LICENSE file for details.
- Google Gemini AI for natural language understanding
- ROS 2 Humble for robotics framework
- YOLOv8 for computer vision
- Nav2 for navigation capabilities
- Ultralytics for object detection
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Wiki
Made with ❤️ for the assistive technology community