Natural language robot control using FastMCP and Multi-LLM Support (OpenAI, Groq, Gemini, Ollama).
Control robotic arms (Niryo Ned2, WidowX) through natural language using the Model Context Protocol (MCP) and multiple LLM providers. Simply tell the robot what to do: "Pick up the pencil and place it next to the red cube".
Click the image below to watch a YouTube video about the repository created with NotebookLM:
Now supports 4 LLM providers with automatic API detection (see LLMClient):
| Provider | Models | Best For | Speed |
|---|---|---|---|
| OpenAI | GPT-4o, GPT-4o-mini | Complex reasoning | Fast |
| Groq | Kimi K2, Llama 3.3, Mixtral | Ultra-fast inference | Very Fast |
| Google Gemini | Gemini 2.0/2.5 | Long context, multimodal | Fast |
| Ollama | Llama 3.2, Mistral, CodeLlama | Local/offline use | Variable |
```
┌──────────────┐          ┌───────────────┐          ┌──────────────┐
│   Multi-     │   MCP    │               │  Robot   │              │
│   LLM        │─────────►│  MCP Server   │─────────►│   Niryo/     │
│  (OpenAI/    │ Protocol │   (FastMCP)   │   API    │   WidowX     │
│ Groq/Gemini) │          │               │          │              │
└──────────────┘          └───────────────┘          └──────────────┘
       ▲                                                    │
       │ Natural Language                          Physical │
       │ Commands                                   Actions │
   ┌───┴──┐                                            ┌────▼───┐
   │ User │                                            │Objects │
   └──────┘                                            └────────┘
```
1. User speaks natural language: "Pick up the pencil"
2. LLM interprets the command and decides which tools to call
3. MCP client sends tool calls to the MCP server via SSE
4. MCP server executes robot commands
5. Robot performs physical actions
6. Results flow back to the user through the chain
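This loop can be sketched in miniature (all names below are illustrative stand-ins, not the project's actual classes; the real pipeline uses an LLM and SSE transport and lives in `client/fastmcp_universal_client.py`):

```python
# Toy sketch of the command flow: the LLM interprets, the MCP client dispatches.

def interpret(command):
    """Stand-in for the LLM step: map natural language to tool calls."""
    if "pick up" in command.lower():
        return [{"tool": "pick_object", "args": {"label": "pencil"}}]
    return [{"tool": "get_detected_objects", "args": {}}]

def dispatch(tool_calls):
    """Stand-in for the MCP client: forward each tool call to the server."""
    results = []
    for call in tool_calls:
        # In the real system this is an SSE request to the FastMCP server,
        # which executes the command on the robot and returns the result.
        results.append((call["tool"], "ok"))
    return results

results = dispatch(interpret("Pick up the pencil"))
```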
- 🤖 Natural Language Control - No programming required
- 🧠 Multi-LLM Support - Choose OpenAI, Groq, Gemini, or Ollama
- 🎯 Auto-Detection - Automatically selects an available API
- 🔄 Hot-Swapping - Switch providers during runtime
- 🤖 Multi-Robot Support - Niryo Ned2 and WidowX
- 👁️ Vision-Based Detection - Automatic object detection
- 🎨 Gradio Web Interface - User-friendly GUI
- 🎤 Voice Input - Speak commands directly
- 🔊 Audio Feedback - Robot speaks status updates
- Python 3.8+
- Redis server
- Niryo Ned2 or WidowX robot (or simulation)
- At least one API key from:
- OpenAI (GPT-4o, GPT-4o-mini)
- Groq (Free tier available)
- Google AI Studio (Gemini)
- Ollama (Local, no API key needed)
```bash
# Clone repository
git clone https://github.com/dgaida/robot_mcp.git
cd robot_mcp

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -e .

# Configure API keys
cp secrets.env.template secrets.env
# Edit secrets.env and add your API key(s)
```

Edit `secrets.env`:
```bash
# Add at least one API key (priority: OpenAI > Groq > Gemini)

# OpenAI (GPT-4o, GPT-4o-mini)
OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxx

# Groq (Kimi, Llama, Mixtral) - Free tier available!
GROQ_API_KEY=gsk-xxxxxxxxxxxxxxxx

# Google Gemini (Gemini 2.0, 2.5)
GEMINI_API_KEY=AIzaSy-xxxxxxxxxxxxxxxx

# Ollama - No API key needed (runs locally)
# Just install: curl -fsSL https://ollama.ai/install.sh | sh
```

Start Redis:

```bash
docker run -p 6379:6379 redis:alpine
```

Start the system:

```bash
# Terminal 1: Start FastMCP Server
python server/fastmcp_robot_server.py --robot niryo --no-simulation

# Terminal 2: Start object detection and segmentation
# cd ../vision_detect_segment
python scripts/detect_objects_publish_annotated_frames.py

# Terminal 3: Visualize annotated frames (optional but highly recommended)
# cd ../redis_robot_comm
python scripts/visualize_annotated_frames.py

# Terminal 4: Run universal client (auto-detects available API)
python client/fastmcp_universal_client.py
```

The new universal client auto-detects and uses available LLM providers:
```bash
# Auto-detect API (uses first available: OpenAI > Groq > Gemini > Ollama)
python client/fastmcp_universal_client.py

# Explicitly use OpenAI
python client/fastmcp_universal_client.py --api openai --model gpt-4o

# Use Groq (fastest inference)
python client/fastmcp_universal_client.py --api groq

# Use Gemini
python client/fastmcp_universal_client.py --api gemini --model gemini-2.0-flash

# Use local Ollama (no internet required)
python client/fastmcp_universal_client.py --api ollama --model llama3.2:1b

# Single command mode
python client/fastmcp_universal_client.py --command "What objects do you see?"
```

Example session:

```
You: What objects do you see?
🤖: I can see 3 objects: a pencil at [0.15, -0.05],
    a red cube at [0.20, 0.10], and a blue square at [0.18, -0.10]

You: switch
🔄 Current provider: GROQ
   Available: openai, groq, gemini, ollama
   Switch to: openai
✅ Switched to OPENAI - gpt-4o-mini

You: Move the pencil next to the red cube
🤖: Done! I've placed the pencil to the right of the red cube.
```
The original Groq-specific client is still available:
```bash
python client/fastmcp_groq_client.py
```

```python
from client.fastmcp_universal_client import RobotUniversalMCPClient
import asyncio

async def demo():
    # Auto-detect available API
    client = RobotUniversalMCPClient()

    # Or specify provider
    # client = RobotUniversalMCPClient(api_choice="openai", model="gpt-4o")

    await client.connect()

    # Natural language commands work with any provider
    await client.chat("What objects do you see?")
    await client.chat("Pick up the largest object")
    await client.chat("Place it in the center")

    await client.disconnect()

asyncio.run(demo())
```

The FastMCP server exposes these robot control tools (they work with all LLM providers):
- `pick_place_object` - Complete pick-and-place operation
- `pick_object` - Pick up an object
- `place_object` - Place a held object
- `push_object` - Push objects (for items too large to grip)
- `move2observation_pose` - Position for workspace observation
- `get_detected_objects` - List all detected objects
- `get_detected_object` - Find the object at given coordinates
- `get_largest_detected_object` - Get the biggest object
- `get_smallest_detected_object` - Get the smallest object
- `get_detected_objects_sorted` - Sort objects by size
- `get_largest_free_space_with_center` - Find free space for placement
- `get_workspace_coordinate_from_point` - Get corner/center coordinates
- `get_object_labels_as_string` - List recognizable objects
- `add_object_name2object_labels` - Add a new object type
- `speak` - Text-to-speech output
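On the server side, FastMCP turns decorated functions into tools callable by name. A toy stand-in for that registry (the decorator and stubs below are illustrative, not the project's real implementations):

```python
# Toy tool registry, mimicking how @mcp.tool-decorated functions become
# callable by name when the LLM issues a tool call. Stubbed for illustration.

TOOLS = {}

def tool(fn):
    """Register a function under its name, like FastMCP's tool decorator."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_detected_objects():
    """Return objects currently visible in the workspace (stubbed)."""
    return [{"label": "pencil", "position": [0.15, -0.05]}]

@tool
def pick_object(label):
    """Pick up the named object (stubbed; the real tool drives the arm)."""
    return f"picked {label}"

# An incoming tool call is just a name plus arguments:
result = TOOLS["pick_object"]("pencil")
```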
| Provider | Function Calling | Speed | Cost | Offline | Best Use Case |
|---|---|---|---|---|---|
| OpenAI | ✅ Excellent | Fast | $$ | ❌ | Production, complex tasks |
| Groq | ✅ Excellent | Very Fast | Free tier | ❌ | Development, prototyping |
| Gemini | ✅ Excellent | Fast | Free tier | ❌ | Long context, multimodal |
| Ollama | ⚠️ Limited | Variable | Free | ✅ | Local testing, privacy |
For Complex Tasks:

```bash
# OpenAI - Best reasoning
--api openai --model gpt-4o

# Groq - Fastest inference
--api groq --model moonshotai/kimi-k2-instruct-0905
```

For Development:

```bash
# OpenAI - Fast and cheap
--api openai --model gpt-4o-mini

# Groq - Free and fast
--api groq --model llama-3.3-70b-versatile
```

For Local/Offline:

```bash
# Ollama - No internet required
--api ollama --model llama3.2:1b
```

Example commands:

"What objects do you see?"
"Pick up the pencil and place it at [0.2, 0.1]"
"Move the red cube to the left of the blue square"
"Show me the largest object"
"Sort all objects by size from smallest to largest"
"Arrange objects in a triangle pattern"
"Group objects by color: red on left, blue on right"
"Swap positions of the two largest objects"
"Execute: 1) Find all objects 2) Move smallest to [0.15, 0.1]
3) Move largest right of smallest 4) Report positions"
"Organize the workspace: cubes on left, cylinders in middle,
everything else on right, aligned in rows"
The web GUI supports all LLM providers:
```bash
python robot_gui/mcp_app.py --robot niryo
```

Features:
- 💬 Chat with robot using any LLM provider
- 📹 Live camera feed with object annotations
- 🎤 Voice input (Whisper)
- 📊 System status monitoring
- 🔄 Switch LLM providers on-the-fly
Create `secrets.env`:

```bash
# Multi-LLM Support - Add any/all of these

# OpenAI (priority if multiple keys present)
OPENAI_API_KEY=sk-xxxxxxxx

# Groq (fast, free tier available)
GROQ_API_KEY=gsk-xxxxxxxx

# Google Gemini
GEMINI_API_KEY=AIzaSy-xxxxxxxx

# Optional: ElevenLabs for better TTS
ELEVENLABS_API_KEY=your_key
```

If multiple API keys are present, the client uses this priority:
1. OpenAI (if `OPENAI_API_KEY` is set)
2. Groq (if `GROQ_API_KEY` is set)
3. Gemini (if `GEMINI_API_KEY` is set)
4. Ollama (fallback, no key needed)
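The priority order amounts to a first-match scan over the environment. A hypothetical helper sketching the idea (the actual detection is implemented in the `LLMClient` class):

```python
import os

# First-match scan over API keys, falling back to local Ollama.
PRIORITY = ["OPENAI_API_KEY", "GROQ_API_KEY", "GEMINI_API_KEY"]

def detect_provider(env=None):
    """Return the provider name for the first API key that is set."""
    env = os.environ if env is None else env
    for key in PRIORITY:
        if env.get(key):
            return key.split("_")[0].lower()  # "openai", "groq", "gemini"
    return "ollama"  # no key needed: runs locally

provider = detect_provider({"GROQ_API_KEY": "gsk-demo"})
```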
Override with the `--api` flag:

```bash
# Force Gemini even if OpenAI key exists
python client/fastmcp_universal_client.py --api gemini
```

Run the test suite:

```bash
pytest tests/
```

Note for Linux servers/CI without display:
```bash
# Install Xvfb (virtual framebuffer)
sudo apt-get install xvfb

# Run tests with virtual display
xvfb-run -a pytest tests/
```

GitHub Actions automatically uses Xvfb on Linux runners.
Format, lint, and type-check the code:

```bash
black .
ruff check .
mypy robot_mcp/
```

- Architecture Guide - System design and data flow
- API Reference - Complete tool documentation
- Examples - Common use cases
- Troubleshooting - Common issues
See CONTRIBUTING.md.
Multi-LLM Architecture: This repository uses the LLMClient class from the llm_client repository, providing unified access to multiple LLM providers.
Function Calling Support: OpenAI, Groq, and Gemini all support function calling natively. Ollama has limited support and falls back to text-based instruction following.
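One common shape for such a text-based fallback is to ask the model to emit its tool call as JSON and then parse it out of the free-text reply. A sketch (hypothetical helper, not the actual `llm_client` code):

```python
import json
import re

def extract_tool_call(reply):
    """Pull the first JSON object out of a free-text model reply."""
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if not match:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None

call = extract_tool_call('Sure! {"tool": "pick_object", "args": {"label": "pencil"}}')
```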
Dependencies: Robot Environment, Redis Robot Comm, Text2Speech and Speech2Text are automatically installed from GitHub.
MIT License - see LICENSE for details.
- Model Context Protocol - Communication framework
- OpenAI - GPT models
- Groq - Fast LLM inference
- Google Gemini - Gemini models
- Ollama - Local LLM deployment
- FastMCP - Modern MCP implementation
- Niryo Robotics - Robot hardware
Daniel Gaida - daniel.gaida@th-koeln.de
Project Link: https://github.com/dgaida/robot_mcp
Made with ❤️ for robotic automation
