Natural language robot control using FastMCP and Multi-LLM Support (OpenAI, Groq, Gemini, Ollama).
Control robotic arms (Niryo Ned2, WidowX) through natural language using the Model Context Protocol (MCP) and multiple LLM providers. Simply tell the robot what to do: "Pick up the pencil and place it next to the red cube".
Click the image below to watch a YouTube video about the repository created with NotebookLM:
Now supports 4 LLM providers with automatic API detection (see LLMClient):
| Provider | Models | Best For | Speed |
|---|---|---|---|
| OpenAI | GPT-4o, GPT-4o-mini | Complex reasoning | Fast |
| Groq | Kimi K2, Llama 3.3, Mixtral | Ultra-fast inference | Very Fast |
| Google Gemini | Gemini 2.0/2.5 | Long context, multimodal | Fast |
| Ollama | Llama 3.2, Mistral, CodeLlama | Local/offline use | Variable |
```
┌──────────────┐          ┌───────────────┐          ┌──────────────┐
│   Multi-     │   MCP    │               │  Robot   │              │
│   LLM        │─────────►│  MCP Server   │─────────►│   Niryo/     │
│  (OpenAI/    │ Protocol │   (FastMCP)   │   API    │   WidowX     │
│ Groq/Gemini) │          │               │          │              │
└──────────────┘          └───────────────┘          └──────────────┘
       ▲                                                    │
       │ Natural Language                          Physical │
       │ Commands                                   Actions │
   ┌───┴──┐                                            ┌────▼───┐
   │ User │                                            │Objects │
   └──────┘                                            └────────┘
```
1. User speaks natural language: "Pick up the pencil"
2. LLM interprets the command and decides which tools to call
3. MCP client sends tool calls to the MCP server via SSE
4. MCP server executes robot commands
5. Robot performs physical actions
6. Results flow back to the user through the chain
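This loop can be sketched in miniature (all names below are illustrative stand-ins, not the project's actual classes; the real pipeline uses an LLM and SSE transport and lives in `client/fastmcp_universal_client.py`):

```python
# Toy sketch of the command flow: the LLM interprets, the MCP client dispatches.

def interpret(command):
    """Stand-in for the LLM step: map natural language to tool calls."""
    if "pick up" in command.lower():
        return [{"tool": "pick_object", "args": {"label": "pencil"}}]
    return [{"tool": "get_detected_objects", "args": {}}]

def dispatch(tool_calls):
    """Stand-in for the MCP client: forward each tool call to the server."""
    results = []
    for call in tool_calls:
        # In the real system this is an SSE request to the FastMCP server,
        # which executes the command on the robot and returns the result.
        results.append((call["tool"], "ok"))
    return results

results = dispatch(interpret("Pick up the pencil"))
```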
- 🤖 Natural Language Control - No programming required
- 🧠 Multi-LLM Support - Choose OpenAI, Groq, Gemini, or Ollama
- 🎯 Auto-Detection - Automatically selects an available API
- 🔄 Hot-Swapping - Switch providers during runtime
- 🤖 Multi-Robot Support - Niryo Ned2 and WidowX
- 👁️ Vision-Based Detection - Automatic object detection
- 🎨 Gradio Web Interface - User-friendly GUI
- 🎤 Voice Input - Speak commands directly
- 🔊 Audio Feedback - Robot speaks status updates
- Python 3.8+
- Redis server
- Niryo Ned2 or WidowX robot (or simulation)
- At least one API key from:
- OpenAI (GPT-4o, GPT-4o-mini)
- Groq (Free tier available)
- Google AI Studio (Gemini)
- Ollama (Local, no API key needed)
```bash
# Clone repository
git clone https://github.com/dgaida/robot_mcp.git
cd robot_mcp

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -e .

# Configure API keys
cp secrets.env.template secrets.env
# Edit secrets.env and add your API key(s)
```

Edit `secrets.env`:
```bash
# Add at least one API key (priority: OpenAI > Groq > Gemini)

# OpenAI (GPT-4o, GPT-4o-mini)
OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxx

# Groq (Kimi, Llama, Mixtral) - Free tier available!
GROQ_API_KEY=gsk-xxxxxxxxxxxxxxxx

# Google Gemini (Gemini 2.0, 2.5)
GEMINI_API_KEY=AIzaSy-xxxxxxxxxxxxxxxx

# Ollama - No API key needed (runs locally)
# Just install: curl -fsSL https://ollama.ai/install.sh | sh
```

Start Redis:

```bash
docker run -p 6379:6379 redis:alpine
```

Start the system:

```bash
# Terminal 1: Start FastMCP Server
python server/fastmcp_robot_server.py --robot niryo --no-simulation

# Terminal 2: Start object detection and segmentation
# cd ../vision_detect_segment
python scripts/detect_objects_publish_annotated_frames.py

# Terminal 3: Visualize annotated frames (optional but highly recommended)
# cd ../redis_robot_comm
python scripts/visualize_annotated_frames.py

# Terminal 4: Run universal client (auto-detects available API)
python client/fastmcp_universal_client.py
```

The new universal client auto-detects and uses available LLM providers:
```bash
# Auto-detect API (uses first available: OpenAI > Groq > Gemini > Ollama)
python client/fastmcp_universal_client.py

# Explicitly use OpenAI
python client/fastmcp_universal_client.py --api openai --model gpt-4o

# Use Groq (fastest inference)
python client/fastmcp_universal_client.py --api groq

# Use Gemini
python client/fastmcp_universal_client.py --api gemini --model gemini-2.0-flash

# Use local Ollama (no internet required)
python client/fastmcp_universal_client.py --api ollama --model llama3.2:1b

# Single command mode
python client/fastmcp_universal_client.py --command "What objects do you see?"
```

Example session:

```
You: What objects do you see?
🤖: I can see 3 objects: a pencil at [0.15, -0.05],
    a red cube at [0.20, 0.10], and a blue square at [0.18, -0.10]

You: switch
🔄 Current provider: GROQ
   Available: openai, groq, gemini, ollama
   Switch to: openai
✅ Switched to OPENAI - gpt-4o-mini

You: Move the pencil next to the red cube
🤖: Done! I've placed the pencil to the right of the red cube.
```
The original Groq-specific client is still available:
```bash
python client/fastmcp_groq_client.py
```

```python
from client.fastmcp_universal_client import RobotUniversalMCPClient
import asyncio

async def demo():
    # Auto-detect available API
    client = RobotUniversalMCPClient()

    # Or specify provider
    # client = RobotUniversalMCPClient(api_choice="openai", model="gpt-4o")

    await client.connect()

    # Natural language commands work with any provider
    await client.chat("What objects do you see?")
    await client.chat("Pick up the largest object")
    await client.chat("Place it in the center")

    await client.disconnect()

asyncio.run(demo())
```

The FastMCP server exposes these robot control tools (they work with all LLM providers):
- `pick_place_object` - Complete pick-and-place operation
- `pick_object` - Pick up an object
- `place_object` - Place a held object
- `push_object` - Push objects (for items too large to grip)
- `move2observation_pose` - Position for workspace observation
- `get_detected_objects` - List all detected objects
- `get_detected_object` - Find the object at given coordinates
- `get_largest_detected_object` - Get the biggest object
- `get_smallest_detected_object` - Get the smallest object
- `get_detected_objects_sorted` - Sort objects by size
- `get_largest_free_space_with_center` - Find free space for placement
- `get_workspace_coordinate_from_point` - Get corner/center coordinates
- `get_object_labels_as_string` - List recognizable objects
- `add_object_name2object_labels` - Add a new object type
- `speak` - Text-to-speech output
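On the server side, FastMCP turns decorated functions into tools callable by name. A toy stand-in for that registry (the decorator and stubs below are illustrative, not the project's real implementations):

```python
# Toy tool registry, mimicking how @mcp.tool-decorated functions become
# callable by name when the LLM issues a tool call. Stubbed for illustration.

TOOLS = {}

def tool(fn):
    """Register a function under its name, like FastMCP's tool decorator."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_detected_objects():
    """Return objects currently visible in the workspace (stubbed)."""
    return [{"label": "pencil", "position": [0.15, -0.05]}]

@tool
def pick_object(label):
    """Pick up the named object (stubbed; the real tool drives the arm)."""
    return f"picked {label}"

# An incoming tool call is just a name plus arguments:
result = TOOLS["pick_object"]("pencil")
```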
| Provider | Function Calling | Speed | Cost | Offline | Best Use Case |
|---|---|---|---|---|---|
| OpenAI | ✅ Excellent | Fast | $$ | ❌ | Production, complex tasks |
| Groq | ✅ Excellent | Very Fast | Free tier | ❌ | Development, prototyping |
| Gemini | ✅ Excellent | Fast | Free tier | ❌ | Long context, multimodal |
| Ollama | ⚠️ Limited | Variable | Free | ✅ | Local testing, privacy |
For Complex Tasks:

```bash
# OpenAI - Best reasoning
--api openai --model gpt-4o

# Groq - Fastest inference
--api groq --model moonshotai/kimi-k2-instruct-0905
```

For Development:

```bash
# OpenAI - Fast and cheap
--api openai --model gpt-4o-mini

# Groq - Free and fast
--api groq --model llama-3.3-70b-versatile
```

For Local/Offline:

```bash
# Ollama - No internet required
--api ollama --model llama3.2:1b
```

Example commands:

"What objects do you see?"
"Pick up the pencil and place it at [0.2, 0.1]"
"Move the red cube to the left of the blue square"
"Show me the largest object"
"Sort all objects by size from smallest to largest"
"Arrange objects in a triangle pattern"
"Group objects by color: red on left, blue on right"
"Swap positions of the two largest objects"
"Execute: 1) Find all objects 2) Move smallest to [0.15, 0.1]
3) Move largest right of smallest 4) Report positions"
"Organize the workspace: cubes on left, cylinders in middle,
everything else on right, aligned in rows"
The web GUI supports all LLM providers:
```bash
python robot_gui/mcp_app.py --robot niryo
```

Features:
- 💬 Chat with robot using any LLM provider
- 📹 Live camera feed with object annotations
- 🎤 Voice input (Whisper)
- 📊 System status monitoring
- 🔄 Switch LLM providers on-the-fly
Create `secrets.env`:

```bash
# Multi-LLM Support - Add any/all of these

# OpenAI (priority if multiple keys present)
OPENAI_API_KEY=sk-xxxxxxxx

# Groq (fast, free tier available)
GROQ_API_KEY=gsk-xxxxxxxx

# Google Gemini
GEMINI_API_KEY=AIzaSy-xxxxxxxx

# Optional: ElevenLabs for better TTS
ELEVENLABS_API_KEY=your_key
```

If multiple API keys are present, the client uses this priority:
1. OpenAI (if `OPENAI_API_KEY` is set)
2. Groq (if `GROQ_API_KEY` is set)
3. Gemini (if `GEMINI_API_KEY` is set)
4. Ollama (fallback, no key needed)
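The priority order amounts to a first-match scan over the environment. A hypothetical helper sketching the idea (the actual detection is implemented in the `LLMClient` class):

```python
import os

# First-match scan over API keys, falling back to local Ollama.
PRIORITY = ["OPENAI_API_KEY", "GROQ_API_KEY", "GEMINI_API_KEY"]

def detect_provider(env=None):
    """Return the provider name for the first API key that is set."""
    env = os.environ if env is None else env
    for key in PRIORITY:
        if env.get(key):
            return key.split("_")[0].lower()  # "openai", "groq", "gemini"
    return "ollama"  # no key needed: runs locally

provider = detect_provider({"GROQ_API_KEY": "gsk-demo"})
```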
Override with the `--api` flag:

```bash
# Force Gemini even if OpenAI key exists
python client/fastmcp_universal_client.py --api gemini
```

Run the test suite:

```bash
pytest tests/
```

Note for Linux servers/CI without display:
```bash
# Install Xvfb (virtual framebuffer)
sudo apt-get install xvfb

# Run tests with virtual display
xvfb-run -a pytest tests/
```

GitHub Actions automatically uses Xvfb on Linux runners.
Format, lint, and type-check the code:

```bash
black .
ruff check .
mypy robot_mcp/
```

- Architecture Guide - System design and data flow
- API Reference - Complete tool documentation
- Examples - Common use cases
- Troubleshooting - Common issues
See CONTRIBUTING.md.
Multi-LLM Architecture: This repository uses the LLMClient class from the llm_client repository, providing unified access to multiple LLM providers.
Function Calling Support: OpenAI, Groq, and Gemini all support function calling natively. Ollama has limited support and falls back to text-based instruction following.
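One common shape for such a text-based fallback is to ask the model to emit its tool call as JSON and then parse it out of the free-text reply. A sketch (hypothetical helper, not the actual `llm_client` code):

```python
import json
import re

def extract_tool_call(reply):
    """Pull the first JSON object out of a free-text model reply."""
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if not match:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None

call = extract_tool_call('Sure! {"tool": "pick_object", "args": {"label": "pencil"}}')
```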
Dependencies: Robot Environment, Redis Robot Comm, Text2Speech and Speech2Text are automatically installed from GitHub.
MIT License - see LICENSE for details.
- Model Context Protocol - Communication framework
- OpenAI - GPT models
- Groq - Fast LLM inference
- Google Gemini - Gemini models
- Ollama - Local LLM deployment
- FastMCP - Modern MCP implementation
- Niryo Robotics - Robot hardware
Daniel Gaida - daniel.gaida@th-koeln.de
Project Link: https://github.com/dgaida/robot_mcp
Made with ❤️ for robotic automation
