Robot MCP Control System

Natural language robot control using FastMCP with multi-LLM support (OpenAI, Groq, Gemini, Ollama).


๐ŸŽฏ Overview

Control robotic arms (Niryo Ned2, WidowX) through natural language using the Model Context Protocol (MCP) and multiple LLM providers. Simply tell the robot what to do: "Pick up the pencil and place it next to the red cube".

Video about the repository

A YouTube video about the repository, created with NotebookLM, is linked from the project page on GitHub (the "Robot Demo" thumbnail).

๐Ÿ†• Multi-LLM Support

Now supports 4 LLM providers with automatic API detection (see LLMClient):

Provider        Models                          Best For                   Speed
OpenAI          GPT-4o, GPT-4o-mini             Complex reasoning          Fast
Groq            Kimi K2, Llama 3.3, Mixtral     Ultra-fast inference       Very Fast
Google Gemini   Gemini 2.0/2.5                  Long context, multimodal   Fast
Ollama          Llama 3.2, Mistral, CodeLlama   Local/offline use          Variable

๐ŸŽฏ How It Works

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”         โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”         โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚   Multi-    โ”‚  MCP    โ”‚              โ”‚  Robot  โ”‚             โ”‚
โ”‚   LLM       โ”‚โ—„โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–บโ”‚  MCP Server  โ”‚โ—„โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–บโ”‚   Niryo/    โ”‚
โ”‚  (OpenAI/   โ”‚Protocol โ”‚   (FastMCP)  โ”‚   API   โ”‚   WidowX    โ”‚
โ”‚ Groq/Gemini)โ”‚         โ”‚              โ”‚         โ”‚             โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜         โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜         โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
      โ–ฒ                                                 โ”‚
      โ”‚ Natural Language                     Physical   โ”‚
      โ”‚ Commands                             Actions    โ”‚
   โ”Œโ”€โ”€โ”ดโ”€โ”€โ”                                          โ”Œโ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”
   โ”‚User โ”‚                                          โ”‚Objectsโ”‚
   โ””โ”€โ”€โ”€โ”€โ”€โ”˜                                          โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

System Flow

  1. User speaks natural language: "Pick up the pencil"
  2. LLM interprets command and decides which tools to call
  3. MCP Client sends tool calls to MCP server via SSE
  4. MCP Server executes robot commands
  5. Robot performs physical actions
  6. Results flow back to user through the chain
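
The flow above can be sketched in plain Python. All names here (`call_llm`, `execute_tool`, `handle_command`) are illustrative placeholders, not the actual client API, which lives in client/fastmcp_universal_client.py:

```python
# Minimal sketch of the command -> tool-call -> result loop.
# Names are illustrative, not the real client API.

def call_llm(message, tools):
    """Stand-in for an LLM function-calling request.

    A real implementation would send `message` plus the tool schemas
    to OpenAI/Groq/Gemini and return the tool calls the model chose.
    """
    if "pick" in message.lower():
        return [{"name": "pick_object", "args": {"label": "pencil"}}]
    return []

def execute_tool(call):
    """Stand-in for forwarding one tool call to the MCP server via SSE."""
    return f"{call['name']} executed with {call['args']}"

def handle_command(message, tools):
    # 1. LLM interprets the command and decides which tools to call.
    tool_calls = call_llm(message, tools)
    # 2-4. Each call goes to the MCP server, which drives the robot.
    results = [execute_tool(c) for c in tool_calls]
    # 5. Results flow back to the user through the chain.
    return results

print(handle_command("Pick up the pencil", tools=["pick_object"]))
```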

โœจ Key Features

  • ๐Ÿค– Natural Language Control - No programming required
  • ๐Ÿ”ง Multi-LLM Support - Choose OpenAI, Groq, Gemini, or Ollama
  • ๐ŸŽฏ Auto-Detection - Automatically selects available API
  • ๐Ÿ”„ Hot-Swapping - Switch providers during runtime
  • ๐Ÿค– Multi-Robot Support - Niryo Ned2 and WidowX
  • ๐Ÿ‘๏ธ Vision-Based Detection - Automatic object detection
  • ๐ŸŽจ Gradio Web Interface - User-friendly GUI
  • ๐ŸŽค Voice Input - Speak commands directly
  • ๐Ÿ”Š Audio Feedback - Robot speaks status updates

๐Ÿš€ Quick Start

Prerequisites

  • Python 3.8+
  • Redis server
  • Niryo Ned2 or WidowX robot (or simulation)
  • At least one API key from OpenAI, Groq, or Gemini — or a local Ollama installation (no key required)

Installation

# Clone repository
git clone https://github.com/dgaida/robot_mcp.git
cd robot_mcp

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -e .

# Configure API keys
cp secrets.env.template secrets.env
# Edit secrets.env and add your API key(s)

Configure API Keys

Edit secrets.env:

# Add at least one API key (priority: OpenAI > Groq > Gemini)

# OpenAI (GPT-4o, GPT-4o-mini)
OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxx

# Groq (Kimi, Llama, Mixtral) - Free tier available!
GROQ_API_KEY=gsk-xxxxxxxxxxxxxxxx

# Google Gemini (Gemini 2.0, 2.5)
GEMINI_API_KEY=AIzaSy-xxxxxxxxxxxxxxxx

# Ollama - No API key needed (runs locally)
# Just install: curl -fsSL https://ollama.ai/install.sh | sh

Start Redis

docker run -p 6379:6379 redis:alpine

Quick Test

# Terminal 1: Start FastMCP Server
python server/fastmcp_robot_server.py --robot niryo --no-simulation

# Terminal 2: Start object detection and segmentation
# (run from the vision_detect_segment repository: cd ../vision_detect_segment)
python scripts/detect_objects_publish_annotated_frames.py

# Terminal 3: Visualize annotated frames (optional but highly recommended)
# (run from the redis_robot_comm repository: cd ../redis_robot_comm)
python scripts/visualize_annotated_frames.py

# Terminal 4: Run universal client (auto-detects available API)
python client/fastmcp_universal_client.py

๐Ÿ’ป Usage

Universal Client (Recommended)

The new universal client auto-detects and uses available LLM providers:

# Auto-detect API (uses first available: OpenAI > Groq > Gemini > Ollama)
python client/fastmcp_universal_client.py

# Explicitly use OpenAI
python client/fastmcp_universal_client.py --api openai --model gpt-4o

# Use Groq (fastest inference)
python client/fastmcp_universal_client.py --api groq

# Use Gemini
python client/fastmcp_universal_client.py --api gemini --model gemini-2.0-flash

# Use local Ollama (no internet required)
python client/fastmcp_universal_client.py --api ollama --model llama3.2:1b

# Single command mode
python client/fastmcp_universal_client.py --command "What objects do you see?"

Interactive Features

You: What objects do you see?
๐Ÿค–: I can see 3 objects: a pencil at [0.15, -0.05],
    a red cube at [0.20, 0.10], and a blue square at [0.18, -0.10]

You: switch
๐Ÿ”„ Current provider: GROQ
Available: openai, groq, gemini, ollama
Switch to: openai
โœ“ Switched to OPENAI - gpt-4o-mini

You: Move the pencil next to the red cube
๐Ÿค–: Done! I've placed the pencil to the right of the red cube.

Legacy Groq-Only Client

The original Groq-specific client is still available:

python client/fastmcp_groq_client.py

Programmatic Usage

from client.fastmcp_universal_client import RobotUniversalMCPClient
import asyncio

async def demo():
    # Auto-detect available API
    client = RobotUniversalMCPClient()

    # Or specify provider
    # client = RobotUniversalMCPClient(api_choice="openai", model="gpt-4o")

    await client.connect()

    # Natural language commands work with any provider
    await client.chat("What objects do you see?")
    await client.chat("Pick up the largest object")
    await client.chat("Place it in the center")

    await client.disconnect()

asyncio.run(demo())

๐Ÿ› ๏ธ Available Tools

The FastMCP server exposes these robot control tools, which work with all LLM providers:

Robot Control

  • pick_place_object - Complete pick and place operation
  • pick_object - Pick up an object
  • place_object - Place a held object
  • push_object - Push objects (for items too large to grip)
  • move2observation_pose - Position for workspace observation

Object Detection

  • get_detected_objects - List all detected objects
  • get_detected_object - Find object at coordinates
  • get_largest_detected_object - Get biggest object
  • get_smallest_detected_object - Get smallest object
  • get_detected_objects_sorted - Sort objects by size

Workspace

  • get_largest_free_space_with_center - Find free space for placement
  • get_workspace_coordinate_from_point - Get corner/center coordinates
  • get_object_labels_as_string - List recognizable objects
  • add_object_name2object_labels - Add new object type

Feedback

  • speak - Text-to-speech output
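
Each tool is presented to the LLM as a function-calling schema. The sketch below shows what such a schema for pick_place_object might look like in OpenAI-style form; the actual parameter names and descriptions used by the server may differ:

```python
# Hypothetical function-calling schema for one tool; the real
# parameter names in the FastMCP server may differ.
PICK_PLACE_SCHEMA = {
    "type": "function",
    "function": {
        "name": "pick_place_object",
        "description": "Pick up a detected object and place it at a target coordinate.",
        "parameters": {
            "type": "object",
            "properties": {
                "object_label": {"type": "string",
                                 "description": "Label of the object to pick, e.g. 'pencil'"},
                "x": {"type": "number", "description": "Target x coordinate in metres"},
                "y": {"type": "number", "description": "Target y coordinate in metres"},
            },
            "required": ["object_label", "x", "y"],
        },
    },
}

def validate_args(schema, args):
    """Check that all required parameters of a tool call are present."""
    required = schema["function"]["parameters"]["required"]
    return all(k in args for k in required)

print(validate_args(PICK_PLACE_SCHEMA, {"object_label": "pencil", "x": 0.2, "y": 0.1}))
```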

๐Ÿ“Š LLM Provider Comparison

Performance Characteristics

Provider   Function Calling   Speed       Cost        Offline   Best Use Case
OpenAI     ✅ Excellent       Fast        $$          ❌        Production, complex tasks
Groq       ✅ Excellent       Very Fast   Free tier   ❌        Development, prototyping
Gemini     ✅ Excellent       Fast        Free tier   ❌        Long context, multimodal
Ollama     ⚠️ Limited         Variable    Free        ✅        Local testing, privacy

Recommended Models

For Complex Tasks:

# OpenAI - Best reasoning
--api openai --model gpt-4o

# Groq - Fastest inference
--api groq --model moonshotai/kimi-k2-instruct-0905

For Development:

# OpenAI - Fast and cheap
--api openai --model gpt-4o-mini

# Groq - Free and fast
--api groq --model llama-3.3-70b-versatile

For Local/Offline:

# Ollama - No internet required
--api ollama --model llama3.2:1b

๐Ÿ“š Example Tasks

Simple Commands

"What objects do you see?"
"Pick up the pencil and place it at [0.2, 0.1]"
"Move the red cube to the left of the blue square"
"Show me the largest object"

Advanced Tasks

"Sort all objects by size from smallest to largest"
"Arrange objects in a triangle pattern"
"Group objects by color: red on left, blue on right"
"Swap positions of the two largest objects"

Complex Workflows

"Execute: 1) Find all objects 2) Move smallest to [0.15, 0.1]
3) Move largest right of smallest 4) Report positions"

"Organize the workspace: cubes on left, cylinders in middle,
everything else on right, aligned in rows"

๐ŸŽฎ Gradio Web Interface

The web GUI supports all LLM providers:

python robot_gui/mcp_app.py --robot niryo

Features:

  • ๐Ÿ’ฌ Chat with robot using any LLM provider
  • ๐Ÿ“น Live camera feed with object annotations
  • ๐ŸŽค Voice input (Whisper)
  • ๐Ÿ“Š System status monitoring
  • ๐Ÿ”„ Switch LLM providers on-the-fly

โš™๏ธ Configuration

Environment Variables

Create secrets.env:

# Multi-LLM Support - Add any/all of these

# OpenAI (priority if multiple keys present)
OPENAI_API_KEY=sk-xxxxxxxx

# Groq (fast, free tier available)
GROQ_API_KEY=gsk-xxxxxxxx

# Google Gemini
GEMINI_API_KEY=AIzaSy-xxxxxxxx

# Optional: ElevenLabs for better TTS
ELEVENLABS_API_KEY=your_key

API Priority

If multiple API keys are present, the client uses this priority:

  1. OpenAI (if OPENAI_API_KEY set)
  2. Groq (if GROQ_API_KEY set)
  3. Gemini (if GEMINI_API_KEY set)
  4. Ollama (fallback, no key needed)
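
The priority order above can be sketched as a simple environment-variable check. This is an illustrative stand-in, not the actual detection logic, which lives in the LLMClient class of the llm_client repository:

```python
import os

# Sketch of the auto-detection order described above; names mirror the
# environment variables from secrets.env.
PRIORITY = [
    ("openai", "OPENAI_API_KEY"),
    ("groq",   "GROQ_API_KEY"),
    ("gemini", "GEMINI_API_KEY"),
]

def detect_provider(env=os.environ):
    """Return the first provider whose API key is set, else Ollama."""
    for provider, key in PRIORITY:
        if env.get(key):
            return provider
    return "ollama"  # fallback: runs locally, no key needed

# Groq wins over Gemini here because it comes first in the priority list.
print(detect_provider({"GROQ_API_KEY": "gsk-...", "GEMINI_API_KEY": "AIza..."}))
```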

Override with --api flag:

# Force Gemini even if OpenAI key exists
python client/fastmcp_universal_client.py --api gemini

๐Ÿ”ง Development

Running Tests

pytest tests/

Note for Linux servers/CI without display:

# Install Xvfb (virtual framebuffer)
sudo apt-get install xvfb

# Run tests with virtual display
xvfb-run -a pytest tests/

GitHub Actions automatically uses Xvfb on Linux runners.

Code Quality

black .
ruff check .
mypy robot_mcp/

๐Ÿ“– Documentation

๐Ÿค Contributing

See CONTRIBUTING.md.

๐Ÿ“ Notes

Multi-LLM Architecture: This repository uses the LLMClient class from the llm_client repository, providing unified access to multiple LLM providers.

Function Calling Support: OpenAI, Groq, and Gemini all support function calling natively. Ollama has limited support and falls back to text-based instruction following.
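
One common way to implement such a text-based fallback is to ask the model to emit its tool call as a JSON object in plain text and then extract it from the reply. The sketch below illustrates that idea; the actual fallback in LLMClient may work differently:

```python
import json
import re

# Hypothetical text-based fallback for providers without native
# function calling: find the first JSON object in the model's reply.
TOOL_CALL_RE = re.compile(r"\{.*\}", re.DOTALL)

def extract_tool_call(text):
    """Return the first JSON object embedded in `text`, or None."""
    match = TOOL_CALL_RE.search(text)
    if not match:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None

reply = 'Sure! {"name": "speak", "args": {"text": "Task done"}}'
print(extract_tool_call(reply))
```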

Dependencies: Robot Environment, Redis Robot Comm, Text2Speech and Speech2Text are automatically installed from GitHub.

๐Ÿ“„ License

MIT License - see LICENSE for details.

๐Ÿ™ Acknowledgments

๐Ÿ“ง Contact

Daniel Gaida - daniel.gaida@th-koeln.de

Project Link: https://github.com/dgaida/robot_mcp


Made with โค๏ธ for robotic automation
