LLM Task Planner


Natural language control for robotic manipulation through LLM-generated code

Demo

A robotic task planner that translates natural language commands into executable manipulation tasks using large language models and physics simulation.

Overview

The system enables intuitive robot control through conversational commands. Users describe tasks in plain English, which are converted into executable Python code by an LLM, reviewed, and executed in a physics-accurate simulation environment.

Key capabilities:

  • Natural language interface via Streamlit chat
  • Real-time MuJoCo physics simulation with visual feedback
  • Code generation and review workflow for safe execution
  • Robust pick-and-place primitives with error handling
  • Rich perception API optimized for LLM reasoning

Installation

Prerequisites:

  • Python 3.11 or higher
  • uv package manager
  • Ollama with qwen2.5-coder:14b model

Setup:

git clone https://github.com/kalaiselvan-t/llm-task-planner.git
cd llm-task-planner
uv sync
ollama pull qwen2.5-coder:14b

Quick Start

# Start Ollama (if not already running)
ollama serve

# Launch application
uv run streamlit run app.py

The Streamlit interface opens at http://localhost:8501, and a separate MuJoCo viewer window displays the simulation.

Example commands:

  • "Pick up the red cube"
  • "Place the blue cube next to the green cube"
  • "Stack all cubes by size, largest at the bottom"
  • "Move all red objects to the left side of the table"

Workflow:

  1. Enter natural language prompt
  2. Review generated code (regenerate if needed)
  3. Execute and observe in real-time simulation
  4. Iterate with additional commands

API Reference

Core Manipulation

# Pick object by name
sim.pick("red_cube")

# Place at target position [x, y, z]
sim.place([0.5, 0.0, 0.45])

# Return to home configuration
sim.move_to_home()

# Gripper control
sim.open_gripper()
sim.close_gripper()
sim.set_gripper(128)  # 0=closed, 255=open
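These primitives compose directly into complete routines. A minimal sketch, assuming sim exposes the methods shown above (the move_cube helper itself is hypothetical, not part of the API):

```python
def move_cube(sim, name, target):
    """Pick the named object and place it at target [x, y, z].

    Hypothetical helper composing the core primitives documented above;
    sim is any object exposing pick / place / move_to_home.
    """
    sim.pick(name)        # grasp the object by name
    sim.place(target)     # release at the target position
    sim.move_to_home()    # return to a known configuration

# Example: move the red cube to the table center
# move_cube(sim, "red_cube", [0.5, 0.0, 0.45])
```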

Perception

# Query all movable objects
objects = sim.get_graspable_objects()
# Returns: [{"name": str, "pos": [x,y,z], "color": str, "size": float, "shape": str}, ...]

# Find object by properties
obj = sim.find_graspable_object(color="red", shape="cube")

# Get position by name
pos = sim.get_object_position("red_cube")

# Count objects with filters
count = sim.count_graspable_objects(color="red")
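To illustrate how the perception API supports LLM reasoning, here is a sketch of the kind of code the model might generate for "Stack all cubes by size, largest at the bottom". The stack_by_size helper, base position, and 0.05 m layer offset are illustrative assumptions, not project constants:

```python
def stack_by_size(sim, base_pos=(0.5, 0.0, 0.45)):
    """Stack all graspable cubes at base_pos, largest at the bottom.

    Illustrative sketch only; the 0.05 m per-layer offset is a guess.
    """
    cubes = [o for o in sim.get_graspable_objects() if o["shape"] == "cube"]
    cubes.sort(key=lambda o: o["size"], reverse=True)  # largest first
    x, y, z = base_pos
    for i, cube in enumerate(cubes):
        sim.pick(cube["name"])
        sim.place([x, y, z + i * 0.05])  # raise the target for each layer
    sim.move_to_home()
```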

Spatial Reasoning

# Calculate offset from object
target = sim.get_position_offset("red_cube", dx=0.1, dy=0.0, dz=0.05)

# Validate reachability
if sim.is_position_reachable([0.5, 0.2, 0.45]):
    sim.place([0.5, 0.2, 0.45])

# Get workspace bounds
bounds = sim.get_workspace_bounds()
# Returns: {"x": [0.3, 0.8], "y": [-0.3, 0.3], "z": [0.0, 0.8]}
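The offset and reachability helpers combine naturally for relative placement, e.g. "Place the blue cube next to the green cube". A sketch under the API above (the place_next_to helper is hypothetical):

```python
def place_next_to(sim, obj_name, anchor_name, dx=0.1):
    """Place obj_name beside anchor_name, offset dx metres along x.

    Returns False without picking anything when the computed target
    lies outside the reachable workspace.
    """
    target = sim.get_position_offset(anchor_name, dx=dx, dy=0.0, dz=0.0)
    if not sim.is_position_reachable(target):
        return False
    sim.pick(obj_name)
    sim.place(target)
    return True
```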

Architecture

The system consists of two threads synchronized via event-based communication:

Main Thread (Streamlit UI):

  • Chat interface for user interaction
  • LLM code generation and streaming
  • Code review and execution control

Background Thread (MuJoCo Viewer):

  • Persistent simulation loop
  • Task execution in viewer context
  • Real-time visualization with wall-clock sync

Synchronization: TaskExecutor mediates between threads using threading.Event for simple, blocking task submission without queues.
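A minimal sketch of this event-based hand-off pattern (illustrative only; the project's actual TaskExecutor may differ):

```python
import threading

class TaskExecutor:
    """Blocking task hand-off between two threads via threading.Event.

    The UI thread calls submit() and blocks until the task finishes;
    the viewer thread calls poll() inside its simulation loop and runs
    any pending task in its own context. No queue is needed because
    only one task is in flight at a time.
    """

    def __init__(self):
        self._task = None
        self._result = None
        self._pending = threading.Event()  # set: a task is waiting
        self._done = threading.Event()     # set: the task has finished

    def submit(self, fn):
        """Called from the UI thread; blocks until fn has run."""
        self._task = fn
        self._done.clear()
        self._pending.set()
        self._done.wait()
        return self._result

    def poll(self):
        """Called from the viewer loop; runs a pending task, if any."""
        if self._pending.is_set():
            self._pending.clear()
            self._result = self._task()
            self._done.set()
```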

Project Structure

llm-task-planner/
├── src/llm_task_planner/
│   ├── __main__.py              # UI, viewer thread, task executor
│   ├── core/
│   │   ├── simulator.py         # Main simulator (combines mixins)
│   │   ├── perception.py        # Object queries and spatial reasoning
│   │   ├── motion.py            # Inverse kinematics and gripper control
│   │   ├── tasks.py             # Pick-and-place primitives
│   │   └── llm_interface.py     # Ollama integration
│   └── config/
│       └── simulation_params.yaml
├── franka_emika_panda/
│   └── panda_with_table.xml     # Scene definition with objects
├── scripts/
│   └── spawn_cubes.py           # Cube object generator
└── tests/

Configuration

Workspace bounds (defined in perception.py):

  • X: 0.3 to 0.8 m | Y: -0.3 to 0.3 m | Z: 0.0 to 0.8 m
  • Table surface height: 0.165 m

Task execution constants (defined in tasks.py):

  • Approach height: 0.1m
  • Place height: 0.01m
  • Motion timeout: 2000 steps (~4s)

LLM configuration:

llm = LLM(model="qwen2.5-coder:14b", base_url="http://localhost:11434")

Troubleshooting

Ollama connection error:

ollama serve  # Ensure Ollama is running

Robot not responding:

  • Verify object names are exact matches (case-sensitive)
  • Check positions are within workspace bounds
  • Ensure MuJoCo viewer window is responsive

Invalid generated code:

  • Click "Regenerate" for an alternative solution
  • Check Ollama model is loaded: ollama list

Development

Direct API usage example:

from llm_task_planner.core.simulator import Simulator

sim = Simulator()
sim.move_to_home()

# Test perception
objects = sim.get_graspable_objects()
print(f"Found {len(objects)} objects")

# Test manipulation
red_cube = sim.find_graspable_object(color="red")
if red_cube:
    sim.pick(red_cube["name"])
    sim.place([0.5, 0.0, 0.20])

License

MIT
