ghostloop

The agent loop, embodied.

A tool-using agent runtime, fail-closed safety pipeline, statistically-rigorous bench harness, and sim-first execution layer for embodied AI. Sister project to GhostLM.

Now live: pip install ghostloop · interactive demo · 11 releases · 333 tests · MIT.

Why this exists

Robotics in 2026 has two healthy ecosystems and a missing middle.

  • ROS 2 gives you middleware: a message bus, lifecycle management, drivers, navigation. It does not care about LLMs, agents, or modern eval methodology.
  • VLA models (Open-X-Embodiment, OpenVLA, π0, RT-2) give you policies: vision-and-language conditioned action heads. They mostly live in research codebases that ship the model weights but not the runtime.

Nobody ships the layer in between: a runtime where a model emits high-level intents like move_to(0.4, 0.2, 0.1) or pick("widget-7"), those intents flow through a fail-closed safety pipeline, the survivors execute on a backend (sim or hardware), and every step is captured in a structured trace that can be replayed, audited, scored, mined, counterfactually re-played, or causally analysed.

That layer is ghostloop. The shape is borrowed from GhostAgent in GhostLM: tool registry, policy gates, structured trace, paired-comparison eval. The novel piece is threefold: binding that shape to robot primitives instead of CVE lookups; making the runtime backend-agnostic, so the same agent loop drives MockBackend, the MuJoCo / PyBullet / Gymnasium simulators, and ROS 2 (through ROS2Backend) today, with direct hardware drivers later; and adding a layer of post-hoc analysis tooling (counterfactual replay, causal attribution, LLM-as-judge, property mining, adversarial fuzzing) that no other robotics framework ships.

Architecture

                policy        registry           pipeline           backend         post-hoc
                 emits         resolves           gates           executes         analysis
   user goal  ┌────────┐   ┌──────────┐   ┌──────────────┐   ┌──────────┐   ┌─────────────────┐
   ────────► │ Intent │ ► │ Primitive │ ► │PolicyPipeline│ ► │ Backend  │ ► │ counterfactual  │
              └────────┘   └──────────┘   └──────────────┘   └──────────┘   │ causal          │
                                                                  │         │ LLM-judge       │
                                                                  ▼         │ property mining │
                                              ┌──────────────────────┐      │ adversarial     │
                                              │  Trace (JSONL)       │ ───► │ trace query DSL │
                                              └──────────────────────┘      │ energy ledger   │
                                                       │                    └─────────────────┘
                                                       ▼
                                  ┌────────────────────────────────────┐
                                  │ Bench: Wilson CI + McNemar +       │
                                  │ Cohen's h + Sim2Real transfer gap  │
                                  └────────────────────────────────────┘
| Type | Role |
|---|---|
| Intent | High-level structured command emitted by a policy: name, args, rationale. |
| Primitive | Backend-bound callable. Has a name, description, arg schema (LLM-tool-card friendly). |
| PolicyPipeline | Ordered list of PolicyGates. Fail-closed: any deny short-circuits. |
| Backend | Execution adapter. MockBackend / MuJoCoBackend / PyBulletBackend / GymnasiumBackend / ROS2Backend / RandomizedBackend. |
| Trace | Append-only event log with state_before / state_after / decision / result per step. JSONL writer + replay + query DSL. |
| LLMPolicy / VLAPolicy | Bridge any OpenAI-compatible chat endpoint or VLA action head to the registry. |
| Mission | DAG of Steps with prerequisites + retry semantics. Kahn-validated. |
| bench | Episode harness with Wilson 95% CIs, McNemar exact p, Cohen's h, paired comparison, sim2real transfer gap. |
| properties | Declarative invariants over traces: Always / Eventually / Until STL combinators + auto-mined candidates. |
| judges | LLM-as-judge + heuristic rule-based trace scoring. |
| training | Constrained-MDP rollout collector + Lagrangian multiplier + HER relabeling. |

What ships in v0.10.0

70+ modules across ten releases. Highlights:

Core runtime

13 abstractions in core.py (Intent / Primitive / Result / Decision / PolicyGate / PolicyPipeline / Backend / MockBackend / TraceEvent / Trace / Runtime / Registry / DecisionAction). async_runtime.py mirrors them with awaitable gates + a control_loop(rate_hz).

Policy gates (12)

DenyListGate, RateLimitGate, GeofenceGate, ForceCapGate, HumanInTheLoopGate, ObstacleAvoidanceGate, RetryPolicy, CooldownGate, TimeWindowGate, ActionSmoothingGate (velocity / acceleration limits), plus the LLMPolicy and VLAPolicy adapters. All fail-closed.
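
To make "fail-closed" concrete, here is a minimal standalone sketch of a gate pipeline in which the first deny short-circuits and a gate that raises is treated as a deny. It does not use ghostloop's classes: `Verdict`, `deny_list_gate`, `geofence_gate`, and `evaluate` are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Verdict:
    """Outcome of one gate (hypothetical stand-in, not ghostloop's Decision)."""
    allowed: bool
    gate: str
    reason: str


Gate = Callable[[dict], Verdict]


def deny_list_gate(denied: set) -> Gate:
    def check(intent: dict) -> Verdict:
        if intent["name"] in denied:
            return Verdict(False, "deny_list", f"{intent['name']} is on the deny list")
        return Verdict(True, "deny_list", "ok")
    return check


def geofence_gate(lo: tuple, hi: tuple) -> Gate:
    def check(intent: dict) -> Verdict:
        args = intent.get("args", {})
        if not all(axis in args for axis in ("x", "y", "z")):
            return Verdict(True, "geofence", "intent has no xyz target")
        point = (args["x"], args["y"], args["z"])
        inside = all(l <= p <= h for p, l, h in zip(point, lo, hi))
        return Verdict(inside, "geofence", "ok" if inside else f"{point} is outside the fence")
    return check


def evaluate(gates: Sequence[Gate], intent: dict) -> Verdict:
    """Run gates in order; the first deny wins, and a crashing gate counts as a deny."""
    for gate in gates:
        try:
            verdict = gate(intent)
        except Exception as exc:  # fail closed: never execute on a broken check
            return Verdict(False, "error", f"gate raised {exc!r}")
        if not verdict.allowed:
            return verdict
    return Verdict(True, "pipeline", "all gates allowed")


gates = [deny_list_gate({"self_destruct"}), geofence_gate((-1, -1, 0), (1, 1, 1))]
ok = evaluate(gates, {"name": "move_to", "args": {"x": 0.4, "y": 0.2, "z": 0.1}})
blocked = evaluate(gates, {"name": "move_to", "args": {"x": 2.0, "y": 0.2, "z": 0.1}})
print(ok.allowed, blocked.allowed, blocked.gate)
```

The design point is that the only path to execution is every gate returning an explicit allow; a malformed intent or a buggy gate ends in a deny rather than a silent pass.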

Backends (6)

  • MockBackend — zero-install in-memory.
  • MuJoCoBackend — Google DeepMind MuJoCo with Menagerie auto-clone (Franka / UR5e / Spot / Stretch / Aloha / Allegro).
  • PyBulletBackend — Bullet physics for users without MuJoCo.
  • GymnasiumBackend — wrap any Farama Gymnasium env (hundreds of robotics + RL envs).
  • ROS2Backend — rclpy adapter for real-hardware deployments via DDS.
  • RandomizedBackend — wrap any backend with reproducible noise / jitter / dropout for sim2real.

Workspace + geometry

WorkspaceModel with axis-aligned boxes + spheres, HalfSpace / ConvexPolytope / signed_distance for SDF queries, workspace_from_urdf(...) to auto-build from a URDF, plus project_to_workspace / project_to_sdf for safe-action repair when a policy violates constraints.
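
For an axis-aligned box, both the signed-distance query and the safe-action repair reduce to a few lines. The sketch below is a standalone illustration of the geometry only; `signed_distance_to_box` and `project_to_box` are hypothetical names, and ghostloop's own `signed_distance` / `project_to_workspace` may differ in signature and behaviour.

```python
import math


def signed_distance_to_box(point, lo, hi):
    """Signed distance to an axis-aligned box: negative inside, zero on the surface."""
    center = [(l + h) / 2 for l, h in zip(lo, hi)]
    half = [(h - l) / 2 for l, h in zip(lo, hi)]
    q = [abs(p - c) - e for p, c, e in zip(point, center, half)]
    outside = math.sqrt(sum(max(v, 0.0) ** 2 for v in q))
    inside = min(max(q), 0.0)
    return outside + inside


def project_to_box(point, lo, hi):
    """Repair an out-of-bounds target by clamping each coordinate into the box."""
    return tuple(min(max(p, l), h) for p, l, h in zip(point, lo, hi))


lo, hi = (-1.0, -1.0, 0.0), (1.0, 1.0, 1.0)
target = (2.0, 0.0, 0.5)
print(signed_distance_to_box(target, lo, hi))  # positive: the target is outside
print(project_to_box(target, lo, hi))          # nearest point inside the box
```

Clamping gives the closest point of the box in the Euclidean sense, which is why it is a natural repair for a box-shaped workspace; general convex polytopes need a proper projection step instead.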

Bench harness

  • Episode / EpisodeRunner / RunReport with Wilson 95% CIs.
  • paired_compare — McNemar exact p + Cohen's h.
  • Sim2RealBench — paired transfer-gap harness with per-primitive action-distribution KL.
  • random_seeds / grid_seeds / cma_es_seeds — adversarial fuzzers for finding failure-prone Episode initial states.
  • RewardShaper — declarative reward DSL (OnPrimitive / OnDecision / OnObservation / StepCost / CustomReward).
  • Episode catalogue: preset_reach_8 / preset_pick_and_place_4 / preset_geofence_smoke.
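
The three statistics behind the paired comparison have compact closed forms. The following is a standalone, standard-library sketch of them, not ghostloop's implementation; the function names and argument conventions are assumptions made for this example.

```python
import math


def wilson_ci(successes: int, n: int, z: float = 1.959963984540054):
    """Wilson score interval for a binomial proportion (z defaults to the 95% level)."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, center - half), min(1.0, center + half))


def mcnemar_exact_p(b: int, c: int) -> float:
    """Two-sided exact McNemar test on the discordant pair counts b and c.

    b = episodes only policy A passed, c = episodes only policy B passed.
    Under the null hypothesis each discordant pair is a fair coin flip.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def cohens_h(p1: float, p2: float) -> float:
    """Effect size between two proportions via the arcsine transform."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


low, high = wilson_ci(8, 10)
print(f"8/10 successes: 95% CI [{low:.4f}, {high:.4f}]")
print(f"McNemar exact p for b=1, c=7: {mcnemar_exact_p(1, 7):.4f}")
print(f"Cohen's h for 0.8 vs 0.5: {cohens_h(0.8, 0.5):.3f}")
```

McNemar only looks at episodes where the two policies disagree, which is what makes it appropriate for paired runs on the same episode set.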

Properties + verification

  • PropertyEngine with built-in invariants (StaysInsideWorkspace, NeverHoldsTwoObjects, NeverExceedsRate, NoConsecutiveDuplicateIntents).
  • Always / Eventually / Until STL combinators over sliding windows.
  • AndProperty / OrProperty / NotProperty boolean combinators.
  • mine_properties(traces) — auto-discover candidate invariants from a corpus (followup transitions, numeric bounds, workspace AABBs).
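
As a rough illustration of what bounded temporal combinators check, the sketch below evaluates Always / Eventually / Until over a window of a finite list of states. It is a simplified boolean reading written for this example; the names, the `window` parameter, and the dict-shaped states are assumptions, not ghostloop's property API.

```python
from typing import Callable, Optional, Sequence

State = dict
Predicate = Callable[[State], bool]


def _window(trace: Sequence[State], start: int, window: Optional[int]):
    end = len(trace) if window is None else min(len(trace), start + window)
    return trace[start:end]


def always(pred: Predicate, trace, start: int = 0, window: Optional[int] = None) -> bool:
    """pred holds at every step in the window."""
    return all(pred(s) for s in _window(trace, start, window))


def eventually(pred: Predicate, trace, start: int = 0, window: Optional[int] = None) -> bool:
    """pred holds at some step in the window."""
    return any(pred(s) for s in _window(trace, start, window))


def until(hold: Predicate, release: Predicate, trace, start: int = 0,
          window: Optional[int] = None) -> bool:
    """hold stays true until release becomes true; release must occur in the window."""
    for state in _window(trace, start, window):
        if release(state):
            return True
        if not hold(state):
            return False
    return False


states = [
    {"z": 0.5, "holding": False},
    {"z": 0.3, "holding": False},
    {"z": 0.1, "holding": True},
    {"z": 0.4, "holding": True},
]
print(always(lambda s: s["z"] >= 0.0, states))
print(eventually(lambda s: s["holding"], states, window=2))
print(until(lambda s: not s["holding"], lambda s: s["z"] <= 0.1, states))
```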

Post-hoc analysis (the v0.10 novel pillars)

  • replay_with_policy(trace, new_policy) — counterfactual reasoning. "What would policy B have done on policy A's trace?"
  • attribute_failure(trace, property) — leave-one-out causal attribution; ranks events by necessity.
  • minimal_cause_set — greedy multi-event attribution.
  • LLMJudge — score traces with an LLM against a configurable rubric.
  • HeuristicJudge — rule-based predicate scoring for air-gapped CI.
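
The idea behind leave-one-out attribution can be shown on a toy one-dimensional trace: drop each event in turn, re-simulate, and see whether the violation disappears. The sketch below is self-contained and only illustrates the technique; the 1-D displacement model and every function name are assumptions for this example, not ghostloop's `attribute_failure` or `minimal_cause_set`.

```python
def peak_excursion(deltas):
    """Largest distance from the origin reached while applying the moves in order."""
    position, peak = 0.0, 0.0
    for delta in deltas:
        position += delta
        peak = max(peak, abs(position))
    return peak


def violates(deltas, limit=1.0):
    return peak_excursion(deltas) > limit


def leave_one_out(deltas, limit=1.0):
    """Indices whose removal alone makes the violation disappear (necessary events)."""
    if not violates(deltas, limit):
        return []
    return [i for i in range(len(deltas))
            if not violates(deltas[:i] + deltas[i + 1:], limit)]


def greedy_cause_set(deltas, limit=1.0, max_set_size=3):
    """Greedily remove the event that lowers the peak most until the trace is safe."""
    remaining = list(enumerate(deltas))
    removed = []
    while violates([d for _, d in remaining], limit) and len(removed) < max_set_size:
        best = min(
            range(len(remaining)),
            key=lambda j: peak_excursion([d for k, (_, d) in enumerate(remaining) if k != j]),
        )
        removed.append(remaining.pop(best)[0])
    if violates([d for _, d in remaining], limit):
        return None  # no fix found within the size budget
    return sorted(removed)


trace = [0.4, 0.5, 0.3, -0.2]      # the running sum exceeds 1.0 at the third move
print(leave_one_out(trace))         # single events that are each necessary
print(greedy_cause_set([0.6] * 4))  # no single event suffices, so a set is needed
```

The second call shows why a multi-event search is useful: when several moves jointly cause the failure, leave-one-out alone finds nothing.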

Missions + skills

  • Mission / Step / MissionRunner — DAG of steps with prerequisites, retry semantics, required-vs-optional.
  • SkillGraph — typed DAG of skills with prereq + refines edges.
  • MorphologyRegistry — register pick per (franka, ur5e, spot) and build per robot.
  • composite_primitive — sequence existing primitives behind one name.
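
Validating a DAG of steps with prerequisites is typically done with Kahn's algorithm, which yields an execution order and detects cycles in the same pass. The sketch below is a standalone version over a plain dict; the `kahn_order` name and the dict-of-lists input shape are assumptions for illustration, not the Mission or SkillGraph API.

```python
from collections import deque


def kahn_order(prereqs: dict) -> list:
    """Return a valid execution order, or raise ValueError on a cycle or unknown step.

    `prereqs` maps each step name to the list of steps that must run before it.
    """
    indegree = {step: 0 for step in prereqs}
    dependents = {step: [] for step in prereqs}
    for step, needs in prereqs.items():
        for need in needs:
            if need not in prereqs:
                raise ValueError(f"{step!r} depends on unknown step {need!r}")
            indegree[step] += 1
            dependents[need].append(step)

    ready = deque(step for step, degree in indegree.items() if degree == 0)
    order = []
    while ready:
        step = ready.popleft()
        order.append(step)
        for nxt in dependents[step]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(prereqs):
        raise ValueError("cycle detected among steps")
    return order


mission = {"move_to": [], "scan": ["move_to"], "pick": ["scan"], "place": ["pick"]}
print(kahn_order(mission))
```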

Training (constrained-MDP + HER)

  • SafeRolloutCollector + LagrangianMultiplier + train_safe — train policies under the safety pipeline; safety violations contribute to a Lagrangian penalty.
  • hindsight_relabel(rollout, goal_extractor, reward_fn) — classic HER (Andrychowicz et al. 2017) with FINAL / FUTURE / EPISODE / RANDOM strategies.
  • sparse_indicator_reward(threshold) — canonical HER reward.
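
Two of the mechanisms named here are small enough to sketch directly: hindsight relabeling with the FINAL strategy, and the dual-ascent update of a Lagrangian multiplier on the safety cost. This is a standalone illustration under assumed data shapes (transitions as dicts with `achieved` and `goal` keys); the function names and defaults are hypothetical and do not reproduce ghostloop's signatures.

```python
import math


def sparse_reward(achieved, goal, threshold=0.05):
    """HER-style sparse reward: 0 when within `threshold` of the goal, otherwise -1."""
    return 0.0 if math.dist(achieved, goal) <= threshold else -1.0


def relabel_final(rollout, threshold=0.05):
    """FINAL strategy: pretend the last achieved state was the goal all along."""
    final_goal = rollout[-1]["achieved"]
    return [
        {**t, "goal": final_goal, "reward": sparse_reward(t["achieved"], final_goal, threshold)}
        for t in rollout
    ]


def lagrangian_step(lmbda, mean_cost, cost_limit, lr=0.1):
    """Dual ascent: raise the penalty weight when cost exceeds the limit, never below 0."""
    return max(0.0, lmbda + lr * (mean_cost - cost_limit))


rollout = [
    {"achieved": (0.0, 0.0), "goal": (2.0, 0.0)},
    {"achieved": (0.5, 0.0), "goal": (2.0, 0.0)},
    {"achieved": (1.0, 0.0), "goal": (2.0, 0.0)},
]
relabeled = relabel_final(rollout)
print([t["reward"] for t in relabeled])

lmbda = lagrangian_step(0.0, mean_cost=0.3, cost_limit=0.1)
print(lmbda)
```

The relabeled rollout turns a failed episode into a successful one for a different goal, which is what lets sparse-reward training make progress.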

Telemetry + persistence

  • OpenTelemetry hooks (step_span, record_decision, record_result).
  • EnergyLedger — per-primitive joule accounting with constant / linear-in-arg / linear-in-duration / linear-in-xyz models.
  • GhostloopStore — SQLite store for episodes / runs / comparisons.
  • Trace.write_jsonl() + load_trace() + iter_events() + summarize_trace().
  • query(trace, expr) — small DSL over traces (comparison ops + and/or/not/in).
  • diff_traces(a, b) — structured diff for ablation studies.
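
A JSONL trace is simply one JSON object per line, which is what makes it easy to append to, stream, and replay. The sketch below round-trips a two-event trace through an in-memory buffer using the per-step field names listed earlier (state_before / state_after / decision / result); the helper names and the exact event layout are assumptions for illustration, not ghostloop's on-disk schema.

```python
import io
import json
from collections import Counter


def write_jsonl(events, fp):
    """Append-friendly format: one JSON document per line."""
    for event in events:
        fp.write(json.dumps(event, sort_keys=True) + "\n")


def read_jsonl(fp):
    return [json.loads(line) for line in fp if line.strip()]


def summarize(events):
    """Count events per decision, e.g. how many intents were allowed vs denied."""
    return dict(Counter(event["decision"] for event in events))


events = [
    {"step": 0, "intent": {"name": "move_to", "args": {"x": 0.4}},
     "state_before": {"x": 0.0}, "state_after": {"x": 0.4},
     "decision": "allow", "result": "ok"},
    {"step": 1, "intent": {"name": "move_to", "args": {"x": 2.0}},
     "state_before": {"x": 0.4}, "state_after": {"x": 0.4},
     "decision": "deny", "result": None},
]

buffer = io.StringIO()
write_jsonl(events, buffer)
buffer.seek(0)
restored = read_jsonl(buffer)
print(summarize(restored))
```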

Fleet + dashboard

  • RobotHandle / FleetRegistry / FleetDispatcher (FIRST_IDLE / ROUND_ROBIN / LEAST_BUSY).
  • create_dashboard_app(store, fleet) — read-only FastAPI surface over the SQLite store.
  • StreamManager + attach_streaming(app) — WebSocket trace streaming with bounded ring buffers.
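
The three dispatch strategies differ only in how the next robot is chosen. The sketch below shows one plausible reading of each; the `Robot` and `Dispatcher` classes are hypothetical stand-ins written for this example, and the real FleetDispatcher may define idleness and load differently.

```python
from dataclasses import dataclass


@dataclass
class Robot:
    name: str
    queued: int = 0  # number of jobs waiting on this robot


class Dispatcher:
    def __init__(self, robots, strategy="FIRST_IDLE"):
        self.robots = list(robots)
        self.strategy = strategy
        self._cursor = 0

    def pick(self):
        if not self.robots:
            return None
        if self.strategy == "FIRST_IDLE":
            # First robot with an empty queue, or None if all are busy.
            return next((r for r in self.robots if r.queued == 0), None)
        if self.strategy == "ROUND_ROBIN":
            robot = self.robots[self._cursor % len(self.robots)]
            self._cursor += 1
            return robot
        if self.strategy == "LEAST_BUSY":
            return min(self.robots, key=lambda r: r.queued)
        raise ValueError(f"unknown strategy {self.strategy!r}")


def make_fleet():
    return [Robot("a", queued=2), Robot("b", queued=0), Robot("c", queued=1)]


print(Dispatcher(make_fleet(), "FIRST_IDLE").pick().name)
print(Dispatcher(make_fleet(), "LEAST_BUSY").pick().name)
rr = Dispatcher(make_fleet(), "ROUND_ROBIN")
print([rr.pick().name for _ in range(4)])
```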

MCP + LLM integration

  • mcp_server.py exposes the runtime as a FastMCP server so Claude Desktop / Cursor / any MCP client can drive a robot through the safety pipeline.
  • LLMPolicy (closed-loop) and LLMPlanner (single-shot full-plan emission).

Setup

There are five ways to run ghostloop, in order of effort. Start with the zero-install path, prove the safety pipeline, then promote to your real arm.

1. Zero-install (3 minutes)

git clone https://github.com/joemunene-by/ghostloop
cd ghostloop

# Run the canonical pick-and-place demo on MockBackend.
PYTHONPATH=. python3 examples/pick_and_place.py

# Run a paired-comparison bench (Wilson CI + McNemar + Cohen's h).
PYTHONPATH=. python3 examples/bench_with_without_geofence.py

# Full test suite — 314 pass, 8 live-gated (skip cleanly without extras).
PYTHONPATH=. python3 -m pytest tests/

No dependencies beyond Python 3.10+. This proves the runtime, the safety pipeline, the bench harness, and the trace recorder — exactly the same code you'll point at a real arm later.

2. Drive any robot from any chat client over MCP (10 minutes)

ghostloop ships a single MCP server (examples/mcp_robot.py) that works with every MCP-aware client — the protocol is universal, so the same server speaks to Claude Desktop, Cursor, Continue, Cline, Zed, Gemini CLI, and any future client. Pick what you control via GHOSTLOOP_PROFILE:

| Profile | Robot | Primitives exposed |
|---|---|---|
| franka_arm (default) | Franka Panda 7-DOF arm | set_joint, set_gripper, sense, take_photo, … |
| turtlebot | TurtleBot mobile base | drive, stop, goto, rotate, … |
| spot | Boston Dynamics Spot quadruped | walk_to, sit, stand, lie_down, … |
| tello | DJI Tello / quadcopter | takeoff, land, fly_to, hover, … |
| stretch | Hello Robot Stretch RE3 (mobile arm) | drive, set_joint, set_gripper, … |
| humanoid_demo | Stationary humanoid | wave, look_at, point_at, nod |
| <path/to/your.yaml> | Your robot | whatever you declare |

Each preset bundles morphology-appropriate primitives, conservative workspace + force + velocity caps, HITL on the dangerous primitives, and a robot-specific instructions block the LLM gets as a system prompt. See examples/custom_robot.yaml for the YAML schema and examples/custom_robot_primitives.py for how to plug in your own actions (dispense_pill, alert_nurse, whatever your hardware does) without forking ghostloop.

Two transports, picked via GHOSTLOOP_TRANSPORT:

| Transport | When to use | Clients |
|---|---|---|
| stdio (default) | desktop, same machine | Claude Desktop, Cursor, Continue, Cline, Zed, Gemini CLI |
| streamable-http | remote, mobile, browser, kiosk | any client supporting remote MCP servers |

Step 1. Verify the example boots (any OS, any profile):

# Default (Franka arm)
python3 examples/mcp_robot.py --selfcheck

# Quadruped
GHOSTLOOP_PROFILE=spot python3 examples/mcp_robot.py --selfcheck

# Drone
GHOSTLOOP_PROFILE=tello python3 examples/mcp_robot.py --selfcheck

# Custom YAML
GHOSTLOOP_PROFILE=examples/custom_robot.yaml python3 examples/mcp_robot.py --selfcheck

Step 2. Install the MCP transport package:

pip install ghostloop[mcp]      # or: pip install mcp

Step 3. Wire it into your client. The same { command, args, env } block works for every desktop MCP client — only the path to the config file differs:

| Client | macOS | Windows | Linux |
|---|---|---|---|
| Claude Desktop | ~/Library/Application Support/Claude/claude_desktop_config.json | %APPDATA%\Claude\claude_desktop_config.json | ~/.config/Claude/claude_desktop_config.json |
| Cursor | ~/.cursor/mcp.json (or project-local .cursor/mcp.json) | %USERPROFILE%\.cursor\mcp.json | ~/.cursor/mcp.json |
| Continue | ~/.continue/config.yaml (under mcpServers:) | same | same |
| Cline | VS Code settings.json (under cline.mcpServers) | same | same |
| Zed | ~/.config/zed/settings.json (under context_servers) | same | same |
| Gemini CLI | ~/.gemini/settings.json (under mcpServers) | same | same |

Paste this block into the config file (replace the absolute path; pick a profile that matches your robot):

{
  "mcpServers": {
    "ghostloop": {
      "command": "python3",
      "args": ["/absolute/path/to/ghostloop/examples/mcp_robot.py"],
      "env": {
        "GHOSTLOOP_PROFILE": "franka_arm",
        "GHOSTLOOP_BACKEND": "mock",
        "GHOSTLOOP_TRANSPORT": "stdio",
        "GHOSTLOOP_INSTRUCTIONS": "Optional: extra robot-specific guidance appended to the profile's instructions block."
      }
    }
  }
}

💡 On Windows, swap python3 for python (or the absolute path to your interpreter, e.g. C:\Python313\python.exe). On macOS, use python3 from Homebrew or pyenv. Continue + Zed use YAML / JSONC respectively but the field shape is identical.

Step 4. Restart the client. New conversations get the tools: list_primitives, step, move_to(x, y, z), pick(object_id), place(), scan(radius), state, recent_trace(n). Try: "Use the ghostloop tools to move to (0.4, 0.0, 0.5), then scan with radius 0.3, then move to (0.6, 0.2, 0.5)." Watch the geofence reject targets outside [-0.6, 0.6].

Upgrade from mock to real physics (MuJoCo) by changing two env vars, the backend and the model path:

"env": {
  "GHOSTLOOP_BACKEND": "mujoco",
  "GHOSTLOOP_MUJOCO_MODEL": "/absolute/path/to/franka_panda.xml"
}

(pip install ghostloop[mujoco] first.)

Upgrade to a real arm via ROS 2:

"env": {
  "GHOSTLOOP_BACKEND": "ros2",
  "GHOSTLOOP_ROS2_NODE": "ghostloop_arm",
  "GHOSTLOOP_CMD_VEL": "/franka/cmd",
  "GHOSTLOOP_JOINT_STATES": "/franka/joint_states",
  "GHOSTLOOP_FORCE_TORQUE": "/franka/wrench"
}

Prerequisites: ROS 2 (apt install ros-humble-desktop on Ubuntu, the Robotology Mac install on macOS, WSL2 + Ubuntu on Windows), your arm's ROS 2 driver running, and source /opt/ros/humble/setup.bash (or setup.zsh / setup.ps1) in the same shell that launches the client so the subprocess inherits $AMENT_PREFIX_PATH.

Before pointing this at a real robot: edit your profile (or copy a preset to YAML and tweak it) — set workspace_bounds / max_force_n / max_velocity / max_acceleration to your hardware's safe envelope, list dangerous primitives under hitl_primitives so the operator approves each call interactively, and write robot-specific guidance into the instructions: block (e.g. "never reach behind the base", "battery below 20% triggers automatic land"). Read the trace logs for the first dozen episodes; relax HITL only after you trust the model's behaviour.

Define your own robot

Two ways to add a robot ghostloop doesn't already know about:

A. YAML profile (no Python required) — copy examples/custom_robot.yaml, edit it, and point GHOSTLOOP_PROFILE at the path. The schema covers categories of standard primitives, your own custom primitives, composite macros, instructions for the LLM, workspace + force + velocity caps, denied / HITL operations, and the backend kind. The shipped sample defines a hospital medication-delivery robot — mobile base + arm, with custom dispense_pill and alert_nurse primitives and a deliver_room macro composed from existing primitives:

name: medbot_floor3
morphology: mobile_arm
categories: [mobile_base, dexterous, sensing, generic]
instructions: |
  You are MedBot, hospital floor-3 medication delivery. NEVER drive faster
  than 0.4 m/s. ALWAYS stop before extending the arm. ...
workspace_bounds: [[-15, -15, 0], [15, 15, 1.6]]
max_velocity: 0.4
hitl_primitives: [set_gripper, dispense_pill]
custom_primitives:
  - module: examples.custom_robot_primitives
    factory: dispense_pill
  - module: examples.custom_robot_primitives
    factory: alert_nurse
composites:
  - name: deliver_room
    steps: [goto, take_photo, dispense_pill, alert_nurse]
backend:
  kind: ros2
  kwargs: { node_name: medbot, cmd_vel_topic: /medbot/cmd_vel }

B. Code — build a RobotProfile programmatically. Useful when your robot needs runtime state (a calibration matrix, a credential, dynamically-resolved topic names) that doesn't fit YAML:

from ghostloop.profiles import RobotProfile, build_runtime_from_profile
from ghostloop.primitives import drive, set_gripper
from my_robot.primitives import dispense_pill, alert_nurse

profile = RobotProfile(
    name="medbot",
    morphology="mobile_arm",
    primitives=[drive(), set_gripper(), dispense_pill(), alert_nurse()],
    instructions="You are MedBot...",
    workspace_bounds=((-15, -15, 0), (15, 15, 1.6)),
    max_velocity=0.4,
    hitl_primitives=["dispense_pill"],
    backend_kind="ros2",
    backend_kwargs={"node_name": "medbot", "cmd_vel_topic": "/medbot/cmd_vel"},
)
runtime = build_runtime_from_profile(profile)

Custom Primitive factories follow a stable contract: a function returning Primitive(name, call, description, arg_schema). The call body talks to your hardware however you need it to — ROS 2 publisher, vendor SDK, raw serial, REST endpoint. See examples/custom_robot_primitives.py for two worked examples (dispense_pill, alert_nurse).
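
The factory contract described above can be sketched without any hardware. In the example below, `Primitive` is a local stand-in dataclass with the four fields named in the text, and the convention that `call` takes a dict of arguments and returns a dict is an assumption made for this illustration; check examples/custom_robot_primitives.py for the real call signature.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Primitive:
    """Local stand-in with the four fields named in the README (not ghostloop's class)."""
    name: str
    call: Callable[[dict], dict]
    description: str
    arg_schema: dict = field(default_factory=dict)


def dispense_pill(max_count: int = 2) -> Primitive:
    """Factory: configuration is captured in the closure, the result is a Primitive."""

    def call(args: dict) -> dict:
        count = int(args["count"])
        if not 1 <= count <= max_count:
            return {"ok": False, "error": f"count must be between 1 and {max_count}"}
        # Real hardware I/O (ROS 2 publisher, vendor SDK, serial, REST) would go here.
        return {"ok": True, "dispensed": count, "slot": args["slot"]}

    return Primitive(
        name="dispense_pill",
        call=call,
        description="Dispense a number of pills from a numbered slot.",
        arg_schema={
            "type": "object",
            "properties": {
                "slot": {"type": "integer"},
                "count": {"type": "integer", "minimum": 1, "maximum": max_count},
            },
            "required": ["slot", "count"],
        },
    )


primitive = dispense_pill()
print(primitive.name, primitive.call({"slot": 3, "count": 1}))
print(primitive.call({"slot": 3, "count": 5}))
```

Validating arguments inside `call` is a second line of defence; the arg schema tells the model what is legal, but the primitive should still refuse out-of-range values on its own.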

3. Mobile + remote MCP (HTTP transport)

For mobile chat apps (and any client that doesn't run on the same machine as the robot), run ghostloop's MCP server as a long-running HTTP service on the robot host:

# macOS / Linux
GHOSTLOOP_BACKEND=mock GHOSTLOOP_TRANSPORT=streamable-http \
GHOSTLOOP_HOST=0.0.0.0 GHOSTLOOP_PORT=8765 \
  python3 examples/claude_desktop_mcp_arm.py

# Windows PowerShell
$env:GHOSTLOOP_BACKEND='mock'; $env:GHOSTLOOP_TRANSPORT='streamable-http'; $env:GHOSTLOOP_HOST='0.0.0.0'; $env:GHOSTLOOP_PORT='8765'
python examples\claude_desktop_mcp_arm.py

Then configure remote-MCP-capable clients with the URL form (no command/args):

{
  "mcpServers": {
    "ghostloop": { "url": "http://your-robot-host.local:8765/mcp" }
  }
}

Mobile MCP clients (Claude iOS once it ships remote MCP, plus the growing crop of third-party MCP-aware iOS / Android chat apps) connect via the same URL — no app-side install. For a custom mobile app, use any MCP TypeScript / Swift / Kotlin SDK from modelcontextprotocol.io. The HTTP wire format is the same.

⚠ Bind to 0.0.0.0 only on a private network or behind authentication. The default 127.0.0.1 is loopback-only (safer). For internet-exposed setups, put a reverse proxy with TLS + auth in front, or use the production dashboard's StaticTokenAuth pattern.

4. Without MCP — direct OpenAI-compatible function calling

Already have a model running and don't want to bother with MCP? examples/direct_llm_arm.py skips the protocol entirely and uses ghostloop's LLMPolicy to drive any OpenAI-compatible chat endpoint via native function calling. Tested against:

  • OpenAI GPT-4o / GPT-4o-mini
  • Anthropic Claude (via the OpenAI-compatible proxy endpoint)
  • Google Gemini (via OpenAI-compatible adapter)
  • Groq (Llama 3.x, DeepSeek, Mixtral)
  • Ollama (local Qwen, Llama, Mistral, GhostLM)
  • vLLM + llama.cpp server + GhostLM's multi-vendor server

OPENAI_BASE_URL=https://api.openai.com/v1 OPENAI_API_KEY=sk-... \
OPENAI_MODEL=gpt-4o-mini \
  python3 examples/direct_llm_arm.py

# Or local Ollama:
OPENAI_BASE_URL=http://localhost:11434/v1 OPENAI_API_KEY=ollama \
OPENAI_MODEL=qwen2.5:14b \
  python3 examples/direct_llm_arm.py

Same Backend choice (Mock / MuJoCo / ROS 2), same safety pipeline, same trace recorder. Only the LLM-to-tool plumbing differs: in-process via LLMPolicy instead of MCP wire protocol.

5. Run programmatically

For everything else — bench harnesses, training loops, post-hoc analysis — use ghostloop as a library. Examples below.

Library API examples

Run an LLM-driven episode

from ghostloop import Intent, MockBackend, PolicyPipeline, PrimitiveRegistry, Runtime
from ghostloop.policies import GeofenceGate, LLMPolicyConfig, llm_policy_loop
from ghostloop.primitives import move_to, pick, place, scan

registry = PrimitiveRegistry([move_to(), scan(), pick(), place()])
runtime = Runtime(
    backend=MockBackend(),
    registry=registry,
    policy_pipeline=PolicyPipeline(gates=[
        GeofenceGate(min_corner=(-1, -1, 0), max_corner=(1, 1, 1)),
    ]),
)

summary = llm_policy_loop(
    registry=registry,
    runtime=runtime,
    goal="Pick widget-7 from (0.4, 0.2, 0.1) and place it at (-0.4, 0.2, 0.1).",
    config=LLMPolicyConfig(base_url="http://localhost:11434/v1", model="qwen2.5:14b"),
    max_steps=16,
)
runtime.trace.write_jsonl("episode.jsonl")

Drive a real physics simulation

from ghostloop import PolicyPipeline, PrimitiveRegistry, Runtime, Intent
from ghostloop.backends import MuJoCoBackend
from ghostloop.backends.mujoco import move_to, scan

backend = MuJoCoBackend(model_path="franka_panda.xml", end_effector="hand")
registry = PrimitiveRegistry([move_to(), scan()])
runtime = Runtime(backend=backend, registry=registry, policy_pipeline=PolicyPipeline())

runtime.step(Intent("move_to", {"x": 0.4, "y": 0.0, "z": 0.5, "duration": 1.0}))
runtime.step(Intent("scan", {"radius": 0.5}))

Models from the MuJoCo Menagerie drop in directly: Franka Panda, UR5e, Stretch RE3, Allegro hand, Spot, Aloha bimanual.

Counterfactual replay — "what would the new policy have done?"

from ghostloop import Intent
from ghostloop.counterfactual import replay_with_policy
from ghostloop.traces import load_trace

original = load_trace("episode.jsonl")

def new_policy(state_before):
    # any callable mapping state -> Intent | None
    return Intent("scan", {"radius": 0.3})

cf = replay_with_policy(original, new_policy, new_policy_name="more-cautious")
print(cf.divergence_rate, cf.first_divergence_step)
print(cf.render_md())
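
The two numbers printed above can be understood with a small standalone sketch: feed each recorded state_before to the new policy and compare its intent with the one that was actually logged. This is an illustration of the idea only; the event layout, the tuple-shaped intents, and `compare_policies` are assumptions for this example, and it does not re-simulate the backend after a divergence.

```python
def compare_policies(recorded, policy):
    """Return (divergence_rate, first_divergence_step) of `policy` against a recorded trace."""
    diverging = [
        step for step, event in enumerate(recorded)
        if policy(event["state_before"]) != event["intent"]
    ]
    rate = len(diverging) / len(recorded) if recorded else 0.0
    first = diverging[0] if diverging else None
    return rate, first


recorded = [
    {"state_before": {"holding": None}, "intent": ("move_to", (0.4, 0.2, 0.1))},
    {"state_before": {"holding": None}, "intent": ("pick", ("widget-7",))},
    {"state_before": {"holding": "widget-7"}, "intent": ("move_to", (-0.4, 0.2, 0.1))},
    {"state_before": {"holding": "widget-7"}, "intent": ("place", ())},
]


def place_when_holding(state):
    # Hypothetical alternative policy: drop the object as soon as it is held.
    if state["holding"]:
        return ("place", ())
    return ("move_to", (0.4, 0.2, 0.1))


rate, first = compare_policies(recorded, place_when_holding)
print(rate, first)
```

Because the states are taken from the original run, everything after the first divergence is a "what would it have said" comparison rather than a true rollout.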

Causal failure attribution

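from ghostloop.causal import attribute_failure, minimal_cause_set
from ghostloop.properties import StaysInsideWorkspace
from ghostloop.traces import load_trace

# Any trace that violates the property; the file name here is illustrative.
failing_trace = load_trace("failing_episode.jsonl")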

prop = StaysInsideWorkspace(min_corner=(-1, -1, 0), max_corner=(1, 1, 1))
analysis = attribute_failure(failing_trace, prop)
print(analysis.render_md())          # ranked top-K root causes

cause_set = minimal_cause_set(failing_trace, prop, max_set_size=3)

LLM-as-judge

from ghostloop.judges import LLMJudge, LLMJudgeConfig
from ghostloop.traces import load_trace

class GhostLMClient:
    def chat(self, messages, **kwargs):
        # adapt your chat endpoint here
        ...

judge = LLMJudge(client=GhostLMClient(), config=LLMJudgeConfig(model="ghostlm-v0.9-chat"))
trace = load_trace("episode.jsonl")   # e.g. the trace written by the LLM-driven episode
judgement = judge.score(trace)
print(judgement.label, judgement.score, judgement.rubric_scores)

Adversarial fuzzing

from ghostloop.bench import cma_es_seeds

def perturb(base_episode, sample):
    # return a copy of base_episode with backend initial state shifted by `sample`
    ...

results = cma_es_seeds(
    base_episode, perturb,
    parameter_ranges={"x0": (-1.0, 1.0), "y0": (-1.0, 1.0)},
    n_iterations=8, population_size=8, seed=42,
)
worst = results[:5]    # promote into your regression bench
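
The simplest of the three fuzzers to picture is uniform random search: sample initial-state parameters, score each sample by how badly the episode goes, and keep the worst. The sketch below is standalone and only illustrates that loop; `random_search` and the distance-based `score` are hypothetical, and a real score would come from running the episode.

```python
import math
import random


def random_search(score, parameter_ranges, n_samples=64, seed=42):
    """Sample parameters uniformly and return (score, sample) pairs, worst case first."""
    rng = random.Random(seed)  # seeded so the failure set is reproducible
    results = []
    for _ in range(n_samples):
        sample = {name: rng.uniform(lo, hi) for name, (lo, hi) in parameter_ranges.items()}
        results.append((score(sample), sample))
    results.sort(key=lambda pair: pair[0], reverse=True)
    return results


def score(sample):
    # Hypothetical difficulty proxy: starts far from the origin are assumed harder.
    return math.hypot(sample["x0"], sample["y0"])


ranges = {"x0": (-1.0, 1.0), "y0": (-1.0, 1.0)}
found = random_search(score, ranges, n_samples=32, seed=42)
hardest = found[:5]
print([round(s, 3) for s, _ in hardest])
```

CMA-ES replaces the uniform sampler with a distribution that adapts toward high-scoring regions, but the contract (parameter ranges in, ranked failure candidates out) stays the same.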

Property mining

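from ghostloop.properties import mine_properties
from ghostloop.traces import load_trace

# successful_traces_paths: your own list of JSONL trace files from passing episodes
corpus = [load_trace(p) for p in successful_traces_paths]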
candidates = mine_properties(corpus, min_support=0.9)
for mp in candidates:
    print(mp.pattern, mp.description, mp.support)
    promoted = mp.promote()        # a real Property ready for the engine

Sim-to-Real bench

from ghostloop.bench import Sim2RealBench

bench = Sim2RealBench(
    sim_episodes=eps_sim,
    real_episodes=eps_real,
    sim_label="mujoco", real_label="randomized_mujoco",
)
report = bench.run()
print(report.render_md())          # transfer gap + McNemar + KL action-distribution
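
Two of the quantities in that report are easy to compute by hand: the transfer gap (sim success rate minus real success rate) and a KL divergence between the distributions of primitives used in each setting. The sketch below is a standalone illustration; the add-alpha smoothing, the function names, and the example data are assumptions, not the definitions Sim2RealBench uses.

```python
import math
from collections import Counter


def transfer_gap(sim_outcomes, real_outcomes):
    """Success-rate drop from sim to real; positive means sim was optimistic."""
    return sum(sim_outcomes) / len(sim_outcomes) - sum(real_outcomes) / len(real_outcomes)


def action_distribution(names, vocabulary, alpha=1.0):
    """Smoothed frequency of each primitive, so no action has probability zero."""
    counts = Counter(names)
    total = len(names) + alpha * len(vocabulary)
    return {action: (counts[action] + alpha) / total for action in vocabulary}


def kl_divergence(p, q):
    """KL(p || q) in nats over a shared vocabulary."""
    return sum(p[a] * math.log(p[a] / q[a]) for a in p if p[a] > 0)


sim_actions = ["move_to", "pick", "move_to", "place"]
real_actions = ["move_to", "pick", "scan", "move_to", "scan", "place"]
vocabulary = sorted(set(sim_actions) | set(real_actions))

p_sim = action_distribution(sim_actions, vocabulary)
p_real = action_distribution(real_actions, vocabulary)
gap = transfer_gap([True, True, True, False], [True, False, True, False])
print(gap, round(kl_divergence(p_sim, p_real), 4))
```

A non-zero KL with a small transfer gap suggests the policy is compensating for the domain shift (here, with extra scans) rather than failing outright.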

Energy ledger

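from ghostloop.telemetry import EnergyLedger
from ghostloop.traces import load_trace

trace = load_trace("episode.jsonl")   # e.g. the trace written by the LLM-driven episode
ledger = EnergyLedger()
print(ledger.total(trace), "J")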
print(ledger.by_primitive(trace))
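
Per-primitive joule accounting amounts to attaching a small cost model to each primitive name and summing over the trace. The sketch below shows constant, linear-in-arg, and linear-in-duration models in standalone form; the model constructors, the (name, args) event shape, and all numeric constants are made up for illustration and are not ghostloop's EnergyLedger or its defaults.

```python
from collections import defaultdict


def constant(joules):
    return lambda args: joules


def linear_in_arg(name, joules_per_unit, base=0.0):
    return lambda args: base + joules_per_unit * abs(args.get(name, 0.0))


def linear_in_duration(watts):
    # Energy = power * time, using the intent's "duration" argument in seconds.
    return lambda args: watts * args.get("duration", 0.0)


def energy_by_primitive(events, models):
    """Sum modelled energy per primitive name; unmodelled primitives cost 0 J."""
    totals = defaultdict(float)
    for name, args in events:
        model = models.get(name, constant(0.0))
        totals[name] += model(args)
    return dict(totals)


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
models = {
    "move_to": linear_in_duration(watts=10.0),
    "scan": constant(1.5),
    "set_gripper": linear_in_arg("width", joules_per_unit=20.0, base=0.5),
}
events = [
    ("move_to", {"duration": 2.0}),
    ("scan", {"radius": 0.5}),
    ("move_to", {"duration": 1.0}),
    ("set_gripper", {"width": 0.05}),
]
per_primitive = energy_by_primitive(events, models)
print(per_primitive, sum(per_primitive.values()), "J")
```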

Skill graph + cross-embodiment

from ghostloop.skills import SkillGraph, skill_from_primitive
from ghostloop.primitives import MorphologyRegistry, move_to, scan

graph = SkillGraph()
graph.add(skill_from_primitive(move_to()))
graph.add(skill_from_primitive(scan(), prerequisites=["move_to"]))
graph.validate()
order = graph.topological_order()        # ['move_to', 'scan']

reg = MorphologyRegistry()
reg.register("franka", "pick", franka_pick_factory)
reg.register("ur5e",   "pick", ur5e_pick_factory)
prims = reg.build("franka", ["pick"])    # robot-specific primitives

Roadmap

| Version | Focus |
|---|---|
| v0.1.0 | Core abstractions, MockBackend, three policy gates, runnable demo, 23 tests |
| v0.2.0 | MuJoCoBackend, LLMPolicy adapter, bench harness with Wilson CIs + McNemar + Cohen's h, 64 tests |
| v0.3.0 | PyBulletBackend, async runtime, declarative properties engine, MCP server, scripted policies, 89 tests |
| v0.4.0 | ForceCap + HumanInTheLoop gates, episode catalogue, MuJoCo Menagerie auto-clone, replay/diff CLI, 110 tests |
| v0.5.0 | VLAPolicy adapter, sensor primitives + cameras, OpenTelemetry hooks, SQLite persistence, planner DSL, 142 tests |
| v0.6.0 | Fleet abstraction, FastAPI dashboard, LLMPlanner, RetryGate, observation buffer, property combinators, 182 tests |
| v0.7.0 | GymnasiumBackend, CooldownGate + TimeWindowGate, convex polytope SDF, composite primitives, Mission DAG runner, WebSocket trace streaming, 211 tests |
| v0.8.0 | STL temporal properties, URDF workspace builder, RandomizedBackend, trace query DSL, safe-RL harness with Lagrangian, 239 tests |
| v0.9.0 | ROS2Backend, ActionSmoothingGate, safe-action projection, reward shaper DSL, Sim2RealBench, 263 tests |
| v0.10.0 | Counterfactual trace replay, causal failure attribution, LLM-as-judge for traces, adversarial bench generator, property mining, skill graph, hindsight relabeling, energy ledger, cross-embodiment morphology registry, 296 tests |
| v1.0.0 (now) | RGB-D fusion + deproject_depth + BlobDetector + CameraProcessorPipeline, VLABenchmarkSuite + published-baseline catalogue (OpenVLA / π0 / RT-2 / Octo / Diffusion Policy / ACT), production fleet dashboard (StaticTokenAuth / RateLimiter / AlarmRegistry / Prometheus /metrics / livez+readyz), 314 tests |

Repository layout

ghostloop/
  __init__.py                public API surface, version
  core.py                    Intent / Primitive / Runtime / Trace / Decision / Backend / MockBackend
  async_runtime.py           AsyncRuntime + control_loop(rate_hz)
  observations.py            ObservationBuffer (deque-based short-term memory)
  store.py                   GhostloopStore — SQLite episodes / runs / comparisons
  mcp_server.py              FastMCP server exposing Runtime as MCP tools
  counterfactual.py          replay_with_policy + CounterfactualTrace        (v0.10)
  causal.py                  attribute_failure + minimal_cause_set            (v0.10)

  policies/
    deny_list.py             DenyListGate
    rate_limit.py            RateLimitGate
    geofence.py              GeofenceGate
    force_cap.py             ForceCapGate
    human_in_the_loop.py     HumanInTheLoopGate + cli_approver
    workspace.py             WorkspaceModel + ObstacleAvoidanceGate
    sdf.py                   HalfSpace / ConvexPolytope / signed_distance     (v0.7)
    urdf.py                  workspace_from_urdf                              (v0.8)
    cooldown.py              CooldownGate                                     (v0.7)
    time_window.py           TimeWindowGate + Window                          (v0.7)
    smoothing.py             ActionSmoothingGate + smooth_target              (v0.9)
    safe_projection.py       project_to_workspace + project_to_sdf            (v0.9)
    retry.py                 RetryPolicy + transient-error helpers
    llm.py                   LLMPolicy + LLMPolicyConfig + llm_policy_loop
    vla.py                   VLAPolicy + DeltaXYZDecoder

  primitives/
    motion.py                move_to / scan
    manipulation.py          pick / place
    trajectory.py            follow_trajectory + linear_interpolate
    composite.py             composite_primitive factory                     (v0.7)
    morphology.py            MorphologyRegistry — cross-embodiment           (v0.10)
    library.py               cross-morphology primitive catalogue —          (v1.0)
                             mobile_base / quadruped / humanoid / aerial /
                             dexterous / sensing / generic

  profiles/                                                                  (v1.0)
    core.py                  RobotProfile + YAML loader + runtime builder
    presets.py               franka_arm / turtlebot / spot / tello /
                             stretch / humanoid_demo

  backends/
    mujoco.py                MuJoCoBackend                                   (v0.2)
    pybullet.py              PyBulletBackend                                 (v0.3)
    gymnasium.py             GymnasiumBackend (Farama Gym ecosystem)         (v0.7)
    ros2.py                  ROS2Backend (rclpy adapter)                     (v0.9)
    randomized.py            RandomizedBackend (sim2real wrapper)            (v0.8)
    menagerie.py             MuJoCo Menagerie auto-clone                     (v0.4)

  bench/
    episode.py               Episode + EpisodeRunner + EpisodeResult         (v0.2)
    report.py                RunReport + wilson_ci + summarize               (v0.2)
    compare.py               paired_compare + mcnemar_p + cohens_h            (v0.2)
    catalogue.py             preset_reach_8 + preset_pick_and_place_4 + …    (v0.4)
    reward_shaper.py         RewardShaper + OnPrimitive / OnDecision / …     (v0.9)
    sim2real.py              Sim2RealBench + Sim2RealReport                   (v0.9)
    adversarial.py           random_seeds / grid_seeds / cma_es_seeds        (v0.10)

  properties/
    core.py                  Property + PropertyEngine + Severity            (v0.5)
    builtins.py              StaysInsideWorkspace / NeverHoldsTwoObjects/…   (v0.5)
    combinators.py           AndProperty / OrProperty / NotProperty          (v0.6)
    temporal.py              Always / Eventually / Until (STL)               (v0.8)
    mining.py                mine_properties + MinedProperty                 (v0.10)

  judges/
    llm_judge.py             LLMJudge + LLMJudgeConfig + parse_judgement     (v0.10)
    heuristic.py             HeuristicJudge + rule predicates                 (v0.10)

  skills/
    graph.py                 SkillGraph + Skill + topological order           (v0.10)

  missions/
    core.py                  Mission + Step + MissionRunner + MissionResult   (v0.7)

  fleet/
    core.py                  RobotHandle + FleetRegistry + FleetDispatcher    (v0.6)

  dashboard/
    api.py                   FastAPI factory + healthz + store endpoints      (v0.6)
    streaming.py             StreamManager + WebSocket /ws/v1/stream          (v0.7)

  planning/
    core.py                  TaskPlanner + TaskStep                          (v0.5)
    builtin.py               sequential_planner / fixed_plan                  (v0.5)
    llm_planner.py           LLMPlanner (single-shot full-plan emission)      (v0.6)

  sensors/
    camera.py                Camera Protocol + MockCamera + capture_camera   (v0.5)

  telemetry/
    otel.py                  step_span + record_decision + record_result    (v0.5)
    energy.py                EnergyLedger + PrimitiveEnergyModel             (v0.10)

  training/
    core.py                  SafeRolloutCollector + LagrangianMultiplier     (v0.8)
    hindsight.py             HER relabeling + sparse_indicator_reward        (v0.10)

  traces/
    replay.py                load_trace + iter_events + summarize_trace      (v0.4)
    diff.py                  diff_traces + StepDiff + TraceDiff              (v0.6)
    query.py                 query DSL with comparison ops + and/or/not/in   (v0.8)

examples/
  pick_and_place.py                    scripted end-to-end demo
  bench_with_without_geofence.py       paired-comparison demo
  mcp_robot.py                         general MCP server — picks profile   (v1.0)
                                       via GHOSTLOOP_PROFILE; works with
                                       arms, mobile bases, quadrupeds,
                                       drones, humanoids, custom robots
  claude_desktop_mcp_arm.py            arm-specific MCP example (legacy)    (v1.0)
  claude_desktop_config.json           cross-client + cross-OS config       (v1.0)
                                       reference (Claude Desktop / Cursor /
                                       Continue / Cline / Zed / Gemini CLI)
  custom_robot.yaml                    sample profile YAML —                (v1.0)
                                       hospital medication-delivery robot
                                       with custom primitives + composites
  custom_robot_primitives.py           sample custom Primitive factories    (v1.0)
                                       (dispense_pill, alert_nurse)
  direct_llm_arm.py                    direct OpenAI-compatible function    (v1.0)
                                       calling — works with OpenAI /
                                       Anthropic / Gemini / Groq / Ollama
                                       / vLLM / GhostLM

tests/                                  333 tests (8 live-gated)
  test_core.py                          23
  test_llm_policy.py                    14
  test_bench.py                         22
  test_mujoco_backend.py                10
  test_v03_additions.py                 25
  test_v04_additions.py                 21
  test_v05_additions.py                 32
  test_v06_additions.py                 37
  test_v07_additions.py                 29
  test_v08_additions.py                 28
  test_v09_additions.py                 25
  test_v10_additions.py                 33
  test_v10_v1_additions.py              18
  test_profiles.py                      19

assets/                                  brand mark + wordmark variants
docs/                                    architecture / migration / brand notes
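
The tree above lists skills/graph.py as providing a SkillGraph with a topological order. This README does not show that API, so the following is only a minimal sketch of the underlying idea, using Python's standard-library graphlib rather than any ghostloop class. The skill names and the execution_order helper are hypothetical and chosen for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical skills: each key maps to the set of skills it depends on.
skill_deps = {
    "move_to": set(),
    "open_gripper": set(),
    "pick": {"move_to", "open_gripper"},
    "place": {"pick", "move_to"},
}


def execution_order(deps):
    """Return skills ordered so that every prerequisite comes first.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(deps).static_order())


order = execution_order(skill_deps)
print(order)
```

A cyclic dependency (for example, two skills that each require the other) raises CycleError instead of returning a partial order, which is the behavior you would want before handing a plan to a robot.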

Why this is novel

There are robot frameworks. There are agent frameworks. There is no robot framework that treats a robot as a model with a tool registry, a fail-closed safety pipeline, a structured trace log, statistical bench rigor, and a layer of post-hoc analysis (counterfactual replay, causal attribution, LLM-as-judge, property mining, adversarial fuzzing). That is the same shape that is now standard for LLM-driven cybersecurity agents (secure-mcp, ghostguard, GhostAgent).

The thesis: as VLA models become the policy substrate, the runtime around them needs the same rigor we already apply to LLM tool use, plus the analytical tooling — counterfactuals, causal attribution, judge models — that LLM safety has been building for years. ghostloop is that runtime.
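
To make "fail-closed" concrete, here is a minimal sketch of the idea in plain Python. It is not ghostloop's actual API: Intent, Decision, Pipeline, and geofence_gate are hypothetical names, and the unit-cube workspace is an assumed limit. The point it illustrates is that an intent executes only if every gate explicitly allows it, and that a gate which crashes counts as a denial.

```python
from dataclasses import dataclass, field


@dataclass
class Intent:
    name: str
    args: dict


@dataclass
class Decision:
    allowed: bool
    reason: str


def geofence_gate(intent: Intent) -> Decision:
    """Hypothetical gate: allow move_to only inside a unit-cube workspace."""
    if intent.name != "move_to":
        return Decision(True, "not a motion intent")
    inside = all(0.0 <= intent.args[k] <= 1.0 for k in ("x", "y", "z"))
    return Decision(inside, "inside workspace" if inside else "outside workspace")


@dataclass
class Pipeline:
    gates: list
    trace: list = field(default_factory=list)

    def _record(self, intent: Intent, gate_name: str, decision: Decision) -> None:
        # Every decision is appended to the trace, allowed or not.
        self.trace.append({
            "intent": intent.name,
            "gate": gate_name,
            "allowed": decision.allowed,
            "reason": decision.reason,
        })

    def check(self, intent: Intent) -> bool:
        if not self.gates:
            # Assumed design choice: no configured gates means deny.
            self._record(intent, "<none>", Decision(False, "no gates configured"))
            return False
        for gate in self.gates:
            try:
                decision = gate(intent)
            except Exception as exc:
                # Fail closed: a gate that raises is treated as a denial.
                decision = Decision(False, f"gate error: {exc!r}")
            self._record(intent, gate.__name__, decision)
            if not decision.allowed:
                return False
        return True


pipeline = Pipeline(gates=[geofence_gate])
ok = pipeline.check(Intent("move_to", {"x": 0.4, "y": 0.2, "z": 0.1}))
blocked = pipeline.check(Intent("move_to", {"x": 2.0, "y": 0.2, "z": 0.1}))
malformed = pipeline.check(Intent("move_to", {"x": 0.4}))  # missing y and z
```

The malformed intent makes the gate raise a KeyError, and the pipeline turns that into a denial rather than letting the intent through. All three checks leave an entry in pipeline.trace, so denied intents are as auditable as executed ones.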

License

MIT. See LICENSE.


The ghostloop family

  • ghostloop-ui: Next.js 15 dashboard talking to create_production_app. Fleet view, alarm tray, episode list, Prometheus metrics, profile-aware gamepad mapper (drone / mobile base / quadruped / arm / humanoid), /connect onboarding for non-coders. Public deploy at ghostloop-ui.vercel.app, backend on Render. No Python on the operator's machine.
  • ghostloop-desktop v0.2: Tauri 2 native app wrapping ghostloop-ui. Voice control via the embedded WebView's Web Speech API (Windows + Linux today, native whisper.cpp in v0.3 for macOS), gamepad rumble on safety events, native OS notifications for alarms, sidecar Python runtime, native gamepad via gilrs (wired and Bluetooth), system tray, e-stop hotkey. Single-file bundles for macOS / Windows / Linux on every tagged release via GitHub Actions.
  • Live demo on HuggingFace: Gradio control panel, no install. Per-primitive dispatch buttons, virtual joystick, pause / resume / e-stop, live trace pane.

Built by Joe Munene at Complex Developers. Sibling to GhostLM, secure-mcp, ghostguard, CyberBench.
