OpenCanvas is a FastAPI-based backend and demo client for interactive 3D graph visualization, AI-assisted graph generation, Mermaid import, and gesture-driven camera control via MediaPipe.
The repository combines four layers in one project:
- a backend API for graph state, validation, and generation
- WebSocket channels for realtime graph and control events
- a browser demo renderer based on three.js and 3d-force-graph
- a MediaPipe gesture sender that turns hand movement into normalized control data
The service lets you:
- generate a graph from a natural-language prompt through an LLM provider
- build a graph from a Mermaid subset without calling the LLM
- import and export graph snapshots as JSON
- stream normalized control messages over WebSocket
- drive a 3D graph camera with hand gestures
- separate low-latency control traffic from graph and status traffic
This makes the project useful both as a standalone demo and as a backend for external renderers such as browser clients, desktop tools, game engines, or TouchDesigner.
- FastAPI HTTP API for graph lifecycle and service introspection
- realtime WebSocket transport with a unified socket and dedicated sockets
- versioned JSON message contract in schemas/messages.schema.json
- LLM generation with support for POLZA_* or OPENAI_* environment variables
- deterministic fallback and mock mode for local development and tests
- Mermaid-to-graph conversion for fast manual prototyping
- control stream smoothing, dead-zone filtering, and rate limiting
- MediaPipe-based gesture capture with support for the current Tasks API flow
- integration tests for HTTP, WebSocket, import/apply, and generation flows

    app/
      main.py                         FastAPI app, endpoints, websocket hubs, runtime limits
      llm_service.py                  LLM integration and graph generation logic
      mediapipe_gesture_sender.py     MediaPipe hand tracking -> control payloads
      control_processing.py           Control stream filtering and throttling
      mermaid_parser.py               Mermaid subset parsing
      models.py                       Pydantic models and schema version
    static/
      index.html                      Browser demo UI and 3D renderer
    docs/
      API_REFERENCE.md
      MEDIAPIPE_SETUP.md
      CLIENT_ADAPTERS.md
      TOUCHDESIGNER_INTEGRATION.md
    schemas/
      messages.schema.json            Versioned websocket message contract
    scripts/
      run_server.ps1                  Helper for choosing a free port and starting uvicorn
      download_hand_landmarker_model.py
    tests/
      integration/                    End-to-end API workflow tests
      test_mediapipe_gesture_sender.py
    models/
      hand_landmarker.task            MediaPipe task model storage location

The backend stores the current graph in memory, validates payloads with Pydantic, exposes system metadata, and broadcasts graph/status/error events to connected clients.
Key files:
Graphs can arrive from three sources:
- POST /api/graph/generate for LLM-driven graph generation
- POST /api/graph/mermaid for Mermaid subset parsing
- POST /api/graph/apply for externally supplied JSON payloads
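
To illustrate the Mermaid path, here is a minimal sketch of turning a few Mermaid lines into a graph without any LLM call. It is not the project's `mermaid_parser.py`: the supported subset is not described here, so the sketch assumes only `A --> B` edges with optional `[Label]` suffixes, and the function name `parse_mermaid_edges` and the `nodes`/`links` output shape (borrowed from the data format 3d-force-graph consumes) are assumptions.

```python
import re

# One node reference: an identifier with an optional [Label] suffix.
_NODE = r"([A-Za-z0-9_]+)(?:\[([^\]]*)\])?"
# Assumed subset: a single "A --> B" edge per line.
_EDGE = re.compile(rf"^\s*{_NODE}\s*-->\s*{_NODE}\s*$")


def parse_mermaid_edges(source: str) -> dict:
    """Collect nodes and links from simple 'A --> B' lines (hypothetical helper)."""
    labels = {}  # node id -> label, in first-seen order
    links = []
    for line in source.splitlines():
        match = _EDGE.match(line)
        if match is None:
            continue  # skips headers such as "graph TD", blanks, unsupported syntax
        src, src_label, dst, dst_label = match.groups()
        for node_id, label in ((src, src_label), (dst, dst_label)):
            if label is not None:
                labels[node_id] = label  # an explicit label wins
            else:
                labels.setdefault(node_id, node_id)  # otherwise fall back to the id
        links.append({"source": src, "target": dst})
    nodes = [{"id": node_id, "label": label} for node_id, label in labels.items()]
    return {"nodes": nodes, "links": links}


graph = parse_mermaid_edges("graph TD\nA[Start] --> B\nB --> C[End]")
```

Lines the pattern does not recognize are skipped silently in this sketch; a real parser would more likely report them as validation errors.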
The LLM service supports both direct provider calls and mock mode for local work:
- POLZA_API_KEY or OPENAI_API_KEY
- optional POLZA_BASE_URL / OPENAI_BASE_URL
- optional POLZA_MODEL / OPENAI_MODEL
- LLM_MODE=mock for deterministic local testing
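
One way these variables could be resolved into a single settings object is sketched below. This is not the logic in `llm_service.py`: the function name `resolve_llm_settings`, the returned keys, the `fallback` mode label, and the choice to prefer `POLZA_*` over `OPENAI_*` when both are set are all assumptions made for illustration.

```python
import os
from typing import Mapping, Optional


def resolve_llm_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Pick provider settings from POLZA_* or OPENAI_* variables (hypothetical helper)."""
    env = os.environ if env is None else env
    empty = {"provider": None, "api_key": None, "base_url": None, "model": None}

    # LLM_MODE=mock short-circuits everything, even if a key is present.
    if env.get("LLM_MODE", "auto").lower() == "mock":
        return {"mode": "mock", **empty}

    # Assumption: POLZA_* takes precedence when both providers are configured.
    for prefix in ("POLZA", "OPENAI"):
        api_key = env.get(f"{prefix}_API_KEY")
        if api_key:
            return {
                "mode": "live",
                "provider": prefix.lower(),
                "api_key": api_key,
                "base_url": env.get(f"{prefix}_BASE_URL"),  # optional
                "model": env.get(f"{prefix}_MODEL"),        # optional
            }

    # No credentials at all: fall back to a deterministic local generator.
    return {"mode": "fallback", **empty}


settings = resolve_llm_settings({"OPENAI_API_KEY": "your_key_here", "OPENAI_MODEL": "gpt-4o-mini"})
```

Passing the environment in as a mapping keeps the function easy to test without touching the real process environment.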
The project exposes three WebSocket entry points:
- WS /ws for combined traffic
- WS /ws/control for high-rate control only
- WS /ws/events for graph, status, and error events
This split is especially useful when a renderer must keep control latency low.
app/mediapipe_gesture_sender.py captures a webcam feed, detects a hand, derives orientation/openness/pinch states, and sends normalized values:
- rotateX
- rotateY
- zoomDelta
- confidence
- resetCamera
The sender can use either:
- mp.solutions when exposed by the installed MediaPipe build
- the newer Tasks API with models/hand_landmarker.task
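
As a rough illustration of how a pinch state might be derived from hand landmarks, the sketch below compares the thumb-tip to index-tip distance against an overall hand size. It is not taken from `mediapipe_gesture_sender.py`. The landmark indices follow MediaPipe's 21-point hand model (0 wrist, 4 thumb tip, 8 index fingertip, 9 middle-finger MCP), while the function names, the normalization by wrist-to-MCP distance, and the `0.25` threshold are assumptions.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]

# Indices from MediaPipe's 21-landmark hand model.
WRIST, THUMB_TIP, INDEX_TIP, MIDDLE_MCP = 0, 4, 8, 9


def pinch_ratio(landmarks: Sequence[Point]) -> float:
    """Thumb-to-index distance divided by hand size, so the value is scale-invariant."""
    hand_size = math.dist(landmarks[WRIST], landmarks[MIDDLE_MCP])
    if hand_size == 0:
        return float("inf")  # degenerate input: never report a pinch
    return math.dist(landmarks[THUMB_TIP], landmarks[INDEX_TIP]) / hand_size


def is_pinching(landmarks: Sequence[Point], threshold: float = 0.25) -> bool:
    """Hypothetical pinch test; the threshold is an assumed value, not the project's."""
    return pinch_ratio(landmarks) < threshold


# Synthetic hand: fingertips close together relative to the palm length.
hand = [(0.0, 0.0)] * 21
hand[MIDDLE_MCP] = (0.0, 1.0)
hand[THUMB_TIP] = (0.5, 0.5)
hand[INDEX_TIP] = (0.5, 0.6)
pinching = is_pinching(hand)
```

Dividing by hand size keeps the test stable as the hand moves closer to or farther from the camera.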
The service uses a versioned schema with SCHEMA_VERSION = 1.0.0.
Outgoing messages include:
- graph
- status
- error

Incoming realtime control messages include:
- control
Example control message:

    {
      "schemaVersion": "1.0.0",
      "type": "control",
      "requestId": "ctrl-123",
      "payload": {
        "rotateX": 0.12,
        "rotateY": -0.08,
        "zoomDelta": 0.03,
        "confidence": 0.94,
        "resetCamera": false
      }
    }

The canonical contract lives in schemas/messages.schema.json.
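
A client could assemble the same envelope programmatically. The sketch below mirrors the field names from the example message; the helper name `build_control_message` is hypothetical, and no value ranges are enforced because the schema's limits are not stated here.

```python
import json

SCHEMA_VERSION = "1.0.0"


def build_control_message(
    request_id: str,
    rotate_x: float,
    rotate_y: float,
    zoom_delta: float,
    confidence: float,
    reset_camera: bool = False,
) -> str:
    """Serialize one control message using the field names from the example above."""
    message = {
        "schemaVersion": SCHEMA_VERSION,
        "type": "control",
        "requestId": request_id,
        "payload": {
            "rotateX": rotate_x,
            "rotateY": rotate_y,
            "zoomDelta": zoom_delta,
            "confidence": confidence,
            "resetCamera": reset_camera,
        },
    }
    return json.dumps(message)


text = build_control_message("ctrl-123", 0.12, -0.08, 0.03, 0.94)
```

The resulting string can be sent as a text frame over `WS /ws` or `WS /ws/control`.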
- Windows PowerShell is the primary workflow in this repo
- Python 3.12
- webcam access for gesture control
- optional LLM API key for prompt-based graph generation

    py -3.12 -m venv .venv312
    .\.venv312\Scripts\Activate.ps1
    pip install -r requirements.txt
    pip install -r requirements-dev.txt

Minimal local .env example:

    LLM_MODE=mock

Example with an external model provider:

    OPENAI_API_KEY=your_key_here
    OPENAI_MODEL=gpt-4o-mini

Or:

    POLZA_API_KEY=your_key_here
    POLZA_MODEL=openai/gpt-4o

Using the helper script:

    .\scripts\run_server.ps1

Or directly:

    .\.venv312\Scripts\python.exe -m uvicorn app.main:app --host 127.0.0.1 --port 8000 --reload --env-file .env

After startup, open:

    http://127.0.0.1:8000/

The browser demo lets you:
- generate a graph from a text prompt
- parse Mermaid code into a graph
- load the current graph from the server
- export and import graph JSON
- watch control and status diagnostics live
If your MediaPipe build needs the Tasks API model bundle, download it once:

    python scripts\download_hand_landmarker_model.py

That stores the model at:

    models/hand_landmarker.task

Then run the gesture sender:

    python app\mediapipe_gesture_sender.py --uri ws://127.0.0.1:8000/ws --camera 0 --hz 45 --preview

Optional explicit model path:

    python app\mediapipe_gesture_sender.py --model C:\path\to\hand_landmarker.task --preview

The demo is designed around these gestures:
- turn the hand to rotate the graph
- open or close the hand to zoom
- pinch to lock movement
- hold an OK gesture to reset the camera
- GET /health
- GET /api/system/limits
- GET /api/system/capabilities
- GET /api/system/schema/messages

- GET /api/graph/current
- POST /api/graph/apply
- POST /api/graph/generate
- POST /api/graph/mermaid

- WS /ws
- WS /ws/control
- WS /ws/events

Detailed endpoint examples are documented in docs/API_REFERENCE.md.
The backend exposes several useful tuning variables:
| Variable | Default | Purpose |
|---|---|---|
| GENERATE_RATE_LIMIT_PER_MIN | 20 | per-minute limit for LLM generation |
| MERMAID_RATE_LIMIT_PER_MIN | 30 | per-minute limit for Mermaid parsing |
| APPLY_RATE_LIMIT_PER_MIN | 20 | per-minute limit for external graph apply |
| GENERATE_MAX_CONCURRENCY | 2 | concurrent generation jobs |
| GENERATE_TIMEOUT_SECONDS | 45 | timeout for LLM generation |
| GENERATE_QUEUE_WAIT_SECONDS | 2.5 | wait time to enter the generation queue |
| CONTROL_ALPHA | 0.28 | smoothing factor for control processing |
| CONTROL_DEAD_ZONE | 0.015 | ignore tiny control fluctuations |
| CONTROL_MAX_HZ | 30.0 | maximum outgoing control event rate |
| MEDIAPIPE_HAND_MODEL | unset | explicit path to hand_landmarker.task |
| LLM_MODE | auto | generation mode, including mock for tests |
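
To show how the three `CONTROL_*` values could interact, here is a minimal single-axis sketch that combines a dead zone, exponential smoothing, and rate limiting. It is not the code in `control_processing.py`: the class name `ControlFilter`, the order of the stages, and the use of an exponential moving average for `CONTROL_ALPHA` are assumptions. The defaults are taken from the table above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ControlFilter:
    """Hypothetical filter for one control axis (e.g. rotateX)."""

    alpha: float = 0.28       # CONTROL_ALPHA: weight of the newest sample
    dead_zone: float = 0.015  # CONTROL_DEAD_ZONE: inputs below this count as zero
    max_hz: float = 30.0      # CONTROL_MAX_HZ: cap on emitted events per second
    _smoothed: Optional[float] = None
    _last_emit: Optional[float] = None

    def process(self, value: float, now: float) -> Optional[float]:
        """Return the smoothed value, or None when the rate limit suppresses this sample."""
        # 1. Dead zone: ignore tiny fluctuations around zero.
        if abs(value) < self.dead_zone:
            value = 0.0
        # 2. Exponential moving average; the first sample seeds the state.
        if self._smoothed is None:
            self._smoothed = value
        else:
            self._smoothed = self.alpha * value + (1.0 - self.alpha) * self._smoothed
        # 3. Rate limit: keep updating state, but emit at most max_hz times per second.
        if self._last_emit is not None and now - self._last_emit < 1.0 / self.max_hz:
            return None
        self._last_emit = now
        return self._smoothed


axis = ControlFilter()
first = axis.process(0.5, now=0.0)     # emitted: seeds the filter
second = axis.process(0.5, now=0.01)   # suppressed: arrives sooner than 1/30 s
third = axis.process(0.01, now=0.1)    # emitted: input falls inside the dead zone
```

Timestamps are passed in explicitly so the behavior is deterministic in tests; a live sender would typically supply `time.monotonic()`.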

    .\.venv312\Scripts\python.exe -m pytest

The test suite covers:
- API health and runtime metadata
- graph retrieval and apply flow
- graph generation and validation errors
- Mermaid generation flow
- gesture mapping behavior and reset logic
For stable local development without external API calls:

    LLM_MODE=mock

This keeps generation deterministic and avoids needing provider credentials while working on the transport, schema, UI, or gesture pipeline.
Any client that supports HTTP, WebSocket, and JSON can integrate with this backend. A practical pattern is:
- Call /health, /api/system/capabilities, and /api/system/limits.
- Hydrate the latest graph via GET /api/graph/current.
- Open WebSocket subscriptions.
- Trigger graph generation or apply external graph snapshots.
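
The first two steps of this pattern can be sketched with only the standard library. The endpoint paths come from this README; the helper names `http_get_json` and `bootstrap` are hypothetical, and the responses are returned untouched because their shapes are not described here. WebSocket subscription and generation calls are left out.

```python
import json
from typing import Any, Callable, Dict
from urllib.request import urlopen


def http_get_json(base_url: str, path: str) -> Any:
    """Fetch one endpoint and decode its JSON body."""
    with urlopen(base_url + path, timeout=5) as response:
        return json.load(response)


def bootstrap(
    base_url: str,
    get_json: Callable[[str, str], Any] = http_get_json,
) -> Dict[str, Any]:
    """Probe the service, then hydrate the current graph (steps 1 and 2 above)."""
    return {
        "health": get_json(base_url, "/health"),
        "capabilities": get_json(base_url, "/api/system/capabilities"),
        "limits": get_json(base_url, "/api/system/limits"),
        "graph": get_json(base_url, "/api/graph/current"),
    }


# Offline usage with a stand-in fetcher; replace it with the default to hit a live server.
state = bootstrap("http://127.0.0.1:8000", get_json=lambda base, path: {"path": path})
```

Injecting the fetch function keeps the startup sequence testable without a running server.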
For the best latency split:
- use WS /ws/control for high-rate control data
- use WS /ws/events for graph, status, and error
- keep the final camera math and render loop inside TouchDesigner
See:
The demo page in static/index.html is intentionally lightweight. It is useful as:
- a smoke test for backend connectivity
- a visual debugger for graph payloads
- a manual QA surface for gestures, import/export, and Mermaid parsing
- a reference client for external integrations
The page uses CDN-delivered:
- three.js
- 3d-force-graph

- The graph is stored in process memory, so restarting the server discards the current graph.
- LLM generation can be rate-limited, queued, or timed out depending on configuration.
- MediaPipe support can differ by build, which is why the project includes a Tasks API fallback path and model download script.
- The current frontend is a demo surface, not a fully packaged production SPA.
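
The queueing and timeout behavior noted above can be pictured with a small asyncio sketch. It is not the implementation in `app/main.py`: the class `GenerationGate` and the exception `QueueFullError` are hypothetical, and the mapping of `GENERATE_MAX_CONCURRENCY`, `GENERATE_QUEUE_WAIT_SECONDS`, and `GENERATE_TIMEOUT_SECONDS` onto a semaphore plus two timeouts is an assumption.

```python
import asyncio
from typing import Any, Awaitable, Callable


class QueueFullError(RuntimeError):
    """Raised when no generation slot frees up within the queue wait window."""


class GenerationGate:
    """Hypothetical gate limiting concurrent generation jobs."""

    def __init__(
        self,
        max_concurrency: int = 2,         # GENERATE_MAX_CONCURRENCY
        queue_wait_seconds: float = 2.5,  # GENERATE_QUEUE_WAIT_SECONDS
        timeout_seconds: float = 45.0,    # GENERATE_TIMEOUT_SECONDS
    ) -> None:
        self._slots = asyncio.Semaphore(max_concurrency)
        self._queue_wait = queue_wait_seconds
        self._timeout = timeout_seconds

    async def run(self, job: Callable[[], Awaitable[Any]]) -> Any:
        # Wait a bounded time for a free slot instead of queueing indefinitely.
        try:
            await asyncio.wait_for(self._slots.acquire(), timeout=self._queue_wait)
        except asyncio.TimeoutError:
            raise QueueFullError("no generation slot available") from None
        try:
            # The job itself is bounded too; asyncio.TimeoutError propagates to the caller.
            return await asyncio.wait_for(job(), timeout=self._timeout)
        finally:
            self._slots.release()


async def demo() -> Any:
    gate = GenerationGate(max_concurrency=1)

    async def fake_generation() -> str:
        await asyncio.sleep(0.01)  # stands in for an LLM call
        return "graph"

    return await gate.run(fake_generation)


result = asyncio.run(demo())
```

An HTTP layer would typically translate `QueueFullError` and the job timeout into distinct error responses so clients can tell a busy server from a slow provider.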
- docs/API_REFERENCE.md
- docs/MEDIAPIPE_SETUP.md
- docs/CLIENT_ADAPTERS.md
- docs/TOUCHDESIGNER_INTEGRATION.md
- schemas/CHANGELOG.md
This repository works well as:
- a backend transport layer for interactive graph canvases
- a prototype platform for AI-generated graph structures
- a gesture-control bridge for realtime 3D visualization tools
- a reference implementation for splitting generation, state, and control into separate channels
No explicit license file is currently present in the repository. Add one before public redistribution if you want the usage terms to be unambiguous.