OpenCanvas

OpenCanvas is a FastAPI-based backend and demo client for interactive 3D graph visualization, AI-assisted graph generation, Mermaid import, and gesture-driven camera control via MediaPipe.

The repository combines four layers in one project:

a backend API for graph state, validation, and generation
WebSocket channels for realtime graph and control events
a browser demo renderer based on three.js and 3d-force-graph
a MediaPipe gesture sender that turns hand movement into normalized control data

What The Project Does

The service lets you:

generate a graph from a natural-language prompt through an LLM provider
build a graph from a Mermaid subset without calling the LLM
import and export graph snapshots as JSON
stream normalized control messages over WebSocket
drive a 3D graph camera with hand gestures
separate low-latency control traffic from graph and status traffic

This makes the project useful both as a standalone demo and as a backend for external renderers such as browser clients, desktop tools, game engines, or TouchDesigner.

Main Capabilities

FastAPI HTTP API for graph lifecycle and service introspection
realtime WebSocket transport with a unified socket and dedicated sockets
versioned JSON message contract in schemas/messages.schema.json
LLM generation with support for POLZA_* or OPENAI_* environment variables
deterministic fallback and mock mode for local development and tests
Mermaid-to-graph conversion for fast manual prototyping
control stream smoothing, dead-zone filtering, and rate limiting
MediaPipe-based gesture capture with support for the current Tasks API flow
integration tests for HTTP, WebSocket, import/apply, and generation flows

Repository Structure

app/
  main.py                     FastAPI app, endpoints, websocket hubs, runtime limits
  llm_service.py              LLM integration and graph generation logic
  mediapipe_gesture_sender.py MediaPipe hand tracking -> control payloads
  control_processing.py       Control stream filtering and throttling
  mermaid_parser.py           Mermaid subset parsing
  models.py                   Pydantic models and schema version
static/
  index.html                  Browser demo UI and 3D renderer
docs/
  API_REFERENCE.md
  MEDIAPIPE_SETUP.md
  CLIENT_ADAPTERS.md
  TOUCHDESIGNER_INTEGRATION.md
schemas/
  messages.schema.json        Versioned websocket message contract
scripts/
  run_server.ps1              Helper for choosing a free port and starting uvicorn
  download_hand_landmarker_model.py
tests/
  integration/                End-to-end API workflow tests
  test_mediapipe_gesture_sender.py
models/
  hand_landmarker.task        MediaPipe task model storage location

Architecture Overview

1. Backend service

The backend stores the current graph in memory, validates payloads with Pydantic, exposes system metadata, and broadcasts graph/status/error events to connected clients.

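Key files: app/main.py (FastAPI app, endpoints, WebSocket hubs, runtime limits) and app/models.py (Pydantic models and schema version).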

2. Graph generation layer

Graphs can arrive from three sources:

POST /api/graph/generate for LLM-driven graph generation
POST /api/graph/mermaid for Mermaid subset parsing
POST /api/graph/apply for externally supplied JSON payloads

The LLM service supports both direct provider calls and mock mode for local work:

POLZA_API_KEY or OPENAI_API_KEY
optional POLZA_BASE_URL / OPENAI_BASE_URL
optional POLZA_MODEL / OPENAI_MODEL
LLM_MODE=mock for deterministic local testing
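
As an illustration of how these variables could be resolved into one provider configuration, here is a minimal sketch. The `LLMConfig` type, the `resolve_llm_config` helper, the POLZA-before-OPENAI precedence, and the fallback to mock when no key is set are all assumptions made for the example; the actual logic lives in app/llm_service.py and may differ.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LLMConfig:
    """Hypothetical container for the resolved provider settings."""
    mode: str                       # "mock" or "provider"
    api_key: Optional[str] = None
    base_url: Optional[str] = None
    model: Optional[str] = None


def resolve_llm_config(env: Mapping[str, str]) -> LLMConfig:
    """Pick provider settings from an environment-like mapping.

    Assumption: POLZA_* is checked before OPENAI_*; the real service
    may use a different order.
    """
    if env.get("LLM_MODE", "auto").lower() == "mock":
        return LLMConfig(mode="mock")
    for prefix in ("POLZA", "OPENAI"):
        api_key = env.get(f"{prefix}_API_KEY")
        if api_key:
            return LLMConfig(
                mode="provider",
                api_key=api_key,
                base_url=env.get(f"{prefix}_BASE_URL"),
                model=env.get(f"{prefix}_MODEL"),
            )
    # Assumption: with no credentials, fall back to mock so local runs still work.
    return LLMConfig(mode="mock")
```

Passing `os.environ` to `resolve_llm_config` would give the settings for the current process; a plain dict works equally well in tests.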

3. Realtime transport

The project exposes three WebSocket entry points:

WS /ws for combined traffic
WS /ws/control for high-rate control only
WS /ws/events for graph, status, and error events

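The split lets a renderer keep control latency low: high-rate control frames on /ws/control do not share a socket with the larger graph, status, and error payloads on /ws/events.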
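
A client-side view of this split can be sketched as a small dispatcher that accepts only the message types expected on each channel. The `CHANNEL_TYPES` table and the `dispatch` helper are hypothetical names for this example; the socket connection itself is omitted so the sketch stays independent of any particular WebSocket library.

```python
import json
from typing import Callable, Dict

# Message types expected per channel, following the list above.
CHANNEL_TYPES = {
    "/ws": {"control", "graph", "status", "error"},
    "/ws/control": {"control"},
    "/ws/events": {"graph", "status", "error"},
}

Handler = Callable[[dict], None]


def dispatch(channel: str, raw: str, handlers: Dict[str, Handler]) -> bool:
    """Decode one JSON frame and call the handler registered for its type.

    Returns False when the type is not expected on this channel or when
    no handler is registered for it.
    """
    message = json.loads(raw)
    msg_type = message.get("type")
    if msg_type not in CHANNEL_TYPES.get(channel, set()):
        return False
    handler = handlers.get(msg_type)
    if handler is None:
        return False
    handler(message)
    return True
```

A renderer would typically call `dispatch` from two receive loops, one per dedicated socket, so a slow graph handler never delays control frames.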

4. Gesture input

app/mediapipe_gesture_sender.py captures a webcam feed, detects a hand, derives orientation/openness/pinch states, and sends normalized values:

rotateX
rotateY
zoomDelta
confidence
resetCamera

The sender can use either:

mp.solutions when exposed by the installed MediaPipe build
the newer Tasks API with models/hand_landmarker.task
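
The choice between the two MediaPipe paths could look roughly like the sketch below. `pick_backend` and `DEFAULT_MODEL` are hypothetical names, and the lookup order for the model file (explicit argument, then MEDIAPIPE_HAND_MODEL, then the default location) is an assumption; the sender script may resolve it differently.

```python
import os
from pathlib import Path
from typing import Optional

# Default model location described in this README.
DEFAULT_MODEL = Path("models/hand_landmarker.task")


def pick_backend(mp_module: object, model_path: Optional[str] = None) -> str:
    """Return "solutions" or "tasks" for the given MediaPipe module object.

    The legacy mp.solutions API is used when the build exposes it;
    otherwise the Tasks API is chosen and the model file must exist.
    """
    if hasattr(mp_module, "solutions"):
        return "solutions"
    # Assumed lookup order: explicit path, env variable, default location.
    candidate = Path(
        model_path or os.environ.get("MEDIAPIPE_HAND_MODEL") or DEFAULT_MODEL
    )
    if not candidate.is_file():
        raise FileNotFoundError(f"Tasks API model not found: {candidate}")
    return "tasks"
```

In real use the first argument would be the imported `mediapipe` module; taking it as a parameter keeps the check testable without MediaPipe installed.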

Message Model

The service uses a versioned message schema. The current SCHEMA_VERSION is 1.0.0.

Outgoing messages include:

graph
status
error

Incoming realtime control messages include:

control

Example control message:

{
  "schemaVersion": "1.0.0",
  "type": "control",
  "requestId": "ctrl-123",
  "payload": {
    "rotateX": 0.12,
    "rotateY": -0.08,
    "zoomDelta": 0.03,
    "confidence": 0.94,
    "resetCamera": false
  }
}

The canonical contract lives in schemas/messages.schema.json.
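
A sender could assemble the control message shown above with a helper like the following sketch. The field names come from the example message; the clamping ranges (-1 to 1 for the motion values, 0 to 1 for confidence) and the `ctrl-` request ID format are assumptions, so check schemas/messages.schema.json for the authoritative constraints.

```python
import json
import uuid

SCHEMA_VERSION = "1.0.0"


def _clamp(value: float, low: float, high: float) -> float:
    """Limit value to the closed interval [low, high]."""
    return float(max(low, min(high, value)))


def build_control_message(
    rotate_x: float,
    rotate_y: float,
    zoom_delta: float,
    confidence: float,
    reset_camera: bool = False,
) -> str:
    """Serialize one control message as a JSON string."""
    message = {
        "schemaVersion": SCHEMA_VERSION,
        "type": "control",
        # Assumed ID format; any unique string should serve the same purpose.
        "requestId": f"ctrl-{uuid.uuid4().hex[:8]}",
        "payload": {
            # Assumed ranges for the normalized values.
            "rotateX": _clamp(rotate_x, -1.0, 1.0),
            "rotateY": _clamp(rotate_y, -1.0, 1.0),
            "zoomDelta": _clamp(zoom_delta, -1.0, 1.0),
            "confidence": _clamp(confidence, 0.0, 1.0),
            "resetCamera": bool(reset_camera),
        },
    }
    return json.dumps(message)
```

The returned string can be sent as a text frame on WS /ws or WS /ws/control.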

Quick Start

Requirements

Windows with PowerShell (the primary workflow in this repo)
Python 3.12
webcam access for gesture control
optional LLM API key for prompt-based graph generation

1. Create and activate a virtual environment

py -3.12 -m venv .venv312
.\.venv312\Scripts\Activate.ps1

2. Install dependencies

pip install -r requirements.txt
pip install -r requirements-dev.txt

3. Configure environment variables

Minimal local .env example:

LLM_MODE=mock

Example with an external model provider:

OPENAI_API_KEY=your_key_here
OPENAI_MODEL=gpt-4o-mini

Or:

POLZA_API_KEY=your_key_here
POLZA_MODEL=openai/gpt-4o

4. Start the server

Using the helper script:

.\scripts\run_server.ps1

Or directly:

.\.venv312\Scripts\python.exe -m uvicorn app.main:app --host 127.0.0.1 --port 8000 --reload --env-file .env

5. Open the demo UI

After startup, open:

http://127.0.0.1:8000/

The browser demo lets you:

generate a graph from a text prompt
parse Mermaid code into a graph
load the current graph from the server
export and import graph JSON
watch control and status diagnostics live

Gesture Control Setup

If your MediaPipe build needs the Tasks API model bundle, download it once:

python scripts\download_hand_landmarker_model.py

That stores the model at:

models/hand_landmarker.task

Then run the gesture sender:

python app\mediapipe_gesture_sender.py --uri ws://127.0.0.1:8000/ws --camera 0 --hz 45 --preview

Optional explicit model path:

python app\mediapipe_gesture_sender.py --model C:\path\to\hand_landmarker.task --preview

Gesture semantics

The demo is designed around these gestures:

turn the hand to rotate the graph
open or close the hand to zoom
pinch to lock movement
hold an OK gesture to reset the camera

API Summary

System endpoints

GET /health
GET /api/system/limits
GET /api/system/capabilities
GET /api/system/schema/messages

Graph endpoints

GET /api/graph/current
POST /api/graph/apply
POST /api/graph/generate
POST /api/graph/mermaid

WebSocket endpoints

WS /ws
WS /ws/control
WS /ws/events

Detailed endpoint examples are documented in docs/API_REFERENCE.md.

Runtime Configuration

The backend exposes several useful tuning variables:

Variable	Default	Purpose
`GENERATE_RATE_LIMIT_PER_MIN`	`20`	per-minute limit for LLM generation
`MERMAID_RATE_LIMIT_PER_MIN`	`30`	per-minute limit for Mermaid parsing
`APPLY_RATE_LIMIT_PER_MIN`	`20`	per-minute limit for external graph apply
`GENERATE_MAX_CONCURRENCY`	`2`	concurrent generation jobs
`GENERATE_TIMEOUT_SECONDS`	`45`	timeout for LLM generation
`GENERATE_QUEUE_WAIT_SECONDS`	`2.5`	wait time to enter the generation queue
`CONTROL_ALPHA`	`0.28`	smoothing factor for control processing
`CONTROL_DEAD_ZONE`	`0.015`	ignore tiny control fluctuations
`CONTROL_MAX_HZ`	`30.0`	maximum outgoing control event rate
`MEDIAPIPE_HAND_MODEL`	unset	explicit path to `hand_landmarker.task`
`LLM_MODE`	`auto`	generation mode, including `mock` for tests
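
To show how the three CONTROL_* settings might interact, here is a minimal single-axis sketch. It assumes the smoothing is a simple exponential moving average and that rate limiting drops samples arriving too soon after the last emitted one; the `ControlFilter` class is hypothetical and the real behavior in app/control_processing.py may differ.

```python
import time
from typing import Callable, Optional


class ControlFilter:
    """Dead zone, exponential smoothing, and rate limiting for one axis."""

    def __init__(
        self,
        alpha: float = 0.28,        # CONTROL_ALPHA
        dead_zone: float = 0.015,   # CONTROL_DEAD_ZONE
        max_hz: float = 30.0,       # CONTROL_MAX_HZ
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.alpha = alpha
        self.dead_zone = dead_zone
        self.min_interval = 1.0 / max_hz
        self.clock = clock
        self._smoothed = 0.0
        self._last_emit: Optional[float] = None

    def push(self, value: float) -> Optional[float]:
        """Feed one raw sample; return the filtered value or None if dropped."""
        # Dead zone: treat tiny inputs as zero.
        if abs(value) < self.dead_zone:
            value = 0.0
        # Exponential smoothing: move a fraction alpha toward the new sample.
        self._smoothed += self.alpha * (value - self._smoothed)
        # Rate limit: skip emission if the previous one was too recent.
        now = self.clock()
        if self._last_emit is not None and now - self._last_emit < self.min_interval:
            return None
        self._last_emit = now
        return self._smoothed
```

A lower alpha gives steadier but laggier camera motion; the smoothed state keeps updating even for samples that the rate limit drops, so the next emitted value reflects the latest input.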

Development Workflow

Run tests

.\.venv312\Scripts\python.exe -m pytest

The test suite covers:

API health and runtime metadata
graph retrieval and apply flow
graph generation and validation errors
Mermaid generation flow
gesture mapping behavior and reset logic

Recommended local mode

For stable local development without external API calls:

LLM_MODE=mock

This keeps generation deterministic and avoids needing provider credentials while working on the transport, schema, UI, or gesture pipeline.

Integration Notes

Browser / desktop / game engine

Any client that supports HTTP, WebSocket, and JSON can integrate with this backend. A practical pattern is:

Call /health, /api/system/capabilities, and /api/system/limits.
Hydrate the latest graph via GET /api/graph/current.
Open WebSocket subscriptions.
Trigger graph generation or apply external graph snapshots.
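
The first three steps of this pattern can be sketched with the standard library alone. The helper names (`http_get_json`, `bootstrap`, `websocket_urls`) are hypothetical, and the sketch assumes every listed endpoint answers with a JSON object; docs/API_REFERENCE.md has the actual response shapes.

```python
import json
from typing import Callable, Dict
from urllib.request import urlopen

# Endpoints from the steps above, in the order they are called.
BOOTSTRAP_PATHS = (
    "/health",
    "/api/system/capabilities",
    "/api/system/limits",
    "/api/graph/current",
)


def http_get_json(url: str) -> dict:
    """GET a URL and decode the body as JSON (assumes a JSON response)."""
    with urlopen(url, timeout=5) as response:
        return json.load(response)


def bootstrap(
    base_url: str, fetch: Callable[[str], dict] = http_get_json
) -> Dict[str, dict]:
    """Fetch service metadata and the current graph, keyed by path."""
    base = base_url.rstrip("/")
    return {path: fetch(base + path) for path in BOOTSTRAP_PATHS}


def websocket_urls(base_url: str) -> Dict[str, str]:
    """Derive the dedicated WebSocket URLs from the HTTP base URL."""
    # http -> ws and https -> wss by swapping only the scheme prefix.
    ws_base = base_url.rstrip("/").replace("http", "ws", 1)
    return {"control": ws_base + "/ws/control", "events": ws_base + "/ws/events"}
```

With the server from Quick Start running, `bootstrap("http://127.0.0.1:8000")` would return the four responses, after which the client opens the sockets listed by `websocket_urls`.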

TouchDesigner

For the best latency split:

use WS /ws/control for high-rate control data
use WS /ws/events for graph, status, and error
keep the final camera math and render loop inside TouchDesigner

See docs/TOUCHDESIGNER_INTEGRATION.md.

Demo Frontend

The demo page in static/index.html is intentionally lightweight. It is useful as:

a smoke test for backend connectivity
a visual debugger for graph payloads
a manual QA surface for gestures, import/export, and Mermaid parsing
a reference client for external integrations

The page uses CDN-delivered:

three.js
3d-force-graph

Known Operational Notes

The graph is stored in process memory, so restarting the server discards the current graph.
LLM generation can be rate-limited, queued, or timed out depending on configuration.
MediaPipe support can differ by build, which is why the project includes a Tasks API fallback path and model download script.
The current frontend is a demo surface, not a fully packaged production SPA.
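
The note above about generation being queued or timed out corresponds to GENERATE_MAX_CONCURRENCY, GENERATE_QUEUE_WAIT_SECONDS, and GENERATE_TIMEOUT_SECONDS. One way such a gate could be built with asyncio is sketched below; `GenerationGate` and `QueueFull` are hypothetical names, and this is not necessarily how app/main.py enforces its limits.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class QueueFull(Exception):
    """Raised when no generation slot frees up within the queue wait."""


class GenerationGate:
    """Bound concurrent jobs, queue wait time, and per-job run time."""

    def __init__(
        self,
        max_concurrency: int = 2,     # GENERATE_MAX_CONCURRENCY
        queue_wait: float = 2.5,      # GENERATE_QUEUE_WAIT_SECONDS
        timeout: float = 45.0,        # GENERATE_TIMEOUT_SECONDS
    ) -> None:
        self._slots = asyncio.Semaphore(max_concurrency)
        self._queue_wait = queue_wait
        self._timeout = timeout

    async def run(self, job: Callable[[], Awaitable[T]]) -> T:
        """Run job() under the limits; raises QueueFull or asyncio.TimeoutError."""
        try:
            # Wait a bounded time for a free slot.
            await asyncio.wait_for(self._slots.acquire(), self._queue_wait)
        except asyncio.TimeoutError:
            raise QueueFull("generation queue is busy") from None
        try:
            # Bound the job itself; a timeout here propagates to the caller.
            return await asyncio.wait_for(job(), self._timeout)
        finally:
            self._slots.release()
```

An endpoint handler could translate `QueueFull` and the timeout into distinct error responses so clients can tell a busy service from a slow provider.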

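Documentation Map

docs/API_REFERENCE.md
docs/MEDIAPIPE_SETUP.md
docs/CLIENT_ADAPTERS.md
docs/TOUCHDESIGNER_INTEGRATION.md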

Suggested Positioning

This repository works well as:

a backend transport layer for interactive graph canvases
a prototype platform for AI-generated graph structures
a gesture-control bridge for realtime 3D visualization tools
a reference implementation for splitting generation, state, and control into separate channels

License

No explicit license file is currently present in the repository. Add one before public redistribution if you want the usage terms to be unambiguous.
