Lumilake is a data analytics engine for agentic workflows. It accepts workflow specs (native graph JSON, YAML, or n8n JSON), optimizes the runtime graph with HALO, and dispatches tasks through FlowMesh.
- Workflow parsing for native graph specs, YAML workflows, and n8n exports.
- HALO scheduling for multi-step AI and data workflows.
- A FastAPI server for job submission, status, cancellation, results, workers, and traces.
- A CLI and Python SDK for local deployment and server API access.
- Data access through direct PostgreSQL and S3-compatible storage; agent-style retrievals additionally route through lumid.data when LUMID_DATA_URL is set.
- Shared hook integration through lumid-hooks, plus Lumilake-owned optimizer plugins.
From PyPI:
pip install "lumilake[cli]"
From a source checkout:
uv sync --all-packages --all-extras --all-groups
The PyPI lumilake distribution is a code-free metapackage; install one of the extras below to get a working set. The server runtime is published as a Docker image only and is intentionally not on PyPI.
| Extra | Includes |
|---|---|
| sdk | Python SDK HTTP clients (lumilake-sdk → module lumilake) |
| cli | lumilake command line interface plus deploy lifecycle (lumilake-cli + lumilake-deploy) |
| deploy | Local Docker / FlowMesh deployment helpers (lumilake-deploy) |
| hook | Resource-kind helpers for shared hook integrations (lumilake-hook) |
| all | Everything above. |
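To see which extras are present in the current environment, a small check along these lines may help. It is a sketch only: the module names come from the repository layout in this README, and the deploy package is left out because its import name is not stated here.

```python
from importlib.util import find_spec

# Extra -> top-level module, taken from the repository layout.
# The deploy package's import name is not given in this README, so it is omitted.
EXTRA_MODULES = {
    "sdk": "lumilake",
    "cli": "lumilake_cli",
    "hook": "lumilake_hook",
}


def installed_extras(modules=EXTRA_MODULES):
    """Map each extra to True when its top-level module can be found."""
    return {extra: find_spec(module) is not None for extra, module in modules.items()}


if __name__ == "__main__":
    for extra, present in installed_extras().items():
        print(f"{extra}: {'installed' if present else 'missing'}")
```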
The server runs as the published Docker image. lumilake deploy reads its env files from --project-dir (or the current working directory). Either point at a deployment directory with -C / --project-dir, or cd to it first.
mkdir -p ~/lumilake-deploy
lumilake deploy -C ~/lumilake-deploy init --flowmesh # ~/lumilake-deploy/.env + .env.flowmesh
$EDITOR ~/lumilake-deploy/.env # fill in DATABASE_URL / S3 / model keys
lumilake deploy -C ~/lumilake-deploy pull # fetch ghcr.io/mlsys-io/lumilake_server:<tag>
lumilake deploy -C ~/lumilake-deploy up # bring the stack up via docker compose
Setting LUMILAKE_DEPLOY_DIR=~/lumilake-deploy is an equivalent override. The deployment directory only needs to hold your .env files (and any local state docker compose creates); the compose file and server image are resolved from the installed lumilake-deploy package and GHCR. The server listens on http://127.0.0.1:9000 by default; open /docs for the API browser.
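The directory lookup described above can be sketched as follows. This is not the lumilake-deploy implementation: the helper names are hypothetical, and the ordering (flag first, then LUMILAKE_DEPLOY_DIR, then the current directory) is an assumption made for illustration.

```python
import os
from pathlib import Path


def resolve_deploy_dir(project_dir=None, environ=None, cwd=None):
    """Pick a deployment directory: explicit flag, then env var, then cwd.

    The flag-over-env ordering is assumed, not confirmed by the README.
    """
    environ = os.environ if environ is None else environ
    if project_dir:
        chosen = project_dir
    elif environ.get("LUMILAKE_DEPLOY_DIR"):
        chosen = environ["LUMILAKE_DEPLOY_DIR"]
    else:
        chosen = cwd if cwd is not None else os.getcwd()
    return Path(chosen).expanduser()


def missing_env_files(deploy_dir, flowmesh=True):
    """List the env files that `deploy init` would have created but are absent."""
    names = [".env"] + ([".env.flowmesh"] if flowmesh else [])
    return [name for name in names if not (Path(deploy_dir) / name).is_file()]
```

Running `missing_env_files(resolve_deploy_dir())` before `deploy up` gives a quick hint when the command is pointed at the wrong directory.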
Note: a real workflow run also requires running PostgreSQL and S3-compatible storage; agent-style retrievals (DataRetrievalOp with type: agent) additionally require LUMID_DATA_URL. See docs/ENV.md for the env contract. If you don't have your own data plane, the repo ships a bundled Postgres + MinIO at scripts/dev/compose.data-plane.yml — see docs/E2E_DEMO.md for the full three-step demo flow (data plane → load demo data → run a workflow).
The repo ships a hello-world.yaml template — FormatOp →
LambdaOp → LLMChatOp — that is the smallest copy-paste starting
point for a Lumilake YAML workflow. Submit it once the stack is up and
S3_URL points at reachable S3-compatible storage. From a source
checkout, start the bundled data plane first:
docker compose -f scripts/dev/compose.data-plane.yml up -d
PyPI installs do not include scripts/dev/; use your own reachable S3
endpoint, or download the repo's data-plane compose file alongside the
workflow template before running the local-only example.
# From a source checkout:
uv run lumilake job submit examples/templates/yaml/hello-world.yaml \
--format yaml --input 'Name=world' --output-prefix demo/hello-world
# From a PyPI install (download the template alongside lumilake):
curl -O https://raw.githubusercontent.com/mlsys-io/lumilake_OSS/main/examples/templates/yaml/hello-world.yaml
lumilake job submit hello-world.yaml \
--format yaml --input 'Name=world' --output-prefix demo/hello-world
lumilake job watch <job_id>
lumilake job result <job_id>
The template uses Qwen/Qwen3-8B, which is the bundled text-demo model.
You do not need to pre-populate or inspect cached_models before the
first run; it can be empty until after a worker serves a job. Only edit
config.model if your FlowMesh stack is configured for a different
model or the job fails with a missing-model / worker-placement error.
Submit and inspect a workflow. From a source checkout the example
workflow file is at examples/templates/yaml/trading-agent.yaml;
PyPI installs do not ship the templates, so pass an absolute path to a
workflow file you have locally:
# From a source checkout:
uv run lumilake job submit examples/templates/yaml/trading-agent.yaml \
--format yaml --input 'Stock=NVDA,AAPL,MSFT' --output-prefix demo/trading-agent
# From a PyPI install (lumilake on PATH; supply your own workflow file):
lumilake job submit /path/to/your/workflow.yaml \
--format yaml --input 'Stock=NVDA,AAPL,MSFT' --output-prefix demo/trading-agent
lumilake job list
lumilake job watch <job_id>
lumilake deploy up writes ~/.lumilake/config.toml so subsequent
calls find the local server automatically. For remote / hosted servers,
set LUMILAKE_BASE_URL instead.
See docs/E2E_DEMO.md for a full reproduction using the bundled demo
workflows and dataset.
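The submit commands above all share one shape: a workflow path, --format, --input, and --output-prefix. When scripting submissions, the argument list can be assembled as in this sketch. `build_submit_argv` is a hypothetical helper, and repeating --input once per key is an assumption, since the examples above only pass a single --input.

```python
import shlex


def build_submit_argv(workflow_path, inputs, output_prefix, fmt="yaml", use_uv=False):
    """Build the argv for `lumilake job submit` using the flags shown above."""
    # Source checkouts prefix the command with `uv run`.
    argv = ["uv", "run"] if use_uv else []
    argv += ["lumilake", "job", "submit", workflow_path, "--format", fmt]
    for key, value in inputs.items():
        # List values are joined with commas, as in 'Stock=NVDA,AAPL,MSFT'.
        joined = ",".join(value) if isinstance(value, (list, tuple)) else str(value)
        # Assumption: one --input flag per key.
        argv += ["--input", f"{key}={joined}"]
    argv += ["--output-prefix", output_prefix]
    return argv


if __name__ == "__main__":
    argv = build_submit_argv(
        "examples/templates/yaml/trading-agent.yaml",
        {"Stock": ["NVDA", "AAPL", "MSFT"]},
        "demo/trading-agent",
        use_uv=True,
    )
    print(shlex.join(argv))
```

Passing the list to `subprocess.run(argv, check=True)` avoids shell quoting issues with input values.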
- SQL retrievals connect directly to DATABASE_URL.
- S3 retrievals connect directly to S3_URL.
- Agent retrievals (DataRetrievalOp with type: agent) require LUMID_DATA_URL and route through lumid.data's /agent/v1 endpoint.
Job records and runtime artifacts are written under S3_ARCHIVE_PREFIX using the same S3_URL connection.
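The routing rules above translate into a simple pre-flight check on environment variables. The sketch below is illustrative only: the "sql" and "s3" labels are placeholders rather than confirmed `type:` values (only `agent` appears in this README), and treating S3_URL as always required follows from the archive note above.

```python
import os

# Retrieval kind -> variable it needs, per the list above.
# "sql" and "s3" are placeholder labels; only "agent" is a documented type value.
REQUIRED_ENV = {
    "sql": "DATABASE_URL",
    "s3": "S3_URL",
    "agent": "LUMID_DATA_URL",
}


def missing_for(retrieval_types, environ=None):
    """Return the sorted names of unset variables needed by a workflow."""
    environ = os.environ if environ is None else environ
    needed = {REQUIRED_ENV[kind] for kind in retrieval_types}
    # Job records and artifacts are archived over the S3_URL connection.
    needed.add("S3_URL")
    return sorted(name for name in needed if not environ.get(name))


if __name__ == "__main__":
    print(missing_for(["sql", "agent"]))
```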
Examples below assume you're set up with LUMILAKE_DEPLOY_DIR=~/lumilake-deploy (or pass -C ~/lumilake-deploy explicitly). Workspace-checkout users can prefix the commands with uv run; PyPI-install users invoke lumilake directly.
Generate .env from the bundled template:
lumilake deploy init
Generate both Lumilake and bundled FlowMesh env files:
lumilake deploy init --flowmesh
If another FlowMesh stack is already running on the same host, check
ports before deploy up. Common co-tenant FlowMesh defaults are HTTP
8000, gRPC 50051, Redis control 6379, and Redis telemetry 6380.
The bundled stack reads SERVER_HTTP_PORT, SERVER_GRPC_PORT,
REDIS_CONTROL_PORT, and REDIS_TELEMETRY_PORT from .env.flowmesh;
change them to free ports and keep LUMILAKE_RUNTIME_ORCHESTRATOR_URL
in .env aligned with SERVER_HTTP_PORT.
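The alignment rule above can be checked before running `deploy up`. The following sketch parses the two env files with a deliberately simple KEY=VALUE reader (real env files may use syntax it does not handle) and reports duplicate ports or a mismatch between LUMILAKE_RUNTIME_ORCHESTRATOR_URL and SERVER_HTTP_PORT. The function names are hypothetical.

```python
from urllib.parse import urlsplit

PORT_KEYS = (
    "SERVER_HTTP_PORT",
    "SERVER_GRPC_PORT",
    "REDIS_CONTROL_PORT",
    "REDIS_TELEMETRY_PORT",
)


def parse_env(text):
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def check_ports(env_text, flowmesh_text):
    """Return a list of human-readable problems; empty means no issue found."""
    env, flowmesh = parse_env(env_text), parse_env(flowmesh_text)
    problems = []
    ports = [flowmesh[key] for key in PORT_KEYS if key in flowmesh]
    if len(ports) != len(set(ports)):
        problems.append("duplicate port in .env.flowmesh")
    url = env.get("LUMILAKE_RUNTIME_ORCHESTRATOR_URL", "")
    http_port = flowmesh.get("SERVER_HTTP_PORT")
    if url and http_port:
        url_port = urlsplit(url).port  # None when the URL has no explicit port
        if str(url_port) != http_port:
            problems.append(
                f"LUMILAKE_RUNTIME_ORCHESTRATOR_URL uses port {url_port}, "
                f"SERVER_HTTP_PORT is {http_port}"
            )
    return problems
```

A URL without an explicit port is reported as a mismatch here, which is a conservative choice for this sketch.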
Common deployment commands:
lumilake deploy doctor
lumilake deploy pull # or `build` to compile from source
lumilake deploy up
lumilake deploy status
lumilake deploy logs server --tail 200
lumilake deploy restart server
lumilake deploy down
lumilake deploy clean
Use deploy down to stop services while keeping data volumes (non-destructive). Use deploy clean or deploy reset only when you want to remove local stack state; both delete every Lumilake-managed volume, and reset prompts for confirmation by default (--yes skips the prompt).
from lumilake import LumilakeClient
# Open a client against the local server started by `lumilake deploy up`.
with LumilakeClient(base_url="http://127.0.0.1:9000") as client:
    print(client.health())     # server health check
    print(client.jobs.list())  # jobs known to the server
Install the SDK extra for HTTP clients:
pip install "lumilake[sdk]"
Install deploy support as well if you want client.deploy.* methods:
pip install "lumilake[sdk,deploy]"
See docs/SDK.md for the SDK resource map.
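Scripts that call the SDK right after `deploy up` may want to wait until the server answers. A generic retry loop such as the one below could wrap `client.health`. It is a sketch: `wait_until_ready` is a hypothetical helper, and it assumes the health call raises an exception while the server is unreachable, which this README does not confirm.

```python
import time


def wait_until_ready(check, timeout=60.0, interval=2.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call `check` until it stops raising or `timeout` seconds pass.

    Returns True on the first successful call, False once the deadline passes.
    """
    deadline = clock() + timeout
    while True:
        try:
            check()
            return True
        except Exception:
            # Assumption: any exception means "not ready yet".
            if clock() >= deadline:
                return False
            sleep(interval)


# Usage with the SDK (requires the sdk extra and a running server):
# from lumilake import LumilakeClient
# with LumilakeClient(base_url="http://127.0.0.1:9000") as client:
#     if not wait_until_ready(client.health):
#         raise SystemExit("server did not become ready")
```

The `sleep` and `clock` parameters exist only so the loop can be exercised in tests without real delays.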
- docs/ENV.md - environment variables and data-plane modes.
- docs/CLI.md - command groups and common CLI usage.
- docs/WORKFLOWS.md - workflow input formats and YAML structure.
- docs/OPS.md - built-in operation classes.
- docs/SDK.md - sync and async Python client usage.
- docs/API.md - server route overview and response shape.
- docs/ARCHITECTURE.md - module layout and runtime flow.
- docs/PLUGINS.md - shared hooks and Lumilake plugin model.
- docs/CODE_STYLE.md - coding rules for contributors and agents.
Lumilake wires shared hook protocols from lumid-hooks for identity, permissions, resource registration, submission guards, and usage sinks. Optimizer registration remains Lumilake-specific.
A minimal in-memory plugin is available under examples/plugins/simple_plugin/.
.
├── src/lumilake_server/ # server runtime — image-only, not on PyPI
├── packages/sdk/ # `lumilake-sdk` → module `lumilake` (Client, envs, log)
├── packages/cli/ # `lumilake-cli` → `lumilake_cli` (Typer entry point)
├── packages/deploy/ # `lumilake-deploy` — packaged compose + .env.example assets
├── packages/hook/ # `lumilake-hook` → `lumilake_hook` (resource-kind helpers)
├── examples/ # workflow templates and sample plugins
├── tests/ # pytest suite
├── scripts/ # CI and developer helpers
├── Dockerfile # builds ghcr.io/mlsys-io/lumilake_server
├── .env.example -> packages/deploy/.../assets/.env.example # symlink for editors
├── uv.lock
└── pyproject.toml # metapackage (`lumilake`) with [sdk]/[cli]/[deploy]/[hook]/[all] extras
uv sync --group lint --group test --extra cli
uv run pre-commit install --install-hooks -t pre-commit -t prepare-commit-msg -t commit-msg
uv run pre-commit run --all-files
uv run pytest tests/
After changing dependencies, run:
uv lock
See CONTRIBUTING.md for PR title format, CI workflows, DCO sign-off, dependency guidance, and local testing notes.
Apache-2.0. See LICENSE.