Lumilake

License Python 3.12+ Lint Tests

Lumilake is a data analytics engine for agentic workflows. It accepts workflow specs (native graph JSON, YAML, or n8n JSON), optimizes the runtime graph with HALO, and dispatches tasks through FlowMesh.

What Lumilake Provides

  • Workflow parsing for native graph specs, YAML workflows, and n8n exports.
  • HALO scheduling for multi-step AI and data workflows.
  • A FastAPI server for job submission, status, cancellation, results, workers, and traces.
  • A CLI and Python SDK for local deployment and server API access.
  • Data access through direct PostgreSQL and S3-compatible storage; agent-style retrievals additionally route through lumid.data when LUMID_DATA_URL is set.
  • Shared hook integration through lumid-hooks, plus Lumilake-owned optimizer plugins.

Install

From PyPI:

pip install "lumilake[cli]"

From a source checkout:

uv sync --all-packages --all-extras --all-groups

The PyPI lumilake distribution is a code-free metapackage; install one of the extras below to get a working set. The server runtime is published as a Docker image only and is intentionally not on PyPI.

Extra    Includes
sdk      Python SDK HTTP clients (lumilake-sdk → module lumilake)
cli      lumilake command line interface plus deploy lifecycle (lumilake-cli + lumilake-deploy)
deploy   Local Docker / FlowMesh deployment helpers (lumilake-deploy)
hook     Resource-kind helpers for shared hook integrations (lumilake-hook)
all      Everything above.

Quick Start

The server runs as the published Docker image. lumilake deploy reads its env files from --project-dir (or the current working directory). Either point at a deployment directory with -C / --project-dir, or cd to it first.

mkdir -p ~/lumilake-deploy
lumilake deploy -C ~/lumilake-deploy init --flowmesh   # ~/lumilake-deploy/.env + .env.flowmesh
$EDITOR ~/lumilake-deploy/.env                          # fill in DATABASE_URL / S3 / model keys
lumilake deploy -C ~/lumilake-deploy pull               # fetch ghcr.io/mlsys-io/lumilake_server:<tag>
lumilake deploy -C ~/lumilake-deploy up                 # bring the stack up via docker compose

LUMILAKE_DEPLOY_DIR=~/lumilake-deploy is an equivalent override. The deployment directory only needs to hold your .env files (and any local state docker compose creates) — the compose file and server image are resolved from the installed lumilake-deploy package and GHCR. The server listens on http://127.0.0.1:9000 by default — open /docs for the API browser.
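The directory lookup described above can be sketched in a few lines of Python, which may help when wrapping lumilake deploy in your own scripts. This is an illustration only, not the lumilake-deploy implementation: the helper names are hypothetical, and the precedence (explicit -C value, then LUMILAKE_DEPLOY_DIR, then the current directory) is an assumption, since the README only calls the variable an equivalent override.

```python
import os
from pathlib import Path


def resolve_project_dir(cli_value=None, env=None, cwd=None):
    """Pick the deployment directory.

    Assumed precedence (not confirmed by the README): explicit -C value,
    then LUMILAKE_DEPLOY_DIR, then the current working directory.
    """
    env = os.environ if env is None else env
    if cli_value:
        chosen = cli_value
    elif env.get("LUMILAKE_DEPLOY_DIR"):
        chosen = env["LUMILAKE_DEPLOY_DIR"]
    else:
        chosen = cwd if cwd is not None else os.getcwd()
    # Expand a leading "~" so values like ~/lumilake-deploy resolve correctly.
    return Path(chosen).expanduser()


def missing_env_files(project_dir, names=(".env", ".env.flowmesh")):
    """Return the expected env files that are absent from project_dir.

    .env.flowmesh is only generated by `init --flowmesh`; pass names=(".env",)
    if you do not use the bundled FlowMesh stack.
    """
    return [name for name in names if not (Path(project_dir) / name).is_file()]
```

Calling missing_env_files(resolve_project_dir()) before deploy up gives an early hint that init has not been run in the directory you expect.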

Note: a real workflow run also requires running PostgreSQL and S3-compatible storage; agent-style retrievals (DataRetrievalOp with type: agent) additionally require LUMID_DATA_URL. See docs/ENV.md for the env contract. If you don't have your own data plane, the repo ships a bundled Postgres + MinIO at scripts/dev/compose.data-plane.yml — see docs/E2E_DEMO.md for the full three-step demo flow (data plane → load demo data → run a workflow).

Hello world

The repo ships a hello-world.yaml template (FormatOp → LambdaOp → LLMChatOp) that is the smallest copy-paste starting point for a Lumilake YAML workflow. Submit it once the stack is up and S3_URL points at reachable S3-compatible storage. From a source checkout, start the bundled data plane first:

docker compose -f scripts/dev/compose.data-plane.yml up -d

PyPI installs do not include scripts/dev/; use your own reachable S3 endpoint, or download the repo's data-plane compose file alongside the workflow template before running the local-only example.

# From a source checkout:
uv run lumilake job submit examples/templates/yaml/hello-world.yaml \
    --format yaml --input 'Name=world' --output-prefix demo/hello-world

# From a PyPI install (download the template alongside lumilake):
curl -O https://raw.githubusercontent.com/mlsys-io/lumilake_OSS/main/examples/templates/yaml/hello-world.yaml
lumilake job submit hello-world.yaml \
    --format yaml --input 'Name=world' --output-prefix demo/hello-world

# Either way, follow the job and fetch its result:
lumilake job watch <job_id>
lumilake job result <job_id>

The template uses Qwen/Qwen3-8B, which is the bundled text-demo model. You do not need to pre-populate or inspect cached_models before the first run; it can be empty until after a worker serves a job. Only edit config.model if your FlowMesh stack is configured for a different model or the job fails with a missing-model / worker-placement error.
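If you submit jobs from a script rather than a shell, one option is to build the same command line in Python. The sketch below uses only the flags shown in the commands above; build_submit_command is a hypothetical helper, not part of the Lumilake CLI or SDK, and it passes --input once because the README does not say whether the flag can be repeated.

```python
import shlex


def build_submit_command(workflow_path, input_pair, output_prefix, fmt="yaml"):
    """Build the argv for `lumilake job submit` from the flags shown above.

    input_pair is a single (key, value) tuple, e.g. ("Name", "world").
    """
    key, value = input_pair
    return [
        "lumilake", "job", "submit", str(workflow_path),
        "--format", fmt,
        "--input", f"{key}={value}",
        "--output-prefix", output_prefix,
    ]


argv = build_submit_command("hello-world.yaml", ("Name", "world"), "demo/hello-world")
# Print a shell-safe rendering for logging or copy-paste.
print(shlex.join(argv))
# To actually submit, hand the list to subprocess.run(argv, check=True).
```

Passing a list to subprocess.run avoids shell quoting problems with values such as Stock=NVDA,AAPL,MSFT.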

Real workflows

Submit and inspect a workflow. From a source checkout the example workflow file is at examples/templates/yaml/trading-agent.yaml; PyPI installs do not ship the templates, so pass an absolute path to a workflow file you have locally:

# From a source checkout:
uv run lumilake job submit examples/templates/yaml/trading-agent.yaml \
    --format yaml --input 'Stock=NVDA,AAPL,MSFT' --output-prefix demo/trading-agent

# From a PyPI install (lumilake on PATH; supply your own workflow file):
lumilake job submit /path/to/your/workflow.yaml \
    --format yaml --input 'Stock=NVDA,AAPL,MSFT' --output-prefix demo/trading-agent

lumilake job list
lumilake job watch <job_id>

lumilake deploy up writes ~/.lumilake/config.toml so subsequent calls find the local server automatically. For remote / hosted servers, set LUMILAKE_BASE_URL instead.
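For SDK scripts that should work against both a local and a remote server, a small resolver along the following lines may be useful. It is a sketch under stated assumptions: resolve_base_url and check_health are hypothetical helpers, the script reads LUMILAKE_BASE_URL itself rather than relying on any automatic lookup in the SDK, and it does not parse ~/.lumilake/config.toml because the README does not describe that file's layout.

```python
import os

# Default listen address given in the Quick Start.
DEFAULT_LOCAL_URL = "http://127.0.0.1:9000"


def resolve_base_url(env=None):
    """Prefer LUMILAKE_BASE_URL when set, else fall back to the local default."""
    env = os.environ if env is None else env
    url = env.get("LUMILAKE_BASE_URL", "").strip()
    # Drop a trailing slash so the same server always yields the same string.
    return (url or DEFAULT_LOCAL_URL).rstrip("/")


def check_health(env=None):
    """Open a client against the resolved URL and return its health payload."""
    # Imported lazily so resolve_base_url works without the sdk extra installed.
    from lumilake import LumilakeClient

    with LumilakeClient(base_url=resolve_base_url(env)) as client:
        return client.health()
```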

See docs/E2E_DEMO.md for a full reproduction using the bundled demo workflows and dataset.

Data Access

  • SQL retrievals connect directly to DATABASE_URL.
  • S3 retrievals connect directly to S3_URL.
  • Agent retrievals (DataRetrievalOp with type: agent) require LUMID_DATA_URL and route through lumid.data's /agent/v1 endpoint.

Job records and runtime artifacts are written under S3_ARCHIVE_PREFIX using the same S3_URL connection.
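The mapping above lends itself to a quick preflight check before submitting a workflow. The following is a minimal sketch, not Lumilake's own validation: missing_env is a hypothetical helper, and the kind labels "sql" and "s3" are chosen for illustration (only type: agent is named in this README). S3_URL is always checked because job records and artifacts are archived over that connection.

```python
import os

# Retrieval kind -> environment variable it depends on, per the list above.
# Only "agent" is a type name confirmed by the README; "sql" and "s3" are
# labels chosen for this sketch.
REQUIRED_ENV = {
    "sql": "DATABASE_URL",
    "s3": "S3_URL",
    "agent": "LUMID_DATA_URL",
}


def missing_env(kinds, env=None):
    """Return the sorted names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    # S3_URL is needed for every job because archives are written through it.
    needed = {"S3_URL"}
    for kind in kinds:
        needed.add(REQUIRED_ENV[kind])  # raises KeyError on an unknown kind
    return sorted(name for name in needed if not env.get(name))
```

For example, missing_env(["sql", "agent"]) lists whichever of DATABASE_URL, LUMID_DATA_URL, and S3_URL are absent from the current environment.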

Deployment

Examples below assume you're set up with LUMILAKE_DEPLOY_DIR=~/lumilake-deploy (or pass -C ~/lumilake-deploy explicitly). Workspace-checkout users can prefix the commands with uv run; PyPI-install users invoke lumilake directly.

Generate .env from the bundled template:

lumilake deploy init

Generate both Lumilake and bundled FlowMesh env files:

lumilake deploy init --flowmesh

If another FlowMesh stack is already running on the same host, check ports before deploy up. Common co-tenant FlowMesh defaults are HTTP 8000, gRPC 50051, Redis control 6379, and Redis telemetry 6380. The bundled stack reads SERVER_HTTP_PORT, SERVER_GRPC_PORT, REDIS_CONTROL_PORT, and REDIS_TELEMETRY_PORT from .env.flowmesh; change them to free ports and keep LUMILAKE_RUNTIME_ORCHESTRATOR_URL in .env aligned with SERVER_HTTP_PORT.
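One way to check those ports before deploy up is a short TCP probe. The sketch below is an illustration rather than what lumilake deploy doctor does: port_in_use and busy_ports are hypothetical helpers, the port numbers are the co-tenant defaults listed above, and the probe only detects listeners reachable on 127.0.0.1.

```python
import socket

# Co-tenant FlowMesh defaults named above, keyed by the .env.flowmesh variable.
DEFAULT_PORTS = {
    "SERVER_HTTP_PORT": 8000,
    "SERVER_GRPC_PORT": 50051,
    "REDIS_CONTROL_PORT": 6379,
    "REDIS_TELEMETRY_PORT": 6380,
}


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


def busy_ports(ports=DEFAULT_PORTS):
    """Return the subset of ports that already have a listener."""
    return {name: port for name, port in ports.items() if port_in_use(port)}
```

Any variable returned by busy_ports() is a candidate to change in .env.flowmesh; if SERVER_HTTP_PORT is among them, update LUMILAKE_RUNTIME_ORCHESTRATOR_URL in .env to match.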

Common deployment commands:

lumilake deploy doctor
lumilake deploy pull         # or `build` to compile from source
lumilake deploy up
lumilake deploy status
lumilake deploy logs server --tail 200
lumilake deploy restart server
lumilake deploy down
lumilake deploy clean

Use deploy down to stop services while keeping data volumes (non-destructive). Use deploy clean or deploy reset only when you want to remove local stack state; both delete every Lumilake-managed volume, and reset prompts for confirmation by default (--yes skips the prompt).

Python SDK

from lumilake import LumilakeClient

with LumilakeClient(base_url="http://127.0.0.1:9000") as client:
    print(client.health())
    print(client.jobs.list())

Install the SDK extra for HTTP clients:

pip install "lumilake[sdk]"

Install deploy support as well if you want client.deploy.* methods:

pip install "lumilake[sdk,deploy]"

See docs/SDK.md for the SDK resource map.

Documentation

  • docs/ENV.md - environment variables and data-plane modes.
  • docs/CLI.md - command groups and common CLI usage.
  • docs/WORKFLOWS.md - workflow input formats and YAML structure.
  • docs/OPS.md - built-in operation classes.
  • docs/SDK.md - sync and async Python client usage.
  • docs/API.md - server route overview and response shape.
  • docs/ARCHITECTURE.md - module layout and runtime flow.
  • docs/PLUGINS.md - shared hooks and Lumilake plugin model.
  • docs/CODE_STYLE.md - coding rules for contributors and agents.

Plugins

Lumilake wires shared hook protocols from lumid-hooks for identity, permissions, resource registration, submission guards, and usage sinks. Optimizer registration remains Lumilake-specific.

A minimal in-memory plugin is available under examples/plugins/simple_plugin/.

Repository Layout

.
├── src/lumilake_server/       # server runtime — image-only, not on PyPI
├── packages/sdk/              # `lumilake-sdk` → module `lumilake` (Client, envs, log)
├── packages/cli/              # `lumilake-cli` → `lumilake_cli` (Typer entry point)
├── packages/deploy/           # `lumilake-deploy` — packaged compose + .env.example assets
├── packages/hook/             # `lumilake-hook` → `lumilake_hook` (resource-kind helpers)
├── examples/                  # workflow templates and sample plugins
├── tests/                     # pytest suite
├── scripts/                   # CI and developer helpers
├── Dockerfile                 # builds ghcr.io/mlsys-io/lumilake_server
├── .env.example -> packages/deploy/.../assets/.env.example   # symlink for editors
├── uv.lock
└── pyproject.toml             # metapackage (`lumilake`) with [sdk]/[cli]/[deploy]/[hook]/[all] extras

Development

uv sync --group lint --group test --extra cli
uv run pre-commit install --install-hooks -t pre-commit -t prepare-commit-msg -t commit-msg
uv run pre-commit run --all-files
uv run pytest tests/

After changing dependencies, run:

uv lock

See CONTRIBUTING.md for PR title format, CI workflows, DCO sign-off, dependency guidance, and local testing notes.

License

Apache-2.0. See LICENSE.
