zndx/atelier

Atelier

Agentic classification workbench for Cloudera AI. Combines the Claude Agent SDK for adaptive keystone-agent orchestration with an Embeddings page for interactive visualization of classification results produced by the signals pipeline.

Development Environment

Atelier is devenv-first. devenv provides a reproducible Nix-based development shell with all system dependencies (Python 3.12, Node.js 22, PostgreSQL 16, Qdrant, dbmate, protobuf, grpcurl, mdbook, etc.) and a process manager that starts the full stack in one command.

devenv shell              # Enter the dev environment (loads .env automatically)
devenv up                 # Start everything: PostgreSQL, Qdrant, gRPC, gateway, Vite

That's it. Visit http://localhost:3000.

What devenv up starts

| Service | Port | Description |
| --- | --- | --- |
| PostgreSQL 16 | 5533 | State database (with pgvector) |
| Qdrant | 6333 / 6334 | Vector store (HTTP / gRPC) |
| gRPC server | 50051 | Core service |
| FastAPI gateway | 8090 | REST-to-gRPC bridge |
| Vite dev server | 3000 | React UI with hot reload |
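
To confirm the stack came up, one option is to probe each port from the table. The sketch below is a minimal illustration using only the Python standard library. The `SERVICES` mapping and the `port_open` and `report` helpers are names introduced here, not part of Atelier, and the sketch assumes every service listens on localhost.

```python
import socket

# Ports copied from the table above; the mapping itself is illustrative.
SERVICES = {
    "PostgreSQL": 5533,
    "Qdrant HTTP": 6333,
    "Qdrant gRPC": 6334,
    "gRPC server": 50051,
    "FastAPI gateway": 8090,
    "Vite dev server": 3000,
}


def port_open(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(services: dict[str, int] = SERVICES) -> dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(port) for name, port in services.items()}


if __name__ == "__main__":
    for name, up in report().items():
        print(f"{name:16} {'up' if up else 'DOWN'}")
```

An open port only shows that something is listening. It does not verify that the service behind it is healthy.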

First-time setup

On first clone, install dependencies and initialize:

devenv shell
just install              # uv sync + pnpm install
just proto                # Generate gRPC stubs from atelier.proto
just resolve-config       # Materialize HOCON → build/config/atelier.env
devenv up                 # Start all services (postgres initializes on first run)
# In another terminal:
just migrate              # Apply database migrations via dbmate

devenv utilities

devenv provides more than just the process manager:

| Command | What it does |
| --- | --- |
| devenv shell | Enter the dev environment with all tools on PATH |
| devenv up | Start all services and processes |
| devenv test | Run the devenv test suite |
| devenv info | Show environment info and available services |

The .env file is loaded automatically via dotenv.enable = true. Copy .env.example to .env for local overrides.

just recipes

just provides task shortcuts that complement devenv. These are convenience wrappers, not replacements for devenv itself.

| Recipe | Description |
| --- | --- |
| just up | Alias for devenv up |
| just install | uv sync && cd ui && pnpm install |
| just proto | Generate proto stubs |
| just migrate | Run dbmate migrations |
| just resolve-config | Materialize HOCON config |
| just build-ui | Build React → ui/dist/ |
| just start | Production-like startup (no devenv required) |
| just docs-serve | mdbook with live reload |

Git submodules

The external/ directory contains forked submodules:

  • embedding-atlas — Fork of Apple's embedding-atlas with important modifications for Atelier's Embeddings page. Pre-built dist/ is committed to the fork so CAI deployment doesn't need the full build toolchain (Emscripten, Rust, uv). Required for both dev and deployment.
  • hermes-agent — Reference fork, dev-only.

git submodule update --init --recursive   # Required: embedding-atlas fork (pre-built dist/)

Deploying to Cloudera AI

There are two ways to deploy Atelier on Cloudera AI (CML): as an AMP (automated) or as a manual Application. devenv is not used in CML — the deployment scripts handle all infrastructure (PGlite for embedded PostgreSQL, Qdrant binary download). The embedding-atlas submodule is cloned during install (pre-built dist/ committed to the fork); hermes-agent is not needed.

Option 1: AMP Deployment (Recommended)

AMPs (Applied ML Prototypes) use .project-metadata.yaml to automate the full setup. Atelier follows the modern create_job/run_job pattern (same as RAG Studio) — install jobs persist in the CML project and can be re-run from the Jobs tab without redeploying the AMP.

  1. In the CML UI, go to AMPs or create a new Project from Git URL
  2. Enter the repository URL: https://github.com/zndx/atelier
  3. CML parses .project-metadata.yaml and runs the tasks in order:

| Step | Type | What it does |
| --- | --- | --- |
| Install Dependencies | create_job + run_job | pip install -e . into system Python, PGlite npm deps, npm run build for React UI, downloads Qdrant binary |
| Atelier | start_application | Launches PGlite, Qdrant, gRPC server, and HTTP gateway on CDSW_APP_PORT |

  4. Access the application at https://atelier.<CDSW_DOMAIN>

Re-running install: If dependencies need refreshing, go to Jobs > Install Dependencies > Run — no AMP redeploy required.

Option 2: Manual Application Deployment

  1. Create a new CML Project from Git URL: https://github.com/zndx/atelier
  2. Open a Session (Python 3.10 kernel) and run:
    !pip3 install -e .
    !cd ui && npm install && npm run build
    !bash scripts/install_qdrant.sh
  3. Go to Applications > New Application and configure:
    • Name: Atelier
    • Subdomain: atelier
    • Script: scripts/startup_app.py
    • Kernel: Python 3
    • CPU: 2 cores, Memory: 4 GB
  4. Start the application. It will bind to CDSW_APP_PORT automatically.

Entry Point

The entry point for both methods is scripts/startup_app.py:

  • Starts PGlite (background, only if ATELIER_DB_URL is not set), Qdrant (background), the gRPC server with auto-migrations (background), and the FastAPI gateway (foreground)
  • Binds to 127.0.0.1:$CDSW_APP_PORT (CML's reverse proxy handles external routing)
  • Wraps execution in a restart loop for resilience
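
The restart loop in the last bullet can be pictured as a small supervisor. The following is a sketch under stated assumptions, not the contents of scripts/startup_app.py: `run_with_restart`, `max_restarts`, and `backoff_seconds` are hypothetical names, and the real script may restart indefinitely or use different limits and delays.

```python
import time
from typing import Callable


def run_with_restart(
    target: Callable[[], None],
    max_restarts: int = 3,
    backoff_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run target, re-running it after a crash.

    Returns the number of restarts that were needed. Re-raises the last
    exception once max_restarts is exhausted.
    """
    restarts = 0
    while True:
        try:
            target()          # e.g. a function that runs the foreground gateway
            return restarts   # clean exit: stop supervising
        except Exception:
            if restarts >= max_restarts:
                raise
            restarts += 1
            sleep(backoff_seconds * restarts)  # linear backoff between attempts
```

Passing `sleep` as a parameter keeps the loop testable without real delays. A production supervisor would also want to log each failure before retrying.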

Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| AWS_ACCESS_KEY_ID | Bedrock access key | (none) |
| AWS_SECRET_ACCESS_KEY | Bedrock secret | (none) |
| SOPS_AGE_KEY | age private key that decrypts .env.cai.enc at startup | (none) |
| ANTHROPIC_API_KEY | Anthropic direct API key (overwatch, local dev) | (none) |
| ATELIER_DB_URL | PostgreSQL connection URI (overrides PGlite) | auto (PGlite) |
| QDRANT_HOST | Qdrant hostname | localhost |
| QDRANT_PORT | Qdrant HTTP port | 6333 |

CAI operators typically only need the first three — the encrypted .env.cai.enc file bundles every other deployment default. See Operations → Encrypted Deployment Defaults in the mdbook for the full pattern (just docs-serve to browse locally).
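
As an illustration of how the infrastructure defaults in the table might be resolved, the sketch below reads three of the variables from a mapping. The `Settings` dataclass and `load_settings` function are hypothetical, not Atelier's actual configuration code. Only the variable names and defaults come from the table, and treating an empty ATELIER_DB_URL as unset is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    db_url: Optional[str]   # None means "fall back to embedded PGlite"
    qdrant_host: str
    qdrant_port: int

    @property
    def uses_pglite(self) -> bool:
        return self.db_url is None


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Resolve settings from env, applying the defaults listed in the table."""
    return Settings(
        # Assumption: an empty string is treated the same as an unset variable.
        db_url=env.get("ATELIER_DB_URL") or None,
        qdrant_host=env.get("QDRANT_HOST", "localhost"),
        qdrant_port=int(env.get("QDRANT_PORT", "6333")),
    )
```

Calling `load_settings({})` yields the PGlite fallback with Qdrant at localhost:6333, while passing a mapping that sets ATELIER_DB_URL switches `uses_pglite` to False.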

Local vs CML Infrastructure

| Component | Local (devenv) | CML |
| --- | --- | --- |
| PostgreSQL | services.postgres (PG 16 + pgvector, port 5533) | PGlite (Node.js process, WASM PG + pgvector) |
| Qdrant | pkgs.qdrant (devenv process) | Binary download from GitHub releases |
| Migrations | just migrate (dbmate CLI) | Auto-applied on startup via SQLAlchemy |
| Node.js | pnpm (via devenv) | npm (CML base image) |
| Git submodules | Available for development | embedding-atlas cloned (pre-built dist/); hermes-agent not needed |

Architecture

  • gRPC Core Service — Proto-first API (port 50051)
  • FastAPI HTTP Gateway — Serves React build + bridges REST to gRPC
  • React Frontend — Ant Design UI with XYFlow canvas for agent workflows
  • PostgreSQL — State persistence (devenv or PGlite on CAI)
  • Qdrant — Vector store for embedding search
  • Claude Agent SDK — Keystone agents for classification orchestration
  • Embeddings — Interactive parquet visualization (powered by embedding-atlas)
  • HOCON Configuration — Single source of truth with env var substitution
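
The HOCON bullet, together with the just resolve-config recipe, describes turning one config tree into a flat env file. The sketch below shows the general idea on an already-parsed nested mapping. It is not Atelier's resolver: `flatten`, `to_env_lines`, the example keys, and the underscore-joined upper-case naming scheme are all assumptions made for illustration, and no HOCON parsing or substitution is performed.

```python
from typing import Any, Iterator, Mapping


def flatten(config: Mapping[str, Any], prefix: str = "") -> Iterator[tuple[str, str]]:
    """Yield (ENV_NAME, value) pairs from a nested mapping, depth-first."""
    for key, value in config.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, Mapping):
            yield from flatten(value, name)
        else:
            yield name.upper(), str(value)


def to_env_lines(config: Mapping[str, Any]) -> list[str]:
    """Render a nested config as sorted KEY=value lines."""
    return [f"{k}={v}" for k, v in sorted(flatten(config))]


# Hypothetical config tree; the key names are invented for this example.
example = {"qdrant": {"host": "localhost", "port": 6333}, "grpc": {"port": 50051}}
print("\n".join(to_env_lines(example)))
```

Sorting the lines keeps the generated file stable between runs, which makes diffs of the materialized output easier to review.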

Documentation

just docs-serve       # mdbook at localhost:3000