Agentic classification workbench for Cloudera AI. Combines the Claude Agent SDK for adaptive keystone-agent orchestration with an Embeddings page for interactive visualization of classification results produced by the signals pipeline.
Atelier is devenv-first. devenv provides a reproducible Nix-based development shell with all system dependencies (Python 3.12, Node.js 22, PostgreSQL 16, Qdrant, dbmate, protobuf, grpcurl, mdbook, etc.) and a process manager that starts the full stack in one command.
devenv shell # Enter the dev environment (loads .env automatically)
devenv up # Start everything: PostgreSQL, Qdrant, gRPC, gateway, Vite

That's it. Visit http://localhost:3000.
| Service | Port | Description |
|---|---|---|
| PostgreSQL 16 | 5533 | State database (with pgvector) |
| Qdrant | 6333 / 6334 | Vector store (HTTP / gRPC) |
| gRPC server | 50051 | Core service |
| FastAPI gateway | 8090 | REST-to-gRPC bridge |
| Vite dev server | 3000 | React UI with hot reload |
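To confirm that the stack came up, the ports in the table above can be probed with a short script. This is a minimal sketch, not part of Atelier: the names `SERVICES`, `port_is_open`, and `check_services` are hypothetical, and it assumes every service listens on `127.0.0.1`.

```python
from __future__ import annotations

import socket

# Ports copied from the service table; the loopback host is an assumption.
SERVICES = {
    "PostgreSQL": 5533,
    "Qdrant HTTP": 6333,
    "Qdrant gRPC": 6334,
    "gRPC server": 50051,
    "FastAPI gateway": 8090,
    "Vite dev server": 3000,
}


def port_is_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_services(
    host: str = "127.0.0.1",
    services: dict[str, int] | None = None,
) -> dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    services = SERVICES if services is None else services
    return {name: port_is_open(host, port) for name, port in services.items()}
```

Calling `check_services()` after `devenv up` returns a dictionary such as `{"PostgreSQL": True, ...}`. An open port only shows that something is listening, not that the service behind it is healthy.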
On first clone, install dependencies and initialize:
devenv shell
just install # uv sync + pnpm install
just proto # Generate gRPC stubs from atelier.proto
just resolve-config # Materialize HOCON → build/config/atelier.env
devenv up # Start all services (postgres initializes on first run)
# In another terminal:
just migrate # Apply database migrations via dbmate

devenv provides more than just the process manager:
| Command | What it does |
|---|---|
| `devenv shell` | Enter the dev environment with all tools on PATH |
| `devenv up` | Start all services and processes |
| `devenv test` | Run the devenv test suite |
| `devenv info` | Show environment info and available services |
The .env file is loaded automatically via dotenv.enable = true. Copy .env.example to .env for local overrides.
just provides task shortcuts that complement devenv. These are convenience wrappers, not replacements for devenv itself.
| Recipe | Description |
|---|---|
| `just up` | Alias for `devenv up` |
| `just install` | `uv sync && cd ui && pnpm install` |
| `just proto` | Generate proto stubs |
| `just migrate` | Run dbmate migrations |
| `just resolve-config` | Materialize HOCON config |
| `just build-ui` | Build React → ui/dist/ |
| `just start` | Production-like startup (no devenv required) |
| `just docs-serve` | mdbook with live reload |
The external/ directory contains forked submodules:
- embedding-atlas — Fork of Apple's embedding-atlas with important modifications for Atelier's Embeddings page. Pre-built dist/ is committed to the fork so CAI deployment doesn't need the full build toolchain (Emscripten, Rust, uv). Required for both dev and deployment.
- hermes-agent — Reference fork, dev-only.
git submodule update --init --recursive # Required: embedding-atlas fork (pre-built dist/)

There are two ways to deploy Atelier on Cloudera AI (CML): as an AMP (automated) or as a manual Application. devenv is not used in CML; the deployment scripts handle all infrastructure (PGlite for embedded PostgreSQL, Qdrant binary download). The embedding-atlas submodule is cloned during install (pre-built dist/ committed to the fork); hermes-agent is not needed.
AMPs (Applied ML Prototypes) use .project-metadata.yaml to automate the full setup. Atelier follows the modern create_job/run_job pattern (same as RAG Studio) — install jobs persist in the CML project and can be re-run from the Jobs tab without redeploying the AMP.
- In the CML UI, go to AMPs or create a new Project from Git URL
- Enter the repository URL: `https://github.com/zndx/atelier`
- CML parses `.project-metadata.yaml` and runs the tasks in order:
| Step | Type | What it does |
|---|---|---|
| Install Dependencies | `create_job` + `run_job` | `pip install -e .` into system Python, PGlite npm deps, `npm run build` for React UI, downloads Qdrant binary |
| Atelier | `start_application` | Launches PGlite, Qdrant, gRPC server, and HTTP gateway on `CDSW_APP_PORT` |
- Access the application at `https://atelier.<CDSW_DOMAIN>`
Re-running install: If dependencies need refreshing, go to Jobs > Install Dependencies > Run — no AMP redeploy required.
- Create a new CML Project from Git URL: `https://github.com/zndx/atelier`
- Open a Session (Python 3.10 kernel) and run:

      !pip3 install -e .
      !cd ui && npm install && npm run build
      !bash scripts/install_qdrant.sh

- Go to Applications > New Application and configure:
  - Name: Atelier
  - Subdomain: atelier
  - Script: `scripts/startup_app.py`
  - Kernel: Python 3
  - CPU: 2 cores, Memory: 4 GB
- Start the application. It will bind to `CDSW_APP_PORT` automatically.
The entry point for both methods is scripts/startup_app.py:
- Starts PGlite (background, if no `ATELIER_DB_URL` set), Qdrant (background), gRPC with auto-migrations (background), FastAPI gateway (foreground)
- Binds to `127.0.0.1:$CDSW_APP_PORT` (CML's reverse proxy handles external routing)
- Wraps execution in a restart loop for resilience
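The source does not show how `scripts/startup_app.py` implements its restart loop. The general pattern can be sketched as follows, assuming a fixed backoff and a bounded restart budget; `run_with_restart`, `max_restarts`, and `backoff_seconds` are hypothetical names chosen for illustration.

```python
import time
from typing import Callable


def run_with_restart(
    run_once: Callable[[], int],
    max_restarts: int = 5,
    backoff_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call run_once until it exits cleanly or the restart budget is spent.

    run_once returns an exit code, where 0 means a clean shutdown.
    The restart budget and fixed backoff are illustrative assumptions,
    not values taken from Atelier.
    """
    restarts = 0
    while True:
        try:
            code = run_once()
        except Exception as exc:
            # Treat an unhandled crash like a non-zero exit code.
            print(f"service crashed: {exc!r}")
            code = 1
        if code == 0:
            return 0
        restarts += 1
        if restarts > max_restarts:
            # Budget exhausted: surface the last failure to the caller.
            return code
        sleep(backoff_seconds)
```

In a real entry point, `run_once` would launch the foreground process (here, the FastAPI gateway) and block until it exits. Bounding the number of restarts keeps a persistently failing service from looping forever.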
| Variable | Description | Default |
|---|---|---|
| `AWS_ACCESS_KEY_ID` | Bedrock access key | (none) |
| `AWS_SECRET_ACCESS_KEY` | Bedrock secret | (none) |
| `SOPS_AGE_KEY` | age private key that decrypts `.env.cai.enc` at startup | (none) |
| `ANTHROPIC_API_KEY` | Anthropic direct API key (overwatch, local dev) | (none) |
| `ATELIER_DB_URL` | PostgreSQL connection URI (overrides PGlite) | auto (PGlite) |
| `QDRANT_HOST` | Qdrant hostname | localhost |
| `QDRANT_PORT` | Qdrant HTTP port | 6333 |
CAI operators typically only need the first three — the encrypted
.env.cai.enc file bundles every other deployment default. See
Operations → Encrypted Deployment Defaults
in the mdbook for the full pattern (just docs-serve to browse
locally).
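The defaults in the variable table above can be read as a small settings loader. The following is a sketch under stated assumptions rather than Atelier's actual configuration code: `RuntimeSettings` and `load_settings` are hypothetical names, and an unset or empty `ATELIER_DB_URL` is taken to mean "fall back to PGlite".

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RuntimeSettings:
    db_url: Optional[str]  # None means "no external PostgreSQL, use PGlite"
    qdrant_host: str
    qdrant_port: int

    @property
    def uses_pglite(self) -> bool:
        return self.db_url is None


def load_settings(env: Mapping[str, str] = os.environ) -> RuntimeSettings:
    """Read the documented variables, applying the defaults from the table."""
    return RuntimeSettings(
        # An empty string is treated the same as an unset variable.
        db_url=env.get("ATELIER_DB_URL") or None,
        qdrant_host=env.get("QDRANT_HOST", "localhost"),
        qdrant_port=int(env.get("QDRANT_PORT", "6333")),
    )
```

Passing the environment as a mapping keeps the loader easy to exercise without mutating `os.environ`, for example `load_settings({"QDRANT_PORT": "7000"})`.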
| Component | Local (devenv) | CML |
|---|---|---|
| PostgreSQL | `services.postgres` (PG 16 + pgvector, port 5533) | PGlite (Node.js process, WASM PG + pgvector) |
| Qdrant | `pkgs.qdrant` (devenv process) | Binary download from GitHub releases |
| Migrations | `just migrate` (dbmate CLI) | Auto-applied on startup via SQLAlchemy |
| Node.js | pnpm (via devenv) | npm (CML base image) |
| Git submodules | Available for development | embedding-atlas cloned (pre-built dist/); hermes-agent not needed |
- gRPC Core Service — Proto-first API (port 50051)
- FastAPI HTTP Gateway — Serves React build + bridges REST to gRPC
- React Frontend — Ant Design UI with XYFlow canvas for agent workflows
- PostgreSQL — State persistence (devenv or PGlite on CAI)
- Qdrant — Vector store for embedding search
- Claude Agent SDK — Keystone agents for classification orchestration
- Embeddings — Interactive parquet visualization (powered by embedding-atlas)
- HOCON Configuration — Single source of truth with env var substitution
just docs-serve # mdbook at localhost:3000