API: Add setup containers and volume strategy to Workspace for dependency caching and pre-agent initialization #774

@kelos-bot

🤖 Kelos Strategist Agent @gjkim42

Problem

Every Kelos Task execution starts from a completely cold workspace: a fresh git clone --depth 1 into an EmptyDir volume (see internal/controller/job_builder.go:351,365). While shallow cloning is fast for the git layer itself, there is no mechanism to cache or pre-warm anything beyond the bare repository checkout.

This creates three concrete problems:

1. Wasted agent time (= wasted API credits) on dependency installation

Most real-world projects require dependency installation before an agent can meaningfully work. When a Claude Code agent is given a Node.js project, it typically runs npm install as one of its first actions, spending API tokens to issue the command and wait for it to finish. For a package-lock.json with hundreds of packages, this can take 30-60 seconds of agent wall-clock time per task. At scale (e.g., a TaskSpawner processing 20 issues/day), that is 10-20 minutes of agent time per day spent on installs alone, and the cost repeats on every task.

The same applies to Python (pip install), Go (go mod download), Rust (cargo fetch), and Java (mvn dependency:resolve) projects.

2. No way to run custom setup steps before the agent starts

Users cannot currently inject arbitrary init containers into the Task pod. The only pre-agent customization is workspace.spec.files[] (which writes static files) and agentConfig.spec.plugins[] (which installs agent plugins). There is no way to:

  • Run npm install or pip install before the agent starts
  • Compile protobuf definitions or generate code
  • Pull large data files or model weights
  • Set up database schemas for integration testing

3. No support for persistent/shared volumes

The workspace is always an EmptyDir — ephemeral and per-pod. There is no way to:

  • Mount a PVC with pre-populated dependencies (e.g., a shared node_modules cache)
  • Attach a ReadOnlyMany volume with large shared assets
  • Use a hostPath volume for local development scenarios

Proposal

Extend the Workspace CRD with two new optional fields: spec.setup and spec.volumes.

spec.setup — Custom init containers that run after git clone, before the agent

apiVersion: kelos.dev/v1alpha1
kind: Workspace
metadata:
  name: my-app
spec:
  repo: https://github.com/myorg/my-app.git
  secretRef:
    name: github-token
  setup:
    - name: install-deps
      image: node:22-alpine
      command: ["sh", "-c", "cd /workspace/repo && npm ci --prefer-offline"]
    - name: generate-protos
      image: bufbuild/buf:latest
      command: ["sh", "-c", "cd /workspace/repo && buf generate"]

Semantics:

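  • Setup containers run as init containers in the Task pod, in the order listed, after the git clone init container but before the agent container
  • They mount the workspace volume at /workspace (same as the agent)
  • They run as the agent UID (61100) for filesystem permission compatibility
  • They also mount any user-defined spec.volumes (see below)
  • Only files written to the workspace volume or to a user-defined volume are visible to the agent; anything written to a setup container's own filesystem is discarded when that container exits
  • If any setup container fails, the Task transitions to Failed phase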
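
The ordering and mount rules above could be sketched in Go roughly as follows. This is a minimal illustration, not the actual job_builder.go code: InitContainer is a simplified stand-in for the Kubernetes container type, and the names appendSetupContainers and "git-clone" are hypothetical.

```go
package main

import "fmt"

// SetupContainer mirrors one entry of the proposed spec.setup list.
// The field set is an assumption based on the YAML example.
type SetupContainer struct {
	Name    string
	Image   string
	Command []string
}

// InitContainer is a simplified stand-in for a Kubernetes container spec.
type InitContainer struct {
	Name         string
	Image        string
	Command      []string
	RunAsUser    int64
	VolumeMounts []string // mount paths only, for brevity
}

const (
	agentUID      = 61100        // agent UID stated in the proposal
	workspacePath = "/workspace" // workspace mount path stated in the proposal
)

// appendSetupContainers places user-defined setup containers after the
// built-in init containers, preserving the order in which they were listed.
// Each setup container gets the workspace mount plus all user volume mounts.
func appendSetupContainers(builtin []InitContainer, setup []SetupContainer, userMounts []string) []InitContainer {
	out := make([]InitContainer, 0, len(builtin)+len(setup))
	out = append(out, builtin...)
	for _, s := range setup {
		mounts := append([]string{workspacePath}, userMounts...)
		out = append(out, InitContainer{
			Name:         s.Name,
			Image:        s.Image,
			Command:      s.Command,
			RunAsUser:    agentUID,
			VolumeMounts: mounts,
		})
	}
	return out
}

func main() {
	builtin := []InitContainer{{Name: "git-clone"}} // hypothetical container name
	setup := []SetupContainer{
		{Name: "install-deps", Image: "node:22-alpine", Command: []string{"sh", "-c", "cd /workspace/repo && npm ci"}},
	}
	for i, c := range appendSetupContainers(builtin, setup, []string{"/workspace/data"}) {
		fmt.Println(i, c.Name, c.RunAsUser, c.VolumeMounts)
	}
}
```

Because Kubernetes runs init containers sequentially and does not start the main container until all of them succeed, appending in list order is enough to get the "after clone, before agent" behavior.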

spec.volumes — Additional volumes mounted into the workspace pod

apiVersion: kelos.dev/v1alpha1
kind: Workspace
metadata:
  name: my-app-cached
spec:
  repo: https://github.com/myorg/my-app.git
  secretRef:
    name: github-token
  volumes:
    # npm ci deletes and recreates node_modules, so the shared volume holds
    # npm's download cache (writable) rather than node_modules itself.
    # The /cache/npm path is illustrative.
    - name: npm-cache
      mountPath: /cache/npm
      source:
        persistentVolumeClaim:
          claimName: shared-npm-cache
    - name: model-weights
      mountPath: /workspace/data
      source:
        persistentVolumeClaim:
          claimName: ml-model-weights
          readOnly: true
  setup:
    - name: install-deps
      image: node:22-alpine
      command: ["sh", "-c", "cd /workspace/repo && npm ci --prefer-offline --cache /cache/npm"]

Semantics:

  • Each volume entry specifies a name, mountPath, and a source (using standard Kubernetes VolumeSource)
  • Volumes are mounted into both setup containers and the agent container
  • The workspace volume itself remains an EmptyDir (the repo is always freshly cloned for isolation)
  • User-defined volumes are additional mounts — they don't replace the workspace volume
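
To make the "additional mounts, never a replacement" rule enforceable, the controller or a webhook could validate spec.volumes before building the pod. The sketch below is one possible check, assuming the built-in volume is named "workspace" (a hypothetical name) and ignoring the source field; validateVolumes is not an existing Kelos function.

```go
package main

import (
	"errors"
	"fmt"
	"path"
)

// WorkspaceVolume mirrors one entry of the proposed spec.volumes list.
// The source field is omitted; only name and mountPath are validated here.
type WorkspaceVolume struct {
	Name      string
	MountPath string
}

const (
	workspaceVolumeName = "workspace"  // hypothetical name of the built-in EmptyDir volume
	workspaceMountPath  = "/workspace" // workspace mount path stated in the proposal
)

// validateVolumes rejects entries that would shadow the workspace volume
// or collide with each other by name or by mount path.
func validateVolumes(vols []WorkspaceVolume) error {
	names := map[string]bool{}
	paths := map[string]bool{}
	for _, v := range vols {
		if v.Name == "" {
			return errors.New("volume name must not be empty")
		}
		if v.Name == workspaceVolumeName || names[v.Name] {
			return fmt.Errorf("volume name %q is reserved or duplicated", v.Name)
		}
		if !path.IsAbs(v.MountPath) {
			return fmt.Errorf("volume %q: mountPath %q must be absolute", v.Name, v.MountPath)
		}
		p := path.Clean(v.MountPath) // normalizes trailing slashes and ".." segments
		if p == workspaceMountPath {
			return fmt.Errorf("volume %q: mountPath %q would replace the workspace volume", v.Name, v.MountPath)
		}
		if paths[p] {
			return fmt.Errorf("volume %q: mountPath %q is already in use", v.Name, v.MountPath)
		}
		names[v.Name] = true
		paths[p] = true
	}
	return nil
}

func main() {
	ok := []WorkspaceVolume{
		{Name: "gradle-cache", MountPath: "/home/user/.gradle"},
		{Name: "training-data", MountPath: "/workspace/data"},
	}
	fmt.Println(validateVolumes(ok))
	fmt.Println(validateVolumes([]WorkspaceVolume{{Name: "x", MountPath: "/workspace/"}}))
}
```

Mounts nested below /workspace (such as /workspace/data) are allowed in this sketch, since the examples in this proposal rely on them.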

Example: Enterprise monorepo with heavy build step

apiVersion: kelos.dev/v1alpha1
kind: Workspace
metadata:
  name: enterprise-monorepo
spec:
  repo: https://github.com/bigcorp/platform.git
  ref: main
  secretRef:
    name: github-token
  volumes:
    - name: gradle-cache
      mountPath: /home/user/.gradle
      source:
        persistentVolumeClaim:
          claimName: gradle-cache-pvc
  setup:
    - name: gradle-deps
      image: gradle:8-jdk21
      command: ["sh", "-c", "cd /workspace/repo && gradle dependencies --no-daemon"]

Example: ML project with large data files

apiVersion: kelos.dev/v1alpha1
kind: Workspace
metadata:
  name: ml-project
spec:
  repo: https://github.com/myorg/ml-pipeline.git
  secretRef:
    name: github-token
  volumes:
    - name: training-data
      mountPath: /workspace/data
      source:
        persistentVolumeClaim:
          claimName: training-dataset-v3
          readOnly: true
  setup:
    - name: install-deps
      image: python:3.12-slim
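      # Install into the workspace volume: packages installed into the setup
      # container's own filesystem would not reach the agent container.
      # The .deps path is illustrative; the agent would add it to PYTHONPATH.
      command: ["sh", "-c", "cd /workspace/repo && pip install --target /workspace/repo/.deps -r requirements.txt"]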

Implementation notes

Where the change lives

  1. api/v1alpha1/workspace_types.go: Add Setup []SetupContainer and Volumes []WorkspaceVolume to WorkspaceSpec
  2. internal/controller/job_builder.go: In buildAgentJob(), after the git clone init container and branch checkout init container, append user-defined setup containers. Inject workspace volume mount + user volume mounts into each. Also add user volumes to the pod spec and mount them into the agent container.
  3. docs/reference.md: Document the new fields
  4. examples/: Add a new example (e.g., 09-workspace-with-setup/)

Backward compatibility

Both fields are optional with zero-value defaults. Existing Workspace resources work identically. No migration needed.
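
As a rough illustration of the type change and of why existing resources are unaffected, the sketch below declares the two fields with omitempty and round-trips a Workspace spec that does not set them. The struct shapes are assumptions drawn from the YAML examples (existing fields such as ref, secretRef, and files are left out, and the volume source is omitted); roundTrip is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SetupContainer is the assumed shape of one spec.setup entry.
type SetupContainer struct {
	Name    string   `json:"name"`
	Image   string   `json:"image"`
	Command []string `json:"command,omitempty"`
}

// WorkspaceVolume is the assumed shape of one spec.volumes entry
// (the source field is omitted for brevity).
type WorkspaceVolume struct {
	Name      string `json:"name"`
	MountPath string `json:"mountPath"`
}

// WorkspaceSpec shows only the fields relevant to this proposal.
type WorkspaceSpec struct {
	Repo    string            `json:"repo"`
	Setup   []SetupContainer  `json:"setup,omitempty"`
	Volumes []WorkspaceVolume `json:"volumes,omitempty"`
}

// roundTrip decodes a spec and encodes it again, returning both forms.
func roundTrip(in string) (WorkspaceSpec, string, error) {
	var spec WorkspaceSpec
	if err := json.Unmarshal([]byte(in), &spec); err != nil {
		return spec, "", err
	}
	out, err := json.Marshal(spec)
	return spec, string(out), err
}

func main() {
	// A spec written before the new fields existed.
	spec, out, err := roundTrip(`{"repo":"https://github.com/myorg/my-app.git"}`)
	if err != nil {
		panic(err)
	}
	// Both new fields decode to nil slices and are omitted on re-encoding.
	fmt.Println(out, len(spec.Setup), len(spec.Volumes))
}
```

With nil slices the builder's loops over setup containers and volumes simply do not execute, so the generated Job for an existing Workspace stays the same.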

Security considerations

  • Setup containers run as UID 61100 (same as agent) — no privilege escalation
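  • PVCs are namespace-scoped: a Task pod can only mount PVCs that exist in its own namespace. Kubernetes does not apply per-PVC RBAC at mount time, so anyone who can create a Workspace in a namespace can mount any PVC in that namespace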
  • The source field should probably be limited to safe volume types (PVC, ConfigMap, Secret, EmptyDir) and exclude hostPath in production via admission policy — this is consistent with how most Kubernetes operators handle volume injection
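
One way to express the allow-list in code, in addition to or instead of an admission policy, is a check like the sketch below. VolumeSource here is a simplified stand-in with only the five source kinds mentioned above, not the full Kubernetes type, and validateSource is a hypothetical helper.

```go
package main

import "fmt"

// Simplified stand-ins for the Kubernetes volume source types.
type PVCSource struct {
	ClaimName string
	ReadOnly  bool
}
type NamedSource struct{ Name string }
type EmptyDirSource struct{}
type HostPathSource struct{ Path string }

// VolumeSource holds at most one non-nil member, like its Kubernetes counterpart.
type VolumeSource struct {
	PersistentVolumeClaim *PVCSource
	ConfigMap             *NamedSource
	Secret                *NamedSource
	EmptyDir              *EmptyDirSource
	HostPath              *HostPathSource
}

// validateSource accepts exactly one of the allowed source kinds
// (PVC, ConfigMap, Secret, EmptyDir) and rejects hostPath outright.
func validateSource(s VolumeSource) error {
	if s.HostPath != nil {
		return fmt.Errorf("hostPath volumes are not allowed (path %q)", s.HostPath.Path)
	}
	set := 0
	if s.PersistentVolumeClaim != nil {
		set++
	}
	if s.ConfigMap != nil {
		set++
	}
	if s.Secret != nil {
		set++
	}
	if s.EmptyDir != nil {
		set++
	}
	if set != 1 {
		return fmt.Errorf("exactly one allowed volume source must be set, got %d", set)
	}
	return nil
}

func main() {
	pvc := VolumeSource{PersistentVolumeClaim: &PVCSource{ClaimName: "gradle-cache-pvc"}}
	host := VolumeSource{HostPath: &HostPathSource{Path: "/var/run"}}
	fmt.Println(validateSource(pvc))
	fmt.Println(validateSource(host))
}
```

A deny-by-default check like this fails closed if new source kinds are added to the struct later, which is usually the safer choice for user-supplied pod configuration.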

Why this belongs on Workspace (not Task or podOverrides)

  • Setup steps are properties of the repository, not the task. Every task working on my-app needs the same npm install. Putting it on Workspace avoids duplicating setup config across every TaskSpawner/Task.
  • podOverrides on Task is for per-task tuning (resources, deadline, env vars). Setup steps are shared infrastructure.
  • Workspace already owns the git clone lifecycle — extending it with post-clone setup is a natural progression.

Related issues

/kind feature
