🤖 Kelos Strategist Agent @gjkim42
Problem
Every Kelos Task execution starts from a completely cold workspace: a fresh git clone --depth 1 into an EmptyDir volume (see internal/controller/job_builder.go:351,365). While shallow cloning is fast for the git layer itself, there is no mechanism to cache or pre-warm anything beyond the bare repository checkout.
This creates three concrete problems:
1. Wasted agent time (= wasted API credits) on dependency installation
Most real-world projects require dependency installation before an agent can meaningfully work. When a Claude Code agent is given a Node.js project, it typically runs npm install as one of its first actions, spending API tokens to issue the command and then waiting for it to finish. For a package-lock.json with hundreds of packages, this can take 30-60 seconds of agent wall-clock time per task. At scale (e.g., a TaskSpawner processing 20 issues/day), that is 10-20 minutes of agent time per day spent on setup alone.
The same applies to Python (pip install), Go (go mod download), Rust (cargo fetch), and Java (mvn dependency:resolve) projects.
2. No way to run custom setup steps before the agent starts
Users cannot currently inject arbitrary init containers into the Task pod. The only pre-agent customization is workspace.spec.files[] (which writes static files) and agentConfig.spec.plugins[] (which installs agent plugins). There is no way to:
- Run npm install or pip install before the agent starts
- Compile protobuf definitions or generate code
- Pull large data files or model weights
- Set up database schemas for integration testing
3. No support for persistent/shared volumes
The workspace is always an EmptyDir — ephemeral and per-pod. There is no way to:
- Mount a PVC with pre-populated dependencies (e.g., a shared node_modules cache)
- Attach a ReadOnlyMany volume with large shared assets
- Use a hostPath volume for local development scenarios
Proposal
Extend the Workspace CRD with two new optional fields: spec.setup and spec.volumes.
spec.setup — Custom init containers that run after git clone, before the agent

    apiVersion: kelos.dev/v1alpha1
    kind: Workspace
    metadata:
      name: my-app
    spec:
      repo: https://github.com/myorg/my-app.git
      secretRef:
        name: github-token
      setup:
        - name: install-deps
          image: node:22-alpine
          command: ["sh", "-c", "npm ci --prefer-offline"]
        - name: generate-protos
          image: bufbuild/buf:latest
          command: ["sh", "-c", "cd /workspace/repo && buf generate"]

Semantics:
- Setup containers run as init containers in the Task pod, after the git clone init container but before the agent container
- They mount the workspace volume at /workspace (same as the agent)
- They run as the agent UID (61100) for filesystem permission compatibility
- They also mount any user-defined spec.volumes (see below)
- If any setup container fails, the Task transitions to the Failed phase
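
The ordering and mount rules above could be sketched roughly as follows. This is only an illustration under stated assumptions: Container and SetupContainer are simplified stand-ins for the real Kubernetes and Kelos types, and appendSetupContainers is a hypothetical helper name, not a function that exists in job_builder.go.

```go
package main

import "fmt"

// Container is a simplified stand-in for corev1.Container so the sketch
// compiles without Kubernetes dependencies.
type Container struct {
	Name       string
	Image      string
	Command    []string
	RunAsUser  int64
	MountPaths []string
}

// SetupContainer mirrors one entry of the proposed spec.setup list
// (hypothetical shape, inferred from the YAML examples).
type SetupContainer struct {
	Name    string
	Image   string
	Command []string
}

const (
	agentUID       int64 = 61100 // agent UID named in the proposal
	workspaceMount       = "/workspace"
)

// appendSetupContainers returns the init container list with user setup
// steps placed after the existing init containers (e.g., git clone).
// Each setup step gets the workspace mount, all user-defined mounts,
// and the agent UID.
func appendSetupContainers(existing []Container, setup []SetupContainer, userMounts []string) []Container {
	out := make([]Container, 0, len(existing)+len(setup))
	out = append(out, existing...)
	for _, s := range setup {
		mounts := append([]string{workspaceMount}, userMounts...)
		out = append(out, Container{
			Name:       s.Name,
			Image:      s.Image,
			Command:    s.Command,
			RunAsUser:  agentUID,
			MountPaths: mounts,
		})
	}
	return out
}

func main() {
	existing := []Container{{Name: "git-clone"}} // stand-in for the built-in clone step
	setup := []SetupContainer{{
		Name:    "install-deps",
		Image:   "node:22-alpine",
		Command: []string{"sh", "-c", "cd /workspace/repo && npm ci --prefer-offline"},
	}}
	for i, c := range appendSetupContainers(existing, setup, []string{"/workspace/data"}) {
		fmt.Printf("%d: %s mounts=%v\n", i, c.Name, c.MountPaths)
	}
}
```

Because Kubernetes runs init containers sequentially and stops at the first failure, appending the setup steps after the clone step is enough to get the "after git clone, before the agent" ordering; mapping a failed init container to the Failed phase would still need handling in the controller.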
spec.volumes — Additional volumes mounted into the workspace pod

    apiVersion: kelos.dev/v1alpha1
    kind: Workspace
    metadata:
      name: my-app-cached
    spec:
      repo: https://github.com/myorg/my-app.git
      secretRef:
        name: github-token
      volumes:
        - name: npm-cache
          mountPath: /workspace/repo/node_modules
          source:
            persistentVolumeClaim:
              claimName: shared-npm-cache
              readOnly: true
        - name: model-weights
          mountPath: /workspace/data
          source:
            persistentVolumeClaim:
              claimName: ml-model-weights
              readOnly: true
      setup:
        - name: install-deps
          image: node:22-alpine
          command: ["sh", "-c", "cd /workspace/repo && npm ci --prefer-offline --cache /tmp/.npm"]

Semantics:
- Each volume entry specifies a name, mountPath, and a source (using standard Kubernetes VolumeSource)
- Volumes are mounted into both setup containers and the agent container
- The workspace volume itself remains an EmptyDir (the repo is always freshly cloned for isolation)
- User-defined volumes are additional mounts; they don't replace the workspace volume
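
One way to enforce the "additional mounts, never a replacement" rule is a validation pass over spec.volumes before the pod is built. The sketch below is an assumption about how that could look, not part of the proposal: validateVolumes is a hypothetical helper, the WorkspaceVolume struct omits the source field, and rejecting a mountPath equal to /workspace is one possible interpretation of "don't replace the workspace volume".

```go
package main

import (
	"errors"
	"fmt"
	"path"
	"strings"
)

// WorkspaceVolume is a trimmed stand-in for the proposed type;
// the source field is left out of this sketch.
type WorkspaceVolume struct {
	Name      string
	MountPath string
}

const workspaceMount = "/workspace"

// validateVolumes checks that names are unique and non-empty, that mount
// paths are absolute, and that no user volume shadows the workspace root.
func validateVolumes(vols []WorkspaceVolume) error {
	seen := map[string]bool{}
	for _, v := range vols {
		if v.Name == "" {
			return errors.New("volume name must not be empty")
		}
		if seen[v.Name] {
			return fmt.Errorf("duplicate volume name %q", v.Name)
		}
		seen[v.Name] = true
		if !strings.HasPrefix(v.MountPath, "/") {
			return fmt.Errorf("volume %q: mountPath %q must be absolute", v.Name, v.MountPath)
		}
		// Assumption: mounting directly over /workspace would replace the
		// freshly cloned EmptyDir, so it is rejected here.
		if path.Clean(v.MountPath) == workspaceMount {
			return fmt.Errorf("volume %q: mountPath must not replace %s", v.Name, workspaceMount)
		}
	}
	return nil
}

func main() {
	ok := []WorkspaceVolume{
		{Name: "npm-cache", MountPath: "/workspace/repo/node_modules"},
		{Name: "model-weights", MountPath: "/workspace/data"},
	}
	fmt.Println("valid spec error:", validateVolumes(ok))

	bad := []WorkspaceVolume{{Name: "shadow", MountPath: "/workspace/"}}
	fmt.Println("shadowing spec error:", validateVolumes(bad))
}
```

Paths below /workspace (such as /workspace/data) stay allowed, since the examples in this proposal rely on them.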
Example: Enterprise monorepo with heavy build step

    apiVersion: kelos.dev/v1alpha1
    kind: Workspace
    metadata:
      name: enterprise-monorepo
    spec:
      repo: https://github.com/bigcorp/platform.git
      ref: main
      secretRef:
        name: github-token
      volumes:
        - name: gradle-cache
          mountPath: /home/user/.gradle
          source:
            persistentVolumeClaim:
              claimName: gradle-cache-pvc
      setup:
        - name: gradle-deps
          image: gradle:8-jdk21
          command: ["sh", "-c", "cd /workspace/repo && gradle dependencies --no-daemon"]

Example: ML project with large data files

    apiVersion: kelos.dev/v1alpha1
    kind: Workspace
    metadata:
      name: ml-project
    spec:
      repo: https://github.com/myorg/ml-pipeline.git
      secretRef:
        name: github-token
      volumes:
        - name: training-data
          mountPath: /workspace/data
          source:
            persistentVolumeClaim:
              claimName: training-dataset-v3
              readOnly: true
      setup:
        - name: install-deps
          image: python:3.12-slim
          command: ["sh", "-c", "cd /workspace/repo && pip install -r requirements.txt"]

Implementation notes
Where the change lives
- api/v1alpha1/workspace_types.go: Add Setup []SetupContainer and Volumes []WorkspaceVolume to WorkspaceSpec
- internal/controller/job_builder.go: In buildAgentJob(), after the git clone init container and branch checkout init container, append user-defined setup containers. Inject workspace volume mount + user volume mounts into each. Also add user volumes to the pod spec and mount them into the agent container.
- docs/reference.md: Document the new fields
- examples/: Add a new example (e.g., 09-workspace-with-setup/)
Backward compatibility
Both fields are optional with zero-value defaults. Existing Workspace resources work identically. No migration needed.
Security considerations
- Setup containers run as UID 61100 (same as agent) — no privilege escalation
- PVC mounts follow standard Kubernetes RBAC — users can only mount PVCs they have access to in their namespace
- The source field should probably be limited to safe volume types (PVC, ConfigMap, Secret, EmptyDir), with hostPath excluded in production via admission policy. This is consistent with how most Kubernetes operators handle volume injection
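
The proposal suggests enforcing the allowlist through admission policy; the same rule could also be checked in the controller or a validating webhook. The sketch below illustrates that check under stated assumptions: VolumeSource and its member types are simplified stand-ins for corev1.VolumeSource, and checkSource is a hypothetical helper name.

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified stand-ins for the corresponding corev1 volume source types.
type PVCSource struct{ ClaimName string }
type NamedSource struct{ Name string }
type EmptyDirSource struct{}
type HostPathSource struct{ Path string }

// VolumeSource models only the source kinds discussed in the proposal.
type VolumeSource struct {
	PersistentVolumeClaim *PVCSource
	ConfigMap             *NamedSource
	Secret                *NamedSource
	EmptyDir              *EmptyDirSource
	HostPath              *HostPathSource
}

// checkSource rejects hostPath and requires exactly one allowed source kind.
func checkSource(src VolumeSource) error {
	if src.HostPath != nil {
		return errors.New("hostPath volumes are not allowed")
	}
	n := 0
	if src.PersistentVolumeClaim != nil {
		n++
	}
	if src.ConfigMap != nil {
		n++
	}
	if src.Secret != nil {
		n++
	}
	if src.EmptyDir != nil {
		n++
	}
	if n != 1 {
		return fmt.Errorf("expected exactly one allowed volume source, got %d", n)
	}
	return nil
}

func main() {
	pvc := VolumeSource{PersistentVolumeClaim: &PVCSource{ClaimName: "shared-npm-cache"}}
	fmt.Println("pvc:", checkSource(pvc))

	host := VolumeSource{HostPath: &HostPathSource{Path: "/var/run"}}
	fmt.Println("hostPath:", checkSource(host))
}
```

A cluster that wants hostPath for local development, as mentioned in the Problem section, would need this check to be configurable rather than hard-coded.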
Why this belongs on Workspace (not Task or podOverrides)
- Setup steps are properties of the repository, not the task. Every task working on my-app needs the same npm install. Putting it on Workspace avoids duplicating setup config across every TaskSpawner/Task.
- podOverrides on Task is for per-task tuning (resources, deadline, env vars). Setup steps are shared infrastructure.
- Workspace already owns the git clone lifecycle; extending it with post-clone setup is a natural progression.
Related issues
- API: Add pod-level configuration to TaskSpec (resource limits, deadline, env, nodeSelector) #256 (closed): Added podOverrides for resources/deadline/env; this proposal extends workspace-level customization, not pod-level
- API: Add maxCostUSD budget enforcement to TaskSpawner for spend-limited autonomous agents #624: maxCostUSD budget; reducing wasted agent time on setup directly reduces costs
- New use case: Continuous codebase health monitoring and scheduled technical debt reduction campaigns #743: Tech debt campaigns; pre-warmed workspaces make cron-based campaigns more cost-effective
/kind feature