API: Add setup containers and volume strategy to Workspace for dependency caching and pre-agent initialization #774

🤖 **Kelos Strategist Agent** @gjkim42

## Problem

Every Kelos Task execution starts from a completely cold workspace: a fresh `git clone --depth 1` into an `EmptyDir` volume (see `internal/controller/job_builder.go:351,365`). While shallow cloning is fast for the git layer itself, **there is no mechanism to cache or pre-warm anything beyond the bare repository checkout**.

This creates three concrete problems:

### 1. Wasted agent time (= wasted API credits) on dependency installation

Most real-world projects require dependency installation before an agent can meaningfully work. When a Claude Code agent is given a Node.js project, it typically runs `npm install` as one of its first actions, spending API tokens to issue the command and then wait for it to finish. For a `package-lock.json` with hundreds of packages, this can take 30-60 seconds of agent wall-clock time per task. At scale (e.g., a TaskSpawner processing 20 issues/day), that adds up to 10-20 minutes of agent time per day spent purely on installation.

The same applies to Python (`pip install`), Go (`go mod download`), Rust (`cargo fetch`), and Java (`mvn dependency:resolve`) projects.

### 2. No way to run custom setup steps before the agent starts

Users cannot currently inject arbitrary init containers into the Task pod. The only pre-agent customization is `workspace.spec.files[]` (which writes static files) and `agentConfig.spec.plugins[]` (which installs agent plugins). There is no way to:
- Run `npm install` or `pip install` before the agent starts
- Compile protobuf definitions or generate code
- Pull large data files or model weights
- Set up database schemas for integration testing

### 3. No support for persistent/shared volumes

The workspace is always an `EmptyDir` — ephemeral and per-pod. There is no way to:
- Mount a PVC with pre-populated dependencies (e.g., a shared `node_modules` cache)
- Attach a ReadOnlyMany volume with large shared assets
- Use a hostPath volume for local development scenarios

## Proposal

Extend the **Workspace CRD** with two new optional fields: `spec.setup` and `spec.volumes`.

### `spec.setup` — Custom init containers that run after git clone, before the agent

```yaml
apiVersion: kelos.dev/v1alpha1
kind: Workspace
metadata:
  name: my-app
spec:
  repo: https://github.com/myorg/my-app.git
  secretRef:
    name: github-token
  setup:
    - name: install-deps
      image: node:22-alpine
      command: ["sh", "-c", "cd /workspace/repo && npm ci --prefer-offline"]
    - name: generate-protos
      image: bufbuild/buf:latest
      command: ["sh", "-c", "cd /workspace/repo && buf generate"]
```

**Semantics:**
- Setup containers run as init containers in the Task pod, **after** the git clone init container but **before** the agent container
- They mount the workspace volume at `/workspace` (same as the agent)
- They run as the agent UID (61100) for filesystem permission compatibility
- They also mount any user-defined `spec.volumes` (see below)
- If any setup container fails, the Task transitions to the `Failed` phase and the agent container never starts
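
The ordering rules above could be sketched roughly as follows. This is only an illustration under stated assumptions: `Container` is a simplified stand-in for `corev1.Container`, and `buildInitContainers`, the `git-clone` name, and its image are hypothetical names, not taken from the existing `job_builder.go`. It relies on the fact that Kubernetes runs init containers one at a time, in list order, before any regular container starts.

```go
package main

import "fmt"

// Container is a simplified stand-in for corev1.Container (hypothetical,
// so the sketch compiles without Kubernetes dependencies).
type Container struct {
	Name       string
	Image      string
	Command    []string
	RunAsUser  int64
	MountPaths []string
}

// SetupContainer mirrors one entry of the proposed spec.setup list.
type SetupContainer struct {
	Name    string
	Image   string
	Command []string
}

const (
	agentUID      int64 = 61100        // agent UID named in the proposal
	workspacePath       = "/workspace" // workspace mount path named in the proposal
)

// buildInitContainers keeps the built-in init containers first and appends
// the setup containers in declaration order, each running as the agent UID
// with the workspace volume mounted.
func buildInitContainers(builtin []Container, setup []SetupContainer) []Container {
	out := make([]Container, 0, len(builtin)+len(setup))
	out = append(out, builtin...)
	for _, s := range setup {
		out = append(out, Container{
			Name:       s.Name,
			Image:      s.Image,
			Command:    s.Command,
			RunAsUser:  agentUID,
			MountPaths: []string{workspacePath},
		})
	}
	return out
}

func main() {
	// "git-clone" and its image are placeholder values, not taken from the codebase.
	builtin := []Container{{Name: "git-clone", Image: "alpine/git", MountPaths: []string{workspacePath}}}
	setup := []SetupContainer{
		{Name: "install-deps", Image: "node:22-alpine", Command: []string{"sh", "-c", "cd /workspace/repo && npm ci"}},
		{Name: "generate-protos", Image: "bufbuild/buf:latest", Command: []string{"sh", "-c", "cd /workspace/repo && buf generate"}},
	}
	for i, c := range buildInitContainers(builtin, setup) {
		fmt.Printf("%d %s\n", i, c.Name)
	}
}
```

Because the list order is the execution order, appending after the built-in containers is enough to guarantee that setup steps see a fully cloned repository.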

### `spec.volumes` — Additional volumes mounted into the workspace pod

```yaml
apiVersion: kelos.dev/v1alpha1
kind: Workspace
metadata:
  name: my-app-cached
spec:
  repo: https://github.com/myorg/my-app.git
  secretRef:
    name: github-token
  volumes:
    # Shared npm download cache. It is mounted outside node_modules and left
    # writable, because `npm ci` deletes and recreates node_modules and writes
    # to its cache directory.
    - name: npm-cache
      mountPath: /workspace/npm-cache
      source:
        persistentVolumeClaim:
          claimName: shared-npm-cache
    - name: model-weights
      mountPath: /workspace/data
      source:
        persistentVolumeClaim:
          claimName: ml-model-weights
          readOnly: true
  setup:
    - name: install-deps
      image: node:22-alpine
      command: ["sh", "-c", "cd /workspace/repo && npm ci --prefer-offline --cache /workspace/npm-cache"]
```

**Semantics:**
- Each volume entry specifies a `name`, a `mountPath`, and a `source` (a standard Kubernetes `VolumeSource`)
- Volumes are mounted into both setup containers and the agent container
- The workspace volume itself remains an `EmptyDir` (the repo is always freshly cloned for isolation)
- User-defined volumes are *additional* mounts — they don't replace the workspace volume
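
As a rough sketch of these semantics, the controller could compute a single mount list and apply it to both the setup containers and the agent container. The type and function names here (`WorkspaceVolume`, `VolumeMount`, `mountsFor`) and the internal volume name `workspace` are assumptions for illustration, and the validation rules (unique names, absolute paths, no mount directly on `/workspace`) are one possible reading of "additional mounts", not something the proposal specifies.

```go
package main

import (
	"fmt"
	"path"
)

// WorkspaceVolume mirrors one entry of the proposed spec.volumes list
// (the source field is omitted to keep the sketch short).
type WorkspaceVolume struct {
	Name      string
	MountPath string
}

// VolumeMount is a simplified stand-in for corev1.VolumeMount.
type VolumeMount struct {
	Name      string
	MountPath string
}

const (
	workspaceVolumeName = "workspace"  // hypothetical internal volume name
	workspaceMountPath  = "/workspace" // mount path named in the proposal
)

// mountsFor returns the workspace mount followed by the user-defined mounts.
// The same list is intended for every setup container and the agent container.
func mountsFor(vols []WorkspaceVolume) ([]VolumeMount, error) {
	mounts := []VolumeMount{{Name: workspaceVolumeName, MountPath: workspaceMountPath}}
	seen := map[string]bool{workspaceVolumeName: true}
	for _, v := range vols {
		p := path.Clean(v.MountPath)
		switch {
		case seen[v.Name]:
			return nil, fmt.Errorf("duplicate volume name %q", v.Name)
		case !path.IsAbs(v.MountPath):
			return nil, fmt.Errorf("volume %q: mountPath %q must be absolute", v.Name, v.MountPath)
		case p == workspaceMountPath:
			// A mount at /workspace itself would replace, not extend, the workspace volume.
			return nil, fmt.Errorf("volume %q: mountPath would shadow the workspace volume", v.Name)
		}
		seen[v.Name] = true
		mounts = append(mounts, VolumeMount{Name: v.Name, MountPath: p})
	}
	return mounts, nil
}

func main() {
	mounts, err := mountsFor([]WorkspaceVolume{
		{Name: "training-data", MountPath: "/workspace/data"},
		{Name: "gradle-cache", MountPath: "/home/user/.gradle"},
	})
	if err != nil {
		fmt.Println("invalid volumes:", err)
		return
	}
	for _, m := range mounts {
		fmt.Printf("%s -> %s\n", m.Name, m.MountPath)
	}
}
```

Mounting below `/workspace` (for example `/workspace/data`) stays allowed, since a nested mount overlays only that subdirectory of the `EmptyDir`.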

### Example: Enterprise monorepo with heavy build step

```yaml
apiVersion: kelos.dev/v1alpha1
kind: Workspace
metadata:
  name: enterprise-monorepo
spec:
  repo: https://github.com/bigcorp/platform.git
  ref: main
  secretRef:
    name: github-token
  volumes:
    - name: gradle-cache
      mountPath: /home/user/.gradle
      source:
        persistentVolumeClaim:
          claimName: gradle-cache-pvc
  setup:
    - name: gradle-deps
      image: gradle:8-jdk21
      command: ["sh", "-c", "cd /workspace/repo && gradle dependencies --no-daemon"]
```

### Example: ML project with large data files

```yaml
apiVersion: kelos.dev/v1alpha1
kind: Workspace
metadata:
  name: ml-project
spec:
  repo: https://github.com/myorg/ml-pipeline.git
  secretRef:
    name: github-token
  volumes:
    - name: training-data
      mountPath: /workspace/data
      source:
        persistentVolumeClaim:
          claimName: training-dataset-v3
          readOnly: true
  setup:
    - name: install-deps
      image: python:3.12-slim
      command: ["sh", "-c", "cd /workspace/repo && pip install -r requirements.txt"]
```

## Implementation notes

### Where the change lives

1. **`api/v1alpha1/workspace_types.go`**: Add `Setup []SetupContainer` and `Volumes []WorkspaceVolume` to `WorkspaceSpec`
2. **`internal/controller/job_builder.go`**: In `buildAgentJob()`, append the user-defined setup containers after the git clone and branch checkout init containers, and mount the workspace volume plus all user-defined volumes into each of them. Also add the user volumes to the pod spec and mount them into the agent container.
3. **`docs/reference.md`**: Document the new fields
4. **`examples/`**: Add a new example (e.g., `09-workspace-with-setup/`)

### Backward compatibility

Both fields are optional with zero-value defaults. Existing Workspace resources work identically. No migration needed.
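
A minimal sketch of why no migration should be needed, assuming the new fields are declared with `omitempty` JSON tags as is conventional for optional CRD fields. The struct below is a trimmed, hypothetical version of `WorkspaceSpec` (fields such as `secretRef` and `files` are left out, and the volume `source` is omitted); only the two new field names come from this proposal.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SetupContainer and WorkspaceVolume are simplified versions of the
// proposed types.
type SetupContainer struct {
	Name    string   `json:"name"`
	Image   string   `json:"image"`
	Command []string `json:"command,omitempty"`
}

type WorkspaceVolume struct {
	Name      string `json:"name"`
	MountPath string `json:"mountPath"`
}

// WorkspaceSpec is a trimmed, hypothetical view of the real spec.
type WorkspaceSpec struct {
	Repo    string            `json:"repo"`
	Ref     string            `json:"ref,omitempty"`
	Setup   []SetupContainer  `json:"setup,omitempty"`   // new, optional
	Volumes []WorkspaceVolume `json:"volumes,omitempty"` // new, optional
}

// roundTrip decodes a spec and encodes it again, as a controller or
// client would when reading and writing an existing object.
func roundTrip(doc string) (string, error) {
	var spec WorkspaceSpec
	if err := json.Unmarshal([]byte(doc), &spec); err != nil {
		return "", err
	}
	out, err := json.Marshal(spec)
	return string(out), err
}

func main() {
	// A spec written before the new fields existed.
	old := `{"repo":"https://github.com/myorg/my-app.git"}`
	out, err := roundTrip(old)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)        // {"repo":"https://github.com/myorg/my-app.git"}
	fmt.Println(out == old) // true
}
```

With nil slices and `omitempty`, an existing object serializes without `setup` or `volumes` keys, so the job builder can treat "field absent" and "empty list" identically.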

### Security considerations

- Setup containers run as UID 61100 (same as the agent), so they gain no privileges beyond what the agent container already has
- PVC mounts follow standard Kubernetes RBAC: users can only mount PVCs they have access to in their namespace
- The `source` field should be restricted to an allow-list of volume types (PVC, ConfigMap, Secret, EmptyDir). `hostPath` should be rejected in production, for example via an admission policy, and kept available only for local development
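
The allow-list could also be enforced in the controller or a validating webhook rather than only through an external admission policy. The sketch below shows that variant; it is an assumption, not part of the proposal. `VolumeSource` here is a simplified stand-in for `corev1.VolumeSource` (which likewise has one optional pointer field per volume type), and `validateSource` is a hypothetical helper.

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified stand-ins for the corresponding corev1 source types.
type PVCSource struct {
	ClaimName string
	ReadOnly  bool
}
type NamedSource struct{ Name string }
type EmptyDirSource struct{}
type HostPathSource struct{ Path string }

// VolumeSource holds at most one non-nil field, like corev1.VolumeSource.
type VolumeSource struct {
	PersistentVolumeClaim *PVCSource
	ConfigMap             *NamedSource
	Secret                *NamedSource
	EmptyDir              *EmptyDirSource
	HostPath              *HostPathSource
}

// validateSource accepts exactly one allow-listed source and rejects hostPath.
func validateSource(src VolumeSource) error {
	if src.HostPath != nil {
		return errors.New("hostPath volumes are not allowed")
	}
	set := 0
	if src.PersistentVolumeClaim != nil {
		set++
	}
	if src.ConfigMap != nil {
		set++
	}
	if src.Secret != nil {
		set++
	}
	if src.EmptyDir != nil {
		set++
	}
	if set != 1 {
		return fmt.Errorf("exactly one allowed volume source must be set, got %d", set)
	}
	return nil
}

func main() {
	pvc := VolumeSource{PersistentVolumeClaim: &PVCSource{ClaimName: "training-dataset-v3", ReadOnly: true}}
	host := VolumeSource{HostPath: &HostPathSource{Path: "/var/run"}}
	fmt.Println(validateSource(pvc))  // <nil>
	fmt.Println(validateSource(host)) // hostPath volumes are not allowed
}
```

An in-tree check like this fails fast with a clear message, while an admission policy lets cluster operators decide per environment whether `hostPath` is acceptable; the two approaches can be combined.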

## Why this belongs on Workspace (not Task or podOverrides)

- **Setup steps are properties of the repository**, not the task. Every task working on `my-app` needs the same `npm install`. Putting it on Workspace avoids duplicating setup config across every TaskSpawner/Task.
- **`podOverrides` on Task is for per-task tuning** (resources, deadline, env vars). Setup steps are shared infrastructure.
- **Workspace already owns the git clone lifecycle** — extending it with post-clone setup is a natural progression.

## Related issues

- #256 (closed): Added `podOverrides` for resources/deadline/env — this proposal extends workspace-level customization, not pod-level
- #624: `maxCostUSD` budget — reducing wasted agent time on setup directly reduces costs
- #743: Tech debt campaigns — pre-warmed workspaces make cron-based campaigns more cost-effective

/kind feature
