Merged
21 changes: 9 additions & 12 deletions README.md
@@ -26,7 +26,7 @@ cocoon-operator/
│ ┌────────────────────────┐ ┌─────────────────────────────┐ │
│ │ cocoonset.Reconciler │ │ hibernation.Reconciler │ │
│ │ - finalizer + GC │ │ - HibernateState patches │ │
│ │ - main → subs → tbs │ │ - epoch.HasManifest probe │ │
│ │ - main → subs → tbs │ │ - registry manifest probe │ │
│ │ - patch /status │ │ - Conditions │ │
│ └────────┬───────────────┘ └────────────┬────────────────┘ │
│ │ │ │
@@ -44,12 +44,12 @@ cocoon-operator/
### CocoonSet reconcile loop

1. Fetch the CocoonSet (return early on NotFound).
2. If `DeletionTimestamp` is set, walk owned pods, delete them, `epoch.DeleteManifest` for both `:latest` and `:hibernate` tags on every owned VM (unconditional — DeleteManifest is 404-tolerant, and hibernate pushes ignore snapshotPolicy so any main-only gate would orphan `:hibernate` tags pushed by sub-agents), then drop the finalizer. VM names to GC are stashed onto an annotation before pod deletion so the cleanup survives a CocoonSet deleted before `Status.Agents` was ever patched.
2. If `DeletionTimestamp` is set, walk owned pods, delete them, `Registry.DeleteManifest` for both `:latest` and `:hibernate` tags on every owned VM (unconditional — DeleteManifest is 404-tolerant, and hibernate pushes ignore snapshotPolicy so any main-only gate would orphan `:hibernate` tags pushed by sub-agents), then drop the finalizer. VM names to GC are stashed onto an annotation before pod deletion so the cleanup survives a CocoonSet deleted before `Status.Agents` was ever patched.
3. Ensure the `cocoonset.cocoonstack.io/finalizer` is in place.
4. List owned pods by `cocoonset.cocoonstack.io/name=<cs.Name>`, drop any with stale labels that aren't actually controller-owned, and classify the rest by role label.
5. **Lifecycle-bridge stamp**: patch `cs.Generation` onto each owned pod's `cocoonset.cocoonstack.io/generation` annotation so vk-cocoon can echo it back as `lifecycle-observed-generation`, giving clients a counter-based completion signal immune to wallclock skew.
6. **Failed-state short-circuit**: if the main pod is terminal (`Pod.Phase=Failed`, or it carries `vm.cocoonstack.io/lifecycle-state=Failed` from vk-cocoon while still Running), patch `Phase=Failed` and emit `MainAgentFailed` / `PodLifecycleFailed`. The Failed phase is recoverable: when the main pod becomes `Ready` again the operator emits `RecoveredFromFailure` and resumes normal reconciliation.
7. **Suspend short-circuit**: if `spec.suspend == true`, write `meta.HibernateState(true)` onto every owned pod and poll epoch for `:hibernate` manifests on every managed VM. Stay in `Phase=Suspending` (requeueing every 5 s) until every required snapshot lands, then transition to `Phase=Suspended`.
7. **Suspend short-circuit**: if `spec.suspend == true`, write `meta.HibernateState(true)` onto every owned pod and poll the registry for `:hibernate` manifests on every managed VM. Stay in `Phase=Suspending` (requeueing every 5 s) until every required snapshot lands, then transition to `Phase=Suspended`.
8. **Un-suspend**: if `spec.suspend == false` and any owned pod still carries the hibernate annotation from a prior suspend, clear it via `PatchHibernateState(false)` so vk-cocoon wakes the VMs. Pods that are the active target of a `desire=Hibernate` CocoonHibernation CR are skipped to avoid racing the hibernation reconciler. `PatchHibernateState(false)` is a no-op on pods whose annotation is already absent, so this is cheap in the common "never suspended" case.
9. Ensure the **main agent** (slot 0). If the existing pod has drifted from spec, delete it for recreate. If it is not yet `Ready`, requeue in 5 s and report `Phase=Pending`.
10. Ensure sub-agents `[1..Replicas]` (creates are fanned out via an errgroup capped at 8 concurrent pod creates so a large scale-up does not burst the apiserver); delete extras above the requested count.
@@ -62,12 +62,12 @@ Pods are constructed via `meta.FromAgentSpec` / `meta.FromToolboxSpec` factory h

| Spec.Desire | What the reconciler does | Terminal phase |
|---|---|---|
| `Hibernate` | `meta.HibernateState(true).Apply` on the target pod, then poll `epoch.HasManifest(vmName, meta.HibernateSnapshotTag)` until the snapshot lands or `hibernateTimeout` (3 minutes) trips. A probe error (transport / 5xx / auth) surfaces as a returned error so controller-runtime logs + retries with backoff. | `Hibernated` |
| `Wake` | Check if the container is already `Running` (skip annotation patch if so), otherwise clear `meta.HibernateState` **once** (skip if already cleared to avoid triggering informer events on every requeue cycle), then wait for the container to be `Running` and drop the hibernation snapshot tag from epoch. A wake that does not complete within `wakeTimeout` (5 minutes) is escalated to `Phase=Failed` with a dated message in the `Ready` condition. | `Active` |
| `Hibernate` | `meta.HibernateState(true).Apply` on the target pod, then poll `Registry.HasManifest(vmName, meta.HibernateSnapshotTag)` until the snapshot lands or `hibernateTimeout` (3 minutes) trips. A probe error (transport / 5xx / auth) surfaces as a returned error so controller-runtime logs + retries with backoff. | `Hibernated` |
| `Wake` | Check if the container is already `Running` (skip annotation patch if so), otherwise clear `meta.HibernateState` **once** (skip if already cleared to avoid triggering informer events on every requeue cycle), then wait for the container to be `Running` and drop the hibernation snapshot tag from the registry. A wake that does not complete within `wakeTimeout` (5 minutes) is escalated to `Phase=Failed` with a dated message in the `Ready` condition. | `Active` |

On CR deletion the reconciler runs a finalizer (`cocoonhibernation.cocoonset.cocoonstack.io/finalizer`) that clears the `:hibernate` tag from epoch (if `Status.VMName` is set) before removing itself, so deleting a CocoonHibernation never leaves an orphaned snapshot on the registry.
On CR deletion the reconciler runs a finalizer (`cocoonhibernation.cocoonset.cocoonstack.io/finalizer`) that clears the `:hibernate` tag from the registry (if `Status.VMName` is set) before removing itself, so deleting a CocoonHibernation never leaves an orphaned snapshot on the registry.

There is no `cocoon-vm-snapshots` ConfigMap bridge — epoch is the single source of truth for hibernation state. Failure paths set `Phase=Failed` with a one-shot message in the `Ready` condition instead of looping forever on a bad reference. Both `Hibernate` and `Wake` Failed phases are recoverable: on re-entry from a non-deadline phase the reconciler refreshes the Ready condition's `LastTransitionTime` so the budget resets cleanly (without the override, `apimeta.SetStatusCondition` would preserve the stale timestamp across the `False → False` transition and the recovered phase would trip the deadline on the next reconcile). Each retry emits a `RetryRequested` Normal Event so the recovery is visible in `kubectl describe`.
There is no `cocoon-vm-snapshots` ConfigMap bridge — the registry is the single source of truth for hibernation state. Failure paths set `Phase=Failed` with a one-shot message in the `Ready` condition instead of looping forever on a bad reference. Both `Hibernate` and `Wake` Failed phases are recoverable: on re-entry from a non-deadline phase the reconciler refreshes the Ready condition's `LastTransitionTime` so the budget resets cleanly (without the override, `apimeta.SetStatusCondition` would preserve the stale timestamp across the `False → False` transition and the recovered phase would trip the deadline on the next reconcile). Each retry emits a `RetryRequested` Normal Event so the recovery is visible in `kubectl describe`.

### Observability

@@ -101,9 +101,7 @@ cocoon_operator_lifecycle_state_failed_observed_total{phase}
|---|---|---|
| `KUBECONFIG` | unset | Path to kubeconfig when running outside the cluster |
| `OPERATOR_LOG_LEVEL` | `info` | `projecteru2/core/log` level |
| `EPOCH_URL` | `http://epoch.cocoon-system.svc:8080` | Base URL of the epoch registry |
| `EPOCH_TOKEN` | unset | Bearer token (read-only is enough) |
| `EPOCH_CA_CERT` | unset | Read internally by `github.com/cocoonstack/epoch/registryclient` — see that package for the exact env names. |
| `OCI_REGISTRY` | **required** | OCI registry base for snapshot manifests (e.g. an Artifact Registry repo). Auth resolves GCP ADC then docker config. |
| `METRICS_ADDR` | `:8080` | Prometheus listener |
| `PROBE_ADDR` | `:8081` | healthz / readyz listener |
| `LEADER_ELECT` | `true` | Enable leader election so only one replica reconciles |
@@ -153,9 +151,8 @@ The Makefile detects Go workspace mode (`go env GOWORK`) and skips `go mod tidy`

| Project | Role |
|---|---|
| [cocoon-common](https://github.com/cocoonstack/cocoon-common) | CRD types, annotation contract, shared helpers |
| [cocoon-common](https://github.com/cocoonstack/cocoon-common) | CRD types, annotation contract, shared helpers, and the OCI registry client |
| [cocoon-webhook](https://github.com/cocoonstack/cocoon-webhook) | Admission webhook for sticky scheduling and CocoonSet validation |
| [epoch](https://github.com/cocoonstack/epoch) | Snapshot registry; the operator queries it via the `snapshot.Registry` interface |
| [vk-cocoon](https://github.com/cocoonstack/vk-cocoon) | Virtual kubelet provider managing VM lifecycle |

## License
10 changes: 5 additions & 5 deletions cocoonset/delete.go
@@ -47,7 +47,7 @@ func (r *Reconciler) reconcileDelete(ctx context.Context, cs *cocoonv1.CocoonSet
}

// Requeue if any pods still exist — vk-cocoon's DeletePod may still be running
// snapshot save/push. We only GC epoch tags once every pod is fully gone.
// snapshot save/push. We only GC registry tags once every pod is fully gone.
remainingOwned, listErr := r.listOwnedPods(ctx, cs)
if listErr != nil {
return ctrl.Result{}, fmt.Errorf("re-list pods after delete: %w", listErr)
@@ -59,20 +59,20 @@ func (r *Reconciler) reconcileDelete(ctx context.Context, cs *cocoonv1.CocoonSet

// :hibernate is always orphaned at teardown — drop unconditionally. :latest
// is kept when shouldKeepLatestTag says vk-cocoon pushed it for retag.
if r.Epoch != nil {
if r.Registry != nil {
for _, name := range vmNamesForGC(cs) {
if err := r.Epoch.DeleteManifest(ctx, name, meta.HibernateSnapshotTag); err != nil {
if err := r.Registry.DeleteManifest(ctx, name, meta.HibernateSnapshotTag); err != nil {
logger.Warnf(ctx, "delete snapshot %s:%s: %v", name, meta.HibernateSnapshotTag, err)
}
if shouldKeepLatestTag(cs, name) {
continue
}
if err := r.Epoch.DeleteManifest(ctx, name, meta.DefaultSnapshotTag); err != nil {
if err := r.Registry.DeleteManifest(ctx, name, meta.DefaultSnapshotTag); err != nil {
logger.Warnf(ctx, "delete snapshot %s:%s: %v", name, meta.DefaultSnapshotTag, err)
}
}
} else {
logger.Warnf(ctx, "skipping epoch tag GC for cocoonset %s/%s: registry not configured", cs.Namespace, cs.Name)
logger.Warnf(ctx, "skipping registry tag GC for cocoonset %s/%s: registry not configured", cs.Namespace, cs.Name)
}

if controllerutil.ContainsFinalizer(cs, finalizerName) {
2 changes: 1 addition & 1 deletion cocoonset/reconciler.go
@@ -33,7 +33,7 @@ const (
type Reconciler struct {
client.Client
Scheme *runtime.Scheme
Epoch snapshot.Registry
Registry snapshot.Registry
Recorder record.EventRecorder
}

14 changes: 7 additions & 7 deletions cocoonset/reconciler_test.go
@@ -216,7 +216,7 @@ func TestAllOwnedPodsHibernatedWaitsForEachManagedPod(t *testing.T) {
reg := &fakeRegistry{present: map[string]bool{
"vk-ns-demo-0:" + meta.HibernateSnapshotTag: true,
}}
r := &Reconciler{Scheme: scheme, Epoch: reg}
r := &Reconciler{Scheme: scheme, Registry: reg}

done, err := r.allOwnedPodsHibernated(t.Context(), classified)
if err != nil {
@@ -255,7 +255,7 @@ func TestAllOwnedPodsHibernatedSkipsUnmanagedToolbox(t *testing.T) {
reg := &fakeRegistry{present: map[string]bool{
"vk-ns-demo-0:" + meta.HibernateSnapshotTag: true,
}}
r := &Reconciler{Scheme: scheme, Epoch: reg}
r := &Reconciler{Scheme: scheme, Registry: reg}

done, err := r.allOwnedPodsHibernated(t.Context(), classified)
if err != nil {
@@ -276,7 +276,7 @@ func TestAllOwnedPodsHibernatedPropagatesProbeError(t *testing.T) {
toolbox: map[string]*corev1.Pod{},
allByName: map[string]*corev1.Pod{main.Name: main},
}
r := &Reconciler{Scheme: scheme, Epoch: &fakeRegistry{probeErr: errors.New("transport boom")}}
r := &Reconciler{Scheme: scheme, Registry: &fakeRegistry{probeErr: errors.New("transport boom")}}
if _, err := r.allOwnedPodsHibernated(t.Context(), classified); err == nil {
t.Fatal("expected probe error to surface")
}
@@ -363,7 +363,7 @@ func TestReconcileMainLifecycleFailedTransitionsToFailed(t *testing.T) {
WithObjects(cs, mainPod).
WithStatusSubresource(&cocoonv1.CocoonSet{}).
Build()
r := &Reconciler{Client: cli, Scheme: scheme, Epoch: &fakeRegistry{}}
r := &Reconciler{Client: cli, Scheme: scheme, Registry: &fakeRegistry{}}

if _, err := r.Reconcile(t.Context(), ctrl.Request{NamespacedName: types.NamespacedName{Namespace: cs.Namespace, Name: cs.Name}}); err != nil {
t.Fatalf("Reconcile: %v", err)
@@ -557,7 +557,7 @@ func TestReconcileDeleteSnapshotPolicyGC(t *testing.T) {

cli := ctrlfake.NewClientBuilder().WithScheme(scheme).WithObjects(cs).Build()
reg := &fakeRegistry{}
r := &Reconciler{Client: cli, Scheme: scheme, Epoch: reg}
r := &Reconciler{Client: cli, Scheme: scheme, Registry: reg}

if _, err := r.reconcileDelete(t.Context(), cs); err != nil {
t.Fatalf("reconcileDelete: %v", err)
@@ -583,7 +583,7 @@ func TestReconcileDeleteStashesPodVMNamesEvenWhenStatusIsEmpty(t *testing.T) {
WithObjects(cs, mustBuildAgentPod(t, cs, 0, "", "", scheme)).
Build()
reg := &fakeRegistry{}
r := &Reconciler{Client: cli, Scheme: scheme, Epoch: reg}
r := &Reconciler{Client: cli, Scheme: scheme, Registry: reg}

if _, err := r.reconcileDelete(t.Context(), cs); err != nil {
t.Fatalf("reconcileDelete: %v", err)
@@ -611,7 +611,7 @@ func TestReconcileDeleteCleansTagsAfterPodsGone(t *testing.T) {

cli := ctrlfake.NewClientBuilder().WithScheme(scheme).WithObjects(cs).Build()
reg := &fakeRegistry{}
r := &Reconciler{Client: cli, Scheme: scheme, Epoch: reg}
r := &Reconciler{Client: cli, Scheme: scheme, Registry: reg}

if _, err := r.reconcileDelete(t.Context(), cs); err != nil {
t.Fatalf("reconcileDelete: %v", err)
10 changes: 5 additions & 5 deletions cocoonset/suspend.go
@@ -18,7 +18,7 @@ import (
)

// reconcileSuspend ensures the main agent exists, applies the hibernate
// annotation to every owned pod, then polls epoch to observe when all
// annotation to every owned pod, then polls the registry to observe when all
// managed VMs have been pushed to snapshot. Stays in Suspending with a
// periodic requeue until every required snapshot lands.
func (r *Reconciler) reconcileSuspend(ctx context.Context, cs *cocoonv1.CocoonSet, classified classifiedPods) (ctrl.Result, error) {
@@ -51,13 +51,13 @@ func (r *Reconciler) reconcileSuspend(ctx context.Context, cs *cocoonv1.CocoonSe
}

// allOwnedPodsHibernated reports whether every managed owned pod has a
// hibernate snapshot published to epoch. Unmanaged pods (e.g. static
// hibernate snapshot published to the registry. Unmanaged pods (e.g. static
// toolboxes) are skipped since they have no VM lifecycle to observe.
// Returns (false, nil) whenever the expected state is not yet observed so
// the caller requeues rather than treats it as an error.
func (r *Reconciler) allOwnedPodsHibernated(ctx context.Context, classified classifiedPods) (bool, error) {
if r.Epoch == nil {
// No registry configured; epoch-less deployments have no snapshot to
if r.Registry == nil {
// No registry configured; such deployments have no snapshot to
// observe, so treat the annotation write as authoritative.
return true, nil
}
@@ -73,7 +73,7 @@ func (r *Reconciler) allOwnedPodsHibernated(ctx context.Context, classified clas
if spec.VMName == "" {
return false, nil
}
present, err := r.Epoch.HasManifest(ctx, spec.VMName, meta.HibernateSnapshotTag)
present, err := r.Registry.HasManifest(ctx, spec.VMName, meta.HibernateSnapshotTag)
if err != nil {
return false, fmt.Errorf("probe hibernate snapshot %s: %w", spec.VMName, err)
}
23 changes: 15 additions & 8 deletions go.mod
@@ -3,26 +3,29 @@ module github.com/cocoonstack/cocoon-operator
go 1.25.6

require (
github.com/cocoonstack/cocoon-common v0.2.2
github.com/cocoonstack/epoch v0.2.4
github.com/cocoonstack/cocoon-common v0.2.3-0.20260701064759-3dcdfdd23a16
github.com/go-logr/logr v1.4.3
github.com/google/go-containerregistry v0.21.7
github.com/projecteru2/core v0.0.0-20241016125006-ff909eefe04c
github.com/prometheus/client_golang v1.23.2
golang.org/x/sync v0.19.0
golang.org/x/sync v0.21.0
k8s.io/api v0.35.3
k8s.io/apimachinery v0.35.3
k8s.io/client-go v0.35.3
sigs.k8s.io/controller-runtime v0.23.3
)

require (
cloud.google.com/go/compute/metadata v0.9.0 // indirect
github.com/alphadose/haxmap v1.2.0 // indirect
github.com/beorn7/perks v1.0.1 // indirect
github.com/cespare/xxhash/v2 v2.3.0 // indirect
github.com/cockroachdb/errors v1.9.1 // indirect
github.com/cockroachdb/logtags v0.0.0-20230118201751-21c54148d20b // indirect
github.com/cockroachdb/redact v1.1.3 // indirect
github.com/davecgh/go-spew v1.1.1 // indirect
github.com/docker/cli v29.5.3+incompatible // indirect
github.com/docker/docker-credential-helpers v0.9.3 // indirect
github.com/emicklei/go-restful/v3 v3.12.2 // indirect
github.com/evanphx/json-patch/v5 v5.9.11 // indirect
github.com/fsnotify/fsnotify v1.9.0 // indirect
@@ -38,6 +41,7 @@ require (
github.com/google/uuid v1.6.0 // indirect
github.com/josharian/intern v1.0.0 // indirect
github.com/json-iterator/go v1.1.12 // indirect
github.com/klauspost/compress v1.18.6 // indirect
github.com/kr/pretty v0.3.1 // indirect
github.com/kr/text v0.2.0 // indirect
github.com/mailru/easyjson v0.7.7 // indirect
@@ -47,23 +51,26 @@ require (
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
github.com/modern-go/reflect2 v1.0.3-0.20250322232337-35a7c28c31ee // indirect
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
github.com/opencontainers/go-digest v1.0.0 // indirect
github.com/opencontainers/image-spec v1.1.1 // indirect
github.com/pkg/errors v0.9.1 // indirect
github.com/pmezard/go-difflib v1.0.0 // indirect
github.com/prometheus/client_model v0.6.2 // indirect
github.com/prometheus/common v0.66.1 // indirect
github.com/prometheus/procfs v0.16.1 // indirect
github.com/rogpeppe/go-internal v1.14.1 // indirect
github.com/rs/zerolog v1.29.1 // indirect
github.com/sirupsen/logrus v1.9.4 // indirect
github.com/spf13/pflag v1.0.10 // indirect
github.com/x448/float16 v0.8.4 // indirect
go.yaml.in/yaml/v2 v2.4.3 // indirect
go.yaml.in/yaml/v3 v3.0.4 // indirect
golang.org/x/exp v0.0.0-20240719175910-8a7402abbf56 // indirect
golang.org/x/net v0.48.0 // indirect
golang.org/x/oauth2 v0.30.0 // indirect
golang.org/x/sys v0.39.0 // indirect
golang.org/x/term v0.38.0 // indirect
golang.org/x/text v0.32.0 // indirect
golang.org/x/net v0.56.0 // indirect
golang.org/x/oauth2 v0.36.0 // indirect
golang.org/x/sys v0.46.0 // indirect
golang.org/x/term v0.44.0 // indirect
golang.org/x/text v0.38.0 // indirect
golang.org/x/time v0.9.0 // indirect
gomodules.xyz/jsonpatch/v2 v2.4.0 // indirect
google.golang.org/grpc v1.72.2 // indirect