fix(kyverno): set explicit replicas=1 for all controllers (EAI-6864) #743

Open
silokimmo wants to merge 1 commit into main from EAI-6864-fix-kyverno-admission-controller-replicas

silokimmo commented Jun 10, 2026

Summary

  • sources/kyverno/3.5.1/values.yaml had replicas: ~ (YAML null) for all four Kyverno controllers
  • The chart's null-replica guard in _deployment.tpl relies on kindIs "invalid", which a null value gets past, so replicas renders as an empty string and Kubernetes coerces it to 0 replicas
  • This silently breaks the admission controller webhook on every fresh cluster-forge install, meaning generate policies (e.g. dynamic-pvc-creation for workspace PVCs) never fire
  • Discovered when workspace pods were stuck in Pending on app-dev: kyverno-admission-controller had 0 desired replicas since cluster installation (2026-02-06)
  • ArgoCD reported the application as Synced because the live 0 replicas matched what the chart rendered, so selfHeal never corrected it
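
For reference, the problematic state described above can be sketched as the following values fragment. The four controller keys are the ones named in this PR; their top-level placement is an assumption based on the upstream chart layout. In YAML, `~`, `null`, and an empty value all parse as null.

```yaml
# Sketch of sources/kyverno/3.5.1/values.yaml before this change (as described above).
# Assumption: the controller keys sit at the top level of the values file.
admissionController:
  replicas: ~   # YAML null
backgroundController:
  replicas: ~
cleanupController:
  replicas: ~
reportsController:
  replicas: ~
```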

Fix: Set replicas: 1 explicitly for all four controllers (admissionController, backgroundController, cleanupController, reportsController). Clusters needing HA can override via their cluster-values.
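
A minimal sketch of the resulting values, assuming the four controller keys sit at the top level of the values file as they do in the upstream chart (the exact surrounding content of values.yaml is not shown here):

```yaml
# Sketch of the fix: explicit replica counts instead of YAML null.
# Key names are taken from the PR description; placement is an assumption.
admissionController:
  replicas: 1
backgroundController:
  replicas: 1
cleanupController:
  replicas: 1
reportsController:
  replicas: 1
```

A cluster that needs HA would set a higher value for the relevant controller in its own cluster-values.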

Upstream bug: kyverno/kyverno#8941, #6182

Jira: EAI-6864

An immediate workaround is already in place: a cluster-values override on app-dev sets admissionController.replicas: 1. That override can be removed once this PR is deployed to app-dev.

Test plan

  • Verify kyverno-admission-controller deploys with 1 replica on a fresh cluster install
  • Verify dynamic-pvc-creation ClusterPolicy fires on workspace deployment (PVC created, pod reaches Running)
  • Verify existing clusters with cluster-values override continue to work after this fix is synced

🤖 Generated with Claude Code

The upstream Helm chart uses replicas: ~ (YAML null) as the default for
all four Kyverno controllers. The chart's null-replica guard in
_deployment.tpl relies on kindIs "invalid", which a null value gets
past, so replicas renders empty and Kubernetes coerces it to 0 replicas.

This silently breaks the admission controller webhook, meaning generate
policies (e.g. dynamic-pvc-creation) never fire, leaving workspace pods
stuck in Pending due to missing PVCs.

Set replicas: 1 explicitly for admissionController, backgroundController,
cleanupController, and reportsController so all cluster-forge installs get
a working Kyverno from day one. Clusters needing HA can override via their
cluster-values.

Upstream bug: kyverno/kyverno#8941, #6182

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
@silokimmo silokimmo requested a review from a team as a code owner June 10, 2026 12:51

@silokimmo (Author)

E2E Test Results ✅

Tested on ephemeral OCI VM (Ubuntu 24.04) using cluster-bloom v2.1.0.

Test setup:

  • Single-node RKE2 cluster (v1.34.1+rke2r1) deployed via cluster-bloom v2.1.0 (without cluster-forge)
  • cluster-forge deployed from this branch via ArgoCD

Kyverno deployment results — all 4 controllers 1/1 Running:

| Controller | Desired | Ready |
|---|---|---|
| admission-controller | 1 | 1 |
| background-controller | 1 | 1 |
| cleanup-controller | 1 | 1 |
| reports-controller | 1 | 1 |

Fix verified. Setting replicas: 1 explicitly (instead of ~) works correctly on a single-node cluster. Safe to merge.

@silokimmo (Author)

dynamic-pvc-creation ClusterPolicy Test Results ✅

Tested admission controller policy on the same ephemeral VM (cluster-bloom v2.1.0 + this branch via ArgoCD).

Policy: dynamic-pvc-creation ClusterPolicy fires at admission time when a Pod/Deployment has the required annotations and auto-creates a PVC named pvc-{user-pvc-uid}.

| Test | Expectation | Result |
|---|---|---|
| Pod without annotations | Admitted, no PVC | ✅ Pod Running, no PVC |
| Pod with all required annotations | Admitted, PVC auto-created | ✅ Pod Running, pvc-test-abc123 created |
| Pod with partial annotations (missing size) | Admitted, no PVC | ✅ Pod Running, no extra PVC |

Required annotations confirmed:

  • pvc.silogen.ai/user-pvc-auto-create: "true"
  • pvc.silogen.ai/user-pvc-size
  • pvc.silogen.ai/user-pvc-storage-class-name
  • pvc.silogen.ai/user-pvc-uid (used as PVC name suffix)
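
As an illustration, a Pod carrying all four annotations might look like the sketch below. Only the annotation keys come from the list above; the Pod name, image, size, and storage class value are assumptions, and the uid is inferred from the PVC name pvc-test-abc123 reported in the results table.

```yaml
# Hypothetical test Pod; only the four annotation keys are taken from this comment.
apiVersion: v1
kind: Pod
metadata:
  name: pvc-policy-test                                       # hypothetical name
  annotations:
    pvc.silogen.ai/user-pvc-auto-create: "true"
    pvc.silogen.ai/user-pvc-size: "1Gi"                       # assumed value
    pvc.silogen.ai/user-pvc-storage-class-name: "local-path"  # assumed value
    pvc.silogen.ai/user-pvc-uid: "test-abc123"                # inferred from pvc-test-abc123
spec:
  containers:
    - name: pause
      image: registry.k8s.io/pause:3.9                        # placeholder workload
```

Dropping any one annotation (for example the size) corresponds to the "partial annotations" case, where the Pod is admitted and no PVC is created.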

Policy report showed PASS: 1, FAIL: 0 for the annotated pod. The PVC stayed Pending as expected, because local-path doesn't support ReadWriteMany; on a cluster with an RWX-capable storage class the PVC would bind and the pod could mount it normally.

Admission controller is working correctly.
