fix(kyverno-policies): generate-pdb creates duplicates that block node drain in client-app namespaces #2

@danielgines

Description

Context

The Kyverno ClusterPolicy generate-pdb (defined in components/kyverno-policies/templates/generate-pdb.yaml) automatically generates a {{ .metadata.name }}-pdb PodDisruptionBudget for every Deployment with replicas > 1 that is not in the excluded-namespace-list (_helpers.tpl).
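For orientation, a condensed sketch of what such a generate rule typically looks like. This is an assumption reconstructed from the description above, not the actual contents of generate-pdb.yaml; the real template may differ in names and structure:

```yaml
# Hypothetical condensed sketch of the generate-pdb ClusterPolicy
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: generate-pdb
spec:
  generateExisting: true          # also applies to Deployments that already exist
  rules:
    - name: create-pdb
      match:
        any:
          - resources:
              kinds: [Deployment]
      exclude:
        any:
          - resources:
              namespaces: [argocd, cert-manager, cnpg-system]  # excluded-namespace-list
      preconditions:
        all:
          - key: "{{ request.object.spec.replicas }}"
            operator: GreaterThan
            value: 1
      generate:
        synchronize: true          # re-creates the PDB if it is deleted
        apiVersion: policy/v1
        kind: PodDisruptionBudget
        name: "{{ request.object.metadata.name }}-pdb"
        namespace: "{{ request.object.metadata.namespace }}"
        data:
          spec:
            maxUnavailable: 1
            selector: "{{ request.object.spec.selector }}"
```

Note how the generated PDB inherits the Deployment's own selector, which is exactly why it overlaps with any chart-shipped PDB targeting the same pods.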

The excluded-namespace-list covers platform namespaces (argocd, cert-manager, cnpg-system, etc.) — all places where the rendered Helm chart already provides its own PDB. It does not cover client application namespaces.

Bug observed

During teardown validation on transfero-workload-azure-eastus2-hml, terraform destroy failed to drain an AKS node because the Dapr sidecar-injector pod had two overlapping PDBs:

| PDB | Source | Selector |
| --- | --- | --- |
| dapr-sidecar-injector-disruption-budget | Dapr Helm chart (dapr_sidecar_injector/templates/dapr_sidecar_injector_poddisruptionbudget.yaml) | dapr.io/control-plane=dapr-sidecar-injector |
| dapr-sidecar-injector-pdb | Kyverno generate-pdb policy (this chart) | same matchLabels propagated from Deployment.spec.selector |

Kubernetes eviction API returns:

This pod has more than one PodDisruptionBudget, which the eviction subresource does not support

Every replica eviction fails, node drain blocks indefinitely, AKS agent pool delete fails, Terraform times out. Manual remediation: kubectl delete pdb --all -A across the cluster.

The same duplication affected kong-kong-pdb (from Kong chart) vs kong-kong (from Kyverno).

Root causes

  1. Excluded namespaces only cover platform-owned namespaces. When a client installs a Helm chart in a namespace outside the platform set (e.g., dapr-system, kong), and that chart ships its own PDB, the Kyverno generator duplicates it.
  2. The generator has no pre-existence check. It does not skip Deployments that already have a PDB selecting them.
  3. synchronize: true + generateExisting: true means deleting the Kyverno-generated PDB does not help — it gets re-created on the next reconcile.

Options

A — Extend the excluded-namespace-list

Add client-app namespaces where upstream charts already provide PDBs (dapr-system, kong, ...).

Pros: trivial change, unblocks today's cases.
Cons: doesn't scale — every new client app with PDBs needs a chart bump. Pollutes the "platform" excluded list with client-specific names.
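The change itself would be a two-line addition to the helper. A sketch, assuming the excluded-namespace-list is a simple define in _helpers.tpl (the actual helper name and shape are assumptions):

```yaml
# _helpers.tpl — hypothetical sketch; real define name/structure may differ
{{- define "kyverno-policies.excludedNamespaces" -}}
- argocd
- cert-manager
- cnpg-system
# client-app namespaces whose upstream charts ship their own PDBs (Option A):
- dapr-system
- kong
{{- end -}}
```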

B — Add an opt-out label on the Deployment or Namespace

Policy skips if estabilis.io/pdb: unmanaged is present on the Deployment (or its Namespace). Client opts out when their chart ships PDBs.

Pros: scales — client self-serves. Clear ownership model.
Cons: requires documentation and client awareness.
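This maps directly onto the policy's exclude block (this is the exclude.any.resources.selector.matchLabels mechanism referenced in the acceptance criteria; exact placement in the rule is a sketch):

```yaml
# Hypothetical addition to the generate-pdb rule's exclude block:
exclude:
  any:
    - resources:
        selector:
          matchLabels:
            estabilis.io/pdb: unmanaged
```

A client whose chart already ships PDBs would then set that label on the affected Deployments (or, if namespace-level opt-out is implemented, on the Namespace) and the generator never fires.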

C — Pre-existence check in the policy

Use a Kyverno JMESPath or CEL expression to query existing PodDisruptionBudgets in the namespace and skip if one selects the current Deployment.

Pros: fully automatic — zero configuration burden on clients.
Cons: expensive (policy checks all PDBs in the namespace per Deployment event), Kyverno's query capabilities are limited, complexity in testing.
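For reference, a rough sketch of what this could look like using a Kyverno apiCall context plus a precondition. Note the simplification: this version only checks whether any PDB exists in the namespace, because matching an arbitrary PDB selector against the incoming Deployment's labels in JMESPath is exactly the limited-query-capability problem cited above. Treat this as an illustration of the approach, not a working rule:

```yaml
# Hypothetical sketch — skips generation if the namespace already has any PDB.
context:
  - name: existingPDBs
    apiCall:
      urlPath: "/apis/policy/v1/namespaces/{{ request.object.metadata.namespace }}/poddisruptionbudgets"
      jmesPath: "items"
preconditions:
  all:
    - key: "{{ existingPDBs | length(@) }}"
      operator: Equals
      value: 0
```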

D — Rename the generated PDB to a namespaced distinctive name

Instead of {{name}}-pdb, use estabilis-{{name}}-pdb. This does not solve the duplicate-PDB problem (both still select the same pod) but makes the Kyverno-owned PDB clearly distinguishable.

Pros: clarity of ownership.
Cons: doesn't fix the core bug.

Recommendation

Combine B (opt-out label) + A (immediate exclude for known cases).

  • Apply A now: add dapr-system and kong to the excluded-namespace-list as a quick fix
  • Implement B in a follow-up PR: add the opt-out label check in the match expression, document in the chart README

Longer-term, consider C when Kyverno's query API matures or when a generate-if-absent mutation pattern is available.

Acceptance criteria

  • dapr-system and kong added to the excluded-namespace-list in _helpers.tpl
  • Opt-out label estabilis.io/pdb: unmanaged honored by the generate-pdb policy via exclude.any.resources.selector.matchLabels
  • README updated with the opt-out mechanism and an example
  • Chart bumped (workload-bootstrap + values.yaml.repoVersion, per cross-repo rules)
  • Tested on a fresh workload cluster: Dapr charts install, no duplicate PDBs, kubectl drain works

Related

  • Teardown validation session (2026-04-14) — blocker for terraform destroy on workload cluster
  • kubernetes/kubernetes#72320 — canonical issue documenting eviction API limitation with multiple PDBs

Evidence

Deploy age at time of failure: 95m (initial workload-bootstrap) + 39m (re-sync after bridge refactor).

Kyverno-generated PDBs at time of drain failure:

  • dapr-operator-pdb
  • dapr-sentry-pdb
  • dapr-sidecar-injector-pdb
  • kong-kong-pdb

Dapr chart-generated PDBs:

  • dapr-operator-disruption-budget
  • dapr-sentry-budget
  • dapr-sidecar-injector-disruption-budget
  • (also dapr-placement-server-disruption-budget and dapr-scheduler-server-disruption-budget — these had no Kyverno duplicate because their workloads are StatefulSets, not Deployments, so the policy's Deployment match never fired)

Labels: bug