feat(istio): expose istiod_replicas to guarantee HA for node drains by agustincelentano · Pull Request #292 · nullplatform/tofu-modules

agustincelentano · 2026-04-17T21:42:42Z

Summary

A single-replica istiod combined with the chart-default PDB (minAvailable=1) yields disruptionsAllowed=0: evicting the only pod would drop availability below the PDB minimum, so the eviction is refused. This blocks every EKS node rolling update with PodEvictionFailure: Reached max retries. In clusters that use this module, tofu apply fails as soon as any change triggers a node group replacement (AMI bumps, instance_type changes, etc.).

Change

Add a new istiod_replicas variable (default 1, validated >= 1).
Wire it into both pilot.replicaCount and pilot.autoscaleMin on the helm_release "istiod".
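
As a rough illustration of the two changes above, the variable and its wiring might look like the sketch below. Only istiod_replicas, the default of 1, the >= 1 validation, the two pilot.* keys, and the helm_release "istiod" name come from this PR; the description text, error message, namespace, repository URL, and chart arguments are assumptions made so the sketch is self-contained, and the module's actual resource may differ.

```hcl
# Sketch only: description and error_message wording are assumptions.
variable "istiod_replicas" {
  description = "Minimum number of istiod replicas. Use 2 or more so the PDB allows node drains."
  type        = number
  default     = 1 # keeps existing consumers unchanged

  validation {
    condition     = var.istiod_replicas >= 1
    error_message = "istiod_replicas must be at least 1."
  }
}

# Assumed shape of the release; name/namespace/repository/chart are illustrative.
resource "helm_release" "istiod" {
  name       = "istiod"
  namespace  = "istio-system"
  repository = "https://istio-release.storage.googleapis.com/charts"
  chart      = "istiod"

  # helm provider v3 list syntax; values are passed as strings.
  set = [
    { name = "pilot.replicaCount", value = tostring(var.istiod_replicas) },
    { name = "pilot.autoscaleMin", value = tostring(var.istiod_replicas) },
  ]
}
```

Setting both keys from the same variable keeps the initial replica count and the HPA floor in sync, so there is a single knob for callers.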

Why default = 1

Backwards compatibility. Existing consumers see no behavior change after upgrading the module. Callers that need HA (recommended for clusters doing node rolling updates) opt in explicitly:

module "istio" {
  source = "...//infrastructure/commons/istio?ref=v1.52.0"

  istiod_replicas = 2
}

Why both `replicaCount` AND `autoscaleMin`

The upstream istiod chart enables the HPA by default (pilot.autoscaleEnabled=true, pilot.autoscaleMin=1). Setting only pilot.replicaCount is insufficient: the HPA would scale the deployment back to 1 replica shortly after install, leaving the same problem in place. Overriding autoscaleMin raises the HPA floor to the requested replica count.

Test plan

Apply on a dev cluster with istiod_replicas = 2 and verify kubectl -n istio-system get deploy istiod shows READY 2/2.
Verify kubectl -n istio-system get hpa istiod shows MINPODS=2.
Verify kubectl -n istio-system get pdb istiod shows ALLOWED DISRUPTIONS=1.
Trigger a node rolling update and confirm the drain succeeds.
Apply on another cluster without setting the variable and verify the deployment stays at 1 replica (backwards compat).

Single-replica istiod + PDB minAvailable=1 (chart default) yields disruptionsAllowed=0, which blocks every EKS node rolling update with 'PodEvictionFailure: Reached max retries'. Expose a new istiod_replicas variable (default 2) and wire it into both pilot.replicaCount and pilot.autoscaleMin on the helm_release. Setting only replicaCount is insufficient because the chart enables the HPA by default with autoscaleMin=1, and the HPA would scale back to 1 replica shortly after install.

The hashicorp/helm v3 provider replaced the 'set {}' block with a 'set' attribute taking a list of objects.
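
For readers unfamiliar with the syntax change, the difference can be sketched as follows. The release name and chart path are hypothetical placeholders, and the old form is shown only in comments so the fragment stays valid under provider v3.

```hcl
# helm provider v2 style (rejected by v3): repeated nested blocks.
#
#   set {
#     name  = "pilot.autoscaleMin"
#     value = "2"
#   }

# helm provider v3 style: one attribute holding a list of objects.
resource "helm_release" "example" {
  name  = "example"          # hypothetical release name
  chart = "./charts/example" # hypothetical local chart path

  set = [
    {
      name  = "pilot.autoscaleMin"
      value = "2"
    },
  ]
}
```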

…atibility Flip the default from 2 to 1 so existing consumers of this module see no behavior change after upgrading. Callers that need HA (recommended for clusters doing node rolling updates) opt in explicitly with istiod_replicas = 2.

agustincelentano added 3 commits April 17, 2026 18:42

fix(istio): use list syntax for helm set (provider v3)

c4158a4


agustincelentano wants to merge 3 commits into main from feat/istiod-replicas
