feat(istio): expose istiod_replicas to guarantee HA for node drains #292
Open
agustincelentano wants to merge 3 commits into main
Single-replica istiod + PDB minAvailable=1 (chart default) yields disruptionsAllowed=0, which blocks every EKS node rolling update with 'PodEvictionFailure: Reached max retries'. Expose a new istiod_replicas variable (default 2) and wire it into both pilot.replicaCount and pilot.autoscaleMin on the helm_release. Setting only replicaCount is insufficient because the chart enables the HPA by default with autoscaleMin=1, and the HPA would scale back to 1 replica shortly after install.
The hashicorp/helm v3 provider replaced the 'set {}' block with a 'set'
attribute taking a list of objects.
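
For illustration, a minimal sketch of how the two chart values could be wired with the v3 list-of-objects `set` syntax. The release name, repository URL, and namespace are assumptions and may differ from what this module actually uses; only `pilot.replicaCount`, `pilot.autoscaleMin`, and `istiod_replicas` come from this PR.

```hcl
# Sketch only: name, repository, chart, and namespace are assumed values.
resource "helm_release" "istiod" {
  name       = "istiod"
  repository = "https://istio-release.storage.googleapis.com/charts"
  chart      = "istiod"
  namespace  = "istio-system"

  # helm provider v3: `set` is an attribute holding a list of objects,
  # not a repeated `set {}` block as in v2.
  set = [
    {
      name  = "pilot.replicaCount"
      value = tostring(var.istiod_replicas)
    },
    {
      # Raise the HPA floor too, so the autoscaler does not scale back to 1.
      name  = "pilot.autoscaleMin"
      value = tostring(var.istiod_replicas)
    },
  ]
}
```

Under the v2 provider the same two overrides would be written as two separate `set { name = ... value = ... }` blocks.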
…atibility Flip the default from 2 to 1 so existing consumers of this module see no behavior change after upgrading. Callers that need HA (recommended for clusters doing node rolling updates) opt in explicitly with istiod_replicas = 2.
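
A variable declaration matching this description might look roughly like the sketch below. The name, the default of `1`, and the `>= 1` validation come from this PR; the `description` and `error_message` wording are assumptions.

```hcl
# Sketch only: description and error_message text are illustrative.
variable "istiod_replicas" {
  description = "Number of istiod replicas; also used as the HPA minimum."
  type        = number
  default     = 1 # backwards compatible: same as the chart default

  validation {
    condition     = var.istiod_replicas >= 1
    error_message = "The istiod_replicas value must be at least 1."
  }
}
```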
Summary
Single-replica `istiod` + chart-default PDB (`minAvailable=1`) yields `disruptionsAllowed=0` (one healthy pod minus one required leaves no room for an eviction), which blocks every EKS node rolling update with `PodEvictionFailure: Reached max retries`. In clusters that use this module, `tofu apply` fails as soon as any change triggers a node group replacement (AMI bumps, `instance_type` changes, etc.).

Change
- New `istiod_replicas` variable (default `1`, validated `>= 1`).
- Wired into both `pilot.replicaCount` and `pilot.autoscaleMin` on the `helm_release "istiod"`.

Why default = 1
Backwards compatibility. Existing consumers see no behavior change after upgrading the module. Callers that need HA (recommended for clusters doing node rolling updates) opt in explicitly by setting `istiod_replicas = 2`.
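
A minimal sketch of that opt-in from a caller's side. The module label and `source` path are hypothetical, and any other inputs the module requires are omitted:

```hcl
module "istio" {
  # Hypothetical source path; replace with the real module source.
  source = "../modules/istio"

  # Two replicas so the PDB (minAvailable=1) allows one eviction at a time.
  istiod_replicas = 2
}
```
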

Why both `replicaCount` AND `autoscaleMin`
The upstream istiod chart enables the HPA by default (`pilot.autoscaleEnabled=true`, `pilot.autoscaleMin=1`). Setting only `pilot.replicaCount` is insufficient: the HPA would scale the deployment back to 1 replica shortly after install, leaving us with the same problem. Overriding `autoscaleMin` locks in the floor.

Test plan
- Apply with `istiod_replicas = 2` and verify `kubectl -n istio-system get deploy istiod` shows `READY 2/2`.
- `kubectl -n istio-system get hpa istiod` shows `MINPODS=2`.
- `kubectl -n istio-system get pdb istiod` shows `ALLOWED DISRUPTIONS=1`.