
Fix blue-green switch-traffic failure when additional_ports added post-deploy #172

Merged
pilundain merged 4 commits into beta from fix/blue-green-additional-ports-missing-service
Apr 15, 2026

@pilundain (Contributor)

Summary

  • Fixes blue-green deployment switch-traffic failure when additional_ports (e.g., gRPC on port 9014) are added to a scope after the initial deployment was created
  • Detects whether the blue deployment's additional_port K8s Services exist during build_context and passes a map (blue_additional_port_services) to the template context
  • When a blue service is missing, blue-green-ingress.yaml.tpl renders a single-target ALB forward action (100% to green deployment) instead of referencing the non-existent blue service
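A minimal Python sketch of the two shapes the forward action can take. It assumes the template emits an AWS Load Balancer Controller `actions.*` annotation whose JSON body uses `forwardConfig.targetGroups` entries with `serviceName`, `servicePort`, and `weight`; the helper `forward_action`, its parameters, and the example Service names are hypothetical, and in the PR this logic lives in the gomplate template rather than in Python.

```python
import json


def forward_action(green_service: str, blue_service: str, port: int,
                   green_weight: int, blue_exists: bool) -> dict:
    """Build one ALB forward action (hypothetical helper, not the real template)."""
    if not blue_exists:
        # Blue Service was never created for this port: route 100% to green
        # so the action never references a missing Service.
        target_groups = [
            {"serviceName": green_service, "servicePort": str(port), "weight": 100},
        ]
    else:
        # Regular blue-green split between the old (blue) and new (green) deployment.
        target_groups = [
            {"serviceName": blue_service, "servicePort": str(port),
             "weight": 100 - green_weight},
            {"serviceName": green_service, "servicePort": str(port),
             "weight": green_weight},
        ]
    return {"type": "forward", "forwardConfig": {"targetGroups": target_groups}}


if __name__ == "__main__":
    # Hypothetical ids: scope 1, blue deployment 100, green deployment 200.
    action = forward_action("d-1-200-grpc-9014", "d-1-100-grpc-9014",
                            9014, green_weight=10, blue_exists=False)
    print(json.dumps(action))
```

The single-target branch drops the blue entry entirely instead of giving it weight 0, because a zero-weight target group would still reference the missing Service.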

Root cause

blue-green-ingress.yaml.tpl unconditionally references d-{scope_id}-{blue_deployment_id}-{type}-{port} for the old (blue) deployment. During active blue-green workflows (blue_green.yaml, switch_traffic.yaml), service templates are rendered only for the green (new) deployment, so if the blue deployment's additional_port K8s Services did not exist at the time of the original deploy, they are never created. The ingress then points at a Service that does not exist, the ALB Ingress Controller reports FailedBuildModel, and ingress reconciliation stays blocked until the switch-traffic step times out after 120s.
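To make the failure mode concrete, here is a small Python sketch that expands the Service-name pattern quoted above and lists the names the ingress would reference for the blue deployment but the cluster does not have. The `additional_ports` shape (`type`/`port` keys), the `"grpc"` type label, and the scope and deployment ids are assumptions for illustration.

```python
def service_name(scope_id: str, deployment_id: str, port_type: str, port: int) -> str:
    # Same shape as d-{scope_id}-{blue_deployment_id}-{type}-{port}
    return f"d-{scope_id}-{deployment_id}-{port_type}-{port}"


def missing_blue_services(scope_id: str, blue_id: str,
                          additional_ports: list, existing: set) -> list:
    """Names the ingress would reference for the blue deployment that do not exist."""
    referenced = [service_name(scope_id, blue_id, p["type"], p["port"])
                  for p in additional_ports]
    return [name for name in referenced if name not in existing]


# Blue was deployed before gRPC on 9014 was added, so no blue gRPC Service exists.
print(missing_blue_services("123", "100",
                            [{"type": "grpc", "port": 9014}], existing=set()))
# ['d-123-100-grpc-9014']
```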

Files changed

  • k8s/deployment/build_context — Checks if blue deployment's additional_port K8s Services exist via kubectl get service and injects blue_additional_port_services map into CONTEXT
  • k8s/deployment/templates/blue-green-ingress.yaml.tpl — Conditionally renders single-target (100% green) or dual-target forward actions based on the map
  • k8s/deployment/tests/build_context.bats — 6 new unit tests for the service detection logic
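The detection step in build_context could look roughly like the Python sketch below. It assumes the map is keyed by port number as a string and that a Service counts as present when `kubectl get service <name> -n <namespace>` exits with status 0 (kubectl exits non-zero on NotFound). `kubectl_service_exists`, `build_blue_port_map`, `with_blue_map`, and the namespace parameter are hypothetical names; the existence check is passed in as a callable so the logic can run without a cluster.

```python
import subprocess
from typing import Callable


def kubectl_service_exists(name: str, namespace: str) -> bool:
    """True if `kubectl get service` finds the Service (non-zero exit otherwise)."""
    result = subprocess.run(
        ["kubectl", "get", "service", name, "-n", namespace],
        capture_output=True,
    )
    return result.returncode == 0


def build_blue_port_map(scope_id: str, blue_id: str, additional_ports: list,
                        exists: Callable[[str], bool]) -> dict:
    """Map each additional port (as a string) to whether blue's Service exists."""
    return {
        str(p["port"]): exists(f"d-{scope_id}-{blue_id}-{p['type']}-{p['port']}")
        for p in additional_ports
    }


def with_blue_map(context: dict, port_map: dict) -> dict:
    # Return a new context instead of mutating the caller's dict.
    return {**context, "blue_additional_port_services": port_map}
```

Against a real cluster the check would be wired in as `exists=lambda n: kubectl_service_exists(n, "my-namespace")`, where the namespace is a placeholder.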

Backward compatibility

If blue_additional_port_services is absent from the context (old scope agents), the template defaults to the existing dual-target behavior — no breaking change.
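The fallback rule can be written as a small lookup. This Python sketch assumes that a port missing from an otherwise present map is also treated as "blue Service exists", the same default the verify-script commit uses with `else true end`; `blue_service_exists` is a hypothetical name, and in the PR this decision is made inside the gomplate template.

```python
def blue_service_exists(context: dict, port: int) -> bool:
    """Decide whether the ingress may reference blue's Service for `port`."""
    services = context.get("blue_additional_port_services")
    if services is None:
        # Old scope agents never set the map: keep the dual-target behavior.
        return True
    key = str(port)
    # Explicit membership test so a stored False is respected, not defaulted away.
    return services[key] if key in services else True
```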

Test plan

  • All 41 build_context.bats tests pass (including 6 new)
  • All 174 deployment tests pass
  • All 17 networking tests pass
  • Manual gomplate rendering verification with test contexts (dual-target, single-target, backward compat)
  • Verify on affected scope (52328993 or 56392226) that switch-traffic succeeds after deploy

Pablo Ilundain added 3 commits April 13, 2026 12:45
…er initial deploy

When additional_ports (e.g., gRPC) are added to a scope after the initial
deployment, blue-green switch-traffic fails because the ingress template
references K8s Services for the blue deployment's additional ports that
were never created. This causes FailedBuildModel in the ALB Ingress
Controller and a 120s timeout.

The fix detects whether the blue deployment's additional_port K8s Services
exist during build_context and passes that info to the template. When a
blue service is missing, the ingress renders a single-target forward
action (100% to the green deployment) instead of a dual-target action
that would reference the non-existent service.
…s without blue service

When additional_ports (e.g., gRPC) are added to a scope after the blue
deployment was created, the ALB has listeners with single-target weights
(100% green) instead of the standard blue-green split. The verify script
was checking the first matching listener regardless of port, and if it
hit the gRPC listener first, the weight comparison (100 vs 10/90) would
fail and the script would never check the primary HTTP listener.

The fix reads blue_additional_port_services from the deployment context
and skips weight verification on listeners for ports where the blue
deployment has no K8s service, falling through to verify the primary
HTTP listener instead.
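A Python sketch of the corrected verification loop. The listener records (a `port` plus the list of forward weights) are a simplified, hypothetical shape rather than whatever the real script reads from the ALB, port 443 for the primary listener is an assumption, and `verify_weights` is an illustrative name.

```python
def verify_weights(listeners: list, blue_services: dict, expected: list) -> bool:
    """Check blue/green weights on the first listener that has a blue Service."""
    for listener in listeners:
        key = str(listener["port"])
        has_blue = blue_services[key] if key in blue_services else True
        if not has_blue:
            # Single-target (100% green) listener: it can never show the
            # expected split, so skip it instead of failing on it.
            continue
        return listener["weights"] == expected
    # No listener carried a blue/green split.
    return False


listeners = [
    {"port": 9014, "weights": [100]},      # gRPC, added after blue was deployed
    {"port": 443, "weights": [10, 90]},    # primary HTTP(S) listener
]
print(verify_weights(listeners, {"9014": False}, [10, 90]))  # True
print(verify_weights(listeners, {}, [10, 90]))               # False: stops at the gRPC listener
```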

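Also fixes a jq bug: `false // true` returns `true` in jq, because the
alternative operator `//` falls back to its right-hand side whenever the
left-hand side is `false` or `null`, so a stored `false` was read as
`true`. Changed to an explicit `if has($k) then .[$k] else true end`.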
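The difference between the two jq expressions can be mimicked in Python for a single-valued left-hand side (jq's `//` also handles streams of values, which this ignores). `jq_alternative`, `lookup_buggy`, and `lookup_fixed` are illustrative names, and the buggy form assumes the original expression was `.[$k] // true`, which the commit message implies but does not show.

```python
def jq_alternative(left, right):
    # jq's `a // b` yields a unless a is null or false, in which case it yields b.
    return right if left is None or left is False else left


def lookup_buggy(services: dict, key: str) -> bool:
    # Mimics `.[$k] // true`: a stored false falls through to true.
    return jq_alternative(services.get(key), True)


def lookup_fixed(services: dict, key: str) -> bool:
    # Mimics `if has($k) then .[$k] else true end`.
    return services[key] if key in services else True


services = {"9014": False}
print(lookup_buggy(services, "9014"))  # True  (wrong: blue Service is missing)
print(lookup_fixed(services, "9014"))  # False
```

Note that jq, unlike Python, treats `0` as truthy, which is why the sketch tests for `None` and `False` by identity rather than by truthiness.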

Weights are stored newline-separated for comparison, but this caused the
mismatch log to split across multiple lines, making deployment logs
hard to read. The mismatch is now logged on a single line, e.g.
"expected=20/80 actual=10/90".
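A Python sketch of the one-line formatting, assuming each weight sits on its own line in the stored strings; `format_mismatch` is a hypothetical name.

```python
def format_mismatch(expected: str, actual: str) -> str:
    """Render newline-separated weights as a single log line."""
    def slashed(weights: str) -> str:
        # split() with no argument drops newlines and any trailing blank line
        return "/".join(weights.split())
    return f"expected={slashed(expected)} actual={slashed(actual)}"


print(format_mismatch("20\n80\n", "10\n90\n"))  # expected=20/80 actual=10/90
```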
fedemaleh previously approved these changes Apr 15, 2026
pilundain merged commit f22f93f into beta Apr 15, 2026
3 checks passed
pilundain deleted the fix/blue-green-additional-ports-missing-service branch April 15, 2026 13:55