fix: recreate deployments with stale legacy-managed env vars #159

Open
bigbluechief wants to merge 1 commit into `main` from `stale_field_manager`

@bigbluechief

We have observed pods keeping old env vars after spec changes, which can cause invalid runtime config (for example, stale servlet/webflux base paths).

The root cause is split server-side apply (SSA) ownership in `managedFields`: old deployments still have env entries owned by the legacy field manager `flaisapplicationreconciler`.
When the desired state removes those vars, a normal apply from the current reconciler may not clear them, because SSA only removes fields that the applying manager itself owns.
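
To make the ownership split concrete, here is a minimal Python sketch that lists which env vars a given field manager owns, based on the `fieldsV1` layout Kubernetes uses in `metadata.managedFields`. It is an illustration over plain dicts, not the operator's code: the function name `owned_env_vars`, the env var names, and the example data are hypothetical; only the two manager names come from this PR.

```python
import json

# Legacy manager name taken from the PR description.
LEGACY_MANAGER = "flaisapplicationreconciler"


def owned_env_vars(managed_fields, manager):
    """Return {(container_name, env_name)} owned by `manager` via Apply."""
    owned = set()
    for entry in managed_fields:
        if entry.get("manager") != manager or entry.get("operation") != "Apply":
            continue
        containers = (
            entry.get("fieldsV1", {})
            .get("f:spec", {})
            .get("f:template", {})
            .get("f:spec", {})
            .get("f:containers", {})
        )
        for container_key, container_fields in containers.items():
            # List items are keyed as 'k:{"name":"<container>"}'.
            if not container_key.startswith("k:"):
                continue
            container = json.loads(container_key[2:])["name"]
            for env_key in container_fields.get("f:env", {}):
                if env_key.startswith("k:"):
                    owned.add((container, json.loads(env_key[2:])["name"]))
    return owned


def _entry(manager, container, env_name):
    """Build one hypothetical managedFields entry (illustration only)."""
    env = {'k:{"name":"%s"}' % env_name: {".": {}, "f:name": {}, "f:value": {}}}
    containers = {'k:{"name":"%s"}' % container: {"f:env": env}}
    return {
        "manager": manager,
        "operation": "Apply",
        "fieldsType": "FieldsV1",
        "fieldsV1": {"f:spec": {"f:template": {"f:spec": {"f:containers": containers}}}},
    }


# Hypothetical deployment with ownership split across two managers.
example_managed_fields = [
    _entry("flaisapplicationreconciler", "app", "OLD_BASE_PATH"),
    _entry("applicationreconciler", "app", "LOG_LEVEL"),
]

legacy_owned = owned_env_vars(example_managed_fields, LEGACY_MANAGER)
```

In this example the legacy manager still owns `OLD_BASE_PATH`, so an apply by `applicationreconciler` that omits it would leave the variable in place.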

This change does two things:
- set `fieldManager = "applicationreconciler"` in `ApplicationReconciler` to keep future SSA ownership stable
- in `DeploymentDR`, detect stale env vars (`actual - desired`, per container) and recreate the deployment when both conditions are true:
  - a legacy apply manager is present
  - stale env vars exist
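
The decision rule above can be sketched in Python over deployment-shaped dicts. This is an illustration under stated assumptions, not the `DeploymentDR` implementation: all function names here (`env_identities`, `should_recreate`, `make_deployment`, and so on) are hypothetical, and treating "legacy apply manager" as a `managedFields` entry with operation `Apply` is an assumption.

```python
# Legacy manager names; only this one appears in the PR description.
LEGACY_MANAGERS = frozenset({"flaisapplicationreconciler"})


def env_identities(deployment):
    """Return {(container_name, env_name)} for every env var in the pod template."""
    containers = deployment["spec"]["template"]["spec"].get("containers") or []
    return {
        (container["name"], env["name"])
        for container in containers
        for env in container.get("env") or []
    }


def has_legacy_apply_manager(deployment):
    """True if any managedFields entry is an Apply by a known legacy manager."""
    entries = deployment.get("metadata", {}).get("managedFields") or []
    return any(
        e.get("operation") == "Apply" and e.get("manager") in LEGACY_MANAGERS
        for e in entries
    )


def stale_env_vars(actual, desired):
    """Env vars present on the live object but absent from desired state."""
    return env_identities(actual) - env_identities(desired)


def should_recreate(actual, desired):
    """Recreate only when both conditions from the PR description hold."""
    return has_legacy_apply_manager(actual) and bool(stale_env_vars(actual, desired))


def make_deployment(env_by_container, managers=()):
    """Build a minimal deployment-shaped dict (illustration only)."""
    return {
        "metadata": {
            "managedFields": [{"manager": m, "operation": "Apply"} for m in managers]
        },
        "spec": {"template": {"spec": {"containers": [
            {"name": name, "env": [{"name": n, "value": "x"} for n in names]}
            for name, names in env_by_container.items()
        ]}}},
    }
```

Keying each env var by `(container, name)` rather than by name alone means a variable that is desired in one container but lingering in another still counts as stale.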

Added unit tests for:
- recreate with legacy manager + stale env vars
- no recreate without legacy manager
- no recreate when env is already in sync
- container-aware stale env identity (env vars are compared per container, not by name alone)

Potential downsides:
- affected deployments may roll once due to delete/recreate when healed
- manually added env vars not present in desired state can be removed during healing (expected for desired-state reconciliation)
- detection currently targets known legacy manager names; more can be added if historical manager variants are discovered

bigbluechief self-assigned this on Feb 27, 2026