Skip to content

fix(operator): FerrFlow* and FerrVault* reconcilers can fight over the same target Secret #115

Description

@BryanFRD

Problem

Both FerrFlowSecretReconciler (internal/controller/ferrflowsecret_controller.go) and FerrVaultSecretReconciler (internal/controller/ferrvaultsecret_controller.go) are registered simultaneously in cmd/main.go (lines 89-121). Each:

  • calls Owns(&corev1.Secret{}) in SetupWithManager,
  • materialises a target Secret via ensureTargetSecret, which calls controllerutil.SetControllerReference(cr, secret, ...),
  • derives the target name from spec.target.name (defaulting to the CR name).

A Kubernetes object can have at most one controller owner reference. If a FerrFlowSecret and a FerrVaultSecret in the same namespace resolve to the same target Secret name (trivially: same metadata.name, or both pointing spec.target.name at one Secret during a FerrFlow→FerrVault migration), the two reconcilers will:

  1. fight over the controller owner-ref (SetControllerReference errors with AlreadyOwnedError for the loser, surfacing as SecretWriteFailed), and
  2. overwrite each other's StringData and content-hash annotation on every reconcile, producing flapping rollout-restarts and churn.

The migration path the repo is explicitly building toward (dual CRDs, see #93) makes the collision likely, not theoretical: users will create a FerrVaultSecret that targets the Secret an existing FerrFlowSecret already owns.

Why it matters

Silent secret data corruption / flapping in a secrets-sync operator is high blast radius — workloads can get restarted in a loop or read stale/mixed values. There is no guard today.

Proposed approach

  • Detect foreign ownership in ensureTargetSecret: before claiming, check for an existing controller owner-ref of a different Kind/CRD and refuse with a clear Ready=False reason (e.g. TargetOwnedByOtherCR) instead of silently clobbering.
  • Optionally key the content-hash/managed-by annotations per-CRD and treat a mismatch as a conflict.
  • Document the FerrFlow→FerrVault migration as "delete the FerrFlowSecret first, then create the FerrVaultSecret" until the rename (rename!: FerrFlow* → FerrVault* across CRDs, API group, image, repo (BREAKING) #82) lands, and add an e2e case in .github/workflows/e2e.yml covering two CRs targeting one Secret.

Acceptance criteria

  • Two CRs (one of each kind) targeting the same Secret name no longer corrupt each other; the second to claim it goes Ready=False with an explicit reason.
  • Unit test in internal/controller reproduces the collision and asserts the guard.
  • Migration guidance documented.

Metadata

Metadata

Assignees

No one assigned

    Labels

    P1High prioritybugSomething isn't working

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions