Skip to content

K8SPG-1057: Allow using etcd as patroni DCS#1647

Open
yoav-katz wants to merge 20 commits into
percona:mainfrom
yoav-katz:etcd-dcs
Open

K8SPG-1057: Allow using etcd as patroni DCS#1647
yoav-katz wants to merge 20 commits into
percona:mainfrom
yoav-katz:etcd-dcs

Conversation

@yoav-katz

@yoav-katz yoav-katz commented Jun 18, 2026

Copy link
Copy Markdown

CHANGE DESCRIPTION

Problem:
Patroni supports multiple DCS backends, but the operator hardcodes Kubernetes Endpoints as the only option. This blocks clusters on managed Kubernetes platforms where workloads cannot reach the control plane API.

Cause:
The kubernetes: stanza was hardcoded in the generated Patroni config with no mechanism to select a different backend.
Several other pieces of the operator also assumed k8s DCS: RBAC rules unconditionally granted Endpoints permissions, the primary service routed through Patroni-managed Endpoints objects, and pod role labels/annotations were expected to be set by Patroni itself (which only happens with k8s DCS).

Solution:
Add a spec.patroni.dcs field (type: kubernetes default, type: etcd alternative). The field is immutable after cluster creation, enforced by a CEL validation rule on the CRD.

When type: etcd, the operator:

  • Emits an etcd3: stanza in the generated Patroni config instead of kubernetes:, with optional TLS (cacert/cert/key) and auth credentials (PATRONI_ETCD3_USERNAME/PATRONI_ETCD3_PASSWORD) sourced from referenced Secrets.
  • Injects on_start and on_role_change Patroni callbacks pointing to a new patroni-role-change.sh script. Since Patroni does not set pod role labels or the status annotation when using etcd DCS, this script patches the pod via the k8s API on every role transition, restoring the label (role=primary|replica) and annotation ({"role":"primary"}) that the rest of the operator depends on for Service routing and primary detection.
  • Creates the primary Service with a label selector (role=primary) instead of the previous headless-Endpoints-to-Patroni-leader-ClusterIP indirection, which only works with k8s DCS.
  • Skips creating the Patroni leader lease Service and distributed configuration Service, which are k8s DCS artifacts.
  • Omits the Endpoints RBAC permissions from the postgres pod ServiceAccount, since they are not needed.
  • Validates that referenced TLS and auth Secrets exist and contain the required keys, surfacing issues as Warning events on the cluster.

The Kubernetes DCS path is unchanged.

CHECKLIST

Jira

  • Is the Jira ticket created and referenced properly?
  • Does the Jira ticket have the proper statuses for documentation (Needs Doc) and QA (Needs QA)?
  • Does the Jira ticket link to the proper milestone (Fix Version field)?

Tests

  • Is an E2E test/test case added for the new feature/change?
  • Are unit tests added where appropriate?

Config/Logging/Testability

  • Are all needed new/changed options added to default YAML files?
  • Helm Chart Merge Request
  • Did we add proper logging messages for operator actions?
  • Did we ensure compatibility with the previous version or cluster upgrade process?
  • Does the change support oldest and newest supported PG version?
  • Does the change support oldest and newest supported Kubernetes version?

@it-percona-cla

it-percona-cla commented Jun 18, 2026

Copy link
Copy Markdown

CLA assistant check
All committers have signed the CLA.

@yoav-katz yoav-katz marked this pull request as draft June 18, 2026 22:26
@yoav-katz

Copy link
Copy Markdown
Author

I will wait for the jira ticket to open the PR in the helm charts repo

@yoav-katz

Copy link
Copy Markdown
Author

and another note - this is my first OSS contribute so be gentel 😄
if there is stuff that you think should be changed becuase of style/dependency consideration I will be happy to fix!

@egegunes egegunes changed the title feat(etcd) K8SPG-1057: Allow using etcd as patroni DCS Jun 19, 2026

@egegunes egegunes left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yoav-katz the implementation looks good to me in general. but we definitely need an e2e test that deploys etcd and configures PerconaPGCluster to use it.

@DanBrima

Copy link
Copy Markdown

would love to see this merged!

@yoav-katz

yoav-katz commented Jun 20, 2026

Copy link
Copy Markdown
Author

QUESTIONS FOR REVIEWERS:

  1. e2e coverage approach: The new suite covers the etcd DCS happy path but does not run the existing full test suite (switchover, pgbackrest backup/restore, scale-up, upgrade, etc.) against etcd DCS. Should we: (a) add the most critical existing tests (switchover, backup) to this suite, or (b) parameterize the main suite to run with both DCS backends?
  2. Read-from-replica test: There is no step testing that replica pods are correctly labeled and that reads can be served through the replica service. Should that be added before merge?
  3. DCS immutability UX: The CEL rule prevents changing dcs.type after cluster creation. If a user needs to migrate between DCS backends, the only path is delete and recreate. Is this the right trade-off, or should we document a migration procedure?
  4. Routing: The current implementation routes primary/replica traffic via k8s Services with label selectors (role=primary, role=replica). The longer-term goal is to replace this with HAProxy, which would discover and health-check postgres pods directly via Patroni's REST API - removing the dependency on pod labels for routing entirely. should HAProxy integration be implemented inside the operator, or should the operator when using etcd as a dcs simply expose a headless Service covering all postgres pods and leave HAProxy configuration to the user?

@yoav-katz yoav-katz marked this pull request as ready for review June 20, 2026 16:10
@yoav-katz

Copy link
Copy Markdown
Author

Operator-managed etcd (future consideration)

The current design requires users to supply an external etcd cluster via spec.patroni.dcs.etcd.endpoints. This is a reasonable first step, but it places a significant operational burden on users who don't already have etcd infrastructure. An alternative would be a managed sub-field on the etcd spec, e.g.:

spec:
  patroni:
    dcs:
      type: etcd
      etcd:
        managed:           # operator deploys etcd itself
          replicas: 3      # 1 for dev, 3 for production HA
          storage: 1Gi
          storageClass: standard
        # endpoints: omitted when managed: is set

The operator would create and reconcile an etcd StatefulSet (with PVCs) co-located with the PostgreSQL cluster. This raises a few design questions:
(a) Should this be scoped to this PR or tracked as a follow-up?
(b) If implemented, should it be a thin wrapper (the operator just creates a StatefulSet from a known etcd image) or should it delegate to an existing etcd operator (e.g., via a EtcdCluster CR)?

@JNKPercona

Copy link
Copy Markdown
Collaborator
Test Name Result Time
backup-enable-disable passed 00:14:30
builtin-extensions passed 00:05:54
cert-manager-tls passed 00:09:53
custom-envs passed 00:19:44
custom-tls passed 00:06:44
database-init-sql passed 00:02:32
demand-backup passed 00:22:48
demand-backup-offline-snapshot passed 00:13:11
dynamic-configuration passed 00:03:11
finalizers passed 00:03:09
init-deploy passed 00:02:52
huge-pages passed 00:02:54
major-upgrade-14-to-15 passed 00:09:59
major-upgrade-15-to-16 passed 00:09:42
major-upgrade-16-to-17 passed 00:10:58
major-upgrade-17-to-18 passed 00:09:00
ldap passed 00:03:39
ldap-tls passed 00:05:34
monitoring passed 00:07:58
monitoring-pmm3 passed 00:08:45
one-pod passed 00:06:26
operator-self-healing passed 00:10:20
pitr passed 00:11:45
scaling passed 00:05:05
scheduled-backup passed 00:26:52
self-healing passed 00:10:19
sidecars passed 00:02:41
standby-pgbackrest passed 00:18:59
standby-streaming passed 00:13:05
start-from-backup passed 00:10:48
tablespaces passed 00:07:16
telemetry-transfer passed 00:04:33
upgrade-consistency passed 00:06:43
upgrade-minor passed 00:07:27
users passed 00:04:50
etcd-dcs passed 00:03:47
Summary Value
Tests Run 36/36
Job Duration 01:46:08
Total Test Time 05:24:09

commit: 692e3e3
image: perconalab/percona-postgresql-operator:PR-1647-692e3e35e

@yoav-katz yoav-katz requested a review from egegunes June 21, 2026 22:06
@egegunes

Copy link
Copy Markdown
Contributor
  1. e2e coverage approach: The new suite covers the etcd DCS happy path but does not run the existing full test suite (switchover, pgbackrest backup/restore, scale-up, upgrade, etc.) against etcd DCS. Should we: (a) add the most critical existing tests (switchover, backup) to this suite, or (b) parameterize the main suite to run with both DCS backends?

Let's start with (a).

  1. Read-from-replica test: There is no step testing that replica pods are correctly labeled and that reads can be served through the replica service. Should that be added before merge?

I don't think it's crucial but would be a good addition.

  1. DCS immutability UX: The CEL rule prevents changing dcs.type after cluster creation. If a user needs to migrate between DCS backends, the only path is delete and recreate. Is this the right trade-off, or should we document a migration procedure?

For the start, I think it's better to not allow live migration. We can revisit this after receiving feedback.

  1. Routing: The current implementation routes primary/replica traffic via k8s Services with label selectors (role=primary, role=replica). The longer-term goal is to replace this with HAProxy, which would discover and health-check postgres pods directly via Patroni's REST API - removing the dependency on pod labels for routing entirely. should HAProxy integration be implemented inside the operator, or should the operator when using etcd as a dcs simply expose a headless Service covering all postgres pods and leave HAProxy configuration to the user?

Why the longer-term goal is to replace routing by labels with HAProxy? Also, operator already creates a headless service covering all postgres pods.

  1. Operator-managed etcd: The operator would create and reconcile an etcd StatefulSet (with PVCs) co-located with the PostgreSQL cluster. This raises a few design questions:
    (a) Should this be scoped to this PR or tracked as a follow-up?
    (b) If implemented, should it be a thin wrapper (the operator just creates a StatefulSet from a known etcd image) or should it delegate to an existing etcd operator (e.g., via a EtcdCluster CR)?

I don't think we should ever have etcd managed by the operator. It should be the user who configure and manage etcd infrastructure.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's remove this step completely

ctx context.Context, cluster *v1beta1.PostgresCluster,
) error {
// With etcd DCS, Patroni stores distributed configuration in etcd, not k8s Endpoints.
if dcs := cluster.Spec.Patroni.GetDCS(); dcs != nil && dcs.Type == v1beta1.PatroniDCSTypeEtcd {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can extract this into a method of PostgresCluster and use everywhere

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants