PerconaPGRestore: restore job pod does not inherit tolerations on tainted nodes #1633

Description

@Avapaa

Just for transparency's sake: The following was (mostly) written by an LLM after a multi-hour debug session to get restores to work in my AKS environment. I left the suggested fix(es) in but you can ignore them if they don't align with your vision for this project. Please forgive me. Thanks! :)

Summary

When triggering an in-place restore via PerconaPGRestore on a cluster where all nodes
carry NoSchedule taints, the restore job pod is created without any tolerations and
cannot be scheduled. The restore hangs indefinitely in RestoreStarting/RestoreRunning
state with FailedScheduling events on the pod.

Affected versions

Confirmed on v2.8.2, still present on main (v3.0.0).

Root cause

The Start function in percona/controller/pgrestore/utils/pgbackrest.go (v3) /
percona/controller/pgrestore/controller.go (v2) patches the PerconaPGCluster with
enabled, repoName, options (and env/envFrom on v3), but never with tolerations.

The restore job is created by generateRestoreJobIntent in
internal/controller/postgrescluster/pgbackrest.go, which sets tolerations exclusively
from spec.backups.pgbackrest.restore.tolerations (PostgresClusterDataSource.Tolerations).
This is a separate field from spec.backups.pgbackrest.jobs.tolerations, which only covers
scheduled and manual backup jobs.

Because Start never writes to restore.tolerations, the field stays nil, and the restore
job pod is created with no tolerations, making it unschedulable on any tainted node.

Steps to reproduce

  1. Create a PerconaPGCluster on a Kubernetes cluster where all nodes have a NoSchedule
    taint (common on managed Kubernetes: AKS, EKS, GKE node pools with taints).
  2. Ensure spec.backups.pgbackrest.jobs.tolerations is set correctly (backup jobs schedule
    fine, confirming the toleration values are correct).
  3. Create a PerconaPGRestore CR targeting the cluster.
  4. Observe: the restore pod is created with only the default not-ready/unreachable
    tolerations and fails to schedule. The pod description shows:

    Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
    Warning  FailedScheduling  default-scheduler  0/N nodes are available:
             N node(s) had untolerated taint(s).

Again: spec.backups.pgbackrest.jobs.tolerations is set correctly and backup jobs schedule
without issue; the problem is specific to the restore code path.
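
To illustrate why the default tolerations alone are not enough, here is a minimal sketch of Kubernetes taint/toleration matching. The Taint and Toleration types are simplified stand-ins for the core/v1 types, and the workload=database:NoSchedule taint is a made-up example, not a value from my environment:

```go
package main

import "fmt"

// Taint and Toleration are simplified stand-ins for the Kubernetes core/v1
// types; only the fields needed for matching are modeled.
type Taint struct{ Key, Value, Effect string }

type Toleration struct{ Key, Operator, Value, Effect string }

// tolerates reports whether one toleration matches one taint.
func tolerates(tol Toleration, t Taint) bool {
	// An empty effect or key on the toleration acts as a wildcard.
	if tol.Effect != "" && tol.Effect != t.Effect {
		return false
	}
	if tol.Key != "" && tol.Key != t.Key {
		return false
	}
	switch tol.Operator {
	case "Exists":
		return true
	case "", "Equal":
		return tol.Value == t.Value
	}
	return false
}

// schedulable reports whether every hard taint on a node is tolerated.
// PreferNoSchedule is a soft preference and never blocks scheduling.
func schedulable(tols []Toleration, taints []Taint) bool {
	for _, t := range taints {
		if t.Effect == "PreferNoSchedule" {
			continue
		}
		matched := false
		for _, tol := range tols {
			if tolerates(tol, t) {
				matched = true
				break
			}
		}
		if !matched {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical node pool taint.
	nodeTaints := []Taint{{Key: "workload", Value: "database", Effect: "NoSchedule"}}

	// Defaults seen on the restore pod in the output above.
	restorePod := []Toleration{
		{Key: "node.kubernetes.io/not-ready", Operator: "Exists", Effect: "NoExecute"},
		{Key: "node.kubernetes.io/unreachable", Operator: "Exists", Effect: "NoExecute"},
	}
	// Backup pod: the defaults plus a toleration as it would come from jobs.tolerations.
	backupPod := append([]Toleration{
		{Key: "workload", Operator: "Equal", Value: "database", Effect: "NoSchedule"},
	}, restorePod...)

	fmt.Println("restore pod schedulable:", schedulable(restorePod, nodeTaints))
	fmt.Println("backup pod schedulable:", schedulable(backupPod, nodeTaints))
}
```

With these inputs the restore pod comes out unschedulable and the backup pod schedulable, which matches what I see on the cluster.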

Expected behaviour

The restore job pod should be schedulable under the same conditions as backup job pods.
Start should propagate tolerations into spec.backups.pgbackrest.restore.tolerations
so they reach the restore job via generateRestoreJobIntent.

Suggested fix

In Start (v3: percona/controller/pgrestore/utils/pgbackrest.go), inherit tolerations
from jobs.tolerations when none are explicitly set on the restore stanza, consistent
with how jobs.tolerations is the single place users configure pgBackRest pod scheduling:

r.pgCluster.Spec.Backups.PGBackRest.Restore.Enabled = new(true)
r.pgCluster.Spec.Backups.PGBackRest.Restore.RepoName = ptr.Deref(r.pgRestore.Spec.RepoName, "")
r.pgCluster.Spec.Backups.PGBackRest.Restore.Options = r.pgRestore.Spec.Options
r.pgCluster.Spec.Backups.PGBackRest.Restore.Env = r.pgRestore.Spec.ContainerOptions.Env
r.pgCluster.Spec.Backups.PGBackRest.Restore.EnvFrom = r.pgRestore.Spec.ContainerOptions.EnvFrom
// Add:
if r.pgCluster.Spec.Backups.PGBackRest.Restore.Tolerations == nil &&
    r.pgCluster.Spec.Backups.PGBackRest.Jobs != nil {
    r.pgCluster.Spec.Backups.PGBackRest.Restore.Tolerations =
        r.pgCluster.Spec.Backups.PGBackRest.Jobs.Tolerations
}
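
For anyone who wants to try the inheritance logic in isolation, the following is a self-contained sketch of the same idea. The struct types are cut-down stand-ins I made up for this example (the real operator types have many more fields), and the nil guard on Restore is an assumption, since I have not checked whether Restore is a pointer at that point in Start:

```go
package main

import "fmt"

// Stand-in types; the real ones live in the operator's API packages.
type Toleration struct{ Key, Operator, Value, Effect string }

type Jobs struct{ Tolerations []Toleration }

type Restore struct {
	Enabled     *bool
	RepoName    string
	Tolerations []Toleration
}

type PGBackRest struct {
	Jobs    *Jobs
	Restore *Restore
}

// inheritRestoreTolerations copies jobs.tolerations into restore.tolerations,
// but only when the restore stanza has none of its own.
func inheritRestoreTolerations(b *PGBackRest) {
	if b.Restore == nil {
		b.Restore = &Restore{} // assumption: Restore may be unset
	}
	if b.Restore.Tolerations == nil && b.Jobs != nil {
		// Copy instead of aliasing so later edits to one list do not leak
		// into the other.
		b.Restore.Tolerations = append([]Toleration(nil), b.Jobs.Tolerations...)
	}
}

func main() {
	spec := &PGBackRest{
		Jobs: &Jobs{Tolerations: []Toleration{
			{Key: "workload", Operator: "Exists", Effect: "NoSchedule"}, // hypothetical
		}},
	}
	inheritRestoreTolerations(spec)
	fmt.Printf("%+v\n", spec.Restore.Tolerations)
}
```

Tolerations set explicitly on the restore stanza win over the inherited ones, so the workaround below keeps working unchanged.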

Alternatively, and more in the spirit of the ContainerOptions pattern already present
on PerconaPGRestore in v3, tolerations (and affinity, priorityClassName) could be
added to ContainerOptions and forwarded the same way Env/EnvFrom are. This would be
cleaner API design, keeping scheduling constraints on the PerconaPGRestore CR itself
rather than requiring users to pre-configure the PerconaPGCluster.
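
A rough sketch of what that forwarding could look like. The Tolerations field on ContainerOptions does not exist today and is purely hypothetical, as are the stand-in types and the forwardOptions helper:

```go
package main

import "fmt"

// Stand-in types for illustration only.
type Toleration struct{ Key, Operator, Value, Effect string }

type EnvVar struct{ Name, Value string }

// ContainerOptions stands in for the options on the PerconaPGRestore spec.
// Tolerations is the hypothetical new field.
type ContainerOptions struct {
	Env         []EnvVar
	Tolerations []Toleration
}

// ClusterRestore stands in for spec.backups.pgbackrest.restore on the cluster.
type ClusterRestore struct {
	Env         []EnvVar
	Tolerations []Toleration
}

// forwardOptions copies restore-level options onto the cluster's restore stanza.
func forwardOptions(opts *ContainerOptions, dst *ClusterRestore) {
	if opts == nil {
		return
	}
	dst.Env = opts.Env
	// Only override when the restore CR sets tolerations, so values
	// pre-seeded on the cluster are left alone otherwise.
	if opts.Tolerations != nil {
		dst.Tolerations = opts.Tolerations
	}
}

func main() {
	opts := &ContainerOptions{
		Tolerations: []Toleration{
			{Key: "workload", Operator: "Exists", Effect: "NoSchedule"}, // hypothetical
		},
	}
	var dst ClusterRestore
	forwardOptions(opts, &dst)
	fmt.Printf("%+v\n", dst.Tolerations)
}
```

Whether an unset field on the restore CR should clear or keep tolerations already present on the cluster is a design decision for the maintainers; the sketch keeps them.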

Workaround

Pre-seed spec.backups.pgbackrest.restore on the PerconaPGCluster with the required
tolerations before triggering the restore. Since Start uses a merge patch, pre-existing
fields not written by Start are preserved. Start still toggles the enabled field, which
conflicts with GitOps tooling (e.g. ArgoCD selfHeal); this requires an ignoreDifferences
exemption on .spec.backups.pgbackrest.restore.enabled.
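
To show why the pre-seeded tolerations survive, here is a small stdlib-only sketch of merge patch behaviour, assuming the patch follows JSON Merge Patch (RFC 7386) semantics. The mergePatch and applyPatch helpers are written for this example and are not operator code, and repo1 is a placeholder repo name:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mergePatch applies an RFC 7386 style merge patch to target and returns the
// result. Object targets are modified in place.
func mergePatch(target, patch any) any {
	p, ok := patch.(map[string]any)
	if !ok {
		return patch // non-object patch values replace the target wholesale
	}
	t, ok := target.(map[string]any)
	if !ok {
		t = map[string]any{}
	}
	for k, v := range p {
		if v == nil {
			delete(t, k) // null in the patch removes the key
			continue
		}
		t[k] = mergePatch(t[k], v)
	}
	return t
}

// applyPatch decodes both documents, merges them, and re-encodes the result.
func applyPatch(doc, patch string) (string, error) {
	var d, p any
	if err := json.Unmarshal([]byte(doc), &d); err != nil {
		return "", err
	}
	if err := json.Unmarshal([]byte(patch), &p); err != nil {
		return "", err
	}
	out, err := json.Marshal(mergePatch(d, p))
	return string(out), err
}

func main() {
	// Cluster with pre-seeded restore tolerations (hypothetical values).
	cluster := `{"restore":{"tolerations":[{"key":"workload","operator":"Exists","effect":"NoSchedule"}]}}`
	// Patch touching only the kind of fields Start writes.
	patch := `{"restore":{"enabled":true,"repoName":"repo1"}}`

	out, err := applyPatch(cluster, patch)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Keys absent from the patch are left untouched, so tolerations is still present in the merged restore object next to enabled and repoName.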
