Just for transparency's sake: The following was (mostly) written by an LLM after a multi-hour debug session to get restores to work in my AKS environment. I left the suggested fix(es) in but you can ignore them if they don't align with your vision for this project. Please forgive me. Thanks! :)
Summary
When triggering an in-place restore via PerconaPGRestore on a cluster where all nodes
carry NoSchedule taints, the restore job pod is created without any tolerations and
cannot be scheduled. The restore hangs indefinitely in RestoreStarting/RestoreRunning
state with FailedScheduling events on the pod.
Affected versions
Confirmed on v2.8.2, still present on main (v3.0.0).
Root cause
The Start function in percona/controller/pgrestore/utils/pgbackrest.go (v3) /
percona/controller/pgrestore/controller.go (v2) patches the PerconaPGCluster with
enabled, repoName, options (and env/envFrom on v3), but never with tolerations.
The restore job is created by generateRestoreJobIntent in
internal/controller/postgrescluster/pgbackrest.go, which sets tolerations exclusively
from spec.backups.pgbackrest.restore.tolerations (PostgresClusterDataSource.Tolerations).
This is a separate field from spec.backups.pgbackrest.jobs.tolerations, which only covers
scheduled and manual backup jobs.
Because Start never writes to restore.tolerations, the field stays nil, and the restore
job pod is created with no tolerations, making it unschedulable on any tainted node.
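To make the asymmetry concrete, here is a minimal Go model of the two code paths as described above. The types are hypothetical, trimmed-down stand-ins (not the operator's real API types), and the two functions only model which field each job reads; they are not the actual generateRestoreJobIntent or backup job code. The dedicated=postgres taint/toleration is a made-up example.

```go
package main

import "fmt"

// Toleration is a hypothetical stand-in for corev1.Toleration.
type Toleration struct {
	Key, Operator, Value, Effect string
}

// JobsSpec stands in for spec.backups.pgbackrest.jobs (only tolerations shown).
type JobsSpec struct {
	Tolerations []Toleration
}

// RestoreSpec stands in for spec.backups.pgbackrest.restore (PostgresClusterDataSource).
type RestoreSpec struct {
	Enabled     bool
	RepoName    string
	Tolerations []Toleration
}

// PGBackRestSpec stands in for spec.backups.pgbackrest.
type PGBackRestSpec struct {
	Jobs    *JobsSpec
	Restore *RestoreSpec
}

// backupJobTolerations models scheduled/manual backup jobs: they read jobs.tolerations.
func backupJobTolerations(s PGBackRestSpec) []Toleration {
	if s.Jobs == nil {
		return nil
	}
	return s.Jobs.Tolerations
}

// restoreJobTolerations models the restore job: it reads restore.tolerations only
// and never falls back to jobs.tolerations.
func restoreJobTolerations(s PGBackRestSpec) []Toleration {
	if s.Restore == nil {
		return nil
	}
	return s.Restore.Tolerations
}

func main() {
	spec := PGBackRestSpec{
		Jobs: &JobsSpec{Tolerations: []Toleration{
			{Key: "dedicated", Operator: "Equal", Value: "postgres", Effect: "NoSchedule"},
		}},
		// Roughly what Start leaves behind: enabled and repoName set, tolerations never written.
		Restore: &RestoreSpec{Enabled: true, RepoName: "repo1"},
	}
	fmt.Println("backup job tolerations:", len(backupJobTolerations(spec)))
	fmt.Println("restore job tolerations:", len(restoreJobTolerations(spec)))
}
```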
Steps to reproduce
- Create a PerconaPGCluster on a Kubernetes cluster where all nodes have a NoSchedule taint (common on managed Kubernetes: AKS, EKS, GKE node pools with taints).
- Ensure spec.backups.pgbackrest.jobs.tolerations is set correctly (backup jobs schedule fine, confirming the toleration values are correct).
- Create a PerconaPGRestore CR targeting the cluster.
- Observe: the restore pod is created without any of the configured tolerations (only the default not-ready/unreachable tolerations that Kubernetes adds automatically) and fails to schedule:

```
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Warning  FailedScheduling  default-scheduler  0/N nodes are available: N node(s) had untolerated taint(s).
```

Again: spec.backups.pgbackrest.jobs.tolerations is set correctly and backup jobs schedule without issue; the problem is specific to the restore code path.
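Why the pod is stuck can be checked with a simplified version of the scheduler's taint check. The sketch below reimplements the basic matching rules of corev1.Toleration.ToleratesTaint on hypothetical local types (so it needs no Kubernetes dependencies) and treats only NoSchedule and NoExecute as blocking, as the scheduler's taint filter does. The dedicated=postgres taint stands in for whatever taint your node pool actually carries.

```go
package main

import "fmt"

// Taint and Toleration are hypothetical stand-ins for corev1.Taint / corev1.Toleration.
type Taint struct{ Key, Value, Effect string }
type Toleration struct{ Key, Operator, Value, Effect string }

// tolerates mirrors the basic ToleratesTaint rules: the effect must match unless
// empty, the key must match unless empty, Exists ignores the value, and
// Equal (or an empty operator) compares it.
func tolerates(tol Toleration, taint Taint) bool {
	if tol.Effect != "" && tol.Effect != taint.Effect {
		return false
	}
	if tol.Key != "" && tol.Key != taint.Key {
		return false
	}
	switch tol.Operator {
	case "", "Equal":
		return tol.Value == taint.Value
	case "Exists":
		return true
	default:
		return false
	}
}

// untolerated returns the hard taints (NoSchedule, NoExecute) that no toleration covers.
// PreferNoSchedule is only a soft preference, so it never blocks scheduling.
func untolerated(taints []Taint, tols []Toleration) []Taint {
	var out []Taint
	for _, taint := range taints {
		if taint.Effect == "PreferNoSchedule" {
			continue
		}
		covered := false
		for _, tol := range tols {
			if tolerates(tol, taint) {
				covered = true
				break
			}
		}
		if !covered {
			out = append(out, taint)
		}
	}
	return out
}

func main() {
	// Hypothetical node-pool taint; substitute your own.
	node := []Taint{{Key: "dedicated", Value: "postgres", Effect: "NoSchedule"}}

	// The only tolerations the restore pod ends up with today (the defaults shown above).
	restorePod := []Toleration{
		{Key: "node.kubernetes.io/not-ready", Operator: "Exists", Effect: "NoExecute"},
		{Key: "node.kubernetes.io/unreachable", Operator: "Exists", Effect: "NoExecute"},
	}
	// A backup pod additionally carries the toleration from jobs.tolerations.
	backupPod := []Toleration{
		{Key: "node.kubernetes.io/not-ready", Operator: "Exists", Effect: "NoExecute"},
		{Key: "node.kubernetes.io/unreachable", Operator: "Exists", Effect: "NoExecute"},
		{Key: "dedicated", Operator: "Equal", Value: "postgres", Effect: "NoSchedule"},
	}

	fmt.Println("restore pod untolerated taints:", len(untolerated(node, restorePod)))
	fmt.Println("backup pod untolerated taints:", len(untolerated(node, backupPod)))
}
```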
Expected behaviour
The restore job pod should be schedulable under the same conditions as backup job pods.
Start should propagate tolerations into spec.backups.pgbackrest.restore.tolerations
so they reach the restore job via generateRestoreJobIntent.
Suggested fix
In Start (v3: percona/controller/pgrestore/utils/pgbackrest.go), inherit tolerations
from jobs.tolerations when none are explicitly set on the restore stanza, consistent
with how jobs.tolerations is the single place users configure pgBackRest pod scheduling:
```go
r.pgCluster.Spec.Backups.PGBackRest.Restore.Enabled = new(true)
r.pgCluster.Spec.Backups.PGBackRest.Restore.RepoName = ptr.Deref(r.pgRestore.Spec.RepoName, "")
r.pgCluster.Spec.Backups.PGBackRest.Restore.Options = r.pgRestore.Spec.Options
r.pgCluster.Spec.Backups.PGBackRest.Restore.Env = r.pgRestore.Spec.ContainerOptions.Env
r.pgCluster.Spec.Backups.PGBackRest.Restore.EnvFrom = r.pgRestore.Spec.ContainerOptions.EnvFrom

// Proposed addition: if no tolerations were set explicitly on the restore stanza,
// fall back to the ones configured for pgBackRest backup jobs.
if r.pgCluster.Spec.Backups.PGBackRest.Restore.Tolerations == nil &&
	r.pgCluster.Spec.Backups.PGBackRest.Jobs != nil {
	r.pgCluster.Spec.Backups.PGBackRest.Restore.Tolerations =
		r.pgCluster.Spec.Backups.PGBackRest.Jobs.Tolerations
}
```
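The same fallback can be tried in isolation with the sketch below, which uses hypothetical stand-in types in place of the real PerconaPGCluster spec and assumes a nil restore stanza should simply be allocated. One design choice worth considering is copying the slice with slices.Clone rather than assigning it, so the restore and jobs stanzas do not share a backing array. Because the check is on nil rather than on length, an explicitly empty restore.tolerations list is left untouched, at least in this model, which would give users a way to opt out of the inheritance.

```go
package main

import (
	"fmt"
	"slices"
)

// Hypothetical stand-ins for the relevant parts of the cluster spec.
type Toleration struct{ Key, Operator, Value, Effect string }
type JobsSpec struct{ Tolerations []Toleration }
type RestoreSpec struct{ Tolerations []Toleration }
type PGBackRestSpec struct {
	Jobs    *JobsSpec
	Restore *RestoreSpec
}

// inheritRestoreTolerations copies jobs.tolerations into restore.tolerations
// when the latter was never set (nil). An empty but non-nil list is respected.
func inheritRestoreTolerations(spec *PGBackRestSpec) {
	if spec.Restore == nil {
		spec.Restore = &RestoreSpec{} // assumption: allocate a missing restore stanza
	}
	if spec.Restore.Tolerations == nil && spec.Jobs != nil {
		// Clone so later edits to one stanza do not silently alter the other.
		spec.Restore.Tolerations = slices.Clone(spec.Jobs.Tolerations)
	}
}

func main() {
	jobs := &JobsSpec{Tolerations: []Toleration{
		{Key: "dedicated", Operator: "Equal", Value: "postgres", Effect: "NoSchedule"}, // hypothetical
	}}

	spec := &PGBackRestSpec{Jobs: jobs}
	inheritRestoreTolerations(spec)
	fmt.Println("inherited:", len(spec.Restore.Tolerations))

	// An explicitly empty restore list is left alone.
	explicit := &PGBackRestSpec{Jobs: jobs, Restore: &RestoreSpec{Tolerations: []Toleration{}}}
	inheritRestoreTolerations(explicit)
	fmt.Println("explicit empty kept:", len(explicit.Restore.Tolerations))
}
```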
Alternatively, and more in the spirit of the ContainerOptions pattern already present
on PerconaPGRestore in v3, tolerations (and affinity, priorityClassName) could be
added to ContainerOptions and forwarded the same way Env/EnvFrom are. This would be
cleaner API design, keeping scheduling constraints on the PerconaPGRestore CR itself
rather than requiring users to pre-configure the PerconaPGCluster.
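The ContainerOptions alternative could look roughly like the sketch below. Every type and field here is hypothetical: the real ContainerOptions currently carries Env/EnvFrom, which are left out for brevity, and affinity is omitted as well. Unlike the unconditional Env/EnvFrom assignment, this sketch only forwards a value when the restore CR actually sets it, so a restore stanza pre-seeded on the cluster is not clobbered by an empty field.

```go
package main

import "fmt"

type Toleration struct{ Key, Operator, Value, Effect string }

// ContainerOptions is a hypothetical extension of PerconaPGRestore's
// ContainerOptions with scheduling fields (Env/EnvFrom omitted).
type ContainerOptions struct {
	Tolerations       []Toleration
	PriorityClassName *string
}

// RestoreCRSpec stands in for PerconaPGRestore.Spec.
type RestoreCRSpec struct {
	ContainerOptions *ContainerOptions
}

// RestoreStanza stands in for spec.backups.pgbackrest.restore on the cluster.
type RestoreStanza struct {
	Tolerations       []Toleration
	PriorityClassName *string
}

// applyRestoreSchedulingOptions forwards scheduling fields from the restore CR
// to the cluster's restore stanza, but only those the CR actually sets.
func applyRestoreSchedulingOptions(dst *RestoreStanza, src RestoreCRSpec) {
	opts := src.ContainerOptions
	if opts == nil {
		return
	}
	if opts.Tolerations != nil {
		dst.Tolerations = opts.Tolerations
	}
	if opts.PriorityClassName != nil {
		dst.PriorityClassName = opts.PriorityClassName
	}
}

func main() {
	cluster := &RestoreStanza{} // nothing pre-seeded on the cluster
	pc := "restore-high-priority" // hypothetical PriorityClass name
	cr := RestoreCRSpec{ContainerOptions: &ContainerOptions{
		Tolerations:       []Toleration{{Key: "dedicated", Operator: "Equal", Value: "postgres", Effect: "NoSchedule"}},
		PriorityClassName: &pc,
	}}
	applyRestoreSchedulingOptions(cluster, cr)
	fmt.Println(len(cluster.Tolerations), *cluster.PriorityClassName)
}
```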
Workaround
Pre-seed spec.backups.pgbackrest.restore on the PerconaPGCluster with the required
tolerations before triggering the restore. Since Start uses a merge patch, pre-existing
fields not written by Start are preserved. Because Start sets the enabled field on the live
object, it conflicts with GitOps tooling that reconciles the declared spec (e.g. ArgoCD with
selfHeal); this requires an ignoreDifferences exemption on
.spec.backups.pgbackrest.restore.enabled.
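The workaround relies on merge-patch semantics: keys absent from the patch are kept, null deletes a key, and arrays are replaced wholesale. The sketch below implements the RFC 7386 JSON Merge Patch algorithm with encoding/json to show that a pre-seeded restore.tolerations survives a patch touching only enabled, repoName and options. The patch body is an illustrative approximation of what Start sends, not a capture of the real request, and the taint values are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mergePatch applies patch to target following RFC 7386 (JSON Merge Patch).
// Objects are merged key by key, null removes a key, and any non-object
// value (including arrays) replaces the target outright.
func mergePatch(target, patch any) any {
	p, ok := patch.(map[string]any)
	if !ok {
		return patch
	}
	t, ok := target.(map[string]any)
	if !ok {
		t = map[string]any{}
	}
	for k, v := range p {
		if v == nil {
			delete(t, k)
			continue
		}
		t[k] = mergePatch(t[k], v)
	}
	return t
}

// applyJSON decodes both documents, merges them and returns the resulting object.
func applyJSON(target, patch string) (map[string]any, error) {
	var t, p any
	if err := json.Unmarshal([]byte(target), &t); err != nil {
		return nil, err
	}
	if err := json.Unmarshal([]byte(patch), &p); err != nil {
		return nil, err
	}
	out, ok := mergePatch(t, p).(map[string]any)
	if !ok {
		return nil, fmt.Errorf("merge result is not a JSON object")
	}
	return out, nil
}

func main() {
	// Cluster spec with restore.tolerations pre-seeded (the workaround).
	cluster := `{"spec":{"backups":{"pgbackrest":{"restore":{
		"tolerations":[{"key":"dedicated","operator":"Equal","value":"postgres","effect":"NoSchedule"}]}}}}}`
	// Roughly the fields Start writes (illustrative values).
	patch := `{"spec":{"backups":{"pgbackrest":{"restore":{
		"enabled":true,"repoName":"repo1","options":["--type=immediate"]}}}}}`

	out, err := applyJSON(cluster, patch)
	if err != nil {
		panic(err)
	}
	b, err := json.MarshalIndent(out, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b)) // restore now has enabled, options, repoName and the original tolerations
}
```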