PerconaPGRestore: restore job pod does not inherit tolerations on tainted nodes #1633

Just for transparency's sake: The following was (mostly) written by an LLM after a multi-hour debug session to get restores to work in my AKS environment. I left the suggested fix(es) in but you can ignore them if they don't align with your vision for this project. Please forgive me. Thanks! :)

## Summary
When triggering an in-place restore via `PerconaPGRestore` on a cluster where all nodes
carry `NoSchedule` taints, the restore job pod is created without any tolerations and
cannot be scheduled. The restore hangs indefinitely in `RestoreStarting`/`RestoreRunning`
state with `FailedScheduling` events on the pod.

## Affected versions

Confirmed on v2.8.2, still present on `main` (v3.0.0).

## Root cause

The `Start` function in `percona/controller/pgrestore/utils/pgbackrest.go` (v3) /
`percona/controller/pgrestore/controller.go` (v2) patches the `PerconaPGCluster` with
`enabled`, `repoName`, `options` (and `env`/`envFrom` on v3), but never with tolerations.

The restore job is created by `generateRestoreJobIntent` in
`internal/controller/postgrescluster/pgbackrest.go`, which sets tolerations exclusively
from `spec.backups.pgbackrest.restore.tolerations` (`PostgresClusterDataSource.Tolerations`).
This is a separate field from `spec.backups.pgbackrest.jobs.tolerations`, which only covers
scheduled and manual backup jobs.

Because `Start` never writes to `restore.tolerations`, the field stays nil, and the restore
job pod is created with no tolerations, making it unschedulable on any tainted node.
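To make the field split concrete, here is a heavily simplified model of the two stanzas. The types and the `restoreJobTolerations` function are hypothetical stand-ins, not the operator's real API, and the `dedicated=postgres:NoSchedule` toleration is an invented example. The sketch only shows that the restore job reads `restore.tolerations` and never falls back to `jobs.tolerations`.

```go
package main

import "fmt"

// Toleration is a hypothetical, trimmed-down stand-in for corev1.Toleration.
type Toleration struct {
	Key, Operator, Value, Effect string
}

// JobsSpec models spec.backups.pgbackrest.jobs (scheduled and manual backups).
type JobsSpec struct {
	Tolerations []Toleration
}

// RestoreSpec models spec.backups.pgbackrest.restore (PostgresClusterDataSource).
type RestoreSpec struct {
	Tolerations []Toleration
}

// PGBackRestSpec holds both stanzas side by side; they are independent fields.
type PGBackRestSpec struct {
	Jobs    *JobsSpec
	Restore *RestoreSpec
}

// restoreJobTolerations mimics how the restore job picks its tolerations:
// only the restore stanza is consulted, never jobs.tolerations.
func restoreJobTolerations(spec PGBackRestSpec) []Toleration {
	if spec.Restore == nil {
		return nil
	}
	return spec.Restore.Tolerations
}

func main() {
	spec := PGBackRestSpec{
		Jobs: &JobsSpec{Tolerations: []Toleration{
			{Key: "dedicated", Operator: "Equal", Value: "postgres", Effect: "NoSchedule"},
		}},
		Restore: &RestoreSpec{}, // Start never fills in Tolerations here
	}
	fmt.Println(len(restoreJobTolerations(spec))) // 0: backup tolerations are ignored
}
```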

## Steps to reproduce

1. Create a `PerconaPGCluster` on a Kubernetes cluster where all nodes have a `NoSchedule`
   taint (common on managed Kubernetes: AKS, EKS, GKE node pools with taints).
2. Ensure `spec.backups.pgbackrest.jobs.tolerations` is set correctly (backup jobs schedule
   fine, confirming the toleration values are correct).
3. Create a `PerconaPGRestore` CR targeting the cluster.
4. Observe that the restore pod is created with only the default `not-ready`/`unreachable` tolerations Kubernetes adds to every pod, and that it fails to schedule:

```
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
```

```
Warning  FailedScheduling  default-scheduler  0/N nodes are available:
         N node(s) had untolerated taint(s).
```

Again: `spec.backups.pgbackrest.jobs.tolerations` is set correctly and backup jobs schedule
without issue; the problem is specific to the restore code path.
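Why the scheduler rejects the pod: the two default tolerations both have effect `NoExecute`, so neither can match a `NoSchedule` taint. The sketch below re-implements the core Kubernetes toleration-matching rule in simplified form (it ignores `tolerationSeconds` and everything else the scheduler checks). The `dedicated=postgres:NoSchedule` taint is a hypothetical example, and `tolerates`/`schedulable` are illustrative helpers, not Kubernetes functions.

```go
package main

import "fmt"

type Taint struct {
	Key, Value, Effect string
}

type Toleration struct {
	Key, Operator, Value, Effect string
}

// tolerates reports whether a single toleration matches a taint,
// following the Kubernetes matching rules in simplified form.
func tolerates(tol Toleration, taint Taint) bool {
	// An empty effect matches every effect; otherwise effects must be equal.
	if tol.Effect != "" && tol.Effect != taint.Effect {
		return false
	}
	// An empty key (used with Exists) matches every key.
	if tol.Key != "" && tol.Key != taint.Key {
		return false
	}
	switch tol.Operator {
	case "Exists":
		return true
	case "", "Equal":
		return tol.Value == taint.Value
	default:
		return false
	}
}

// schedulable reports whether every NoSchedule/NoExecute taint on a node
// is tolerated by at least one of the pod's tolerations.
func schedulable(tols []Toleration, taints []Taint) bool {
	for _, taint := range taints {
		if taint.Effect == "PreferNoSchedule" {
			continue // soft taint: never blocks scheduling on its own
		}
		ok := false
		for _, tol := range tols {
			if tolerates(tol, taint) {
				ok = true
				break
			}
		}
		if !ok {
			return false
		}
	}
	return true
}

func main() {
	nodeTaints := []Taint{{Key: "dedicated", Value: "postgres", Effect: "NoSchedule"}}
	// The defaults seen on the restore pod.
	restorePod := []Toleration{
		{Key: "node.kubernetes.io/not-ready", Operator: "Exists", Effect: "NoExecute"},
		{Key: "node.kubernetes.io/unreachable", Operator: "Exists", Effect: "NoExecute"},
	}
	// A backup pod additionally carries the jobs.tolerations entry.
	backupPod := append([]Toleration{}, restorePod...)
	backupPod = append(backupPod, Toleration{Key: "dedicated", Operator: "Equal", Value: "postgres", Effect: "NoSchedule"})

	fmt.Println(schedulable(restorePod, nodeTaints)) // false
	fmt.Println(schedulable(backupPod, nodeTaints))  // true
}
```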

## Expected behaviour

The restore job pod should be schedulable under the same conditions as backup job pods.
`Start` should propagate tolerations into `spec.backups.pgbackrest.restore.tolerations`
so they reach the restore job via `generateRestoreJobIntent`.

## Suggested fix

In `Start` (v3: `percona/controller/pgrestore/utils/pgbackrest.go`), inherit tolerations
from `jobs.tolerations` when none are explicitly set on the restore stanza, consistent
with how `jobs.tolerations` is the single place users configure pgBackRest pod scheduling:

```go
r.pgCluster.Spec.Backups.PGBackRest.Restore.Enabled = new(true)
r.pgCluster.Spec.Backups.PGBackRest.Restore.RepoName = ptr.Deref(r.pgRestore.Spec.RepoName, "")
r.pgCluster.Spec.Backups.PGBackRest.Restore.Options = r.pgRestore.Spec.Options
r.pgCluster.Spec.Backups.PGBackRest.Restore.Env = r.pgRestore.Spec.ContainerOptions.Env
r.pgCluster.Spec.Backups.PGBackRest.Restore.EnvFrom = r.pgRestore.Spec.ContainerOptions.EnvFrom
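// Added: fall back to the backup-job tolerations when the restore stanza
// has none of its own, so the restore pod can land on the same tainted nodes.
if r.pgCluster.Spec.Backups.PGBackRest.Restore.Tolerations == nil &&
    r.pgCluster.Spec.Backups.PGBackRest.Jobs != nil {
    r.pgCluster.Spec.Backups.PGBackRest.Restore.Tolerations =
        r.pgCluster.Spec.Backups.PGBackRest.Jobs.Tolerations
}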
```
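The same guard can be pulled out into a standalone, testable function. This is a sketch under assumptions: `inheritRestoreTolerations` and the trimmed-down types are hypothetical, and copying the slice (rather than aliasing `jobs.tolerations`) is a defensive choice of this sketch, not something the snippet above requires.

```go
package main

import "fmt"

type Toleration struct {
	Key, Operator, Value, Effect string
}

type JobsSpec struct{ Tolerations []Toleration }
type RestoreSpec struct{ Tolerations []Toleration }

type PGBackRestSpec struct {
	Jobs    *JobsSpec
	Restore *RestoreSpec
}

// inheritRestoreTolerations fills restore.tolerations from jobs.tolerations,
// but only when the user has not set restore tolerations explicitly.
func inheritRestoreTolerations(spec *PGBackRestSpec) {
	if spec.Restore == nil {
		spec.Restore = &RestoreSpec{}
	}
	if spec.Restore.Tolerations != nil || spec.Jobs == nil {
		return // explicit restore tolerations win; nothing to inherit otherwise
	}
	// Copy so later edits to one stanza do not silently change the other.
	spec.Restore.Tolerations = append([]Toleration(nil), spec.Jobs.Tolerations...)
}

func main() {
	spec := &PGBackRestSpec{
		Jobs: &JobsSpec{Tolerations: []Toleration{
			{Key: "dedicated", Operator: "Equal", Value: "postgres", Effect: "NoSchedule"},
		}},
	}
	inheritRestoreTolerations(spec)
	fmt.Println(len(spec.Restore.Tolerations)) // 1
}
```

Explicitly configured restore tolerations are left untouched, so users who already pre-seed `restore.tolerations` (see the workaround) keep full control.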

Alternatively, and more in the spirit of the `ContainerOptions` pattern already present
on `PerconaPGRestore` in v3, tolerations (and `affinity`, `priorityClassName`) could be
added to `ContainerOptions` and forwarded the same way `Env`/`EnvFrom` are. This would be
cleaner API design, keeping scheduling constraints on the `PerconaPGRestore` CR itself
rather than requiring users to pre-configure the `PerconaPGCluster`.
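A rough sketch of what that alternative could look like. The `Tolerations` and `PriorityClassName` fields on `ContainerOptions` are hypothetical (they do not exist today), `Env` is simplified to `[]string`, and `affinity` is omitted because it would follow the same pattern. `forward` mirrors the unconditional assignment `Start` already uses for `Env`/`EnvFrom`.

```go
package main

import "fmt"

type Toleration struct {
	Key, Operator, Value, Effect string
}

// ContainerOptions sketches the PerconaPGRestore stanza with hypothetical
// scheduling fields added next to the existing Env.
type ContainerOptions struct {
	Env               []string     // stand-in for []corev1.EnvVar
	Tolerations       []Toleration // hypothetical new field
	PriorityClassName string       // hypothetical new field
}

// RestoreStanza models the cluster's spec.backups.pgbackrest.restore.
type RestoreStanza struct {
	Env               []string
	Tolerations       []Toleration
	PriorityClassName string
}

// forward copies the restore CR's container options onto the cluster's
// restore stanza, the same way Env/EnvFrom are assigned in Start.
func forward(opts *ContainerOptions, dst *RestoreStanza) {
	if opts == nil {
		return
	}
	dst.Env = opts.Env
	dst.Tolerations = opts.Tolerations
	dst.PriorityClassName = opts.PriorityClassName
}

func main() {
	var stanza RestoreStanza
	forward(&ContainerOptions{
		Tolerations:       []Toleration{{Key: "dedicated", Operator: "Equal", Value: "postgres", Effect: "NoSchedule"}},
		PriorityClassName: "db-critical", // hypothetical PriorityClass name
	}, &stanza)
	fmt.Println(len(stanza.Tolerations), stanza.PriorityClassName)
}
```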

## Workaround

Pre-seed `spec.backups.pgbackrest.restore` on the `PerconaPGCluster` with the required
tolerations before triggering the restore. Since `Start` uses a merge patch, pre-existing
fields not written by `Start` are preserved. The `enabled` field conflict with GitOps
tooling (ArgoCD `selfHeal`) requires an `ignoreDifferences` exemption on
`.spec.backups.pgbackrest.restore.enabled`.
