feat(restore): managed restore replicas — control plane for PGRO (PR1)#293

Merged
passcod merged 7 commits into main from pgro-restore-verification
Jun 30, 2026

@passcod passcod commented Jun 30, 2026

Member

🤖 PR1 of the PGRO restore-verification integration (TAM-6877): the control + access half of canopy-as-restore-control-plane.

Canopy becomes the source of truth for which replicas an external restore consumer (PGRO) should maintain. An operator declares replicas; the consumer registers the intents it can satisfy, fetches its worklist, and obtains read-only credentials per group — it no longer holds long-lived AWS keys or decides what to restore.

Spec: .workhorse/specs/public-server/restore-replicas.md (id RST). Cross-repo handoff + the control-model inversion PGRO signed off on: docs/plans/pgro-restore-replicas-canopy-response.md.

What's here

  • New backup-restore device role — read-only by construction: it can't reach the ServerDevice-gated /backup-credentials, and /restore-credentials only ever issues the read-only session policy. Role is plain TEXT (no migration); operator promotes a device to it like a releaser.
  • restore_replicas + restore_consumer_capabilities tables, RestoreIntent open enum (verify/analytics/disaster-recovery/custom). A declaration is both the work item and the authorization — no separate grant object.
  • Public endpoints (consumer-facing):
      • POST /restore-capabilities: register supported intents.
      • GET /restore-worklist: enabled declarations expanded per live server and filtered by capability. Each entry carries the latest snapshot id canopy knows plus repo coords, and entries are deduplicated so a server-specific declaration takes precedence over a group-wide one.
      • POST /restore-credentials: read-only STS credentials plus the repo password, authorized by a covering declaration.
  • Private admin API + operator UI (/restore-replicas): declare/list/toggle/delete replicas, a consumers panel showing each consumer's registered capabilities, and gap surfacing when a declaration's intent isn't currently supported.
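
To make the "open enum" and the gap surfacing above concrete, here is a minimal sketch. It is not the code in this PR: the `parse` and `has_gap` names are hypothetical, and it assumes `custom` is a catch-all that preserves any unrecognised intent string.

```rust
use std::collections::HashSet;
use std::fmt;

/// Hypothetical shape of an "open" intent enum: the known variants plus a
/// catch-all that keeps any string it does not recognise (assumption).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum RestoreIntent {
    Verify,
    Analytics,
    DisasterRecovery,
    Custom(String),
}

impl RestoreIntent {
    /// Parsing never fails: unknown values are kept as `Custom`.
    pub fn parse(s: &str) -> Self {
        match s {
            "verify" => Self::Verify,
            "analytics" => Self::Analytics,
            "disaster-recovery" => Self::DisasterRecovery,
            other => Self::Custom(other.to_owned()),
        }
    }
}

impl fmt::Display for RestoreIntent {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Verify => f.write_str("verify"),
            Self::Analytics => f.write_str("analytics"),
            Self::DisasterRecovery => f.write_str("disaster-recovery"),
            Self::Custom(s) => f.write_str(s),
        }
    }
}

/// A declaration has a gap when its intent is not among the intents the
/// consumer currently has registered.
pub fn has_gap(declared: &RestoreIntent, registered: &HashSet<RestoreIntent>) -> bool {
    !registered.contains(declared)
}
```

With only `verify` registered, a declaration for `analytics` would report a gap, which is what the operator UI flags instead of the consumer failing at restore time.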

Design notes carried from review

  • Per-server, not per-group: credentials are group-scoped (one kopia repo per group bucket), but targeting and the worklist are per-server. Canopy supplies the snapshot id (latest successful backup_runs per (server, type)), so the consumer never lists the repo.
  • Capability registration over reactive failure: canopy never dispatches an intent the consumer hasn't advertised, so an unsupported intent is a surfaced configuration gap, never a paging restore-health incident.
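
The worklist rules in these notes can be sketched roughly as follows. Everything here is illustrative: the struct and function names are hypothetical, intents are plain strings for brevity, and two points are assumptions rather than confirmed behaviour: the dedup key is taken to be (server, backup type), and capability filtering is applied before dedup.

```rust
use std::collections::{BTreeMap, HashMap, HashSet};

/// Hypothetical, trimmed-down declaration row.
#[derive(Debug, Clone)]
pub struct Declaration {
    pub group: String,
    /// `None` means the declaration is group-wide.
    pub server: Option<String>,
    pub backup_type: String,
    pub intent: String,
    pub enabled: bool,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct WorklistEntry {
    pub server: String,
    pub backup_type: String,
    pub intent: String,
    /// Latest snapshot id known for (server, type), if any.
    pub snapshot_id: Option<String>,
}

pub fn build_worklist(
    declarations: &[Declaration],
    live_servers: &HashMap<String, Vec<String>>, // group -> live server ids
    capabilities: &HashSet<String>,              // intents the consumer registered
    latest_snapshot: &HashMap<(String, String), String>, // (server, type) -> id
) -> Vec<WorklistEntry> {
    // Key: (server, backup type). Value: (is server-specific, entry).
    let mut chosen: BTreeMap<(String, String), (bool, WorklistEntry)> = BTreeMap::new();

    for decl in declarations {
        // Never dispatch disabled declarations or unadvertised intents.
        if !decl.enabled || !capabilities.contains(&decl.intent) {
            continue;
        }
        let live = match live_servers.get(&decl.group) {
            Some(servers) => servers,
            None => continue,
        };
        let specific = decl.server.is_some();
        for server in live {
            // A server-specific declaration only targets its own server.
            if let Some(target) = &decl.server {
                if target != server {
                    continue;
                }
            }
            let key = (server.clone(), decl.backup_type.clone());
            // Server-specific beats group-wide; otherwise the first one wins.
            let replace = match chosen.get(&key) {
                None => true,
                Some((existing_specific, _)) => specific && !*existing_specific,
            };
            if replace {
                let entry = WorklistEntry {
                    server: server.clone(),
                    backup_type: decl.backup_type.clone(),
                    intent: decl.intent.clone(),
                    snapshot_id: latest_snapshot.get(&key).cloned(),
                };
                chosen.insert(key, (specific, entry));
            }
        }
    }

    // BTreeMap iteration gives a stable order by (server, backup type).
    chosen.into_values().map(|(_, entry)| entry).collect()
}
```

Because the snapshot id travels with each entry, a consumer working from this list has no reason to enumerate the repository itself.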

Tests

  • database::restore — model CRUD, duplicate-scope 409, authorization, capability replace semantics.
  • public-server::restore — capability-filtered worklist, per-server expansion, dedup, 403/502 credential paths, non-consumer-role rejection.
  • private-web/e2e/restore-replicas.spec.ts — operator UI: empty state, gap flagging, consumers panel, delete, enable toggle, declare dialog.
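
For readers of the credential-path tests, the decision they exercise can be sketched like this. It is a simplified illustration, not the handler in this PR: the type and function names are hypothetical, the per-consumer and role checks are omitted, and the match on (group, backup type) plus the 200 success status are assumptions.

```rust
/// Hypothetical, trimmed-down view of a declared replica.
#[derive(Debug, Clone)]
pub struct ReplicaDeclaration {
    pub group: String,
    pub backup_type: String,
    pub enabled: bool,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CredentialOutcome {
    /// Read-only credentials would be issued.
    Issued,
    /// No enabled declaration covers the request.
    Forbidden,
    /// Authorized, but STS is not configured on the server side.
    BadGateway,
}

impl CredentialOutcome {
    /// HTTP status for each outcome (200 for success is an assumption).
    pub fn status(self) -> u16 {
        match self {
            Self::Issued => 200,
            Self::Forbidden => 403,
            Self::BadGateway => 502,
        }
    }
}

/// The declaration doubles as the authorization: a request is allowed only
/// if some enabled declaration covers the same group and backup type.
pub fn authorizes(declarations: &[ReplicaDeclaration], group: &str, backup_type: &str) -> bool {
    declarations
        .iter()
        .any(|d| d.enabled && d.group == group && d.backup_type == backup_type)
}

pub fn credential_outcome(
    declarations: &[ReplicaDeclaration],
    group: &str,
    backup_type: &str,
    sts_configured: bool,
) -> CredentialOutcome {
    // Authorization is checked first, so an unauthorized caller learns
    // nothing about whether STS is configured.
    if !authorizes(declarations, group, backup_type) {
        return CredentialOutcome::Forbidden;
    }
    if !sts_configured {
        return CredentialOutcome::BadGateway;
    }
    CredentialOutcome::Issued
}
```

Checking authorization before configuration keeps the 502 path reachable only for requests that a declaration already covers, which matches the "authorized but STS unconfigured" case named in the tests.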

Follow-up (PR2)

Restore-health ingest: backup_restore_checks + POST /restore-verification, group-level per-server alert routing + recovery, the overdue-freshness sweep, and restore-health in the operator UI. Wire shapes are frozen at this point for the bestool Appendix A handoff.

passcod and others added 6 commits June 30, 2026 13:48
…opy response

- Copy pgro's restore-verification handoff into docs/plans/.
- Add .workhorse/specs/public-server/restore-replicas.md (RST): canopy as
  restore control plane — operator-declared replicas, worklist-driven
  executor, per-server targeting + restore-health.
- Add docs/plans/pgro-restore-replicas-canopy-response.md: conformance
  verdict + the control-model inversion, for pgro sign-off.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…tration

Consumer registers supported intents on start/change; canopy persists
them, constrains the declaration UX, dispatches only matching worklist
entries, and surfaces capability-shrink gaps to operators — instead of
the consumer reactively reporting unsupported intents (which conflated a
capability mismatch with an unrestorable-backup page).

Resolves pgro's post-sign-off open question. Adds POST /restore-capabilities
to the PR1 surface; /restore-verification outcome stays success/failure.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ma & models

- Add the backup-restore device role (commons-types enum + auth extractor
  macro + openapi security scheme + drift test + UI role picker/colour).
- New RestoreIntent open enum (verify/analytics/disaster-recovery/custom),
  mirroring BackupType.
- Migration restore_replicas: restore_replicas (declared replicas) +
  restore_consumer_capabilities tables.
- database::restore models + CRUD (RestoreReplica, RestoreConsumerCapability,
  capability register-as-insert-then-prune, creds authz check) and
  Server::list_live_in_group for worklist expansion.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…oints

Public-server (backup-restore role): POST /restore-capabilities (register
supported intents), GET /restore-worklist (enabled declarations expanded
per live server, capability-filtered, with the latest snapshot + repo
coords), POST /restore-credentials (read-only STS + repo password, authz
via declaration).

Private-server admin API (crate::fns::restore_replicas): list/for_group/
consumers/create/update/delete, with per-declaration gap computation.

Adds Device::list_by_role; regenerates public + private openapi and
api-types.ts (DeviceRole gains backup-restore).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
New /restore-replicas page: declarations table (scope, intent with gap
chip, enable toggle, delete), consumers panel showing each backup-restore
device's registered capabilities, and a declare dialog with
consumer/group/server/type/intent pickers (intent options annotate
unsupported choices). Nav entry + route. Adds backup-restore to the device
trust picker (earlier commit).

e2e: seedRestoreReplica + seedRestoreConsumerCapability helpers, restore
tables added to resetSeededTables, and restore-replicas.spec.ts covering
empty state, gap flagging, consumers panel, delete, enable toggle, and the
declare dialog.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
database::restore: CRUD roundtrip, duplicate-scope 409 (server vs group
scope separate), update/delete, authorizes (enabled/group/type/disabled),
capability register replace semantics.

public-server::restore: capability-filtered worklist, per-server expansion
of a group-wide declaration, server-specific-over-group-wide dedup,
empty-without-capabilities, restore-credentials 403 (no declaration) / 502
(authorized but STS unconfigured), and non-consumer-role rejection.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@passcod passcod added this pull request to the merge queue Jun 30, 2026
Merged via the queue into main with commit 42ad7cc Jun 30, 2026
7 checks passed
@passcod passcod deleted the pgro-restore-verification branch June 30, 2026 05:27
