diff --git a/docs/deployment/deployment-pipeline.md b/docs/deployment/deployment-pipeline.md
index 501e113..de7d7dc 100644
--- a/docs/deployment/deployment-pipeline.md
+++ b/docs/deployment/deployment-pipeline.md
@@ -1,7 +1,7 @@
 # Deployment Pipeline
 
 > **Scope**: End-to-end flow from a GitHub tag or main-branch push through Azure DevOps pipelines to an Arc Run Command executing on hospital gateway VMs.
-> **Related docs**: [Windows Service Deploy](./windows-service-deploy.md) | [Onboard Hospital VM](./runbooks/onboard-hospital-vm.md) | [Rollback Runbook](./runbooks/rollback.md)
+> **Related docs**: [Windows Service Deploy](./windows-service-deploy.md) | [Onboard Hospital VM](./runbooks/onboard-hospital-vm.md) | [Promote VM Pre-Prod → Prod](./runbooks/promote-vm-preprod-to-prod.md) | [Rollback Runbook](./runbooks/rollback.md)
 
 ---
 
diff --git a/docs/deployment/runbooks/promote-vm-preprod-to-prod.md b/docs/deployment/runbooks/promote-vm-preprod-to-prod.md
new file mode 100644
index 0000000..c99d125
--- /dev/null
+++ b/docs/deployment/runbooks/promote-vm-preprod-to-prod.md
@@ -0,0 +1,136 @@
+# Promote a Gateway VM from Pre-Prod to Prod
+
+> **When to use**: A hospital VM currently running the **pre-prod** gateway needs to become the **prod** gateway (same infrastructure, different environment).
+> **Related**: [Onboard Hospital VM](./onboard-hospital-vm.md) | [Cleanup](./cleanup.md) | [Deployment Pipeline](../deployment-pipeline.md)
+
+---
+
+## Why this is not just "re-point the gateway"
+
+Pre-prod and prod are separate Azure subscriptions, with separate Arc resource groups, relay namespaces (`relay-manbrs-`), web-API service principals (`spn-manbrs-web-api-`) and ADO pipelines. An Arc-connected machine belongs to **exactly one** subscription, and the gateway services authenticate using that machine's managed identity. The pre-prod machine's identity holds the `Gateway.Access` role on the **pre-prod** API only, so it cannot authenticate to prod.
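+
+To confirm which environment a machine is currently registered in before you start, read the resource group from the text output of `azcmagent show`. The minimal sketch below assumes that output contains a `Resource Group Name : <value>` line (labels can differ between agent versions) and that gateway Arc resource groups follow the `rg-mbsgw-<env>-uks-arc-enabled-servers` pattern used in this runbook. `Get-ArcFieldValue` and `Get-GatewayEnvironment` are hypothetical helpers, not part of the gateway scripts.
+
+```powershell
+# Hypothetical helpers (not part of the gateway scripts); a sketch, not a tool.
+
+# Returns the value of a "Label : value" line from `azcmagent show` text output,
+# or $null if the label is absent. The label names are an assumption.
+function Get-ArcFieldValue {
+    param(
+        [string[]] $Lines,
+        [Parameter(Mandatory)] [string] $Label
+    )
+    $pattern = '^\s*' + [regex]::Escape($Label) + '\s*:\s*(.*?)\s*$'
+    foreach ($line in $Lines) {
+        if ($line -match $pattern) { return $Matches[1] }
+    }
+    return $null
+}
+
+# Maps a gateway Arc resource group name to its environment, assuming the
+# rg-mbsgw-<env>-uks-arc-enabled-servers naming used in this runbook.
+function Get-GatewayEnvironment {
+    param([Parameter(Mandatory)] [string] $ResourceGroup)
+    if ($ResourceGroup -match '^rg-mbsgw-(?<env>[a-z0-9]+)-uks-arc-enabled-servers$') {
+        return $Matches['env']
+    }
+    return $null
+}
+
+# Usage on the VM (elevated):
+#   $show = azcmagent show
+#   Get-GatewayEnvironment -ResourceGroup (Get-ArcFieldValue -Lines $show -Label 'Resource Group Name')
+#   Get-ArcFieldValue -Lines $show -Label 'Agent Status'
+```
+
+A machine that reports `preprod` here is bound to the pre-prod managed identity, whatever its `.env` contains.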
+
+So you cannot promote by editing `.env`. The machine must be **disconnected from the pre-prod subscription and re-onboarded into the prod subscription**: in effect, the [onboarding runbook](./onboard-hospital-vm.md) run against prod, preceded by a clean decommission of the pre-prod gateway.
+
+> One box hosts one gateway. After this procedure there is **no pre-prod gateway at this site**. If pre-prod is still needed at the site, use a different VM.
+
+---
+
+## Before the day (do these in advance)
+
+- [ ] **`GATEWAY_RINGS` is set for prod.** `infrastructure/environments/prod/variables.sh` must set `GATEWAY_RINGS` to include the ring the site is tagged with (for initial production installs this will typically be `ring1`). If it is unset, it defaults to `ring0` and the prod deploy pipeline **skips the machine** without failing the run.
+- [ ] **Prod infrastructure exists.** Confirm that the prod relay namespace, `spn-manbrs-web-api-prod`, the Log Analytics workspace and the prod ADO pipelines are provisioned, and that the prod infra deploy has succeeded.
+- [ ] **Prod Manage (Rubie) is seeded.** The prod Rubie instance has the site's clinic/setting, a `Relay` record pointing at `relay-manbrs-prod` and the prod gateway, the `gateway_images` feature flag enabled, and a test appointment for the end-to-end check.
+- [ ] **DHCP reservation confirmed** for the VM, so the IP the modality targets cannot change between decommission and go-live.
+- [ ] **Modality engineer briefed** that the gateway **AE titles change** (see [Step 5](#step-5--reconfigure-the-modality)) and the modality must be reconfigured and re-tested on the day.
+- [ ] **Fresh prod Arc onboarding SPN secret** generated with a 1-day expiry (`arc-onboarding-spn-client-secret`) and shared with hospital IT before the onboarding call.
+
+---
+
+## Step 1 — Decommission the pre-prod gateway (on the VM)
+
+Run from an **elevated PowerShell session** on the VM.
+First confirm you are on the right machine:
+
+```powershell
+hostname
+Get-Service Gateway-* | Format-Table Name, Status
+```
+
+Then run the cleanup script, which stops and removes the services and **removes the installation directory, including all pre-prod data** (`worklist.db`, `pacs.db`, `data\storage`):
+
+```powershell
+.\scripts\powershell\cleanup.ps1
+```
+
+**Verify**: no `Gateway-*` services remain and `C:\Program Files\NHS\ManageBreastScreeningGateway` is gone (see the Verify section of the [Cleanup runbook](./cleanup.md)).
+
+## Step 2 — Disconnect from the pre-prod Arc subscription
+
+Still on the VM, elevated:
+
+```powershell
+azcmagent disconnect
+```
+
+This removes the machine's registration from the pre-prod resource group. The Arc **agent stays installed**; only the registration is removed, so re-onboarding in Step 3 skips the agent install.
+
+**Verify**: `azcmagent show` reports the agent as **Disconnected**.
+
+## Step 3 — Re-onboard into prod
+
+This is [Onboarding runbook Step 2](./onboard-hospital-vm.md#step-2--run-arc-onboarding-script-on-the-gateway-vm) with **prod parameters**. The site parameters (`SiteName`, `ODSCode`, `Instance`) are unchanged, so the Arc resource name is identical, but the resource is now created in the prod resource group.
+
+```powershell
+.\arc-setup.ps1 `
+    -SubscriptionId "" `
+    -TenantId "" `
+    -ResourceGroup "rg-mbsgw-prod-uks-arc-enabled-servers" `
+    -Location "uksouth" `
+    -ServicePrincipalId "" `
+    -ServicePrincipalSecret "" `
+    -SiteName "Hull-University-Teaching-Hospitals-NHS-Trust" `
+    -ODSCode "RWA" `
+    -Instance "01" `
+    -NHSRegion "neyh" `
+    -SiteType "static" `
+    -DeploymentRing "ring1"
+```
+
+**Verify**: in the Azure portal, `rg-mbsgw-prod-uks-arc-enabled-servers` → Azure Arc machines → `gw-<...>-rwa-01` is **Connected**.
+
+## Step 4 — Provision prod and deploy the prod gateway
+
+From here, follow the onboarding runbook against **prod**:
+
+1. **Grant API access**: run `make prod assign-arc-app-roles` ([Step 3](./onboard-hospital-vm.md#step-3--grant-api-access)). This assigns `Gateway.Access` on `spn-manbrs-web-api-prod` to the machine's new prod managed identity. The old pre-prod assignment is irrelevant (it belongs to a different identity) and is cleaned up in Step 6.
+2. **Provision the Hybrid Connection**: run **Deploy Arc Infrastructure - prod** ([Step 4](./onboard-hospital-vm.md#step-4--trigger-terraform-to-provision-the-hybrid-connection)). This creates `hc-gw-<...>-rwa-01` in `relay-manbrs-prod`.
+3. **Deploy the application**: run **Deploy Gateway - prod** with a **released** `releaseTag`, not a pre-prod build ([Step 5](./onboard-hospital-vm.md#step-5--deploy-the-gateway-application)). This writes a fresh prod `.env` (prod relay namespace, `CLOUD_API_HOSTNAME=manage-breast-screening.nhs.uk`, prod AE titles), so no pre-prod settings carry over.
+
+**Verify** (smoke test): all four services are Running, and an initial heartbeat appears in the prod Log Analytics workspace within 5 minutes.
+
+```powershell
+Get-Service Gateway-PACS, Gateway-MWL, Gateway-Upload, Gateway-Relay | Select-Object Name, Status
+```
+
+## Step 5 — Reconfigure the modality
+
+The gateway AE titles are environment-specific: the `.env` builder sets `MWL_AET=RUBIE_MWL_` / `PACS_AET=RUBIE_PACS_` followed by an environment suffix, which is the environment name uppercased and, if it is longer than 4 characters, truncated to its first 3.
+
+| Environment | MWL AE title | PACS AE title |
+|-------------|--------------|---------------|
+| pre-prod (`PREPROD` → `PRE`) | `RUBIE_MWL_PRE` | `RUBIE_PACS_PRE` |
+| prod (`PROD`) | `RUBIE_MWL_PROD` | `RUBIE_PACS_PROD` |
+
+The modality is currently configured to send to the **pre-prod** AE titles. Update the modality's MWL and PACS destinations to the **prod** AE titles. The VM IP and ports (`104` / `11112`) will typically be unchanged.
+
+**Verify**: a C-ECHO from the modality to each prod AE title succeeds.
+(A C-ECHO to the old `..._PRE` titles will now fail; this is expected.)
+
+## Step 6 — Clean up the pre-prod side (Azure)
+
+Disconnecting the machine in Step 2 leaves orphaned pre-prod resources: a disconnected Arc machine, a dangling Hybrid Connection (`hc-gw-<...>-rwa-01` in `relay-manbrs-preprod`), and a stale `Gateway.Access` role assignment.
+
+- [ ] Run **Deploy Arc Infrastructure - preprod** (Terraform) so that it destroys the now-orphaned Hybrid Connection.
+- [ ] Remove the disconnected Arc machine resource from `rg-mbsgw-preprod-uks-arc-enabled-servers` if Terraform or Arc has not already done so.
+- [ ] Confirm the stale pre-prod app-role assignment is removed (re-running `make preprod assign-arc-app-roles` reconciles it; alternatively, remove it via the portal).
+
+---
+
+## End-to-end check
+
+With the modality reconfigured and prod Rubie seeded:
+
+1. Start the seeded test appointment in **prod** Rubie → a worklist item reaches the prod gateway.
+2. Query the worklist from the modality → the item appears.
+3. Acquire and send images → they arrive in the prod gateway PACS and are forwarded to prod Rubie.
+4. Confirm the images appear against the appointment in prod Rubie.
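+
+If step 2 or 3 fails, one quick thing to rule out is a `.env` still carrying pre-prod AE titles. The minimal sketch below applies the Step 5 rule to a `.env` file's lines. It assumes the file is plain `KEY=VALUE` lines (values optionally double-quoted, `#` comments allowed) and that it lives in the installation directory; both are assumptions, as are the hypothetical function names. The truncation rule keeps titles within DICOM's 16-character AE title limit (`RUBIE_PACS_PREPROD` would be 18 characters), which is presumably why it exists.
+
+```powershell
+# Hypothetical helpers (not part of the gateway scripts).
+
+# Step 5 rule: uppercase the environment; if longer than 4 chars, keep the first 3.
+function Get-GatewayAeSuffix {
+    param([Parameter(Mandatory)] [string] $Environment)
+    $suffix = $Environment.ToUpperInvariant()
+    if ($suffix.Length -gt 4) { $suffix = $suffix.Substring(0, 3) }
+    return $suffix
+}
+
+# Parses KEY=VALUE lines into a hashtable, skipping blanks and # comments
+# and stripping surrounding double quotes from values.
+function ConvertFrom-DotEnv {
+    param([string[]] $Lines)
+    $values = @{}
+    foreach ($line in $Lines) {
+        $trimmed = $line.Trim()
+        if ($trimmed -eq '' -or $trimmed.StartsWith('#')) { continue }
+        $key, $value = $trimmed -split '=', 2
+        if ($null -ne $value) { $values[$key.Trim()] = $value.Trim().Trim('"') }
+    }
+    return $values
+}
+
+# Emits one message per mismatched setting; no output means the file matches.
+function Test-GatewayEnvFile {
+    param(
+        [string[]] $Lines,
+        [Parameter(Mandatory)] [string] $Environment,
+        [string] $CloudApiHostname   # optional, e.g. manage-breast-screening.nhs.uk for prod
+    )
+    $suffix = Get-GatewayAeSuffix -Environment $Environment
+    $expected = @{ MWL_AET = "RUBIE_MWL_$suffix"; PACS_AET = "RUBIE_PACS_$suffix" }
+    if ($CloudApiHostname) { $expected['CLOUD_API_HOSTNAME'] = $CloudApiHostname }
+    $actual = ConvertFrom-DotEnv -Lines $Lines
+    foreach ($key in $expected.Keys) {
+        if ($actual[$key] -cne $expected[$key]) {
+            "$key is '$($actual[$key])', expected '$($expected[$key])'"
+        }
+    }
+}
+
+# Usage on the VM (the .env path is an assumption):
+#   $lines = Get-Content -Path 'C:\Program Files\NHS\ManageBreastScreeningGateway\.env'
+#   Test-GatewayEnvFile -Lines $lines -Environment 'prod' -CloudApiHostname 'manage-breast-screening.nhs.uk'
+```
+
+Comparing with `-cne` keeps the check case-sensitive, so a lower-case suffix is reported rather than silently accepted.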
+
+---
+
+## Gotchas
+
+| Symptom | Cause | Fix |
+|---------|-------|-----|
+| **Deploy Gateway - prod** reports "No machines found for ring1 — skipping" | `GATEWAY_RINGS` not set for prod, so it defaults to `ring0` | Set `GATEWAY_RINGS="ring1"` in `prod/variables.sh` and redeploy |
+| Services start but fail to authenticate against the cloud API | `Gateway.Access` not assigned to the **prod** managed identity | Run `make prod assign-arc-app-roles` |
+| Modality C-ECHO fails after promotion | Modality still targeting `..._PRE` AE titles | Reconfigure the modality to `..._PROD` (Step 5) |
+| Hybrid Connection not created by Terraform | Arc machine not yet **Connected** in the prod RG | Confirm the Step 3 verification, then re-run the infra pipeline |
+| Pre-prod relay still shows connection attempts | Orphaned pre-prod HC / registration | Complete Step 6 |
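+
+The first gotcha can be caught before running **Deploy Gateway - prod**. The minimal sketch below reads `variables.sh` and applies the documented `ring0` default. It assumes `GATEWAY_RINGS` is assigned on a single line (optionally quoted or exported) and that multiple rings are separated by commas or spaces; neither format detail is confirmed by this runbook, and the function names are hypothetical.
+
+```powershell
+# Hypothetical helpers (not part of the repository's tooling).
+
+# Returns the rings listed in GATEWAY_RINGS, or the documented default
+# ('ring0') when the variable is unset or empty.
+function Get-GatewayRings {
+    param([string[]] $Lines)
+    foreach ($line in $Lines) {
+        # Matches GATEWAY_RINGS=ring1, GATEWAY_RINGS="ring0,ring1", export GATEWAY_RINGS='ring1' # note
+        if ($line -match '^\s*(?:export\s+)?GATEWAY_RINGS=["'']?(?<rings>[^"''#]*)["'']?\s*(?:#.*)?$') {
+            $rings = @($Matches['rings'] -split '[,\s]+' | Where-Object { $_ })
+            if ($rings.Count -gt 0) { return $rings }
+        }
+    }
+    return @('ring0')
+}
+
+# True if a machine tagged with $SiteRing would be picked up by the prod deploy.
+function Test-SiteRingDeployable {
+    param(
+        [string[]] $VariablesLines,
+        [Parameter(Mandatory)] [string] $SiteRing
+    )
+    return (@(Get-GatewayRings -Lines $VariablesLines) -contains $SiteRing)
+}
+
+# Usage from a repository checkout:
+#   Test-SiteRingDeployable -VariablesLines (Get-Content infrastructure/environments/prod/variables.sh) -SiteRing 'ring1'
+```
+
+A `False` result for the site's ring means the deploy would log the skip message above instead of installing the gateway.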