2 changes: 1 addition & 1 deletion docs/deployment/deployment-pipeline.md
@@ -1,7 +1,7 @@
# Deployment Pipeline

> **Scope**: End-to-end flow from a GitHub tag or main-branch push through Azure DevOps pipelines to an Arc Run Command executing on hospital gateway VMs.
-> **Related docs**: [Windows Service Deploy](./windows-service-deploy.md) | [Onboard Hospital VM](./runbooks/onboard-hospital-vm.md) | [Rollback Runbook](./runbooks/rollback.md)
+> **Related docs**: [Windows Service Deploy](./windows-service-deploy.md) | [Onboard Hospital VM](./runbooks/onboard-hospital-vm.md) | [Promote VM Pre-Prod → Prod](./runbooks/promote-vm-preprod-to-prod.md) | [Rollback Runbook](./runbooks/rollback.md)

---

136 changes: 136 additions & 0 deletions docs/deployment/runbooks/promote-vm-preprod-to-prod.md
@@ -0,0 +1,136 @@
# Promote a Gateway VM from Pre-Prod to Prod

> **When to use**: A hospital VM currently running the **pre-prod** gateway needs to become the **prod** gateway (same infrastructure, different environment).
> **Related**: [Onboard Hospital VM](./onboard-hospital-vm.md) | [Cleanup](./cleanup.md) | [Deployment Pipeline](../deployment-pipeline.md)

---

## Why this is not just "re-point the gateway"

Pre-prod and prod are separate Azure subscriptions, with separate Arc resource groups, relay namespaces (`relay-manbrs-<env>`), web-API service principals (`spn-manbrs-web-api-<env>`) and ADO pipelines. An Arc-connected machine belongs to **exactly one** subscription, and the gateway services authenticate using that machine's managed identity. A pre-prod machine identity holds the `Gateway.Access` role on the **pre-prod** API and cannot talk to prod.

So you cannot promote by editing `.env`. The machine must be **disconnected from the pre-prod subscription and re-onboarded into the prod subscription** — effectively the [onboarding runbook](./onboard-hospital-vm.md) run for prod, preceded by a clean decommission of the pre-prod gateway.

> One box hosts one gateway. After this procedure there is **no pre-prod gateway at this site**. If pre-prod is still needed at the site, use a different VM.

---

## Before the day (do these in advance)

- [ ] **`GATEWAY_RINGS` is set for prod.** `infrastructure/environments/prod/variables.sh` must set `GATEWAY_RINGS` to include the ring the site is tagged with (for initial production installs this will typically be `ring1`). If it is unset it defaults to `ring0` and the prod deploy pipeline **silently skips** the machine.
- [ ] **Prod infrastructure exists.** Confirm the prod relay namespace, `spn-manbrs-web-api-prod`, Log Analytics workspace and the prod ADO pipelines are provisioned and have had a successful infra deploy.
- [ ] **Prod Manage (Rubie) is seeded.** The prod Rubie instance has the site's clinic/setting, a `Relay` record pointing at `relay-manbrs-prod` and the prod gateway, the `gateway_images` feature flag enabled, and a test appointment for the end-to-end check.
- [ ] **DHCP reservation confirmed** for the VM, so the IP the modality targets cannot change between decommission and go-live.
- [ ] **Modality engineer briefed** that the gateway **AE titles change** (see [Step 5](#step-5--reconfigure-the-modality)) and that the modality must be reconfigured and re-tested on the day.
- [ ] **Fresh prod Arc onboarding SPN secret** generated with 1-day expiry (`arc-onboarding-spn-client-secret`), shared with hospital IT before the onboarding call.
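
The `GATEWAY_RINGS` item above can be checked from a shell before the day. The following is a minimal sketch, not part of the repository: `ring_is_enabled` is a hypothetical helper, and it assumes `GATEWAY_RINGS` holds ring names separated by commas or spaces (this runbook only shows the single-value form `GATEWAY_RINGS="ring1"`).

```shell
# Hypothetical pre-flight helper; not part of the repository.
# Assumption: GATEWAY_RINGS is a comma- or space-separated list of ring names.
ring_is_enabled() {
  local site_ring="$1"
  # Mirror the documented default: an unset (or, in this sketch, empty) value means ring0.
  local rings="${GATEWAY_RINGS:-ring0}"
  local ring
  # Turn commas into spaces, then compare each ring name with the site's ring.
  for ring in $(printf '%s' "$rings" | tr ',' ' '); do
    if [ "$ring" = "$site_ring" ]; then
      return 0
    fi
  done
  return 1
}
```

Assuming `infrastructure/environments/prod/variables.sh` can be sourced in a shell, sourcing it and then running `ring_is_enabled ring1` returns a non-zero status when the prod deploy pipeline would skip a machine tagged `ring1`.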

---

## Step 1 — Decommission the pre-prod gateway (on the VM)

Run from an **elevated PowerShell session** on the VM. First confirm you are on the right machine:

```powershell
hostname
Get-Service Gateway-* | Format-Table Name, Status
```

Then run the cleanup script, which stops and removes the services and **removes the installation directory including all pre-prod data** (`worklist.db`, `pacs.db`, `data\storage`):

```powershell
.\scripts\powershell\cleanup.ps1
```

**Verify**: no `Gateway-*` services remain and `C:\Program Files\NHS\ManageBreastScreeningGateway` is gone (see [Cleanup runbook — Verify](./cleanup.md)).

## Step 2 — Disconnect from the pre-prod Arc subscription

Still on the VM, elevated:

```powershell
azcmagent disconnect
```

This removes the machine's registration from the pre-prod resource group. The Arc **agent stays installed** — only the registration is removed, so re-onboarding in Step 3 skips the agent install.

**Verify**: `azcmagent show` reports the agent as **Disconnected**.

## Step 3 — Re-onboard into prod

This is [Onboarding runbook Step 2](./onboard-hospital-vm.md#step-2--run-arc-onboarding-script-on-the-gateway-vm) with **prod parameters**. The site parameters (`SiteName`, `ODSCode`, `Instance`) are unchanged, so the Arc resource name is identical — but it is now created in the prod resource group.

```powershell
.\arc-setup.ps1 `
-SubscriptionId "<prod-spoke-subscription-id>" `
-TenantId "<tenant-id>" `
-ResourceGroup "rg-mbsgw-prod-uks-arc-enabled-servers" `
-Location "uksouth" `
-ServicePrincipalId "<arc-onboarding-spn-client-id>" `
-ServicePrincipalSecret "<prod-arc-onboarding-spn-client-secret>" `
-SiteName "Hull-University-Teaching-Hospitals-NHS-Trust" `
-ODSCode "RWA" `
-Instance "01" `
-NHSRegion "neyh" `
-SiteType "static" `
-DeploymentRing "ring1"
```

**Verify**: in the Azure portal, `rg-mbsgw-prod-uks-arc-enabled-servers` → Azure Arc machines → `gw-<...>-rwa-01` is **Connected**.

## Step 4 — Provision prod and deploy the prod gateway

From here, follow the onboarding runbook against **prod**:

1. **Grant API access** — `make prod assign-arc-app-roles` ([Step 3](./onboard-hospital-vm.md#step-3--grant-api-access)). This assigns `Gateway.Access` on `spn-manbrs-web-api-prod` to the machine's new prod managed identity. The old pre-prod assignment is irrelevant (different identity) and is cleaned up in Step 6.
2. **Provision the Hybrid Connection** — run **Deploy Arc Infrastructure - prod** ([Step 4](./onboard-hospital-vm.md#step-4--trigger-terraform-to-provision-the-hybrid-connection)). Creates `hc-gw-<...>-rwa-01` in `relay-manbrs-prod`.
3. **Deploy the application** — run **Deploy Gateway - prod** with a **released** `releaseTag` (not a pre-prod build) ([Step 5](./onboard-hospital-vm.md#step-5--deploy-the-gateway-application)). This writes a fresh prod `.env` (prod relay namespace, `CLOUD_API_HOSTNAME=manage-breast-screening.nhs.uk`, prod AE titles), fully replacing the pre-prod `.env`.

**Verify** (smoke test): all four services report `Running`, and an initial heartbeat appears in the prod Log Analytics workspace within 5 minutes.

```powershell
Get-Service Gateway-PACS, Gateway-MWL, Gateway-Upload, Gateway-Relay | Select-Object Name, Status
```

## Step 5 — Reconfigure the modality

The gateway AE titles are environment-specific: the `.env` builder sets `MWL_AET=RUBIE_MWL_<ENV>` / `PACS_AET=RUBIE_PACS_<ENV>`, where `<ENV>` is the environment name uppercased and, if it is longer than 4 characters, truncated to its first 3.

| Environment | MWL AE title | PACS AE title |
|-------------|--------------|---------------|
| pre-prod (`PREPROD` → `PRE`) | `RUBIE_MWL_PRE` | `RUBIE_PACS_PRE` |
| prod (`PROD`) | `RUBIE_MWL_PROD` | `RUBIE_PACS_PROD` |
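
The naming rule behind this table can be sketched in shell. This is an illustration only: the real `.env` builder is not shown in this runbook and may be written differently, and `ae_suffix` and `ae_titles` are hypothetical names. It assumes the rule is exactly "uppercase the environment name, then keep the first 3 characters when it is longer than 4".

```shell
# Hypothetical illustration of the AE-title rule; the real .env builder is not shown here.
ae_suffix() {
  local env_upper
  env_upper=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  if [ "${#env_upper}" -gt 4 ]; then
    # Longer than 4 characters: keep only the first 3 (PREPROD -> PRE).
    printf '%s\n' "$env_upper" | cut -c1-3
  else
    # 4 characters or fewer: use the name unchanged (PROD -> PROD).
    printf '%s\n' "$env_upper"
  fi
}

# Print the two AE-title settings for the given environment name.
ae_titles() {
  local suffix
  suffix=$(ae_suffix "$1")
  printf 'MWL_AET=RUBIE_MWL_%s\n' "$suffix"
  printf 'PACS_AET=RUBIE_PACS_%s\n' "$suffix"
}
```

With these definitions, `ae_titles preprod` prints `MWL_AET=RUBIE_MWL_PRE` and `PACS_AET=RUBIE_PACS_PRE`, and `ae_titles prod` prints the `..._PROD` pair, matching the table.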

The modality is currently configured to send to the **pre-prod** AE titles, so its MWL and PACS destinations must be updated to the **prod** AE titles. The VM IP and ports (`104` / `11112`) will typically be unchanged.

**Verify**: a C-ECHO from the modality to each prod AE title succeeds. (A C-ECHO to the old `..._PRE` titles will now fail — expected.)

## Step 6 — Clean up the pre-prod side (Azure)

Disconnecting the machine in Step 2 leaves orphaned pre-prod resources: a disconnected Arc machine, a dangling Hybrid Connection (`hc-gw-<...>-rwa-01` in `relay-manbrs-preprod`), and a stale `Gateway.Access` role assignment.

- [ ] Run **Deploy Arc Infrastructure - preprod** (Terraform) so it destroys the now-orphaned Hybrid Connection.
- [ ] Remove the disconnected Arc machine resource from `rg-mbsgw-preprod-uks-arc-enabled-servers` if Terraform/Arc has not already removed it.
- [ ] Confirm the stale pre-prod app-role assignment is removed (re-running `make preprod assign-arc-app-roles` reconciles the assignments; otherwise remove it via the portal).

---

## End-to-end check

With the modality reconfigured and prod Rubie seeded:

1. Start the seeded test appointment in **prod** Rubie → a worklist item reaches the prod gateway.
2. Query the worklist from the modality → the item appears.
3. Acquire and send images → they arrive in the prod gateway PACS and are forwarded to prod Rubie.
4. Confirm the images appear against the appointment in prod Rubie.

---

## Gotchas

| Symptom | Cause | Fix |
|---------|-------|-----|
| **Deploy Gateway - prod** reports "No machines found for ring1 — skipping" | `GATEWAY_RINGS` not set for prod, defaulting to `ring0` | Set `GATEWAY_RINGS="ring1"` in `prod/variables.sh` and redeploy |
| Services start but fail to authenticate against the cloud API | `Gateway.Access` not assigned to the **prod** managed identity | Run `make prod assign-arc-app-roles` |
| Modality C-ECHO fails after promotion | Modality still targeting `..._PRE` AE titles | Reconfigure modality to `..._PROD` (Step 5) |
| Hybrid Connection not created by Terraform | Arc machine not yet **Connected** in the prod RG | Confirm Step 3 verify, then re-run the infra pipeline |
| Pre-prod relay still shows connection attempts | Orphaned pre-prod HC / registration | Complete Step 6 |