Summary
- Alert: Proactive Reliability (App Service) High Response Time Alert
- Resource: sreproactive-vscode-39596 (/subscriptions/06dbbc7b-2363-4dd4-9803-95d07f1a8d3e/resourceGroups/rg-sre-proactive-demo/providers/Microsoft.Web/sites/sreproactive-vscode-39596)
- Action: Automatic deployment slot swap (staging → production) executed at 2026-05-10T05:31:43.153Z
- Status: Response time remains elevated post-swap; investigation required
Evidence
- Baseline (from baseline.txt):
  - BaselineResponseTime: 37.080758823529415 ms
  - BaselineTimestamp: 2026-05-09T09:30:44.0722477Z
- Pre-swap (App Insights, 5m window):
  - Window: 2026-05-10T05:23:33Z .. 2026-05-10T05:28:33Z
  - Query:
      requests
      | where timestamp between (datetime(2026-05-10T05:23:33Z) .. datetime(2026-05-10T05:28:33Z))
      | where cloud_RoleName <> '' and not(cloud_RoleName contains "staging")
      | summarize CurrentResponseTime = avg(duration) by cloud_RoleName
      | extend CurrentTimestamp = datetime(2026-05-10T05:28:33Z)
  - Result: sreproactive-vscode-39596 CurrentResponseTime = 230.12468823529412 ms; CurrentTimestamp = 2026-05-10T05:28:33Z
  - Deviation vs baseline: +520.60%
- Post-swap (App Insights, ~2.7m window):
  - Window: 2026-05-10T05:32:00Z .. 2026-05-10T05:34:40Z
  - Query:
      requests
      | where timestamp between (datetime(2026-05-10T05:32:00Z) .. datetime(2026-05-10T05:34:40Z))
      | where cloud_RoleName == "sreproactive-vscode-39596"
      | summarize AvgResponseTime = avg(duration)
      | extend CurrentTimestamp = datetime(2026-05-10T05:34:40Z)
  - Result: AvgResponseTime = 215.0364 ms
  - Deviation vs baseline: +479.91%
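The deviation figures above can be reproduced from the reported averages. The following is a minimal sketch, assuming deviation is computed as (current - baseline) / baseline * 100; the function name `percent_deviation` is illustrative and not taken from the agent's implementation.

```python
def percent_deviation(current_ms: float, baseline_ms: float) -> float:
    """Return how far current_ms sits above (+) or below (-) baseline_ms, in percent."""
    if baseline_ms <= 0:
        raise ValueError("baseline_ms must be positive")
    return (current_ms - baseline_ms) / baseline_ms * 100.0


# Values copied from the Evidence section of this issue.
BASELINE_MS = 37.080758823529415
PRE_SWAP_MS = 230.12468823529412
POST_SWAP_MS = 215.0364

pre_dev = percent_deviation(PRE_SWAP_MS, BASELINE_MS)
post_dev = percent_deviation(POST_SWAP_MS, BASELINE_MS)

# Rounded to two decimals, these match the +520.60% and +479.91% reported above.
print(f"pre-swap:  {pre_dev:+.2f}%")
print(f"post-swap: {post_dev:+.2f}%")
```

Note that the post-swap average comes from a shorter window (about 2.7 minutes versus 5 minutes), so the two figures are not strictly like-for-like.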
Health Endpoints
Notes
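- The swap was triggered automatically because response time had degraded by >=20% relative to baseline.
- After the swap, response time is still roughly 5.8x the baseline, so the regression is probably not confined to the build that was swapped out; a cause shared by both slots (configuration, dependencies, or load) is more likely.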
- Please investigate recent commits/config changes affecting the production slot and staging contents.
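The trigger-and-verify logic described in these notes might be sketched as follows. This is an illustration only: the 20% threshold comes from this issue, but the recovery criterion (post-swap deviation falling back below the same threshold) and the names `SwapOutcome` and `classify_swap` are assumptions, not the agent's actual rules.

```python
from enum import Enum


class SwapOutcome(Enum):
    NOT_TRIGGERED = "not_triggered"    # degradation below threshold, no swap needed
    RECOVERED = "recovered"            # swap brought latency back under threshold
    STILL_DEGRADED = "still_degraded"  # swap did not help, escalate to humans


def classify_swap(baseline_ms: float, pre_swap_ms: float, post_swap_ms: float,
                  threshold_pct: float = 20.0) -> SwapOutcome:
    """Classify a swap attempt by comparing pre- and post-swap latency to baseline."""
    if baseline_ms <= 0:
        raise ValueError("baseline_ms must be positive")

    def deviation(ms: float) -> float:
        return (ms - baseline_ms) / baseline_ms * 100.0

    if deviation(pre_swap_ms) < threshold_pct:
        return SwapOutcome.NOT_TRIGGERED
    # Assumed recovery rule: the same threshold is reused after the swap.
    if deviation(post_swap_ms) < threshold_pct:
        return SwapOutcome.RECOVERED
    return SwapOutcome.STILL_DEGRADED


# Values from the Evidence section of this issue.
outcome = classify_swap(37.080758823529415, 230.12468823529412, 215.0364)
print(outcome.value)
```

With the figures from this issue the result is `still_degraded`, which corresponds to the "investigation required" status in the summary.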
Next Steps (suggested)
- Review recent deployments/changes around 2026-05-10T05:20Z–05:35Z.
- Compare application configuration between slots (app settings/connection strings).
- Analyze dependencies (DB calls, external services) for latency.
- Consider temporarily increasing the instance count if the latency turns out to be load-related.
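For the slot configuration comparison, one possible approach is to export each slot's app settings as key/value pairs and diff them. The sketch below assumes the settings have already been loaded into plain dictionaries; the function name `diff_settings` and every setting name and value shown are hypothetical placeholders, not data from this app.

```python
from typing import Optional


def diff_settings(production: dict[str, str],
                  staging: dict[str, str]) -> dict[str, tuple[Optional[str], Optional[str]]]:
    """Map each differing key to (production_value, staging_value).

    None marks a key that is missing from that slot.
    """
    diffs = {}
    for key in sorted(production.keys() | staging.keys()):
        prod_val = production.get(key)
        stage_val = staging.get(key)
        if prod_val != stage_val:
            diffs[key] = (prod_val, stage_val)
    return diffs


# Hypothetical settings, for illustration only.
production = {"CACHE_ENABLED": "false", "DB_POOL_SIZE": "10", "LOG_LEVEL": "Information"}
staging = {"CACHE_ENABLED": "true", "DB_POOL_SIZE": "10", "FEATURE_X": "on"}

settings_diff = diff_settings(production, staging)
for key, (prod_val, stage_val) in settings_diff.items():
    print(f"{key}: production={prod_val!r} staging={stage_val!r}")
```

Keys that are identical in both slots are omitted, so the output lists only the settings worth reviewing.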
This issue was created by sre-agent-proactive-demo--73aee8f4
Tracked by the SRE agent here