High response time vs baseline on sreproactive-vscode-39596 – mitigated via slot swap (2026-05-10)

Summary
- Service: Azure App Service sreproactive-vscode-39596 (rg: rg-sre-proactive-demo)
- Alert context: Sev2 Plan CPU alert fired at 2026-05-10T05:19:37Z and correlated with a response time regression vs baseline.
- Action taken: Verified staging and production health, then performed a slot swap (staging -> production) at 2026-05-10T05:24:25.917Z.
- Outcome: Post-swap average response time dropped to 1.78 ms, well below the 47.43 ms baseline.

Evidence
- Baseline (from baseline.txt):
  - BaselineResponseTime = 47.431782352941184 ms
  - BaselineTimestamp = 2026-01-22T16:00:55.2520697Z
- Current (pre-swap) App Insights query window:
  - Window: 2026-05-10T05:15:25Z .. 2026-05-10T05:20:25Z
  - KQL:
    requests
    | where timestamp between (datetime(2026-05-10T05:15:25Z) .. datetime(2026-05-10T05:20:25Z))
    | where cloud_RoleName <> '' and not (cloud_RoleName contains "staging")
    | summarize CurrentResponseTime = avg(duration) by cloud_RoleName
    | extend CurrentTimestamp = datetime(2026-05-10T05:20:25Z)
  - Result: cloud_RoleName=sreproactive-vscode-39596; CurrentResponseTime=88.03081666666667 ms; CurrentTimestamp=2026-05-10T05:20:25Z
  - Deviation vs baseline: +85.59%
- Post-swap verification query window:
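  - Window: 2026-05-10T05:23:11Z .. 2026-05-10T05:25:11Z (opens about 75 s before the recorded swap time of 05:24:25.917Z, so the average may include pre-swap requests)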
  - KQL:
    requests
    | where timestamp between (datetime(2026-05-10T05:23:11Z) .. datetime(2026-05-10T05:25:11Z))
    | where cloud_RoleName <> '' and not (cloud_RoleName contains "staging")
    | summarize PostSwapResponseTime = avg(duration) by cloud_RoleName
    | extend CurrentTimestamp = datetime(2026-05-10T05:25:11Z)
  - Result: cloud_RoleName=sreproactive-vscode-39596; PostSwapResponseTime=1.7782 ms; CurrentTimestamp=2026-05-10T05:25:11Z
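
The deviation figure above can be re-derived from the two recorded averages. The sketch below is illustrative only: the function name `deviation_pct` and the constant names are assumptions, not part of the agent's tooling; the numeric values are copied from the Evidence section.

```python
def deviation_pct(current_ms: float, baseline_ms: float) -> float:
    """Percentage change of current_ms relative to baseline_ms."""
    if baseline_ms <= 0:
        raise ValueError("baseline_ms must be positive")
    return (current_ms - baseline_ms) / baseline_ms * 100.0


# Values copied from the Evidence section of this issue.
BASELINE_MS = 47.431782352941184   # BaselineResponseTime (baseline.txt)
PRE_SWAP_MS = 88.03081666666667    # CurrentResponseTime (pre-swap window)
POST_SWAP_MS = 1.7782              # PostSwapResponseTime (post-swap window)

pre_swap_dev = deviation_pct(PRE_SWAP_MS, BASELINE_MS)
post_swap_dev = deviation_pct(POST_SWAP_MS, BASELINE_MS)

print(f"pre-swap deviation:  {pre_swap_dev:+.2f}%")
print(f"post-swap deviation: {post_swap_dev:+.2f}%")
```

The pre-swap value reproduces the +85.59% reported above; the post-swap value is negative because the measured average sits below baseline.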

Health endpoints prior to swap
- Production: https://sreproactive-vscode-39596.azurewebsites.net/health (200 Healthy)
- Staging: https://sreproactive-vscode-39596-staging.azurewebsites.net/health (200 Healthy)
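
A pre-swap health gate along these lines could be scripted as follows. This is a minimal sketch using only the Python standard library, not the check the agent actually ran: `fetch_status` and `slots_healthy` are hypothetical helper names, and only the HTTP status code is inspected (the "Healthy" body text is not parsed).

```python
import urllib.error
import urllib.request
from typing import Callable, Dict

# Health endpoints as listed in this issue.
HEALTH_URLS = {
    "production": "https://sreproactive-vscode-39596.azurewebsites.net/health",
    "staging": "https://sreproactive-vscode-39596-staging.azurewebsites.net/health",
}


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET on url, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses still carry a status code.
        return err.code
    except OSError:
        # Covers URLError, DNS failures, and socket timeouts.
        return 0


def slots_healthy(
    urls: Dict[str, str],
    fetch: Callable[[str], int] = fetch_status,
) -> Dict[str, bool]:
    """Map each slot name to True if its health endpoint returned 200."""
    return {slot: fetch(url) == 200 for slot, url in urls.items()}
```

Calling `slots_healthy(HEALTH_URLS)` performs the live checks; a swap would proceed only if every value in the result is `True`. The `fetch` parameter exists so the gate can be exercised without network access.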

Artifacts
- Chart (PNG): [response_time_evidence_20260510T052534Z.png](/api/files/response_time_evidence_20260510T052534Z.png)
- Data (CSV): [response_time_evidence_20260510T052534Z.csv](/api/files/response_time_evidence_20260510T052534Z.csv)

Next steps / Suspicions
- Investigate recent code/config changes in the previously active production slot that could explain the 85%+ latency increase.
- Review dependency latency (DB/external calls) around 05:15–05:20Z.
- Consider adding automated canary/slot soak validation to catch regressions pre-promotion.
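
One possible shape for the soak validation suggested above is shown below. It is a sketch under stated assumptions: the function name `soak_passes`, the 20% tolerance, and the three-window requirement are all hypothetical and would need tuning against real traffic. The input is a list of per-window average response times from the staging slot, e.g. produced by a query like the ones in the Evidence section.

```python
from typing import Sequence


def soak_passes(
    window_avgs_ms: Sequence[float],
    baseline_ms: float,
    max_deviation_pct: float = 20.0,   # hypothetical tolerance above baseline
    required_windows: int = 3,         # hypothetical number of clean windows
) -> bool:
    """Return True if the most recent `required_windows` window averages
    all stay within `max_deviation_pct` above `baseline_ms`."""
    if baseline_ms <= 0:
        raise ValueError("baseline_ms must be positive")
    if len(window_avgs_ms) < required_windows:
        # Not enough soak time observed yet; do not promote.
        return False
    limit_ms = baseline_ms * (1.0 + max_deviation_pct / 100.0)
    recent = window_avgs_ms[-required_windows:]
    return all(avg <= limit_ms for avg in recent)


BASELINE_MS = 47.431782352941184  # BaselineResponseTime from baseline.txt

# Illustrative inputs only; the 88.03 ms value mirrors the pre-swap regression.
print(soak_passes([45.0, 50.0, 48.0], BASELINE_MS))
print(soak_passes([45.0, 88.03, 48.0], BASELINE_MS))
```

Requiring several consecutive clean windows, rather than a single average, keeps one quiet interval from masking an intermittent regression.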

Closing note
- Mitigation successful via slot swap; response time is better than baseline. Keeping this issue open for root-cause analysis.
---
*This issue was created by sre-agent-proactive-demo--73aee8f4*
Tracked by the SRE agent [here](https://portal.azure.com/?feature.customPortal=false&feature.canmodifystamps=true&feature.fastmanifest=false&nocdn=force&websitesextension_loglevel=verbose&Microsoft_Azure_PaasServerless=beta&microsoft_azure_paasserverless_assettypeoptions=%7B%22SreAgentCustomMenu%22%3A%7B%22options%22%3A%22%22%7D%7D#view/Microsoft_Azure_PaasServerless/AgentFrameBlade.ReactView/id/%2Fsubscriptions%2F06dbbc7b-2363-4dd4-9803-95d07f1a8d3e%2FresourceGroups%2Frg-sre-proactive-demo%2Fproviders%2FMicrosoft.App%2Fagents%2Fsre-agent-proactive-demo/sreLink/%2Fviews%2Factivities%2Fthreads%2Fd06cef31-3066-4dd6-a00e-d0e60e5adc2c)


High response time vs baseline on sreproactive-vscode-39596 – mitigated via slot swap (2026-05-10) #111
