Add post-swap verification and automatic rollback to DeploymentHealthCheck subagent #106

Draft
Copilot wants to merge 2 commits into main from copilot/fix-high-response-time-issue-another-one

Copilot AI commented May 5, 2026

The DeploymentHealthCheck subagent swapped staging→production when response time exceeded baseline by 20%, but had no post-swap verification or rollback logic — leaving a degraded deployment in production when the swap made things worse.

Key changes to DeploymentHealthCheck.yaml

  • Post-swap verification: after the swap, the subagent waits 120s (to cover App Service warm-up and App Insights ingestion latency), then re-queries App Insights with an explicit timestamp window anchored to SwapExecutedAt
  • Automatic rollback: if the post-swap average response time is more than 10% worse than pre-swap, the subagent triggers an immediate rollback swap without approval; the threshold is intentionally tighter than the 20% swap trigger to avoid false-positive rollbacks
  • Consistent KQL filters: all queries now use the explicit filter cloud_RoleName == 'sreproactive-vscode-39596' instead of a negative not contains "staging" filter
  • Evidence collection: ExecutePythonCode generates a timestamped CSV and a bar chart PNG (Baseline / PreSwap / PostSwap), saved to /mnt/data/ and linked in the issue
  • Enriched GitHub issue template: matches the incident report format, with baseline, pre/post-swap windows with exact timestamps, deviation %, slot health check results, rollback status, evidence links, and the actual KQL queries used
  • New tools added: GetCurrentUtcTime, WaitInMilliSeconds, ExecutePythonCode
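
To illustrate the post-swap verification and the explicit KQL filter described above, here is a minimal Python sketch that builds a query over a window anchored to the swap time. It is not taken from DeploymentHealthCheck.yaml: the function name `build_post_swap_query`, the `window_seconds` parameter, and the choice to start the window exactly at SwapExecutedAt are assumptions made for illustration. The `requests` table and its `timestamp`, `duration`, and `cloud_RoleName` columns are standard Application Insights schema.

```python
from datetime import datetime, timedelta, timezone

ROLE_NAME = "sreproactive-vscode-39596"  # role name quoted in the PR description
WAIT_SECONDS = 120                        # post-swap wait quoted in the PR description


def build_post_swap_query(swap_executed_at: datetime,
                          window_seconds: int = WAIT_SECONDS) -> str:
    """Build a KQL query for average request duration after a swap.

    Assumption: the window runs from SwapExecutedAt to SwapExecutedAt +
    window_seconds. The real subagent may anchor its window differently.
    swap_executed_at must be timezone-aware.
    """
    if swap_executed_at.tzinfo is None:
        raise ValueError("swap_executed_at must be timezone-aware")
    start = swap_executed_at.astimezone(timezone.utc)
    end = start + timedelta(seconds=window_seconds)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return (
        "requests\n"
        f"| where timestamp between (datetime({start.strftime(fmt)}) .. "
        f"datetime({end.strftime(fmt)}))\n"
        # Explicit positive filter instead of a negative 'not contains' match
        f"| where cloud_RoleName == '{ROLE_NAME}'\n"
        "| summarize AvgResponseTimeMs = avg(duration)"
    )
```

Using fixed timestamps rather than a relative `ago(...)` expression means the same query text can be pasted into the issue and re-run later against the same window.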
# Auto-rollback logic (step 12)
- If PostSwapResponseTime > CurrentResponseTime * 1.10:
    execute: az webapp deployment slot swap ... (rollback)
    set: RollbackExecuted = true, PostSwapAssessment = "worse than pre-swap; rollback executed"
- Else:
    set: RollbackExecuted = false, PostSwapAssessment = "improved; swap successful"
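
The decision in step 12 can also be sketched in Python to make the threshold arithmetic explicit. This is an illustrative sketch, not the subagent's implementation: `assess_post_swap`, `PostSwapResult`, and `rollback_required` are hypothetical names, and the function only decides whether a rollback is needed. Running the actual `az webapp deployment slot swap ...` command is left to the caller. The assessment strings are copied from the pseudocode above.

```python
from dataclasses import dataclass

# Post-swap response time more than 10% above pre-swap triggers a rollback.
ROLLBACK_FACTOR = 1.10


@dataclass
class PostSwapResult:
    rollback_required: bool  # hypothetical name; the PR sets RollbackExecuted
    deviation_pct: float     # post-swap change relative to pre-swap, in percent
    assessment: str


def assess_post_swap(pre_swap_ms: float, post_swap_ms: float) -> PostSwapResult:
    """Compare post-swap and pre-swap average response times."""
    if pre_swap_ms <= 0:
        raise ValueError("pre_swap_ms must be positive")
    deviation_pct = (post_swap_ms - pre_swap_ms) / pre_swap_ms * 100.0
    if post_swap_ms > pre_swap_ms * ROLLBACK_FACTOR:
        return PostSwapResult(True, deviation_pct,
                              "worse than pre-swap; rollback executed")
    return PostSwapResult(False, deviation_pct, "improved; swap successful")
```

Note that the `Else` branch also covers a post-swap result that is up to 10% slower than pre-swap, so the "improved" wording is best read as "within tolerance".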

…hCheck subagent

Agent-Logs-Url: https://github.com/gderossilive/AzSreAgentLab/sessions/5d7d26b3-9b41-4078-9145-2312823208c0

Co-authored-by: gderossilive <20165027+gderossilive@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix high response time on sreproactive-vscode-39596 after swap" to "Add post-swap verification and automatic rollback to DeploymentHealthCheck subagent" on May 5, 2026
Copilot AI requested a review from gderossilive May 5, 2026 21:19
Development

Successfully merging this pull request may close these issues.

Sev2: High response time on sreproactive-vscode-39596 – auto slot swap executed
