Incident Report: HTTP 5xx spike with cart memory-retention behavior
- Incident ID: 4bcfd5d3-3796-4fc9-8c2c-bc4415cef000
- Service: Azure Container Apps, ca-grubify-api (rg: rg-grubify-app)
- Subscription: 06dbbc7b-2363-4dd4-9803-95d07f1a8d3e
- FQDN: ca-grubify-api.politecliff-89094031.swedencentral.azurecontainerapps.io
- Active revision: ca-grubify-api--0000002 (100% traffic)
Summary
A Sev2 alert fired for elevated HTTP 5xx responses on the Grubify backend Container App. Incident-window telemetry confirmed a burst of 5xx responses, and console logs showed repeated per-request cache growth in the cart path, in 10 MB increments. The service is currently reachable and synthetic checks succeed, but the risk of recurrence remains until a code-level fix is in place.
Impact
- Elevated backend HTTP 5xx during the incident window.
- Cart operations were at increased risk of failure during the burst.
Timeline (UTC)
- ~10:20: 5xx requests reached 37/min.
- ~10:21: 5xx requests reached 61/min.
- 10:22:28: Azure Monitor Sev2 alert fired.
- ~10:28: Container startup observed in logs; recovery underway.
- ~10:35: Synthetic checks on key endpoints returned HTTP 200.
Evidence
Console logs (active revision)
2026-05-07T10:28:32Z cache: Added request data. Total entries: 1
2026-05-07T10:28:32Z size: 10MB
2026-05-07T10:29:36Z cache: Added request data. Total entries: 2
2026-05-07T10:29:36Z size: 20MB
2026-05-07T10:29:36Z cache: Added request data. Total entries: 3
2026-05-07T10:29:36Z size: 30MB
2026-05-07T10:32:45Z cache: Added request data. Total entries: 4
2026-05-07T10:32:45Z size: 40MB
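
The growth pattern in these lines can be checked mechanically. The following is a minimal sketch, assuming the log format shown above stays stable; the function names and regular expressions are illustrative and not part of the service.

```python
import re

# The console log lines quoted in this report.
LOG_LINES = """\
2026-05-07T10:28:32Z cache: Added request data. Total entries: 1
2026-05-07T10:28:32Z size: 10MB
2026-05-07T10:29:36Z cache: Added request data. Total entries: 2
2026-05-07T10:29:36Z size: 20MB
2026-05-07T10:29:36Z cache: Added request data. Total entries: 3
2026-05-07T10:29:36Z size: 30MB
2026-05-07T10:32:45Z cache: Added request data. Total entries: 4
2026-05-07T10:32:45Z size: 40MB
""".splitlines()

ENTRY_RE = re.compile(r"cache: Added request data\. Total entries: (\d+)")
SIZE_RE = re.compile(r"size: (\d+)MB")


def cache_growth_samples(lines):
    """Pair each 'Total entries' line with the 'size' line that follows it.

    Returns a list of (entries, size_mb) tuples.
    """
    samples = []
    pending_entries = None
    for line in lines:
        entry_match = ENTRY_RE.search(line)
        if entry_match:
            pending_entries = int(entry_match.group(1))
            continue
        size_match = SIZE_RE.search(line)
        if size_match and pending_entries is not None:
            samples.append((pending_entries, int(size_match.group(1))))
            pending_entries = None
    return samples


def is_linear_growth(samples, step_mb=10):
    """True when every sample shows exactly step_mb retained per cache entry."""
    return bool(samples) and all(size == entries * step_mb for entries, size in samples)


samples = cache_growth_samples(LOG_LINES)
print(samples)                    # [(1, 10), (2, 20), (3, 30), (4, 40)]
print(is_linear_growth(samples))  # True
```

Each cache entry adds exactly 10 MB and nothing is released between requests, which is the signature the later alerting action item refers to.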
Traffic and Response Time
- Evidence chart artifact: /api/files/tmp/ThreadFiles/a3bd7360-77aa-4402-a14c-eba85094b03e/grubify-5xx-incident-2026-05-07-evidence.png
- Synthetic checks (~10:35 UTC):
  - GET /weatherforecast → 200
  - GET /api/restaurants → 200
  - GET /api/fooditems → 200
  - POST /api/cart/demo-user/items → 200
Metrics snapshot (Azure Monitor)
- Requests (5xx, 1m bins): 10:20=37, 10:21=61, 10:24=1
- ResponseTime avg: peak 79 ms at 10:29
- RestartCount: 0 across sampled window
- MemoryPercentage: one control-plane call returned a scope error; memory usage was corroborated with WorkingSetBytes instead
- WorkingSetBytes: peak 128,634,880 bytes (~128.6 MB) at 10:20
- UsageNanoCores: peak 245,688,944 (~245.7 millicores) at 10:29
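
The rounded figures for WorkingSetBytes and UsageNanoCores follow from simple unit conversions. A short sketch of the arithmetic, assuming decimal megabytes (1 MB = 1,000,000 bytes) as the report's rounding implies; the MiB value is added only for comparison:

```python
# Raw peak values from the metrics snapshot.
working_set_bytes = 128_634_880
usage_nanocores = 245_688_944

working_set_mb = working_set_bytes / 1_000_000      # decimal megabytes
working_set_mib = working_set_bytes / (1024 ** 2)   # binary mebibytes, for comparison
usage_millicores = usage_nanocores / 1_000_000      # 1 core = 1e9 nanocores = 1000 millicores

print(f"{working_set_mb:.1f} MB")            # 128.6 MB
print(f"{working_set_mib:.1f} MiB")          # 122.7 MiB
print(f"{usage_millicores:.1f} millicores")  # 245.7 millicores
```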
Root Cause
The likely root cause is application-level memory retention in the cart path (POST /api/cart/{userId}/items), which caused transient resource pressure and backend 5xx responses under burst traffic. The supporting evidence is the cumulative 10 MB growth shown in the cache log entries.
Remediation
- Code: Remove unbounded per-request retained allocations in cart handling.
- Defensive: Add endpoint rate/resource guards for cart POST traffic.
- Platform: Keep protective backend scaling and memory settings in place while the code fix is rolled out.
- Observability: Add targeted alerting on cache-growth signatures and exception spikes.
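
One possible shape for the code-level fix is to cap what the cart path may retain. The sketch below is an illustration under assumptions, not the service's actual code: the class name `BoundedRequestCache`, its limits, and the oldest-first eviction policy are all hypothetical, and the real fix may simply be to stop retaining request data at all.

```python
from collections import OrderedDict


class BoundedRequestCache:
    """Hypothetical cache that evicts oldest entries once a limit is exceeded."""

    def __init__(self, max_entries=100, max_bytes=50 * 1024 * 1024):
        self._items = OrderedDict()
        self._max_entries = max_entries
        self._max_bytes = max_bytes
        self._total_bytes = 0

    def add(self, key, payload: bytes) -> bool:
        """Store payload under key; return False if it can never fit."""
        if len(payload) > self._max_bytes:
            return False
        if key in self._items:
            # Replacing an entry: drop the old payload's size first.
            self._total_bytes -= len(self._items.pop(key))
        self._items[key] = payload
        self._total_bytes += len(payload)
        # Evict oldest entries until both limits hold again.
        while len(self._items) > self._max_entries or self._total_bytes > self._max_bytes:
            _, evicted = self._items.popitem(last=False)
            self._total_bytes -= len(evicted)
        return True

    @property
    def total_bytes(self) -> int:
        return self._total_bytes

    def __len__(self) -> int:
        return len(self._items)


# Small numbers keep the example readable: a 25-byte budget, 10-byte payloads.
cache = BoundedRequestCache(max_entries=3, max_bytes=25)
for request_id in range(4):
    cache.add(request_id, b"x" * 10)

print(len(cache), cache.total_bytes)  # 2 20
```

With the limits in place, retained memory plateaus at the configured budget instead of growing with every request, which is the behavior the log evidence shows is currently missing.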
Action Items
| # | Action | Priority |
|---|--------|----------|
| 1 | Patch cart-path memory retention bug | High |
| 2 | Add repeated cart-POST memory regression test | High |
| 3 | Review memory autoscale guardrails | Medium |
| 4 | Add alert for cart cache-growth signature | Medium |
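
Action item 2 could take roughly the following form. This is a sketch only: both handlers are hypothetical stand-ins (the real handler, its signature, and the `itemId` field are assumptions), and a 1 MiB buffer stands in for the 10 MB per request seen in the logs. It uses Python's standard `tracemalloc` module to measure how much memory stays allocated after repeated calls.

```python
import tracemalloc

PAYLOAD_BYTES = 1024 * 1024  # 1 MiB stand-in for the 10 MB seen in the logs

_retained = []  # module-level list that outlives every request


def leaky_add_to_cart(user_id, item):
    """Hypothetical handler reproducing the bug: one buffer is kept per request."""
    _retained.append(bytearray(PAYLOAD_BYTES))
    return 200


def fixed_add_to_cart(user_id, item):
    """Hypothetical fixed handler: the buffer is released when the call returns."""
    buffer = bytearray(PAYLOAD_BYTES)  # local only, freed on return
    return 200


def retained_growth_bytes(handler, requests=20):
    """Return how many traced bytes remain allocated after repeated calls."""
    tracemalloc.start()
    try:
        before, _peak = tracemalloc.get_traced_memory()
        for i in range(requests):
            handler("demo-user", {"itemId": i})
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


def test_cart_post_does_not_retain_memory():
    # Allow less than one payload of growth across all requests.
    assert retained_growth_bytes(fixed_add_to_cart) < PAYLOAD_BYTES


test_cart_post_does_not_retain_memory()
```

Running the same measurement against `leaky_add_to_cart` shows growth of at least one payload per request, so a test of this kind would fail before the patch and pass after it.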
References
- Container App: /subscriptions/06dbbc7b-2363-4dd4-9803-95d07f1a8d3e/resourceGroups/rg-grubify-app/providers/Microsoft.App/containerapps/ca-grubify-api
- Log Analytics Workspace ID: bd41ac04-55df-4ef8-b157-4aebd5cd76d5
- App Insights: /subscriptions/06dbbc7b-2363-4dd4-9803-95d07f1a8d3e/resourceGroups/rg-grubify-sre/providers/Microsoft.Insights/components/appi-sre-grubify
This issue was created by sre-agent-cff6qws2yy4ku--163d1e9d