Incident Report: HTTP 5xx due to OutOfMemory in Cart API
- Incident ID: c319f08a-bd84-4c4b-99c0-b63cff7bf000
- Service: ca-grubify-api (Azure Container Apps, rg: rg-grubify-app)
- Subscription: 06dbbc7b-2363-4dd4-9803-95d07f1a8d3e
- FQDN: ca-grubify-api.politecliff-89094031.swedencentral.azurecontainerapps.io
- Active revision: ca-grubify-api--0000001 (100% traffic)
Summary
A Sev2 Azure Monitor metric alert fired for Grubify API 5xx responses. During the alert window, platform metrics showed a sharp burst of 5xx requests, and revision logs showed repeated unhandled System.OutOfMemoryException in CartController.AddItemToCart (/app/Controllers/CartController.cs:line 30). Immediate stabilization actions were executed (revision restart, then scale-up), and endpoint verification showed recovery.
Impact
- User-facing API failures (HTTP 5xx) during the spike window.
- Failed cart item operations on POST /api/cart/{userId}/items for affected requests.
- Short-lived service instability while containers recycled after memory pressure.
Timeline (UTC)
- ~08:53: 5xx spike starts; alert condition breached (Requests{statusCodeCategory=5xx} > threshold).
- 08:55:51: Azure Monitor alert fired (alert-http-5xx-grubify, Sev2).
- ~08:56-08:58: Repeated OutOfMemoryException seen in active revision logs; readiness failures observed around recycle events.
- ~09:01: Revision restart executed for active revision.
- ~09:04-09:06: Defensive scale change applied; new revision ca-grubify-api--0000001 became active with increased resources/replicas.
- ~09:06-09:07: Verification checks show cart endpoint returning HTTP 200 (20/20 successful test calls).
Evidence
Console logs (active revision)
fail: Microsoft.AspNetCore.Server.Kestrel[13]
Connection id "...", Request id "...": An unhandled exception was thrown by the application.
System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
at GrubifyApi.Controllers.CartController.AddItemToCart(String userId, AddCartItemRequest request) in /app/Controllers/CartController.cs:line 30
Additional observed lines (slash-separated values appeared in successive log entries):
readiness probe failed: connection refused
Analytics cache: Added request data. Total entries: 1/2/3/4
Cache size: 10MB/20MB/30MB/40MB
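The analytics-cache lines suggest that each added entry retains a fixed ~10 MB and that nothing is ever evicted. The minimal Python sketch below checks for that pattern in exported log lines. The exact line format is an assumption inferred from the condensed excerpt. The 2048 MB budget in the example is a hypothetical figure that roughly matches the 2Gi allocation applied during mitigation, and it ignores runtime overhead.

```python
import re

# Assumed line format, inferred from the condensed excerpt: "Cache size: 10MB".
CACHE_SIZE_RE = re.compile(r"Cache size:\s*(\d+)\s*MB")


def cache_sizes_mb(lines):
    """Return the cache sizes (MB) reported by 'Cache size' log lines, in log order."""
    sizes = []
    for line in lines:
        match = CACHE_SIZE_RE.search(line)
        if match:
            sizes.append(int(match.group(1)))
    return sizes


def growth_per_entry_mb(sizes):
    """Return the size increase between each pair of consecutive readings."""
    return [later - earlier for earlier, later in zip(sizes, sizes[1:])]


def grows_monotonically(sizes):
    """True if every reading is larger than the previous one (no eviction visible)."""
    steps = growth_per_entry_mb(sizes)
    return len(steps) > 0 and all(step > 0 for step in steps)


def entries_until_budget(step_mb, budget_mb):
    """Rough count of entries that fit in budget_mb at step_mb each (ignores runtime overhead)."""
    return budget_mb // step_mb


# Illustrative lines modeled on the excerpt above, not copied from raw logs.
sample = [
    "Analytics cache: Added request data. Total entries: 1",
    "Cache size: 10MB",
    "Cache size: 20MB",
    "Cache size: 30MB",
    "Cache size: 40MB",
]
sizes = cache_sizes_mb(sample)
print(growth_per_entry_mb(sizes))      # [10, 10, 10]
print(grows_monotonically(sizes))      # True
print(entries_until_budget(10, 2048))  # 204 (hypothetical 2048 MB budget)
```

Suppose the ~10 MB-per-entry pattern holds. Then even the raised allocation holds only on the order of two hundred entries per replica. That is consistent with treating the scale-up as a stopgap rather than a fix.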
Traffic and Response Time
Evidence chart generated from the investigation window (Requests, 5xx Requests, and ResponseTime, with alert marker):
- Local artifact generated by the investigation: grubify-incident-2026-05-03-evidence.png
Metrics snapshot (Azure Monitor)
- Requests (5xx, 1-minute bins):
  - 08:53 = 31
  - 08:54 = 80 (peak)
  - 08:55 = 16
  - 08:58 = 2
  - 09:00+ = 0
- ResponseTime average: 08:36 = 47 ms, 08:59 = 46 ms; most data points in the spike window are low or sparse because requests were failing.
- CPU utilization: low; peak ~2.75% during the traffic burst.
- Memory utilization: low at the platform level (~4.5% peak), yet the app still threw OutOfMemoryException in the request path. This points to a managed-runtime allocation failure rather than container-level memory exhaustion.
- Alert context metric value at fire: 31 for the 5xx criterion.
Root Cause
Application-level memory exhaustion in the cart API request path. Repeated unhandled System.OutOfMemoryException occurred in CartController.AddItemToCart (/app/Controllers/CartController.cs:line 30). This caused request failures (5xx) and readiness-probe failures while containers recycled.
Remediation
- Code: Refactor cart analytics/data retention in AddItemToCart to prevent unbounded memory growth (bounded cache with eviction, or an external store).
- Defensive: Add payload size limits and endpoint throttling for POST /api/cart/{userId}/items.
- Platform: Immediate mitigation executed:
  - Restarted the active revision.
  - Scaled the app to a higher baseline capacity (new revision ca-grubify-api--0000001; resources increased to 1 vCPU / 2Gi, replicas increased).
- Observability: Add alerting on the OutOfMemoryException signature in container logs and on endpoint-specific 5xx for the cart route.
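The "bounded cache/eviction" option in the Code item can be illustrated with a least-recently-used cache. The cache tracks the total bytes of its payloads and evicts the oldest entries once a budget is exceeded.

This is a minimal Python sketch, not the service's implementation. The class name, the byte budget, and the idea of keying entries by request are all hypothetical, and the real fix would live in the .NET CartController. In .NET, `MemoryCache` configured with `MemoryCacheOptions.SizeLimit` (with an explicit size on each entry) is an existing building block for the same behavior.

```python
from collections import OrderedDict


class BoundedAnalyticsCache:
    """Hypothetical LRU cache that evicts the oldest entries once max_bytes is exceeded."""

    def __init__(self, max_bytes):
        if max_bytes <= 0:
            raise ValueError("max_bytes must be positive")
        self.max_bytes = max_bytes
        self.current_bytes = 0
        self._entries = OrderedDict()  # key -> payload bytes, oldest first

    def put(self, key, payload):
        size = len(payload)
        if size > self.max_bytes:
            # Reject a single entry larger than the whole budget instead of flushing everything.
            raise ValueError("entry larger than cache budget")
        if key in self._entries:
            # Replacing an existing key: release the old payload's size first.
            self.current_bytes -= len(self._entries.pop(key))
        self._entries[key] = payload
        self.current_bytes += size
        # Evict least-recently-used entries until back under budget.
        while self.current_bytes > self.max_bytes:
            _, evicted = self._entries.popitem(last=False)
            self.current_bytes -= len(evicted)

    def get(self, key):
        payload = self._entries.get(key)
        if payload is not None:
            self._entries.move_to_end(key)  # mark as most recently used
        return payload

    def __len__(self):
        return len(self._entries)


cache = BoundedAnalyticsCache(max_bytes=30 * 1024 * 1024)  # hypothetical 30 MB budget
for i in range(10):
    cache.put(f"request-{i}", bytes(10 * 1024 * 1024))  # ~10 MB per entry, as in the logs
print(len(cache), cache.current_bytes)  # 3 31457280
```

A per-replica byte budget keeps retained memory flat no matter how much traffic arrives. The trade-off is that evicted analytics data is lost, which is why an external store is listed as the alternative.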
Action Items
| # | Action | Priority |
|---|--------|----------|
| 1 | Patch CartController.AddItemToCart to remove unbounded memory retention and add bounded strategy | High |
| 2 | Add regression/load test for sustained cart POST traffic and memory growth behavior | High |
| 3 | Keep temporary higher baseline scaling until code fix is deployed and validated | Medium |
| 4 | Add log-based OOM alert + cart endpoint 5xx alert dimensioning | Medium |
| 5 | Validate rollback criteria and document safe fallback scaling profile | Medium |
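Action item 2 could start as an in-process check before a full load test against the deployed endpoint. The minimal Python sketch below sends many simulated requests and fails if the memory retained after the run exceeds a budget. It assumes the handler can be called directly with a user ID and a payload; both handlers shown are hypothetical stand-ins for AddItemToCart.

`tracemalloc` only sees Python allocations. Against the .NET service, the same idea would more likely use a load generator plus the container's memory metrics.

```python
import tracemalloc
from collections import deque


def retained_bytes_after(handle_request, n_requests, payload_size):
    """Drive n_requests simulated cart POSTs and return traced memory still held afterwards."""
    tracemalloc.start()
    try:
        baseline, _ = tracemalloc.get_traced_memory()
        for i in range(n_requests):
            handle_request(f"user-{i % 50}", b"x" * payload_size)
        current, _ = tracemalloc.get_traced_memory()
        return current - baseline
    finally:
        tracemalloc.stop()


def assert_memory_bounded(handle_request, n_requests, payload_size, budget_bytes):
    """Raise AssertionError if retained memory after sustained traffic exceeds budget_bytes."""
    retained = retained_bytes_after(handle_request, n_requests, payload_size)
    if retained > budget_bytes:
        raise AssertionError(f"retained {retained} bytes exceeds budget {budget_bytes}")
    return retained


def make_leaky_handler():
    """Hypothetical handler mimicking the suspected bug: every payload is kept forever."""
    store = []
    return lambda user_id, payload: store.append(payload)


def make_bounded_handler(max_items):
    """Hypothetical fixed handler: only the most recent max_items payloads are kept."""
    store = deque(maxlen=max_items)
    return lambda user_id, payload: store.append(payload)


bounded = make_bounded_handler(max_items=10)
assert_memory_bounded(bounded, 1_000, 10_000, budget_bytes=1_000_000)  # passes

try:
    assert_memory_bounded(make_leaky_handler(), 1_000, 10_000, budget_bytes=1_000_000)
except AssertionError as err:
    print("leak detected:", err)
```

The explicit `raise` (rather than a bare `assert`) keeps the check active even when Python runs with `-O`.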
References
- Container App: /subscriptions/06dbbc7b-2363-4dd4-9803-95d07f1a8d3e/resourceGroups/rg-grubify-app/providers/Microsoft.App/containerApps/ca-grubify-api
- Alert Rule: /subscriptions/06dbbc7b-2363-4dd4-9803-95d07f1a8d3e/resourceGroups/rg-grubify-sre/providers/Microsoft.Insights/metricAlerts/alert-http-5xx-grubify
- Alert Resource: /subscriptions/06dbbc7b-2363-4dd4-9803-95d07f1a8d3e/resourcegroups/rg-grubify-app/providers/microsoft.app/containerapps/ca-grubify-api/providers/Microsoft.AlertsManagement/alerts/c319f08a-bd84-4c4b-99c0-b63cff7bf000
- Log Analytics Workspace ID (GUID): bd41ac04-55df-4ef8-b157-4aebd5cd76d5
- Log Analytics Workspace ARM ID: /subscriptions/06dbbc7b-2363-4dd4-9803-95d07f1a8d3e/resourceGroups/rg-grubify-app/providers/Microsoft.OperationalInsights/workspaces/cae-grubify-logs
- App Insights: /subscriptions/06dbbc7b-2363-4dd4-9803-95d07f1a8d3e/resourceGroups/rg-grubify-lab/providers/microsoft.insights/components/appi-cff6qws2yy4ku
This issue was created by sre-agent-cff6qws2yy4ku--163d1e9d
Tracked by the SRE agent here