Incident Report: HTTP 5xx due to OutOfMemoryException in Cart API
- Incident ID: 1802ec48-6466-4df5-bee5-b0345a7df000
- Service: Azure Container Apps, ca-grubify-api (rg: rg-grubify-app)
- Subscription: 06dbbc7b-2363-4dd4-9803-95d07f1a8d3e
- FQDN: ca-grubify-api.politecliff-89094031.swedencentral.azurecontainerapps.io
- Active revision: ca-grubify-api--0000001 (100% traffic)
Summary
Azure Monitor fired alert-http-5xx-grubify at 2026-05-03T10:28:53Z for sustained 5xx responses on ca-grubify-api. Runtime logs show repeated unhandled System.OutOfMemoryException in CartController.AddItemToCart during the same window. Immediate mitigation was applied by restarting the active revision; live endpoint probes then returned HTTP 200.
Impact
- User-facing API failures (HTTP 5xx) during the incident window, approximately 10:26–10:32 UTC (from the first logged OOM exceptions to the revision restart).
- Cart-related operations were at elevated risk of failure while OOM exceptions were being thrown.
Timeline (UTC)
- ~10:22–10:27: Traffic ramp observed (43, 82, 81, 65, 80, 62 req/min).
- ~10:26–10:31: Repeated System.OutOfMemoryException and unhandled request failures in app logs.
- 10:28:53: Sev2 Azure Monitor alert fired (alert-http-5xx-grubify).
- ~10:31–10:32: Active revision restart executed (ca-grubify-api--0000001).
- ~10:32–10:33: Endpoint validation succeeded (/weatherforecast, /api/restaurants, /api/fooditems, /api/cart/demo-user/items all HTTP 200).
Evidence
Console logs (active revision)
fail: Microsoft.AspNetCore.Server.Kestrel[13]
Connection id "0HNL8UCA95GM9", Request id "0HNL8UCA95GM9:00000002": An unhandled exception was thrown by the application.
System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
at GrubifyApi.Controllers.CartController.AddItemToCart(String userId, AddCartItemRequest request) in /app/Controllers/CartController.cs:line 30
Additional repeated signals in the same window:
readiness probe failed: connection refused
Stopping container grubify-api
Application is shutting down...
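The log excerpt above can be triaged mechanically. The following is a minimal Python sketch, not tooling used during this incident: it scans console log lines for System.OutOfMemoryException and extracts the first stack frame that follows. The function name find_oom_frames and the regular expression are illustrative assumptions; the sample lines are copied from the excerpt above.

```python
import re

# Matches a .NET stack frame such as:
#   at Namespace.Type.Method(Args) in /path/File.cs:line 30
FRAME_RE = re.compile(
    r"at (?P<method>[\w.]+)\((?P<args>[^)]*)\) in (?P<file>\S+):line (?P<line>\d+)"
)


def find_oom_frames(log_lines):
    """Return (method, file, line) for the first frame after each OOM message."""
    hits = []
    awaiting_frame = False
    for raw in log_lines:
        text = raw.strip()
        if "System.OutOfMemoryException" in text and not text.startswith("at "):
            # Exception message line; the next line should be the top frame.
            awaiting_frame = True
            continue
        if awaiting_frame:
            match = FRAME_RE.search(text)
            if match:
                hits.append((match["method"], match["file"], int(match["line"])))
            awaiting_frame = False
    return hits


# Sample lines taken from the console log excerpt in this report.
SAMPLE_LOG = [
    "fail: Microsoft.AspNetCore.Server.Kestrel[13]",
    'Connection id "0HNL8UCA95GM9", Request id "0HNL8UCA95GM9:00000002": '
    "An unhandled exception was thrown by the application.",
    "System.OutOfMemoryException: Exception of type "
    "'System.OutOfMemoryException' was thrown.",
    "at GrubifyApi.Controllers.CartController.AddItemToCart(String userId, "
    "AddCartItemRequest request) in /app/Controllers/CartController.cs:line 30",
]

frames = find_oom_frames(SAMPLE_LOG)
```

On the sample lines, frames holds a single entry pointing at CartController.AddItemToCart, /app/Controllers/CartController.cs, line 30, which matches the location cited under Root Cause.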
Traffic and Response Time
An investigation chart was generated for the incident window. It shows a request burst, a drop to zero requests at alert time, and stable behavior after mitigation.
Key plotted points (UTC, req/min):
- 10:22=43, 10:23=82, 10:24=81, 10:25=65, 10:26=80, 10:27=62, 10:28=0, 10:29=0, 10:30=0
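To put the plotted points in context, the short Python sketch below summarizes the burst: total requests, mean rate over the active minutes, the peak minute, and the first minute with zero requests. The values are the plotted points listed above; the function name summarize is an illustrative assumption.

```python
# Requests per minute (UTC), copied from the plotted points in this report.
requests_per_min = {
    "10:22": 43, "10:23": 82, "10:24": 81, "10:25": 65, "10:26": 80,
    "10:27": 62, "10:28": 0, "10:29": 0, "10:30": 0,
}


def summarize(series):
    """Summarize a per-minute request series (insertion order = time order)."""
    active = {minute: count for minute, count in series.items() if count > 0}
    total = sum(active.values())
    peak_minute = max(active, key=active.get)
    first_zero = next((m for m, c in series.items() if c == 0), None)
    return {
        "total_requests": total,
        "mean_per_active_min": round(total / len(active), 1),
        "peak": (peak_minute, active[peak_minute]),
        "first_zero_minute": first_zero,
    }


summary = summarize(requests_per_min)
```

For this series the summary is 413 requests over six active minutes (about 68.8 req/min on average), a peak of 82 req/min at 10:23, and the first zero-request minute at 10:28, the same minute the alert fired.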
Metrics snapshot (Azure Monitor)
- Requests (1m bins): 43, 82, 81, 65, 80, 62 (10:22–10:27), then 0 at 10:28+
- ResponseTime avg: elevated during incident window (alert fired on 5xx rule context)
- RestartCount: not reliably returned via CLI in this run; runtime logs confirm restart activity around mitigation
- MemoryPercentage: ~2% average in sampled post-window points
- UsageNanoCores: low/near-idle in sampled post-window points
Endpoint validation after mitigation:
GET /weatherforecast → 200
GET /api/restaurants → 200
GET /api/fooditems → 200
POST /api/cart/demo-user/items → 200
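The validation above can be scripted so it is repeatable after future mitigations. The following Python sketch is one way to do that using only the standard library; it is not the probe the SRE agent actually ran. The paths and FQDN come from this report. CART_PAYLOAD is hypothetical, because the AddCartItemRequest schema is not given here, and the function names are illustrative.

```python
import json
import urllib.error
import urllib.request

BASE_URL = (
    "https://ca-grubify-api.politecliff-89094031"
    ".swedencentral.azurecontainerapps.io"
)

# Hypothetical body: the real AddCartItemRequest fields are not in the report.
CART_PAYLOAD = {"foodItemId": 1, "quantity": 1}

PROBES = [
    ("GET", "/weatherforecast", None),
    ("GET", "/api/restaurants", None),
    ("GET", "/api/fooditems", None),
    ("POST", "/api/cart/demo-user/items", CART_PAYLOAD),
]


def http_status(method, url, body=None, timeout=10):
    """Send one request and return the HTTP status, or None if unreachable."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses still carry a status code
    except urllib.error.URLError:
        return None  # e.g. connection refused while the container restarts


def validate(base_url, probes, send=http_status):
    """Run all probes; return (all results, subset that did not return 200)."""
    results = {}
    for method, path, body in probes:
        results[f"{method} {path}"] = send(method, base_url + path, body)
    failed = {name: code for name, code in results.items() if code != 200}
    return results, failed
```

Calling validate(BASE_URL, PROBES) sends the four requests against the live endpoint; an empty failed dictionary corresponds to the all-200 result recorded above. The send parameter exists so the check can be exercised with a stub, without touching the production endpoint.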
Root Cause
Application-level memory exhaustion in the cart code path: repeated System.OutOfMemoryException in GrubifyApi.Controllers.CartController.AddItemToCart (/app/Controllers/CartController.cs:line 30) caused unhandled exceptions and 5xx errors under active traffic.
Remediation
- Code: Remove unbounded per-request memory retention in CartController.AddItemToCart; implement bounded cache or persistent storage.
- Defensive: Add request throttling/rate limits specifically on cart write endpoint and enforce payload constraints.
- Platform: Keep higher baseline capacity for resilience (current app configured at 2Gi / min replicas 2, max 4) and tune autoscale rules for burst traffic.
- Observability: Add explicit OOM/log-based alerting and endpoint-specific SLO monitors for cart operations.
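To illustrate the "bounded cache" option in the Code item, here is a minimal Python sketch of a cart store with two hard limits: a cap on items per cart and a cap on the number of carts held in memory, with least-recently-used eviction. It illustrates the pattern only and is not the service's C# implementation; the class name BoundedCartStore and both default limits are assumptions chosen for the example.

```python
from collections import OrderedDict


class BoundedCartStore:
    """In-memory cart store whose memory use is capped by two limits."""

    def __init__(self, max_users=10_000, max_items_per_cart=100):
        self.max_users = max_users
        self.max_items_per_cart = max_items_per_cart
        self._carts = OrderedDict()  # user_id -> list of items, oldest first

    def add_item(self, user_id, item):
        """Append an item; reject it if the cart is already at its limit."""
        cart = self._carts.setdefault(user_id, [])
        if len(cart) >= self.max_items_per_cart:
            raise ValueError("cart item limit reached")
        cart.append(item)
        self._carts.move_to_end(user_id)  # mark as most recently used
        while len(self._carts) > self.max_users:
            self._carts.popitem(last=False)  # evict least recently used cart
        return len(cart)

    def items(self, user_id):
        """Return a copy of the user's cart (empty if unknown or evicted)."""
        return list(self._carts.get(user_id, ()))

    def __len__(self):
        return len(self._carts)
```

A rejected write would map naturally to an HTTP 4xx response instead of an unhandled exception. Eviction discards cart data, so an in-memory cap of this kind suits a cache in front of persistent storage more than a system of record.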
Action Items
| # | Action | Priority |
|---|--------|----------|
| 1 | Patch CartController to eliminate unbounded memory growth and deploy fixed image | High |
| 2 | Add regression/load test that repeatedly posts to cart endpoint and asserts stable memory | High |
| 3 | Add autoscale rule and guardrails for cart traffic bursts | Medium |
| 4 | Add log-based alert for OutOfMemoryException with incident correlation to 5xx alert | Medium |
| 5 | Add Confirm alert auto-resolves and document closeout steps in runbook | Low |
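Action item 2 asks for a test that asserts stable memory under repeated cart writes. The Python sketch below shows the shape of such a check as an in-process analogue: it calls a handler many times and uses tracemalloc to measure how much memory is still retained afterward. The real test would post to the cart endpoint and read the container's memory metrics; both handlers here are hypothetical stand-ins, and the 1 KiB payload and iteration counts are arbitrary.

```python
import tracemalloc
from collections import deque


def retained_growth(handler, iterations=5000, warmup=500):
    """Bytes still allocated after `iterations` calls, measured after warmup."""
    for i in range(warmup):
        handler(i)
    started_here = not tracemalloc.is_tracing()
    if started_here:
        tracemalloc.start()
    baseline, _ = tracemalloc.get_traced_memory()
    for i in range(iterations):
        handler(warmup + i)
    current, _ = tracemalloc.get_traced_memory()
    if started_here:
        tracemalloc.stop()
    return current - baseline


def make_leaky_handler():
    """Hypothetical handler that retains 1 KiB per request forever."""
    retained = []

    def handler(request_id):
        retained.append(bytes(1024))

    return handler


def make_bounded_handler(max_entries=100):
    """Hypothetical handler that keeps only the most recent entries."""
    retained = deque(maxlen=max_entries)

    def handler(request_id):
        retained.append(bytes(1024))

    return handler


leaky_growth = retained_growth(make_leaky_handler())
bounded_growth = retained_growth(make_bounded_handler())
```

With these settings the leaky handler retains roughly 5 MB after 5000 calls, while the bounded handler stays near 100 KiB however many calls are made. A regression test would assert that growth remains under a fixed budget, for example bounded_growth < 256_000.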
References
- Container App: /subscriptions/06dbbc7b-2363-4dd4-9803-95d07f1a8d3e/resourceGroups/rg-grubify-app/providers/Microsoft.App/containerApps/ca-grubify-api
- Log Analytics Workspace ID: bd41ac04-55df-4ef8-b157-4aebd5cd76d5
- Log Analytics Workspace ARM ID: /subscriptions/06dbbc7b-2363-4dd4-9803-95d07f1a8d3e/resourceGroups/rg-grubify-lab/providers/Microsoft.OperationalInsights/workspaces/cae-grubify-logs
- App Insights: /subscriptions/06dbbc7b-2363-4dd4-9803-95d07f1a8d3e/resourceGroups/rg-grubify-lab/providers/Microsoft.Insights/components/appi-cff6qws2yy4ku
- Alert ARM ID: /subscriptions/06dbbc7b-2363-4dd4-9803-95d07f1a8d3e/resourcegroups/rg-grubify-app/providers/microsoft.app/containerapps/ca-grubify-api/providers/Microsoft.AlertsManagement/alerts/1802ec48-6466-4df5-bee5-b0345a7df000
This issue was created by sre-agent-cff6qws2yy4ku--163d1e9d