Incident Report: HTTP 5xx due to OutOfMemory in Cart API
- Incident ID: 7b758a0e-de46-47af-8009-60a3134cf000
- Service: Azure Container Apps, ca-grubify-api (rg: rg-grubify-app)
- Subscription: 06dbbc7b-2363-4dd4-9803-95d07f1a8d3e
- FQDN: ca-grubify-api.politecliff-89094031.swedencentral.azurecontainerapps.io
- Active revision: ca-grubify-api--0000001 (100% traffic)
Summary
A Sev2 metric alert fired for API 5xx on ca-grubify-api. Runtime logs showed repeated unhandled System.OutOfMemoryException in CartController.AddItemToCart plus cache-growth messages. Immediate mitigation (active revision restart and scale/resource increase) stabilized the app and post-mitigation 5xx dropped to 0.
Impact
- User-facing cart/API failures (HTTP 5xx) during the burst window.
- Failed cart add-item operations while the process was under memory pressure.
Timeline (UTC)
- ~08:53: 5xx burst starts (31, then 80 in 1-minute bins).
- 08:56:03: Alert fired (alert-http-5xx-grubify-api, Sev2).
- ~09:01: Active revision restart executed.
- ~09:04: Defensive capacity update applied (minReplicas=2, maxReplicas=4, cpu=1, memory=2Gi).
- 09:00–09:09: 5xx metric remained 0 in all sampled minutes.
Evidence
Console logs (active revision)
fail: Microsoft.AspNetCore.Server.Kestrel[13]
Connection id "...", Request id "...": An unhandled exception was thrown by the application.
System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
at GrubifyApi.Controllers.CartController.AddItemToCart(String userId, AddCartItemRequest request) in /app/Controllers/CartController.cs:line 30
Analytics cache: Added request data. Total entries: 1/2/3/4
Cache size: 10MB/20MB/30MB/40MB
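The two signatures in this excerpt (the OutOfMemoryException line and the growing "Cache size" values) are what a log-based check would need to recognize. The following is a minimal sketch in Python, used for illustration only since the service itself is .NET. It assumes the console logs are available as plain text lines with one cache-size value per line; the function names are hypothetical and not part of the service or of any Azure tooling.

```python
import re

# Matches the exception type seen in the console logs.
OOM_PATTERN = re.compile(r"System\.OutOfMemoryException")
# Matches the cache-growth message; one "<n>MB" value per line is an assumption.
CACHE_SIZE_PATTERN = re.compile(r"Cache size:\s*(\d+)\s*MB")


def summarize_logs(lines):
    """Count lines mentioning an OOM exception and collect cache sizes in MB."""
    oom_count = 0
    cache_sizes_mb = []
    for line in lines:
        if OOM_PATTERN.search(line):
            oom_count += 1
        match = CACHE_SIZE_PATTERN.search(line)
        if match:
            cache_sizes_mb.append(int(match.group(1)))
    return oom_count, cache_sizes_mb


def cache_is_growing(cache_sizes_mb):
    """True when every reported size is strictly larger than the previous one."""
    pairs = zip(cache_sizes_mb, cache_sizes_mb[1:])
    return len(cache_sizes_mb) >= 2 and all(later > earlier for earlier, later in pairs)
```

A check along these lines flags both conditions independently, so steady cache growth can be caught before the first exception appears.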
Traffic and Response Time
Evidence chart generated during investigation (requests 5xx and response time over incident window): grubify-api-5xx-incident-evidence-2026-05-03.png.
Metrics snapshot (Azure Monitor)
- Requests (5m/1m bins): spike at 08:53=31, 08:54=80, 08:55=16, 08:58=2; then 09:00–09:09 all 0.
- ResponseTime avg: peak observed 46 ms at 08:59.
- RestartCount: 0 across post-mitigation validation window.
- MemoryPercentage: around 4–5% during spike; lower/stable after capacity change.
- UsageNanoCores: peak observed 238,068,716 (~0.238 core).
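As a quick cross-check of the snapshot figures, the arithmetic can be reproduced in a few lines of Python. This is only a sketch over the values quoted in this report: the total covers just the bins listed here, and the variable names are illustrative.

```python
# 5xx counts per 1-minute bin, copied from the metrics snapshot.
errors_per_minute = {"08:53": 31, "08:54": 80, "08:55": 16, "08:58": 2}

# Total across the listed bins only; unlisted minutes are not included.
total_5xx = sum(errors_per_minute.values())
peak_minute = max(errors_per_minute, key=errors_per_minute.get)

# UsageNanoCores is expressed in billionths of a core.
NANOCORES_PER_CORE = 1_000_000_000
peak_cores = 238_068_716 / NANOCORES_PER_CORE

print(total_5xx)             # 129
print(peak_minute)           # 08:54
print(round(peak_cores, 3))  # 0.238
```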
Root Cause
Application memory exhaustion in cart request handling. Repeated unhandled OutOfMemoryException in CartController.AddItemToCart (/app/Controllers/CartController.cs:line 30) caused request failures and the observed 5xx burst.
Remediation
- Code: Remove unbounded in-memory retention in cart analytics/request path; enforce bounded structures.
- Defensive: Add cart endpoint rate limiting and request/payload bounds.
- Platform: Restarted active revision; increased to cpu=1, memory=2Gi, minReplicas=2, maxReplicas=4.
- Observability: Add explicit log-based OOM alert and cache-growth alerting.
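To illustrate what "bounded structures" in the code remediation could mean, here is a minimal sketch of a cache that evicts its oldest entry once a fixed cap is reached. It is written in Python for illustration only; the service is .NET, and the actual fix in CartController may look quite different. The class name, the entry cap, and the oldest-first eviction policy are all assumptions, not details taken from the codebase.

```python
from collections import OrderedDict


class BoundedCache:
    """Holds at most max_entries items, evicting the oldest insertion first."""

    def __init__(self, max_entries):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._max_entries = max_entries
        self._items = OrderedDict()

    def add(self, key, value):
        # Re-adding an existing key moves it to the newest position.
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        # Evict from the oldest end until the cap is respected.
        while len(self._items) > self._max_entries:
            self._items.popitem(last=False)

    def __len__(self):
        return len(self._items)

    def __contains__(self, key):
        return key in self._items
```

With a cap in place, memory held by the cache is bounded by the cap times the largest entry size, so a sustained burst of cart writes can no longer grow it without limit. A cap on entry count alone does not bound memory if individual entries are unbounded, which is why the payload bounds in the defensive item matter as well.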
Action Items
| # | Action | Priority |
| 1 | Patch CartController.AddItemToCart to remove unbounded memory growth | High |
| 2 | Add sustained cart-write load test for OOM regression prevention | Medium |
| 3 | Keep elevated resource settings until code fix is deployed/validated | Medium |
| 4 | Add OOM + cart endpoint SLO alerts | Low |
References
- Container App: /subscriptions/06dbbc7b-2363-4dd4-9803-95d07f1a8d3e/resourceGroups/rg-grubify-app/providers/Microsoft.App/containerApps/ca-grubify-api
- Log Analytics Workspace ID: bd41ac04-55df-4ef8-b157-4aebd5cd76d5
- App Insights: /subscriptions/06dbbc7b-2363-4dd4-9803-95d07f1a8d3e/resourceGroups/rg-grubify-sre/providers/Microsoft.Insights/components/appi-sre-grubify
This issue was created by sre-agent-cff6qws2yy4ku--163d1e9d
Tracked by the SRE agent here