
Incident: HTTP 5xx due to OutOfMemory in Cart API (Grubify Container App) #100

@gderossilive


Incident Report: HTTP 5xx due to OutOfMemory in Cart API

  • Incident ID: 7be188e9-9fe9-458e-8ff0-53cfbc11f000
  • Service: Azure Container Apps — ca-grubify-cff6qws2yy4ku (rg: rg-grubify-lab)
  • Subscription: 06dbbc7b-2363-4dd4-9803-95d07f1a8d3e
  • FQDN: ca-grubify-cff6qws2yy4ku.politecliff-89094031.swedencentral.azurecontainerapps.io
  • Active revision: ca-grubify-cff6qws2yy4ku--0000007 (100% traffic)

Summary

A Sev2 Azure Monitor alert fired for HTTP 5xx responses on the Grubify backend Container App. During the alert window, the logs show repeated System.OutOfMemoryException errors in CartController.AddItemToCart, and Azure Monitor reported a 5xx spike above the alert threshold. The service is healthy at the time of this report, but the failure mode is consistent with the known memory-leak path in the cart code.

Impact

  • API errors (HTTP 5xx) on cart operations during the spike window.
  • Failed cart updates for affected requests.

Timeline (UTC)

  • ~08:31: 5xx spike observed (Requests with statusCodeCategory=5xx: 70, then 55 per 5-minute bin).
  • ~08:31: Repeated System.OutOfMemoryException logged in CartController.AddItemToCart (/app/Controllers/CartController.cs:line 30).
  • 08:33:28: Azure Monitor alert alert-http-5xx-grubify-lab fired (Sev2).
  • ~08:39: MemoryPercentage peaked at 99%.

Evidence

Console logs (active revision)

2026-05-03T08:31:34.7334545Z ca-grubify-cff6qws2yy4ku--0000007 grubify-api
System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
at GrubifyApi.Controllers.CartController.AddItemToCart(String userId, AddCartItemRequest request) in /app/Controllers/CartController.cs:line 30

2026-05-03T08:37:46.8014159Z ca-grubify-cff6qws2yy4ku--0000007 grubify-api
System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
at GrubifyApi.Controllers.CartController.AddItemToCart(String userId, AddCartItemRequest request) in /app/Controllers/CartController.cs:line 30
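To support the OOM-specific log alert listed under Remediation, the records above can be counted per revision. The sketch below is a minimal illustration, assuming console logs exported as plain text in the layout shown (a "timestamp revision container" header line followed by the exception and stack frame); the function name count_oom is hypothetical, and in practice the same filter would more likely run as a Log Analytics query.

```python
import re
from collections import defaultdict

# Header line of each console log record, e.g.
# "2026-05-03T08:31:34.7334545Z ca-grubify-cff6qws2yy4ku--0000007 grubify-api"
HEADER = re.compile(r"^(?P<ts>\d{4}-\d{2}-\d{2}T[\d:.]+Z)\s+(?P<revision>\S+)\s+(?P<container>\S+)$")


def count_oom(log_text: str) -> dict[str, list[str]]:
    """Map each revision to the timestamps of records containing an OutOfMemoryException."""
    hits: dict[str, list[str]] = defaultdict(list)
    current = None  # (timestamp, revision) of the record being read
    for raw in log_text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            current = (match["ts"], match["revision"])
        elif current and line.startswith("System.OutOfMemoryException"):
            hits[current[1]].append(current[0])
            current = None  # count each record at most once
    return dict(hits)


SAMPLE = """2026-05-03T08:31:34.7334545Z ca-grubify-cff6qws2yy4ku--0000007 grubify-api
System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
at GrubifyApi.Controllers.CartController.AddItemToCart(String userId, AddCartItemRequest request) in /app/Controllers/CartController.cs:line 30

2026-05-03T08:37:46.8014159Z ca-grubify-cff6qws2yy4ku--0000007 grubify-api
System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
at GrubifyApi.Controllers.CartController.AddItemToCart(String userId, AddCartItemRequest request) in /app/Controllers/CartController.cs:line 30
"""

if __name__ == "__main__":
    for revision, stamps in count_oom(SAMPLE).items():
        print(revision, len(stamps), stamps[0], stamps[-1])
```

An alert would then fire when the count for the active revision exceeds a chosen threshold within a time window.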

Traffic and Response Time

Chart artifact generated: grubify-incident-7be188e9-requests-response.png

Metrics snapshot (Azure Monitor)

  • Requests (5-minute bins): alert context metric value = 70 for statusCodeCategory=5xx (alert threshold: > 5)
  • ResponseTime (average): peak 35 ms
  • RestartCount: 0
  • MemoryPercentage: peak 99% at ~08:39 (memory pressure)
  • UsageNanoCores: peak 482,045,382 (about 0.48 vCPU)
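The snapshot numbers can be sanity-checked with a little arithmetic. This is a sketch under stated assumptions: the alert rule is modeled as "fire when any 5-minute bin's 5xx count is greater than 5", as the alert context suggests, and the 80% memory early-warning level is a hypothetical value, not a configured rule.

```python
# Worked numbers from the metrics snapshot.
NANOCORES_PER_VCPU = 1_000_000_000  # UsageNanoCores is CPU usage in billionths of a core


def nanocores_to_vcpu(nanocores: int) -> float:
    """Convert a UsageNanoCores sample to vCPU."""
    return nanocores / NANOCORES_PER_VCPU


def http_5xx_alert_fires(bin_counts: list[int], threshold: int = 5) -> bool:
    """Assumed rule shape: fire when any 5-minute bin's 5xx count is greater than threshold."""
    return any(count > threshold for count in bin_counts)


def memory_early_warning(memory_percent: float, warn_at: float = 80.0) -> bool:
    """Hypothetical early-warning level, well below the 99% peak seen in this incident."""
    return memory_percent >= warn_at


if __name__ == "__main__":
    print(round(nanocores_to_vcpu(482_045_382), 2))  # 0.48
    print(http_5xx_alert_fires([70, 55]))            # True
    print(memory_early_warning(99))                  # True
```

With these numbers, the 5xx bins exceed the threshold by more than 10x, and memory crossed any reasonable early-warning level well before the 99% peak.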

Root Cause

Application-level memory exhaustion in CartController.AddItemToCart caused repeated OutOfMemoryException errors and an elevated HTTP 5xx rate. The stack traces point to /app/Controllers/CartController.cs:line 30, which matches the unbounded memory-retention fault intentionally built into this lab.
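The actual code at CartController.cs:line 30 is not shown in this report, so the sketch below only illustrates the general failure pattern in Python: a per-request store that never evicts, next to a bounded variant of the kind the Code remediation calls for. Both classes are hypothetical stand-ins, not the Grubify implementation.

```python
from collections import deque


class UnboundedAnalytics:
    """Hypothetical illustration of the fault: every request is kept, nothing is evicted."""

    def __init__(self) -> None:
        self.events: list[bytes] = []

    def record(self, payload: bytes) -> None:
        self.events.append(payload)  # grows for the lifetime of the process


class BoundedAnalytics:
    """Same interface, but only the newest max_events payloads are retained."""

    def __init__(self, max_events: int = 1000) -> None:
        self.events: deque[bytes] = deque(maxlen=max_events)  # oldest entry drops automatically

    def record(self, payload: bytes) -> None:
        self.events.append(payload)


if __name__ == "__main__":
    leaky, bounded = UnboundedAnalytics(), BoundedAnalytics(max_events=1000)
    for _ in range(10_000):
        payload = bytes(1024)  # a fresh 1 KiB buffer per simulated request
        leaky.record(payload)
        bounded.record(payload)
    print(len(leaky.events), len(bounded.events))  # 10000 1000
```

The unbounded store holds roughly 10 MiB after 10,000 simulated requests and keeps growing, while the bounded one plateaus at about 1 MiB regardless of traffic.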

Remediation

  • Code: Remove the unbounded per-request memory retention and store analytics data in bounded storage with eviction.
  • Defensive: Add request throttling and payload size limits on the cart endpoints.
  • Platform: Keep minReplicas >= 1; evaluate memory-based scaling and a larger memory allocation per replica.
  • Observability: Add an OOM-specific log alert and an early-warning alert on memory pressure.
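The Defensive item can be sketched as a token bucket plus a body-size check that runs before the handler does any work. This is an illustrative Python sketch; the 16 KiB limit, the rate, and the names TokenBucket and admit are hypothetical. In the .NET service itself, ASP.NET Core's rate-limiting middleware (available from .NET 7) and Kestrel's MaxRequestBodySize would likely be the more natural tools.

```python
import time

MAX_PAYLOAD_BYTES = 16 * 1024  # hypothetical limit for a cart item request body


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock  # injectable so the behavior can be tested deterministically
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def admit(bucket: TokenBucket, body: bytes) -> int:
    """Return the HTTP status a cart endpoint could send before doing any work."""
    if len(body) > MAX_PAYLOAD_BYTES:
        return 413  # Payload Too Large; rejected without consuming a token
    if not bucket.allow():
        return 429  # Too Many Requests
    return 200


if __name__ == "__main__":
    bucket = TokenBucket(rate=5.0, capacity=10)
    print([admit(bucket, b"{}") for _ in range(12)])
```

Checking the payload size first means oversized requests are rejected cheaply and do not use up the rate budget of well-behaved clients.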

Action Items

| # | Action | Priority |
|---|--------|----------|
| 1 | Patch CartController.AddItemToCart to remove unbounded memory retention | High |
| 2 | Add load/regression test for cart memory behavior | Medium |
| 3 | Add memory-based autoscale and review memory limits | Medium |
| 4 | Add OOM log alerts and dashboard correlation with 5xx | Low |
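For action item 2, one way to express "memory stays bounded" as a regression test is to compare retained memory after a small and a large batch of calls. The Python sketch below uses tracemalloc with a hypothetical stand-in handler (make_handler); the real test would target the .NET cart endpoint, and the 2x and 5x thresholds are assumptions chosen for illustration.

```python
import tracemalloc
from collections import deque


def make_handler(max_events=None):
    """Hypothetical stand-in for the cart handler; maxlen=None reproduces the leak."""
    events = deque(maxlen=max_events)

    def add_item_to_cart(user_id: str, item: dict) -> None:
        events.append((user_id, dict(item)))  # retained analytics entry

    return add_item_to_cart


def retained_bytes_after(handler, calls: int) -> int:
    """Run the handler `calls` times and return memory still allocated afterwards."""
    tracemalloc.start()
    try:
        for i in range(calls):
            handler(f"user-{i % 50}", {"menuItemId": i, "quantity": 1})
        current, _peak = tracemalloc.get_traced_memory()
        return current
    finally:
        tracemalloc.stop()


def memory_growth_ratio(handler) -> float:
    """How much retained memory grows when request volume grows 10x."""
    small = retained_bytes_after(handler, 2_000)
    large = retained_bytes_after(handler, 20_000)
    return large / small


def test_cart_memory_is_bounded():
    # With bounded retention, 10x the requests should not mean ~10x the memory.
    assert memory_growth_ratio(make_handler(max_events=1000)) < 2


if __name__ == "__main__":
    test_cart_memory_is_bounded()
    print("bounded handler passed")
```

The same helper applied to make_handler(max_events=None) yields a ratio close to 10, so the test would catch a regression back to unbounded retention.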

References

  • Container App: /subscriptions/06dbbc7b-2363-4dd4-9803-95d07f1a8d3e/resourceGroups/rg-grubify-lab/providers/Microsoft.App/containerApps/ca-grubify-cff6qws2yy4ku
  • Log Analytics Workspace ID: 5c4d8643-6749-4f54-afef-91d0cef8f0e5
  • App Insights: /subscriptions/06dbbc7b-2363-4dd4-9803-95d07f1a8d3e/resourceGroups/rg-grubify-lab/providers/microsoft.insights/components/appi-cff6qws2yy4ku

This issue was created by sre-agent-cff6qws2yy4ku--163d1e9d
Tracked by the SRE agent here
