Incident: HTTP 5xx due to OutOfMemory in Cart API (ca-grubify-api) #102

@gderossilive

Incident Report: HTTP 5xx due to OutOfMemory in Cart API

  • Incident ID: 7b758a0e-de46-47af-8009-60a3134cf000
  • Service: Azure Container Apps — ca-grubify-api (rg: rg-grubify-app)
  • Subscription: 06dbbc7b-2363-4dd4-9803-95d07f1a8d3e
  • FQDN: ca-grubify-api.politecliff-89094031.swedencentral.azurecontainerapps.io
  • Active revision: ca-grubify-api--0000001 (100% traffic)

Summary

A Sev2 metric alert fired for API 5xx on ca-grubify-api. Runtime logs showed repeated unhandled System.OutOfMemoryException in CartController.AddItemToCart, alongside cache-growth messages. Immediate mitigation (a restart of the active revision, followed by a scale and resource increase) stabilized the app, and post-mitigation 5xx dropped to 0.

Impact

  • User-facing cart/API failures (HTTP 5xx) during the burst window.
  • Failed cart add-item operations while the process was under memory pressure.

Timeline (UTC)

  • ~08:53: 5xx burst starts (31, then 80 in 1-minute bins).
  • 08:56:03: Alert fired (alert-http-5xx-grubify-api, Sev2).
  • ~09:01: Active revision restart executed.
  • ~09:04: Defensive capacity update applied (minReplicas=2, maxReplicas=4, cpu=1, memory=2Gi).
  • 09:00–09:09: 5xx metric was 0 in every sampled minute of the validation window.

Evidence

Console logs (active revision)

fail: Microsoft.AspNetCore.Server.Kestrel[13]
Connection id "...", Request id "...": An unhandled exception was thrown by the application.
System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
   at GrubifyApi.Controllers.CartController.AddItemToCart(String userId, AddCartItemRequest request) in /app/Controllers/CartController.cs:line 30
Analytics cache: Added request data. Total entries: 1/2/3/4
Cache size: 10MB/20MB/30MB/40MB
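
The log shapes above lend themselves to a simple scan that counts OOM lines and checks whether the reported cache size only ever grows. The following is a minimal sketch, not the tooling used during the investigation. It assumes the slash-separated values above are shorthand for successive log lines such as "Cache size: 10MB"; the function names and regular expressions are hypothetical. Python is used only as a neutral sketch language, since the service itself is .NET.

```python
import re

# Patterns match the log shapes quoted in this report; adjust them if the
# real log format differs (assumption).
OOM_PATTERN = re.compile(r"System\.OutOfMemoryException")
CACHE_SIZE_PATTERN = re.compile(r"Cache size:\s*(\d+)\s*MB")


def summarize_logs(lines):
    """Count lines mentioning an OOM and collect reported cache sizes (MB) in log order."""
    oom_count = 0
    cache_sizes_mb = []
    for line in lines:
        # A line is counted once even if the exception name appears twice in it.
        if OOM_PATTERN.search(line):
            oom_count += 1
        match = CACHE_SIZE_PATTERN.search(line)
        if match:
            cache_sizes_mb.append(int(match.group(1)))
    return oom_count, cache_sizes_mb


def is_monotonic_growth(sizes):
    """True when there are at least two samples and each one exceeds the previous."""
    return len(sizes) >= 2 and all(b > a for a, b in zip(sizes, sizes[1:]))
```

A strictly growing series with no decrease between samples is a hint, not proof, that entries are never evicted.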

Traffic and Response Time

Evidence chart generated during investigation (requests 5xx and response time over incident window): grubify-api-5xx-incident-evidence-2026-05-03.png.

Metrics snapshot (Azure Monitor)

  • Requests with 5xx status (5m/1m bins): 08:53=31, 08:54=80, 08:55=16, 08:58=2; then 0 in every minute from 09:00 to 09:09.
  • ResponseTime avg: peak observed 46 ms at 08:59.
  • RestartCount: 0 across post-mitigation validation window.
  • MemoryPercentage: around 4–5% during spike; lower/stable after capacity change.
  • UsageNanoCores: peak observed 238,068,716 (~0.238 core).
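
The figures in this snapshot can be cross-checked with a few lines of arithmetic. This sketch totals only the 5xx bins listed above (minutes not listed are not assumed to be zero) and converts the nanocore peak to cores. Variable names are illustrative.

```python
# 5xx counts per 1-minute bin, copied from the metrics snapshot.
five_xx_by_minute = {"08:53": 31, "08:54": 80, "08:55": 16, "08:58": 2}

# Total across the listed bins only, and the minute with the highest count.
total_5xx = sum(five_xx_by_minute.values())
peak_minute = max(five_xx_by_minute, key=five_xx_by_minute.get)

# UsageNanoCores is expressed in billionths of a core.
peak_nanocores = 238_068_716
peak_cores = peak_nanocores / 1_000_000_000

print(total_5xx)             # 129
print(peak_minute)           # 08:54
print(round(peak_cores, 3))  # 0.238
```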

Root Cause

Application memory exhaustion in cart request handling. Repeated unhandled OutOfMemoryException in CartController.AddItemToCart (/app/Controllers/CartController.cs:line 30) caused request failures and the observed 5xx burst.

Remediation

  • Code: Remove unbounded in-memory retention in cart analytics/request path; enforce bounded structures.
  • Defensive: Add cart endpoint rate limiting and request/payload bounds.
  • Platform: Restarted active revision; increased to cpu=1, memory=2Gi, minReplicas=2, maxReplicas=4.
  • Observability: Add explicit log-based OOM alert and cache-growth alerting.
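
As an illustration of the "bounded structures" item, the sketch below shows one common way to cap in-memory retention: evict the oldest entries once an entry-count or byte budget is exceeded. It is not the actual CartController code, which is not shown in this report. The class name, method names, and budgets are hypothetical, and Python is used only as a neutral sketch language for a .NET service.

```python
from collections import OrderedDict


class BoundedCache:
    """Cache that evicts its oldest entries once an entry or byte budget is exceeded."""

    def __init__(self, max_entries, max_bytes):
        self.max_entries = max_entries
        self.max_bytes = max_bytes
        self.total_bytes = 0
        self._items = OrderedDict()  # key -> bytes payload, oldest first

    def add(self, key, payload):
        """Store payload under key; return False if it can never fit the byte budget."""
        if len(payload) > self.max_bytes:
            return False
        # When replacing an existing key, release its bytes first.
        old = self._items.pop(key, None)
        if old is not None:
            self.total_bytes -= len(old)
        self._items[key] = payload
        self.total_bytes += len(payload)
        # Evict oldest entries until both budgets hold again.
        while len(self._items) > self.max_entries or self.total_bytes > self.max_bytes:
            _, evicted = self._items.popitem(last=False)
            self.total_bytes -= len(evicted)
        return True

    def __len__(self):
        return len(self._items)


cache = BoundedCache(max_entries=3, max_bytes=25)
for i in range(4):
    cache.add(f"req-{i}", b"x" * 10)
print(len(cache), cache.total_bytes)  # 2 20
```

Oldest-first eviction is the simplest policy; whether analytics data should be retained in process memory at all is a separate design decision for the code fix.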

Action Items

#  Action                                                                Priority
1  Patch CartController.AddItemToCart to remove unbounded memory growth  High
2  Add sustained cart-write load test for OOM regression prevention      Medium
3  Keep elevated resource settings until code fix is deployed/validated  Medium
4  Add OOM + cart endpoint SLO alerts                                    Low
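
For action item 2, a regression check could take roughly the following shape: drive a handler with sustained writes and compare retained memory against a budget. This is a minimal in-process sketch using Python's standard tracemalloc module. The real test would target the deployed .NET API. The handler functions, iteration count, and budget are hypothetical stand-ins.

```python
import tracemalloc


def run_sustained_writes(handler, iterations, budget_bytes):
    """Call handler(i) repeatedly; report whether retained memory stayed within budget."""
    tracemalloc.start()
    try:
        baseline, _ = tracemalloc.get_traced_memory()
        for i in range(iterations):
            handler(i)
        current, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    retained = current - baseline
    return retained <= budget_bytes, retained


# Stand-in for the failure mode: every request's data is kept forever.
leaky_store = []


def leaky_handler(i):
    leaky_store.append(bytearray(10_000))


# Stand-in for the fix: a fixed number of slots that are overwritten in rotation.
bounded_store = [None] * 10


def bounded_handler(i):
    bounded_store[i % len(bounded_store)] = bytearray(10_000)


leaky_ok, _ = run_sustained_writes(leaky_handler, 200, budget_bytes=500_000)
bounded_ok, _ = run_sustained_writes(bounded_handler, 200, budget_bytes=500_000)
print(leaky_ok, bounded_ok)  # False True
```

The useful property is that the check fails on growth proportional to request count, regardless of how much memory the container is given.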

References

  • Container App: /subscriptions/06dbbc7b-2363-4dd4-9803-95d07f1a8d3e/resourceGroups/rg-grubify-app/providers/Microsoft.App/containerApps/ca-grubify-api
  • Log Analytics Workspace ID: bd41ac04-55df-4ef8-b157-4aebd5cd76d5
  • App Insights: /subscriptions/06dbbc7b-2363-4dd4-9803-95d07f1a8d3e/resourceGroups/rg-grubify-sre/providers/Microsoft.Insights/components/appi-sre-grubify

This issue was created by sre-agent-cff6qws2yy4ku--163d1e9d
Tracked by the SRE agent here
