Detect Lambda OOM crashes and return codes.ResourceExhausted #859
c1-dev-bot[bot] wants to merge 2 commits into
The Lambda error handling previously only detected timeouts, leaving OOM crashes as uninformative "function returned error: Unhandled" messages. This adds OOM detection by checking for Runtime.ExitError with signal:killed and comparing Memory Size vs Max Memory Used from the Lambda REPORT line. OOM errors now return codes.ResourceExhausted for proper retry behavior and clearer diagnostics. Also stops filtering Runtime.ExitError from log output — it carries crash context that was previously discarded. Refactors inline error classification into classifyLambdaError() per the existing code comment requesting this when a third case was added.
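The two OOM signals described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `isLambdaOOM` here is a hypothetical stand-in that shares only its name with the function under test, the regular expression assumes the usual tab-separated REPORT line layout, and treating "max used >= configured size" as the memory signal is an assumption. The real change returns `codes.ResourceExhausted` via gRPC status; the sketch returns a plain bool to stay within the standard library.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// reportMemRe captures configured and peak memory (in MB) from a Lambda REPORT line.
// Assumption: both fields appear on the same line, separated by whitespace.
var reportMemRe = regexp.MustCompile(`Memory Size: (\d+) MB\s+Max Memory Used: (\d+) MB`)

// isLambdaOOM is a hypothetical sketch of the OOM check described in the PR.
func isLambdaOOM(rawLog string) bool {
	for _, line := range strings.Split(rawLog, "\n") {
		// Signal 1: the runtime process was killed (both markers on one line).
		if strings.Contains(line, "Runtime.ExitError") && strings.Contains(line, "signal: killed") {
			return true
		}
		// Signal 2: the REPORT line shows peak usage reaching the configured limit.
		if !strings.HasPrefix(line, "REPORT") {
			continue
		}
		m := reportMemRe.FindStringSubmatch(line)
		if m == nil {
			continue
		}
		size, errSize := strconv.Atoi(m[1])
		used, errUsed := strconv.Atoi(m[2])
		if errSize == nil && errUsed == nil && used >= size {
			return true
		}
	}
	return false
}

func main() {
	oomLog := "RequestId: abc Error: Runtime exited with error: signal: killed Runtime.ExitError\n" +
		"REPORT RequestId: abc\tDuration: 512.00 ms\tMemory Size: 128 MB\tMax Memory Used: 128 MB"
	okLog := "REPORT RequestId: def\tDuration: 20.00 ms\tMemory Size: 128 MB\tMax Memory Used: 64 MB"

	fmt.Println(isLambdaOOM(oomLog))
	fmt.Println(isLambdaOOM(okLog))
}
```

Scanning line by line keeps a `Memory Size` on one line from pairing with a `Max Memory Used` on another, which appears to be the intent of the "memory fields on separate lines" test case.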
Checking lint failure - reviewing code for potential issues.
        wantIsGRPC: true,
    },
    {
        name:          "timeout via context deadline exceeded in logs",
        functionError: "Unhandled",
        statusCode:    200,
        payload:       []byte(`{}`),
        rawLog:        `{"level":"error","error":"context deadline exceeded","msg":"sync failed"}`,
        wantCode:      codes.DeadlineExceeded,
        wantSubstring: "function timed out",
        wantIsGRPC:    true,
    },
🟠 Bug: This test case will fail at runtime due to two compounding issues:
- The `rawLog` starts with `{`, so `extractMeaningfulLogLines` filters it out as a JSON log line (client.go:189), making `filteredLogs` empty.
- Even if it weren't filtered, the code at client.go:222 checks for escaped quotes (`\"error\":\"context deadline exceeded\"`, i.e. a literal `\` followed by `"` in a raw string), but this `rawLog` contains unescaped JSON quotes.
The function falls through to the generic error path returning "lambda_transport: function returned error: Unhandled; status code: 200", which fails both the substring ("function timed out") and gRPC code (DeadlineExceeded) assertions. The rawLog needs to match the actual production Lambda log format that triggers the escaped-quote check.
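Both failure modes can be reproduced in isolation. The sketch below is illustrative only: `dropJSONLines` is a hypothetical stand-in for the JSON-line filtering the review attributes to `extractMeaningfulLogLines` (the real function may do more), and the two `matches...` helpers exist only to contrast the escaped and unescaped patterns.

```go
package main

import (
	"fmt"
	"strings"
)

// dropJSONLines is a hypothetical stand-in for the filtering step:
// it discards every line whose first non-space character is '{'.
func dropJSONLines(rawLog string) string {
	var kept []string
	for _, line := range strings.Split(rawLog, "\n") {
		if strings.HasPrefix(strings.TrimSpace(line), "{") {
			continue
		}
		kept = append(kept, line)
	}
	return strings.Join(kept, "\n")
}

// matchesEscaped uses the pattern quoted in the review. In a Go raw string
// the backslashes are literal, so the log must contain \" to match.
func matchesEscaped(log string) bool {
	return strings.Contains(log, `\"error\":\"context deadline exceeded\"`)
}

// matchesUnescaped looks for the same key/value with ordinary JSON quotes.
func matchesUnescaped(log string) bool {
	return strings.Contains(log, `"error":"context deadline exceeded"`)
}

func main() {
	rawLog := `{"level":"error","error":"context deadline exceeded","msg":"sync failed"}`

	filtered := dropJSONLines(rawLog)
	fmt.Printf("filtered logs empty: %v\n", filtered == "")                    // true
	fmt.Printf("escaped pattern matches rawLog: %v\n", matchesEscaped(rawLog))   // false
	fmt.Printf("unescaped pattern matches rawLog: %v\n", matchesUnescaped(rawLog)) // true
}
```

With the test's `rawLog`, the filtered text is empty and the escaped pattern does not match even the unfiltered text, so neither timeout branch can fire.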
    return status.Errorf(codes.DeadlineExceeded, "lambda_transport: function timed out: %s; logSummary: %s", functionError, filteredLogs)
}
if strings.Contains(filteredLogs, `\"error\":\"context deadline exceeded\"`) {
    return status.Errorf(codes.DeadlineExceeded, "lambda_transport: function timed out: %s; logSummary: %s", functionError, filteredLogs)
🟡 Suggestion: This pre-existing check searches `filteredLogs` for escaped-quote patterns (`\"error\":\"context deadline exceeded\"`), but `extractMeaningfulLogLines` already strips all JSON lines starting with `{` (line 189). If in production this pattern only appears in JSON-formatted log lines, this check would be dead code. Consider checking `rawLog` instead of `filteredLogs`, and matching the unescaped form `"error":"context deadline exceeded"`.
General PR Review: Detect Lambda OOM crashes and return codes.ResourceExhausted
Blocking Issues: 0 | Suggestions: 0 | Threads Resolved: 0
Review Summary
The new commits fix the
Security Issues
None found.
Correctness Issues
None found.
Suggestions
None.
Check rawLog instead of filteredLogs for context deadline exceeded, since JSON log lines (where this error appears) get filtered out by extractMeaningfulLogLines. Also simplifies the match to a plain string instead of an escaped-quote pattern. Break long test strings to stay under the 200-char line limit.
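A rough sketch of what this follow-up commit describes is shown here. The helper name `isDeadlineExceeded` and the exact marker string are assumptions, since the commit message does not quote the final pattern; the point is only that the check runs against the raw log, so JSON lines still count.

```go
package main

import (
	"fmt"
	"strings"
)

// deadlineMarker is an assumed plain-string marker; the commit message
// does not quote the exact string it matches.
const deadlineMarker = "context deadline exceeded"

// isDeadlineExceeded is a hypothetical helper. It inspects the raw log,
// not the filtered summary, so JSON-formatted lines are still visible.
func isDeadlineExceeded(rawLog string) bool {
	return strings.Contains(rawLog, deadlineMarker)
}

func main() {
	rawLog := "START RequestId: abc\n" +
		`{"level":"error","error":"context deadline exceeded","msg":"sync failed"}` + "\n" +
		"END RequestId: abc"

	fmt.Println(isDeadlineExceeded(rawLog))
	fmt.Println(isDeadlineExceeded("END RequestId: abc"))
}
```

A plain substring match is looser than matching the full `"error":"..."` key/value pair, so it would also fire on the phrase appearing in a non-error field; whether that trade-off is acceptable depends on the log formats seen in production.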
Summary
- Adds OOM detection by checking for `Runtime.ExitError` with `signal: killed` and comparing `Memory Size` vs `Max Memory Used` from the Lambda REPORT line
- Returns `codes.ResourceExhausted` for OOM crashes (enabling proper retry behavior and clearer diagnostics)
- Stops filtering `Runtime.ExitError` from `extractMeaningfulLogLines()`; this line carries crash context that was previously discarded, causing the uninformative "function returned error: Unhandled; status code: 200" message
- Refactors inline error classification into `classifyLambdaError()` per the existing code comment at line 69 requesting this when a third case was added
Context
This addresses a cross-connector Lambda crash pattern where `FunctionError=Unhandled` errors provide no diagnostic information because `extractMeaningfulLogLines()` filters out the key crash indicators (`Runtime.ExitError`, REPORT line with memory stats). The issue has been observed in multiple connectors (baton-workday, baton-google-workspace, baton-okta).
Test plan
- `TestIsLambdaOOM` with 8 test cases covering: empty log, normal execution, OOM via signal killed, OOM via memory match, timeout (not OOM), signal killed without ExitError, ExitError without signal killed, memory fields on separate lines
- `TestClassifyLambdaError` with 6 test cases covering: timeout via payload, timeout via context deadline, OOM via signal killed, OOM via memory limit, generic error with logs, generic error without logs
- `TestExtractMeaningfulLogLines` with new case verifying `Runtime.ExitError` is preserved in output
Fixes: CXH-1553
Automated PR Notice
This PR was automatically created by c1-dev-bot as a potential implementation.
This code requires: