Add retry transport for transient Google API errors #84
Open
c1-dev-bot[bot] wants to merge 2 commits into
…reset)

The Google Admin SDK's generated code uses gensupport.SendRequest (without retry), so transient network errors like EOF and connection reset propagate directly as sync failures. This is the root cause of intermittent sync failures reported against the GCP/Google Workspace connector.

This change:
- Adds a retryTransport that wraps the HTTP transport used by Google API clients with exponential backoff retry logic for transient errors (EOF, connection reset, broken pipe, etc.)
- Updates wrapGoogleApiErrorWithContext to wrap transient network errors as gRPC Unavailable status, so the baton framework treats them as retryable even if the retry transport exhausts its attempts
- Includes comprehensive tests for transient error detection and retry behavior
Summary
- Adds a `retryTransport` HTTP round-tripper that wraps the Google API client's transport with exponential backoff retry logic for transient network errors (EOF, connection reset, broken pipe, etc.)
- Updates `wrapGoogleApiErrorWithContext` to wrap transient network errors as gRPC `Unavailable` status, so the baton framework treats them as retryable even after the retry transport exhausts its attempts

Root Cause
The Google Admin SDK's generated code uses `gensupport.SendRequest` (without retry), not `gensupport.SendRequestWithRetry`. This means transient network errors like `io.EOF` and connection resets propagate directly as unrecoverable sync failures. The existing `wrapGoogleApiErrorWithContext` helper only handled `*googleapi.Error` (HTTP errors with status codes) and returned raw network errors unchanged, so the baton framework couldn't identify them as retryable.

Changes
- `retry_transport.go`: New `retryTransport` wrapping the outermost HTTP transport (around oauth2) with up to 3 retries and exponential backoff with jitter. Handles `io.EOF`, `io.ErrUnexpectedEOF`, `ECONNRESET`, `ECONNREFUSED`, `EPIPE`, `net.ErrClosed`, and other temporary network errors.
- `connector.go`: Wraps the Google API HTTP client's transport with `newRetryTransport()` so all Google Admin SDK calls benefit from retry logic.
- `error_helpers.go`: `wrapGoogleApiErrorWithContext` now detects transient network errors and wraps them as `codes.Unavailable` gRPC status, providing a safety net if retries are exhausted.
- `retry_transport_test.go`: Tests for transient error detection and retry behavior including EOF, connection reset, retry exhaustion, and non-transient error passthrough.

Test plan
- …`go test ./...`)
- Tests for `isTransientError` cover EOF, connection reset, broken pipe, wrapped errors
- Tests for `retryTransport` verify retry on EOF, no retry on non-transient, retry exhaustion, connection reset recovery, success on first attempt
- `go vet ./...` passes cleanly

Automated PR Notice
This PR was automatically created by c1-dev-bot as a potential implementation.
This code requires: