feat(worker): add ExternalWorker retry with exponential backoff#84
feat(worker): add ExternalWorker retry with exponential backoff#84YuminosukeSato merged 4 commits intomainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds retry functionality with exponential backoff to the ExternalWorker type to handle scenarios where external Python workers may not be immediately available at startup (e.g., sidecar containers starting after the main process).
Changes:
- Introduces
ExternalWorkerOptionsstruct with configurable retry parameters (MaxRetries,RetryInterval) andConnectTimeout - Adds
NewExternalWorkerWithOptionsconstructor while maintaining backward compatibility inNewExternalWorker(no retry by default) - Implements exponential backoff retry logic in
Start()method with context cancellation support
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pkg/pyproc/worker_external.go | Adds retry configuration struct, new constructor with options, and implements exponential backoff in Start() |
| pkg/pyproc/worker_external_test.go | Adds comprehensive test coverage for default values, custom options, backward compatibility, and retry scenarios |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
pkg/pyproc/worker_external.go
Outdated
| const defaultConnectTimeout = 5 * time.Second | ||
| const ( | ||
| defaultConnectTimeout = 5 * time.Second | ||
| defaultMaxRetries = 10 |
There was a problem hiding this comment.
The default of 10 retries with exponential backoff starting at 500ms could result in up to ~8 minutes of retry attempts (500ms * 2^9). Consider reducing the default MaxRetries to 5-7 for more reasonable startup times, or documenting this behavior explicitly.
|
|
||
| func TestNewExternalWorker_BackwardCompat(t *testing.T) { | ||
| w := NewExternalWorker("/tmp/compat.sock", 3*time.Second) | ||
| if w.maxRetries != 1 { |
There was a problem hiding this comment.
The backward compatibility test validates maxRetries and connectTimeout but doesn't verify that retryInterval is also set correctly. Consider adding a check for w.retryInterval != defaultRetryInterval to ensure the old constructor doesn't inadvertently enable retry behavior beyond a single attempt.
| f, err := os.CreateTemp("/tmp", "pyproc-retry-*.sock") | ||
| if err != nil { | ||
| t.Fatal(err) | ||
| } | ||
| sockPath := f.Name() | ||
| _ = f.Close() | ||
| _ = os.Remove(sockPath) |
There was a problem hiding this comment.
Using /tmp directly may cause issues on systems where /tmp is not writable or has different permissions. Consider using t.TempDir() instead, which creates a temporary directory that's automatically cleaned up and works consistently across platforms.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8052a9713a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| return NewExternalWorkerWithOptions(ExternalWorkerOptions{ | ||
| SocketPath: socketPath, | ||
| ConnectTimeout: connectTimeout, | ||
| MaxRetries: 1, // no retry for backward compat |
There was a problem hiding this comment.
Route external pool workers through retry-capable constructor
This hardcoded MaxRetries: 1 keeps the legacy single-attempt behavior for every NewExternalWorker call, and newExternalPool still constructs workers via NewExternalWorker (pkg/pyproc/pool.go), so the sidecar startup path in ExternalMode does not actually get the new retry/backoff behavior and will still fail immediately when the Python socket appears shortly after pool startup.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Intentional design choice. This PR adds the retry mechanism to ExternalWorker as a building block. The backward-compatible NewExternalWorker (MaxRetries=1) preserves existing behavior for all current callers including pool.go.
Integrating retry into the Pool's ExternalMode path requires adding retry configuration to PoolOptions and updating newExternalPool to use NewExternalWorkerWithOptions. That is planned as a follow-up PR to keep changes atomic and reviewable (one responsibility per PR).
Users who need retry today can construct workers directly with NewExternalWorkerWithOptions.
Stabilize sidecar mode where the Python worker may start after the Go process by adding configurable retry logic to ExternalWorker.Start(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
golangci-lint v1.64.8 flags unchecked error return from defer ln.Close() inside a goroutine. Add nolint:errcheck directive. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update newExternalPool to use NewExternalWorkerWithOptions so that ExternalMode pools benefit from the retry/backoff mechanism. Add ExternalMaxRetries and ExternalRetryInterval fields to PoolOptions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add doc comments to Start() recommending context.WithTimeout in production, and annotate defaultMaxRetries with total wait time. Fix data race in TestExternalWorker_Start_RetrySuccess by using a channel to safely pass the listener between goroutines. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
5e1dfa0 to
3f02f76
Compare
Summary
ExternalWorkerOptionsstruct withMaxRetriesandRetryIntervalfieldsNewExternalWorkerWithOptionsconstructor for retry-enabled external workersStart()with context cancellation supportNewExternalWorkerbackward compatible (no retry by default)Test plan
go test -v -race ./pkg/pyproc/ -run TestExternalWorkerpassesgolangci-lint run ./pkg/pyproc/clean🤖 Generated with Claude Code