Skip to content

feat(worker): add ExternalWorker retry with exponential backoff#84

Merged
YuminosukeSato merged 4 commits intomainfrom
feat/v0.6.2-sidecar-retry
Feb 6, 2026
Merged

feat(worker): add ExternalWorker retry with exponential backoff#84
YuminosukeSato merged 4 commits intomainfrom
feat/v0.6.2-sidecar-retry

Conversation

@YuminosukeSato
Copy link
Owner

Summary

  • Add ExternalWorkerOptions struct with MaxRetries and RetryInterval fields
  • Add NewExternalWorkerWithOptions constructor for retry-enabled external workers
  • Implement exponential backoff retry in Start() with context cancellation support
  • Keep NewExternalWorker backward compatible (no retry by default)
  • Stabilizes sidecar mode where the Python worker may start after the Go process

Test plan

  • go test -v -race ./pkg/pyproc/ -run TestExternalWorker passes
  • Existing tests still pass with old constructor
  • New retry tests: success after delay, all retries fail, context cancellation
  • golangci-lint run ./pkg/pyproc/ clean

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings February 6, 2026 13:44
@github-actions github-actions bot added enhancement New feature or request lang/go Go code changes labels Feb 6, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds retry functionality with exponential backoff to the ExternalWorker type to handle scenarios where external Python workers may not be immediately available at startup (e.g., sidecar containers starting after the main process).

Changes:

  • Introduces ExternalWorkerOptions struct with configurable retry parameters (MaxRetries, RetryInterval) and ConnectTimeout
  • Adds NewExternalWorkerWithOptions constructor while maintaining backward compatibility in NewExternalWorker (no retry by default)
  • Implements exponential backoff retry logic in Start() method with context cancellation support

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
pkg/pyproc/worker_external.go Adds retry configuration struct, new constructor with options, and implements exponential backoff in Start()
pkg/pyproc/worker_external_test.go Adds comprehensive test coverage for default values, custom options, backward compatibility, and retry scenarios

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

const defaultConnectTimeout = 5 * time.Second
const (
defaultConnectTimeout = 5 * time.Second
defaultMaxRetries = 10
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default of 10 retries with exponential backoff starting at 500ms could result in up to ~8 minutes of retry attempts (500ms * 2^9). Consider reducing the default MaxRetries to 5-7 for more reasonable startup times, or documenting this behavior explicitly.

Copilot uses AI. Check for mistakes.

func TestNewExternalWorker_BackwardCompat(t *testing.T) {
w := NewExternalWorker("/tmp/compat.sock", 3*time.Second)
if w.maxRetries != 1 {
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The backward compatibility test validates maxRetries and connectTimeout but doesn't verify that retryInterval is also set correctly. Consider adding a check for w.retryInterval != defaultRetryInterval to ensure the old constructor doesn't inadvertently enable retry behavior beyond a single attempt.

Copilot uses AI. Check for mistakes.
Comment on lines +196 to +202
f, err := os.CreateTemp("/tmp", "pyproc-retry-*.sock")
if err != nil {
t.Fatal(err)
}
sockPath := f.Name()
_ = f.Close()
_ = os.Remove(sockPath)
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using /tmp directly may cause issues on systems where /tmp is not writable or has different permissions. Consider using t.TempDir() instead, which creates a temporary directory that's automatically cleaned up and works consistently across platforms.

Copilot uses AI. Check for mistakes.
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8052a9713a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

return NewExternalWorkerWithOptions(ExternalWorkerOptions{
SocketPath: socketPath,
ConnectTimeout: connectTimeout,
MaxRetries: 1, // no retry for backward compat

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Route external pool workers through retry-capable constructor

This hardcoded MaxRetries: 1 keeps the legacy single-attempt behavior for every NewExternalWorker call, and newExternalPool still constructs workers via NewExternalWorker (pkg/pyproc/pool.go), so the sidecar startup path in ExternalMode does not actually get the new retry/backoff behavior and will still fail immediately when the Python socket appears shortly after pool startup.

Useful? React with 👍 / 👎.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intentional design choice. This PR adds the retry mechanism to ExternalWorker as a building block. The backward-compatible NewExternalWorker (MaxRetries=1) preserves existing behavior for all current callers including pool.go.

Integrating retry into the Pool's ExternalMode path requires adding retry configuration to PoolOptions and updating newExternalPool to use NewExternalWorkerWithOptions. That is planned as a follow-up PR to keep changes atomic and reviewable (one responsibility per PR).

Users who need retry today can construct workers directly with NewExternalWorkerWithOptions.

@github-actions github-actions bot added the area/pool Worker pool management label Feb 6, 2026
YuminosukeSato and others added 4 commits February 6, 2026 23:42
Stabilize sidecar mode where the Python worker may start after the Go
process by adding configurable retry logic to ExternalWorker.Start().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
golangci-lint v1.64.8 flags unchecked error return from defer ln.Close()
inside a goroutine. Add nolint:errcheck directive.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update newExternalPool to use NewExternalWorkerWithOptions so that
ExternalMode pools benefit from the retry/backoff mechanism. Add
ExternalMaxRetries and ExternalRetryInterval fields to PoolOptions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add doc comments to Start() recommending context.WithTimeout in
production, and annotate defaultMaxRetries with total wait time.
Fix data race in TestExternalWorker_Start_RetrySuccess by using a
channel to safely pass the listener between goroutines.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@YuminosukeSato YuminosukeSato force-pushed the feat/v0.6.2-sidecar-retry branch from 5e1dfa0 to 3f02f76 Compare February 6, 2026 14:43
@YuminosukeSato YuminosukeSato merged commit a7f45bb into main Feb 6, 2026
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/pool Worker pool management enhancement New feature or request lang/go Go code changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants