fix: BGP/Anycast stability improvements#130

Merged
poyrazK merged 3 commits into main from fix/bgp-anycast-stability
May 2, 2026

Owner

@poyrazK poyrazK commented May 2, 2026

Summary

Implements six BGP/Anycast stability improvements for production DNS anycast deployments.

Changes

internal/adapters/routing/gobgp.go

  • Serve() goroutine now tracked in WaitGroup for graceful Stop()
  • SetConfig() protected by mutex for concurrent access
  • Added monitorPeer() goroutine for reconnection mechanism
  • Stop() waits for all goroutines via wg.Wait() before closing BGP server
  • Added localASN, peerASN, peerIP fields for reconnection use

internal/core/services/anycast_manager.go

  • Added sync.Mutex to protect check-then-act on VIP binding
  • Added debounce timer to delay health state transitions (prevents VIP flapping)
  • Renamed announce/withdraw to announceLocked/withdrawLocked (caller must hold mutex)
  • TriggerCheck() now locks mutex and uses debounce logic
  • Start() withdraws on shutdown

cmd/clouddns/main.go

  • NewAnycastManager call updated to pass debounce duration (5s default)

Test files updated

  • anycast_manager_test.go: updated constructor calls to 7 args, announce → announceLocked, withdraw → withdrawLocked
  • anycast_test.go: updated constructor call to 7 args

Test plan

  • go build ./...
  • go test ./... (all pass)
  • go test -race -run "Anycast" ./internal/core/services/... ./internal/dns/server/... (race detector clean)
  • Verified: concurrent BindVIP calls do not duplicate, rapid health transitions are debounced, shutdown withdraws routes

coderabbitai Bot commented May 2, 2026

Warning

Rate limit exceeded

@poyrazK has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 12 minutes and 58 seconds before requesting another review.

📥 Commits

Reviewing files that changed from the base of the PR and between 63322f8 and 2083c83.

📒 Files selected for processing (6)
  • cmd/clouddns/main.go
  • internal/adapters/routing/gobgp.go
  • internal/adapters/routing/gobgp_test.go
  • internal/core/services/anycast_manager.go
  • internal/core/services/anycast_manager_test.go
  • internal/dns/server/anycast_test.go

poyrazK added a commit that referenced this pull request May 2, 2026
- A: Remove dead pendingState field from AnycastManager
- B: Fix timer callback mutex re-entry deadlock via channel-based debounce
       Timer now signals a debounceCh read by Start() loop instead of
       calling announceLocked directly (which would re-enter the non-recursive mutex)
- C: Add trailing newline to gobgp.go
- D: Add ORIGIN and NEXT_HOP path attributes to Withdraw (RFC 4271)
- E: Track announcedVIPs map; Stop() now withdraws all active routes
- F: Implement monitorPeer reconnection — periodic peer health check
       via re-add attempt; on failure, restarts BGP server and re-adds peer

Fixes: #112 (Stop withdraw), #113 (Withdraw attributes), #111 (peer reconnection)
- GoBGPAdapter: track Serve() goroutine in WaitGroup for graceful shutdown
- GoBGPAdapter: add mutex for concurrent access to config fields
- GoBGPAdapter: add monitorPeer goroutine for reconnection mechanism
- GoBGPAdapter: Stop() now waits for all goroutines and withdraws active routes
- GoBGPAdapter: Withdraw now sends ORIGIN + NEXT_HOP path attributes (RFC 4271)
- AnycastManager: add mutex to protect check-then-act on VIP binding
- AnycastManager: add channel-based debounce to prevent VIP flapping on health transitions
- AnycastManager: rename announce/withdraw to announceLocked/withdrawLocked (internal)
- Add ANYCAST_DEBOUNCE env var support (defaults to 5s)
- Update NewAnycastManager signature: add debounceDuration parameter
- Fix goroutine leak in monitorPeer reconnection error path
- Add trailing newlines to gobgp.go and anycast_manager_test.go

Fixes: #93, #110, #116, #112, #113, #111
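
A rough sketch of the "track announced VIPs, withdraw them all on Stop()" behaviour listed above. It is not the GoBGPAdapter implementation: pathDeleter, routeTracker, and recordingServer are hypothetical, and the real adapter talks to a GoBGP server rather than to this interface.

```go
package main

import "sync"

// pathDeleter is a hypothetical interface standing in for the BGP server
// call that withdraws a route.
type pathDeleter interface {
	DeletePath(vip string) error
}

// recordingServer is a fake used for illustration; it records withdrawals.
type recordingServer struct {
	deleted []string
}

func (s *recordingServer) DeletePath(vip string) error {
	s.deleted = append(s.deleted, vip)
	return nil
}

// routeTracker remembers which VIPs are currently announced.
type routeTracker struct {
	mu            sync.Mutex
	announcedVIPs map[string]struct{}
	server        pathDeleter
}

func newRouteTracker(server pathDeleter) *routeTracker {
	return &routeTracker{announcedVIPs: make(map[string]struct{}), server: server}
}

// Announce records a VIP as active (the actual BGP announcement is omitted).
func (r *routeTracker) Announce(vip string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.announcedVIPs[vip] = struct{}{}
}

// Stop withdraws every active route, checks each DeletePath error, and
// returns the first one seen. VIPs that failed stay in the map.
func (r *routeTracker) Stop() error {
	r.mu.Lock()
	defer r.mu.Unlock()
	var firstErr error
	for vip := range r.announcedVIPs {
		if err := r.server.DeletePath(vip); err != nil {
			if firstErr == nil {
				firstErr = err
			}
			continue
		}
		delete(r.announcedVIPs, vip) // deleting during range is safe in Go
	}
	return firstErr
}

// Active reports how many VIPs are still recorded as announced.
func (r *routeTracker) Active() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.announcedVIPs)
}
```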
poyrazK force-pushed the fix/bgp-anycast-stability branch from aa91b2b to 70dd570 on May 2, 2026 18:51
poyrazK added 2 commits May 2, 2026 22:02
- Fix stale "exponential backoff" comment in monitorPeer (no backoff implemented)
- Add TestGoBGPAdapter_StopWithdrawsActiveRoutes to verify Stop() clears
  the announcedVIPs map
- gobgp.go: use parent context (ctx) for WithTimeout instead of context.Background() (contextcheck)
- gobgp.go: check error return value of DeletePath in Stop() (errcheck)
- anycast_manager.go: remove empty block from timer stop (revive)
Owner Author

@poyrazK poyrazK left a comment

It's okay to merge
