feat: resilient Watch RPC with reconnect and resourceVersion tracking #7188

Merged
AdilFayyaz merged 2 commits into v2 from adil/apps-watch-reconnect
Apr 21, 2026

@AdilFayyaz AdilFayyaz commented Apr 9, 2026

Tracking issue

Depends on: #7176, #7175, #7166

Why are the changes needed?

K8s watches time out every ~5 minutes by default. The previous implementation closed the client's stream silently on disconnect with no reconnect, making Watch unreliable for long-lived connections.

What changes were proposed in this pull request?

  • Replaced the single-shot watch goroutine in AppK8sClient.Watch() with a reconnect loop (watchLoop + drainWatcher) that transparently reopens the K8s watch on unexpected closes or Error events
  • Added resourceVersion tracking: the version is extracted from every Added/Modified/Deleted/Bookmark event and passed to the next watch call, so that events are neither missed nor replayed across reconnects
  • Added exponential backoff between reconnect attempts (1s → 2s → 4s → ..., capped at 30s); the backoff resets after any successfully received event, including a Bookmark
  • K8s Error events are now logged with code/reason/message instead of being silently dropped
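
The reconnect behaviour described above can be sketched roughly as follows. This is a minimal, stdlib-only illustration and not the code from this PR: the `event` type and `openFunc` are hypothetical stand-ins for the client-go watch types, the names `watchLoop` and `drainWatcher` are borrowed from the description but their signatures here are assumptions, and dropping Bookmark events instead of forwarding them to the caller is a choice made for this sketch.

```go
package main

import (
	"context"
	"time"
)

const (
	initialBackoff = 1 * time.Second
	maxBackoff     = 30 * time.Second
)

// nextBackoff doubles the delay and caps it at maxBackoff.
func nextBackoff(d time.Duration) time.Duration {
	d *= 2
	if d > maxBackoff {
		return maxBackoff
	}
	return d
}

// event is a hypothetical stand-in for a K8s watch event.
type event struct {
	Type            string // "ADDED", "MODIFIED", "DELETED", "BOOKMARK" or "ERROR"
	ResourceVersion string
}

// openFunc is a hypothetical hook that opens a watch starting at rv.
type openFunc func(ctx context.Context, rv string) (<-chan event, error)

// trackRV returns the event's resourceVersion if it carries one, else rv.
func trackRV(rv string, ev event) string {
	if ev.ResourceVersion != "" {
		return ev.ResourceVersion
	}
	return rv
}

// drainWatcher forwards events until the channel closes, an ERROR event
// arrives, or ctx is cancelled. It returns the last resourceVersion seen
// and whether at least one event was received successfully.
func drainWatcher(ctx context.Context, in <-chan event, out chan<- event, rv string) (string, bool) {
	progressed := false
	for {
		select {
		case <-ctx.Done():
			return rv, progressed
		case ev, ok := <-in:
			if !ok || ev.Type == "ERROR" {
				return rv, progressed // caller reconnects from rv
			}
			rv, progressed = trackRV(rv, ev), true
			if ev.Type == "BOOKMARK" {
				continue // in this sketch, bookmarks only advance rv
			}
			select {
			case out <- ev:
			case <-ctx.Done():
				return rv, progressed
			}
		}
	}
}

// watchLoop reopens the watch from the last seen resourceVersion, waiting
// with capped exponential backoff between attempts. It closes out on exit.
func watchLoop(ctx context.Context, open openFunc, out chan<- event, rv string) {
	defer close(out)
	backoff := initialBackoff
	for ctx.Err() == nil {
		if in, err := open(ctx, rv); err == nil {
			var progressed bool
			rv, progressed = drainWatcher(ctx, in, out, rv)
			if progressed {
				backoff = initialBackoff // reset after any successful event
			}
		}
		timer := time.NewTimer(backoff)
		select {
		case <-ctx.Done():
			timer.Stop()
			return
		case <-timer.C:
		}
		backoff = nextBackoff(backoff)
	}
}
```

A caller would typically start this with `go watchLoop(ctx, open, out, "")` and range over `out` until it is closed. In client-go terms, `open` would presumably wrap a watch call that sets `ResourceVersion` in `metav1.ListOptions`. Unlike this sketch, the PR opens the first watch before starting the loop so that the initial error surfaces synchronously.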

How was this patch tested?

  • go test ./app/internal/k8s/... -run TestWatch: 6 new tests covering channel-close reconnect, Error-event reconnect, Bookmark RV propagation to the next watch call, exponential backoff timing, ctx cancel stopping the goroutine, and the initial watch error surfacing synchronously
  • go test ./app/...: full suite passes with no regressions

Setup process

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

Comment thread app/internal/k8s/app_client.go Outdated
Comment thread app/internal/k8s/app_client.go Outdated
Comment thread app/internal/k8s/app_client.go Outdated
@AdilFayyaz AdilFayyaz force-pushed the adil/apps-app-service branch from 5ab4209 to 6e12f57 Compare April 20, 2026 19:49
Base automatically changed from adil/apps-app-service to v2 April 21, 2026 17:18
Signed-off-by: M. Adil Fayyaz <62440954+AdilFayyaz@users.noreply.github.com>
Signed-off-by: M. Adil Fayyaz <62440954+AdilFayyaz@users.noreply.github.com>
@AdilFayyaz AdilFayyaz force-pushed the adil/apps-watch-reconnect branch from 9d4ac19 to c534e79 Compare April 21, 2026 20:24
@AdilFayyaz AdilFayyaz requested a review from pingsutw April 21, 2026 20:57
@AdilFayyaz AdilFayyaz merged commit ab07d54 into v2 Apr 21, 2026
20 checks passed
@AdilFayyaz AdilFayyaz deleted the adil/apps-watch-reconnect branch April 21, 2026 21:04

Labels

added (Merged changes that add new functionality), flyte2

2 participants