Skip to content

🚧 DO NOT MERGE: HTTPS/HTTP2 on watchdog to eliminate connection pool exhaustion#3204

Open
clubanderson wants to merge 10 commits intomainfrom
feat/http2-watchdog
Open

🚧 DO NOT MERGE: HTTPS/HTTP2 on watchdog to eliminate connection pool exhaustion#3204
clubanderson wants to merge 10 commits intomainfrom
feat/http2-watchdog

Conversation

@clubanderson
Copy link
Copy Markdown
Collaborator

Summary

  • HTTP/2 multiplexing: The watchdog reverse proxy now serves HTTPS with HTTP/2. All 17+ SSE streams multiplex over a single TCP connection, eliminating the browser's 6-connection HTTP/1.1 limit that caused "Connection lost" and slow dashboard loading.
  • Self-signed TLS cert: Auto-generates an ECDSA P-256 cert for localhost/127.0.0.1/::1, stored in ./data/tls/, valid 1 year, reused if not expired.
  • SSE flush fix: Sets proxy.FlushInterval = -1 on the reverse proxy so SSE events are forwarded immediately (not buffered).
  • OAuth scheme-aware: backendURL() and startup script updated to use https:// when TLS is active.

Architecture

Browser (HTTPS/H2) --TLS--> Watchdog :8080 --HTTP/1.1--> Backend :8081
                                                          (Fiber/fasthttp)

Changes

File Change
cmd/console/watchdog.go TLS cert generation, ListenAndServeTLS, FlushInterval = -1
cmd/console/main.go --tls flag
startup-oauth.sh Pass --tls, TLS_ENABLED=true, https:// URLs
pkg/api/server.go TLSEnabled config, scheme-aware backendURL()
pkg/api/handlers/auth.go Comment clarifying OAuth callback scheme
web/vite.config.ts Dev proxy targets → https://localhost:8080 with secure: false

Test plan

  • startup-oauth.sh starts watchdog with HTTPS
  • Browser shows h2 protocol in DevTools Network tab
  • All SSE streams load concurrently without blocking
  • No more "Connection lost" snackbar from connection pool exhaustion
  • OAuth login flow works with https:// callback
  • start-dev.sh still works (no TLS, HTTP/1.1)
  • npm run build && npm run lint passes

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings March 22, 2026 20:40
@kubestellar-prow kubestellar-prow bot added the dco-signoff: no Indicates the PR's author has not signed the DCO. label Mar 22, 2026
@kubestellar-prow
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign clubanderson for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@netlify
Copy link
Copy Markdown

netlify bot commented Mar 22, 2026

Deploy Preview for kubestellarconsole failed. Why did it fail? →

Name Link
🔨 Latest commit 22025fc
🔍 Latest deploy log https://app.netlify.com/projects/kubestellarconsole/deploys/69c2ca2ec48091000864fd55

@github-actions
Copy link
Copy Markdown
Contributor

👋 Hey @clubanderson — thanks for opening this PR!

🤖 This project is developed exclusively using AI coding assistants.

Please do not attempt to code anything for this project manually.
All contributions should be authored using an AI coding tool such as:

This ensures consistency in code style, architecture patterns, test coverage,
and commit quality across the entire codebase.


This is an automated message.

@kubestellar-prow kubestellar-prow bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 22, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds optional TLS (HTTPS + HTTP/2) support to the watchdog reverse proxy to multiplex many SSE streams over a single connection, reducing browser connection pool contention and improving dashboard load reliability.

Changes:

  • Watchdog can now serve HTTPS/HTTP2 with auto-generated self-signed localhost certs and immediate SSE flushing.
  • Backend config gains TLS awareness for scheme-correct OAuth callback URL generation.
  • Dev tooling/scripts updated to route through the TLS-enabled watchdog (including Vite dev proxy and OAuth startup script).

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
cmd/console/watchdog.go Adds TLS cert generation/reuse, ListenAndServeTLS, and ReverseProxy.FlushInterval = -1 for SSE.
cmd/console/main.go Adds --tls flag to enable watchdog TLS mode.
pkg/api/server.go Adds TLSEnabled config and scheme-aware backendURL() used for OAuth redirect URL construction.
pkg/api/handlers/auth.go Clarifies comment about callback scheme when watchdog TLS is enabled.
startup-oauth.sh Starts watchdog with --tls, exports TLS_ENABLED=true, and updates printed/setup URLs to https://.
web/vite.config.ts Switches dev proxy targets to https://localhost:8080 / wss://localhost:8080 and disables cert verification (secure:false).

Comment on lines 205 to 209
'/api': {
target: 'http://localhost:8080',
target: 'https://localhost:8080',
changeOrigin: true,
secure: false, // accept self-signed cert from watchdog
},
Copy link

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vite dev-server proxy is now hard-coded to https://localhost:8080 (and wss:// for /ws). This will break the existing start-dev.sh flow where the backend listens on plain HTTP :8080 (no watchdog/TLS), because the dev server will attempt a TLS handshake to an HTTP server. Make the proxy scheme conditional (e.g., based on process.env.TLS_ENABLED or a dedicated VITE_* env var) so HTTP is used by default and HTTPS is only used when the watchdog is started with --tls (and only set secure:false in that case).

Copilot uses AI. Check for mistakes.
Comment on lines +294 to +298
// Check if existing cert/key are present and not expired
if certValid(certFile) {
log.Printf("[Watchdog] Using existing TLS cert: %s", certFile)
return certFile, keyFile, nil
}
Copy link

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ensureSelfSignedCert() treats an existing cert as usable based solely on certValid(certFile), but it never checks that the corresponding key file exists and matches the cert. If localhost.key is missing/corrupted (or doesn't match the cert), ListenAndServeTLS will fail at runtime. Consider validating the key pair (e.g., tls.LoadX509KeyPair) and regenerating if either file is missing/invalid/mismatched, rather than checking only the cert expiration.

Copilot uses AI. Check for mistakes.
@clubanderson
Copy link
Copy Markdown
Collaborator Author

🔄 Auto-Applying Copilot Code Review

Copilot code review found 0 code suggestion(s) and 2 general comment(s).

Also address these general comments:

  • web/vite.config.ts (line 209): Vite dev-server proxy is now hard-coded to https://localhost:8080 (and wss:// for /ws). This will break the existing
  • cmd/console/watchdog.go (line 298): ensureSelfSignedCert() treats an existing cert as usable based solely on certValid(certFile), but it never checks th

Push all fixes in a single commit. Run cd web && npm run build && npm run lint before committing.


Auto-generated by copilot-review-apply workflow.

@github-actions github-actions bot mentioned this pull request Mar 22, 2026
@clubanderson clubanderson changed the title ✨ feat: HTTPS/HTTP2 on watchdog to eliminate connection pool exhaustion 🚧 DO NOT MERGE: HTTPS/HTTP2 on watchdog to eliminate connection pool exhaustion Mar 22, 2026
@github-actions
Copy link
Copy Markdown
Contributor

❌ PR Title Verification Failed

Your PR title does not follow the required format.

Current title: 🚧 DO NOT MERGE: HTTPS/HTTP2 on watchdog to eliminate connection pool exhaustion

Required Format

PR titles must start with one of these emoji prefixes:

Emoji Meaning
⚠️ Breaking change
Non-breaking feature
🐛 Patch fix / Bug fix
📖 Documentation
🚀 Release
🌱 Infra/Tests/Other

How to Fix

Edit your PR title to start with the appropriate emoji. For example:

  • ✨ Add new feature for user authentication
  • 🐛 Fix crash when loading empty config
  • 📖 Update installation guide

You can edit the title by clicking the Edit button next to your PR title.


This comment was automatically posted by the PR Title Verifier workflow.

@clubanderson clubanderson added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 22, 2026
@github-actions github-actions bot mentioned this pull request Mar 23, 2026
@kubestellar-prow kubestellar-prow bot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 23, 2026
andy anderson and others added 10 commits March 24, 2026 18:19
Solve browser HTTP/1.1 connection pool exhaustion (6 connections per
origin, 17+ SSE streams) by enabling HTTP/2 multiplexing on the
watchdog reverse proxy.

Architecture: Browser --HTTPS/H2--> Watchdog :8080 --HTTP/1.1--> Backend :8081

- Add --tls flag to watchdog that auto-generates a self-signed ECDSA
  P-256 cert for localhost/127.0.0.1/::1 (stored in ./data/tls/,
  already gitignored, reused if not expired)
- Use Go's built-in HTTP/2 via ListenAndServeTLS on the watchdog
- Set proxy.FlushInterval = -1 for immediate SSE event forwarding
- Add TLSEnabled to server Config, read from TLS_ENABLED env var
- Make backendURL() scheme-aware (https:// when TLS active)
- Update startup-oauth.sh: pass --tls, export TLS_ENABLED=true,
  update FRONTEND_URL and displayed URLs to https://
- Update Vite dev proxy targets to https://localhost:8080 with
  secure: false for self-signed cert acceptance
- Print user-friendly note about browser cert warning on first visit

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add -k flag to curl calls (skip cert verification for self-signed cert)
and update oauth smoke test URLs from http:// to https://.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two fixes from code review:

1. vite.config.ts: Proxy targets are now conditional on TLS_ENABLED env
   var. Without this, start-dev.sh (no TLS, no watchdog) would fail
   because Vite would attempt HTTPS against a plain HTTP server.

2. watchdog.go: certValid() now uses tls.LoadX509KeyPair() to validate
   that both cert and key files exist and match before reusing them.
   Previously only the cert was checked — a missing/corrupted key file
   would cause a confusing runtime error.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
startup-oauth.sh now prints a clear message when TLS_ENABLED=true,
telling developers to add https://localhost:8080/auth/github/callback
to their GitHub OAuth App's callback URLs. Without this, existing
developers pulling the update would get OAuth failures because their
app only has the http:// callback URL configured.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…meout, wire isRefreshing

Three fixes for dashboard card loading issues:

1. Early card chunk prefetch: Fire dynamic imports at module scope in main.tsx
   before React initializes, eliminating the skeleton flash caused by the
   3-hop waterfall (auth → cardRegistry → card chunk).

2. Device tracker timeout: Increase per-cluster timeout from 5s to 15s for
   Hardware Health card — CoreWeave clusters need >5s to respond.

3. DeploymentStatus: Wire isRefreshing to useCardLoadingState so CardWrapper
   shows refresh spinner during background data updates.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…t 8081

1. CardDataContext: Show skeleton instead of "No deployments found" when
   refreshing with stale empty cache. The isRefreshingWithNoData flag
   keeps skeleton visible until data arrives or refresh completes.

2. Vite proxy: Target BACKEND_LISTEN_PORT (default 8081) directly instead
   of the TLS watchdog on 8080, fixing 400 errors from HTTP→HTTPS mismatch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When IndexedDB has cached data that's equivalent to initialData (empty
arrays, zeroed objects), treat it as "no cache" so isLoading=true and
cards show skeleton instead of "No deployments found" / "No events".
Real cached data (non-empty) still shows instantly with refresh spinner.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace sequential kubectlProxy.getEvents() (WebSocket-based kubectl)
with parallel HTTP requests to /events agent endpoint. This matches
the deployments pattern — parallel fetch, progressive onProgress per
cluster, 5s timeout per cluster.

Fixes Event Stream and Events Timeline cards failing/stuck on skeleton
when WebSocket proxy is unavailable or slow.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: andy anderson <andy@andys-MacBook-Pro.local>
- Remove duplicate watchdogStageFile constant declaration
- Remove unused KagentDiscoveredTool import
- Fix type narrowing in prefetchCardChunksEarly filter

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@kubestellar-prow
Copy link
Copy Markdown
Contributor

Thanks for your pull request. Before we can look at it, you'll need to add a 'DCO signoff' to your commits.

📝 Please follow instructions in the contributing guide to update your commits with the DCO

Full details of the Developer Certificate of Origin can be found at developercertificate.org.

The list of commits missing DCO signoff:

  • 047dc3b feat: add HTTPS/HTTP2 support to watchdog reverse proxy
  • 53a6c8d fix: update smoke test for HTTPS watchdog
  • eae24f5 fix: make Vite proxy TLS-conditional and validate cert+key pair
  • 88a7222 fix: warn developers to update OAuth callback URL when TLS is enabled
  • e589d6e merge: resolve conflicts with main (watchdog stage file + version flag)
  • 504a499 fix: improve card loading UX — prefetch chunks, fix device tracker timeout, wire isRefreshing
  • e952b66 fix: skeleton on refresh with empty cache + proxy Vite to backend port 8081
  • 2c96db4 fix: treat empty cached data as no cache — show skeleton not empty state
  • 22025fc fix: remove duplicate constant and fix TS errors after rebase
Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@kubestellar-prow kubestellar-prow bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 24, 2026
@kubestellar-prow
Copy link
Copy Markdown
Contributor

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@kubestellar-prow kubestellar-prow bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dco-signoff: no Indicates the PR's author has not signed the DCO. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants