Skip to content

Remove kube-rbac-proxy and replace with native metrics auth#73

Open
shreyabiradar07 wants to merge 11 commits intokruize:mvp_demofrom
shreyabiradar07:remove-kube-rbac-proxy
Open

Remove kube-rbac-proxy and replace with native metrics auth#73
shreyabiradar07 wants to merge 11 commits intokruize:mvp_demofrom
shreyabiradar07:remove-kube-rbac-proxy

Conversation

@shreyabiradar07
Copy link
Copy Markdown
Contributor

@shreyabiradar07 shreyabiradar07 commented Mar 9, 2026

This PR using native metrics authentication integrates TokenReview and SubjectAccessReview directly into the operator's Go binary via controller-runtime. This secures the /metrics endpoint with HTTPS and RBAC internally, removing the deprecated external sidecar container like kube-rbac-proxy.

Docker image: quay.io/shbirada/kruize_operator:remove-kube-rbac-proxy

Summary by Sourcery

Replace the external kube-rbac-proxy sidecar with native controller-runtime metrics authentication and secure serving for the /metrics endpoint.

Bug Fixes:

  • Ensure the metrics endpoint is protected by Kubernetes TokenReview and SubjectAccessReview RBAC flows instead of relying on a deprecated sidecar proxy.

Enhancements:

  • Introduce shared utilities for configuring TLS and metrics server options, including HTTP/2 control and secure defaults for metrics binding.
  • Update manager configuration and RBAC roles to support secure HTTPS metrics on port 8443 with native authz/authn filters.
  • Add controller tests to verify the secure metrics endpoint is reachable and rejects unauthorized or invalidly authenticated requests.

CI:

  • Simplify CI configuration by removing kube-rbac-proxy-specific patches and resource adjustments from the PR check workflow.

Signed-off-by: Shreya Biradar <shbirada@ibm.com>
@sourcery-ai
Copy link
Copy Markdown

sourcery-ai bot commented Mar 9, 2026

Reviewer's Guide

Replaces the external kube-rbac-proxy sidecar with controller-runtime’s native secure metrics server configuration, centralizing TLS/HTTP2 handling, locking metrics to HTTPS on 8443, granting TokenReview/SAR RBAC, updating tests and manifests, and simplifying CI and kustomize defaults accordingly.

Sequence diagram for authenticated metrics scrape via controller_runtime filters

sequenceDiagram
    actor Admin
    participant Prometheus
    participant KubeAPIServer
    participant Manager as KruizeOperatorManager
    participant Authn as TokenReviewAPI
    participant Authz as SubjectAccessReviewAPI

    Admin->>Prometheus: Configure scrape target <svc>:8443/metrics

    Prometheus->>KubeAPIServer: HTTPS GET /api/v1/namespaces/ns/services/svc:8443/proxy/metrics (Bearer token)
    KubeAPIServer->>Manager: Forward HTTPS /metrics request

    Manager->>Authn: Create TokenReview
    Authn-->>Manager: TokenReview status (authenticated/unauthenticated)

    alt token_invalid_or_unauthenticated
        Manager-->>Prometheus: 401 Unauthorized
    else token_authenticated
        Manager->>Authz: Create SubjectAccessReview
        Authz-->>Manager: SAR status (allowed/denied)
        alt access_denied
            Manager-->>Prometheus: 403 Forbidden
        else access_allowed
            Manager-->>Prometheus: 200 OK + metrics
        end
    end
Loading

Class diagram for metrics TLS and authentication utilities

classDiagram
    class Utils {
        <<package>>
    }

    class MetricsOptionsUtil {
        <<utility>>
        +string DefaultMetricsPort
        +string DefaultMetricsAddr
        +string LocalMetricsAddr
        +GetTLSOpts(enableHTTP2 bool) []func(tls_Config)
        +GetMetricsOptions(addr string, secure bool, enableHTTP2 bool) metricsserver_Options
    }

    class Main {
        <<executable>>
        -string metricsAddr
        -string probeAddr
        -bool enableLeaderElection
        -bool secureMetrics
        -bool enableHTTP2
        +main()
    }

    class MetricsServerOptions {
        <<from_controller_runtime>>
        +string BindAddress
        +bool SecureServing
        +FilterProvider FilterProvider
        +[]func(tls_Config) TLSOpts
    }

    class Filters {
        <<from_controller_runtime>>
        +WithAuthenticationAndAuthorization() FilterProvider
    }

    Utils <|-- MetricsOptionsUtil
    Main ..> MetricsOptionsUtil : uses
    MetricsOptionsUtil ..> MetricsServerOptions : constructs
    MetricsOptionsUtil ..> Filters : uses

    %% Represent tls.Config and metricsserver.Options without methods
    class tls_Config {
        <<struct>>
        +[]string NextProtos
    }

    class metricsserver_Options {
        <<alias_of_MetricsServerOptions>>
    }

    MetricsServerOptions <|-- metricsserver_Options
Loading

File-Level Changes

Change Details Files
Introduce shared utilities for metrics and TLS configuration and use them from main and tests.
  • Add utils package helpers to compute TLS options and metrics server options, including authentication/authorization filters when secure
  • Change main manager setup to use the new metrics options helper and shared TLS options for both webhook and metrics servers
  • Expose default and local metrics addresses and ports for production and test usage
internal/utils/metrics_options.go
cmd/main.go
internal/controller/suite_test.go
Secure the metrics endpoint with HTTPS and native Kubernetes authentication/authorization.
  • Default the metrics-bind-address flag to a secure 8443 address and enable secure metrics by default
  • Configure controller-runtime metrics FilterProvider to use authentication and authorization filters when secure
  • Start a metrics-enabled manager in envtest to exercise the secure metrics endpoint
cmd/main.go
internal/utils/metrics_options.go
internal/controller/suite_test.go
config/manager/manager.yaml
Add RBAC for TokenReview and SubjectAccessReview required by native metrics auth.
  • Grant create on authentication.k8s.io tokenreviews in controller RBAC role
  • Grant create on authorization.k8s.io subjectaccessreviews in controller RBAC role
  • Mirror these RBAC rules in kubebuilder RBAC markers on the reconciler type
config/rbac/role.yaml
internal/controller/kruize_controller.go
Add integration-style tests for metrics auth behavior under envtest and real cluster modes.
  • Add a Metrics Auth Integration Ginkgo Describe block that verifies unauthorized access is rejected, invalid tokens fail, and the metrics server is reachable
  • Use an HTTPS client with InsecureSkipVerify to connect to the self-signed metrics endpoint on the local metrics address
  • Gate some expectations on KRUIZE_TEST_MODE to distinguish envtest from real-cluster behaviors
internal/controller/kruize_controller_test.go
internal/controller/suite_test.go
Remove kube-rbac-proxy usage from manifests and CI and rely on native auth instead.
  • Comment out auth_proxy-related resources from the RBAC kustomization so they are no longer applied by default
  • Delete the manager_auth_proxy_patch and switch the default kustomization comments to point to controller-runtime-based protection
  • Simplify the PR CI workflow by removing kube-rbac-proxy resource copying, patching, and output printing
config/rbac/kustomization.yaml
config/default/kustomization.yaml
config/default/manager_auth_proxy_patch.yaml
.github/workflows/pr-check.yaml
Adjust manager deployment to expose secure metrics and health probe ports explicitly.
  • Change manager container metrics bind address to :8443 to serve HTTPS metrics externally
  • Expose metrics and health-probe ports as named container ports in the manager deployment manifest
config/manager/manager.yaml
Pull in additional indirect dependencies required for the metrics filters and updated controller-runtime stack.
  • Add Kubernetes apiserver/component-base and controller-runtime-related libraries needed for authentication/authorization filters
  • Include httpsnoop, OpenTelemetry, CEL, grpc-gateway, and related transitive dependencies in go.mod and go.sum
go.mod
go.sum

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Copy Markdown

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • The RBAC for tokenreviews/subjectaccessreviews is added both via kubebuilder markers and by hand in config/rbac/role.yaml; since role.yaml is generated, consider removing the manual edits and regenerating manifests so RBAC stays source-of-truth in the markers.
  • The go.mod change pulls in a large number of new indirect dependencies (CEL, gRPC, OTel, etc.); it would be good to verify they are actually required for the metrics auth change and run go mod tidy to drop anything unnecessary and keep the dependency surface minimal.
  • In config/rbac/kustomization.yaml and config/default/kustomization.yaml, the comments still talk about enabling/disabling the auth proxy even though kube-rbac-proxy has been removed; updating these comments to describe the new native metrics auth behavior will make the configuration more self-explanatory.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The RBAC for tokenreviews/subjectaccessreviews is added both via kubebuilder markers and by hand in config/rbac/role.yaml; since role.yaml is generated, consider removing the manual edits and regenerating manifests so RBAC stays source-of-truth in the markers.
- The go.mod change pulls in a large number of new indirect dependencies (CEL, gRPC, OTel, etc.); it would be good to verify they are actually required for the metrics auth change and run `go mod tidy` to drop anything unnecessary and keep the dependency surface minimal.
- In config/rbac/kustomization.yaml and config/default/kustomization.yaml, the comments still talk about enabling/disabling the auth proxy even though kube-rbac-proxy has been removed; updating these comments to describe the new native metrics auth behavior will make the configuration more self-explanatory.

## Individual Comments

### Comment 1
<location path="config/default/kustomization.yaml" line_range="31-33" />
<code_context>
 # Protect the /metrics endpoint by putting it behind auth.
 # If you want your controller-manager to expose the /metrics
 # endpoint w/o any authn/z, please comment the following line.
-- path: manager_auth_proxy_patch.yaml

</code_context>
<issue_to_address>
**suggestion:** Update the metrics auth comment now that the auth proxy patch line has been removed

This comment still instructs readers to “comment the following line,” but the `manager_auth_proxy_patch.yaml` entry has been removed rather than commented out. To avoid confusion, please update or remove this block so it matches the current setup, where metrics are protected via controller-runtime filters instead of the auth proxy patch.

```suggestion
# /metrics is protected by controller-runtime authentication/authorization filters.
# If you want your controller-manager to expose the /metrics endpoint without authn/z,
# update the manager configuration (e.g. flags or filters) instead of using an auth proxy patch.
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Signed-off-by: Shreya Biradar <shbirada@ibm.com>
Signed-off-by: Shreya Biradar <shbirada@ibm.com>
@shreyabiradar07 shreyabiradar07 self-assigned this Mar 11, 2026
@shreyabiradar07 shreyabiradar07 moved this to In Progress in Monitoring Mar 11, 2026
@shreyabiradar07 shreyabiradar07 added this to the Kruize Operator 0.0.5 milestone Mar 11, 2026
@shreyabiradar07 shreyabiradar07 added enhancement New feature or request Operator labels Mar 11, 2026
Signed-off-by: Shreya Biradar <shbirada@ibm.com>
Signed-off-by: Shreya Biradar <shbirada@ibm.com>
@shreyabiradar07
Copy link
Copy Markdown
Contributor Author

@sourcery-ai review

Copy link
Copy Markdown

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues, and left some high level feedback:

  • The envtest suite now hardcodes a separate manager with metrics on 127.0.0.1:8443; consider refactoring to share the same metrics configuration/options construction as cmd/main.go to avoid drift between test and production setups.
  • A large number of new indirect dependencies were added to go.mod; it may be worth checking whether they all stem from an intentional library upgrade (e.g., controller-runtime) and pruning any that are no longer required to keep the module graph minimal.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The envtest suite now hardcodes a separate manager with metrics on 127.0.0.1:8443; consider refactoring to share the same metrics configuration/options construction as cmd/main.go to avoid drift between test and production setups.
- A large number of new indirect dependencies were added to go.mod; it may be worth checking whether they all stem from an intentional library upgrade (e.g., controller-runtime) and pruning any that are no longer required to keep the module graph minimal.

## Individual Comments

### Comment 1
<location path="internal/controller/kruize_controller_test.go" line_range="810" />
<code_context>
+        })
+
+        Context("when accessing the metrics endpoint", func() {
+            It("should reject unauthorized requests with 401 or 403", func() {
+                By("calling the metrics endpoint without an Authorization header")
+                resp, err := client.Get("https://127.0.0.1:8443/metrics")
</code_context>
<issue_to_address>
**suggestion (testing):** Avoid potential flakiness by waiting for the metrics server to be ready before asserting on the response

This test calls `https://127.0.0.1:8443/metrics` and asserts immediately, but the manager/metrics server are started asynchronously in `BeforeSuite`, so there’s a race where the server may not be listening yet, causing intermittent connection errors. To avoid flakiness, wrap the request and status-code assertions in an `Eventually` that retries until a successful connection (or timeout).
</issue_to_address>

### Comment 2
<location path="internal/controller/kruize_controller_test.go" line_range="836" />
<code_context>
+                defer resp.Body.Close()
+
+                By("verifying the metrics server intercepted the request")
+                // Accepting 401/403 (Real Cluster) or 500 (EnvTest limitation)
+                // All three prove the filter is protecting the endpoint.
+                Expect(resp.StatusCode).To(SatisfyAny(
</code_context>
<issue_to_address>
**suggestion (testing):** The acceptance of HTTP 500 as a passing condition may make the test less meaningful

Treating `500 Internal Server Error` as success weakens the assertion, since it might reflect a real bug rather than just an envtest limitation. If envtest always returns 500 for invalid tokens, consider detecting that mode explicitly (e.g., build tag or env var) and branching expectations, or at least asserting something envtest-specific in the response (like a known error message) so genuine internal failures aren’t silently accepted.

Suggested implementation:

```golang
    "net/http"
    "os"

```

```golang
                By("verifying the metrics server intercepted the request")

                if os.Getenv("ENVTEST_MODE") == "true" {
                    // In envtest mode we explicitly expect the known 500 behavior,
                    // rather than treating any 500 as success.
                    Expect(resp.StatusCode).To(
                        Equal(http.StatusInternalServerError),
                        "Expected 500 from envtest when handling unauthenticated request",
                    )
                } else {
                    Expect(resp.StatusCode).To(SatisfyAny(
                        Equal(http.StatusUnauthorized),
                        Equal(http.StatusForbidden),
                    ), "Expected 401 or 403 for unauthenticated request")
                }

```

To fully implement the intent of your review comment, you should make a similar change in the `"should fail when an invalid token is provided"` test: instead of accepting 500 along with 401/403, branch on the same `ENVTEST_MODE` (or a better-suited name if your project already uses one) and:
1. Expect exactly `http.StatusInternalServerError` in envtest mode.
2. Otherwise, expect 401/403 (or the appropriate non-envtest statuses for invalid tokens).
This ensures 500 is only treated as success when you're explicitly running under the envtest behavior, and avoids silently passing on genuine internal server errors.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Signed-off-by: Shreya Biradar <shbirada@ibm.com>
Signed-off-by: Shreya Biradar <shbirada@ibm.com>
@shreyabiradar07
Copy link
Copy Markdown
Contributor Author

@sourcery-ai review

Signed-off-by: Shreya Biradar <shbirada@ibm.com>
Copy link
Copy Markdown

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • The metrics port and bind address are now hard-coded as 8443 in several places (flags in main, manager.yaml args/ports, envtest manager, and metrics tests); consider centralizing this value in one place or using shared constants to avoid config drift in future changes.
  • GetMetricsOptions always configures FilterProvider even when secureServing is disabled; if you intend to support an unprotected or plain HTTP metrics endpoint, you may want to make the filters conditional on the secure flag to avoid unexpected authn/z behavior.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The metrics port and bind address are now hard-coded as 8443 in several places (flags in main, manager.yaml args/ports, envtest manager, and metrics tests); consider centralizing this value in one place or using shared constants to avoid config drift in future changes.
- GetMetricsOptions always configures FilterProvider even when secureServing is disabled; if you intend to support an unprotected or plain HTTP metrics endpoint, you may want to make the filters conditional on the secure flag to avoid unexpected authn/z behavior.

## Individual Comments

### Comment 1
<location path="internal/controller/kruize_controller_test.go" line_range="856-864" />
<code_context>
+                }
+            })
+
+            It("should have the metrics server running and reachable", func() {
+                By("checking the connection to the secure port")
+                resp, err := client.Get("https://127.0.0.1:8443/metrics")
+                Expect(err).ToNot(HaveOccurred())
+                defer resp.Body.Close()
+
+                // Reachability test: getting any HTTP response proves the server is listening
+                Expect(resp).ToNot(BeNil())
+                Expect(resp.StatusCode).ToNot(Equal(0))
+            })
+        })
</code_context>
<issue_to_address>
**suggestion (testing):** Use Eventually/Consistently to avoid flakiness when checking metrics server reachability

This test performs a single GET to `https://127.0.0.1:8443/metrics` and assumes the metrics server is already listening. Because the manager starts in `BeforeSuite`, this can intermittently fail with connection errors/EOFs. Consider wrapping the GET in `Eventually` (as in the first test in this context) to retry until you get any non-zero status code, or using `Consistently` after an `Eventually`-based readiness check to keep the reachability intent while avoiding flakiness.

Suggested implementation:

```golang
            It("should have the metrics server running and reachable", func() {
                checkMetricsReachable := func() error {
                    resp, err := client.Get("https://127.0.0.1:8443/metrics")
                    if err != nil {
                        return err
                    }
                    defer resp.Body.Close()

                    // Reachability test: getting any HTTP response proves the server is listening
                    if resp == nil {
                        return fmt.Errorf("expected non-nil response from metrics endpoint")
                    }
                    if resp.StatusCode == 0 {
                        return fmt.Errorf("expected non-zero HTTP status code from metrics endpoint")
                    }

                    return nil
                }

                By("waiting for the metrics server to become reachable on the secure port")
                Eventually(checkMetricsReachable).Should(Succeed())

                By("verifying the metrics server remains reachable on the secure port")
                Consistently(checkMetricsReachable).Should(Succeed())
            })

```

1. Ensure `fmt` is imported at the top of `internal/controller/kruize_controller_test.go`:
   - If not present, add `fmt` to the existing import block, e.g.:
     `import ( ... "fmt" ... )`.
2. If other tests in this file use custom timeouts/intervals for `Eventually`/`Consistently`, consider aligning this test with those by adding explicit duration/interval arguments to both calls (e.g. `Eventually(checkMetricsReachable, 10*time.Second, 250*time.Millisecond)`), and ensure `time` is imported if not already.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Signed-off-by: Shreya Biradar <shbirada@ibm.com>
@shreyabiradar07
Copy link
Copy Markdown
Contributor Author

@sourcery-ai review

Copy link
Copy Markdown

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues, and left some high level feedback:

  • The Metrics Auth Integration tests currently treat HTTP 500 as an acceptable outcome for unauthorized access in the first Eventually check; this can mask real regressions in the metrics auth path and might be worth tightening to only the expected 401/403 (and explicitly-scoped 500 in envtest where needed).
  • The new metrics auth wiring pulled in a large set of indirect dependencies (e.g., OpenTelemetry, grpc-gateway, cel-go); it would be good to double-check which of these are actually required by controller-runtime filters and prune any that are not strictly needed to keep the module lean.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The `Metrics Auth Integration` tests currently treat HTTP 500 as an acceptable outcome for unauthorized access in the first `Eventually` check; this can mask real regressions in the metrics auth path and might be worth tightening to only the expected 401/403 (and explicitly-scoped 500 in envtest where needed).
- The new metrics auth wiring pulled in a large set of indirect dependencies (e.g., OpenTelemetry, grpc-gateway, cel-go); it would be good to double-check which of these are actually required by controller-runtime filters and prune any that are not strictly needed to keep the module lean.

## Individual Comments

### Comment 1
<location path="internal/controller/kruize_controller_test.go" line_range="811-820" />
<code_context>
+        })
+
+        Context("when accessing the metrics endpoint", func() {
+            It("should reject unauthorized requests", func() {
+                By("calling the metrics endpoint until it is ready")
+
+                Eventually(func() (int, error) {
+                    metricsURL := fmt.Sprintf("https://%s/metrics", utils.LocalMetricsAddr)
+                    resp, err := client.Get(metricsURL)
+                    if err != nil {
+                        return 0, err // Return error to trigger a retry
+                    }
+                    defer resp.Body.Close()
+                    return resp.StatusCode, nil
+                }, "10s", "1s").Should(SatisfyAny(
+                    Equal(http.StatusUnauthorized),
+                    Equal(http.StatusForbidden),
</code_context>
<issue_to_address>
**suggestion (testing):** Avoid accepting 500 from the metrics endpoint in real-cluster mode when asserting unauthorized access

`Eventually` currently treats `StatusInternalServerError` as acceptable for unauthorized requests. That’s reasonable in envtest (authn/z webhooks can surface as 500), but in a real cluster a 500 usually indicates a misconfigured TokenReview/SAR or metrics filters. To avoid this test masking real production issues, branch on `KRUIZE_TEST_MODE` (as in the next test) and only permit 500 in envtest mode, while requiring 401/403 in real-cluster mode.

Suggested implementation:

```golang
        Context("when accessing the metrics endpoint", func() {
            It("should reject unauthorized requests", func() {
                By("calling the metrics endpoint until it is ready")

                testMode := os.Getenv("KRUIZE_TEST_MODE")

                if testMode == "envtest" {
                    // In envtest, 500 may surface from authn/z webhooks, so allow it
                    Eventually(func() (int, error) {
                        metricsURL := fmt.Sprintf("https://%s/metrics", utils.LocalMetricsAddr)
                        resp, err := client.Get(metricsURL)
                        if err != nil {
                            return 0, err // Return error to trigger a retry
                        }
                        defer resp.Body.Close()
                        return resp.StatusCode, nil
                    }, "10s", "1s").Should(SatisfyAny(
                        Equal(http.StatusUnauthorized),
                        Equal(http.StatusForbidden),
                        Equal(http.StatusInternalServerError),
                    ), "The metrics server should eventually be reachable and reject the unauthorized request in envtest")
                } else {
                    // In real clusters, 500 usually indicates misconfiguration, so do not accept it
                    Eventually(func() (int, error) {
                        metricsURL := fmt.Sprintf("https://%s/metrics", utils.LocalMetricsAddr)
                        resp, err := client.Get(metricsURL)
                        if err != nil {
                            return 0, err // Return error to trigger a retry
                        }
                        defer resp.Body.Close()
                        return resp.StatusCode, nil
                    }, "10s", "1s").Should(SatisfyAny(
                        Equal(http.StatusUnauthorized),
                        Equal(http.StatusForbidden),
                    ), "The metrics server should eventually be reachable and reject the unauthorized request in real-cluster mode")
                }
            })

            It("should fail when an invalid token is provided", func() {
                By("creating a request with an invalid bearer token")
                metricsURL := fmt.Sprintf("https://%s/metrics", utils.LocalMetricsAddr)
                req, err := http.NewRequest("GET", metricsURL, nil)
                Expect(err).ToNot(HaveOccurred())
                req.Header.Set("Authorization", "Bearer invalid-token")

```

1. Add the `os` package import at the top of `internal/controller/kruize_controller_test.go` if it is not already present:

<<<<<<< SEARCH
import (
    "crypto/tls"
    "fmt"
    "net/http"
=======
import (
    "crypto/tls"
    "fmt"
    "net/http"
    "os"
>>>>>>> REPLACE

2. If the existing import block differs, adjust the `SEARCH` section accordingly so that `os` is included in the same grouped import block.
</issue_to_address>

### Comment 2
<location path="internal/controller/kruize_controller_test.go" line_range="829-838" />
<code_context>
+            It("should fail when an invalid token is provided", func() {
</code_context>
<issue_to_address>
**suggestion (testing):** Add readiness waiting to avoid flakes when testing invalid-token behavior

This test does a single `client.Do(req)` without any readiness wait. If the metrics server isn’t up yet, the call can fail with a transient connection error, making the test flaky and obscuring the invalid-token behavior. Please either reuse the readiness logic from the first test before sending the invalid-token request, or wrap `client.Do(req)` in an `Eventually` that tolerates transient connection errors and then asserts the final status code.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Signed-off-by: Shreya Biradar <shbirada@ibm.com>
Signed-off-by: Shreya Biradar <shbirada@ibm.com>
@kusumachalasani
Copy link
Copy Markdown

The image works as expected.

Copy link
Copy Markdown

@kusumachalasani kusumachalasani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@kusumachalasani kusumachalasani moved this from In Progress to Under Review in Monitoring Mar 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request Operator

Projects

Status: Under Review

Development

Successfully merging this pull request may close these issues.

2 participants