MG-265: Mandate ServiceAccount for MustGather CR for all users#347
MG-265: Mandate ServiceAccount for MustGather CR for all users#347neha037 wants to merge 7 commits into
Conversation
Remove the default value and optional marker from serviceAccountName, replacing them with Required and MinLength=1 validators. This forces users to explicitly specify a ServiceAccount, eliminating the silent fallback to "default" which often lacks must-gather permissions. Also adds two CEL validation rules: - audit cannot be used with a custom image (imageStreamRef) - audit cannot be used when gatherSpec.command is set without imageStreamRef The controller fallback logic that defaulted to "default" SA is removed since the field is now guaranteed to be populated at admission. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tName Regenerated after API changes to reflect the new required field, minLength validator, and CEL validation rules for audit constraints. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@neha037: This pull request references MG-265 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Skipping CI for Draft Pull Request. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughMade spec.serviceAccountName required and non-empty; added CRD cross-field validations that (1) forbid gatherSpec.command/args unless imageStreamRef is set and (2) forbid gatherSpec.audit: true when imageStreamRef is set or when gatherSpec.command is non-empty; controller no longer defaults an empty ServiceAccountName; updated unit and e2e tests and generated CRDs. ChangesMustGather: API, CRD, Controller, Tests
🎯 3 (Moderate) | ⏱️ ~25 minutes Sequence DiagramsequenceDiagram
participant User
participant API_Server
participant Admission as "Admission (CRD Validation)"
participant Controller
participant K8s_API as "Kubernetes API"
User->>API_Server: Create MustGather CR
API_Server->>Admission: Validate CRD schema and XValidation rules
Admission-->>API_Server: Accept or Reject (422 Invalid)
API_Server-->>User: Return result
alt Accepted
Controller->>K8s_API: Get ServiceAccount(instance.Spec.ServiceAccountName)
K8s_API-->>Controller: ServiceAccount found / not found
Controller->>Controller: Reconcile using provided ServiceAccountName (no defaulting)
end
🚥 Pre-merge checks | ✅ 9 | ❌ 3❌ Failed checks (2 warnings, 1 inconclusive)
✅ Passed checks (9 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: neha037 The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/v1alpha1/mustgather_types.go`:
- Around line 29-30: The comments for the MustGather API are out of sync with
the CEL XValidation rules: update the doc comments on the audit field and
related gatherSpec to state that audit cannot be used with a custom image
(ImageStreamRef) and also cannot be enabled when gatherSpec.command is set
unless ImageStreamRef is present; specifically edit the comment text that
mentions audit applying regardless of ImageStreamRef and that custom images can
respect audit so it matches the XValidation rules referencing audit,
imageStreamRef, and gatherSpec.command.
- Around line 34-36: The ServiceAccountName field's validation is too lax;
update the struct tag for ServiceAccountName to include the same DNS-1123
validation pattern used by the resource Name field to prevent invalid Kubernetes
names (e.g., enforce lowercase alphanumerics, hyphens, start/end with
alphanumeric). Modify the kubebuilder markers on ServiceAccountName (the
ServiceAccountName string `json:"serviceAccountName"`) to add the DNS-1123 regex
pattern (matching the pattern applied to Name) and any length constraints so
admission rejects invalid names instead of failing later in reconciliation.
In `@controllers/mustgather/mustgather_controller.go`:
- Around line 182-193: Before calling r.GetClient().Get for the service account,
validate that instance.Spec.ServiceAccountName is not empty and, if it is empty,
call r.setValidationFailureStatus(ctx, reqLogger, instance,
ValidationServiceAccount, <an appropriate error>) and return (matching the SFTP
secret pattern); only perform the Get when Name != "" and keep the existing
NotFound and transient-error handling for the Get call (references:
instance.Spec.ServiceAccountName, r.GetClient().Get,
r.setValidationFailureStatus, ValidationServiceAccount, reconcile.Result).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 74d13004-dbbb-4ccc-a202-a0865f354109
📒 Files selected for processing (5)
api/v1alpha1/mustgather_types.gobundle/manifests/tech-preview/operator.openshift.io_mustgathers.yamlcontrollers/mustgather/mustgather_controller.gocontrollers/mustgather/mustgather_controller_test.godeploy/crds/operator.openshift.io_mustgathers.yaml
| // +kubebuilder:validation:Required | ||
| // +kubebuilder:validation:MinLength=1 | ||
| ServiceAccountName string `json:"serviceAccountName"` |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Verify current ServiceAccountName schema constraints
rg -n "ServiceAccountName|serviceAccountName|MinLength|MaxLength|dns1123Subdomain|Pattern" api/v1alpha1/mustgather_types.go -C2
# Verify reconcile path uses the value directly (no pre-validation)
rg -n "Name:\\s+instance.Spec.ServiceAccountName|service account not found|transient error" controllers/mustgather/mustgather_controller.go -C3Repository: openshift/must-gather-operator
Length of output: 2073
🏁 Script executed:
# Check for other ServiceAccountName field definitions to see if any have stricter validation
fd -e go -type f | xargs rg -l "ServiceAccountName" | head -20
# Look for all ServiceAccountName field definitions and their validation
rg -A3 "ServiceAccountName\s*string.*json" --type go
# Check if there are any tests that validate ServiceAccountName input
rg -n "ServiceAccountName.*test|test.*ServiceAccountName" --type go -i
# Verify the CEL validation rule format matches Kubernetes conventions
rg "dns1123Subdomain|dns1123Label" --type goRepository: openshift/must-gather-operator
Length of output: 9708
🏁 Script executed:
# Check tests to see if ServiceAccountName validation is tested
rg -n "ServiceAccountName" test/e2e/must_gather_operator_test.go controllers/mustgather/mustgather_controller_test.go -B2 -A2
# Check the exact CEL syntax used at line 207 to ensure proposed syntax is correct
sed -n '204,210p' api/v1alpha1/mustgather_types.go
# Verify if there are any integration tests that would catch invalid ServiceAccount names
rg -n "setValidationFailureStatus|ValidationServiceAccount" controllers/mustgather/mustgather_controller.go -B3 -A3Repository: openshift/must-gather-operator
Length of output: 27634
Strengthen serviceAccountName validation to enforce DNS-1123 format.
Lines 34–36 enforce only non-empty input. Invalid Kubernetes names (e.g., uppercase, spaces, special characters) are accepted at admission and fail later during reconciliation. Add DNS-1123 validation matching the pattern already used for the Name field in this file (lines 206–207).
Proposed validation
// +kubebuilder:validation:Required
// +kubebuilder:validation:MinLength=1
+ // +kubebuilder:validation:MaxLength=253
+ // +kubebuilder:validation:XValidation:rule="!format.dns1123Subdomain().validate(self).hasValue()",message="serviceAccountName must be a lowercase RFC 1123 subdomain"
ServiceAccountName string `json:"serviceAccountName"`🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/v1alpha1/mustgather_types.go` around lines 34 - 36, The
ServiceAccountName field's validation is too lax; update the struct tag for
ServiceAccountName to include the same DNS-1123 validation pattern used by the
resource Name field to prevent invalid Kubernetes names (e.g., enforce lowercase
alphanumerics, hyphens, start/end with alphanumeric). Modify the kubebuilder
markers on ServiceAccountName (the ServiceAccountName string
`json:"serviceAccountName"`) to add the DNS-1123 regex pattern (matching the
pattern applied to Name) and any length constraints so admission rejects invalid
names instead of failing later in reconciliation.
| err = r.GetClient().Get(ctx, types.NamespacedName{ | ||
| Namespace: instance.Namespace, | ||
| Name: saName, | ||
| Name: instance.Spec.ServiceAccountName, | ||
| }, serviceAccount) | ||
| if err != nil { | ||
| if errors.IsNotFound(err) { | ||
| log.Error(err, "service account not found", "name", saName, "namespace", instance.Namespace) | ||
| log.Error(err, "service account not found", "name", instance.Spec.ServiceAccountName, "namespace", instance.Namespace) | ||
| return r.setValidationFailureStatus(ctx, reqLogger, instance, ValidationServiceAccount, err) | ||
| } | ||
|
|
||
| log.Error(err, "failed to get service account (transient error, will retry)", "name", saName, "namespace", instance.Namespace) | ||
| log.Error(err, "failed to get service account (transient error, will retry)", "name", instance.Spec.ServiceAccountName, "namespace", instance.Namespace) | ||
| return reconcile.Result{Requeue: true}, err |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Verify current SA lookup path and error branching
rg -n "Name:\\s+instance.Spec.ServiceAccountName|service account not found|transient error, will retry|setValidationFailureStatus\\(" controllers/mustgather/mustgather_controller.go -C3
# Verify test coverage around SA failure scenarios
rg -n "service_account|ServiceAccountName|requeue|validation failed" controllers/mustgather/mustgather_controller_test.go -C2Repository: openshift/must-gather-operator
Length of output: 15401
🏁 Script executed:
# Check if there are any existing validations for empty/invalid object names in the codebase
rg -n "TrimSpace|len.*==.*0|empty" controllers/mustgather/mustgather_controller.go -B2 -A2
# Check what validation pattern is used for other fields (like the SFTP secret name)
rg -n "secretName|Secret.*not.*found|validation.*fail" controllers/mustgather/mustgather_controller.go -B2 -A2Repository: openshift/must-gather-operator
Length of output: 3848
Add validation for empty ServiceAccountName before lookup to prevent endless requeue.
The ServiceAccount lookup at line 182 uses instance.Spec.ServiceAccountName directly without prior validation. If the field is empty or invalid, the Kubernetes client returns a non-NotFound error, which is incorrectly handled as a transient failure (line 193: requeue) rather than a validation failure. This differs from the pattern used for SFTP secret lookup above (line 197), which validates Name != "" before attempting the Get.
Add an explicit empty check before the lookup to match the existing validation pattern in this file:
Suggested fix
+ saName := instance.Spec.ServiceAccountName
+ if strings.TrimSpace(saName) == "" {
+ validationErr := fmt.Errorf("spec.serviceAccountName must be non-empty")
+ return r.setValidationFailureStatus(ctx, reqLogger, instance, ValidationServiceAccount, validationErr)
+ }
+
serviceAccount := &corev1.ServiceAccount{}
err = r.GetClient().Get(ctx, types.NamespacedName{
Namespace: instance.Namespace,
- Name: instance.Spec.ServiceAccountName,
+ Name: saName,
}, serviceAccount)
if err != nil {
if errors.IsNotFound(err) {
- log.Error(err, "service account not found", "name", instance.Spec.ServiceAccountName, "namespace", instance.Namespace)
+ log.Error(err, "service account not found", "name", saName, "namespace", instance.Namespace)
return r.setValidationFailureStatus(ctx, reqLogger, instance, ValidationServiceAccount, err)
}
- log.Error(err, "failed to get service account (transient error, will retry)", "name", instance.Spec.ServiceAccountName, "namespace", instance.Namespace)
+ log.Error(err, "failed to get service account (transient error, will retry)", "name", saName, "namespace", instance.Namespace)
return reconcile.Result{Requeue: true}, err
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@controllers/mustgather/mustgather_controller.go` around lines 182 - 193,
Before calling r.GetClient().Get for the service account, validate that
instance.Spec.ServiceAccountName is not empty and, if it is empty, call
r.setValidationFailureStatus(ctx, reqLogger, instance, ValidationServiceAccount,
<an appropriate error>) and return (matching the SFTP secret pattern); only
perform the Get when Name != "" and keep the existing NotFound and
transient-error handling for the Get call (references:
instance.Spec.ServiceAccountName, r.GetClient().Get,
r.setValidationFailureStatus, ValidationServiceAccount, reconcile.Result).
Verify that the API server rejects MustGather CRs submitted without a serviceAccountName field, confirming the CRD admission validation works as expected after making the field required. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
test/e2e/must_gather_operator_test.go (1)
864-866: ⚡ Quick winTighten validation assertion to the exact missing field
At Line 864, checking only
IsInvalidcan pass on unrelated schema failures. Assert the error cause includesspec.serviceAccountNameso this test only passes for the intended admission rejection.Suggested diff
err := nonAdminClient.Create(testCtx, mg) Expect(err).To(HaveOccurred(), "CR without serviceAccountName should be rejected at admission") Expect(apierrors.IsInvalid(err)).To(BeTrue(), "Error should be a validation error for missing required field") + statusErr, ok := err.(*apierrors.StatusError) + Expect(ok).To(BeTrue(), "Expected StatusError for schema validation failure") + Expect(statusErr.ErrStatus.Details).NotTo(BeNil()) + Expect(statusErr.ErrStatus.Details.Causes).To(ContainElement( + HaveField("Field", "spec.serviceAccountName"), + ))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/must_gather_operator_test.go` around lines 864 - 866, The test currently only checks apierrors.IsInvalid(err); tighten it by casting err to *apierrors.StatusError (e.g., statusErr := err.(*apierrors.StatusError)) and assert that statusErr.ErrStatus.Details.Causes contains a cause whose Field equals "spec.serviceAccountName" so the failure is specifically for the missing serviceAccountName field (use the existing err variable and check ErrStatus.Details.Causes for the field match).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@test/e2e/must_gather_operator_test.go`:
- Around line 864-866: The test currently only checks apierrors.IsInvalid(err);
tighten it by casting err to *apierrors.StatusError (e.g., statusErr :=
err.(*apierrors.StatusError)) and assert that statusErr.ErrStatus.Details.Causes
contains a cause whose Field equals "spec.serviceAccountName" so the failure is
specifically for the missing serviceAccountName field (use the existing err
variable and check ErrStatus.Details.Causes for the field match).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: c911f884-a54e-41cb-b43c-0b798c3822f9
📒 Files selected for processing (1)
test/e2e/must_gather_operator_test.go
There was a problem hiding this comment.
🧹 Nitpick comments (2)
test/e2e/must_gather_operator_test.go (1)
850-869: ⚡ Quick winAdd a case for
serviceAccountName: ""too.This validates the
Requiredpath, but the API change also addsMinLength=1. A second negative case for an explicit empty string would cover the non-empty constraint you just introduced.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/must_gather_operator_test.go` around lines 850 - 869, Add a second negative test in test/e2e/must_gather_operator_test.go that mirrors the existing "should reject MustGather CR without serviceAccountName" scenario but sets Spec.ServiceAccountName = "" explicitly to exercise the MinLength=1 validation; create the MustGather object (same pattern using mustgatherv1alpha1.MustGather and nonAdminClient.Create(testCtx, mg)), assert that err is returned and apierrors.IsInvalid(err) is true, and print the rejection like the existing test (ginkgo.GinkgoWriter.Printf), then nil out mg. This can be a separate ginkgo.It case or appended as an additional check in the same test.api/v1alpha1/mustgather_types.go (1)
29-30: ⚡ Quick winRemove the duplicate
auditCEL rules.These two predicates overlap with the existing validations on Lines 31-32, so one invalid CR can fail twice for the same invariant. Keep a single rule and a single message per condition to avoid noisy admission errors.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/v1alpha1/mustgather_types.go` around lines 29 - 30, The two CEL XValidation rules repeating the same audit predicates should be consolidated so a single rule and message exist for each invariant; remove the duplicate rule that checks combinations of imageStreamRef, gatherSpec.audit (audit) and gatherSpec.command so each condition appears only once. Locate the duplicated annotations referencing has(self.imageStreamRef), has(self.gatherSpec), self.gatherSpec.audit, and has(self.gatherSpec.command) and delete the redundant line while keeping one canonical XValidation rule and its message for the audit vs. custom image and audit vs. command constraints.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@api/v1alpha1/mustgather_types.go`:
- Around line 29-30: The two CEL XValidation rules repeating the same audit
predicates should be consolidated so a single rule and message exist for each
invariant; remove the duplicate rule that checks combinations of imageStreamRef,
gatherSpec.audit (audit) and gatherSpec.command so each condition appears only
once. Locate the duplicated annotations referencing has(self.imageStreamRef),
has(self.gatherSpec), self.gatherSpec.audit, and has(self.gatherSpec.command)
and delete the redundant line while keeping one canonical XValidation rule and
its message for the audit vs. custom image and audit vs. command constraints.
In `@test/e2e/must_gather_operator_test.go`:
- Around line 850-869: Add a second negative test in
test/e2e/must_gather_operator_test.go that mirrors the existing "should reject
MustGather CR without serviceAccountName" scenario but sets
Spec.ServiceAccountName = "" explicitly to exercise the MinLength=1 validation;
create the MustGather object (same pattern using mustgatherv1alpha1.MustGather
and nonAdminClient.Create(testCtx, mg)), assert that err is returned and
apierrors.IsInvalid(err) is true, and print the rejection like the existing test
(ginkgo.GinkgoWriter.Printf), then nil out mg. This can be a separate ginkgo.It
case or appended as an additional check in the same test.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: cb55399b-d1c3-493d-8db0-3fd5afc6b23d
📒 Files selected for processing (4)
api/v1alpha1/mustgather_types.gobundle/manifests/tech-preview/operator.openshift.io_mustgathers.yamldeploy/crds/operator.openshift.io_mustgathers.yamltest/e2e/must_gather_operator_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
- deploy/crds/operator.openshift.io_mustgathers.yaml
- bundle/manifests/tech-preview/operator.openshift.io_mustgathers.yaml
|
/retest all |
|
/test all |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/e2e/must_gather_operator_test.go`:
- Around line 850-869: The test currently constructs a typed MustGather
(mustgatherv1alpha1.MustGather with Spec: mustgatherv1alpha1.MustGatherSpec{})
which serializes ServiceAccountName as an empty string, so it doesn't exercise a
truly missing field; replace the "should reject MustGather CR without
serviceAccountName" case to create and submit an unstructured.Unstructured
payload instead (set GVK to the MustGather API group/version/kind, set
metadata.name/namespace to mustGatherName/ns.Name and include a spec map that
omits serviceAccountName entirely) and use nonAdminClient.Create(testCtx,
<unstructured>) to assert admission rejects it; leave the existing empty-string
test unchanged. Ensure you reference the MustGather GVK and the same metadata
used by the other tests so the admission path is identical.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 063bad32-db8b-48b6-9676-ab661f48f674
📒 Files selected for processing (4)
api/v1alpha1/mustgather_types.gobundle/manifests/tech-preview/operator.openshift.io_mustgathers.yamldeploy/crds/operator.openshift.io_mustgathers.yamltest/e2e/must_gather_operator_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
- deploy/crds/operator.openshift.io_mustgathers.yaml
- bundle/manifests/tech-preview/operator.openshift.io_mustgathers.yaml
- api/v1alpha1/mustgather_types.go
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #347 +/- ##
==========================================
- Coverage 62.89% 62.71% -0.18%
==========================================
Files 8 8
Lines 830 826 -4
==========================================
- Hits 522 518 -4
Misses 302 302
Partials 6 6
🚀 New features to boost your workflow:
|
|
/test e2e-gcp-operator |
2 similar comments
|
/test e2e-gcp-operator |
|
/test e2e-gcp-operator |
|
@neha037: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary
serviceAccountNamea required field on the MustGather CR withminLength: 1, removing the default value of"default"."default"ServiceAccount when the field was empty.auditfield (rejected with custom image, rejected with custom command without imageStreamRef). The original command/args CEL rule is intentionally retained as it serves a different purpose.Why
Previously, the operator silently fell back to the default ServiceAccount, which often lacks the permissions needed for must-gather jobs, causing jobs to hang or fail in confusing ways. Making the field required forces users to be intentional about which SA they use, eliminating ambiguity for both admin and non-admin users.
Changes
api/v1alpha1/mustgather_types.go+kubebuilder:validation:Optionaland+kubebuilder:default:="default"with+kubebuilder:validation:Requiredand+kubebuilder:validation:MinLength=1. Removedomitemptyfrom JSON tag. Added two new CEL rules for audit field validation (retained existing command/args rule).controllers/mustgather/mustgather_controller.goif saName == ""fallback block. Controller now usesinstance.Spec.ServiceAccountNamedirectly since it's guaranteed non-empty at admission.controllers/mustgather/mustgather_controller_test.goServiceAccountName.test/e2e/must_gather_operator_test.goserviceAccountNameis rejected at admission.deploy/crds/andbundle/manifests/Test plan
go vet ./...passesgo build ./...compiles cleanlyserviceAccountNameis rejected at admission🤖 Generated with Claude Code via
/jira:solve [MG-265](https://redhat.atlassian.net/browse/MG-265)Summary by CodeRabbit