OCPCLOUD-1645: Security: Configure RBAC for capi operator#493
OCPCLOUD-1645: Security: Configure RBAC for capi operator#493sunzhaohua2 wants to merge 1 commit intoopenshift:mainfrom
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@sunzhaohua2: This pull request references OCPCLOUD-1645 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughReplaced wildcard RBAC in Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Comment |
|
/hold for test |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
@sunzhaohua2: This pull request references OCPCLOUD-1645 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
manifests/0000_30_cluster-api_03_rbac_roles.yaml (1)
187-191: Remove the legacyextensionsDaemonSet grant if the supported 4.22 floor no longer serves that API group.If this payload only targets modern Kubernetes/OpenShift, this rule is dead privilege and weakens the cleanup. Keeping the
appsDaemonSet read is enough unless there is still an explicit compatibility path usingextensions.Suggested diff
-# Extensions (legacy daemonsets support during drain operations) -- apiGroups: [extensions] - resources: - - daemonsets - verbs: [get, list, watch]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@manifests/0000_30_cluster-api_03_rbac_roles.yaml` around lines 187 - 191, The RBAC rule currently grants read access to DaemonSets in the legacy apiGroup "extensions"; remove the dead-privilege entry by deleting the apiGroups: [extensions] stanza for the DaemonSet rule and ensure the equivalent rule for apiGroups: [apps] (resources: ["daemonsets"], verbs: ["get","list","watch"]) remains or is present; if backward compatibility is required instead, gate inclusion behind a clear compatibility flag or comment, but otherwise drop the "extensions" grant entirely.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@manifests/0000_30_cluster-api_03_rbac_roles.yaml`:
- Around line 284-318: Update the existing RoleBinding named
cluster-capi-operator-pull-secret in the openshift-config namespace so its
roleRef.name points to the new Role capi-controllers (instead of the old role
name), and add a new RoleBinding in the openshift-config-managed namespace that
binds the capi-controllers Role to the same subject(s) (serviceAccount and
namespace) used by the operator (mirror the subjects from
cluster-capi-operator-pull-secret) so the operator retains access to the
pull-secret and managed configmaps; ensure the new RoleBinding.roleRef.apiGroup
is rbac.authorization.k8s.io and kind is Role.
- Around line 23-27: Remove the unnecessary "create" verb from the Nodes RBAC
rule: in the rule that lists resources: - nodes and verbs: [get, list, watch,
create, update, patch, delete] remove "create" so the verbs only include the
operations performed by this controller (get, list, watch, update, patch,
delete); update the manifest entry that defines the Node management RBAC rule
(the section referencing "nodes" and its "verbs") to enforce least privilege.
---
Nitpick comments:
In `@manifests/0000_30_cluster-api_03_rbac_roles.yaml`:
- Around line 187-191: The RBAC rule currently grants read access to DaemonSets
in the legacy apiGroup "extensions"; remove the dead-privilege entry by deleting
the apiGroups: [extensions] stanza for the DaemonSet rule and ensure the
equivalent rule for apiGroups: [apps] (resources: ["daemonsets"], verbs:
["get","list","watch"]) remains or is present; if backward compatibility is
required instead, gate inclusion behind a clear compatibility flag or comment,
but otherwise drop the "extensions" grant entirely.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 3369d44c-2e4e-4203-8176-98f516b64c26
📒 Files selected for processing (1)
manifests/0000_30_cluster-api_03_rbac_roles.yaml
|
@sunzhaohua2: This pull request references OCPCLOUD-1645 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@manifests/0000_30_cluster-api_03_rbac_roles.yaml`:
- Around line 40-46: The cluster-wide Role in
manifests/0000_30_cluster-api_03_rbac_roles.yaml currently grants
create/update/patch/delete on core namespaced resources (secrets, configmaps,
services, serviceaccounts); change that cluster-scoped rule to read-only (get,
list, watch) for those resources and remove create/update/patch/delete verbs,
and then add corresponding namespace-scoped Roles (or update existing Roles) to
grant the necessary write verbs for secrets, configmaps, services, and
serviceaccounts within each target namespace so writes remain only in
namespace-scoped Role bindings.
- Around line 210-215: The RBAC entry currently allows create/patch/delete on
roles and rolebindings cluster-wide; change it to a namespaced Role scoped to
the openshift-cluster-api namespace (instead of a ClusterRole) so the verbs on
the resources "roles" and "rolebindings" (and the verbs array [get, list, watch,
create, update, patch, delete]) only apply inside that namespace; alternatively
further tighten by setting resourceNames to the specific Role/RoleBinding names
the controller needs and ensure any RoleBinding/ClusterRoleBinding granting the
SA wider access is removed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 44c3b234-7847-4889-9e50-ef5c8ce7ba6b
📒 Files selected for processing (1)
manifests/0000_30_cluster-api_03_rbac_roles.yaml
|
@sunzhaohua2: This pull request references OCPCLOUD-1645 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@sunzhaohua2: This pull request references OCPCLOUD-1645 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/test ? |
|
@sunzhaohua2: The following commands are available to trigger required jobs: The following commands are available to trigger optional jobs: Use DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@manifests/0000_30_cluster-api_03_rbac_roles.yaml`:
- Around line 232-250: Remove the unused parent-resource verbs by editing the
RBAC rule for the resource "clusteroperators": remove the "update" and "patch"
verbs and only keep the verbs actually required (e.g., "get", "list", "watch",
and the one-time "create" if bootstrap is needed); ensure any ongoing status
updates use the separate rule for "clusteroperators/status" with "get",
"update", and "patch". If bootstrap still needs a one-time "create" on
"clusteroperators", split that into its own rule containing only "create" (and
no "update"/"patch") and leave the status operations solely on the
"clusteroperators/status" rule referenced by operator code such as
Status().Patch() in pkg/operatorstatus/operator_status.go.
- Around line 458-471: The current RBAC rule grants write verbs (create, update,
patch, delete) to both "deployments" and "replicasets"; remove those write verbs
for "replicasets" since the operator only reconciles Deployments. Fix by
splitting the rule into two rules (or separate resource entries): one for
"deployments" with verbs [get, list, watch, create, update, patch, delete] and a
second for "replicasets" with read-only verbs [get, list, watch]; update the
rule entries that mention the resources "deployments" and "replicasets"
accordingly so only "deployments" retain create/update/patch/delete.
- Around line 252-279: The RBAC rules for CRDs and webhook configurations in the
Cluster API role (the rules granting verbs update/patch/delete on
apiextensions.k8s.io/customresourcedefinitions and
admissionregistration.k8s.io/*webhookconfigurations) are intentionally broad;
add an explicit documented acknowledgement of this security trade-off near these
rules explaining the assumption that only trusted provider manifests are
deployed and that dynamic provider installation requires global CRD/webhook
mutating permissions, and also include recommended mitigations (e.g., restrict
resourceNames when possible, apply provider onboarding via an admission-gated
namespace or separate mechanism, or use a dedicated service account scoped to
provider installation) so reviewers understand the boundary and future
tightening options.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 368265f3-4724-498a-9b6d-3c66604101bd
📒 Files selected for processing (1)
manifests/0000_30_cluster-api_03_rbac_roles.yaml
| - apiGroups: | ||
| - config.openshift.io | ||
| resources: | ||
| - clusteroperators | ||
| verbs: | ||
| - get | ||
| - list | ||
| - watch | ||
| - create | ||
| - update | ||
| - patch | ||
| - apiGroups: | ||
| - config.openshift.io | ||
| resources: | ||
| - clusteroperators/status | ||
| verbs: | ||
| - get | ||
| - update | ||
| - patch |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== ClusterOperator references =="
rg -n -C4 --type go 'ClusterOperator|clusteroperators'
echo
echo "== Explicit status writes =="
rg -n -C4 --type go 'Status\(\)\.(Update|Patch)\('
echo
echo "== Nearby create/update/patch call sites =="
rg -n -C6 --type go 'ClusterOperator|clusteroperators|\.Create\(|\.Update\(|\.Patch\('Repository: openshift/cluster-capi-operator
Length of output: 50389
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Searching for ClusterOperator writes in main code (excluding vendor) =="
# Find files that reference ClusterOperator outside vendor directories
find . -type f -name "*.go" ! -path "*/vendor/*" ! -path "*/.git/*" | xargs grep -l "ClusterOperator" 2>/dev/null | head -20
echo
echo "== Looking for .Create calls on ClusterOperator clients =="
find . -type f -name "*.go" ! -path "*/vendor/*" ! -path "*/.git/*" -exec grep -l "Create.*ClusterOperator\|clusteroperators.*Create\|\.Create(" {} \; 2>/dev/null | head -10
echo
echo "== Searching for Status().Update/Patch patterns =="
find . -type f -name "*.go" ! -path "*/vendor/*" ! -path "*/.git/*" -exec grep -l "Status()" {} \; 2>/dev/null | head -10Repository: openshift/cluster-capi-operator
Length of output: 2092
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Examining ClusterOperator controller for write operations ==="
wc -l ./pkg/controllers/clusteroperator/clusteroperator_controller.go
echo
echo "=== ClusterOperator controller content ==="
cat -n ./pkg/controllers/clusteroperator/clusteroperator_controller.go
echo
echo "=== operatorstatus package (likely handles status writes) ==="
cat -n ./pkg/operatorstatus/operator_status.goRepository: openshift/cluster-capi-operator
Length of output: 17863
Remove unnecessary update and patch verbs on parent clusteroperators resource.
The code only uses create once to bootstrap the ClusterOperator object, then uses Status().Patch() for all subsequent updates (see pkg/operatorstatus/operator_status.go lines 131 and 211). The parent-resource update and patch verbs are unused and should be removed. If bootstrap still needs create, split that rule out separately.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@manifests/0000_30_cluster-api_03_rbac_roles.yaml` around lines 232 - 250,
Remove the unused parent-resource verbs by editing the RBAC rule for the
resource "clusteroperators": remove the "update" and "patch" verbs and only keep
the verbs actually required (e.g., "get", "list", "watch", and the one-time
"create" if bootstrap is needed); ensure any ongoing status updates use the
separate rule for "clusteroperators/status" with "get", "update", and "patch".
If bootstrap still needs a one-time "create" on "clusteroperators", split that
into its own rule containing only "create" (and no "update"/"patch") and leave
the status operations solely on the "clusteroperators/status" rule referenced by
operator code such as Status().Patch() in pkg/operatorstatus/operator_status.go.
mdbooth
left a comment
There was a problem hiding this comment.
Can you explain how this was generated? It contains many permissions I find surprising for the controllers endpoint.
| # clusterctl | ||
| - apiGroups: | ||
| - clusterctl.cluster.x-k8s.io | ||
| resources: | ||
| - providers | ||
| verbs: | ||
| - get | ||
| - list | ||
| - watch | ||
| - create | ||
| - update | ||
| - patch | ||
| - delete |
There was a problem hiding this comment.
I don't think we need this?
| # Node management (for machine lifecycle) | ||
| - apiGroups: | ||
| - "" | ||
| resources: | ||
| - nodes | ||
| verbs: | ||
| - get | ||
| - list | ||
| - watch | ||
| - update | ||
| - patch | ||
| - delete |
| # Pod management (for draining nodes) | ||
| - apiGroups: | ||
| - "" | ||
| resources: | ||
| - pods | ||
| verbs: | ||
| - get | ||
| - list | ||
| - watch | ||
| - delete | ||
| - apiGroups: | ||
| - "" | ||
| resources: | ||
| - pods/eviction | ||
| verbs: | ||
| - create |
|
@sunzhaohua2: This pull request references OCPCLOUD-1645 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
manifests/0000_30_cluster-api_03_rbac_roles.yaml (2)
395-449: Consider adding status subresources to cluster-reader ClusterRole.The cluster-reader aggregation provides read access to CAPI resources but doesn't include the
/statussubresources. For complete observability, cluster admins may benefit from reading status as well.Suggested addition
# CAPI core resources - apiGroups: - cluster.x-k8s.io resources: - clusters - machines - machinesets + - clusters/status + - machines/status + - machinesets/status verbs: - get - list - watchSimilar additions could be made for infrastructure provider status subresources.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@manifests/0000_30_cluster-api_03_rbac_roles.yaml` around lines 395 - 449, The ClusterRole cluster-capi-operator:cluster-reader currently grants get/list/watch on core CAPI resources and provider resources but omits their status subresources; update the rules for the cluster.x-k8s.io and infrastructure.cluster.x-k8s.io sections (the blocks listing resources like clusters, machines, machinesets and awsclusters, awsmachines, etc.) to also include the corresponding "/status" subresources (e.g., clusters/status, machines/status, machinesets/status and awsclusters/status, awsmachines/status, …) so read-only clients can view resource status as well.
193-210: Consider removingupdatefrom parentclusteroperatorsresource.Based on prior analysis, the operator only uses
createfor initial bootstrap andStatus().Patch()for ongoing updates. Theupdateverb on the parent resource (line 202) appears unused. Keeping onlycreateon the parent andupdate/patchon the/statussubresource would further tighten permissions.Suggested diff
- apiGroups: - config.openshift.io resources: - clusteroperators verbs: - get - list - watch - create - - update#!/bin/bash # Verify if update on parent clusteroperators resource is actually used echo "=== Searching for Update calls on ClusterOperator (not status) ===" rg -n "\.Update\(" pkg/ --type go -A2 -B2 | grep -i "clusteroperator" | head -20 echo echo "=== Check for UpdateContext calls ===" rg -n "UpdateContext\(" pkg/ --type go | grep -v "Status" | head -10 echo echo "=== Look at operator_status.go for the actual update patterns ===" rg -n "Update|Patch" pkg/operatorstatus/ --type go -B2 -A2 | head -50🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@manifests/0000_30_cluster-api_03_rbac_roles.yaml` around lines 193 - 210, Remove the unnecessary "update" verb from the parent "clusteroperators" resource block (leave only "create", "get", "list", "watch") and keep "get", "update", "patch" on the "clusteroperators/status" subresource; this tightens RBAC because the operator uses Create for bootstrap and Status().Patch()/Status().Update() for status changes instead of updating the parent resource directly—locate the RBAC block referencing "clusteroperators" and the separate block for "clusteroperators/status" in the manifest and drop the "update" entry from the parent resource verbs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@manifests/0000_30_cluster-api_03_rbac_roles.yaml`:
- Around line 272-290: The comment "Metal3 (baremetal infrastructure -
read-only)" is inaccurate because the RBAC rules for resources baremetalhosts,
baremetalhosts/status, and baremetalhosts/finalizers include write verbs
(update, patch); update that comment to reflect read/write permissions (e.g.,
"Metal3 (baremetal infrastructure - read/write)") or otherwise describe that
these rules grant update/patch on baremetalhosts and update on status/finalizers
so the comment matches the actual verbs.
---
Nitpick comments:
In `@manifests/0000_30_cluster-api_03_rbac_roles.yaml`:
- Around line 395-449: The ClusterRole cluster-capi-operator:cluster-reader
currently grants get/list/watch on core CAPI resources and provider resources
but omits their status subresources; update the rules for the cluster.x-k8s.io
and infrastructure.cluster.x-k8s.io sections (the blocks listing resources like
clusters, machines, machinesets and awsclusters, awsmachines, etc.) to also
include the corresponding "/status" subresources (e.g., clusters/status,
machines/status, machinesets/status and awsclusters/status, awsmachines/status,
…) so read-only clients can view resource status as well.
- Around line 193-210: Remove the unnecessary "update" verb from the parent
"clusteroperators" resource block (leave only "create", "get", "list", "watch")
and keep "get", "update", "patch" on the "clusteroperators/status" subresource;
this tightens RBAC because the operator uses Create for bootstrap and
Status().Patch()/Status().Update() for status changes instead of updating the
parent resource directly—locate the RBAC block referencing "clusteroperators"
and the separate block for "clusteroperators/status" in the manifest and drop
the "update" entry from the parent resource verbs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 975a5d71-158c-4a21-8d5e-059ef7cb9930
📒 Files selected for processing (1)
manifests/0000_30_cluster-api_03_rbac_roles.yaml
| # Metal3 (baremetal infrastructure - read-only) | ||
| - apiGroups: | ||
| - metal3.io | ||
| resources: | ||
| - baremetalhosts | ||
| verbs: | ||
| - get | ||
| - list | ||
| - watch | ||
| - update | ||
| - patch | ||
| - apiGroups: | ||
| - metal3.io | ||
| resources: | ||
| - baremetalhosts/status | ||
| - baremetalhosts/finalizers | ||
| verbs: | ||
| - update | ||
|
|
There was a problem hiding this comment.
Comment is misleading - Metal3 permissions include write verbs.
The comment on line 272 says "read-only" but the rule includes update and patch verbs on baremetalhosts, and update on status/finalizers. Consider updating the comment to accurately reflect the permissions.
Suggested diff
-# Metal3 (baremetal infrastructure - read-only)
+# Metal3 (baremetal infrastructure - read and update for host management)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@manifests/0000_30_cluster-api_03_rbac_roles.yaml` around lines 272 - 290, The
comment "Metal3 (baremetal infrastructure - read-only)" is inaccurate because
the RBAC rules for resources baremetalhosts, baremetalhosts/status, and
baremetalhosts/finalizers include write verbs (update, patch); update that
comment to reflect read/write permissions (e.g., "Metal3 (baremetal
infrastructure - read/write)") or otherwise describe that these rules grant
update/patch on baremetalhosts and update on status/finalizers so the comment
matches the actual verbs.
|
@sunzhaohua2: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
@mdbooth thank you checking this, I had Claude check Jira and generate the file based on the mapi file I’ve made some adjustments to it and will do some testing. If you notice any issues, please let me know, thank you! |
This PR replaces the insecure wildcard RBAC permissions (
*:*:*) with explicit, scoped permissions following the principle of least privilegeSummary by CodeRabbit
Chores
New Features