Add evaluationInterval to core-finish MCP policies#710
Add evaluationInterval to core-finish MCP policies#710yprokule wants to merge 1 commit intoopenshift-kni:mainfrom
Conversation
📝 WalkthroughWalkthroughAdds a placement selector to the Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
/assign @imiller0 |
Add evaluationInterval with 'compliant: never' to MCP unpause policies in core-finish.yaml. These policies manage temporal MCP orchestration during ZTP deployment (pause for config staging, unpause for application, reset maxUnavailable). After initial compliance during wave execution, continuous re-evaluation is unnecessary and adds API server load. The one-time evaluation aligns with ZTP handoff semantics - policies apply configuration during deployment, then ownership transfers to cluster administrators. Also unbinds the policy after deployment is completed. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
92fdcc3 to
3c86f81
Compare
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: yprokule The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
telco-core/configuration/core-finish.yaml (1)
77-78: Intentional: drift detection disabled post-ZTP formaxUnavailable: 1.Unlike
custom-mcp-unpause-22, this policy has noztp-doneplacement restriction, so it remains bound post-handoff. Withcompliant: never, once it reaches Compliant it won't re-evaluate — meaning if a cluster admin (or another controller) later changesspec.maxUnavailableon any of the three MCPs, the policy will silently continue reporting Compliant against a stale observation.Per the PR rationale this is by design (ownership transfers to the cluster admin after ZTP), so no change requested. Just flagging that the asymmetry vs.
custom-mcp-unpause-22(which is descoped entirely) means this policy is the one keeping a permanent "stale green" signal on the hub — consider whether a non-nevercompliantinterval (e.g., a long one like24h) would give you a safer day-2 posture while still dramatically reducing API server load compared to the default.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@telco-core/configuration/core-finish.yaml` around lines 77 - 78, The policy currently sets evaluationInterval.compliant: never which disables drift detection post-ZTP and can leave a permanent “stale green” signal; change evaluationInterval.compliant from "never" to a finite interval (for example "24h") in the same policy YAML (look for the evaluationInterval/compliant keys and the policy that contrasts with custom-mcp-unpause-22) so the policy will periodically re-evaluate day-2 changes while still greatly reducing API load.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@telco-core/configuration/core-finish.yaml`:
- Around line 20-34: Add a short YAML comment near the
placement.matchExpressions (or adjacent to evaluationInterval.compliant)
documenting the expected transient NonCompliance window that occurs between wave
201 (when core-custom-mcp-set-maxavailable-22 flips maxUnavailable back to 1)
and the moment the ztp-done label is applied; mention that this transient
NonCompliant state is harmless because remediationAction: inform + TALM already
completed wave 200, and that dashboards may briefly show a failure during
handoff so maintainers should not treat it as a regression.
---
Nitpick comments:
In `@telco-core/configuration/core-finish.yaml`:
- Around line 77-78: The policy currently sets evaluationInterval.compliant:
never which disables drift detection post-ZTP and can leave a permanent “stale
green” signal; change evaluationInterval.compliant from "never" to a finite
interval (for example "24h") in the same policy YAML (look for the
evaluationInterval/compliant keys and the policy that contrasts with
custom-mcp-unpause-22) so the policy will periodically re-evaluate day-2 changes
while still greatly reducing API load.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: 378ce12b-0b47-449e-a1c0-74c58c021efe
📒 Files selected for processing (1)
telco-core/configuration/core-finish.yaml
| placement: | ||
| labelSelector: | ||
| matchExpressions: | ||
| - key: common | ||
| operator: In | ||
| values: | ||
| - "core" | ||
| - key: version | ||
| operator: In | ||
| values: | ||
| - "4.22" | ||
| - key: ztp-done | ||
| operator: DoesNotExist | ||
| evaluationInterval: | ||
| compliant: never |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
policy-generator-plugin evaluationInterval placement policyDefaults override behavior
💡 Result:
The policy-generator-plugin is a Kustomize exec plugin for Open Cluster Management (OCM)/Red Hat Advanced Cluster Management (RHACM) that generates Policy, Placement, and PlacementBinding manifests from PolicyGenerator CRs containing Kubernetes YAML manifests. evaluationInterval: Configures the minimum elapsed time before reevaluating ConfigurationPolicies. Format: durations like "30m" or "watch" (default, uses API watches) or "never". Specified separately for compliant and noncompliant states. Set in policyDefaults or overridden per-policy in policies[].evaluationInterval. placement: Configures cluster targeting for generated policies. Includes labelSelector (to generate Placement), placementPath (existing file), or placementName (existing Placement). Defaults to matching all clusters. Set in policyDefaults.placement or overridden per-policy. policyDefaults: Top-level section providing defaults applied to all generated policies (except namespace, which cannot be overridden). Includes evaluationInterval, placement, remediationAction, severity, etc. override behavior: Defaults cascade: hard-coded defaults -> policyDefaults -> per-policy settings in policies[] -> per-manifest in policies[].manifests[]. Most policyDefaults fields (except namespace) can be overridden at policy level. Some fields (e.g., evaluationInterval) cannot be overridden when consolidateManifests or orderManifests is true. Placement can be consolidated across policies with matching selectors. Explicit values in policies[] override policyDefaults; namespace in policyDefaults is not overridable by policies[].
Citations:
- 1: https://github.com/open-cluster-management-io/policy-generator-plugin/blob/main/docs/policygenerator-reference.yaml
- 2: https://docs.redhat.com/en/documentation/red_hat_advanced_cluster_management_for_kubernetes/2.11/html/governance/integrate-policy-generator
- 3: https://github.com/open-cluster-management-io/policy-generator-plugin/blob/main/CLAUDE.md
- 4: https://www.aba-easysteps.co.uk
🏁 Script executed:
fd -t f "core-finish.yaml" --exec cat -n {}Repository: openshift-kni/telco-reference
Length of output: 3266
🌐 Web query:
ConfigurationPolicy evaluationInterval "never" noncompliant state transition behavior ACM RHACM
💡 Result:
In Red Hat Advanced Cluster Management (RHACM), the ConfigurationPolicy resource includes an optional evaluationInterval field with compliant and noncompliant subfields. These control how frequently the policy controller re-evaluates the policy's compliance status on managed clusters. Setting evaluationInterval.noncompliant to "never" disables periodic re-evaluation once the policy reaches a NONCOMPLIANT state. The policy will not be re-evaluated until the policy spec is updated or the list of selected namespaces changes. The controller uses Kubernetes API watches by default (equivalent to "watch" when unset or empty), but "never" explicitly prevents polling or re-checks in that state. This behavior applies similarly to the compliant state. Defaults in PolicyGenerator are often noncompliant: 10s and compliant: 10m, but can be set to "never" to stop evaluation after reaching the state. State transitions: The policy can still transition from NONCOMPLIANT to COMPLIANT if cluster changes (detected via watches) make it compliant, or if the policy is modified. However, while in NONCOMPLIANT, no periodic evaluations occur, reducing resource usage for persistently non-compliant policies.
Citations:
- 1: https://docs.redhat.com/en/documentation/openshift_container_platform/4.20/html/edge_computing/managing-cluster-policies-with-policygenerator-resources
- 2: https://docs.redhat.com/en/documentation/red_hat_advanced_cluster_management_for_kubernetes/2.5/html-single/governance/index
- 3: https://docs.redhat.com/en/documentation/red_hat_advanced_cluster_management_for_kubernetes/2.10/html/governance/governance
- 4: https://github.com/open-cluster-management-io/config-policy-controller/blob/main/api/v1/configurationpolicy_types.go
- 5: https://docs.okd.io/latest/edge_computing/policygenerator_for_ztp/ztp-advanced-policygenerator-config.html
Document the transient NonCompliance window between wave 201 and ztp-done application.
With this scoping, the sequence is:
- Wave 200 →
custom-mcp-unpause-22setsmaxUnavailable: 100%, becomes Compliant (stops re-evaluating due tocompliant: never). - Wave 201 →
core-custom-mcp-set-maxavailable-22setsmaxUnavailable: 1, flippingcustom-mcp-unpause-22back to NonCompliant (the default noncompliant interval kicks in, allowing re-evaluation). ztp-donelabel is applied →matchExpressionsdescopes the policy.
Between steps (2) and (3), the policy reports NonCompliant. This is harmless with remediationAction: inform + TALM (TALM has already moved past wave 200), but operators observing ACM dashboards will briefly see a "failed" unpause policy before the ZTP handoff completes. Consider adding a comment to the policy explaining this expected transient state so future maintainers don't mistake it for a regression.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@telco-core/configuration/core-finish.yaml` around lines 20 - 34, Add a short
YAML comment near the placement.matchExpressions (or adjacent to
evaluationInterval.compliant) documenting the expected transient NonCompliance
window that occurs between wave 201 (when core-custom-mcp-set-maxavailable-22
flips maxUnavailable back to 1) and the moment the ztp-done label is applied;
mention that this transient NonCompliant state is harmless because
remediationAction: inform + TALM already completed wave 200, and that dashboards
may briefly show a failure during handoff so maintainers should not treat it as
a regression.
Add evaluationInterval with
'compliant: never'to MCP unpause policies in core-finish.yaml. These policies manage temporal MCP orchestration during ZTP deployment (pause for config staging, unpause for application, reset maxUnavailable).Also those policies are mutually exclusive as
custom-mcp-unpause-22sets maxUnavailable: 100% whilecore-custom-mcp-set-maxavailable-22set it to maxUnavailable: 1After initial compliance during wave execution, continuous re-evaluation is unnecessary and adds API server load. The one-time evaluation aligns with ZTP handoff semantics - policies apply configuration during deployment, then ownership transfers to cluster administrators.
Reduces continuous polling overhead while preserving ZTP functionality.