
Add evaluationInterval to core-finish MCP policies #710

Open
yprokule wants to merge 1 commit into openshift-kni:main from yprokule:rds-optimized-deployment-interval

Conversation

@yprokule
Contributor

@yprokule yprokule commented Apr 13, 2026

Add evaluationInterval with 'compliant: never' to MCP unpause policies in core-finish.yaml. These policies manage temporal MCP orchestration during ZTP deployment (pause for config staging, unpause for application, reset maxUnavailable).

These policies are also mutually exclusive: custom-mcp-unpause-22 sets maxUnavailable: 100%, while core-custom-mcp-set-maxavailable-22 sets it back to maxUnavailable: 1.

After initial compliance during wave execution, continuous re-evaluation is unnecessary and adds API server load. The one-time evaluation aligns with ZTP handoff semantics: policies apply configuration during deployment, then ownership transfers to cluster administrators.

Reduces continuous polling overhead while preserving ZTP functionality.
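The setting described above can be sketched in PolicyGenerator terms. This is a minimal, hypothetical fragment for illustration; the actual field layout in core-finish.yaml may differ:

```yaml
policies:
  - name: custom-mcp-unpause-22
    evaluationInterval:
      compliant: never    # stop re-evaluating once the policy reaches Compliant
      # noncompliant is left at its default so the policy can still
      # converge while it has not yet reached compliance
```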

@openshift-ci openshift-ci Bot requested review from MarSik and cgoncalves April 13, 2026 08:51
@coderabbitai

coderabbitai Bot commented Apr 13, 2026

📝 Walkthrough

Walkthrough

Adds a placement selector to the custom-mcp-unpause-22 policy and sets evaluationInterval.compliant: never for both custom-mcp-unpause-22 and core-custom-mcp-set-maxavailable-22 in the telco-core configuration; existing policy annotations and manifest patch contents remain unchanged.

Changes

Cohort / File(s) Summary
Configuration change
telco-core/configuration/core-finish.yaml
For custom-mcp-unpause-22: added placement.labelSelector.matchExpressions targeting clusters with common=core, version=4.22, and absence of ztp-done; set evaluationInterval.compliant: never. For core-custom-mcp-set-maxavailable-22: set evaluationInterval.compliant: never. Left policyAnnotations and manifest patch content unchanged.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Add evaluationInterval to core-finish MCP policies' directly describes the main change: adding evaluationInterval configuration to MCP policies in the core-finish file.
Description check ✅ Passed The description clearly explains what was changed (adding evaluationInterval with 'compliant: never' to MCP policies), why it matters (reduces API server load and polling overhead after initial compliance), and the context (ZTP deployment workflow with mutually exclusive policies).
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.


@yprokule
Contributor Author

/assign @imiller0

Add evaluationInterval with 'compliant: never' to MCP unpause policies
in core-finish.yaml. These policies manage temporal MCP orchestration
during ZTP deployment (pause for config staging, unpause for application,
reset maxUnavailable).

After initial compliance during wave execution, continuous re-evaluation
is unnecessary and adds API server load. The one-time evaluation aligns
with ZTP handoff semantics - policies apply configuration during
deployment, then ownership transfers to cluster administrators.

Also unbinds the policy after deployment is completed.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@yprokule yprokule force-pushed the rds-optimized-deployment-interval branch from 92fdcc3 to 3c86f81 on April 24, 2026 13:17
@openshift-ci

openshift-ci Bot commented Apr 24, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: yprokule
Once this PR has been reviewed and has the lgtm label, please ask for approval from imiller0. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
telco-core/configuration/core-finish.yaml (1)

77-78: Intentional: drift detection disabled post-ZTP for maxUnavailable: 1.

Unlike custom-mcp-unpause-22, this policy has no ztp-done placement restriction, so it remains bound post-handoff. With compliant: never, once it reaches Compliant it won't re-evaluate, meaning that if a cluster admin (or another controller) later changes spec.maxUnavailable on any of the three MCPs, the policy will silently continue reporting Compliant against a stale observation.

Per the PR rationale this is by design (ownership transfers to the cluster admin after ZTP), so no change is requested. Just flagging that the asymmetry vs. custom-mcp-unpause-22 (which is descoped entirely) means this policy is the one keeping a permanent "stale green" signal on the hub; consider whether a finite compliant interval (e.g., a long one like 24h) would give a safer day-2 posture while still dramatically reducing API server load compared to the default.
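The alternative the reviewer floats would look like the following fragment (illustrative only; not what the PR implements):

```yaml
evaluationInterval:
  compliant: 24h    # finite interval: far less load than the default watch-based
                    # re-evaluation, but day-2 drift is still caught eventually
```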

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@telco-core/configuration/core-finish.yaml` around lines 77 - 78, The policy
currently sets evaluationInterval.compliant: never which disables drift
detection post-ZTP and can leave a permanent “stale green” signal; change
evaluationInterval.compliant from "never" to a finite interval (for example
"24h") in the same policy YAML (look for the evaluationInterval/compliant keys
and the policy that contrasts with custom-mcp-unpause-22) so the policy will
periodically re-evaluate day-2 changes while still greatly reducing API load.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 378ce12b-0b47-449e-a1c0-74c58c021efe

📥 Commits

Reviewing files that changed from the base of the PR and between 92fdcc3 and 3c86f81.

📒 Files selected for processing (1)
  • telco-core/configuration/core-finish.yaml

Comment on lines +20 to +34
placement:
  labelSelector:
    matchExpressions:
      - key: common
        operator: In
        values:
          - "core"
      - key: version
        operator: In
        values:
          - "4.22"
      - key: ztp-done
        operator: DoesNotExist
evaluationInterval:
  compliant: never

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

policy-generator-plugin evaluationInterval placement policyDefaults override behavior

💡 Result:

The policy-generator-plugin is a Kustomize exec plugin for Open Cluster Management (OCM)/Red Hat Advanced Cluster Management (RHACM) that generates Policy, Placement, and PlacementBinding manifests from PolicyGenerator CRs containing Kubernetes YAML manifests.

  • evaluationInterval: configures the minimum elapsed time before re-evaluating ConfigurationPolicies. Accepts durations like "30m", "watch" (the default, using API watches), or "never", specified separately for the compliant and noncompliant states. Set in policyDefaults or overridden per-policy in policies[].evaluationInterval.
  • placement: configures cluster targeting for generated policies via labelSelector (to generate a Placement), placementPath (an existing file), or placementName (an existing Placement). Defaults to matching all clusters. Set in policyDefaults.placement or overridden per-policy.
  • policyDefaults: top-level section providing defaults applied to all generated policies (except namespace, which cannot be overridden). Includes evaluationInterval, placement, remediationAction, severity, etc.
  • Override behavior: settings cascade from hard-coded defaults, to policyDefaults, to per-policy settings in policies[], to per-manifest settings in policies[].manifests[]. Most policyDefaults fields (except namespace) can be overridden at the policy level, though some (e.g., evaluationInterval) cannot be overridden per manifest when consolidateManifests or orderManifests is true. Placements can be consolidated across policies with matching selectors.
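The cascade described above could look like this in a PolicyGenerator CR (a sketch with hypothetical names; not taken from this PR):

```yaml
policyDefaults:
  namespace: ztp-policies      # hypothetical; namespace cannot be overridden per policy
  evaluationInterval:
    compliant: 10m
    noncompliant: 10s
policies:
  - name: example-policy       # hypothetical policy name
    evaluationInterval:
      compliant: never         # explicit per-policy value overrides policyDefaults
```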


🏁 Script executed:

fd -t f "core-finish.yaml" --exec cat -n {}

Repository: openshift-kni/telco-reference

Length of output: 3266


🌐 Web query:

ConfigurationPolicy evaluationInterval "never" noncompliant state transition behavior ACM RHACM

💡 Result:

In Red Hat Advanced Cluster Management (RHACM), the ConfigurationPolicy resource includes an optional evaluationInterval field with compliant and noncompliant subfields. These control how frequently the policy controller re-evaluates the policy's compliance status on managed clusters.

Setting evaluationInterval.noncompliant to "never" disables periodic re-evaluation once the policy reaches a NonCompliant state; the policy will not be re-evaluated until the policy spec is updated or the list of selected namespaces changes. The controller uses Kubernetes API watches by default (equivalent to "watch" when unset or empty), but "never" explicitly prevents polling or re-checks in that state. The same behavior applies to the compliant state. PolicyGenerator defaults are often noncompliant: 10s and compliant: 10m, but either can be set to "never" to stop evaluation after reaching that state.

State transitions: the policy can still move from NonCompliant to Compliant if cluster changes (detected via watches) make it compliant, or if the policy is modified. However, while NonCompliant, no periodic evaluations occur, reducing resource usage for persistently non-compliant policies.


Document the transient NonCompliance window between wave 201 and ztp-done application.

With this scoping, the sequence is:

  1. Wave 200 → custom-mcp-unpause-22 sets maxUnavailable: 100%, becomes Compliant (stops re-evaluating due to compliant: never).
  2. Wave 201 → core-custom-mcp-set-maxavailable-22 sets maxUnavailable: 1, flipping custom-mcp-unpause-22 back to NonCompliant (the default noncompliant interval kicks in, allowing re-evaluation).
  3. ztp-done label is applied → matchExpressions descopes the policy.

Between steps (2) and (3), the policy reports NonCompliant. This is harmless with remediationAction: inform + TALM (TALM has already moved past wave 200), but operators observing ACM dashboards will briefly see a "failed" unpause policy before the ZTP handoff completes. Consider adding a comment to the policy explaining this expected transient state so future maintainers don't mistake it for a regression.
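One possible shape for the comment the reviewer requests (wording and placement are illustrative, not part of the PR):

```yaml
placement:
  labelSelector:
    matchExpressions:
      # NOTE: between wave 201 (core-custom-mcp-set-maxavailable-22 resets
      # maxUnavailable to 1) and the application of the ztp-done label, this
      # policy reports NonCompliant. That transient state is expected and
      # harmless: remediationAction is inform and TALM has already completed
      # wave 200. Do not treat a brief dashboard failure here as a regression.
      - key: ztp-done
        operator: DoesNotExist
```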

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@telco-core/configuration/core-finish.yaml` around lines 20 - 34, Add a short
YAML comment near the placement.matchExpressions (or adjacent to
evaluationInterval.compliant) documenting the expected transient NonCompliance
window that occurs between wave 201 (when core-custom-mcp-set-maxavailable-22
flips maxUnavailable back to 1) and the moment the ztp-done label is applied;
mention that this transient NonCompliant state is harmless because
remediationAction: inform + TALM already completed wave 200, and that dashboards
may briefly show a failure during handoff so maintainers should not treat it as
a regression.
