feat(cluster_healthcheck): add cluster health validation role #39
sabre1041
left a comment
Review the issues that are being reported.
Also, please review the conflicted files.
Force-pushed: d4928cd to 51d077e
@stevefulme1 looking better. Still seeing misalignment on the CDI components. Here are the labels that are applied to the CDI pods:
Fixed the CDI label selectors and added the missing components:
All four CDI pods are now covered:
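Fixed the provider readiness reporting issue (item 3 from Andrew's review).
Root cause: The original Jinja {% set %}/{% for %} block inside a >- YAML scalar rendered the list's string representation (e.g. "[]") instead of an actual list, so downstream | length checks measured the string.
Fix: Replaced the inline Jinja loop with a two-step filter chain: first identify ready providers via selectattr/contains, then subtract them from the full list via reject.
This produces a proper Python list and also correctly handles providers missing status.conditions.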
Adds a cluster_healthcheck role that validates OpenShift cluster health for virtualization migration readiness across six categories: OCP nodes, KubeVirt, MTV, storage, network, and post-migration VMs. Generates an HTML summary report with pass/fail/warning status.
Review feedback addressed:
- Fix CDI pod labels to use app.kubernetes.io/component selectors
- Fix Provider readiness to correctly detect Ready condition status
- Make migration network check conditional on HyperConverged CR config
- Check migration NAD in openshift-cnv namespace, not openshift-mtv
- Drop unrelated scaffolding file changes (CODE_OF_CONDUCT, etc.)
…sing components
- Change CDI label selectors from app.kubernetes.io/component to cdi.kubevirt.io, which matches actual pod labels on OCP 4.21+
- Add cdi-apiserver and cdi-uploadproxy pod health checks (were missing)
- Add CDI API Server and CDI Upload Proxy to the health report details
The previous implementation used a {% set %}/{% for %} Jinja block inside
a >- YAML scalar, which outputs the Python list's string representation
(e.g. "[]") rather than an actual list. Downstream | length checks then
evaluated the string length (2 for "[]"), causing all providers to be
falsely reported as not ready.
Replaced with a two-step filter chain: first identify ready providers via
selectattr/contains, then subtract from the full list via reject. This
produces a proper Python list and also correctly handles providers that
are missing status.conditions entirely.
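The failure mode described in this commit message can be reproduced outside Ansible. The following is a minimal Python sketch, not the role's actual Jinja: the provider names and the `is_ready` helper are hypothetical, and the list comprehensions only stand in for the selectattr/reject filter chain. It shows why the length of a rendered "[]" is 2, and how the two-step approach also catches a provider with no status.conditions.

```python
# Hypothetical provider records; names and shapes are illustrative only.
providers = [
    {"name": "vmware-a", "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
    {"name": "vmware-b", "status": {"conditions": [{"type": "Ready", "status": "False"}]}},
    {"name": "ovirt-c", "status": {}},  # status.conditions is missing entirely
]


def is_ready(provider):
    """True if any condition on the provider is Ready with status "True"."""
    conditions = provider.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "Ready" and c.get("status") == "True" for c in conditions
    )


# The bug: a template block emits text, so an empty result arrives as the
# two-character string "[]" and its length is 2, not 0.
rendered_empty = str([])
string_length = len(rendered_empty)
list_length = len([])

# Step 1: collect the names of providers that are ready.
ready_names = [p["name"] for p in providers if is_ready(p)]

# Step 2: subtract them from the full list. Providers without
# status.conditions never enter ready_names, so they are reported too.
not_ready = [p["name"] for p in providers if p["name"] not in ready_names]

print(string_length, list_length)
print(not_ready)
```

Building the "not ready" set by subtraction rather than by matching on a false condition is what lets the provider with an empty status be reported without a separate special case.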
Force-pushed: 203e22c to 831dff8
Replace selectattr/contains filters with community.general.json_query JMESPath expressions for Provider readiness and Plan failure checks. The selectattr approach did not correctly match nested conditions within status.conditions arrays.
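Matching a condition nested inside status.conditions requires that the type and the status be tested on the same array element. The Python sketch below illustrates one way a flattened check can go wrong; it is not a claim about what the earlier selectattr expression did, and the condition types other than Ready, the resource names, and both helper functions are assumptions made for illustration.

```python
def naive_match(resource, cond_type, cond_status="True"):
    """Flattened check: tests type and status independently (can mis-report)."""
    conditions = (resource.get("status") or {}).get("conditions") or []
    types = [c.get("type") for c in conditions]
    statuses = [c.get("status") for c in conditions]
    return cond_type in types and cond_status in statuses


def has_condition(resource, cond_type, cond_status="True"):
    """Per-element check: type and status must match on the same condition."""
    conditions = (resource.get("status") or {}).get("conditions") or []
    return any(
        c.get("type") == cond_type and c.get("status") == cond_status
        for c in conditions
    )


# Hypothetical provider: Ready is False, but another condition is True.
provider = {
    "metadata": {"name": "vmware-b"},
    "status": {
        "conditions": [
            {"type": "Ready", "status": "False"},
            {"type": "ConnectionTestSucceeded", "status": "True"},
        ]
    },
}

# Hypothetical plans; the "Failed" condition type is an assumption.
plans = [
    {"metadata": {"name": "plan-ok"},
     "status": {"conditions": [{"type": "Succeeded", "status": "True"}]}},
    {"metadata": {"name": "plan-bad"},
     "status": {"conditions": [{"type": "Failed", "status": "True"}]}},
    {"metadata": {"name": "plan-new"}},  # no status yet
]

naive_ready = naive_match(provider, "Ready")
actual_ready = has_condition(provider, "Ready")
failed_plans = [p["metadata"]["name"] for p in plans if has_condition(p, "Failed")]

print(naive_ready, actual_ready, failed_plans)
```

A JMESPath filter evaluated per array element expresses the same per-element test declaratively, which is the property the commit message relies on.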
Summary
Adds a new cluster_healthcheck role that validates the health of an OpenShift cluster for virtualization migration readiness. The role performs comprehensive checks across six categories and generates an HTML summary report with pass/fail/warning status and actionable recommendations.
Health checks included
Files added
Design decisions
- validate_migration role patterns (task naming, k8s_info usage, variable prefixing)
- cluster_healthcheck_ per collection convention
- __cluster_healthcheck_ double-underscore prefix
- (kubernetes.core.k8s_info, ansible.builtin.*)
- cluster_healthcheck_checks default
- cluster_healthcheck_post_migration_vms
Testing
ansible-lint --profile production passes with 0 errors on the role (playbook FQCN resolution matches existing collection behavior)