OCP5 not available: Various issues with cluster operators... netapp problem? #353

@stefan-bergstein

Description

```
oc get co
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE    MESSAGE
authentication                             4.20.8    False       False         True       15d      OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.ocp5.stormshift.coe.muc.redhat.com/healthz": EOF
baremetal                                  4.20.8    True        False         False      10h
cert-manager                               1.7.1     True        False         False      2y354d
cloud-controller-manager                   4.20.8    True        False         False      3y198d
cloud-credential                           4.20.8    True        False         False      3y198d
cluster-autoscaler                         4.20.8    True        False         False      3y198d
config-operator                            4.20.8    True        False         False      3y198d
console                                    4.20.8    False       True          True       17d      DeploymentAvailable: 0 replicas available for console deployment...
control-plane-machine-set                  4.20.8    True        False         False      2y229d
csi-snapshot-controller                    4.20.8    True        False         False      366d
dns                                        4.20.8    True        False         False      82d
etcd                                       4.20.8    False       False         True       15d      EtcdMembersAvailable: 1 of 3 members are available, ocp5-control-1 is unhealthy, ocp5-control-0 is unhealthy
image-registry                             4.20.8    False       True          True       18d      NodeCADaemonAvailable: The daemon set node-ca has available replicas...
ingress                                    4.20.8    True        False         True       119d     The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing. Last 1 error messages:...
insights                                   4.20.8    True        False         False      14d
kube-apiserver                             4.20.8    True        False         False      3y198d
kube-controller-manager                    4.20.8    True        False         True       3y198d   GarbageCollectorDegraded: error fetching rules: client_error: client error: 401
kube-scheduler                             4.20.8    True        False         False      3y198d
kube-storage-version-migrator              4.20.8    True        False         False      19d
machine-api                                4.20.8    True        False         False      3y198d
machine-approver                           4.20.8    True        False         False      3y198d
machine-config                             4.20.8    True        False         False      494d
marketplace                                4.20.8    True        False         False      3y198d
monitoring                                 4.20.8    False       True          True       3d14h    UpdatingMetricsServer: reconciling MetricsServer Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/metrics-server: context deadline exceeded: the number of pods targeted by the deployment (3 pods) is different from the number of pods targeted by the deployment that have the desired template spec (2 pods)
network                                    4.20.8    True        True          True       3y198d   Deployment "/openshift-frr-k8s/frr-k8s-webhook-server" rollout is not making progress - pod frr-k8s-webhook-server-6fb7958cdd-vw9k6 is in CrashLoopBackOff State
node-tuning                                4.20.8    True        False         False      19d
olm                                        4.20.8    True        False         False      17d
openshift-apiserver                        4.20.8    True        False         False      39m
openshift-controller-manager               4.20.8    True        False         False      2y1d
openshift-samples                          4.20.8    True        False         False      290d
operator-lifecycle-manager                 4.20.8    True        False         False      2y198d
operator-lifecycle-manager-catalog         4.20.8    True        False         False      3y198d
operator-lifecycle-manager-packageserver   4.20.8    True        False         False      85d
service-ca                                 4.20.8    True        False         False      3y198d
storage                                    4.20.8    True        False         False      3y198d
```
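As a side note, when triaging output like the above it can help to filter for operators reporting `DEGRADED=True`. A throwaway helper sketch (not part of this issue; it assumes the default `oc get co` column layout, where only the trailing MESSAGE column can contain spaces):

```python
# List cluster operators reporting DEGRADED=True from `oc get co` output.
# Assumes the default column order: NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE.
def degraded_operators(oc_get_co_output: str) -> list[str]:
    rows = oc_get_co_output.strip().splitlines()[1:]  # skip the header row
    names = []
    for row in rows:
        # Split at most 6 times so a MESSAGE containing spaces stays in one field.
        fields = row.split(None, 6)
        if len(fields) >= 5 and fields[4] == "True":
            names.append(fields[0])
    return names

sample = """NAME    VERSION  AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
etcd    4.20.8   False     False       True     15d   EtcdMembersAvailable: 1 of 3 members are available
storage 4.20.8   True      False       False    3y198d"""
print(degraded_operators(sample))  # ['etcd']
```

Applied to the output above, this flags authentication, console, etcd, image-registry, ingress, kube-controller-manager, monitoring, and network; the etcd entry (1 of 3 members available) is the most likely root of the rest.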
```
oc get po -n netapp-trident
NAME                                  READY   STATUS             RESTARTS        AGE
trident-controller-69994fff4b-jckmj   6/6     Running            0               18d
trident-node-linux-277p7              0/2     CrashLoopBackOff   5 (2m36s ago)   8m39s
trident-node-linux-5f48d              0/2     CrashLoopBackOff   11 (23s ago)    8m40s
trident-node-linux-frbgr              0/2     CrashLoopBackOff   5 (2m47s ago)   8m47s
trident-node-linux-qdh8d              1/2     CrashLoopBackOff   12 (34s ago)    8m49s
trident-node-linux-r5d9p              1/2     CrashLoopBackOff   12 (28s ago)    8m45s
trident-node-linux-w5wnw              0/2     CrashLoopBackOff   11 (34s ago)    8m43s
trident-node-linux-whw96              1/2     CrashLoopBackOff   12 (37s ago)    8m51s
trident-operator-f495b989d-qfw7d      1/1     Running            0               18d
```
```
oc logs trident-node-linux-277p7 -n netapp-trident
time="2026-01-07T17:21:44Z" level=error msg="Could not read /sys/class/fc_host" error="open /sys/class/fc_host: no such file or directory" logLayer=csi_frontend requestID=00a9f730-bcb8-41da-8f57-7b881338f0c9 requestSource=Internal workflow="plugin=activate"
time="2026-01-07T17:21:44Z" level=warning msg="Problem getting FCP host node port name association." error="open /sys/class/fc_host: no such file or directory" logLayer=csi_frontend requestID=00a9f730-bcb8-41da-8f57-7b881338f0c9 requestSource=Internal workflow="plugin=activate"
time="2026-01-07T17:21:44Z" level=warning msg="Error discovering SMB service on host." error="SMBActiveOnHost is not supported for linux" logLayer=csi_frontend requestID=00a9f730-bcb8-41da-8f57-7b881338f0c9 requestSource=Internal workflow="plugin=activate"
time="2026-01-07T17:22:14Z" level=warning msg="Could not update Trident controller with node registration, will retry." error="failed during retry for CreateNode: could not log into the Trident CSI Controller: error communicating with Trident CSI Controller; Put \"https://172.30.2.249:34571/trident/v1/node/ocp5-control-0\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" increment=10.812238859s logLayer=csi_frontend requestID=00a9f730-bcb8-41da-8f57-7b881338f0c9 requestSource=Internal workflow="plugin=activate"
time="2026-01-07T17:22:55Z" level=warning msg="Could not update Trident controller with node registration, will retry." error="failed during retry for CreateNode: could not log into the Trident CSI Controller: error communicating with Trident CSI Controller; Put \"https://172.30.2.249:34571/trident/v1/node/ocp5-control-0\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" increment=18.599340969s logLayer=csi_frontend requestID=00a9f730-bcb8-41da-8f57-7b881338f0c9 requestSource=Internal workflow="plugin=activate"
time="2026-01-07T17:23:44Z" level=warning msg="Could not update Trident controller with node registration, will retry." error="failed during retry for CreateNode: could not log into the Trident CSI Controller: error communicating with Trident CSI Controller; Put \"https://172.30.2.249:34571/trident/v1/node/ocp5-control-0\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" increment=38.540434394s logLayer=csi_frontend requestID=00a9f730-bcb8-41da-8f57-7b881338f0c9 requestSource=Internal workflow="plugin=activate"
time="2026-01-07T17:24:52Z" level=warning msg="Could not update Trident controller with node registration, will retry." error="failed during retry for CreateNode: could not log into the Trident CSI Controller: error communicating with Trident CSI Controller; Put \"https://172.30.2.249:34571/trident/v1/node/ocp5-control-0\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" increment=1m24.967564439s logLayer=csi_frontend requestID=00a9f730-bcb8-41da-8f57-7b881338f0c9 requestSource=Internal workflow="plugin=activate"
time="2026-01-07T17:26:47Z" level=warning msg="Could not update Trident controller with node registration, will retry." error="failed during retry for CreateNode: could not log into the Trident CSI Controller: error communicating with Trident CSI Controller; Put \"https://172.30.2.249:34571/trident/v1/node/ocp5-control-0\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" increment=1m56.080029054s logLayer=csi_frontend requestID=00a9f730-bcb8-41da-8f57-7b881338f0c9 requestSource=Internal workflow="plugin=activate"
time="2026-01-07T17:29:13Z" level=warning msg="Could not update Trident controller with node registration, will retry." error="failed during retry for CreateNode: could not log into the Trident CSI Controller: error communicating with Trident CSI Controller; Put \"https://172.30.2.249:34571/trident/v1/node/ocp5-control-0\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" increment=2m5.863239973s logLayer=csi_frontend requestID=00a9f730-bcb8-41da-8f57-7b881338f0c9 requestSource=Internal workflow="plugin=activate"
```
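The growing `increment=` values in the log (≈10.8s, 18.6s, 38.5s, 1m25s, 1m56s, 2m6s) look like capped exponential backoff with jitter on the node-registration retry. As an illustration only (a generic sketch of that retry pattern, not Trident's actual code), such a schedule can be generated like this:

```python
import random

def backoff_delays(base: float = 10.0, factor: float = 2.0,
                   cap: float = 300.0, attempts: int = 6,
                   jitter: float = 0.25) -> list[float]:
    """Capped exponential backoff with +/- `jitter` randomization,
    roughly matching the growing `increment=` values in the log above."""
    delays = []
    delay = base
    for _ in range(attempts):
        jittered = delay * random.uniform(1 - jitter, 1 + jitter)
        delays.append(min(jittered, cap))
        delay = min(delay * factor, cap)
    return delays

print(backoff_delays())
```

The retries themselves look healthy; the underlying problem is that every `Put https://172.30.2.249:34571/trident/v1/node/...` times out, i.e. the node pods cannot reach the Trident controller service at all, which is consistent with the cluster-wide etcd/network degradation above rather than a Trident-specific fault.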

Metadata

Labels

cluster/ocp5: Related to our ocp5 cluster at Stormshift
