OCPBUGS-62619: Add etcd size limit validation for rendered MachineConfigs #5729
dkhater-redhat wants to merge 1 commit into openshift:main
Force-pushed from cb489af to 3b4256d.
Fixes a bug where MachineConfigPools get stuck in a degraded state with "etcdserver: request is too large" errors when rendered MachineConfigs exceed etcd's 1.5MB size limit.

Changes:
- Add MaxMachineConfigSize constant (1572864 bytes) in constants.go
- Add ValidateMachineConfigSize() function in helpers.go that:
  * Validates rendered MC size before sending to etcd
  * Returns clear error message with remediation guidance
  * Logs warning when size exceeds 80% of limit
  * Provides debug logging of MC size usage
- Call validation in render controller before MC create/update

This prevents the operator from attempting to write oversized MCs to etcd, provides early detection with helpful error messages, and avoids wasting retry attempts. The error message specifically mentions large registry mirror configurations (ImageDigestMirrorSet/ICSP) as the primary cause and suggests reducing their size.
Force-pushed from 3b4256d to fdf1f44.
@dkhater-redhat: This pull request references Jira Issue OCPBUGS-62619, which is invalid.
The bug has been updated to refer to the pull request using the external bug tracker.
/jira refresh
@dkhater-redhat: This pull request references Jira Issue OCPBUGS-62619, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug.
Requesting review from QA contact:
/retest-required
isabella-janssen left a comment
/lgtm
Looks good, and I especially like the warning at 80% capacity!
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: dkhater-redhat, isabella-janssen
/retest-required
@dkhater-redhat: The following tests failed, say
/test e2e-gcp-op-part1