
OCPBUGS-62619: Add etcd size limit validation for rendered MachineConfigs#5729

Open
dkhater-redhat wants to merge 1 commit into openshift:main from dkhater-redhat:fix-etcd-size-limit-validation

Conversation

@dkhater-redhat
Contributor

Fixes a bug where MachineConfigPools get stuck in a degraded state with "etcdserver: request is too large" errors when rendered MachineConfigs exceed etcd's 1.5 MiB (1572864-byte) size limit.

Changes:

  • Add MaxMachineConfigSize constant (1572864 bytes) in constants.go
  • Add ValidateMachineConfigSize() function in helpers.go that:
    • Validates rendered MC size before sending to etcd
    • Returns clear error message with remediation guidance
    • Logs warning when size exceeds 80% of limit
    • Provides debug logging of MC size usage
  • Call validation in render controller before MC create/update
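
The check described in the list above could look roughly like the following. This is a minimal sketch, not the code from this PR: the signature of `ValidateMachineConfigSize`, the `warnThresholdPercent` name, the use of the standard `log` package, and the exact error wording are all assumptions made for illustration. Only the constant value (1572864 bytes) and the 80% warning level come from the description.

```go
package main

import (
	"fmt"
	"log"
)

// MaxMachineConfigSize is the limit named in the PR description:
// 1572864 bytes = 1.5 * 1024 * 1024.
const MaxMachineConfigSize = 1572864

// warnThresholdPercent is a hypothetical name for the 80% warning level.
const warnThresholdPercent = 80

// ValidateMachineConfigSize checks the serialized size of a rendered
// MachineConfig against the limit. The (name, serialized bytes) signature
// is an assumption; the real helper may take a MachineConfig object.
func ValidateMachineConfigSize(name string, serialized []byte) error {
	size := len(serialized)

	// Fail early instead of letting the write reach etcd and be rejected.
	if size > MaxMachineConfigSize {
		return fmt.Errorf(
			"rendered MachineConfig %q is %d bytes, exceeding the %d byte limit; "+
				"large registry mirror configurations (ImageDigestMirrorSet/ICSP) "+
				"are a common cause, consider reducing their size",
			name, size, MaxMachineConfigSize)
	}

	// Integer comparison avoids floating point: size/limit > 80/100.
	if size*100 > MaxMachineConfigSize*warnThresholdPercent {
		log.Printf("warning: rendered MachineConfig %q uses %d of %d bytes (over %d%%)",
			name, size, MaxMachineConfigSize, warnThresholdPercent)
	}
	return nil
}

func main() {
	// Three illustrative sizes: small, above the warning level, over the limit.
	for _, n := range []int{1024, 1300000, MaxMachineConfigSize + 1} {
		if err := ValidateMachineConfigSize("rendered-worker-example", make([]byte, n)); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", n, "bytes")
	}
}
```

Note that this sketch treats a payload of exactly 1572864 bytes as acceptable. A real write also carries request overhead, so the actual helper may need a stricter comparison.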

This prevents the operator from attempting to write oversized MCs to etcd, provides early detection with helpful error messages, and avoids wasting retry attempts. The error message specifically mentions large registry mirror configurations (ImageDigestMirrorSet/ICSP) as the primary cause and suggests reducing their size.

- What I did

- How to verify it

- Description for the changelog

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 3, 2026
@dkhater-redhat dkhater-redhat force-pushed the fix-etcd-size-limit-validation branch from cb489af to 3b4256d Compare March 3, 2026 08:28
@dkhater-redhat dkhater-redhat force-pushed the fix-etcd-size-limit-validation branch from 3b4256d to fdf1f44 Compare March 3, 2026 08:30
@dkhater-redhat dkhater-redhat changed the title Add etcd size limit validation for rendered MachineConfigs OCPBUGS-62619: Add etcd size limit validation for rendered MachineConfigs Mar 3, 2026
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Mar 3, 2026
@openshift-ci-robot
Contributor

@dkhater-redhat: This pull request references Jira Issue OCPBUGS-62619, which is invalid:

  • expected the bug to target the "4.22.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.


@dkhater-redhat
Contributor Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Mar 3, 2026
@openshift-ci-robot
Contributor

@dkhater-redhat: This pull request references Jira Issue OCPBUGS-62619, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @sergiordlr


@openshift-ci openshift-ci bot requested a review from sergiordlr March 3, 2026 08:33
@dkhater-redhat
Contributor Author

/retest-required

Member

@isabella-janssen isabella-janssen left a comment

/lgtm

Looks good, and I especially like the warning at 80% capacity!

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 3, 2026
@openshift-ci
Contributor

openshift-ci bot commented Mar 3, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dkhater-redhat, isabella-janssen

Details Needs approval from an approver in each of these files:
  • OWNERS [dkhater-redhat,isabella-janssen]


@dkhater-redhat
Contributor Author

/retest-required

@openshift-ci
Contributor

openshift-ci bot commented Mar 4, 2026

@dkhater-redhat: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-gcp-op-ocl-part1 | fdf1f44 | link | false | /test e2e-gcp-op-ocl-part1 |
| ci/prow/e2e-gcp-op-ocl | fdf1f44 | link | false | /test e2e-gcp-op-ocl |
| ci/prow/e2e-gcp-op-ocl-part2 | fdf1f44 | link | false | /test e2e-gcp-op-ocl-part2 |

@dkhater-redhat
Contributor Author

/test e2e-gcp-op-part1
