Skip to content

fix(hostedclustersizing): add override fall back#7554

Draft
SudoBrendan wants to merge 1 commit into
openshift:mainfrom
SudoBrendan:sudobrendan/fix-size-override-fallback
Draft

fix(hostedclustersizing): add override fall back#7554
SudoBrendan wants to merge 1 commit into
openshift:mainfrom
SudoBrendan:sudobrendan/fix-size-override-fallback

Conversation

@SudoBrendan

@SudoBrendan SudoBrendan commented Jan 20, 2026

Copy link
Copy Markdown

What this PR does / why we need it:

Today, if you set the size override annotation, but that size isn't an exact string match of what's in the config, your cluster will remain unsized rather than falling back to the standard logic (autoscale / node size). The only indication this has happened is a log message in the controller.

NOTE: I think this seems odd - do others agree this is a bug, not a feature? Is there a reason to prefer an unsized cluster when the override is set incorrectly that I'm unaware of?

Which issue(s) this PR fixes:

Adjacent work I happened to discover while trying to refine what CNTRLPLANE-2581 would be

Special notes for your reviewer:

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

When ClusterSizeOverrideAnnotation is set to a value not found in the
ClusterSizingConfiguration, the controller now gracefully falls back to
autoscaling (if enabled) or node count sizing, rather than skipping
sizing entirely.

Signed-off-by: Brendan Bergen <bbergen@redhat.com>
Assisted-by: Claude Opus 4.5 (via Cursor)
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 20, 2026
@openshift-ci

openshift-ci Bot commented Jan 20, 2026

Copy link
Copy Markdown
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai

coderabbitai Bot commented Jan 20, 2026

Copy link
Copy Markdown
Contributor

Important

Review skipped

Auto reviews are limited based on label configuration.

🚫 Review skipped — only excluded labels are configured. (1)
  • do-not-merge/work-in-progress

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot added do-not-merge/needs-area area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release labels Jan 20, 2026
@openshift-ci

openshift-ci Bot commented Jan 20, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: SudoBrendan
Once this PR has been reviewed and has the lgtm label, please assign devguyio for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci

openshift-ci Bot commented May 11, 2026

Copy link
Copy Markdown
Contributor

@SudoBrendan: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 11, 2026
@openshift-ci

openshift-ci Bot commented May 11, 2026

Copy link
Copy Markdown
Contributor

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci

openshift-ci Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Stale PRs are closed after 21d of inactivity.

If this PR is still relevant, comment to refresh it or remove the stale label.
Mark the PR as fresh by commenting /remove-lifecycle stale.

If this PR is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci Bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 11, 2026
@openshift-ci

openshift-ci Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Stale PRs rot after 14d of inactivity.

Mark the PR as fresh by commenting /remove-lifecycle rotten.
Rotten PRs close after an additional 7d of inactivity.

If this PR is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci Bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 25, 2026
@hypershift-jira-solve-ci

Copy link
Copy Markdown

Now I have a complete picture of all the failures. Here is the analysis:

Test Failure Analysis Complete

Job Information

  • Prow Job: tide, Red Hat Konflux enterprise-contract checks
  • Build ID: N/A (Konflux pipeline runs: hypershift-operator-main-enterprise-contract-l4jgm, hypershift-operator-enterprise-contract-4pg6w, enterprise-contract-mce-211-kzvv2)
  • PR: fix(hostedclustersizing): add override fall back #7554 (fix(hostedclustersizing): add override fall back)
  • PR Author: SudoBrendan
  • PR Created: 2026-01-20 (over 5 months ago)
  • PR State: OPEN, CONFLICTING, labeled lifecycle/rotten, needs-rebase, do-not-merge/work-in-progress
  • Commits Behind Main: 1,842

Test Failure Analysis

Error

tide: "Not mergeable. PR has a merge conflict." (state: error)

Red Hat Konflux / hypershift-operator-main-enterprise-contract: Integration test FAILED
  → verify task: 222 successes, 30 warnings, 10 failures

Red Hat Konflux / hypershift-operator-enterprise-contract: Integration test FAILED
  → verify task: 222 successes, 30 warnings, 10 failures

Red Hat Konflux / enterprise-contract-mce-211: Integration test FAILED
  → verify task: 240 successes, 62 warnings, 2 failures

Summary

All four failures on PR #7554 are caused by the PR being extremely stale — it was opened on January 20, 2026, is now 1,842 commits behind main, and has never been rebased. The tide error is a direct merge conflict because hostedclustersizing_controller.go was refactored in May 2026 (commit 615bc5bc — "reduce cyclomatic complexity and enable gocyclo linter"), which restructured the exact code this PR modifies. The three Konflux Enterprise Contract failures originated on the PR's initial run on January 20, 2026, and are unrelated to the PR's code changes — they stem from supply-chain policy violations in the build pipeline artifacts. Notably, the enterprise-contract-mce-211 scenario is itself legacy and has been replaced by enterprise-contract-mce-50 on current PRs. The PR also carries the do-not-merge/work-in-progress label, indicating the author did not intend it for merge in its current state.

Root Cause

There are two distinct root causes:

1. Tide merge conflict (primary blocker):
The PR modifies hostedclustersizing_controller.go lines 208–260, restructuring the if/else if/else chain for size class selection into a sequential if sizeClass == nil pattern. However, commit 615bc5bc (merged May 7, 2026) refactored this same controller to reduce cyclomatic complexity by extracting helper functions. A follow-up commit on May 12 restored comments lost in the refactor. These changes directly conflict with the PR's modifications to the same code block. The PR is 1,842 commits behind main and GitHub reports it as non-rebaseable (rebaseable: false).

2. Konflux Enterprise Contract failures (pre-existing, not caused by PR code):
The EC verification checks ran on January 20, 2026, the same day the PR was opened. They failed with 10 policy violations on the operator images and 2 on the MCE-211 component. These failures existed from the PR's first build and are a Konflux supply-chain policy issue, not a code issue. Evidence:

Recommendations
  1. Rebase or recreate the PR against current main: The PR is 1,842 commits behind and the target file has been significantly refactored. A rebase will require manually resolving conflicts in hostedclustersizing_controller.go against the new helper-function-extracted structure. Given the scale of divergence (rebaseable: false), creating a fresh branch from current main and re-applying the logic change may be easier than rebasing.

  2. Remove WIP label when ready: The PR has do-not-merge/work-in-progress — ensure this is removed once the code is ready for review.

  3. Enterprise Contract failures will self-resolve: Once the PR is rebased onto current main, the Konflux build pipeline will use the current .tekton/ pipeline definitions with up-to-date trusted task bundles, and the EC checks should pass (as they do on current PRs like CNTRLPLANE-3562, CNTRLPLANE-3563: test(healthcheck): add unit tests for AWS identity provider #8829 for operator checks).

  4. Verify tests still pass: The refactoring commit changed the controller structure. The new test cases added by this PR (4 new test cases for sizing priority fallback logic) need to be validated against the refactored controller code to ensure they still exercise the correct code paths.

  5. Consider closing as stale: Given the lifecycle/rotten label and 5+ months of inactivity, the author should confirm whether this fix is still needed or if the issue has been addressed by other changes in the intervening 1,842 commits.

Evidence
Evidence Detail
PR merge state mergeable: false, mergeable_state: dirty, rebaseable: false
Commits behind main 1,842 commits diverged
PR age Created 2026-01-20, last commit 2026-01-20T20:28:48Z (5+ months stale)
PR labels lifecycle/rotten, needs-rebase, do-not-merge/work-in-progress, area/hypershift-operator
Conflicting refactor Commit 615bc5bc (May 7, 2026): "reduce cyclomatic complexity and enable gocyclo linter" — restructured the exact reconcile() function this PR modifies
Follow-up refactor May 12, 2026: "restore non-obvious comments after gocyclo refactor" — additional changes to same file
Tide error "Not mergeable. PR has a merge conflict." — set by openshift-merge-bot on 2026-05-11
Operator EC failures 10 policy failures out of 262 checks on both hypershift-operator-main-enterprise-contract and hypershift-operator-enterprise-contract
MCE-211 EC failures 2 policy failures out of 304 checks on enterprise-contract-mce-211
EC failure timing All EC failures occurred on initial run (2026-01-20), same day PR was opened
EC context PR #7553 (merged next day) explicitly created to fix trusted_task.trusted EC policy failures
PR base commit ee28abe4 = merge commit of PR #7551 (Konflux Tekton task update)
Concurrent PRs pass EC PRs #7551, #7553 (merged same week) pass all EC checks
Legacy scenario enterprise-contract-mce-211 replaced by enterprise-contract-mce-50 on current PRs
Prow CI checks All pass: ci/prow/security succeeded; ci/prow/unit, ci/prow/verify-workflows, ci/prow/e2e-azure-self-managed retired
Changed files hostedclustersizing_controller.go (logic change), hostedclustersizing_controller_test.go (+252 lines of tests)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant