OCPBUGS-93982: skip generation check when deployment was just updated#1178
OCPBUGS-93982: skip generation check when deployment was just updated#1178sg00dwin wants to merge 1 commit into
Conversation
SyncDeployment discarded the changed boolean from ApplyDeployment, so the generation check could not distinguish self-inflicted gaps from genuine progressing states. During upgrades, each spec update caused a sub-second Progressing=True blip that failed the OTA invariant test ~5% of the time. Re-introduce changed from ApplyDeployment as a guard: skip the generation check when the operator itself just modified the deployment. The next sync loop catches any persistent gap Assisted by: Claude (Opus 4.6)
|
@sg00dwin: This pull request references Jira Issue OCPBUGS-93982, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Walkthrough
Deployment changed flag and progressing guard
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Important Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional. ❌ Failed checks (1 error, 1 warning)
✅ Passed checks (13 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: sg00dwin The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
pkg/console/operator/sync_v400.go (1)
214-224: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winExtract the progressing guard into a helper.
sync_v400is already carrying several responsibilities, and this new branch is now mirrored in the unit test instead of being exercised directly. A tiny helper here would keep the sync loop slimmer and let the test call the real logic.♻️ Proposed refactor
+func deploymentProgressingErr(depChanged bool, deployment *appsv1.Deployment) error { + if depChanged { + return nil + } + return checkDeploymentGenerationProgress(deployment) +} + func (co *consoleOperator) sync_v400(ctx context.Context, controllerContext factory.SyncContext, updatedOperatorConfig *operatorv1.Console, set configSet) error { ... statusHandler.AddCondition(status.HandleProgressing("SyncLoopRefresh", "InProgress", func() error { version := os.Getenv("OPERATOR_IMAGE_VERSION") - if !depChanged { - if err := checkDeploymentGenerationProgress(actualDeployment); err != nil { - return err - } + if err := deploymentProgressingErr(depChanged, actualDeployment); err != nil { + return err } ... }()))As per coding guidelines, "Keep functions focused and under ~100 lines. Extract large functions with multiple responsibilities into smaller, single-purpose functions." As per path instructions, "Use /go-quality-review for all Go code to check ... functions >100 lines."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/console/operator/sync_v400.go` around lines 214 - 224, Extract the progressing-generation guard from sync_v400 into a small helper so the sync loop stays focused and the test can exercise the real logic instead of duplicating it. Move the “skip when depChanged” branch and the call to checkDeploymentGenerationProgress(actualDeployment) into a dedicated function with a clear name, then have sync_v400 delegate to it before continuing. Keep the helper centered on this single responsibility and reuse it from the unit test to cover the same behavior directly.Sources: Coding guidelines, Path instructions
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@pkg/console/operator/sync_v400.go`:
- Around line 214-224: Extract the progressing-generation guard from sync_v400
into a small helper so the sync loop stays focused and the test can exercise the
real logic instead of duplicating it. Move the “skip when depChanged” branch and
the call to checkDeploymentGenerationProgress(actualDeployment) into a dedicated
function with a clear name, then have sync_v400 delegate to it before
continuing. Keep the helper centered on this single responsibility and reuse it
from the unit test to cover the same behavior directly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: d9baf190-c0b5-4a5f-9660-2f97cabf6357
📒 Files selected for processing (2)
pkg/console/operator/sync_v400.gopkg/console/operator/sync_v400_test.go
🔗 Linked repositories identified
CodeRabbit considers these linked repositories for cross-repo context during reviews:
openshift/console(manual)
📜 Review details
🧰 Additional context used
📓 Path-based instructions (8)
**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
**/*.go: Follow Go coding standards and patterns documented in CONVENTIONS.md
Organize imports according to conventions documented in CONVENTIONS.md
Usegofmtto format Go code with standard formatting
Rungo vetchecks on all Go packagesFollow Go coding standards and patterns as documented in CONVENTIONS.md, including proper import organization
Organize Go code following the repository structure: main entry point in
cmd/console/main.go, API constants inpkg/api/, operator command setup inpkg/cmd/operator/, and version command inpkg/cmd/version/
**/*.go: Usegofmtfor formatting Go code
Follow standard Go naming conventions
Group imports in order: standard lib, 3rd party, kube/openshift, internal (marked with comments)
Use meaningful error messages with context in Go code
Set status conditions usingstatus.Handle*functions with type prefixes (*Degraded, *Progressing, *Available, *Upgradeable)
Use typed errors and wrap errors to preserve stack context
**/*.go: Imports must be grouped with comments in the order: standard lib, 3rd party, kube, openshift, and operator (internal)
Wrap errors with context usingfmt.Errorf("failed to X: %w", err)pattern to provide meaningful error messages
**/*.go: Do NOT use deprecated ioutil package functions. Use os.ReadFile() instead of ioutil.ReadFile(), os.WriteFile() instead of ioutil.WriteFile(), and io.ReadAll() instead of ioutil.ReadAll() (Go 1.16+)
Do NOT use net.Dial() directly. Use (&net.Dialer{}).DialContext() to support context cancellation
Use %w verb in fmt.Errorf() to wrap errors and maintain error chains, not %v which loses the error chain
Add descriptive context to errors when wrapping them, including relevant identifiers and state information
Use specific error type checks (e.g., apierrors.IsNotFound()) instead of string matching with strings.Contains() on error.Error()
Always propagate the parent context in function calls instead of using context.Background(), to respect parent cont...
Files:
pkg/console/operator/sync_v400_test.gopkg/console/operator/sync_v400.go
⚙️ CodeRabbit configuration file
**/*.go: Review Go code following OpenShift operator patterns.
See CONVENTIONS.md for coding standards and patterns.Refer to the following skills based on CODE PATTERNS, not just file paths:
Refer to /controller-review when code contains:
- Controller struct types (e.g.,
type *Controller struct)func New*Controller(factory functionsfactory.New().WithFilteredEventsInformers(pattern.ToController(method callsSync(ctx context.Context, controllerContext factory.SyncContext)methodsoperatorConfig.Spec.ManagementStatechecksstatus.NewStatusHandlerorstatus.Handle*functionsRefer to /sync-handler-review when code contains:
- Main operator sync functions (e.g.,
sync_v400.gocontent)- Sequential resource syncing with early returns
- Incremental reconciliation loops
- Multiple
resourceapply.Apply*()calls in sequence- Dependency ordering of ConfigMaps → Secrets → Service Accounts → RBAC → Services → Deployments → Routes
- Feature gate conditional logic
Refer to /go-quality-review for all Go code to check:
- Deprecated imports:
ioutil.ReadFile,ioutil.WriteFile,ioutil.ReadAll- Deprecated patterns:
DialwithoutDialContext- Error handling: missing
%win fmt.Errorf- Code smells: deep nesting (4+ levels), functions >100 lines
- Magic values: unexplained numbers/strings
- Context propagation:
context.Background()instead of passed ctx- Missing godoc on exported functions
Files:
pkg/console/operator/sync_v400_test.gopkg/console/operator/sync_v400.go
**/*_test.go
📄 CodeRabbit inference engine (AGENTS.md)
Follow testing patterns and commands documented in TESTING.md
Follow testing patterns and commands as documented in TESTING.md, including running unit tests with 'make test-unit' and checks with 'make check'
**/*_test.go: Use table-driven tests for comprehensive coverage
Usehttptestfor HTTP handler testing in Go
Include proper cleanup functions in tests
Test both success and failure pathsCheck and handle all errors in tests using t.Fatalf() or t.Errorf(); do not ignore errors with blank identifiers
Files:
pkg/console/operator/sync_v400_test.go
⚙️ CodeRabbit configuration file
**/*_test.go: Review test code for quality and patterns.Refer to /unit-test-review when test is in pkg//*_test.go:**
- Table-driven test structure with test cases
- Use of
go-test/deepfor struct comparisons- Test naming conventions (TestFunctionName)
- Error handling with
wantErrpattern- Edge case coverage (nil, empty, boundary values)
- Proper assertions with helpful error messages
- Test isolation (no shared mutable state)
Refer to /e2e-test-review when test contains:
framework.MustNewClientset(t, nil)or similar e2e framework usagewait.Pollorwait.PollImmediatepatternsretry.RetryOnConflictfor updates- Cleanup via
deferfunctions- Console/operator CR manipulations
- Test assertions on cluster state
Suggest to use /e2e-test-review when:
- PR adds new feature requiring e2e coverage
- Test file is empty or skeleton
- Comments indicate "TODO: add test"
Review for common issues:
- Missing cleanup (defer statements)
- Using
time.Sleepinstead ofwait.Poll- Missing context timeouts
- Vague error messages in assertions
- Tests without table-driven structure when testing multiple cases
- Ignoring errors with
_- Tests without assertions
Files:
pkg/console/operator/sync_v400_test.go
{pkg,cmd}/**/*.go
📄 CodeRabbit inference engine (CLAUDE.md)
Use gofmt for code formatting on pkg and cmd directories
{pkg,cmd}/**/*.go: Format code usinggofmt -w ./pkg ./cmd
Rungo vetchecks on all Go packages in ./pkg and ./cmd
Files:
pkg/console/operator/sync_v400_test.gopkg/console/operator/sync_v400.go
**/*sync*.go
📄 CodeRabbit inference engine (CONVENTIONS.md)
Implement sync loops (
sync_v400) incrementally: start from zero, create/update missing requirements, and return to continue on next loop
**/*sync*.go: Sync handler implementations must use incremental sync pattern where each reconciliation loop returns early on error rather than collecting errors and continuing
Sync resources in dependency order: ConfigMaps/Secrets → Service Accounts → RBAC (Roles, RoleBindings) → Services → Deployments → Routes
Return early on errors to maintain incremental sync behavior instead of continuing execution or logging and proceeding
Use resourceapply.Apply*() functions to handle both resource creation and update operations
Ensure deleted resources are cleaned up from the cluster when removed from config, with proper handling for NotFound errors
Check feature gates before syncing gated resources to ensure conditional resource reconciliation
Status updates must reflect actual reconciliation state by adding conditions as operations complete using status handlers with HandleProgressingOrDegraded and HandleAvailable
Respect ManagementState configurations and implement cleanup logic for Removed state
Avoid mutating live objects; instead build desired state separately before applying
Do not sync all resources regardless of errors; stop reconciliation on dependency failures
Files:
pkg/console/operator/sync_v400_test.gopkg/console/operator/sync_v400.go
pkg/**/*_test.go
📄 CodeRabbit inference engine (.claude/skills/unit-test-review.md)
pkg/**/*_test.go: Go unit tests in pkg/**/*_test.go files should use table-driven test pattern for clarity and completeness with struct containing test cases
Test function names should be descriptive and clearly indicate what is being tested (e.g., TestGetNodeComputeEnvironments, TestNewRouteConfig)
Test case names within table-driven tests should explain the scenario being tested (e.g., 'Custom hostname and TLS secret set'), not be vague (e.g., 'test1')
Use github.com/go-test/deep package for struct comparisons to show exact differences instead of simple equality checks or manual field-by-field comparisons
Test both success and failure paths, including edge cases such as empty inputs (nil, "", empty slices/maps), boundary values (0, -1, max int), missing labels/fields, duplicate values, and large inputs
Organize test code using Arrange-Act-Assert pattern: setup test data in Arrange phase, call the function in Act phase, verify results in Assert phase
Check error presence using pattern (err != nil) != tt.wantErr instead of ignoring errors with underscore or missing error checks
When checking specific error messages, use strings.Contains to verify error content instead of just checking error presence
Extract common test setup into helper functions rather than duplicating setup code across multiple tests
Write specific assertion error messages with context (e.g., 'expected %d nodes, got %d') rather than vague messages (e.g., 'failed')
Inline simple test data in test structs; extract complex fixtures into testdata files or helper functions
Avoid red flags in tests: no table-driven tests for multi-scenario functions, tests without subtests, tests depending on execution order, global mutable state, hardcoded sleeps, tests without assertions, ignoring errors with underscore, testing implementation details instead of behavior, or overly complex test setup
Files:
pkg/console/operator/sync_v400_test.go
**/operator/**/*.go
📄 CodeRabbit inference engine (Custom checks)
When deployment manifests, operator code, or controllers are added/modified, ensure they do not introduce scheduling constraints assuming standard HA topology. Check ControlPlaneTopology for SingleReplica/DualReplica/HighlyAvailableArbiter/External modes before applying constraints. Use required anti-affinity with maxUnavailable >= 1 (not maxUnavailable: 0). Cap replica counts to schedulable nodes. Exclude arbiter nodes on TNA. Avoid master nodeSelectors on HyperShift. Use library-go DeploymentController hooks (WithTopologyAwareReplicasHook, WithTopologyAwareSchedulingHook, WithControlPlaneNodeSelectorHook).
Files:
pkg/console/operator/sync_v400_test.gopkg/console/operator/sync_v400.go
**
⚙️ CodeRabbit configuration file
**: # OpenShift Console Operator - AI Context HubThis file serves as the central AI documentation hub for the OpenShift Console Operator project. AI assistants (Claude, Cursor, Copilot, CodeRabbit, etc.) use this and the linked documents to understand project context.
Go Version and Dependencies
Go Version and Dependencies
- Go version: 1.24.0 (toolchain: go1.24.4)
- Dependency management: Uses
go.modwith vendoring- Build flags: Use
GOFLAGS="-mod=vendor"for builds and tests to ensure vendored dependencies are used- Key dependencies: openshift/api, openshift/library-go, k8s.io client libraries
- Go version: 1.24.0 (toolchain: go1.24.4)
- Dependency management: Uses
go.modwith vendoring- Build flags: Use
GOFLAGS="-mod=vendor"for builds and tests to ensure vendored dependencies are used- Key dependencies: openshift/api, openshift/library-go, k8s.io client libraries
Quick Reference
This Repository
Document Purpose ARCHITECTURE.md System architecture, components, repository structure CONVENTIONS.md Go coding standards, patterns, import organization TESTING.md Testing patterns, commands, debugging README.md Project README with setup instructions Console Repository (openshift/console)
For frontend-related guidelines, see the openshift/console repository:
Document Purpose STYLEGUIDE.md Frontend code style guidelines INTERNATIONALIZATION.md i18n patterns and translation guidelines CONTRIBUTING.md Contribution guidelines for the console project Project Summary
The **console-operator...
Files:
pkg/console/operator/sync_v400_test.gopkg/console/operator/sync_v400.go
**/*.{py,js,ts,go,rs,java,rb,php,kt,swift,cs}
⚙️ CodeRabbit configuration file
**/*.{py,js,ts,go,rs,java,rb,php,kt,swift,cs}: Injection prevention (prodsec-skills):
- SQL: parameterized queries only; no string concatenation
- Command: no shell=True, os.system, or backtick exec with user input
- LDAP/XPath: escape special characters in filters
- Path traversal: canonicalize paths, reject ../
- Deserialization: no pickle/yaml.load()/eval on untrusted data
- Prototype pollution: no recursive merge of untrusted objects
- Validate at trust boundaries with allow-lists, not deny-lists
- Normalize Unicode and anchor regexes (^$); watch for ReDoS
Files:
pkg/console/operator/sync_v400_test.gopkg/console/operator/sync_v400.go
🔇 Additional comments (1)
pkg/console/operator/sync_v400.go (1)
188-203: LGTM!Also applies to: 309-348
|
/test e2e-aws-console |
|
@sg00dwin: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary
During upgrades, the console operator updates the console deployment spec 2-4 times (new image, config changes). Each update bumps
Generation, butObservedGenerationlags by 257ms-1s until the deployment controller processes it. The generation check (introduced by PR #1169 for OCPBUGS-64688) sees this gap and reportsProgressing=True— even though the operator itself just caused the gap.These sub-second blips coincide with MCO's
Progressing=Truewindow, failing the OTA invariant test (clusteroperator/console should stay Progressing=False while MCO is Progressing=True) in ~5% of upgrade jobs.Changes
SyncDeploymentalready callsApplyDeployment, which returns achangedboolean — but it was being discarded (removed by PR #1074 for OCPBUGS-64685). This PR re-introduces it:SyncDeployment: CaptureschangedfromApplyDeploymentand returns it to the callersync_v400: Skips the generation check whenchanged=true(self-inflicted gap, guaranteed transient)When
changed=false, the generation check runs normally — catching genuine deployment rollout delays. The next sync loop always runs withchanged=falseif no new changes are needed, so persistent gaps are still detected.Context
This is the 4th fix in a chain of Progressing-related bugs:
changedflagTests
TestDeploymentProgressingSkippedWhenChanged— 4 cases covering changed/unchanged x gap/no-gapTestDeploymentProgressingByGeneration— 5 cases unchanged and passingmake test-unitpasses (all packages, gofmt, govet)SyncLoopRefreshProgressingblips during upgrade windowSummary by CodeRabbit