Migrate integration tests from bash to Go - Phase 1 by BATMAN-JD · Pull Request #969 · openshift/aws-account-operator

BATMAN-JD · 2026-04-13T20:08:51Z

This commit introduces Go-based integration tests to replace the existing bash scripts, improving maintainability, type safety, and IDE integration.

Changes

Infrastructure (test/integration/helpers/)

client.go: Kubernetes client management
- GetKubeClient() - Create controller-runtime client with AAO types
- GetKubeConfig() - Load kubeconfig (in-cluster or local)
- CreateNamespace() / DeleteNamespace() - Namespace operations
- RemoveFinalizers() - Strip finalizers from any resource
accountclaim.go: AccountClaim-specific operations
- CreateFakeAccountClaim() - Create FAKE AccountClaim for testing
- WaitForAccountClaimReady() - Wait for Ready/Claimed status
- GetAccountClaim() - Retrieve AccountClaim CR
- DeleteAccountClaim() - Delete with optional finalizer removal
- VerifySecretExists() - Check for secret existence

Tests Migrated (test/integration/tests/)

fake_accountclaim_test.go: Full migration of test_fake_accountclaim.sh
- ~30 second runtime (no AWS interaction)
- Tests FAKE AccountClaim workflow
- Verifies finalizers, no accountLink, secret creation
- Automatic cleanup via t.Cleanup()

Build System

int-testing.mk: Added Go test targets
- make test-integration-go - Run all Go integration tests
- make test-integration-go-fake - Run FAKE AccountClaim test only

Documentation

tests/README.md: Comprehensive test guide
- How to run tests
- Helper function documentation
- Writing new tests guide
- Bash vs Go comparison
MIGRATION_STATUS.md: Migration tracking
- Completed: FAKE AccountClaim test (1/8)
- TODO: 7 remaining bash tests
- Benefits and patterns documented

Testing Status

Code compiles successfully:

go test -c ./test/integration/tests/
go build ./test/integration/helpers/...

Runtime testing blocked by separate issue (backplane QPS rate limiting in local dev - see upcoming PR).

Benefits Over Bash

Type-safe API interactions
Better error messages and handling
Automatic cleanup even on test failure
IDE integration (autocomplete, jump-to-definition)
Structured test phases (Setup/Validate/Cleanup)
Reusable helper libraries

Migration Plan

Phase 1: Foundation (COMPLETE)

✅ Helper infrastructure
✅ First test migrated (FAKE AccountClaim)
✅ Patterns established

Phase 2: Core Tests (NEXT)

Migrate high-priority local profile tests
Build out Account and AWS helper functions

Phase 3: Advanced Tests

Migrate CI/PROW-only tests
Handle special cases (KMS, long-running cleanup)

What is being added?

Is this a fix for a bug? What's the bug? Is this a new feature? Please describe it. Is this just a small typo fix? That's fine too!

Checklist before requesting review

I have tested this locally
I have included unit tests
I have updated any corresponding documentation

Steps To Manually Test

Please provide us steps to reproduce the scenarios needed to test this. If an integration test is provided, let us know how to run it. If not, enumerate the steps to validate this does what it's supposed to.

Start the operator
Run the thing
Observe X in the spec for the thing
Clean up the thing

Ref OSD-0000

Summary by CodeRabbit

Tests
- Added Go-based integration test framework and an initial fake AccountClaim end-to-end test
- Added reusable test helpers for creating, waiting on, verifying, and cleaning up test resources
Documentation
- Added migration status and test-suite README with run instructions, validation checklist, and migration plan
Chores
- Added Make targets for running Go integration tests and adjusted namespace bootstrap/selection and CI/PROW-aware build behavior for test runs

This commit introduces Go-based integration tests to replace the existing bash scripts, improving maintainability, type safety, and IDE integration. ## Changes ### Infrastructure (test/integration/helpers/) - **client.go**: Kubernetes client management - GetKubeClient() - Create controller-runtime client with AAO types - GetKubeConfig() - Load kubeconfig (in-cluster or local) - CreateNamespace() / DeleteNamespace() - Namespace operations - RemoveFinalizers() - Strip finalizers from any resource - **accountclaim.go**: AccountClaim-specific operations - CreateFakeAccountClaim() - Create FAKE AccountClaim for testing - WaitForAccountClaimReady() - Wait for Ready/Claimed status - GetAccountClaim() - Retrieve AccountClaim CR - DeleteAccountClaim() - Delete with optional finalizer removal - VerifySecretExists() - Check for secret existence ### Tests Migrated (test/integration/tests/) - **fake_accountclaim_test.go**: Full migration of test_fake_accountclaim.sh - ~30 second runtime (no AWS interaction) - Tests FAKE AccountClaim workflow - Verifies finalizers, no accountLink, secret creation - Automatic cleanup via t.Cleanup() ### Build System - **int-testing.mk**: Added Go test targets - make test-integration-go - Run all Go integration tests - make test-integration-go-fake - Run FAKE AccountClaim test only ### Documentation - **tests/README.md**: Comprehensive test guide - How to run tests - Helper function documentation - Writing new tests guide - Bash vs Go comparison - **MIGRATION_STATUS.md**: Migration tracking - Completed: FAKE AccountClaim test (1/8) - TODO: 7 remaining bash tests - Benefits and patterns documented ## Testing Status Code compiles successfully: - go test -c ./test/integration/tests/ - go build ./test/integration/helpers/... Runtime testing blocked by separate issue (backplane QPS rate limiting in local dev - see upcoming PR). ## Benefits Over Bash - Type-safe API interactions - Better error messages and handling - Automatic cleanup even on test failure - IDE integration (autocomplete, jump-to-definition) - Structured test phases (Setup/Validate/Cleanup) - Reusable helper libraries ## Migration Plan Phase 1: Foundation (COMPLETE) - ✅ Helper infrastructure - ✅ First test migrated (FAKE AccountClaim) - ✅ Patterns established Phase 2: Core Tests (NEXT) - Migrate high-priority local profile tests - Build out Account and AWS helper functions Phase 3: Advanced Tests - Migrate CI/PROW-only tests - Handle special cases (KMS, long-running cleanup) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

BATMAN-JD · 2026-04-13T20:09:13Z

/test integration-test

coderabbitai · 2026-04-13T20:09:19Z

Warning

Rate limit exceeded

@BATMAN-JD has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 24 minutes and 17 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 24 minutes and 17 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 345d253e-dd3c-4cd0-9e2b-5e08720f680f

📥 Commits

Reviewing files that changed from the base of the PR and between 93a0d46 and 82d3f4f.

📒 Files selected for processing (1)

test/integration/integration-test-bootstrap.sh

Walkthrough

Adds Go-based integration test infrastructure and docs: new test helpers for kube client and AccountClaim, Go test Make targets, migration/status documentation and README, a fake AccountClaim Go integration test, and small Makefile/script adjustments for deployment and CI behavior.

Changes

Cohort / File(s)	Summary
Migration & Test Docs `test/integration/MIGRATION_STATUS.md`, `test/integration/tests/README.md`	New migration status and tests README describing migration plan, run instructions, test structure, helper guidelines, troubleshooting, and checklist.
Test Helpers — Kubernetes Client `test/integration/helpers/client.go`	New helper functions to build a controller-runtime `client.Client` and scheme, retrieve kubeconfig (in-cluster or KUBECONFIG/~/.kube/config), create/delete namespaces with polling, and remove finalizers via merge patch.
Test Helpers — AccountClaim `test/integration/helpers/accountclaim.go`	New AccountClaim utilities: create a FAKE `awsv1alpha1.AccountClaim`, poll for Ready/Error states and conditions, delete with optional finalizer removal and wait-for-deletion, fetch claim, and verify referenced Secret existence.
Integration Test — Example `test/integration/tests/fake_accountclaim_test.go`	New `TestFakeAccountClaim` exercising FAKE AccountClaim end-to-end: setup namespace and claim, wait-for-ready, validate finalizer/absence of Account link/secret presence, and register cleanup via `t.Cleanup`.
Make targets for Go tests `test/integration/int-testing.mk`	Added `test-integration-go` and `test-integration-go-fake` targets to run Go tests; updated prow predeploy to apply with explicit `-n ${OPERATOR_NAMESPACE}`.
Deployment script / Makefile tweaks `Makefile`, `test/integration/integration-test-bootstrap.sh`	Makefile: silent namespace existence check and `oc project ${OPERATOR_NAMESPACE}` before applying CRs. Script: PROW detection to skip in-cluster build, use `PROW_BUILT_IMAGE` when set, and print PROW env vars; build/verify flow adjusted accordingly.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 9 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Test Structure And Quality	⚠️ Warning	Test violates single responsibility principle by checking three unrelated behaviors in one subtest, lacks context deadline for bounded operations, and has inconsistent assertion messages.	Split 'Validate' subtest into three separate subtests for finalizers, accountLink, and secret; add context deadline with WithTimeout; ensure all assertions include descriptive error messages.

✅ Passed checks (9 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main change: migrating integration tests from bash to Go in its first phase, which is the core objective of this PR.
Description check	✅ Passed	The PR description is comprehensive and well-structured, covering infrastructure changes, migrated tests, build system updates, and documentation. However, it duplicates the template checklist at the end without completing it, and the 'Steps To Manually Test' section still contains placeholder examples rather than actual test instructions.
Docstring Coverage	✅ Passed	Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.
Stable And Deterministic Test Names	✅ Passed	Test names are static strings with no dynamic values. The PR uses standard Go testing.T framework with subtest names 'Setup' and 'Validate' that are deterministic and stable.
Microshift Test Compatibility	✅ Passed	The PR adds standard Go integration tests using testing.T and t.Run(), not Ginkgo e2e tests. The MicroShift API compatibility check applies only to Ginkgo tests, making it inapplicable here.
Single Node Openshift (Sno) Test Compatibility	✅ Passed	The PR adds standard Go integration tests using the testing package, not Ginkgo e2e tests. SNO compatibility check applies only to new Ginkgo e2e tests with patterns like Describe(), It(), Context(), or When().
Topology-Aware Scheduling Compatibility	✅ Passed	PR modifies only test infrastructure files without introducing scheduling constraints or topology dependencies in deployment manifests.
Ote Binary Stdout Contract	✅ Passed	PR contains no process-level stdout writes violating OTE Binary Stdout Contract. All logging uses testing package methods (t.Logf, t.Log, t.Errorf) properly captured by test framework.
Ipv6 And Disconnected Network Test Compatibility	✅ Passed	Custom check targets Ginkgo e2e tests, but PR introduces standard Go tests using the testing package with no IPv4 assumptions or external connectivity requirements.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

openshift-ci · 2026-04-13T20:10:47Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-ci · 2026-04-13T20:11:58Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: BATMAN-JD
Once this PR has been reviewed and has the lgtm label, please assign reedcort for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai

Actionable comments posted: 6

🧹 Nitpick comments (1)

test/integration/helpers/client.go (1)

50-56: Only fall back to local kubeconfig when not running in cluster.

Falling back for every InClusterConfig error can hide real in-cluster configuration failures.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@test/integration/helpers/client.go` around lines 50 - 56, The code currently
falls back to local kubeconfig on any rest.InClusterConfig() error; change the
logic to only attempt in-cluster config when the process appears to be running
in-cluster (e.g. check for KUBERNETES_SERVICE_HOST or presence of service
account files) and otherwise load the kubeconfig; update the block around
rest.InClusterConfig() so you call rest.InClusterConfig() only when in-cluster
indicators are present, and otherwise directly use the KUBECONFIG env var (or
clientcmd.BuildConfigFromFlags) to avoid masking real in-cluster errors.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/integration/helpers/accountclaim.go`:
- Around line 132-134: Replace the overly specific error text in the
secret-fetching failure path: instead of returning fmt.Errorf("secret %s/%s does
not exist: %w", namespace, name, err) when err != nil, change the message to a
generic failure like "failed to get secret %s/%s: %w" (or similar) so
RBAC/API/connectivity errors aren't misreported; keep using fmt.Errorf and wrap
the original err and reference the same namespace and name variables to preserve
context.
- Around line 109-115: The deletion wait loop incorrectly treats
client.IgnoreNotFound(err) == nil as success; instead explicitly handle the
three cases: if err == nil the resource still exists so return (false, nil); if
apierrors.IsNotFound(err) (or equivalent) return (true, nil) to indicate
deletion; for any other non-nil err return (false, err) so the
wait.PollImmediate loop can stop and surface real API errors; update the
anonymous func used in wait.PollImmediate (the c.Get call and its error
handling) to implement this explicit branching.

In `@test/integration/helpers/client.go`:
- Around line 104-110: The polling logic in the wait.PollImmediate callback
incorrectly treats a successful c.Get(ctx, client.ObjectKey{Name: name}, ns) as
deletion and masks all API errors by returning (false, nil); update the callback
used by wait.PollImmediate so that when c.Get returns nil you return (false,
nil) to continue polling (namespace still exists), when c.Get returns a NotFound
error (use client.IgnoreNotFound(err) != nil or k8s errors.IsNotFound) return
(true, nil) to signal deletion, and for any other non-NotFound error return
(false, err) so the poll stops and surfaces the real API/RBAC/network error;
refer to c.Get, client.IgnoreNotFound, and wait.PollImmediate to find and fix
the code.

In `@test/integration/MIGRATION_STATUS.md`:
- Around line 185-186: The documentation incorrectly states that finalizer
removal is enabled by default; update the wording to reflect that finalizer
removal is opt-in via the removeFinalizers argument in the test helpers (i.e.,
callers must pass removeFinalizers to enable it). Locate the line referencing
"Finalizer removal enabled by default" in MIGRATION_STATUS.md and change it to
something like "Finalizer removal is opt-in via the removeFinalizers argument in
the test helpers" so it matches the helpers' behavior.

In `@test/integration/tests/fake_accountclaim_test.go`:
- Around line 48-71: The "Validate" subtest is running even when the "Setup"
subtest fails because you ignore the boolean return of t.Run; capture the result
of t.Run("Setup", func(t *testing.T){...}) into a variable (e.g. ok :=
t.Run(...)) and if ok is false, abort the parent test (call t.Fatalf or return)
before calling t.Run("Validate", ...) so downstream helpers like
helpers.CreateNamespace, helpers.CreateFakeAccountClaim and
helpers.WaitForAccountClaimReady are not relied on when setup fails.

In `@test/integration/tests/README.md`:
- Around line 173-174: Update the README to correct the finalizer-removal
statement: it is not enabled by default—callers must explicitly pass true to
DeleteAccountClaim(..., removeFinalizers bool) to remove finalizers. Locate
references to finalizer behavior and the DeleteAccountClaim function in the docs
(and any examples) and change the wording and example calls to show that
removeFinalizers must be set to true to enable finalizer removal; if there are
sample snippets that omit this boolean, update them to include removeFinalizers:
true or describe that parameter explicitly.

---

Nitpick comments:
In `@test/integration/helpers/client.go`:
- Around line 50-56: The code currently falls back to local kubeconfig on any
rest.InClusterConfig() error; change the logic to only attempt in-cluster config
when the process appears to be running in-cluster (e.g. check for
KUBERNETES_SERVICE_HOST or presence of service account files) and otherwise load
the kubeconfig; update the block around rest.InClusterConfig() so you call
rest.InClusterConfig() only when in-cluster indicators are present, and
otherwise directly use the KUBECONFIG env var (or
clientcmd.BuildConfigFromFlags) to avoid masking real in-cluster errors.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: af8c926b-8782-464e-9a08-16061fbcc34f

📥 Commits

Reviewing files that changed from the base of the PR and between 9eba2a4 and 72d27d4.

📒 Files selected for processing (6)

test/integration/MIGRATION_STATUS.md
test/integration/helpers/accountclaim.go
test/integration/helpers/client.go
test/integration/int-testing.mk
test/integration/tests/README.md
test/integration/tests/fake_accountclaim_test.go

coderabbitai · 2026-04-13T20:17:34Z

+	return wait.PollImmediate(2*time.Second, timeout, func() (bool, error) {
+		err := c.Get(ctx, client.ObjectKey{Name: name, Namespace: namespace}, claim)
+		if client.IgnoreNotFound(err) == nil {
+			return true, nil // Deleted
+		}
+		return false, nil // Still exists
+	})


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

find . -type f -name "accountclaim.go" | head -20

Repository: openshift/aws-account-operator

Length of output: 116

🏁 Script executed:

cat -n test/integration/helpers/accountclaim.go

Repository: openshift/aws-account-operator

Length of output: 5234

Fix error handling in deletion wait loop to propagate non-NotFound errors.

The current code at lines 109-115 has a critical logic error: client.IgnoreNotFound(err) == nil is true when err is either nil OR a NotFound error. This means:

When the resource exists (err = nil): incorrectly returns (true, nil) signaling deletion

When other API errors occur (connection failures, permission errors, etc.): silently returns (false, nil) and continues polling indefinitely instead of propagating the error

Replace with explicit error type checking:

Suggested fix

+import ( + apierrors "k8s.io/apimachinery/pkg/api/errors" +) ... return wait.PollImmediate(2*time.Second, timeout, func() (bool, error) { err := c.Get(ctx, client.ObjectKey{Name: name, Namespace: namespace}, claim) - if client.IgnoreNotFound(err) == nil { - return true, nil // Deleted - } - return false, nil // Still exists + if err != nil { + if apierrors.IsNotFound(err) { + return true, nil // Deleted + } + return false, fmt.Errorf("failed while waiting for AccountClaim deletion %s/%s: %w", namespace, name, err) + } + return false, nil // Still exists })

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@test/integration/helpers/accountclaim.go` around lines 109 - 115, The deletion wait loop incorrectly treats client.IgnoreNotFound(err) == nil as success; instead explicitly handle the three cases: if err == nil the resource still exists so return (false, nil); if apierrors.IsNotFound(err) (or equivalent) return (true, nil) to indicate deletion; for any other non-nil err return (false, err) so the wait.PollImmediate loop can stop and surface real API errors; update the anonymous func used in wait.PollImmediate (the c.Get call and its error handling) to implement this explicit branching.

coderabbitai · 2026-04-13T20:17:34Z

+	if err != nil {
+		return fmt.Errorf("secret %s/%s does not exist: %w", namespace, name, err)
+	}


⚠️ Potential issue | 🟡 Minor

Secret error message is too specific for all failure modes.

Line 133 says “does not exist” even when the real issue is RBAC or API connectivity. Keep the message generic and include the wrapped error.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@test/integration/helpers/accountclaim.go` around lines 132 - 134, Replace the overly specific error text in the secret-fetching failure path: instead of returning fmt.Errorf("secret %s/%s does not exist: %w", namespace, name, err) when err != nil, change the message to a generic failure like "failed to get secret %s/%s: %w" (or similar) so RBAC/API/connectivity errors aren't misreported; keep using fmt.Errorf and wrap the original err and reference the same namespace and name variables to preserve context.

coderabbitai · 2026-04-13T20:17:34Z

+	return wait.PollImmediate(2*time.Second, timeout, func() (bool, error) {
+		err := c.Get(ctx, client.ObjectKey{Name: name}, ns)
+		if client.IgnoreNotFound(err) == nil {
+			return true, nil // Namespace is gone
+		}
+		return false, nil // Still exists, keep polling
+	})


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

git ls-files test/integration/helpers/client.go

Repository: openshift/aws-account-operator

Length of output: 108

🏁 Script executed:

cat -n test/integration/helpers/client.go | head -120

Repository: openshift/aws-account-operator

Length of output: 4149

Fix error handling in namespace deletion polling to prevent masking API errors.

The polling loop at lines 105-109 silently retries on any error (RBAC, network, etc.) by returning false, nil, which masks these issues and surfaces them only as timeouts instead of with diagnostic detail.

Also note the logic is inverted when c.Get() succeeds with no error: client.IgnoreNotFound(nil) returns nil, causing the condition to return true while the namespace still exists.

Suggested fix

+import ( + apierrors "k8s.io/apimachinery/pkg/api/errors" +) ... return wait.PollImmediate(2*time.Second, timeout, func() (bool, error) { err := c.Get(ctx, client.ObjectKey{Name: name}, ns) - if client.IgnoreNotFound(err) == nil { - return true, nil // Namespace is gone - } - return false, nil // Still exists, keep polling + if err != nil { + if apierrors.IsNotFound(err) { + return true, nil // Namespace is gone + } + return false, fmt.Errorf("failed while waiting for namespace deletion %s: %w", name, err) + } + return false, nil // Still exists, keep polling })

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

return wait.PollImmediate(2*time.Second, timeout, func() (bool, error) {

err := c.Get(ctx, client.ObjectKey{Name: name}, ns)

if client.IgnoreNotFound(err) == nil {

return true, nil // Namespace is gone

}

return false, nil // Still exists, keep polling

})

return wait.PollImmediate(2*time.Second, timeout, func() (bool, error) {

err := c.Get(ctx, client.ObjectKey{Name: name}, ns)

if err != nil {

if apierrors.IsNotFound(err) {

return true, nil // Namespace is gone

}

return false, fmt.Errorf("failed while waiting for namespace deletion %s: %w", name, err)

}

return false, nil // Still exists, keep polling

})

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@test/integration/helpers/client.go` around lines 104 - 110, The polling logic in the wait.PollImmediate callback incorrectly treats a successful c.Get(ctx, client.ObjectKey{Name: name}, ns) as deletion and masks all API errors by returning (false, nil); update the callback used by wait.PollImmediate so that when c.Get returns nil you return (false, nil) to continue polling (namespace still exists), when c.Get returns a NotFound error (use client.IgnoreNotFound(err) != nil or k8s errors.IsNotFound) return (true, nil) to signal deletion, and for any other non-NotFound error return (false, err) so the poll stops and surfaces the real API/RBAC/network error; refer to c.Get, client.IgnoreNotFound, and wait.PollImmediate to find and fix the code.

coderabbitai · 2026-04-13T20:17:34Z

+- Cleanup automatic via `t.Cleanup()` - runs even on failure
+- Finalizer removal enabled by default for faster local testing


⚠️ Potential issue | 🟡 Minor

Cleanup note overstates default behavior.

Line 186 says finalizer removal is enabled by default; in helpers it is opt-in via the removeFinalizers argument.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@test/integration/MIGRATION_STATUS.md` around lines 185 - 186, The documentation incorrectly states that finalizer removal is enabled by default; update the wording to reflect that finalizer removal is opt-in via the removeFinalizers argument in the test helpers (i.e., callers must pass removeFinalizers to enable it). Locate the line referencing "Finalizer removal enabled by default" in MIGRATION_STATUS.md and change it to something like "Finalizer removal is opt-in via the removeFinalizers argument in the test helpers" so it matches the helpers' behavior.

coderabbitai · 2026-04-13T20:17:34Z

+	t.Run("Setup", func(t *testing.T) {
+		t.Logf("Creating namespace: %s", namespaceName)
+		err := helpers.CreateNamespace(ctx, kubeClient, namespaceName)
+		if err != nil {
+			t.Fatalf("Failed to create namespace: %v", err)
+		}
+
+		t.Logf("Creating FAKE AccountClaim: %s", claimName)
+		_, err = helpers.CreateFakeAccountClaim(ctx, kubeClient, claimName, namespaceName)
+		if err != nil {
+			t.Fatalf("Failed to create FAKE AccountClaim: %v", err)
+		}
+
+		t.Logf("Waiting for FAKE AccountClaim to become Ready (timeout: %v)", accountClaimReadyTimeout)
+		err = helpers.WaitForAccountClaimReady(ctx, kubeClient, claimName, namespaceName, accountClaimReadyTimeout)
+		if err != nil {
+			t.Fatalf("FAKE AccountClaim did not become ready: %v", err)
+		}
+
+		t.Log("✓ Setup complete")
+	})
+
+	// Test: Validate FAKE AccountClaim
+	t.Run("Validate", func(t *testing.T) {


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, locate and examine the test file find . -name "fake_accountclaim_test.go" -type f

Repository: openshift/aws-account-operator

Length of output: 124

🏁 Script executed:

# Check the structure of the test file around the mentioned lines # Also look for similar patterns in the codebase rg -n "t.Run" test/integration/tests/fake_accountclaim_test.go | head -20

Repository: openshift/aws-account-operator

Length of output: 156

🏁 Script executed:

# Get more context - examine the full test structure cat -n test/integration/tests/fake_accountclaim_test.go | head -100

Repository: openshift/aws-account-operator

Length of output: 4095

🌐 Web query:

Go testing t.Run subtest t.Fatalf behavior return value

💡 Result:

In Go's testing package, when using t.Run to create a subtest, calling t.Fatalf (or t.Fatal, t.FailNow) inside the subtest function terminates that specific subtest by calling runtime.Goexit after marking it as failed. This runs any deferred functions (like t.Cleanup) in the subtest goroutine but does not execute code after the t.Fatalf call, including any explicit return statement. The parent t.Run call blocks until the subtest completes (via return from the function or Goexit). It then checks the subtest's failed state via !t2.failed and returns false if the subtest failed, true otherwise. This return value indicates success to the caller but does not stop the parent test—subsequent subtests and parent code continue executing. The parent test is marked failed only if using t.Error/t.Errorf (non-fatal), not by t.Fatal in a child subtest. Example behavior: func TestParent(t *testing.T) { if !t.Run("sub", func(t *testing.T) { t.Fatalf("fail") // Subtest ends here, t.Run returns false }) { t.Log("subtest failed, but parent continues") // This runs } t.Log("parent ends") // This runs } Output shows subtest FAIL, parent PASS (unless marked failed otherwise). This design allows table-driven tests to report all failures without one fatal subtest halting siblings, unlike pre-Go 1.7 flat tests where t.Fatalf stopped everything.

Citations:

1: https://pkg.go.dev/testing

2: https://go.dev/blog/subtests

3: https://go.dev/src/testing/testing.go

4: https://pkg.go.dev/testing@go1.22.2

🌐 Web query:

Go testing nested subtests failure propagation idiomatic pattern

💡 Result:

In Go's testing package, the idiomatic pattern for handling failures in nested subtests (created with t.Run) leverages the distinct behaviors of t.Fail/t.Error (continue execution) and t.FailNow/t.Fatal (stop current subtest). Key failure propagation rules: - t.Error or t.Fail in a subtest marks it (and all ancestors) as failed but continues running the subtest body, subsequent sibling subtests, and parent tests. - t.Fatal or t.FailNow in a subtest immediately stops that subtest (skipping remaining code and unstarted descendant subtests) but allows siblings and the parent to continue. The parent will be marked failed if any subtest fails. - This enables table-driven tests where one failing case doesn't skip others, unlike pre-1.7 t.Fatalf which halted the entire test. Example idiomatic table-driven subtest: func TestTime(t *testing.T) { testCases := []struct { gmt, loc string }{ {"12:31", "Europe/Zurich"}, {"12:31", "America/New_York"}, } for _, tc := range testCases { t.Run(fmt.Sprintf("%s in %s", tc.gmt, tc.loc), func(t *testing.T) { loc, err := time.LoadLocation(tc.loc) if err != nil { t.Fatal("could not load location") // Stops this subtest only } // ... further checks with t.Error if needed to continue }) } } If a subtest Fatals (e.g., bad location), output shows: --- FAIL: TestTime/12:31_in_Europe/Zuri --- FAIL: TestTime/12:31_in_America/New_York Both subtests report, parent fails, but others run. For parallelism: Call t.Parallel in subtests after t.Run; parent waits for all. FailNow skips unstarted parallel descendants in that branch. Best practices: - Use t.Run for table-driven tests to isolate failures. - Prefer t.Fatal in subtests for quick stops on setup errors; t.Error for assertions to run all cases. - Avoid closing over parent t in subtests (leads to FailNow precondition violations). - Run specific subtests: go test -run 'TestTime/12:31_in'. This pattern, introduced in Go 1.7, provides fine-grained control, setup/teardown sharing, and clear hierarchical reporting.

Citations:

1: https://go.dev/blog/subtests

2: https://go.dev/design/12166-subtests

3: https://pkg.go.dev/testing

4: https://pkg.go.dev/testing@go1.26rc1

5: https://go.dev/src/testing/testing.go?s=16299%3A16316

6: https://rednafi.com/go/subtest-grouping/

Gate Validate on successful Setup.

Validate runs even when Setup fails because the t.Run("Setup", ...) return value is ignored. In Go's testing package, a failed subtest (via t.Fatalf) does not stop the parent test; subsequent code continues executing. This causes cascading failures when dependent tests run without required resources.

Suggested fix

- t.Run("Setup", func(t *testing.T) { + if ok := t.Run("Setup", func(t *testing.T) { t.Logf("Creating namespace: %s", namespaceName) err := helpers.CreateNamespace(ctx, kubeClient, namespaceName) if err != nil { t.Fatalf("Failed to create namespace: %v", err) } ... t.Log("✓ Setup complete") - }) + }); !ok { + t.Fatalf("Setup failed; skipping Validate") + } // Test: Validate FAKE AccountClaim t.Run("Validate", func(t *testing.T) {

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

t.Run("Setup", func(t *testing.T) {

t.Logf("Creating namespace: %s", namespaceName)

err := helpers.CreateNamespace(ctx, kubeClient, namespaceName)

if err != nil {

t.Fatalf("Failed to create namespace: %v", err)

}

t.Logf("Creating FAKE AccountClaim: %s", claimName)

_, err = helpers.CreateFakeAccountClaim(ctx, kubeClient, claimName, namespaceName)

if err != nil {

t.Fatalf("Failed to create FAKE AccountClaim: %v", err)

}

t.Logf("Waiting for FAKE AccountClaim to become Ready (timeout: %v)", accountClaimReadyTimeout)

err = helpers.WaitForAccountClaimReady(ctx, kubeClient, claimName, namespaceName, accountClaimReadyTimeout)

if err != nil {

t.Fatalf("FAKE AccountClaim did not become ready: %v", err)

}

t.Log("✓ Setup complete")

})

// Test: Validate FAKE AccountClaim

t.Run("Validate", func(t *testing.T) {

if ok := t.Run("Setup", func(t *testing.T) {

t.Logf("Creating namespace: %s", namespaceName)

err := helpers.CreateNamespace(ctx, kubeClient, namespaceName)

if err != nil {

t.Fatalf("Failed to create namespace: %v", err)

}

t.Logf("Creating FAKE AccountClaim: %s", claimName)

_, err = helpers.CreateFakeAccountClaim(ctx, kubeClient, claimName, namespaceName)

if err != nil {

t.Fatalf("Failed to create FAKE AccountClaim: %v", err)

}

t.Logf("Waiting for FAKE AccountClaim to become Ready (timeout: %v)", accountClaimReadyTimeout)

err = helpers.WaitForAccountClaimReady(ctx, kubeClient, claimName, namespaceName, accountClaimReadyTimeout)

if err != nil {

t.Fatalf("FAKE AccountClaim did not become ready: %v", err)

}

t.Log("✓ Setup complete")

}); !ok {

t.Fatalf("Setup failed; skipping Validate")

}

// Test: Validate FAKE AccountClaim

t.Run("Validate", func(t *testing.T) {

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@test/integration/tests/fake_accountclaim_test.go` around lines 48 - 71, The "Validate" subtest is running even when the "Setup" subtest fails because you ignore the boolean return of t.Run; capture the result of t.Run("Setup", func(t *testing.T){...}) into a variable (e.g. ok := t.Run(...)) and if ok is false, abort the parent test (call t.Fatalf or return) before calling t.Run("Validate", ...) so downstream helpers like helpers.CreateNamespace, helpers.CreateFakeAccountClaim and helpers.WaitForAccountClaimReady are not relied on when setup fails.

coderabbitai · 2026-04-13T20:17:34Z

+- Finalizer removal is enabled by default for faster cleanup
+- Manual cleanup: `oc delete accountclaim <name> -n <namespace>`


⚠️ Potential issue | 🟡 Minor

Finalizer-removal behavior is documented inaccurately.

Line 173 says finalizer removal is enabled by default, but DeleteAccountClaim(..., removeFinalizers bool) requires explicit true at call sites.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@test/integration/tests/README.md` around lines 173 - 174, Update the README to correct the finalizer-removal statement: it is not enabled by default—callers must explicitly pass true to DeleteAccountClaim(..., removeFinalizers bool) to remove finalizers. Locate references to finalizer behavior and the DeleteAccountClaim function in the docs (and any examples) and change the wording and example calls to show that removeFinalizers must be set to true to enable finalizer removal; if there are sample snippets that omit this boolean, update them to include removeFinalizers: true or describe that parameter explicitly.

Fixes intermittent 'namespace not found' errors in PROW where service account creation would fail with 'namespaces ci-op-XXX not found'. Root cause: When namespace didn't exist, 'oc create' would run but 'oc project' (context switch) would not, leaving subsequent 'oc apply' commands targeting the wrong namespace. Now ensures we always switch to aws-account-operator namespace whether it existed or was just created.

BATMAN-JD · 2026-04-13T20:26:42Z

/test integration-test

Fixes ServiceUnavailable error when switching to newly created namespace. Root cause: 'oc project' was running immediately after 'oc create namespace', before OpenShift Project API had synced. This caused intermittent 'ServiceUnavailable' errors. Solution: Reorder operations to: 1. Create namespace (if needed) 2. Wait for namespace to be Active (up to 20s) 3. Then switch context with 'oc project' This ensures namespace is fully ready before attempting context switch.

BATMAN-JD · 2026-04-13T21:20:09Z

/test integration-test

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

Makefile (1)

106-117: ⚠️ Potential issue | 🟠 Major

Fail fast if namespace never becomes Active before switching project.

The loop can finish without success, but execution still continues to Line 117. That can cause flaky downstream failures instead of a clear timeout error.

🔧 Proposed fix

-	# Wait for namespace to be fully ready before switching to it
-	`@echo` "Waiting for namespace to be ready..."
-	`@for` i in {1..10}; do \
-		if oc get namespace ${OPERATOR_NAMESPACE} -o jsonpath='{.status.phase}' 2>/dev/null | grep -q "Active"; then \
-			echo "Namespace is ready"; \
-			break; \
-		fi; \
-		echo "Waiting for namespace (attempt $$i/10)..."; \
-		sleep 2; \
-	done
-	# Switch to the namespace now that it's ready
-	`@oc` project ${OPERATOR_NAMESPACE}
+	# Wait for namespace to be fully ready before switching to it
+	`@echo` "Waiting for namespace to be ready..."
+	`@oc` wait --for=jsonpath='{.status.phase}'=Active namespace/${OPERATOR_NAMESPACE} --timeout=20s
+	# Switch to the namespace now that it's ready
+	`@oc` project ${OPERATOR_NAMESPACE} >/dev/null

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 106 - 117, The namespace readiness loop in the
Makefile may exit after 10 attempts without the namespace becoming Active but
still proceeds to run oc project ${OPERATOR_NAMESPACE}; modify the Makefile so
that after the for-loop finishes you check whether the namespace became Active
(use the same oc get namespace ${OPERATOR_NAMESPACE} -o
jsonpath='{.status.phase}' check or track a success flag), and if it is not
Active print a clear error and exit non-zero (e.g., echo "Namespace
${OPERATOR_NAMESPACE} not ready after timeout" && exit 1) before calling oc
project ${OPERATOR_NAMESPACE}; ensure you update the same block around the
existing loop and oc project call so failures fail fast.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@Makefile`:
- Around line 106-117: The namespace readiness loop in the Makefile may exit
after 10 attempts without the namespace becoming Active but still proceeds to
run oc project ${OPERATOR_NAMESPACE}; modify the Makefile so that after the
for-loop finishes you check whether the namespace became Active (use the same oc
get namespace ${OPERATOR_NAMESPACE} -o jsonpath='{.status.phase}' check or track
a success flag), and if it is not Active print a clear error and exit non-zero
(e.g., echo "Namespace ${OPERATOR_NAMESPACE} not ready after timeout" && exit 1)
before calling oc project ${OPERATOR_NAMESPACE}; ensure you update the same
block around the existing loop and oc project call so failures fail fast.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 7f6db77d-099f-462b-a2a6-741738a37116

📥 Commits

Reviewing files that changed from the base of the PR and between 72d27d4 and a9d7c75.

📒 Files selected for processing (1)

Makefile

BATMAN-JD · 2026-04-13T21:35:45Z

/test integration-test

Eliminates dependency on OpenShift Project API and oc project context. Root cause: 'oc project' relies on Project API which is created asynchronously after namespace creation. Even though namespace is Active, Project API may still return ServiceUnavailable. Solution: Use explicit '-n ${OPERATOR_NAMESPACE}' on all namespaced commands instead of relying on context switching. This is more reliable and works immediately after namespace creation. Changes: - int-testing.mk: Added -n flag to deploy/*.yaml apply - Makefile: Removed 'oc project' call (no longer needed)

BATMAN-JD · 2026-04-15T14:39:57Z

/test integration-test

Introduces a new test approach that uses PROW's pre-built operator image instead of rebuilding in-cluster with OpenShift Build API. Problem: - Bash bootstrap rebuilds operator image in test cluster - Requires Build API to be available (intermittent PROW failures) - Wastes time (~2-3 min) rebuilding what PROW already built Solution: - New script: test/integration/run-go-tests.sh * Uses PROW's pre-built image directly * Deploys operator * Runs Go integration tests * No Build API dependency - New make target: test-integration-go-prow * Can be tested alongside existing bash tests * Zero risk to existing workflows Benefits: - More reliable (no Build API dependency) - Faster (skip rebuild step) - Simpler (~100 lines vs ~600 lines bash) - Better for Go test migration Files: - test/integration/run-go-tests.sh - New simplified runner - test/integration/GO_TESTS_README.md - Documentation - Makefile: Added test-integration-go-prow target - int-testing.mk: Added prow-ci-go-entrypoint target Testing: - Can be validated in PR without impacting existing tests - Falls back to bash tests if needed - Low-risk additive change

Stop wasting time rebuilding what PROW already built. Problem: - PROW builds operator image from PR commit ✓ - Bootstrap script rebuilds SAME code in-cluster ✗ * Wastes 2-3 minutes * Requires OpenShift Build API (intermittently unavailable) * Produces identical image to what PROW built Solution: - Detect PROW environment (IMAGE_FORMAT variable) - If PROW: Use pre-built image, skip rebuild - If local/staging: Build in-cluster as before Changes to integration-test-bootstrap.sh: - buildOperatorImage(): Check for IMAGE_FORMAT, skip build if set - verifyBuildSuccess(): Skip verification when using PROW image - configureKustomization(): Use PROW image or built image as appropriate Benefits: - ✓ Fixes intermittent Build API failures in PROW - ✓ Saves 2-3 minutes per test run - ✓ Zero risk: Falls back to rebuild if not in PROW - ✓ Maintains local dev workflow unchanged Testing: - Local dev: Still builds in-cluster (IMAGE_FORMAT not set) - PROW: Uses pre-built image (IMAGE_FORMAT set by ci-operator) This is the 'always build fresh' logic we questioned - it made sense for manual testing but was wasteful in PROW where the image is already built from the exact commit being tested.

BATMAN-JD · 2026-04-15T15:34:13Z

/test integration-test

Adding debug output to understand why IMAGE_FORMAT detection is not working in PROW environment. This will help us see what variables are actually set during PROW test runs. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

BATMAN-JD · 2026-04-15T15:59:58Z

/test integration-test

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/integration/integration-test-bootstrap.sh`:
- Around line 253-263: The fast-path that skips rebuilding when IMAGE_FORMAT is
set should only apply for the Prow profile; update the conditional inside
buildOperatorImage() (also used by profileStage()) to require PROFILE=prow (or
an explicit PROW flag) in addition to IMAGE_FORMAT being non-empty before
returning early, so only when PROFILE equals "prow" do you set PROW_BUILT_IMAGE
and skip the in-cluster rebuild and verification.
- Around line 300-317: The kustomization currently writes the full pull spec
into images.newName; instead, split IMAGE_TO_USE into IMAGE_NAME and IMAGE_TAG
(parse on the last ':' only if the ':' occurs after the last '/' so you don't
split registry:port), e.g. set IMAGE_NAME and IMAGE_TAG variables (fall back
IMAGE_TAG to "latest" if none), then write both fields in the here-doc as
"newName: ${IMAGE_NAME}" and "newTag: ${IMAGE_TAG}" (update the existing
IMAGE_TO_USE logic and the kustomization heredoc that emits images.newName to
use IMAGE_NAME and IMAGE_TAG).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: c246c45a-0903-4ae3-9df6-e406b4a15c66

📥 Commits

Reviewing files that changed from the base of the PR and between 084fb34 and 93a0d46.

📒 Files selected for processing (1)

test/integration/integration-test-bootstrap.sh

coderabbitai · 2026-04-15T16:07:40Z

+    if [ -n "${IMAGE_FORMAT:-}" ]; then
+        echo "==========================================================================="
+        echo "PROW environment detected - using pre-built operator image"
+        echo "Skipping in-cluster rebuild (IMAGE_FORMAT is set)"
+        echo "==========================================================================="
+        # PROW's IMAGE_FORMAT is like: "registry.ci.../pipeline:${component}"
+        # We want the "src" component which contains our operator binary
+        export PROW_BUILT_IMAGE="${IMAGE_FORMAT//\$\{component\}/src}"
+        echo "Using PROW image: ${PROW_BUILT_IMAGE}"
+        return 0
+    fi


⚠️ Potential issue | 🟠 Major

Scope the PROW fast-path to the PROW profile.

buildOperatorImage() is also used by profileStage(). With the current IMAGE_FORMAT check, any shell or job that happens to export that variable will skip the stage build, skip later build verification, and deploy the CI image instead of the checked-out code. Gate this branch on PROFILE=prow (or an explicit flag) so only Prow takes the shortcut.

Suggested fix

- if [ -n "${IMAGE_FORMAT:-}" ]; then + if [ "${PROFILE}" = "prow" ] && [ -n "${IMAGE_FORMAT:-}" ]; then echo "===========================================================================" echo "PROW environment detected - using pre-built operator image" echo "Skipping in-cluster rebuild (IMAGE_FORMAT is set)"

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if [ -n "${IMAGE_FORMAT:-}" ]; then

echo "==========================================================================="

echo "PROW environment detected - using pre-built operator image"

echo "Skipping in-cluster rebuild (IMAGE_FORMAT is set)"

echo "==========================================================================="

# PROW's IMAGE_FORMAT is like: "registry.ci.../pipeline:${component}"

# We want the "src" component which contains our operator binary

export PROW_BUILT_IMAGE="${IMAGE_FORMAT//\$\{component\}/src}"

echo "Using PROW image: ${PROW_BUILT_IMAGE}"

return 0

fi

if [ "${PROFILE}" = "prow" ] && [ -n "${IMAGE_FORMAT:-}" ]; then

echo "==========================================================================="

echo "PROW environment detected - using pre-built operator image"

echo "Skipping in-cluster rebuild (IMAGE_FORMAT is set)"

echo "==========================================================================="

# PROW's IMAGE_FORMAT is like: "registry.ci.../pipeline:${component}"

# We want the "src" component which contains our operator binary

export PROW_BUILT_IMAGE="${IMAGE_FORMAT//\$\{component\}/src}"

echo "Using PROW image: ${PROW_BUILT_IMAGE}"

return 0

fi

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@test/integration/integration-test-bootstrap.sh` around lines 253 - 263, The fast-path that skips rebuilding when IMAGE_FORMAT is set should only apply for the Prow profile; update the conditional inside buildOperatorImage() (also used by profileStage()) to require PROFILE=prow (or an explicit PROW flag) in addition to IMAGE_FORMAT being non-empty before returning early, so only when PROFILE equals "prow" do you set PROW_BUILT_IMAGE and skip the in-cluster rebuild and verification.

coderabbitai · 2026-04-15T16:07:40Z

+    # Determine which image to use: PROW's pre-built image or the one we just built
+    local IMAGE_TO_USE
+    if [ -n "${PROW_BUILT_IMAGE:-}" ]; then
+        IMAGE_TO_USE="${PROW_BUILT_IMAGE}"
+        echo "Configuring kustomization to use PROW image: ${IMAGE_TO_USE}"
+    else
+        IMAGE_TO_USE="${BUILD_CONFIG}"
+        echo "Configuring kustomization to use in-cluster built image: ${IMAGE_TO_USE}"
+    fi
+
+    # make the AAO image (quay.io/app-sre/aws-account-operator:latest) resolve to the image we want
    cat <<EOF >./deploy/kustomization.yaml
 resources:
 - operator.yaml
 images:
 - name: quay.io/app-sre/aws-account-operator:latest
-  newName: $BUILD_CONFIG
+  newName: ${IMAGE_TO_USE}
 patchesJson6902:


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In Kustomize, for an images:entry replacingquay.io/app-sre/aws-account-operator:latest, does newNameaccept a fully tagged pull spec likeregistry.ci.openshift.org/...:src, or should the tag be provided separately via newTag?

💡 Result:

In Kustomize, for an images: entry replacing quay.io/app-sre/aws-account-operator:latest, the newName field should be set to the registry and image path without the tag (e.g., registry.ci.openshift.org/...), and the tag (e.g., src) should be provided separately via newTag. newName does not accept a fully tagged pull spec like registry.ci.openshift.org/...:src. Kustomize constructs the final image by combining newName + : + newTag (or @ + digest if specified). Example configuration: images: - name: quay.io/app-sre/aws-account-operator newName: registry.ci.openshift.org/... newTag: src This transforms quay.io/app-sre/aws-account-operator:latest (matching on name) to registry.ci.openshift.org/...:src. The name field itself is explicitly "a tag-less image name", and newName follows the same convention as the replacement for that name part.

Citations:

1: https://github.com/kubernetes-sigs/kustomize/blob/master/api/types/image.go

2: https://kubectl.docs.kubernetes.io/references/kustomize/kustomization/images/

3: https://github.com/kubernetes-sigs/kustomize/blob/master/examples/image.md

4: https://oneuptime.com/blog/post/2026-02-09-kustomize-images-tag-management/view

5: https://www.shellhacks.com/kustomize-change-image-tag-or-registry/

Split the Prow pull spec into newName and newTag.

Kustomize's images[].newName field expects only the image registry and repository path, not a full pull spec with a tag. The tag must be provided separately via newTag. Kustomize constructs the final image reference by combining newName + : + newTag. Passing a fully tagged pull spec like registry.ci.openshift.org/...:src to newName produces an invalid image reference.

Parse the Prow image into its name and tag components and populate both newName and newTag in the kustomization.

Suggested fix

- local IMAGE_TO_USE + local IMAGE_NAME_TO_USE IMAGE_TAG_TO_USE if [ -n "${PROW_BUILT_IMAGE:-}" ]; then - IMAGE_TO_USE="${PROW_BUILT_IMAGE}" - echo "Configuring kustomization to use PROW image: ${IMAGE_TO_USE}" + IMAGE_NAME_TO_USE="${PROW_BUILT_IMAGE%:*}" + IMAGE_TAG_TO_USE="${PROW_BUILT_IMAGE##*:}" + echo "Configuring kustomization to use PROW image: ${PROW_BUILT_IMAGE}" else - IMAGE_TO_USE="${BUILD_CONFIG}" - echo "Configuring kustomization to use in-cluster built image: ${IMAGE_TO_USE}" + IMAGE_NAME_TO_USE="${BUILD_CONFIG}" + IMAGE_TAG_TO_USE="latest" + echo "Configuring kustomization to use in-cluster built image: ${IMAGE_NAME_TO_USE}:${IMAGE_TAG_TO_USE}" fi @@ -- name: quay.io/app-sre/aws-account-operator:latest - newName: ${IMAGE_TO_USE} +- name: quay.io/app-sre/aws-account-operator:latest + newName: ${IMAGE_NAME_TO_USE} + newTag: ${IMAGE_TAG_TO_USE}

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

# Determine which image to use: PROW's pre-built image or the one we just built

local IMAGE_TO_USE

if [ -n "${PROW_BUILT_IMAGE:-}" ]; then

IMAGE_TO_USE="${PROW_BUILT_IMAGE}"

echo "Configuring kustomization to use PROW image: ${IMAGE_TO_USE}"

else

IMAGE_TO_USE="${BUILD_CONFIG}"

echo "Configuring kustomization to use in-cluster built image: ${IMAGE_TO_USE}"

fi

# make the AAO image (quay.io/app-sre/aws-account-operator:latest) resolve to the image we want

cat <<EOF >./deploy/kustomization.yaml

resources:

- operator.yaml

images:

- name: quay.io/app-sre/aws-account-operator:latest

newName: $BUILD_CONFIG

newName: ${IMAGE_TO_USE}

patchesJson6902:

# Determine which image to use: PROW's pre-built image or the one we just built

local IMAGE_NAME_TO_USE IMAGE_TAG_TO_USE

if [ -n "${PROW_BUILT_IMAGE:-}" ]; then

IMAGE_NAME_TO_USE="${PROW_BUILT_IMAGE%:*}"

IMAGE_TAG_TO_USE="${PROW_BUILT_IMAGE##*:}"

echo "Configuring kustomization to use PROW image: ${PROW_BUILT_IMAGE}"

else

IMAGE_NAME_TO_USE="${BUILD_CONFIG}"

IMAGE_TAG_TO_USE="latest"

echo "Configuring kustomization to use in-cluster built image: ${IMAGE_NAME_TO_USE}:${IMAGE_TAG_TO_USE}"

fi

# make the AAO image (quay.io/app-sre/aws-account-operator:latest) resolve to the image we want

cat <<EOF >./deploy/kustomization.yaml

resources:

- operator.yaml

images:

- name: quay.io/app-sre/aws-account-operator:latest

newName: ${IMAGE_NAME_TO_USE}

newTag: ${IMAGE_TAG_TO_USE}

patchesJson6902:

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@test/integration/integration-test-bootstrap.sh` around lines 300 - 317, The kustomization currently writes the full pull spec into images.newName; instead, split IMAGE_TO_USE into IMAGE_NAME and IMAGE_TAG (parse on the last ':' only if the ':' occurs after the last '/' so you don't split registry:port), e.g. set IMAGE_NAME and IMAGE_TAG variables (fall back IMAGE_TAG to "latest" if none), then write both fields in the here-doc as "newName: ${IMAGE_NAME}" and "newTag: ${IMAGE_TAG}" (update the existing IMAGE_TO_USE logic and the kustomization heredoc that emits images.newName to use IMAGE_NAME and IMAGE_TAG).

BATMAN-JD · 2026-04-15T16:20:27Z

/retest

IMAGE_FORMAT is not set in PROW environment. Adding comprehensive debug output to find what env vars PROW actually sets for the pre-built operator image. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

BATMAN-JD · 2026-04-15T16:35:53Z

/test integration-test

openshift-ci · 2026-04-15T16:58:38Z

@BATMAN-JD: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/integration-test	`82d3f4f`	link	false	`/test integration-test`
ci/prow/coverage	`93a0d46`	link	true	`/test coverage`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 13, 2026

coderabbitai bot reviewed Apr 13, 2026

View reviewed changes

BATMAN-JD force-pushed the integration_test_go branch from 090cb7b to 084fb34 Compare April 15, 2026 14:39

BATMAN-JD added 2 commits April 15, 2026 10:05

coderabbitai bot reviewed Apr 15, 2026

View reviewed changes

Debug: Search for all image-related env vars in PROW

82d3f4f

IMAGE_FORMAT is not set in PROW environment. Adding comprehensive debug output to find what env vars PROW actually sets for the pre-built operator image. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

		- Cleanup automatic via `t.Cleanup()` - runs even on failure
		- Finalizer removal enabled by default for faster local testing

		- Finalizer removal is enabled by default for faster cleanup
		- Manual cleanup: `oc delete accountclaim <name> -n <namespace>`

Conversation

BATMAN-JD commented Apr 13, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes

Infrastructure (test/integration/helpers/)

Tests Migrated (test/integration/tests/)

Build System

Documentation

Testing Status

Benefits Over Bash

Migration Plan

What is being added?

Checklist before requesting review

Steps To Manually Test

Summary by CodeRabbit

Uh oh!

BATMAN-JD commented Apr 13, 2026

Uh oh!

coderabbitai bot commented Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

openshift-ci bot commented Apr 13, 2026

Uh oh!

openshift-ci bot commented Apr 13, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

BATMAN-JD commented Apr 13, 2026

Uh oh!

BATMAN-JD commented Apr 13, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

BATMAN-JD commented Apr 13, 2026

Uh oh!

BATMAN-JD commented Apr 15, 2026

Uh oh!

BATMAN-JD commented Apr 15, 2026

Uh oh!

BATMAN-JD commented Apr 15, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 15, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 15, 2026

Choose a reason for hiding this comment

Uh oh!

BATMAN-JD commented Apr 15, 2026

Uh oh!

BATMAN-JD commented Apr 15, 2026

Uh oh!

openshift-ci bot commented Apr 15, 2026

BATMAN-JD commented Apr 13, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Apr 13, 2026 •

edited

Loading