Skip to content

Migrate integration tests from bash to Go - Phase 1#969

Draft
BATMAN-JD wants to merge 8 commits intoopenshift:masterfrom
BATMAN-JD:integration_test_go
Draft

Migrate integration tests from bash to Go - Phase 1#969
BATMAN-JD wants to merge 8 commits intoopenshift:masterfrom
BATMAN-JD:integration_test_go

Conversation

@BATMAN-JD
Copy link
Copy Markdown
Contributor

@BATMAN-JD BATMAN-JD commented Apr 13, 2026

This commit introduces Go-based integration tests to replace the existing bash scripts, improving maintainability, type safety, and IDE integration.

Changes

Infrastructure (test/integration/helpers/)

  • client.go: Kubernetes client management

    • GetKubeClient() - Create controller-runtime client with AAO types
    • GetKubeConfig() - Load kubeconfig (in-cluster or local)
    • CreateNamespace() / DeleteNamespace() - Namespace operations
    • RemoveFinalizers() - Strip finalizers from any resource
  • accountclaim.go: AccountClaim-specific operations

    • CreateFakeAccountClaim() - Create FAKE AccountClaim for testing
    • WaitForAccountClaimReady() - Wait for Ready/Claimed status
    • GetAccountClaim() - Retrieve AccountClaim CR
    • DeleteAccountClaim() - Delete with optional finalizer removal
    • VerifySecretExists() - Check for secret existence

Tests Migrated (test/integration/tests/)

  • fake_accountclaim_test.go: Full migration of test_fake_accountclaim.sh
    • ~30 second runtime (no AWS interaction)
    • Tests FAKE AccountClaim workflow
    • Verifies finalizers, no accountLink, secret creation
    • Automatic cleanup via t.Cleanup()

Build System

  • int-testing.mk: Added Go test targets
    • make test-integration-go - Run all Go integration tests
    • make test-integration-go-fake - Run FAKE AccountClaim test only

Documentation

  • tests/README.md: Comprehensive test guide

    • How to run tests
    • Helper function documentation
    • Writing new tests guide
    • Bash vs Go comparison
  • MIGRATION_STATUS.md: Migration tracking

    • Completed: FAKE AccountClaim test (1/8)
    • TODO: 7 remaining bash tests
    • Benefits and patterns documented

Testing Status

Code compiles successfully:

  • go test -c ./test/integration/tests/
  • go build ./test/integration/helpers/...

Runtime testing blocked by separate issue (backplane QPS rate limiting in local dev - see upcoming PR).

Benefits Over Bash

  • Type-safe API interactions
  • Better error messages and handling
  • Automatic cleanup even on test failure
  • IDE integration (autocomplete, jump-to-definition)
  • Structured test phases (Setup/Validate/Cleanup)
  • Reusable helper libraries

Migration Plan

Phase 1: Foundation (COMPLETE)

  • ✅ Helper infrastructure
  • ✅ First test migrated (FAKE AccountClaim)
  • ✅ Patterns established

Phase 2: Core Tests (NEXT)

  • Migrate high-priority local profile tests
  • Build out Account and AWS helper functions

Phase 3: Advanced Tests

  • Migrate CI/PROW-only tests
  • Handle special cases (KMS, long-running cleanup)

What is being added?

Is this a fix for a bug? What's the bug? Is this a new feature? Please describe it. Is this just a small typo fix? That's fine too!

Checklist before requesting review

  • I have tested this locally
  • I have included unit tests
  • I have updated any corresponding documentation

Steps To Manually Test

Please provide us steps to reproduce the scenarios needed to test this. If an integration test is provided, let us know how to run it. If not, enumerate the steps to validate this does what it's supposed to.

  1. Start the operator
  2. Run the thing
  3. Observe X in the spec for the thing
  4. Clean up the thing

Ref OSD-0000

Summary by CodeRabbit

  • Tests

    • Added Go-based integration test framework and an initial fake AccountClaim end-to-end test
    • Added reusable test helpers for creating, waiting on, verifying, and cleaning up test resources
  • Documentation

    • Added migration status and test-suite README with run instructions, validation checklist, and migration plan
  • Chores

    • Added Make targets for running Go integration tests and adjusted namespace bootstrap/selection and CI/PROW-aware build behavior for test runs

This commit introduces Go-based integration tests to replace the existing
bash scripts, improving maintainability, type safety, and IDE integration.

## Changes

### Infrastructure (test/integration/helpers/)
- **client.go**: Kubernetes client management
  - GetKubeClient() - Create controller-runtime client with AAO types
  - GetKubeConfig() - Load kubeconfig (in-cluster or local)
  - CreateNamespace() / DeleteNamespace() - Namespace operations
  - RemoveFinalizers() - Strip finalizers from any resource

- **accountclaim.go**: AccountClaim-specific operations
  - CreateFakeAccountClaim() - Create FAKE AccountClaim for testing
  - WaitForAccountClaimReady() - Wait for Ready/Claimed status
  - GetAccountClaim() - Retrieve AccountClaim CR
  - DeleteAccountClaim() - Delete with optional finalizer removal
  - VerifySecretExists() - Check for secret existence

### Tests Migrated (test/integration/tests/)
- **fake_accountclaim_test.go**: Full migration of test_fake_accountclaim.sh
  - ~30 second runtime (no AWS interaction)
  - Tests FAKE AccountClaim workflow
  - Verifies finalizers, no accountLink, secret creation
  - Automatic cleanup via t.Cleanup()

### Build System
- **int-testing.mk**: Added Go test targets
  - make test-integration-go - Run all Go integration tests
  - make test-integration-go-fake - Run FAKE AccountClaim test only

### Documentation
- **tests/README.md**: Comprehensive test guide
  - How to run tests
  - Helper function documentation
  - Writing new tests guide
  - Bash vs Go comparison

- **MIGRATION_STATUS.md**: Migration tracking
  - Completed: FAKE AccountClaim test (1/8)
  - TODO: 7 remaining bash tests
  - Benefits and patterns documented

## Testing Status

Code compiles successfully:
- go test -c ./test/integration/tests/
- go build ./test/integration/helpers/...

Runtime testing blocked by separate issue (backplane QPS rate limiting
in local dev - see upcoming PR).

## Benefits Over Bash

- Type-safe API interactions
- Better error messages and handling
- Automatic cleanup even on test failure
- IDE integration (autocomplete, jump-to-definition)
- Structured test phases (Setup/Validate/Cleanup)
- Reusable helper libraries

## Migration Plan

Phase 1: Foundation (COMPLETE)
- ✅ Helper infrastructure
- ✅ First test migrated (FAKE AccountClaim)
- ✅ Patterns established

Phase 2: Core Tests (NEXT)
- Migrate high-priority local profile tests
- Build out Account and AWS helper functions

Phase 3: Advanced Tests
- Migrate CI/PROW-only tests
- Handle special cases (KMS, long-running cleanup)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@BATMAN-JD
Copy link
Copy Markdown
Contributor Author

/test integration-test

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 13, 2026

Warning

Rate limit exceeded

@BATMAN-JD has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 24 minutes and 17 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 24 minutes and 17 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 345d253e-dd3c-4cd0-9e2b-5e08720f680f

📥 Commits

Reviewing files that changed from the base of the PR and between 93a0d46 and 82d3f4f.

📒 Files selected for processing (1)
  • test/integration/integration-test-bootstrap.sh

Walkthrough

Adds Go-based integration test infrastructure and docs: new test helpers for kube client and AccountClaim, Go test Make targets, migration/status documentation and README, a fake AccountClaim Go integration test, and small Makefile/script adjustments for deployment and CI behavior.

Changes

Cohort / File(s) Summary
Migration & Test Docs
test/integration/MIGRATION_STATUS.md, test/integration/tests/README.md
New migration status and tests README describing migration plan, run instructions, test structure, helper guidelines, troubleshooting, and checklist.
Test Helpers — Kubernetes Client
test/integration/helpers/client.go
New helper functions to build a controller-runtime client.Client and scheme, retrieve kubeconfig (in-cluster or KUBECONFIG/~/.kube/config), create/delete namespaces with polling, and remove finalizers via merge patch.
Test Helpers — AccountClaim
test/integration/helpers/accountclaim.go
New AccountClaim utilities: create a FAKE awsv1alpha1.AccountClaim, poll for Ready/Error states and conditions, delete with optional finalizer removal and wait-for-deletion, fetch claim, and verify referenced Secret existence.
Integration Test — Example
test/integration/tests/fake_accountclaim_test.go
New TestFakeAccountClaim exercising FAKE AccountClaim end-to-end: setup namespace and claim, wait-for-ready, validate finalizer/absence of Account link/secret presence, and register cleanup via t.Cleanup.
Make targets for Go tests
test/integration/int-testing.mk
Added test-integration-go and test-integration-go-fake targets to run Go tests; updated prow predeploy to apply with explicit -n ${OPERATOR_NAMESPACE}.
Deployment script / Makefile tweaks
Makefile, test/integration/integration-test-bootstrap.sh
Makefile: silent namespace existence check and oc project ${OPERATOR_NAMESPACE} before applying CRs. Script: PROW detection to skip in-cluster build, use PROW_BUILT_IMAGE when set, and print PROW env vars; build/verify flow adjusted accordingly.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 9 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Test Structure And Quality ⚠️ Warning Test violates single responsibility principle by checking three unrelated behaviors in one subtest, lacks context deadline for bounded operations, and has inconsistent assertion messages. Split 'Validate' subtest into three separate subtests for finalizers, accountLink, and secret; add context deadline with WithTimeout; ensure all assertions include descriptive error messages.
✅ Passed checks (9 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: migrating integration tests from bash to Go in its first phase, which is the core objective of this PR.
Description check ✅ Passed The PR description is comprehensive and well-structured, covering infrastructure changes, migrated tests, build system updates, and documentation. However, it duplicates the template checklist at the end without completing it, and the 'Steps To Manually Test' section still contains placeholder examples rather than actual test instructions.
Docstring Coverage ✅ Passed Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.
Stable And Deterministic Test Names ✅ Passed Test names are static strings with no dynamic values. The PR uses standard Go testing.T framework with subtest names 'Setup' and 'Validate' that are deterministic and stable.
Microshift Test Compatibility ✅ Passed The PR adds standard Go integration tests using testing.T and t.Run(), not Ginkgo e2e tests. The MicroShift API compatibility check applies only to Ginkgo tests, making it inapplicable here.
Single Node Openshift (Sno) Test Compatibility ✅ Passed The PR adds standard Go integration tests using the testing package, not Ginkgo e2e tests. SNO compatibility check applies only to new Ginkgo e2e tests with patterns like Describe(), It(), Context(), or When().
Topology-Aware Scheduling Compatibility ✅ Passed PR modifies only test infrastructure files without introducing scheduling constraints or topology dependencies in deployment manifests.
Ote Binary Stdout Contract ✅ Passed PR contains no process-level stdout writes violating OTE Binary Stdout Contract. All logging uses testing package methods (t.Logf, t.Log, t.Errorf) properly captured by test framework.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed Custom check targets Ginkgo e2e tests, but PR introduces standard Go tests using the testing package with no IPv4 assumptions or external connectivity requirements.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 13, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci bot commented Apr 13, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci bot commented Apr 13, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: BATMAN-JD
Once this PR has been reviewed and has the lgtm label, please assign reedcort for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (1)
test/integration/helpers/client.go (1)

50-56: Only fall back to local kubeconfig when not running in cluster.

Falling back for every InClusterConfig error can hide real in-cluster configuration failures.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/integration/helpers/client.go` around lines 50 - 56, The code currently
falls back to local kubeconfig on any rest.InClusterConfig() error; change the
logic to only attempt in-cluster config when the process appears to be running
in-cluster (e.g. check for KUBERNETES_SERVICE_HOST or presence of service
account files) and otherwise load the kubeconfig; update the block around
rest.InClusterConfig() so you call rest.InClusterConfig() only when in-cluster
indicators are present, and otherwise directly use the KUBECONFIG env var (or
clientcmd.BuildConfigFromFlags) to avoid masking real in-cluster errors.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/integration/helpers/accountclaim.go`:
- Around line 132-134: Replace the overly specific error text in the
secret-fetching failure path: instead of returning fmt.Errorf("secret %s/%s does
not exist: %w", namespace, name, err) when err != nil, change the message to a
generic failure like "failed to get secret %s/%s: %w" (or similar) so
RBAC/API/connectivity errors aren't misreported; keep using fmt.Errorf and wrap
the original err and reference the same namespace and name variables to preserve
context.
- Around line 109-115: The deletion wait loop incorrectly treats
client.IgnoreNotFound(err) == nil as success; instead explicitly handle the
three cases: if err == nil the resource still exists so return (false, nil); if
apierrors.IsNotFound(err) (or equivalent) return (true, nil) to indicate
deletion; for any other non-nil err return (false, err) so the
wait.PollImmediate loop can stop and surface real API errors; update the
anonymous func used in wait.PollImmediate (the c.Get call and its error
handling) to implement this explicit branching.

In `@test/integration/helpers/client.go`:
- Around line 104-110: The polling logic in the wait.PollImmediate callback
incorrectly treats a successful c.Get(ctx, client.ObjectKey{Name: name}, ns) as
deletion and masks all API errors by returning (false, nil); update the callback
used by wait.PollImmediate so that when c.Get returns nil you return (false,
nil) to continue polling (namespace still exists), when c.Get returns a NotFound
error (use client.IgnoreNotFound(err) != nil or k8s errors.IsNotFound) return
(true, nil) to signal deletion, and for any other non-NotFound error return
(false, err) so the poll stops and surfaces the real API/RBAC/network error;
refer to c.Get, client.IgnoreNotFound, and wait.PollImmediate to find and fix
the code.

In `@test/integration/MIGRATION_STATUS.md`:
- Around line 185-186: The documentation incorrectly states that finalizer
removal is enabled by default; update the wording to reflect that finalizer
removal is opt-in via the removeFinalizers argument in the test helpers (i.e.,
callers must pass removeFinalizers to enable it). Locate the line referencing
"Finalizer removal enabled by default" in MIGRATION_STATUS.md and change it to
something like "Finalizer removal is opt-in via the removeFinalizers argument in
the test helpers" so it matches the helpers' behavior.

In `@test/integration/tests/fake_accountclaim_test.go`:
- Around line 48-71: The "Validate" subtest is running even when the "Setup"
subtest fails because you ignore the boolean return of t.Run; capture the result
of t.Run("Setup", func(t *testing.T){...}) into a variable (e.g. ok :=
t.Run(...)) and if ok is false, abort the parent test (call t.Fatalf or return)
before calling t.Run("Validate", ...) so downstream helpers like
helpers.CreateNamespace, helpers.CreateFakeAccountClaim and
helpers.WaitForAccountClaimReady are not relied on when setup fails.

In `@test/integration/tests/README.md`:
- Around line 173-174: Update the README to correct the finalizer-removal
statement: it is not enabled by default—callers must explicitly pass true to
DeleteAccountClaim(..., removeFinalizers bool) to remove finalizers. Locate
references to finalizer behavior and the DeleteAccountClaim function in the docs
(and any examples) and change the wording and example calls to show that
removeFinalizers must be set to true to enable finalizer removal; if there are
sample snippets that omit this boolean, update them to include removeFinalizers:
true or describe that parameter explicitly.

---

Nitpick comments:
In `@test/integration/helpers/client.go`:
- Around line 50-56: The code currently falls back to local kubeconfig on any
rest.InClusterConfig() error; change the logic to only attempt in-cluster config
when the process appears to be running in-cluster (e.g. check for
KUBERNETES_SERVICE_HOST or presence of service account files) and otherwise load
the kubeconfig; update the block around rest.InClusterConfig() so you call
rest.InClusterConfig() only when in-cluster indicators are present, and
otherwise directly use the KUBECONFIG env var (or
clientcmd.BuildConfigFromFlags) to avoid masking real in-cluster errors.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: af8c926b-8782-464e-9a08-16061fbcc34f

📥 Commits

Reviewing files that changed from the base of the PR and between 9eba2a4 and 72d27d4.

📒 Files selected for processing (6)
  • test/integration/MIGRATION_STATUS.md
  • test/integration/helpers/accountclaim.go
  • test/integration/helpers/client.go
  • test/integration/int-testing.mk
  • test/integration/tests/README.md
  • test/integration/tests/fake_accountclaim_test.go

Comment on lines +109 to +115
return wait.PollImmediate(2*time.Second, timeout, func() (bool, error) {
err := c.Get(ctx, client.ObjectKey{Name: name, Namespace: namespace}, claim)
if client.IgnoreNotFound(err) == nil {
return true, nil // Deleted
}
return false, nil // Still exists
})
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

find . -type f -name "accountclaim.go" | head -20

Repository: openshift/aws-account-operator

Length of output: 116


🏁 Script executed:

cat -n test/integration/helpers/accountclaim.go

Repository: openshift/aws-account-operator

Length of output: 5234


Fix error handling in deletion wait loop to propagate non-NotFound errors.

The current code at lines 109-115 has a critical logic error: client.IgnoreNotFound(err) == nil is true when err is either nil OR a NotFound error. This means:

  • When the resource exists (err = nil): incorrectly returns (true, nil) signaling deletion
  • When other API errors occur (connection failures, permission errors, etc.): silently returns (false, nil) and continues polling indefinitely instead of propagating the error

Replace with explicit error type checking:

Suggested fix
+import (
+    apierrors "k8s.io/apimachinery/pkg/api/errors"
+)
...
 	return wait.PollImmediate(2*time.Second, timeout, func() (bool, error) {
 		err := c.Get(ctx, client.ObjectKey{Name: name, Namespace: namespace}, claim)
-		if client.IgnoreNotFound(err) == nil {
-			return true, nil // Deleted
-		}
-		return false, nil // Still exists
+		if err != nil {
+			if apierrors.IsNotFound(err) {
+				return true, nil // Deleted
+			}
+			return false, fmt.Errorf("failed while waiting for AccountClaim deletion %s/%s: %w", namespace, name, err)
+		}
+		return false, nil // Still exists
 	})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/integration/helpers/accountclaim.go` around lines 109 - 115, The
deletion wait loop incorrectly treats client.IgnoreNotFound(err) == nil as
success; instead explicitly handle the three cases: if err == nil the resource
still exists so return (false, nil); if apierrors.IsNotFound(err) (or
equivalent) return (true, nil) to indicate deletion; for any other non-nil err
return (false, err) so the wait.PollImmediate loop can stop and surface real API
errors; update the anonymous func used in wait.PollImmediate (the c.Get call and
its error handling) to implement this explicit branching.

Comment on lines +132 to +134
if err != nil {
return fmt.Errorf("secret %s/%s does not exist: %w", namespace, name, err)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Secret error message is too specific for all failure modes.

Line 133 says “does not exist” even when the real issue is RBAC or API connectivity. Keep the message generic and include the wrapped error.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/integration/helpers/accountclaim.go` around lines 132 - 134, Replace the
overly specific error text in the secret-fetching failure path: instead of
returning fmt.Errorf("secret %s/%s does not exist: %w", namespace, name, err)
when err != nil, change the message to a generic failure like "failed to get
secret %s/%s: %w" (or similar) so RBAC/API/connectivity errors aren't
misreported; keep using fmt.Errorf and wrap the original err and reference the
same namespace and name variables to preserve context.

Comment on lines +104 to +110
return wait.PollImmediate(2*time.Second, timeout, func() (bool, error) {
err := c.Get(ctx, client.ObjectKey{Name: name}, ns)
if client.IgnoreNotFound(err) == nil {
return true, nil // Namespace is gone
}
return false, nil // Still exists, keep polling
})
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

git ls-files test/integration/helpers/client.go

Repository: openshift/aws-account-operator

Length of output: 108


🏁 Script executed:

cat -n test/integration/helpers/client.go | head -120

Repository: openshift/aws-account-operator

Length of output: 4149


Fix error handling in namespace deletion polling to prevent masking API errors.

The polling loop at lines 105-109 silently retries on any error (RBAC, network, etc.) by returning false, nil, which masks these issues and surfaces them only as timeouts instead of with diagnostic detail.

Also note the logic is inverted when c.Get() succeeds with no error: client.IgnoreNotFound(nil) returns nil, causing the condition to return true while the namespace still exists.

Suggested fix
+import (
+    apierrors "k8s.io/apimachinery/pkg/api/errors"
+)
...
 	return wait.PollImmediate(2*time.Second, timeout, func() (bool, error) {
 		err := c.Get(ctx, client.ObjectKey{Name: name}, ns)
-		if client.IgnoreNotFound(err) == nil {
-			return true, nil // Namespace is gone
-		}
-		return false, nil // Still exists, keep polling
+		if err != nil {
+			if apierrors.IsNotFound(err) {
+				return true, nil // Namespace is gone
+			}
+			return false, fmt.Errorf("failed while waiting for namespace deletion %s: %w", name, err)
+		}
+		return false, nil // Still exists, keep polling
 	})
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
return wait.PollImmediate(2*time.Second, timeout, func() (bool, error) {
err := c.Get(ctx, client.ObjectKey{Name: name}, ns)
if client.IgnoreNotFound(err) == nil {
return true, nil // Namespace is gone
}
return false, nil // Still exists, keep polling
})
return wait.PollImmediate(2*time.Second, timeout, func() (bool, error) {
err := c.Get(ctx, client.ObjectKey{Name: name}, ns)
if err != nil {
if apierrors.IsNotFound(err) {
return true, nil // Namespace is gone
}
return false, fmt.Errorf("failed while waiting for namespace deletion %s: %w", name, err)
}
return false, nil // Still exists, keep polling
})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/integration/helpers/client.go` around lines 104 - 110, The polling logic
in the wait.PollImmediate callback incorrectly treats a successful c.Get(ctx,
client.ObjectKey{Name: name}, ns) as deletion and masks all API errors by
returning (false, nil); update the callback used by wait.PollImmediate so that
when c.Get returns nil you return (false, nil) to continue polling (namespace
still exists), when c.Get returns a NotFound error (use
client.IgnoreNotFound(err) != nil or k8s errors.IsNotFound) return (true, nil)
to signal deletion, and for any other non-NotFound error return (false, err) so
the poll stops and surfaces the real API/RBAC/network error; refer to c.Get,
client.IgnoreNotFound, and wait.PollImmediate to find and fix the code.

Comment on lines +185 to +186
- Cleanup automatic via `t.Cleanup()` - runs even on failure
- Finalizer removal enabled by default for faster local testing
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Cleanup note overstates default behavior.

Line 186 says finalizer removal is enabled by default; in helpers it is opt-in via the removeFinalizers argument.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/integration/MIGRATION_STATUS.md` around lines 185 - 186, The
documentation incorrectly states that finalizer removal is enabled by default;
update the wording to reflect that finalizer removal is opt-in via the
removeFinalizers argument in the test helpers (i.e., callers must pass
removeFinalizers to enable it). Locate the line referencing "Finalizer removal
enabled by default" in MIGRATION_STATUS.md and change it to something like
"Finalizer removal is opt-in via the removeFinalizers argument in the test
helpers" so it matches the helpers' behavior.

Comment on lines +48 to +71
t.Run("Setup", func(t *testing.T) {
t.Logf("Creating namespace: %s", namespaceName)
err := helpers.CreateNamespace(ctx, kubeClient, namespaceName)
if err != nil {
t.Fatalf("Failed to create namespace: %v", err)
}

t.Logf("Creating FAKE AccountClaim: %s", claimName)
_, err = helpers.CreateFakeAccountClaim(ctx, kubeClient, claimName, namespaceName)
if err != nil {
t.Fatalf("Failed to create FAKE AccountClaim: %v", err)
}

t.Logf("Waiting for FAKE AccountClaim to become Ready (timeout: %v)", accountClaimReadyTimeout)
err = helpers.WaitForAccountClaimReady(ctx, kubeClient, claimName, namespaceName, accountClaimReadyTimeout)
if err != nil {
t.Fatalf("FAKE AccountClaim did not become ready: %v", err)
}

t.Log("✓ Setup complete")
})

// Test: Validate FAKE AccountClaim
t.Run("Validate", func(t *testing.T) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, locate and examine the test file
find . -name "fake_accountclaim_test.go" -type f

Repository: openshift/aws-account-operator

Length of output: 124


🏁 Script executed:

# Check the structure of the test file around the mentioned lines
# Also look for similar patterns in the codebase
rg -n "t.Run" test/integration/tests/fake_accountclaim_test.go | head -20

Repository: openshift/aws-account-operator

Length of output: 156


🏁 Script executed:

# Get more context - examine the full test structure
cat -n test/integration/tests/fake_accountclaim_test.go | head -100

Repository: openshift/aws-account-operator

Length of output: 4095


🌐 Web query:

Go testing t.Run subtest t.Fatalf behavior return value

💡 Result:

In Go's testing package, when using t.Run to create a subtest, calling t.Fatalf (or t.Fatal, t.FailNow) inside the subtest function terminates that specific subtest by calling runtime.Goexit after marking it as failed. This runs any deferred functions (like t.Cleanup) in the subtest goroutine but does not execute code after the t.Fatalf call, including any explicit return statement. The parent t.Run call blocks until the subtest completes (via return from the function or Goexit). It then checks the subtest's failed state via !t2.failed and returns false if the subtest failed, true otherwise. This return value indicates success to the caller but does not stop the parent test—subsequent subtests and parent code continue executing. The parent test is marked failed only if using t.Error/t.Errorf (non-fatal), not by t.Fatal in a child subtest. Example behavior: func TestParent(t *testing.T) { if !t.Run("sub", func(t *testing.T) { t.Fatalf("fail") // Subtest ends here, t.Run returns false }) { t.Log("subtest failed, but parent continues") // This runs } t.Log("parent ends") // This runs } Output shows subtest FAIL, parent PASS (unless marked failed otherwise). This design allows table-driven tests to report all failures without one fatal subtest halting siblings, unlike pre-Go 1.7 flat tests where t.Fatalf stopped everything.

Citations:


🌐 Web query:

Go testing nested subtests failure propagation idiomatic pattern

💡 Result:

In Go's testing package, the idiomatic pattern for handling failures in nested subtests (created with t.Run) leverages the distinct behaviors of t.Fail/t.Error (continue execution) and t.FailNow/t.Fatal (stop current subtest). Key failure propagation rules: - t.Error or t.Fail in a subtest marks it (and all ancestors) as failed but continues running the subtest body, subsequent sibling subtests, and parent tests. - t.Fatal or t.FailNow in a subtest immediately stops that subtest (skipping remaining code and unstarted descendant subtests) but allows siblings and the parent to continue. The parent will be marked failed if any subtest fails. - This enables table-driven tests where one failing case doesn't skip others, unlike pre-1.7 t.Fatalf which halted the entire test. Example idiomatic table-driven subtest: func TestTime(t *testing.T) { testCases := []struct { gmt, loc string }{ {"12:31", "Europe/Zurich"}, {"12:31", "America/New_York"}, } for _, tc := range testCases { t.Run(fmt.Sprintf("%s in %s", tc.gmt, tc.loc), func(t *testing.T) { loc, err := time.LoadLocation(tc.loc) if err != nil { t.Fatal("could not load location") // Stops this subtest only } // ... further checks with t.Error if needed to continue }) } } If a subtest Fatals (e.g., bad location), output shows: --- FAIL: TestTime/12:31_in_Europe/Zuri --- FAIL: TestTime/12:31_in_America/New_York Both subtests report, parent fails, but others run. For parallelism: Call t.Parallel in subtests after t.Run; parent waits for all. FailNow skips unstarted parallel descendants in that branch. Best practices: - Use t.Run for table-driven tests to isolate failures. - Prefer t.Fatal in subtests for quick stops on setup errors; t.Error for assertions to run all cases. - Avoid closing over parent t in subtests (leads to FailNow precondition violations). - Run specific subtests: go test -run 'TestTime/12:31_in'. This pattern, introduced in Go 1.7, provides fine-grained control, setup/teardown sharing, and clear hierarchical reporting.

Citations:


Gate Validate on successful Setup.

Validate runs even when Setup fails because the t.Run("Setup", ...) return value is ignored. In Go's testing package, a failed subtest (via t.Fatalf) does not stop the parent test; subsequent code continues executing. This causes cascading failures when dependent tests run without required resources.

Suggested fix
-	t.Run("Setup", func(t *testing.T) {
+	if ok := t.Run("Setup", func(t *testing.T) {
 		t.Logf("Creating namespace: %s", namespaceName)
 		err := helpers.CreateNamespace(ctx, kubeClient, namespaceName)
 		if err != nil {
 			t.Fatalf("Failed to create namespace: %v", err)
 		}
 ...
 		t.Log("✓ Setup complete")
-	})
+	}); !ok {
+		t.Fatalf("Setup failed; skipping Validate")
+	}
 
 	// Test: Validate FAKE AccountClaim
 	t.Run("Validate", func(t *testing.T) {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
t.Run("Setup", func(t *testing.T) {
t.Logf("Creating namespace: %s", namespaceName)
err := helpers.CreateNamespace(ctx, kubeClient, namespaceName)
if err != nil {
t.Fatalf("Failed to create namespace: %v", err)
}
t.Logf("Creating FAKE AccountClaim: %s", claimName)
_, err = helpers.CreateFakeAccountClaim(ctx, kubeClient, claimName, namespaceName)
if err != nil {
t.Fatalf("Failed to create FAKE AccountClaim: %v", err)
}
t.Logf("Waiting for FAKE AccountClaim to become Ready (timeout: %v)", accountClaimReadyTimeout)
err = helpers.WaitForAccountClaimReady(ctx, kubeClient, claimName, namespaceName, accountClaimReadyTimeout)
if err != nil {
t.Fatalf("FAKE AccountClaim did not become ready: %v", err)
}
t.Log("✓ Setup complete")
})
// Test: Validate FAKE AccountClaim
t.Run("Validate", func(t *testing.T) {
if ok := t.Run("Setup", func(t *testing.T) {
t.Logf("Creating namespace: %s", namespaceName)
err := helpers.CreateNamespace(ctx, kubeClient, namespaceName)
if err != nil {
t.Fatalf("Failed to create namespace: %v", err)
}
t.Logf("Creating FAKE AccountClaim: %s", claimName)
_, err = helpers.CreateFakeAccountClaim(ctx, kubeClient, claimName, namespaceName)
if err != nil {
t.Fatalf("Failed to create FAKE AccountClaim: %v", err)
}
t.Logf("Waiting for FAKE AccountClaim to become Ready (timeout: %v)", accountClaimReadyTimeout)
err = helpers.WaitForAccountClaimReady(ctx, kubeClient, claimName, namespaceName, accountClaimReadyTimeout)
if err != nil {
t.Fatalf("FAKE AccountClaim did not become ready: %v", err)
}
t.Log("✓ Setup complete")
}); !ok {
t.Fatalf("Setup failed; skipping Validate")
}
// Test: Validate FAKE AccountClaim
t.Run("Validate", func(t *testing.T) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/integration/tests/fake_accountclaim_test.go` around lines 48 - 71, The
"Validate" subtest is running even when the "Setup" subtest fails because you
ignore the boolean return of t.Run; capture the result of t.Run("Setup", func(t
*testing.T){...}) into a variable (e.g. ok := t.Run(...)) and if ok is false,
abort the parent test (call t.Fatalf or return) before calling t.Run("Validate",
...) so downstream helpers like helpers.CreateNamespace,
helpers.CreateFakeAccountClaim and helpers.WaitForAccountClaimReady are not
relied on when setup fails.

Comment on lines +173 to +174
- Finalizer removal is enabled by default for faster cleanup
- Manual cleanup: `oc delete accountclaim <name> -n <namespace>`
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Finalizer-removal behavior is documented inaccurately.

Line 173 says finalizer removal is enabled by default, but DeleteAccountClaim(..., removeFinalizers bool) requires explicit true at call sites.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/integration/tests/README.md` around lines 173 - 174, Update the README
to correct the finalizer-removal statement: it is not enabled by default—callers
must explicitly pass true to DeleteAccountClaim(..., removeFinalizers bool) to
remove finalizers. Locate references to finalizer behavior and the
DeleteAccountClaim function in the docs (and any examples) and change the
wording and example calls to show that removeFinalizers must be set to true to
enable finalizer removal; if there are sample snippets that omit this boolean,
update them to include removeFinalizers: true or describe that parameter
explicitly.

Fixes intermittent 'namespace not found' errors in PROW where
service account creation would fail with 'namespaces ci-op-XXX not found'.

Root cause: When namespace didn't exist, 'oc create' would run but
'oc project' (context switch) would not, leaving subsequent 'oc apply'
commands targeting the wrong namespace.

Now ensures we always switch to aws-account-operator namespace
whether it existed or was just created.
@BATMAN-JD
Copy link
Copy Markdown
Contributor Author

/test integration-test

Fixes ServiceUnavailable error when switching to newly created namespace.

Root cause: 'oc project' was running immediately after 'oc create namespace',
before OpenShift Project API had synced. This caused intermittent
'ServiceUnavailable' errors.

Solution: Reorder operations to:
1. Create namespace (if needed)
2. Wait for namespace to be Active (up to 20s)
3. Then switch context with 'oc project'

This ensures namespace is fully ready before attempting context switch.
@BATMAN-JD
Copy link
Copy Markdown
Contributor Author

/test integration-test

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
Makefile (1)

106-117: ⚠️ Potential issue | 🟠 Major

Fail fast if namespace never becomes Active before switching project.

The loop can finish without success, but execution still continues to Line 117. That can cause flaky downstream failures instead of a clear timeout error.

🔧 Proposed fix
-	# Wait for namespace to be fully ready before switching to it
-	`@echo` "Waiting for namespace to be ready..."
-	`@for` i in {1..10}; do \
-		if oc get namespace ${OPERATOR_NAMESPACE} -o jsonpath='{.status.phase}' 2>/dev/null | grep -q "Active"; then \
-			echo "Namespace is ready"; \
-			break; \
-		fi; \
-		echo "Waiting for namespace (attempt $$i/10)..."; \
-		sleep 2; \
-	done
-	# Switch to the namespace now that it's ready
-	`@oc` project ${OPERATOR_NAMESPACE}
+	# Wait for namespace to be fully ready before switching to it
+	`@echo` "Waiting for namespace to be ready..."
+	`@oc` wait --for=jsonpath='{.status.phase}'=Active namespace/${OPERATOR_NAMESPACE} --timeout=20s
+	# Switch to the namespace now that it's ready
+	`@oc` project ${OPERATOR_NAMESPACE} >/dev/null
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 106 - 117, The namespace readiness loop in the
Makefile may exit after 10 attempts without the namespace becoming Active but
still proceeds to run oc project ${OPERATOR_NAMESPACE}; modify the Makefile so
that after the for-loop finishes you check whether the namespace became Active
(use the same oc get namespace ${OPERATOR_NAMESPACE} -o
jsonpath='{.status.phase}' check or track a success flag), and if it is not
Active print a clear error and exit non-zero (e.g., echo "Namespace
${OPERATOR_NAMESPACE} not ready after timeout" && exit 1) before calling oc
project ${OPERATOR_NAMESPACE}; ensure you update the same block around the
existing loop and oc project call so failures fail fast.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@Makefile`:
- Around line 106-117: The namespace readiness loop in the Makefile may exit
after 10 attempts without the namespace becoming Active but still proceeds to
run oc project ${OPERATOR_NAMESPACE}; modify the Makefile so that after the
for-loop finishes you check whether the namespace became Active (use the same oc
get namespace ${OPERATOR_NAMESPACE} -o jsonpath='{.status.phase}' check or track
a success flag), and if it is not Active print a clear error and exit non-zero
(e.g., echo "Namespace ${OPERATOR_NAMESPACE} not ready after timeout" && exit 1)
before calling oc project ${OPERATOR_NAMESPACE}; ensure you update the same
block around the existing loop and oc project call so failures fail fast.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 7f6db77d-099f-462b-a2a6-741738a37116

📥 Commits

Reviewing files that changed from the base of the PR and between 72d27d4 and a9d7c75.

📒 Files selected for processing (1)
  • Makefile

@BATMAN-JD
Copy link
Copy Markdown
Contributor Author

/test integration-test

Eliminates dependency on OpenShift Project API and oc project context.

Root cause: 'oc project' relies on Project API which is created
asynchronously after namespace creation. Even though namespace is
Active, Project API may still return ServiceUnavailable.

Solution: Use explicit '-n ${OPERATOR_NAMESPACE}' on all namespaced
commands instead of relying on context switching. This is more
reliable and works immediately after namespace creation.

Changes:
- int-testing.mk: Added -n flag to deploy/*.yaml apply
- Makefile: Removed 'oc project' call (no longer needed)
@BATMAN-JD BATMAN-JD force-pushed the integration_test_go branch from 090cb7b to 084fb34 Compare April 15, 2026 14:39
@BATMAN-JD
Copy link
Copy Markdown
Contributor Author

/test integration-test

Introduces a new test approach that uses PROW's pre-built operator image
instead of rebuilding in-cluster with OpenShift Build API.

Problem:
- Bash bootstrap rebuilds operator image in test cluster
- Requires Build API to be available (intermittent PROW failures)
- Wastes time (~2-3 min) rebuilding what PROW already built

Solution:
- New script: test/integration/run-go-tests.sh
  * Uses PROW's pre-built image directly
  * Deploys operator
  * Runs Go integration tests
  * No Build API dependency

- New make target: test-integration-go-prow
  * Can be tested alongside existing bash tests
  * Zero risk to existing workflows

Benefits:
- More reliable (no Build API dependency)
- Faster (skip rebuild step)
- Simpler (~100 lines vs ~600 lines bash)
- Better for Go test migration

Files:
- test/integration/run-go-tests.sh - New simplified runner
- test/integration/GO_TESTS_README.md - Documentation
- Makefile: Added test-integration-go-prow target
- int-testing.mk: Added prow-ci-go-entrypoint target

Testing:
- Can be validated in PR without impacting existing tests
- Falls back to bash tests if needed
- Low-risk additive change
Stop wasting time rebuilding what PROW already built.

Problem:
- PROW builds operator image from PR commit ✓
- Bootstrap script rebuilds SAME code in-cluster ✗
  * Wastes 2-3 minutes
  * Requires OpenShift Build API (intermittently unavailable)
  * Produces identical image to what PROW built

Solution:
- Detect PROW environment (IMAGE_FORMAT variable)
- If PROW: Use pre-built image, skip rebuild
- If local/staging: Build in-cluster as before

Changes to integration-test-bootstrap.sh:
- buildOperatorImage(): Check for IMAGE_FORMAT, skip build if set
- verifyBuildSuccess(): Skip verification when using PROW image
- configureKustomization(): Use PROW image or built image as appropriate

Benefits:
- ✓ Fixes intermittent Build API failures in PROW
- ✓ Saves 2-3 minutes per test run
- ✓ Zero risk: Falls back to rebuild if not in PROW
- ✓ Maintains local dev workflow unchanged

Testing:
- Local dev: Still builds in-cluster (IMAGE_FORMAT not set)
- PROW: Uses pre-built image (IMAGE_FORMAT set by ci-operator)

This is the 'always build fresh' logic we questioned - it made sense
for manual testing but was wasteful in PROW where the image is already
built from the exact commit being tested.
@BATMAN-JD
Copy link
Copy Markdown
Contributor Author

/test integration-test

Adding debug output to understand why IMAGE_FORMAT detection is not
working in PROW environment. This will help us see what variables
are actually set during PROW test runs.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@BATMAN-JD
Copy link
Copy Markdown
Contributor Author

/test integration-test

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/integration/integration-test-bootstrap.sh`:
- Around line 253-263: The fast-path that skips rebuilding when IMAGE_FORMAT is
set should only apply for the Prow profile; update the conditional inside
buildOperatorImage() (also used by profileStage()) to require PROFILE=prow (or
an explicit PROW flag) in addition to IMAGE_FORMAT being non-empty before
returning early, so only when PROFILE equals "prow" do you set PROW_BUILT_IMAGE
and skip the in-cluster rebuild and verification.
- Around line 300-317: The kustomization currently writes the full pull spec
into images.newName; instead, split IMAGE_TO_USE into IMAGE_NAME and IMAGE_TAG
(parse on the last ':' only if the ':' occurs after the last '/' so you don't
split registry:port), e.g. set IMAGE_NAME and IMAGE_TAG variables (fall back
IMAGE_TAG to "latest" if none), then write both fields in the here-doc as
"newName: ${IMAGE_NAME}" and "newTag: ${IMAGE_TAG}" (update the existing
IMAGE_TO_USE logic and the kustomization heredoc that emits images.newName to
use IMAGE_NAME and IMAGE_TAG).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: c246c45a-0903-4ae3-9df6-e406b4a15c66

📥 Commits

Reviewing files that changed from the base of the PR and between 084fb34 and 93a0d46.

📒 Files selected for processing (1)
  • test/integration/integration-test-bootstrap.sh

Comment on lines +253 to +263
if [ -n "${IMAGE_FORMAT:-}" ]; then
echo "==========================================================================="
echo "PROW environment detected - using pre-built operator image"
echo "Skipping in-cluster rebuild (IMAGE_FORMAT is set)"
echo "==========================================================================="
# PROW's IMAGE_FORMAT is like: "registry.ci.../pipeline:${component}"
# We want the "src" component which contains our operator binary
export PROW_BUILT_IMAGE="${IMAGE_FORMAT//\$\{component\}/src}"
echo "Using PROW image: ${PROW_BUILT_IMAGE}"
return 0
fi
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Scope the PROW fast-path to the PROW profile.

buildOperatorImage() is also used by profileStage(). With the current IMAGE_FORMAT check, any shell or job that happens to export that variable will skip the stage build, skip later build verification, and deploy the CI image instead of the checked-out code. Gate this branch on PROFILE=prow (or an explicit flag) so only Prow takes the shortcut.

Suggested fix
-    if [ -n "${IMAGE_FORMAT:-}" ]; then
+    if [ "${PROFILE}" = "prow" ] && [ -n "${IMAGE_FORMAT:-}" ]; then
         echo "==========================================================================="
         echo "PROW environment detected - using pre-built operator image"
         echo "Skipping in-cluster rebuild (IMAGE_FORMAT is set)"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if [ -n "${IMAGE_FORMAT:-}" ]; then
echo "==========================================================================="
echo "PROW environment detected - using pre-built operator image"
echo "Skipping in-cluster rebuild (IMAGE_FORMAT is set)"
echo "==========================================================================="
# PROW's IMAGE_FORMAT is like: "registry.ci.../pipeline:${component}"
# We want the "src" component which contains our operator binary
export PROW_BUILT_IMAGE="${IMAGE_FORMAT//\$\{component\}/src}"
echo "Using PROW image: ${PROW_BUILT_IMAGE}"
return 0
fi
if [ "${PROFILE}" = "prow" ] && [ -n "${IMAGE_FORMAT:-}" ]; then
echo "==========================================================================="
echo "PROW environment detected - using pre-built operator image"
echo "Skipping in-cluster rebuild (IMAGE_FORMAT is set)"
echo "==========================================================================="
# PROW's IMAGE_FORMAT is like: "registry.ci.../pipeline:${component}"
# We want the "src" component which contains our operator binary
export PROW_BUILT_IMAGE="${IMAGE_FORMAT//\$\{component\}/src}"
echo "Using PROW image: ${PROW_BUILT_IMAGE}"
return 0
fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/integration/integration-test-bootstrap.sh` around lines 253 - 263, The
fast-path that skips rebuilding when IMAGE_FORMAT is set should only apply for
the Prow profile; update the conditional inside buildOperatorImage() (also used
by profileStage()) to require PROFILE=prow (or an explicit PROW flag) in
addition to IMAGE_FORMAT being non-empty before returning early, so only when
PROFILE equals "prow" do you set PROW_BUILT_IMAGE and skip the in-cluster
rebuild and verification.

Comment on lines +300 to 317
# Determine which image to use: PROW's pre-built image or the one we just built
local IMAGE_TO_USE
if [ -n "${PROW_BUILT_IMAGE:-}" ]; then
IMAGE_TO_USE="${PROW_BUILT_IMAGE}"
echo "Configuring kustomization to use PROW image: ${IMAGE_TO_USE}"
else
IMAGE_TO_USE="${BUILD_CONFIG}"
echo "Configuring kustomization to use in-cluster built image: ${IMAGE_TO_USE}"
fi

# make the AAO image (quay.io/app-sre/aws-account-operator:latest) resolve to the image we want
cat <<EOF >./deploy/kustomization.yaml
resources:
- operator.yaml
images:
- name: quay.io/app-sre/aws-account-operator:latest
newName: $BUILD_CONFIG
newName: ${IMAGE_TO_USE}
patchesJson6902:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In Kustomize, for an images:entry replacingquay.io/app-sre/aws-account-operator:latest, does newNameaccept a fully tagged pull spec likeregistry.ci.openshift.org/...:src, or should the tag be provided separately via newTag?

💡 Result:

In Kustomize, for an images: entry replacing quay.io/app-sre/aws-account-operator:latest, the newName field should be set to the registry and image path without the tag (e.g., registry.ci.openshift.org/...), and the tag (e.g., src) should be provided separately via newTag. newName does not accept a fully tagged pull spec like registry.ci.openshift.org/...:src. Kustomize constructs the final image by combining newName + : + newTag (or @ + digest if specified). Example configuration: images: - name: quay.io/app-sre/aws-account-operator newName: registry.ci.openshift.org/... newTag: src This transforms quay.io/app-sre/aws-account-operator:latest (matching on name) to registry.ci.openshift.org/...:src. The name field itself is explicitly "a tag-less image name", and newName follows the same convention as the replacement for that name part.

Citations:


Split the Prow pull spec into newName and newTag.

Kustomize's images[].newName field expects only the image registry and repository path, not a full pull spec with a tag. The tag must be provided separately via newTag. Kustomize constructs the final image reference by combining newName + : + newTag. Passing a fully tagged pull spec like registry.ci.openshift.org/...:src to newName produces an invalid image reference.

Parse the Prow image into its name and tag components and populate both newName and newTag in the kustomization.

Suggested fix
-    local IMAGE_TO_USE
+    local IMAGE_NAME_TO_USE IMAGE_TAG_TO_USE
     if [ -n "${PROW_BUILT_IMAGE:-}" ]; then
-        IMAGE_TO_USE="${PROW_BUILT_IMAGE}"
-        echo "Configuring kustomization to use PROW image: ${IMAGE_TO_USE}"
+        IMAGE_NAME_TO_USE="${PROW_BUILT_IMAGE%:*}"
+        IMAGE_TAG_TO_USE="${PROW_BUILT_IMAGE##*:}"
+        echo "Configuring kustomization to use PROW image: ${PROW_BUILT_IMAGE}"
     else
-        IMAGE_TO_USE="${BUILD_CONFIG}"
-        echo "Configuring kustomization to use in-cluster built image: ${IMAGE_TO_USE}"
+        IMAGE_NAME_TO_USE="${BUILD_CONFIG}"
+        IMAGE_TAG_TO_USE="latest"
+        echo "Configuring kustomization to use in-cluster built image: ${IMAGE_NAME_TO_USE}:${IMAGE_TAG_TO_USE}"
     fi
@@
-- name: quay.io/app-sre/aws-account-operator:latest
-  newName: ${IMAGE_TO_USE}
+- name: quay.io/app-sre/aws-account-operator:latest
+  newName: ${IMAGE_NAME_TO_USE}
+  newTag: ${IMAGE_TAG_TO_USE}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Determine which image to use: PROW's pre-built image or the one we just built
local IMAGE_TO_USE
if [ -n "${PROW_BUILT_IMAGE:-}" ]; then
IMAGE_TO_USE="${PROW_BUILT_IMAGE}"
echo "Configuring kustomization to use PROW image: ${IMAGE_TO_USE}"
else
IMAGE_TO_USE="${BUILD_CONFIG}"
echo "Configuring kustomization to use in-cluster built image: ${IMAGE_TO_USE}"
fi
# make the AAO image (quay.io/app-sre/aws-account-operator:latest) resolve to the image we want
cat <<EOF >./deploy/kustomization.yaml
resources:
- operator.yaml
images:
- name: quay.io/app-sre/aws-account-operator:latest
newName: $BUILD_CONFIG
newName: ${IMAGE_TO_USE}
patchesJson6902:
# Determine which image to use: PROW's pre-built image or the one we just built
local IMAGE_NAME_TO_USE IMAGE_TAG_TO_USE
if [ -n "${PROW_BUILT_IMAGE:-}" ]; then
IMAGE_NAME_TO_USE="${PROW_BUILT_IMAGE%:*}"
IMAGE_TAG_TO_USE="${PROW_BUILT_IMAGE##*:}"
echo "Configuring kustomization to use PROW image: ${PROW_BUILT_IMAGE}"
else
IMAGE_NAME_TO_USE="${BUILD_CONFIG}"
IMAGE_TAG_TO_USE="latest"
echo "Configuring kustomization to use in-cluster built image: ${IMAGE_NAME_TO_USE}:${IMAGE_TAG_TO_USE}"
fi
# make the AAO image (quay.io/app-sre/aws-account-operator:latest) resolve to the image we want
cat <<EOF >./deploy/kustomization.yaml
resources:
- operator.yaml
images:
- name: quay.io/app-sre/aws-account-operator:latest
newName: ${IMAGE_NAME_TO_USE}
newTag: ${IMAGE_TAG_TO_USE}
patchesJson6902:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/integration/integration-test-bootstrap.sh` around lines 300 - 317, The
kustomization currently writes the full pull spec into images.newName; instead,
split IMAGE_TO_USE into IMAGE_NAME and IMAGE_TAG (parse on the last ':' only if
the ':' occurs after the last '/' so you don't split registry:port), e.g. set
IMAGE_NAME and IMAGE_TAG variables (fall back IMAGE_TAG to "latest" if none),
then write both fields in the here-doc as "newName: ${IMAGE_NAME}" and "newTag:
${IMAGE_TAG}" (update the existing IMAGE_TO_USE logic and the kustomization
heredoc that emits images.newName to use IMAGE_NAME and IMAGE_TAG).

@BATMAN-JD
Copy link
Copy Markdown
Contributor Author

/retest

IMAGE_FORMAT is not set in PROW environment. Adding comprehensive
debug output to find what env vars PROW actually sets for the
pre-built operator image.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@BATMAN-JD
Copy link
Copy Markdown
Contributor Author

/test integration-test

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci bot commented Apr 15, 2026

@BATMAN-JD: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/integration-test 82d3f4f link false /test integration-test
ci/prow/coverage 93a0d46 link true /test coverage

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant