ci-operator: Implement 3-level naming hierarchy changes to CSI mechanism by psalajova · Pull Request #4900 · openshift/ci-tools

psalajova · 2026-01-14T17:04:10Z

This PR implements support for the 3-level GSM secret naming structure (collection__group__field) in ci-operator, as per the design document.

Changes

3-level hierarchy

Secrets now use collection__group__field naming instead of collection__name:
Before:

credentials:
  - name: aws-creds
    collection: test-platform-infra
    mount_path: /tmp/aws

After:

credentials:
  - collection: test-platform-infra
    group: cluster-init
    field: aws-creds
    mount_path: /tmp/aws

Supported credential resolution modes

There are three ways one can reference credentials:

Bundle reference (via gsm-config.yaml):

credentials:
  - bundle: aws-bundle
    mount_path: /tmp/aws

Auto-discovery (lists all fields for collection+group):

credentials:
  - collection: test-platform-infra
    group: cluster-init
    mount_path: /tmp/secrets

Explicit field:

credentials:
  - collection: test-platform-infra
    group: cluster-init
    field: aws-creds
    mount_path: /tmp/aws

Implementation details

Wired gsmConfig through main.go → defaults → multi_stage chain
Credentials are resolved once during createSPCs() before SecretProviderClass creation
Added validation to prevent file collisions when different groups share the same mount path
Censoring SPCs now track credentials by full secret name (collection__group__field)
Updated all tests to use 3-level structure
A separate PR needs to be done which will make sure ci-operator has gsm-config.yaml available at all times, as this is critical for bundle resolution.

Jira: https://issues.redhat.com/browse/DPTP-4656

openshift-ci-robot · 2026-01-14T17:04:12Z

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

coderabbitai · 2026-01-14T17:04:19Z

Walkthrough

This PR integrates Google Secret Manager (GSM) into the CI tools infrastructure by relocating GSM types from secretbootstrap to the api package, extending credential references with collection/group/field hierarchies and bundle support, and threading GSM configuration through the build system to enable credential resolution with bundle expansion and auto-discovery capabilities.

Changes

Cohort / File(s)	Summary
CI Operator Command Enhancements `cmd/ci-operator/main.go`	Added GSM config support by introducing command-line flags (--gsm-config, --gsm-credentials-file), loading GSM config from file in Complete() when CSI driver is enabled, and passing GSM configuration to the build graph generation in Run().
CI Secret Bootstrap Command Migration `cmd/ci-secret-bootstrap/main.go`	Migrated all GSM-related types from secretbootstrap to api package, updating type signatures and function calls for GSMConfig, GSMBundle, TargetSpec, FieldEntry, RegistryAuthData throughout the codebase.
CI Secret Bootstrap Tests `cmd/ci-secret-bootstrap/main_test.go`	Updated test fixtures and mocks to use api package types instead of secretbootstrap types; added vault API import aliasing to disambiguate from api package.
API Package GSM Types `pkg/api/gsm.go`, `pkg/api/gsm_test.go`	Relocated GSM functionality from secretbootstrap to api package; renamed Bundle to GSMBundle; updated all type references and test data accordingly.
Credential Reference Enhancements `pkg/api/types.go`	Added Bundle, Group, Field fields to CredentialReference for hierarchical credential resolution; made Collection field omitempty; added three new predicate methods (IsAutoDiscovery, IsExplicitField, IsBundleReference).
Generated Deep Copy Methods `pkg/api/zz_generated.deepcopy.go`	Added DeepCopyInto and DeepCopy methods for new/updated types (GSMBundle, GSMConfig, GSMSecretRef, DockerConfigSpec, FieldEntry, RegistryAuthData, TargetSpec) to support proper object duplication.
Defaults Package Integration `pkg/defaults/defaults.go`, `pkg/defaults/defaults_test.go`	Extended FromConfig, fromConfig, and stepForTest signatures to accept and thread GSM configuration; added GSM client initialization when CSI driver is enabled; updated test call sites with new parameters.
Multi-Stage Step Utilities `pkg/steps/multi_stage/csi_utils.go`, `pkg/steps/multi_stage/csi_utils_test.go`	Updated credential-to-secret-name resolution to use GSM utilities; renamed replaceForbiddenSymbolsInCredentialName to restoreForbiddenSymbolsInSecretName; changed SPC name generation to use credential.Field instead of credential.Name; updated test fixtures for new CredentialReference structure.
Multi-Stage Step Core `pkg/steps/multi_stage/multi_stage.go`, `pkg/steps/multi_stage/multi_stage_test.go`, `pkg/steps/multi_stage/gen_test.go`, `pkg/steps/multi_stage/run_test.go`	Added GSMConfiguration struct to encapsulate GSM client and config; extended MultiStageTestStep and newMultiStageTestStep signatures with gsmConfig parameter; updated all test call sites with new parameter.
Credential Resolution Logic `pkg/steps/multi_stage/gsm_bundle_resolver.go`	New file introducing GSM-based credential resolution with bundle expansion, auto-discovery via field listing, and validation of mount path collisions; exported ResolveCredentialReferences and ValidateNoGroupCollisionsOnMountPath functions.
Multi-Stage Step Initialization `pkg/steps/multi_stage/init.go`, `pkg/steps/multi_stage/init_test.go`	Added pre-resolution of credentials for all steps before SPC creation; updated deduplication to use full GSM secret names instead of raw credential names; extended test setup to configure GSM with ProjectConfig.
Web Registry Reference Documentation `pkg/webreg/zz_generated.ci_operator_reference.go`	Updated embedded YAML documentation to reflect bundle-based credential schema with Bundle, Collection, Group, Field fields; adjusted comments to describe new hierarchical credential resolution approach.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~35 minutes

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Important

Action Needed: IP Allowlist Update

If your organization protects your Git platform with IP whitelisting, please add the new CodeRabbit IP address to your allowlist:

✨ 136.113.208.247/32 (new)
34.170.211.100/32
35.222.179.152/32

Failure to add the new IP will result in interrupted reviews.

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

openshift-ci · 2026-01-14T17:14:20Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: psalajova

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [psalajova]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

pkg/api/types.go

Prucek · 2026-01-15T09:51:29Z

pkg/steps/multi_stage/multi_stage.go

+	gsmClient                   *secretmanager.Client
+	gsmCredentialsFile          string
+	gsmConfig                   *api.GSMConfig
+	gsmProjectConfig            gsm.Config


Is it necessary to send all these params to the multistage step? Couldn't it just use the gsmClient that would be created in defaults maybe?

Yeah, you have a point, I don't like this amount of new parameters as well... I'll look into it, maybe I can simplify it

I've moved them all into a common struct

psalajova · 2026-01-20T09:38:15Z

/hold

coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In `@pkg/defaults/defaults.go`:
- Around line 171-197: When enableSecretsStoreCSIDriver is true, do not swallow
errors from gsm.GetConfigFromEnv or secretmanager.NewClient; instead return the
initialization error immediately so GSM is not left nil. In the block that
constructs gsmConfiguration (symbols: gsmConfiguration,
enableSecretsStoreCSIDriver, gsm.GetConfigFromEnv, secretmanager.NewClient),
replace the logrus.WithError(...).Error calls with returning a wrapped error
(e.g. fmt.Errorf("gsm init: %w", err)) from the surrounding function; ensure the
function signature and callers are updated to propagate this error instead of
assuming success. Also keep assigning gsmConfiguration.ProjectConfig and
gsmConfiguration.Client only on success.

In `@pkg/steps/multi_stage/multi_stage.go`:
- Around line 225-231: Remove the per-step deferred close of the shared GSM
client: inside the run method where you check s.enableSecretsStoreCSIDriver and
s.gsm (and currently call defer s.gsm.Client.Close()), delete that defer so the
shared s.gsm.Client is not closed by each step (this prevents breaking later
steps and race conditions); ensure the GSM client lifecycle is managed centrally
(closed once at job shutdown) and leave the rest of the logic (the nil checks
and call to s.createSPCs) unchanged.

🧹 Nitpick comments (3)

cmd/ci-secret-generator/main_test.go (1)
581-586: Consider adding a dedicated field for disabled clusters instead of matching test name.

Using tc.name string matching to determine test behavior is fragile. If the test name changes, this conditional silently stops working correctly.
♻️ Suggested improvement

Add a disabledClusters field to the test case struct:
 testCases := []struct {
     name               string
     config             secretgenerator.Config
     GSMsyncEnabled     bool
     expectedIndexCalls int
     verifyIndexPayload func(t *testing.T, itemName string, payload []byte)
+    disabledClusters   sets.Set[string]
 }{
Then use tc.disabledClusters directly instead of the name-based conditional.
pkg/steps/multi_stage/gsm_bundle_resolver.go (1)
46-54: Non-deterministic group ordering in error message.

The groupList slice is populated by iterating over a map, which has non-deterministic order. This can cause the error message to vary between runs, making debugging and test assertions harder.
♻️ Suggested fix
+import "sort"
+
 for key, groups := range mountPathGroups {
     if len(groups) > 1 {
         var groupList []string
         for group := range groups {
             groupList = append(groupList, group)
         }
+        sort.Strings(groupList)
         return fmt.Errorf("multiple groups (%v) found for collection=%s, mount_path=%s - different groups in the same collection must use different mount paths to avoid file name collisions",
             groupList, key.collection, key.mountPath)
     }
 }
pkg/gsm-secrets/secrets.go (1)
87-89: Consider creating a defensive copy of the input slice.

The function appends to and sorts the input secretsList in place, which mutates the caller's slice. While current usage passes an empty slice, this could cause subtle bugs if callers expect their input to remain unchanged.
♻️ Suggested defensive copy
 func ConstructIndexSecretContent(secretsList []string) []byte {
-	secretsList = append(secretsList, UpdaterSASecretName)
-	sort.Strings(secretsList)
+	result := make([]string, len(secretsList), len(secretsList)+1)
+	copy(result, secretsList)
+	result = append(result, UpdaterSASecretName)
+	sort.Strings(result)

 	var formattedSecrets []string
-	for _, secret := range secretsList {
+	for _, secret := range result {
 		formattedSecrets = append(formattedSecrets, fmt.Sprintf("- %s", secret))
 	}

 	return []byte(strings.Join(formattedSecrets, "\n"))
 }

pkg/defaults/defaults.go

pkg/steps/multi_stage/multi_stage.go

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

pkg/steps/multi_stage/csi_utils.go (1)

77-95: ⚠️ Potential issue | 🟠 Major

Include Group field in the hash to prevent potential SPC name collisions.

The function hashes only credential.Field, ignoring the Group field which is part of the credential's identity in the collection__group__field hierarchy. Two credentials with identical collection, mountPath, and Field but different Group values would produce the same SPC name, even though they reference different GSM secrets. Although such credentials would be grouped together for the same mount path, the current test coverage (TestGetSPCNameCollisionPrevention) does not verify Group differentiation. Adding Group to the hash input would eliminate this collision risk.

pkg/steps/multi_stage/multi_stage.go (1)

503-541: ⚠️ Potential issue | 🔴 Critical

Use the same deduplication key as init.go to match full credential identity.

The code deduplicates credentials using only credential.Name (line 508), but init.go creates SPCs using the full credential identifier from gsm.GetGSMSecretName(credential.Collection, credential.Group, credential.Field). This inconsistency can cause mismatches: if two credentials share the same Name but differ in Collection, Group, or Field, the second one would be incorrectly skipped here even though init.go would create separate SPCs for both.

Align the deduplication key with init.go by using the full credential identity (collection + group + field) instead of just Name.

🤖 Fix all issues with AI agents

In `@cmd/ci-operator/main.go`:
- Around line 761-766: The startup attempts to load GSM config unconditionally
when o.enableSecretsStoreCSIDriver is true, but o.gsmConfigPath defaults to
empty which causes a file read error; in Complete() validate that if
enableSecretsStoreCSIDriver (o.enableSecretsStoreCSIDriver) is true then
gsmConfigPath (o.gsmConfigPath) must be non-empty and return a clear error if
missing, or alternatively make GSM loading conditional only when gsmConfigPath
is set (i.e., check o.gsmConfigPath != "" before calling
api.LoadGSMConfigFromFile), updating the code paths that call
LoadGSMConfigFromFile to use this validation and preserving existing behavior in
the main startup block that loads the config.

In `@pkg/defaults/defaults.go`:
- Around line 128-149: The GSM secretmanager.Client created by
secretmanager.NewClient in FromConfig (variable gsmClient) is never closed;
immediately after successfully creating gsmClient (right after the
secretmanager.NewClient call) add a defer gsmClient.Close() so the gRPC
connections and background resources are released when FromConfig returns;
ensure this cleanup is placed before assigning gsmClient into gsmConfig to
guarantee the client is closed if FromConfig exits early due to an error.

In `@pkg/steps/multi_stage/init.go`:
- Around line 91-115: The resolved credentials from ResolveCredentialReferences
are only written into the local allSteps copy, so s.pre/s.test/s.post remain
unchanged and later addCredentialsToCensoring reads stale credentials causing
SPC name mismatches; update the real slices after resolving by writing
resolvedCredentials back into the corresponding original step entries (s.pre,
s.test, s.post) or refactor to only use the single resolved allSteps for
subsequent operations (createSPCs, addCredentialsToCensoring,
groupCredentialsByCollectionAndMountPath) so SPC creation and censoring
reference the same credential objects.

🧹 Nitpick comments (1)

pkg/steps/multi_stage/gsm_bundle_resolver.go (1)
88-159: Consider simplifying error handling - aggregate is unnecessary with fail-fast behavior.

The code uses break after each error (lines 92, 97, 103, 118, 130, 153), meaning at most one error is ever collected. Using utilerrors.NewAggregate is unnecessary overhead when the slice will never contain multiple errors.

Either:

Remove the aggregate and return the single error directly, or

If multiple errors should be collected for better diagnostics, use continue instead of break

The current behavior (fail-fast) is reasonable for credential resolution, so option 1 aligns better with the intent.
♻️ Simplify to direct error return
 	for _, cred := range credentials {
 		if cred.IsBundleReference() {
 			if gsmConfig == nil {
-				errs = append(errs, fmt.Errorf("bundle reference %q requires gsm-config file, but config is not loaded", cred.Bundle))
-				break
+				return nil, fmt.Errorf("bundle reference %q requires gsm-config file, but config is not loaded", cred.Bundle)
 			}
 			// ... similar pattern for other errors

coderabbitai · 2026-02-03T14:29:51Z

cmd/ci-operator/main.go

+	if o.enableSecretsStoreCSIDriver {
+		err := api.LoadGSMConfigFromFile(o.gsmConfigPath, &o.gsmConfig)
+		if err != nil {
+			return err
+		}
+	}


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Find all usages of --enable-secrets-store-csi-driver and --gsm-config flags echo "=== Searching for --enable-secrets-store-csi-driver usage ===" rg -n -- '--enable-secrets-store-csi-driver' --type go echo -e "\n=== Searching for --gsm-config usage ===" rg -n -- '--gsm-config' --type go echo -e "\n=== Check the main.go file around flag definitions ===" rg -n -B3 -A3 'enableSecretsStoreCSIDriver' cmd/ci-operator/main.go | head -60

Repository: openshift/ci-tools

Length of output: 2368

🏁 Script executed:

#!/bin/bash # Find LoadGSMConfigFromFile implementation echo "=== Searching for LoadGSMConfigFromFile implementation ===" rg -n 'LoadGSMConfigFromFile' --max-count 20 echo -e "\n=== Searching for test files or CI configs that might invoke ci-operator ===" find . -name "*.yaml" -o -name "*.yml" | xargs grep -l 'enable-secrets-store-csi-driver' 2>/dev/null | head -10 echo -e "\n=== Look at api package for GSM config loading ===" rg -n -A10 'func LoadGSMConfigFromFile'

Repository: openshift/ci-tools

Length of output: 1744

🏁 Script executed:

#!/bin/bash echo "=== Check the test fixture with CSI enabled ===" cat "./pkg/prowgen/testdata/zz_fixture_TestNewProwJobBaseBuilderForTest_multi_stage_test_with_CSI_enabled.yaml" | head -100 echo -e "\n=== Check how ci-secret-bootstrap handles GSM config loading ===" rg -n -B5 -A5 'LoadGSMConfigFromFile' cmd/ci-secret-bootstrap/main.go | head -40 echo -e "\n=== Check what ReadFileMaybeGZIP does with empty string ===" rg -n -A10 'func ReadFileMaybeGZIP' pkg/gzip

Repository: openshift/ci-tools

Length of output: 2384

🏁 Script executed:

#!/bin/bash echo "=== Find the gzip package ===" find . -name gzip.go -o -name '*gzip*' | grep -E '\.go$' echo -e "\n=== Check ReadFileMaybeGZIP implementation ===" rg -n -A10 'func ReadFileMaybeGZIP'

Repository: openshift/ci-tools

Length of output: 1878

🏁 Script executed:

#!/bin/bash echo "=== Search for all test fixtures with CSI driver enabled ===" rg -l 'enable-secrets-store-csi-driver' --type yaml echo -e "\n=== Check each fixture for gsm-config ===" rg -n 'enable-secrets-store-csi-driver' --type yaml -A 50 | grep -E '(enable-secrets-store-csi-driver|gsm-config)'

Repository: openshift/ci-tools

Length of output: 420

🏁 Script executed:

#!/bin/bash echo "=== Find tests that use or generate the CSI-enabled fixture ===" rg -n 'multi_stage_test_with_CSI_enabled' --type go echo -e "\n=== Look at the prowgen test file ===" rg -n -B5 -A15 'TestNewProwJobBaseBuilderForTest' pkg/prowgen/prowgen_test.go | head -100

Repository: openshift/ci-tools

Length of output: 164

🏁 Script executed:

#!/bin/bash echo "=== Look for prowgen_test.go file ===" find . -name 'prowgen*test*.go' echo -e "\n=== Search for fixtures in prowgen testdata ===" ls -la pkg/prowgen/testdata/ | grep CSI echo -e "\n=== Look for code that generates test fixtures ===" rg -n 'zz_fixture_' pkg/prowgen/ --type go | head -20

Repository: openshift/ci-tools

Length of output: 489

🏁 Script executed:

#!/bin/bash echo "=== Look for CSI test cases in prowgen_test.go ===" rg -n -B10 -A30 'CSI_enabled' pkg/prowgen/prowgen_test.go echo -e "\n=== Check if there's a test that validates GSM config requirement ===" rg -n 'enableSecretsStoreCSIDriver\|gsm' pkg/prowgen/prowgen_test.go

Repository: openshift/ci-tools

Length of output: 183

🏁 Script executed:

#!/bin/bash echo "=== Check the ci-operator main.go Complete() function more closely ===" rg -n -B20 -A20 'enableSecretsStoreCSIDriver' cmd/ci-operator/main.go | head -80 echo -e "\n=== Look for any documentation or comments about GSM config requirement ===" rg -n -B3 -A3 'gsm-config' cmd/ci-operator/main.go echo -e "\n=== Check defaults.FromConfig signature to see where GSM config is used ===" rg -n 'func FromConfig' pkg/defaults/ --type go -A3

Repository: openshift/ci-tools

Length of output: 6413

🏁 Script executed:

#!/bin/bash echo "=== Check git history of the CSI test fixture to understand its state ===" git log -n 5 --oneline -- pkg/prowgen/testdata/zz_fixture_TestNewProwJobBaseBuilderForTest_multi_stage_test_with_CSI_enabled.yaml 2>/dev/null || echo "Git history not available" echo -e "\n=== Check if there are any integration tests or e2e tests ===" find . -name '*test*.go' | xargs grep -l 'enableSecretsStoreCSIDriver\|enable-secrets-store-csi-driver' 2>/dev/null | head -5 echo -e "\n=== Look at Complete() function to see if there are any other guards ===" sed -n '750,800p' cmd/ci-operator/main.go

Repository: openshift/ci-tools

Length of output: 1778

🏁 Script executed:

#!/bin/bash echo "=== Check the init_test.go file for CSI-related tests ===" rg -n -B5 -A15 'enableSecretsStoreCSIDriver\|enable-secrets-store-csi-driver' pkg/steps/multi_stage/init_test.go | head -50 echo -e "\n=== Check if there's validation for gsmConfigPath when CSI is enabled ===" rg -n 'gsmConfigPath' cmd/ci-operator/main.go echo -e "\n=== Check if there are any recent changes to CSI or GSM config handling ===" git log -n 3 --oneline -- cmd/ci-operator/main.go pkg/api/gsm.go 2>/dev/null | head -10

Repository: openshift/ci-tools

Length of output: 517

🏁 Script executed:

#!/bin/bash echo "=== Check if the CSI test fixture is actually being run in tests ===" rg -n 'zz_fixture_TestNewProwJobBaseBuilderForTest_multi_stage_test_with_CSI_enabled' --type go echo -e "\n=== Look at both CSI test fixtures ===" echo "--- simple_test_with_CSI_enabled.yaml ---" cat pkg/prowgen/testdata/zz_fixture_TestNewProwJobBaseBuilderForTest_simple_test_with_CSI_enabled.yaml | grep -E '(enable-secrets-store-csi-driver|gsm-config)' echo -e "\n--- multi_stage_test_with_CSI_enabled.yaml ---" cat pkg/prowgen/testdata/zz_fixture_TestNewProwJobBaseBuilderForTest_multi_stage_test_with_CSI_enabled.yaml | grep -E '(enable-secrets-store-csi-driver|gsm-config)' echo -e "\n=== Check for any config validation or warnings ===" rg -n 'gsmConfigPath\|GSMConfig' cmd/ci-operator/main.go -A2 -B2

Repository: openshift/ci-tools

Length of output: 358

🏁 Script executed:

#!/bin/bash echo "=== Look for tests that reference these fixtures ===" rg -n 'simple_test_with_CSI_enabled\|multi_stage_test_with_CSI_enabled' pkg/prowgen/ --type go -B5 -A5 echo -e "\n=== Check prowgen_test.go structure to understand fixture generation ===" head -100 pkg/prowgen/prowgen_test.go echo -e "\n=== Check if there are any guard conditions for empty gsmConfigPath ===" rg -n 'gsmConfigPath.*==""\|gsmConfigPath.*!=.*""' cmd/ci-operator/ --type go echo -e "\n=== Search for any documentation about CSI driver and GSM config ===" find . -name '*.md' -o -name '*.txt' | xargs grep -l 'enable-secrets-store-csi-driver\|gsm-config' 2>/dev/null | head -5

Repository: openshift/ci-tools

Length of output: 3167

Add validation to require --gsm-config when --enable-secrets-store-csi-driver is enabled.

The code loads GSM config without verifying that gsmConfigPath is provided. Since gsmConfigPath defaults to an empty string, calling LoadGSMConfigFromFile("", &o.gsmConfig) will fail at startup with a file read error. Test fixtures confirm this problem exists: both CSI-enabled Prow job fixtures specify --enable-secrets-store-csi-driver=true but omit --gsm-config, which would cause startup failure.

Add a validation check in Complete() to ensure gsmConfigPath is non-empty when enableSecretsStoreCSIDriver is true, or update the logic to make GSM config optional when CSI is disabled.

🤖 Prompt for AI Agents

In `@cmd/ci-operator/main.go` around lines 761 - 766, The startup attempts to load GSM config unconditionally when o.enableSecretsStoreCSIDriver is true, but o.gsmConfigPath defaults to empty which causes a file read error; in Complete() validate that if enableSecretsStoreCSIDriver (o.enableSecretsStoreCSIDriver) is true then gsmConfigPath (o.gsmConfigPath) must be non-empty and return a clear error if missing, or alternatively make GSM loading conditional only when gsmConfigPath is set (i.e., check o.gsmConfigPath != "" before calling api.LoadGSMConfigFromFile), updating the code paths that call LoadGSMConfigFromFile to use this validation and preserving existing behavior in the main startup block that loads the config.

coderabbitai · 2026-02-03T14:29:51Z

pkg/defaults/defaults.go

+	// Initialize GSM configuration if CSI driver is enabled
+	var gsmConfig *multi_stage.GSMConfiguration
+	if enableSecretsStoreCSIDriver {
+		gsmProjectConfig, err := gsm.GetConfigFromEnv()
+		if err != nil {
+			return nil, nil, fmt.Errorf("failed to get GSM project config from environment: %w", err)
+		}
+		var opts []option.ClientOption
+		if gsmCredentialsFile != "" {
+			opts = append(opts, option.WithCredentialsFile(gsmCredentialsFile))
+		}
+		gsmClient, err := secretmanager.NewClient(ctx, opts...)
+		if err != nil {
+			return nil, nil, fmt.Errorf("failed to initialize GSM client: %w", err)
+		}
+		gsmConfig = &multi_stage.GSMConfiguration{
+			Config:          gsmSecretsConfig,
+			CredentialsFile: gsmCredentialsFile,
+			ProjectConfig:   gsmProjectConfig,
+			Client:          gsmClient,
+		}
+	}


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Check for GSM client Close() calls and related cleanup patterns rg "gsm.*Client.*Close|\.Client\.Close|gsmConfig.*Close" --type go -B 2 -A 2

Repository: openshift/ci-tools

Length of output: 655

🏁 Script executed:

#!/bin/bash # Also check what happens to gsmConfig after it's returned from this function # Look at the function signature and how gsmConfig is threaded through cd /root/workspace && grep -n "gsmConfig" pkg/defaults/defaults.go | head -20

Repository: openshift/ci-tools

Length of output: 127

🏁 Script executed:

#!/bin/bash # Check if there's a cleanup/defer pattern for the returned config in calling code rg "gsmConfig|GSMConfiguration" --type go -l | head -10

Repository: openshift/ci-tools

Length of output: 259

🏁 Script executed:

#!/bin/bash # Get the function signature and see what's returned rg "func.*$" pkg/defaults/defaults.go -A 200 | grep -B 150 "GSMConfiguration" | head -100

Repository: openshift/ci-tools

Length of output: 4008

🏁 Script executed:

#!/bin/bash # Look at the complete context around lines 128-149 including function signature sed -n '1,200p' pkg/defaults/defaults.go | tail -100

Repository: openshift/ci-tools

Length of output: 4223

🏁 Script executed:

#!/bin/bash # Check where gsmConfig is returned and how it's handled rg "GSMConfiguration" pkg/defaults/defaults.go -B 5 -A 5

Repository: openshift/ci-tools

Length of output: 1616

🏁 Script executed:

#!/bin/bash # Check the multi_stage.GSMConfiguration struct definition for cleanup methods rg "type GSMConfiguration" pkg/steps/multi_stage/ -A 10

Repository: openshift/ci-tools

Length of output: 875

🏁 Script executed:

#!/bin/bash # Check if GSMConfiguration has a Close() or cleanup method rg "func.*GSMConfiguration.*Close|func \(.*\*GSMConfiguration$.*Close|func $.*GSMConfiguration$.*Close" pkg/ --type go

Repository: openshift/ci-tools

Length of output: 44

🏁 Script executed:

#!/bin/bash # Check how gsmConfig is used in step creation and if there's cleanup rg "gsmConfig" pkg/steps/multi_stage/ --type go -B 2 -A 2 | head -80

Repository: openshift/ci-tools

Length of output: 2575

🏁 Script executed:

#!/bin/bash # Look for any cleanup/close patterns in multi_stage step execution rg "type.*Step.*struct|func.*Close|defer.*Close" pkg/steps/multi_stage/multi_stage.go -B 1 -A 1 | head -50

Repository: openshift/ci-tools

Length of output: 120

🏁 Script executed:

#!/bin/bash # Check what the FromConfig return value is used for and if there's cleanup rg "FromConfig.*\(" pkg/ --type go -B 1 -A 5 | head -60

Repository: openshift/ci-tools

Length of output: 3536

🏁 Script executed:

#!/bin/bash # Look for job execution and cleanup patterns rg "api.Step.*error\|struct.*api.Step\|Job.*Clean|Cleanup|defer" pkg/defaults/ --type go | head -20

Repository: openshift/ci-tools

Length of output: 169

🏁 Script executed:

#!/bin/bash # Check if the gsmConfig client is ever closed anywhere after step creation rg "gsmConfig.*nil\|gsmConfig = nil\|Client.Close" pkg/steps/multi_stage/ --type go

Repository: openshift/ci-tools

Length of output: 44

🏁 Script executed:

#!/bin/bash # Check how jobs are executed and if there's cleanup of returned steps rg "FromConfig\(" cmd/ --type go -B 2 -A 10 | head -80

Repository: openshift/ci-tools

Length of output: 5287

🏁 Script executed:

#!/bin/bash # Look for job execution context and cleanup patterns rg "executeJobTillCompletion|runJob|Run\(.*ctx\|defer.*close|defer.*Close" cmd/ci-operator/ --type go -B 2 -A 2 | head -60

Repository: openshift/ci-tools

Length of output: 44

🏁 Script executed:

#!/bin/bash # Check if secretmanager.Client needs to be closed web_search "google cloud secretmanager client go Close"

Repository: openshift/ci-tools

Length of output: 112

🌐 Web query:

Google Cloud Secret Manager client Go Close method cleanup

💡 Result:

In the Google Cloud Secret Manager Go client (cloud.google.com/go/secretmanager/apiv1), (*Client).Close() is the cleanup call that closes the underlying gRPC connection(s) to the API service and releases related background resources; you should call it when the client is no longer needed (typically defer client.Close() right after NewClient). [1][2]

Notes:

NewClient returns a client that “must be Closed” to clean up underlying connections. [2]

Clients are intended to be reused (safe for concurrent use), so don’t create one per request—create once and close once at shutdown. [1]

Sources:
[1] pkg.go.dev – cloud.google.com/go/secretmanager/apiv1 (Client usage + Close docs)
[2] Google Cloud Go reference – Secret Manager NewClient / Close docs

🏁 Script executed:

#!/bin/bash # Check how steps are executed and if there's opportunity for cleanup rg "for.*range.*buildSteps\|for.*range.*promotionSteps\|step.Run\(" cmd/ci-operator/ --type go -B 2 -A 5 | head -80

Repository: openshift/ci-tools

Length of output: 44

🏁 Script executed:

#!/bin/bash # Look for deferred cleanup of steps or context cancellation rg "defer.*cancel\|defer.*func\|cleanup\|Close" cmd/ci-operator/main.go -B 2 -A 2 | head -50

Repository: openshift/ci-tools

Length of output: 44

Resource leak: GSM client created in FromConfig must be explicitly closed.

The secretmanager.Client created at line 139 requires explicit cleanup via defer Close() to release underlying gRPC connections and background resources. Currently, the client is stored in gsmConfig but never closed within the FromConfig function or at any visible cleanup point.

The error handling improvement is good, but add defer gsmClient.Close() immediately after successful client initialization to prevent connection leaks.

🤖 Prompt for AI Agents

In `@pkg/defaults/defaults.go` around lines 128 - 149, The GSM secretmanager.Client created by secretmanager.NewClient in FromConfig (variable gsmClient) is never closed; immediately after successfully creating gsmClient (right after the secretmanager.NewClient call) add a defer gsmClient.Close() so the gRPC connections and background resources are released when FromConfig returns; ensure this cleanup is placed before assigning gsmClient into gsmConfig to guarantee the client is closed if FromConfig exits early due to an error.

coderabbitai · 2026-02-03T14:29:52Z

pkg/steps/multi_stage/init.go

+	// (Crucial) Resolve credential references (bundles, auto-discovery, explicit fields) BEFORE creating SPCs
+	// This expands bundle references and auto-discovers fields for each step
+	allSteps := append(append(s.pre, s.test...), s.post...)
+	for i := range allSteps {
+		step := &allSteps[i]
+		resolvedCredentials, err := ResolveCredentialReferences(
+			ctx,
+			step.Credentials,
+			s.gsm.Config,
+			s.gsm.Client,
+			s.gsm.ProjectConfig,
+		)
+		if err != nil {
+			return fmt.Errorf("failed to resolve credentials for step %s: %w", step.As, err)
+		}
+		if err := ValidateNoGroupCollisionsOnMountPath(resolvedCredentials); err != nil {
+			return fmt.Errorf("invalid credentials for step %s: %w", step.As, err)
+		}
+		step.Credentials = resolvedCredentials
+	}
+
 	logrus.Infof("Creating SPCs for actual credential usage...")
 	// Create grouped SPCs for actual credential usage
-	for _, step := range append(s.pre, append(s.test, s.post...)...) {
+	for _, step := range allSteps {
 		collectionMountGroups := groupCredentialsByCollectionAndMountPath(step.Credentials)


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's check the file structure and read the code at lines 91-115 cat -n pkg/steps/multi_stage/init.go | sed -n '85,120p'

Repository: openshift/ci-tools

Length of output: 1875

🏁 Script executed:

# Look at the rest of the createSPCs function and related methods cat -n pkg/steps/multi_stage/init.go | sed -n '88,180p'

Repository: openshift/ci-tools

Length of output: 4408

🏁 Script executed:

# Search for where createSPCs is called rg "createSPCs" pkg/steps/multi_stage/init.go -B 2 -A 2

Repository: openshift/ci-tools

Length of output: 461

🏁 Script executed:

# Search for calls to createSPCs across the package rg "createSPCs" pkg/steps/multi_stage/ -B 3 -A 3

Repository: openshift/ci-tools

Length of output: 2478

🏁 Script executed:

# Look at multi_stage.go around the createSPCs call rg -B 10 "createSPCs" pkg/steps/multi_stage/multi_stage.go | head -30

Repository: openshift/ci-tools

Length of output: 847

🏁 Script executed:

# Look at the full execution flow in multi_stage.go rg -B 5 -A 30 "func $s \*multiStageTestStep$ Run\(" pkg/steps/multi_stage/multi_stage.go | head -70

Repository: openshift/ci-tools

Length of output: 1148

🏁 Script executed:

# Look at what happens after createSPCs in the run method rg -A 100 "if err := s.createSPCs" pkg/steps/multi_stage/multi_stage.go | head -120

Repository: openshift/ci-tools

Length of output: 4202

🏁 Script executed:

# Find addCredentialsToCensoring to see what it does rg -B 5 -A 30 "func.*addCredentialsToCensoring" pkg/steps/multi_stage/

Repository: openshift/ci-tools

Length of output: 2742

🏁 Script executed:

# Check the struct definition for api.LiteralTestStep to understand if it's a value or pointer type rg "type LiteralTestStep struct" pkg/api/ -A 20

Repository: openshift/ci-tools

Length of output: 1609

🏁 Script executed:

# Check what ResolveCredentialReferences does and whether it changes credential.Name rg -B 5 -A 30 "func ResolveCredentialReferences" pkg/

Repository: openshift/ci-tools

Length of output: 2972

🏁 Script executed:

# Let's look at the full addCredentialsToCensoring function rg -A 50 "func $s \*multiStageTestStep$ addCredentialsToCensoring" pkg/steps/multi_stage/multi_stage.go

Repository: openshift/ci-tools

Length of output: 1867

🏁 Script executed:

# Now let me verify the Go slice behavior by checking how the credential resolution is actually used # Let's see if there's any other place that reads from s.pre/s.test/s.post after createSPCs rg "s\.pre|s\.test|s\.post" pkg/steps/multi_stage/multi_stage.go -n | head -40

Repository: openshift/ci-tools

Length of output: 530

🏁 Script executed:

# Let's understand Go slice append behavior and verify the issue # Create a test to see if modifications to appended slices persist cat > /tmp/test_slice.go << 'EOF' package main import "fmt" type Cred struct { Name string } type Step struct { As string Cred Cred } func main() { // Similar to s.pre, s.test, s.post pre := []Step{{As: "pre-step", Cred: Cred{Name: "pre-cred"}}} test := []Step{{As: "test-step", Cred: Cred{Name: "test-cred"}}} post := []Step{{As: "post-step", Cred: Cred{Name: "post-cred"}}} // Similar to what createSPCs does allSteps := append(append(pre, test...), post...) // Modify through allSteps allSteps[0].Cred.Name = "modified-pre" allSteps[1].Cred.Name = "modified-test" allSteps[2].Cred.Name = "modified-post" // Check if original slices were modified fmt.Println("Original pre:", pre[0].Cred.Name) fmt.Println("Original test:", test[0].Cred.Name) fmt.Println("Original post:", post[0].Cred.Name) } EOF go run /tmp/test_slice.go

Repository: openshift/ci-tools

Length of output: 134

🏁 Script executed:

# Let's verify the credential usage in addCredentialsToCensoring more carefully # by checking if the loop reads credential.Name or credential.Collection rg -B 2 -A 8 "for _, step := range append" pkg/steps/multi_stage/multi_stage.go | grep -A 8 "addCredentialsToCensoring"

Repository: openshift/ci-tools

Length of output: 44

🏁 Script executed:

# Let me verify the exact issue more directly - check if modifications # to appended slices persist with a more precise test matching the actual code pattern cat > /tmp/test_append.go << 'EOF' package main import "fmt" type CredentialReference struct { Name string } type LiteralTestStep struct { As string Credentials []CredentialReference } func main() { pre := []LiteralTestStep{{As: "pre-step", Credentials: []CredentialReference{{Name: "pre-cred"}}}} test := []LiteralTestStep{{As: "test-step", Credentials: []CredentialReference{{Name: "test-cred"}}}} post := []LiteralTestStep{{As: "post-step", Credentials: []CredentialReference{{Name: "post-cred"}}}} // What createSPCs does allSteps := append(append(pre, test...), post...) // Modify credentials in allSteps allSteps[0].Credentials = []CredentialReference{{Name: "resolved-pre"}} allSteps[1].Credentials = []CredentialReference{{Name: "resolved-test"}} allSteps[2].Credentials = []CredentialReference{{Name: "resolved-post"}} // Check original slices fmt.Println("pre[0].Credentials[0].Name:", pre[0].Credentials[0].Name) fmt.Println("test[0].Credentials[0].Name:", test[0].Credentials[0].Name) fmt.Println("post[0].Credentials[0].Name:", post[0].Credentials[0].Name) // What addCredentialsToCensoring does - reads from s.pre/s.test/s.post fmt.Println("\nReading from original slices later:") for _, step := range append(append(pre, test...), post...) { fmt.Println("Credentials in step:", step.Credentials[0].Name) } } EOF go run /tmp/test_append.go

Repository: openshift/ci-tools

Length of output: 305

🏁 Script executed:

# Read the full ResolveCredentialReferences function sed -n '/^func ResolveCredentialReferences/,/^func [a-zA-Z]/p' pkg/steps/multi_stage/gsm_bundle_resolver.go | head -100

Repository: openshift/ci-tools

Length of output: 3072

Resolved credentials aren't persisted back to s.pre/s.test/s.post.

ResolveCredentialReferences updates only the derived allSteps slice. Since append creates value copies, modifications don't reach the original slices. Later, addCredentialsToCensoring reads from s.pre/s.test/s.post directly and constructs SPC names using original credentials. Because resolved credentials don't populate the Name field, the SPC names created in createSPCs (lines 130–148, using allSteps) differ from those referenced in addCredentialsToCensoring (using original slices), causing a mismatch.

Persist resolved credentials to the actual step slices before building SPCs, or refactor to consistently use a single resolved credential collection throughout.

🤖 Prompt for AI Agents

In `@pkg/steps/multi_stage/init.go` around lines 91 - 115, The resolved credentials from ResolveCredentialReferences are only written into the local allSteps copy, so s.pre/s.test/s.post remain unchanged and later addCredentialsToCensoring reads stale credentials causing SPC name mismatches; update the real slices after resolving by writing resolvedCredentials back into the corresponding original step entries (s.pre, s.test, s.post) or refactor to only use the single resolved allSteps for subsequent operations (createSPCs, addCredentialsToCensoring, groupCredentialsByCollectionAndMountPath) so SPC creation and censoring reference the same credential objects.

openshift-ci-robot · 2026-02-03T15:47:46Z

Scheduling required tests:
/test e2e

openshift-ci · 2026-02-03T16:43:17Z

@psalajova: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/breaking-changes	`11ea941`	link	false	`/test breaking-changes`
ci/prow/e2e	`11ea941`	link	true	`/test e2e`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 14, 2026

openshift-ci bot requested review from Prucek and hector-vido January 14, 2026 17:13

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 14, 2026

psalajova mentioned this pull request Jan 15, 2026

Add gsm-config.yaml openshift/release#73557

Merged

Prucek reviewed Jan 15, 2026

View reviewed changes

psalajova mentioned this pull request Jan 16, 2026

Add gsm-config ConfigMap to prowgen #4903

Open

psalajova force-pushed the ci-operator-include-groups-and-mapping-file branch 2 times, most recently from d3b02d3 to d796b98 Compare January 19, 2026 16:16

psalajova changed the title ~~[WIP] ci-operator: Implement 3-level naming hierarchy changes to CSI mechanism~~ ci-operator: Implement 3-level naming hierarchy changes to CSI mechanism Jan 19, 2026

openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 19, 2026

openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 20, 2026

coderabbitai bot reviewed Jan 20, 2026

View reviewed changes

pkg/defaults/defaults.go Outdated Show resolved Hide resolved

pkg/steps/multi_stage/multi_stage.go Show resolved Hide resolved

psalajova added 2 commits February 3, 2026 13:38

Refactoring: Rename and move GSMBundle

46824e5

Implement 3-level structure changes

11ea941

psalajova force-pushed the ci-operator-include-groups-and-mapping-file branch from 05cd174 to 11ea941 Compare February 3, 2026 14:08

coderabbitai bot reviewed Feb 3, 2026

View reviewed changes

Conversation

psalajova commented Jan 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes

3-level hierarchy

Supported credential resolution modes

Implementation details

Uh oh!

openshift-ci-robot commented Jan 14, 2026

Uh oh!

coderabbitai bot commented Jan 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Uh oh!

openshift-ci bot commented Jan 14, 2026

Uh oh!

Uh oh!

Prucek Jan 15, 2026

Choose a reason for hiding this comment

Uh oh!

psalajova Jan 16, 2026

Choose a reason for hiding this comment

Uh oh!

psalajova Jan 19, 2026

Choose a reason for hiding this comment

Uh oh!

psalajova commented Jan 20, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

openshift-ci-robot commented Feb 3, 2026

Uh oh!

openshift-ci bot commented Feb 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

psalajova commented Jan 14, 2026 •

edited

Loading

coderabbitai bot commented Jan 14, 2026 •

edited

Loading