ci-operator: Implement 3-level naming hierarchy changes to CSI mechanism#4900
ci-operator: Implement 3-level naming hierarchy changes to CSI mechanism#4900psalajova wants to merge 2 commits intoopenshift:mainfrom
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: automatic mode |
WalkthroughThis PR integrates Google Secret Manager (GSM) into the CI tools infrastructure by relocating GSM types from secretbootstrap to the api package, extending credential references with collection/group/field hierarchies and bundle support, and threading GSM configuration through the build system to enable credential resolution with bundle expansion and auto-discovery capabilities. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~35 minutes ✨ Finishing touches
🧪 Generate unit tests (beta)
Important Action Needed: IP Allowlist UpdateIf your organization protects your Git platform with IP whitelisting, please add the new CodeRabbit IP address to your allowlist:
Failure to add the new IP will result in interrupted reviews. Comment |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: psalajova The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
pkg/steps/multi_stage/multi_stage.go
Outdated
| gsmClient *secretmanager.Client | ||
| gsmCredentialsFile string | ||
| gsmConfig *api.GSMConfig | ||
| gsmProjectConfig gsm.Config |
There was a problem hiding this comment.
Is it necessary to send all these params to the multistage step? Couldn't it just use the gsmClient that would be created in defaults maybe?
There was a problem hiding this comment.
Yeah, you have a point, I don't like this amount of new parameters as well... I'll look into it, maybe I can simplify it
There was a problem hiding this comment.
I've moved them all into a common struct
d3b02d3 to
d796b98
Compare
|
/hold |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@pkg/defaults/defaults.go`:
- Around line 171-197: When enableSecretsStoreCSIDriver is true, do not swallow
errors from gsm.GetConfigFromEnv or secretmanager.NewClient; instead return the
initialization error immediately so GSM is not left nil. In the block that
constructs gsmConfiguration (symbols: gsmConfiguration,
enableSecretsStoreCSIDriver, gsm.GetConfigFromEnv, secretmanager.NewClient),
replace the logrus.WithError(...).Error calls with returning a wrapped error
(e.g. fmt.Errorf("gsm init: %w", err)) from the surrounding function; ensure the
function signature and callers are updated to propagate this error instead of
assuming success. Also keep assigning gsmConfiguration.ProjectConfig and
gsmConfiguration.Client only on success.
In `@pkg/steps/multi_stage/multi_stage.go`:
- Around line 225-231: Remove the per-step deferred close of the shared GSM
client: inside the run method where you check s.enableSecretsStoreCSIDriver and
s.gsm (and currently call defer s.gsm.Client.Close()), delete that defer so the
shared s.gsm.Client is not closed by each step (this prevents breaking later
steps and race conditions); ensure the GSM client lifecycle is managed centrally
(closed once at job shutdown) and leave the rest of the logic (the nil checks
and call to s.createSPCs) unchanged.
🧹 Nitpick comments (3)
cmd/ci-secret-generator/main_test.go (1)
581-586: Consider adding a dedicated field for disabled clusters instead of matching test name.Using
tc.namestring matching to determine test behavior is fragile. If the test name changes, this conditional silently stops working correctly.♻️ Suggested improvement
Add a
disabledClustersfield to the test case struct:testCases := []struct { name string config secretgenerator.Config GSMsyncEnabled bool expectedIndexCalls int verifyIndexPayload func(t *testing.T, itemName string, payload []byte) + disabledClusters sets.Set[string] }{Then use
tc.disabledClustersdirectly instead of the name-based conditional.pkg/steps/multi_stage/gsm_bundle_resolver.go (1)
46-54: Non-deterministic group ordering in error message.The
groupListslice is populated by iterating over a map, which has non-deterministic order. This can cause the error message to vary between runs, making debugging and test assertions harder.♻️ Suggested fix
+import "sort" + for key, groups := range mountPathGroups { if len(groups) > 1 { var groupList []string for group := range groups { groupList = append(groupList, group) } + sort.Strings(groupList) return fmt.Errorf("multiple groups (%v) found for collection=%s, mount_path=%s - different groups in the same collection must use different mount paths to avoid file name collisions", groupList, key.collection, key.mountPath) } }pkg/gsm-secrets/secrets.go (1)
87-89: Consider creating a defensive copy of the input slice.The function appends to and sorts the input
secretsListin place, which mutates the caller's slice. While current usage passes an empty slice, this could cause subtle bugs if callers expect their input to remain unchanged.♻️ Suggested defensive copy
func ConstructIndexSecretContent(secretsList []string) []byte { - secretsList = append(secretsList, UpdaterSASecretName) - sort.Strings(secretsList) + result := make([]string, len(secretsList), len(secretsList)+1) + copy(result, secretsList) + result = append(result, UpdaterSASecretName) + sort.Strings(result) var formattedSecrets []string - for _, secret := range secretsList { + for _, secret := range result { formattedSecrets = append(formattedSecrets, fmt.Sprintf("- %s", secret)) } return []byte(strings.Join(formattedSecrets, "\n")) }
05cd174 to
11ea941
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
pkg/steps/multi_stage/csi_utils.go (1)
77-95:⚠️ Potential issue | 🟠 MajorInclude
Groupfield in the hash to prevent potential SPC name collisions.The function hashes only
credential.Field, ignoring theGroupfield which is part of the credential's identity in the collection__group__field hierarchy. Two credentials with identical collection, mountPath, and Field but different Group values would produce the same SPC name, even though they reference different GSM secrets. Although such credentials would be grouped together for the same mount path, the current test coverage (TestGetSPCNameCollisionPrevention) does not verify Group differentiation. AddingGroupto the hash input would eliminate this collision risk.pkg/steps/multi_stage/multi_stage.go (1)
503-541:⚠️ Potential issue | 🔴 CriticalUse the same deduplication key as init.go to match full credential identity.
The code deduplicates credentials using only
credential.Name(line 508), butinit.gocreates SPCs using the full credential identifier fromgsm.GetGSMSecretName(credential.Collection, credential.Group, credential.Field). This inconsistency can cause mismatches: if two credentials share the sameNamebut differ inCollection,Group, orField, the second one would be incorrectly skipped here even thoughinit.gowould create separate SPCs for both.Align the deduplication key with
init.goby using the full credential identity (collection + group + field) instead of justName.
🤖 Fix all issues with AI agents
In `@cmd/ci-operator/main.go`:
- Around line 761-766: The startup attempts to load GSM config unconditionally
when o.enableSecretsStoreCSIDriver is true, but o.gsmConfigPath defaults to
empty which causes a file read error; in Complete() validate that if
enableSecretsStoreCSIDriver (o.enableSecretsStoreCSIDriver) is true then
gsmConfigPath (o.gsmConfigPath) must be non-empty and return a clear error if
missing, or alternatively make GSM loading conditional only when gsmConfigPath
is set (i.e., check o.gsmConfigPath != "" before calling
api.LoadGSMConfigFromFile), updating the code paths that call
LoadGSMConfigFromFile to use this validation and preserving existing behavior in
the main startup block that loads the config.
In `@pkg/defaults/defaults.go`:
- Around line 128-149: The GSM secretmanager.Client created by
secretmanager.NewClient in FromConfig (variable gsmClient) is never closed;
immediately after successfully creating gsmClient (right after the
secretmanager.NewClient call) add a defer gsmClient.Close() so the gRPC
connections and background resources are released when FromConfig returns;
ensure this cleanup is placed before assigning gsmClient into gsmConfig to
guarantee the client is closed if FromConfig exits early due to an error.
In `@pkg/steps/multi_stage/init.go`:
- Around line 91-115: The resolved credentials from ResolveCredentialReferences
are only written into the local allSteps copy, so s.pre/s.test/s.post remain
unchanged and later addCredentialsToCensoring reads stale credentials causing
SPC name mismatches; update the real slices after resolving by writing
resolvedCredentials back into the corresponding original step entries (s.pre,
s.test, s.post) or refactor to only use the single resolved allSteps for
subsequent operations (createSPCs, addCredentialsToCensoring,
groupCredentialsByCollectionAndMountPath) so SPC creation and censoring
reference the same credential objects.
🧹 Nitpick comments (1)
pkg/steps/multi_stage/gsm_bundle_resolver.go (1)
88-159: Consider simplifying error handling - aggregate is unnecessary with fail-fast behavior.The code uses
breakafter each error (lines 92, 97, 103, 118, 130, 153), meaning at most one error is ever collected. Usingutilerrors.NewAggregateis unnecessary overhead when the slice will never contain multiple errors.Either:
- Remove the aggregate and return the single error directly, or
- If multiple errors should be collected for better diagnostics, use
continueinstead ofbreakThe current behavior (fail-fast) is reasonable for credential resolution, so option 1 aligns better with the intent.
♻️ Simplify to direct error return
for _, cred := range credentials { if cred.IsBundleReference() { if gsmConfig == nil { - errs = append(errs, fmt.Errorf("bundle reference %q requires gsm-config file, but config is not loaded", cred.Bundle)) - break + return nil, fmt.Errorf("bundle reference %q requires gsm-config file, but config is not loaded", cred.Bundle) } // ... similar pattern for other errors
| if o.enableSecretsStoreCSIDriver { | ||
| err := api.LoadGSMConfigFromFile(o.gsmConfigPath, &o.gsmConfig) | ||
| if err != nil { | ||
| return err | ||
| } | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find all usages of --enable-secrets-store-csi-driver and --gsm-config flags
echo "=== Searching for --enable-secrets-store-csi-driver usage ==="
rg -n -- '--enable-secrets-store-csi-driver' --type go
echo -e "\n=== Searching for --gsm-config usage ==="
rg -n -- '--gsm-config' --type go
echo -e "\n=== Check the main.go file around flag definitions ==="
rg -n -B3 -A3 'enableSecretsStoreCSIDriver' cmd/ci-operator/main.go | head -60Repository: openshift/ci-tools
Length of output: 2368
🏁 Script executed:
#!/bin/bash
# Find LoadGSMConfigFromFile implementation
echo "=== Searching for LoadGSMConfigFromFile implementation ==="
rg -n 'LoadGSMConfigFromFile' --max-count 20
echo -e "\n=== Searching for test files or CI configs that might invoke ci-operator ==="
find . -name "*.yaml" -o -name "*.yml" | xargs grep -l 'enable-secrets-store-csi-driver' 2>/dev/null | head -10
echo -e "\n=== Look at api package for GSM config loading ==="
rg -n -A10 'func LoadGSMConfigFromFile'Repository: openshift/ci-tools
Length of output: 1744
🏁 Script executed:
#!/bin/bash
echo "=== Check the test fixture with CSI enabled ==="
cat "./pkg/prowgen/testdata/zz_fixture_TestNewProwJobBaseBuilderForTest_multi_stage_test_with_CSI_enabled.yaml" | head -100
echo -e "\n=== Check how ci-secret-bootstrap handles GSM config loading ==="
rg -n -B5 -A5 'LoadGSMConfigFromFile' cmd/ci-secret-bootstrap/main.go | head -40
echo -e "\n=== Check what ReadFileMaybeGZIP does with empty string ==="
rg -n -A10 'func ReadFileMaybeGZIP' pkg/gzipRepository: openshift/ci-tools
Length of output: 2384
🏁 Script executed:
#!/bin/bash
echo "=== Find the gzip package ==="
find . -name gzip.go -o -name '*gzip*' | grep -E '\.go$'
echo -e "\n=== Check ReadFileMaybeGZIP implementation ==="
rg -n -A10 'func ReadFileMaybeGZIP'Repository: openshift/ci-tools
Length of output: 1878
🏁 Script executed:
#!/bin/bash
echo "=== Search for all test fixtures with CSI driver enabled ==="
rg -l 'enable-secrets-store-csi-driver' --type yaml
echo -e "\n=== Check each fixture for gsm-config ==="
rg -n 'enable-secrets-store-csi-driver' --type yaml -A 50 | grep -E '(enable-secrets-store-csi-driver|gsm-config)'Repository: openshift/ci-tools
Length of output: 420
🏁 Script executed:
#!/bin/bash
echo "=== Find tests that use or generate the CSI-enabled fixture ==="
rg -n 'multi_stage_test_with_CSI_enabled' --type go
echo -e "\n=== Look at the prowgen test file ==="
rg -n -B5 -A15 'TestNewProwJobBaseBuilderForTest' pkg/prowgen/prowgen_test.go | head -100Repository: openshift/ci-tools
Length of output: 164
🏁 Script executed:
#!/bin/bash
echo "=== Look for prowgen_test.go file ==="
find . -name 'prowgen*test*.go'
echo -e "\n=== Search for fixtures in prowgen testdata ==="
ls -la pkg/prowgen/testdata/ | grep CSI
echo -e "\n=== Look for code that generates test fixtures ==="
rg -n 'zz_fixture_' pkg/prowgen/ --type go | head -20Repository: openshift/ci-tools
Length of output: 489
🏁 Script executed:
#!/bin/bash
echo "=== Look for CSI test cases in prowgen_test.go ==="
rg -n -B10 -A30 'CSI_enabled' pkg/prowgen/prowgen_test.go
echo -e "\n=== Check if there's a test that validates GSM config requirement ==="
rg -n 'enableSecretsStoreCSIDriver\|gsm' pkg/prowgen/prowgen_test.goRepository: openshift/ci-tools
Length of output: 183
🏁 Script executed:
#!/bin/bash
echo "=== Check the ci-operator main.go Complete() function more closely ==="
rg -n -B20 -A20 'enableSecretsStoreCSIDriver' cmd/ci-operator/main.go | head -80
echo -e "\n=== Look for any documentation or comments about GSM config requirement ==="
rg -n -B3 -A3 'gsm-config' cmd/ci-operator/main.go
echo -e "\n=== Check defaults.FromConfig signature to see where GSM config is used ==="
rg -n 'func FromConfig' pkg/defaults/ --type go -A3Repository: openshift/ci-tools
Length of output: 6413
🏁 Script executed:
#!/bin/bash
echo "=== Check git history of the CSI test fixture to understand its state ==="
git log -n 5 --oneline -- pkg/prowgen/testdata/zz_fixture_TestNewProwJobBaseBuilderForTest_multi_stage_test_with_CSI_enabled.yaml 2>/dev/null || echo "Git history not available"
echo -e "\n=== Check if there are any integration tests or e2e tests ==="
find . -name '*test*.go' | xargs grep -l 'enableSecretsStoreCSIDriver\|enable-secrets-store-csi-driver' 2>/dev/null | head -5
echo -e "\n=== Look at Complete() function to see if there are any other guards ==="
sed -n '750,800p' cmd/ci-operator/main.goRepository: openshift/ci-tools
Length of output: 1778
🏁 Script executed:
#!/bin/bash
echo "=== Check the init_test.go file for CSI-related tests ==="
rg -n -B5 -A15 'enableSecretsStoreCSIDriver\|enable-secrets-store-csi-driver' pkg/steps/multi_stage/init_test.go | head -50
echo -e "\n=== Check if there's validation for gsmConfigPath when CSI is enabled ==="
rg -n 'gsmConfigPath' cmd/ci-operator/main.go
echo -e "\n=== Check if there are any recent changes to CSI or GSM config handling ==="
git log -n 3 --oneline -- cmd/ci-operator/main.go pkg/api/gsm.go 2>/dev/null | head -10Repository: openshift/ci-tools
Length of output: 517
🏁 Script executed:
#!/bin/bash
echo "=== Check if the CSI test fixture is actually being run in tests ==="
rg -n 'zz_fixture_TestNewProwJobBaseBuilderForTest_multi_stage_test_with_CSI_enabled' --type go
echo -e "\n=== Look at both CSI test fixtures ==="
echo "--- simple_test_with_CSI_enabled.yaml ---"
cat pkg/prowgen/testdata/zz_fixture_TestNewProwJobBaseBuilderForTest_simple_test_with_CSI_enabled.yaml | grep -E '(enable-secrets-store-csi-driver|gsm-config)'
echo -e "\n--- multi_stage_test_with_CSI_enabled.yaml ---"
cat pkg/prowgen/testdata/zz_fixture_TestNewProwJobBaseBuilderForTest_multi_stage_test_with_CSI_enabled.yaml | grep -E '(enable-secrets-store-csi-driver|gsm-config)'
echo -e "\n=== Check for any config validation or warnings ==="
rg -n 'gsmConfigPath\|GSMConfig' cmd/ci-operator/main.go -A2 -B2Repository: openshift/ci-tools
Length of output: 358
🏁 Script executed:
#!/bin/bash
echo "=== Look for tests that reference these fixtures ==="
rg -n 'simple_test_with_CSI_enabled\|multi_stage_test_with_CSI_enabled' pkg/prowgen/ --type go -B5 -A5
echo -e "\n=== Check prowgen_test.go structure to understand fixture generation ==="
head -100 pkg/prowgen/prowgen_test.go
echo -e "\n=== Check if there are any guard conditions for empty gsmConfigPath ==="
rg -n 'gsmConfigPath.*==""\|gsmConfigPath.*!=.*""' cmd/ci-operator/ --type go
echo -e "\n=== Search for any documentation about CSI driver and GSM config ==="
find . -name '*.md' -o -name '*.txt' | xargs grep -l 'enable-secrets-store-csi-driver\|gsm-config' 2>/dev/null | head -5Repository: openshift/ci-tools
Length of output: 3167
Add validation to require --gsm-config when --enable-secrets-store-csi-driver is enabled.
The code loads GSM config without verifying that gsmConfigPath is provided. Since gsmConfigPath defaults to an empty string, calling LoadGSMConfigFromFile("", &o.gsmConfig) will fail at startup with a file read error. Test fixtures confirm this problem exists: both CSI-enabled Prow job fixtures specify --enable-secrets-store-csi-driver=true but omit --gsm-config, which would cause startup failure.
Add a validation check in Complete() to ensure gsmConfigPath is non-empty when enableSecretsStoreCSIDriver is true, or update the logic to make GSM config optional when CSI is disabled.
🤖 Prompt for AI Agents
In `@cmd/ci-operator/main.go` around lines 761 - 766, The startup attempts to load
GSM config unconditionally when o.enableSecretsStoreCSIDriver is true, but
o.gsmConfigPath defaults to empty which causes a file read error; in Complete()
validate that if enableSecretsStoreCSIDriver (o.enableSecretsStoreCSIDriver) is
true then gsmConfigPath (o.gsmConfigPath) must be non-empty and return a clear
error if missing, or alternatively make GSM loading conditional only when
gsmConfigPath is set (i.e., check o.gsmConfigPath != "" before calling
api.LoadGSMConfigFromFile), updating the code paths that call
LoadGSMConfigFromFile to use this validation and preserving existing behavior in
the main startup block that loads the config.
| // Initialize GSM configuration if CSI driver is enabled | ||
| var gsmConfig *multi_stage.GSMConfiguration | ||
| if enableSecretsStoreCSIDriver { | ||
| gsmProjectConfig, err := gsm.GetConfigFromEnv() | ||
| if err != nil { | ||
| return nil, nil, fmt.Errorf("failed to get GSM project config from environment: %w", err) | ||
| } | ||
| var opts []option.ClientOption | ||
| if gsmCredentialsFile != "" { | ||
| opts = append(opts, option.WithCredentialsFile(gsmCredentialsFile)) | ||
| } | ||
| gsmClient, err := secretmanager.NewClient(ctx, opts...) | ||
| if err != nil { | ||
| return nil, nil, fmt.Errorf("failed to initialize GSM client: %w", err) | ||
| } | ||
| gsmConfig = &multi_stage.GSMConfiguration{ | ||
| Config: gsmSecretsConfig, | ||
| CredentialsFile: gsmCredentialsFile, | ||
| ProjectConfig: gsmProjectConfig, | ||
| Client: gsmClient, | ||
| } | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check for GSM client Close() calls and related cleanup patterns
rg "gsm.*Client.*Close|\.Client\.Close|gsmConfig.*Close" --type go -B 2 -A 2Repository: openshift/ci-tools
Length of output: 655
🏁 Script executed:
#!/bin/bash
# Also check what happens to gsmConfig after it's returned from this function
# Look at the function signature and how gsmConfig is threaded through
cd /root/workspace && grep -n "gsmConfig" pkg/defaults/defaults.go | head -20Repository: openshift/ci-tools
Length of output: 127
🏁 Script executed:
#!/bin/bash
# Check if there's a cleanup/defer pattern for the returned config in calling code
rg "gsmConfig|GSMConfiguration" --type go -l | head -10Repository: openshift/ci-tools
Length of output: 259
🏁 Script executed:
#!/bin/bash
# Get the function signature and see what's returned
rg "func.*\(" pkg/defaults/defaults.go -A 200 | grep -B 150 "GSMConfiguration" | head -100Repository: openshift/ci-tools
Length of output: 4008
🏁 Script executed:
#!/bin/bash
# Look at the complete context around lines 128-149 including function signature
sed -n '1,200p' pkg/defaults/defaults.go | tail -100Repository: openshift/ci-tools
Length of output: 4223
🏁 Script executed:
#!/bin/bash
# Check where gsmConfig is returned and how it's handled
rg "GSMConfiguration" pkg/defaults/defaults.go -B 5 -A 5Repository: openshift/ci-tools
Length of output: 1616
🏁 Script executed:
#!/bin/bash
# Check the multi_stage.GSMConfiguration struct definition for cleanup methods
rg "type GSMConfiguration" pkg/steps/multi_stage/ -A 10Repository: openshift/ci-tools
Length of output: 875
🏁 Script executed:
#!/bin/bash
# Check if GSMConfiguration has a Close() or cleanup method
rg "func.*GSMConfiguration.*Close|func \(.*\*GSMConfiguration\).*Close|func \(.*GSMConfiguration\).*Close" pkg/ --type goRepository: openshift/ci-tools
Length of output: 44
🏁 Script executed:
#!/bin/bash
# Check how gsmConfig is used in step creation and if there's cleanup
rg "gsmConfig" pkg/steps/multi_stage/ --type go -B 2 -A 2 | head -80Repository: openshift/ci-tools
Length of output: 2575
🏁 Script executed:
#!/bin/bash
# Look for any cleanup/close patterns in multi_stage step execution
rg "type.*Step.*struct|func.*Close|defer.*Close" pkg/steps/multi_stage/multi_stage.go -B 1 -A 1 | head -50Repository: openshift/ci-tools
Length of output: 120
🏁 Script executed:
#!/bin/bash
# Check what the FromConfig return value is used for and if there's cleanup
rg "FromConfig.*\(" pkg/ --type go -B 1 -A 5 | head -60Repository: openshift/ci-tools
Length of output: 3536
🏁 Script executed:
#!/bin/bash
# Look for job execution and cleanup patterns
rg "api.Step.*error\|struct.*api.Step\|Job.*Clean|Cleanup|defer" pkg/defaults/ --type go | head -20Repository: openshift/ci-tools
Length of output: 169
🏁 Script executed:
#!/bin/bash
# Check if the gsmConfig client is ever closed anywhere after step creation
rg "gsmConfig.*nil\|gsmConfig = nil\|Client.Close" pkg/steps/multi_stage/ --type goRepository: openshift/ci-tools
Length of output: 44
🏁 Script executed:
#!/bin/bash
# Check how jobs are executed and if there's cleanup of returned steps
rg "FromConfig\(" cmd/ --type go -B 2 -A 10 | head -80Repository: openshift/ci-tools
Length of output: 5287
🏁 Script executed:
#!/bin/bash
# Look for job execution context and cleanup patterns
rg "executeJobTillCompletion|runJob|Run\(.*ctx\|defer.*close|defer.*Close" cmd/ci-operator/ --type go -B 2 -A 2 | head -60Repository: openshift/ci-tools
Length of output: 44
🏁 Script executed:
#!/bin/bash
# Check if secretmanager.Client needs to be closed
web_search "google cloud secretmanager client go Close"Repository: openshift/ci-tools
Length of output: 112
🌐 Web query:
Google Cloud Secret Manager client Go Close method cleanup
💡 Result:
In the Google Cloud Secret Manager Go client (cloud.google.com/go/secretmanager/apiv1), (*Client).Close() is the cleanup call that closes the underlying gRPC connection(s) to the API service and releases related background resources; you should call it when the client is no longer needed (typically defer client.Close() right after NewClient). [1][2]
Notes:
NewClientreturns a client that “must be Closed” to clean up underlying connections. [2]- Clients are intended to be reused (safe for concurrent use), so don’t create one per request—create once and close once at shutdown. [1]
Sources:
[1] pkg.go.dev – cloud.google.com/go/secretmanager/apiv1 (Client usage + Close docs)
[2] Google Cloud Go reference – Secret Manager NewClient / Close docs
🏁 Script executed:
#!/bin/bash
# Check how steps are executed and if there's opportunity for cleanup
rg "for.*range.*buildSteps\|for.*range.*promotionSteps\|step.Run\(" cmd/ci-operator/ --type go -B 2 -A 5 | head -80Repository: openshift/ci-tools
Length of output: 44
🏁 Script executed:
#!/bin/bash
# Look for deferred cleanup of steps or context cancellation
rg "defer.*cancel\|defer.*func\|cleanup\|Close" cmd/ci-operator/main.go -B 2 -A 2 | head -50Repository: openshift/ci-tools
Length of output: 44
Resource leak: GSM client created in FromConfig must be explicitly closed.
The secretmanager.Client created at line 139 requires explicit cleanup via defer Close() to release underlying gRPC connections and background resources. Currently, the client is stored in gsmConfig but never closed within the FromConfig function or at any visible cleanup point.
The error handling improvement is good, but add defer gsmClient.Close() immediately after successful client initialization to prevent connection leaks.
🤖 Prompt for AI Agents
In `@pkg/defaults/defaults.go` around lines 128 - 149, The GSM
secretmanager.Client created by secretmanager.NewClient in FromConfig (variable
gsmClient) is never closed; immediately after successfully creating gsmClient
(right after the secretmanager.NewClient call) add a defer gsmClient.Close() so
the gRPC connections and background resources are released when FromConfig
returns; ensure this cleanup is placed before assigning gsmClient into gsmConfig
to guarantee the client is closed if FromConfig exits early due to an error.
| // (Crucial) Resolve credential references (bundles, auto-discovery, explicit fields) BEFORE creating SPCs | ||
| // This expands bundle references and auto-discovers fields for each step | ||
| allSteps := append(append(s.pre, s.test...), s.post...) | ||
| for i := range allSteps { | ||
| step := &allSteps[i] | ||
| resolvedCredentials, err := ResolveCredentialReferences( | ||
| ctx, | ||
| step.Credentials, | ||
| s.gsm.Config, | ||
| s.gsm.Client, | ||
| s.gsm.ProjectConfig, | ||
| ) | ||
| if err != nil { | ||
| return fmt.Errorf("failed to resolve credentials for step %s: %w", step.As, err) | ||
| } | ||
| if err := ValidateNoGroupCollisionsOnMountPath(resolvedCredentials); err != nil { | ||
| return fmt.Errorf("invalid credentials for step %s: %w", step.As, err) | ||
| } | ||
| step.Credentials = resolvedCredentials | ||
| } | ||
|
|
||
| logrus.Infof("Creating SPCs for actual credential usage...") | ||
| // Create grouped SPCs for actual credential usage | ||
| for _, step := range append(s.pre, append(s.test, s.post...)...) { | ||
| for _, step := range allSteps { | ||
| collectionMountGroups := groupCredentialsByCollectionAndMountPath(step.Credentials) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's check the file structure and read the code at lines 91-115
cat -n pkg/steps/multi_stage/init.go | sed -n '85,120p'Repository: openshift/ci-tools
Length of output: 1875
🏁 Script executed:
# Look at the rest of the createSPCs function and related methods
cat -n pkg/steps/multi_stage/init.go | sed -n '88,180p'Repository: openshift/ci-tools
Length of output: 4408
🏁 Script executed:
# Search for where createSPCs is called
rg "createSPCs" pkg/steps/multi_stage/init.go -B 2 -A 2Repository: openshift/ci-tools
Length of output: 461
🏁 Script executed:
# Search for calls to createSPCs across the package
rg "createSPCs" pkg/steps/multi_stage/ -B 3 -A 3Repository: openshift/ci-tools
Length of output: 2478
🏁 Script executed:
# Look at multi_stage.go around the createSPCs call
rg -B 10 "createSPCs" pkg/steps/multi_stage/multi_stage.go | head -30Repository: openshift/ci-tools
Length of output: 847
🏁 Script executed:
# Look at the full execution flow in multi_stage.go
rg -B 5 -A 30 "func \(s \*multiStageTestStep\) Run\(" pkg/steps/multi_stage/multi_stage.go | head -70Repository: openshift/ci-tools
Length of output: 1148
🏁 Script executed:
# Look at what happens after createSPCs in the run method
rg -A 100 "if err := s.createSPCs" pkg/steps/multi_stage/multi_stage.go | head -120Repository: openshift/ci-tools
Length of output: 4202
🏁 Script executed:
# Find addCredentialsToCensoring to see what it does
rg -B 5 -A 30 "func.*addCredentialsToCensoring" pkg/steps/multi_stage/Repository: openshift/ci-tools
Length of output: 2742
🏁 Script executed:
# Check the struct definition for api.LiteralTestStep to understand if it's a value or pointer type
rg "type LiteralTestStep struct" pkg/api/ -A 20Repository: openshift/ci-tools
Length of output: 1609
🏁 Script executed:
# Check what ResolveCredentialReferences does and whether it changes credential.Name
rg -B 5 -A 30 "func ResolveCredentialReferences" pkg/Repository: openshift/ci-tools
Length of output: 2972
🏁 Script executed:
# Let's look at the full addCredentialsToCensoring function
rg -A 50 "func \(s \*multiStageTestStep\) addCredentialsToCensoring" pkg/steps/multi_stage/multi_stage.goRepository: openshift/ci-tools
Length of output: 1867
🏁 Script executed:
# Now let me verify the Go slice behavior by checking how the credential resolution is actually used
# Let's see if there's any other place that reads from s.pre/s.test/s.post after createSPCs
rg "s\.pre|s\.test|s\.post" pkg/steps/multi_stage/multi_stage.go -n | head -40Repository: openshift/ci-tools
Length of output: 530
🏁 Script executed:
# Let's understand Go slice append behavior and verify the issue
# Create a test to see if modifications to appended slices persist
cat > /tmp/test_slice.go << 'EOF'
package main
import "fmt"
type Cred struct {
Name string
}
type Step struct {
As string
Cred Cred
}
func main() {
// Similar to s.pre, s.test, s.post
pre := []Step{{As: "pre-step", Cred: Cred{Name: "pre-cred"}}}
test := []Step{{As: "test-step", Cred: Cred{Name: "test-cred"}}}
post := []Step{{As: "post-step", Cred: Cred{Name: "post-cred"}}}
// Similar to what createSPCs does
allSteps := append(append(pre, test...), post...)
// Modify through allSteps
allSteps[0].Cred.Name = "modified-pre"
allSteps[1].Cred.Name = "modified-test"
allSteps[2].Cred.Name = "modified-post"
// Check if original slices were modified
fmt.Println("Original pre:", pre[0].Cred.Name)
fmt.Println("Original test:", test[0].Cred.Name)
fmt.Println("Original post:", post[0].Cred.Name)
}
EOF
go run /tmp/test_slice.goRepository: openshift/ci-tools
Length of output: 134
🏁 Script executed:
# Let's verify the credential usage in addCredentialsToCensoring more carefully
# by checking if the loop reads credential.Name or credential.Collection
rg -B 2 -A 8 "for _, step := range append" pkg/steps/multi_stage/multi_stage.go | grep -A 8 "addCredentialsToCensoring"Repository: openshift/ci-tools
Length of output: 44
🏁 Script executed:
# Let me verify the exact issue more directly - check if modifications
# to appended slices persist with a more precise test matching the actual code pattern
cat > /tmp/test_append.go << 'EOF'
package main
import "fmt"
type CredentialReference struct {
Name string
}
type LiteralTestStep struct {
As string
Credentials []CredentialReference
}
func main() {
pre := []LiteralTestStep{{As: "pre-step", Credentials: []CredentialReference{{Name: "pre-cred"}}}}
test := []LiteralTestStep{{As: "test-step", Credentials: []CredentialReference{{Name: "test-cred"}}}}
post := []LiteralTestStep{{As: "post-step", Credentials: []CredentialReference{{Name: "post-cred"}}}}
// What createSPCs does
allSteps := append(append(pre, test...), post...)
// Modify credentials in allSteps
allSteps[0].Credentials = []CredentialReference{{Name: "resolved-pre"}}
allSteps[1].Credentials = []CredentialReference{{Name: "resolved-test"}}
allSteps[2].Credentials = []CredentialReference{{Name: "resolved-post"}}
// Check original slices
fmt.Println("pre[0].Credentials[0].Name:", pre[0].Credentials[0].Name)
fmt.Println("test[0].Credentials[0].Name:", test[0].Credentials[0].Name)
fmt.Println("post[0].Credentials[0].Name:", post[0].Credentials[0].Name)
// What addCredentialsToCensoring does - reads from s.pre/s.test/s.post
fmt.Println("\nReading from original slices later:")
for _, step := range append(append(pre, test...), post...) {
fmt.Println("Credentials in step:", step.Credentials[0].Name)
}
}
EOF
go run /tmp/test_append.goRepository: openshift/ci-tools
Length of output: 305
🏁 Script executed:
# Read the full ResolveCredentialReferences function
sed -n '/^func ResolveCredentialReferences/,/^func [a-zA-Z]/p' pkg/steps/multi_stage/gsm_bundle_resolver.go | head -100Repository: openshift/ci-tools
Length of output: 3072
Resolved credentials aren't persisted back to s.pre/s.test/s.post.
ResolveCredentialReferences updates only the derived allSteps slice. Since append creates value copies, modifications don't reach the original slices. Later, addCredentialsToCensoring reads from s.pre/s.test/s.post directly and constructs SPC names using original credentials. Because resolved credentials don't populate the Name field, the SPC names created in createSPCs (lines 130–148, using allSteps) differ from those referenced in addCredentialsToCensoring (using original slices), causing a mismatch.
Persist resolved credentials to the actual step slices before building SPCs, or refactor to consistently use a single resolved credential collection throughout.
🤖 Prompt for AI Agents
In `@pkg/steps/multi_stage/init.go` around lines 91 - 115, The resolved
credentials from ResolveCredentialReferences are only written into the local
allSteps copy, so s.pre/s.test/s.post remain unchanged and later
addCredentialsToCensoring reads stale credentials causing SPC name mismatches;
update the real slices after resolving by writing resolvedCredentials back into
the corresponding original step entries (s.pre, s.test, s.post) or refactor to
only use the single resolved allSteps for subsequent operations (createSPCs,
addCredentialsToCensoring, groupCredentialsByCollectionAndMountPath) so SPC
creation and censoring reference the same credential objects.
|
Scheduling required tests: |
|
@psalajova: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
This PR implements support for the 3-level GSM secret naming structure (
collection__group__field) in ci-operator, as per the design document.Changes
3-level hierarchy
Secrets now use
collection__group__fieldnaming instead ofcollection__name:Before:
After:
Supported credential resolution modes
There are three ways one can reference credentials:
gsm-config.yaml):Implementation details
gsm-config.yamlavailable at all times, as this is critical for bundle resolution.Jira: https://issues.redhat.com/browse/DPTP-4656