Skip to content

test(e2e): improve test reliability and fix silent failures#730

Open
skevetter wants to merge 6 commits intomainfrom
e2e-test-audit
Open

test(e2e): improve test reliability and fix silent failures#730
skevetter wants to merge 6 commits intomainfrom
e2e-test-audit

Conversation

@skevetter
Copy link
Copy Markdown
Owner

@skevetter skevetter commented Apr 13, 2026

Summary

  • Replace context.Background() with Ginkgo-managed cleanup contexts to prevent goroutine leaks
  • Add SpecTimeout to provider tests and fix cleanup issues
  • Fix goroutine leaks and improve test safety across the e2e suite
  • Harden test suite reliability and fix silent failures

Summary by CodeRabbit

Release Notes

  • Tests
    • Added explicit per-spec and env-var timeouts to improve e2e stability
    • Improved context propagation for test operations and cleanup callbacks
    • Hardened test setup and teardown with stronger error reporting and cleanup tracking
    • Replaced ad-hoc cleanup/teardown patterns with framework-managed cleanup hooks
    • Consolidated Docker provider setup into a single reusable helper to reduce duplication

Fix critical issues in the e2e test infrastructure:
- Add 30s timeout to agent URL wait loop to prevent infinite hangs
- Propagate errors in SetupGPG instead of silently returning nil
- Use dedicated http.ServeMux in ServeAgent to avoid global state conflicts
- Add explicit timeouts/polling to all Eventually assertions missing them
- Replace bare defer cleanup with ginkgo.DeferCleanup for timeout safety
- Remove unnecessary 3s sleep in SSH port forwarding test
- Consolidate duplicate setupDockerProvider into framework.SetupDockerProvider
- Delete 47 lines of commented-out GPG forwarding test code
- Fix "nachine" typos and apt-get install spacing in error messages
…nd add SpecTimeout

DeferCleanup callbacks now accept context.Context to use Ginkgo's
cancellable cleanup context instead of context.Background(). Added
SpecTimeout to test specs that were missing timeout protection.
Updated DevPodMachineCreate/Delete to accept context parameter.
- Add SpecTimeout to 10 It blocks in build.go and ssh.go that lacked
  timeout protection
- Fix goroutine leak in ssh.go port forwarding test by tracking with
  sync.WaitGroup and registering DeferCleanup
- Fix temp dir leak in ide.go by registering DeferCleanup immediately
  after CopyToTempDir
- Fix wsl.go server lifecycle: register DeferCleanup(server.Close)
  immediately after creation, remove redundant explicit Close()
- Add env var validation in up_private_token.go (skip if unset) and
  register credential file cleanup before writing
- Add gomega empty-array assertions in context.go before IDE list loops
  to prevent false negatives on empty results
- Add SpecTimeout to 11 It blocks in provider.go missing timeout
  protection
- Remove duplicate DeferCleanup in machineprovider.go that would
  delete the workspace twice
- Standardize SpecTimeout position in machineprovider.go (move from
  trailing to inline with It declaration)
- Remove bare defer in provider_podman.go BeforeEach
- Fix DeferCleanup ordering in provider_docker.go so Stop runs before
  Remove (LIFO order)
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 13, 2026

📝 Walkthrough

Walkthrough

Added per-spec timeouts and context propagation across e2e tests, consolidated Docker provider setup into a new framework helper, improved error handling in GPG setup, replaced DefaultServeMux usage with a dedicated HTTP mux, and refactored SSH test goroutine and cleanup handling.

Changes

Cohort / File(s) Summary
Framework Core Updates
e2e/e2e_suite_test.go, e2e/framework/command.go, e2e/framework/server_utils.go
Added 30s env-var startup timeout; DevPodMachineCreate/DevPodMachineDelete now accept context.Context; added SetupDockerProvider(binDir, dockerPath) (*Framework, error); improved SetupGPG error returns; ServeAgent uses dedicated http.ServeMux and serves with that mux.
Build & Feature Tests
e2e/tests/build/build.go, e2e/tests/up-features/up_features.go, e2e/tests/up-features/wsl.go
Wrapped many specs with ginkgo.SpecTimeout(framework.GetTimeout()); removed context.Background() from DeferCleanup workspace deletions; registered HTTP server cleanup via ginkgo.DeferCleanup.
Provider, Machine & Context Tests
e2e/tests/provider/provider.go, e2e/tests/machine/machine.go, e2e/tests/machineprovider/machineprovider.go, e2e/tests/context/context.go
Added spec-level timeouts; tests now pass ctx into machine provider calls; added gomega assertions and IDE constant in context tests; restructured ginkgo.It bodies to include timeouts.
Integration, SSH & IDE Tests
e2e/tests/integration/integration.go, e2e/tests/ssh/ssh.go, e2e/tests/ide/ide.go, e2e/tests/upgrade/upgrade.go, e2e/tests/dockerinstall/dockerinstall.go
Applied spec timeouts; replaced raw goroutines with sync.WaitGroup in SSH port-forwarding; removed fixed sleeps; switched file removals to ginkgo.DeferCleanup; added extra temp-dir cleanup in IDE tests.
Docker Compose & Eventually Assertions
e2e/tests/up-docker-compose/build.go, e2e/tests/up-docker-compose/helper.go, e2e/tests/up/provider_docker.go
Configured Gomega Eventually calls with explicit WithTimeout/WithPolling; delegate docker provider setup to framework.SetupDockerProvider.
Up Test Helpers & Cleanup Propagation
e2e/tests/up/helper.go, e2e/tests/up/git_repositories.go, e2e/tests/up/handles_errors.go, e2e/tests/up/provider_kubernetes.go, e2e/tests/up/provider_podman.go, e2e/tests/up/up.go, e2e/tests/up/up_private_token.go
Consolidated Docker provider setup via new helper; updated many ginkgo.DeferCleanup callbacks to accept cleanupCtx context.Context and pass it to DevPodProviderDelete; removed context.Background() where appropriate; improved credential and wrapper file cleanup handling.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main objective: improving test reliability and fixing silent failures through context management, timeouts, and cleanup patterns across the e2e test suite.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch e2e-test-audit

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

DeferCleanup callbacks execute in LIFO order. The previous ordering
registered Stop first and Remove second, so Remove ran before Stop,
causing "container is running" errors during cleanup.
@skevetter skevetter marked this pull request as ready for review April 13, 2026 06:27
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
e2e/framework/command.go (1)

326-356: ⚠️ Potential issue | 🟠 Major

Fix shell variable expansion in gpg-agent setup.

On line 350, $HOME is passed as a literal string to exec.Command since it does not invoke a shell and will not expand environment variables. The argument --homedir "$HOME/.gnupg" will be interpreted by gpg-agent as a literal path, not as the user's home directory.

Suggested fix
+	homeDir := os.Getenv("HOME")
-	if err := exec.Command("gpg-agent", "--homedir", "$HOME/.gnupg", "--use-standard-socket", "--daemon").
+	if err := exec.Command("gpg-agent", "--homedir", filepath.Join(homeDir, ".gnupg"), "--use-standard-socket", "--daemon").
 		Run(); err != nil {
 		return fmt.Errorf("failed to start gpg-agent: %w", err)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@e2e/framework/command.go` around lines 326 - 356, The SetupGPG function
passes "$HOME/.gnupg" literally to exec.Command for gpg-agent, so the shell
variable isn't expanded; change the gpg-agent exec.Command invocation in
SetupGPG to compute the homedir path (e.g. via os.UserHomeDir() or
os.Getenv("HOME")) and pass "--homedir" with the resolved "<home>/.gnupg" string
as the argument (replace the literal "$HOME/.gnupg" in the exec.Command call),
keeping the other flags ("--use-standard-socket", "--daemon") unchanged and
returning any errors as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@e2e/framework/command.go`:
- Around line 460-467: SetupDockerProvider currently discards the error from
DevPodProviderAdd which hides real failures; modify SetupDockerProvider to
capture the error returned by f.DevPodProviderAdd(context.Background(),
"docker", "-o", "DOCKER_PATH="+dockerPath), check if the error is non-nil and
not an expected "already exists" condition, and return that error (or wrap it)
instead of continuing to call f.DevPodProviderUse; keep the existing
DevPodProviderDelete call (it may remain ignored) and ensure return values still
call NewDefaultFramework and DevPodProviderUse only after successful
DevPodProviderAdd.

In `@e2e/tests/context/context.go`:
- Around line 60-69: The DeferCleanup calls are inconsistent: the first ignores
errors from f.DevPodContextDelete(cleanupCtx, contextA) while the second asserts
with framework.ExpectNoError for contextB; make them consistent by either (A)
ignoring both cleanup errors (drop framework.ExpectNoError in the contextB
cleanup) or (B) asserting both (capture the return from
f.DevPodContextDelete(cleanupCtx, contextA) into err and call
framework.ExpectNoError(err) like the contextB block). Update the DeferCleanup
that references f.DevPodContextDelete(cleanupCtx, contextA) (and/or the one for
contextB) accordingly so both cleanup handlers use the same error-handling
approach.

In `@e2e/tests/machine/machine.go`:
- Around line 72-78: Add a teardown that removes the added provider so state is
not leaked: after calling f.DevPodProviderAdd(...) and f.DevPodProviderUse(...),
register a cleanup (e.g., framework.AddCleanupAction or ginkgo.AfterEach for
that spec) which calls the provider removal method (use the corresponding method
on f such as DevPodProviderRemove/DevPodProviderDelete with ctx and
"mock-provider") and asserts no error (framework.ExpectNoError). Ensure the
cleanup runs regardless of test outcome so the "mock-provider" is always
removed.

In `@e2e/tests/machineprovider/machineprovider.go`:
- Around line 47-50: The provider cleanup is currently tied to workspace
teardown (DevPodWorkspaceDelete) or missing entirely; register provider deletion
as its own ginkgo.DeferCleanup immediately after the DevPodProviderAdd call so
the provider is always cleaned regardless of workspace deletion result.
Specifically, after the call to DevPodProviderAdd in each spec, add a separate
ginkgo.DeferCleanup that calls the corresponding provider delete helper (e.g.,
DevPodProviderDelete or the existing provider-delete helper) with the cleanup
context and provider identifier; do this for both occurrences (the one around
lines 47-50 and the other around 116-121) to ensure provider state is cleaned
independently of DevPodWorkspaceDelete.

In `@e2e/tests/up/git_repositories.go`:
- Around line 68-71: Register the cleanup for the test provider before calling
DevPodProviderUse so the provider is removed even if DevPodProviderUse fails;
move the ginkgo.DeferCleanup block that calls f.DevPodProviderDelete(cleanupCtx,
providerName) to immediately after creating/defining providerName (and before
calling f.DevPodProviderUse), ensuring the cleanup is registered using
ginkgo.DeferCleanup with the same providerName and context so the provider is
always deleted on teardown.

In `@e2e/tests/up/handles_errors.go`:
- Around line 71-73: The cleanup currently swallows errors from
DevPodProviderDelete in the ginkgo.DeferCleanup callbacks; change both callbacks
to capture the returned error (err := f.DevPodProviderDelete(cleanupCtx,
"docker")) and, if non-nil, report it (e.g., call
ginkgo.Fail(fmt.Sprintf("DevPodProviderDelete failed: %v", err)) or use the test
logger to record the error) so teardown failures are surfaced instead of
ignored; update both instances that currently use "_ =
f.DevPodProviderDelete(...)" and ensure the error path logs/fails with the error
details.

In `@e2e/tests/up/up_private_token.go`:
- Around line 35-37: The current cleanup unconditionally removes
~/.git-credentials; change it to snapshot and restore instead: before any test
writes, set credentialPath := filepath.Join(os.Getenv("HOME"),
".git-credentials") then detect existing content by checking os.Stat/ReadFile
and store the original bytes and a boolean like originalExists; register
ginkgo.DeferCleanup that if originalExists writes the original bytes back (using
os.WriteFile) and if not exists then removes the file (os.Remove); ensure any
write path in the test sets a createdByTest flag or relies on originalExists to
avoid deleting pre-existing credentials.

In `@e2e/tests/up/up.go`:
- Around line 159-162: The cleanup for provider deletion is registered too
late—after DevPodProviderUse succeeds—so if DevPodProviderUse fails the provider
(e.g., providerName/test-docker) leaks; move the ginkgo.DeferCleanup
registration that calls f.DevPodProviderDelete(cleanupCtx, providerName) to
before the call to f.DevPodProviderUse so deletion is guaranteed regardless of
DevPodProviderUse outcome, keeping the same cleanup function and providerName
variables; apply the same change to the other occurrences that register cleanup
after DevPodProviderUse (the blocks around lines referenced: 159-162, 206-209,
232-235, 271-274).

---

Outside diff comments:
In `@e2e/framework/command.go`:
- Around line 326-356: The SetupGPG function passes "$HOME/.gnupg" literally to
exec.Command for gpg-agent, so the shell variable isn't expanded; change the
gpg-agent exec.Command invocation in SetupGPG to compute the homedir path (e.g.
via os.UserHomeDir() or os.Getenv("HOME")) and pass "--homedir" with the
resolved "<home>/.gnupg" string as the argument (replace the literal
"$HOME/.gnupg" in the exec.Command call), keeping the other flags
("--use-standard-socket", "--daemon") unchanged and returning any errors as
before.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d693ab82-e23b-4e12-bd46-57270a38ce0f

📥 Commits

Reviewing files that changed from the base of the PR and between 2613a0f and f230f5d.

📒 Files selected for processing (26)
  • e2e/e2e_suite_test.go
  • e2e/framework/command.go
  • e2e/framework/server_utils.go
  • e2e/tests/build/build.go
  • e2e/tests/context/context.go
  • e2e/tests/dockerinstall/dockerinstall.go
  • e2e/tests/ide/ide.go
  • e2e/tests/integration/integration.go
  • e2e/tests/machine/machine.go
  • e2e/tests/machineprovider/machineprovider.go
  • e2e/tests/provider/provider.go
  • e2e/tests/ssh/ssh.go
  • e2e/tests/up-docker-compose/build.go
  • e2e/tests/up-docker-compose/helper.go
  • e2e/tests/up-features/helper.go
  • e2e/tests/up-features/up_features.go
  • e2e/tests/up-features/wsl.go
  • e2e/tests/up/git_repositories.go
  • e2e/tests/up/handles_errors.go
  • e2e/tests/up/helper.go
  • e2e/tests/up/provider_docker.go
  • e2e/tests/up/provider_kubernetes.go
  • e2e/tests/up/provider_podman.go
  • e2e/tests/up/up.go
  • e2e/tests/up/up_private_token.go
  • e2e/tests/upgrade/upgrade.go

Comment on lines +60 to +69
ginkgo.DeferCleanup(func(cleanupCtx context.Context) {
_ = f.DevPodContextDelete(cleanupCtx, contextA)
})

err = f.DevPodContextCreate(ctx, contextB)
framework.ExpectNoError(err)
ginkgo.DeferCleanup(func(cleanupCtx context.Context) {
err = f.DevPodContextDelete(cleanupCtx, contextB)
err = f.DevPodContextCreate(ctx, contextB)
framework.ExpectNoError(err)
})
ginkgo.DeferCleanup(func(cleanupCtx context.Context) {
err = f.DevPodContextDelete(cleanupCtx, contextB)
framework.ExpectNoError(err)
})
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Inconsistent error handling between DeferCleanup blocks.

Line 61 silently ignores the cleanup error for contextA, while lines 67-68 check the error for contextB. This inconsistency is confusing and could mask cleanup failures for contextA.

Suggested fix - either ignore both or check both

If cleanup errors should be ignored (common for cleanup code):

 ginkgo.DeferCleanup(func(cleanupCtx context.Context) {
-    err = f.DevPodContextDelete(cleanupCtx, contextB)
-    framework.ExpectNoError(err)
+    _ = f.DevPodContextDelete(cleanupCtx, contextB)
 })

Or if cleanup errors should be checked:

 ginkgo.DeferCleanup(func(cleanupCtx context.Context) {
-    _ = f.DevPodContextDelete(cleanupCtx, contextA)
+    cleanupErr := f.DevPodContextDelete(cleanupCtx, contextA)
+    framework.ExpectNoError(cleanupErr)
 })
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
ginkgo.DeferCleanup(func(cleanupCtx context.Context) {
_ = f.DevPodContextDelete(cleanupCtx, contextA)
})
err = f.DevPodContextCreate(ctx, contextB)
framework.ExpectNoError(err)
ginkgo.DeferCleanup(func(cleanupCtx context.Context) {
err = f.DevPodContextDelete(cleanupCtx, contextB)
err = f.DevPodContextCreate(ctx, contextB)
framework.ExpectNoError(err)
})
ginkgo.DeferCleanup(func(cleanupCtx context.Context) {
err = f.DevPodContextDelete(cleanupCtx, contextB)
framework.ExpectNoError(err)
})
ginkgo.DeferCleanup(func(cleanupCtx context.Context) {
_ = f.DevPodContextDelete(cleanupCtx, contextA)
})
err = f.DevPodContextCreate(ctx, contextB)
framework.ExpectNoError(err)
ginkgo.DeferCleanup(func(cleanupCtx context.Context) {
_ = f.DevPodContextDelete(cleanupCtx, contextB)
})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@e2e/tests/context/context.go` around lines 60 - 69, The DeferCleanup calls
are inconsistent: the first ignores errors from
f.DevPodContextDelete(cleanupCtx, contextA) while the second asserts with
framework.ExpectNoError for contextB; make them consistent by either (A)
ignoring both cleanup errors (drop framework.ExpectNoError in the contextB
cleanup) or (B) asserting both (capture the return from
f.DevPodContextDelete(cleanupCtx, contextA) into err and call
framework.ExpectNoError(err) like the contextB block). Update the DeferCleanup
that references f.DevPodContextDelete(cleanupCtx, contextA) (and/or the one for
contextB) accordingly so both cleanup handlers use the same error-handling
approach.

Comment on lines +47 to +50
ginkgo.DeferCleanup(func(cleanupCtx context.Context) {
err = f.DevPodWorkspaceDelete(cleanupCtx, tempDir)
framework.ExpectNoError(err)
})
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Clean provider state independently from workspace teardown.

The first spec only registers workspace deletion, so the provider created above is never cleaned up. In the second spec, provider deletion is chained after workspace deletion in the same cleanup closure, so a failed workspace delete skips provider cleanup too. Register provider deletion as its own DeferCleanup immediately after DevPodProviderAdd.

Also applies to: 116-121

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@e2e/tests/machineprovider/machineprovider.go` around lines 47 - 50, The
provider cleanup is currently tied to workspace teardown (DevPodWorkspaceDelete)
or missing entirely; register provider deletion as its own ginkgo.DeferCleanup
immediately after the DevPodProviderAdd call so the provider is always cleaned
regardless of workspace deletion result. Specifically, after the call to
DevPodProviderAdd in each spec, add a separate ginkgo.DeferCleanup that calls
the corresponding provider delete helper (e.g., DevPodProviderDelete or the
existing provider-delete helper) with the cleanup context and provider
identifier; do this for both occurrences (the one around lines 47-50 and the
other around 116-121) to ensure provider state is cleaned independently of
DevPodWorkspaceDelete.

Comment on lines +68 to 71
ginkgo.DeferCleanup(func(cleanupCtx context.Context) {
err := f.DevPodProviderDelete(cleanupCtx, providerName)
framework.ExpectNoError(err)
})
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Register provider teardown before DevPodProviderUse.

This cleanup is only registered after the use call above. If DevPodProviderUse fails, test-docker is left behind and can poison later specs.

Suggested fix
 err := f.DevPodProviderAdd(ctx, "docker", "--name", providerName)
 framework.ExpectNoError(err)
+ginkgo.DeferCleanup(func(cleanupCtx context.Context) {
+	err := f.DevPodProviderDelete(cleanupCtx, providerName)
+	framework.ExpectNoError(err)
+})
 err = f.DevPodProviderUse(ctx, providerName)
 framework.ExpectNoError(err)
-ginkgo.DeferCleanup(func(cleanupCtx context.Context) {
-	err := f.DevPodProviderDelete(cleanupCtx, providerName)
-	framework.ExpectNoError(err)
-})
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
ginkgo.DeferCleanup(func(cleanupCtx context.Context) {
err := f.DevPodProviderDelete(cleanupCtx, providerName)
framework.ExpectNoError(err)
})
err := f.DevPodProviderAdd(ctx, "docker", "--name", providerName)
framework.ExpectNoError(err)
ginkgo.DeferCleanup(func(cleanupCtx context.Context) {
err := f.DevPodProviderDelete(cleanupCtx, providerName)
framework.ExpectNoError(err)
})
err = f.DevPodProviderUse(ctx, providerName)
framework.ExpectNoError(err)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@e2e/tests/up/git_repositories.go` around lines 68 - 71, Register the cleanup
for the test provider before calling DevPodProviderUse so the provider is
removed even if DevPodProviderUse fails; move the ginkgo.DeferCleanup block that
calls f.DevPodProviderDelete(cleanupCtx, providerName) to immediately after
creating/defining providerName (and before calling f.DevPodProviderUse),
ensuring the cleanup is registered using ginkgo.DeferCleanup with the same
providerName and context so the provider is always deleted on teardown.

Comment on lines +71 to 73
ginkgo.DeferCleanup(func(cleanupCtx context.Context) {
_ = f.DevPodProviderDelete(cleanupCtx, "docker")
})
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Stop swallowing cleanup failures in provider teardown.

Line 72 and Line 94 still ignore DevPodProviderDelete errors, which can hide teardown failures and create cross-test contamination.

Suggested fix
 ginkgo.DeferCleanup(func(cleanupCtx context.Context) {
-	_ = f.DevPodProviderDelete(cleanupCtx, "docker")
+	err := f.DevPodProviderDelete(cleanupCtx, "docker")
+	framework.ExpectNoError(err)
 })

Apply this to both cleanup callbacks.

Also applies to: 93-95

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@e2e/tests/up/handles_errors.go` around lines 71 - 73, The cleanup currently
swallows errors from DevPodProviderDelete in the ginkgo.DeferCleanup callbacks;
change both callbacks to capture the returned error (err :=
f.DevPodProviderDelete(cleanupCtx, "docker")) and, if non-nil, report it (e.g.,
call ginkgo.Fail(fmt.Sprintf("DevPodProviderDelete failed: %v", err)) or use the
test logger to record the error) so teardown failures are surfaced instead of
ignored; update both instances that currently use "_ =
f.DevPodProviderDelete(...)" and ensure the error path logs/fails with the error
details.

Comment on lines +35 to +37
// Register credential cleanup before writing to ensure cleanup on any failure
credentialPath := filepath.Join(os.Getenv("HOME"), ".git-credentials")
ginkgo.DeferCleanup(func() { _ = os.Remove(credentialPath) })
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Preserve pre-existing ~/.git-credentials instead of always deleting it.

This cleanup deletes any existing user credentials file when tests run locally. Please snapshot and restore existing content (or remove only if test created it).

Proposed fix
-		credentialPath := filepath.Join(os.Getenv("HOME"), ".git-credentials")
-		ginkgo.DeferCleanup(func() { _ = os.Remove(credentialPath) })
+		credentialPath := filepath.Join(os.Getenv("HOME"), ".git-credentials")
+		originalCreds, readErr := os.ReadFile(credentialPath)
+		hadOriginalCreds := readErr == nil
+		if readErr != nil && !os.IsNotExist(readErr) {
+			framework.ExpectNoError(readErr)
+		}
+		ginkgo.DeferCleanup(func() {
+			if hadOriginalCreds {
+				_ = os.WriteFile(credentialPath, originalCreds, 0o600)
+				return
+			}
+			_ = os.Remove(credentialPath)
+		})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@e2e/tests/up/up_private_token.go` around lines 35 - 37, The current cleanup
unconditionally removes ~/.git-credentials; change it to snapshot and restore
instead: before any test writes, set credentialPath :=
filepath.Join(os.Getenv("HOME"), ".git-credentials") then detect existing
content by checking os.Stat/ReadFile and store the original bytes and a boolean
like originalExists; register ginkgo.DeferCleanup that if originalExists writes
the original bytes back (using os.WriteFile) and if not exists then removes the
file (os.Remove); ensure any write path in the test sets a createdByTest flag or
relies on originalExists to avoid deleting pre-existing credentials.

Comment on lines +159 to 162
ginkgo.DeferCleanup(func(cleanupCtx context.Context) {
err := f.DevPodProviderDelete(cleanupCtx, providerName)
framework.ExpectNoError(err)
})
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Move provider cleanup registration ahead of DevPodProviderUse.

In all four specs, provider deletion is only registered after DevPodProviderUse succeeds. If that call fails, test-docker leaks and can affect the next case.

Suggested fix pattern
 err = f.DevPodProviderAdd(ctx, "docker", "--name", providerName)
 framework.ExpectNoError(err)
+ginkgo.DeferCleanup(func(cleanupCtx context.Context) {
+	err := f.DevPodProviderDelete(cleanupCtx, providerName)
+	framework.ExpectNoError(err)
+})
 err = f.DevPodProviderUse(ctx, providerName)
 framework.ExpectNoError(err)
-ginkgo.DeferCleanup(func(cleanupCtx context.Context) {
-	err := f.DevPodProviderDelete(cleanupCtx, providerName)
-	framework.ExpectNoError(err)
-})

Also applies to: 206-209, 232-235, 271-274

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@e2e/tests/up/up.go` around lines 159 - 162, The cleanup for provider deletion
is registered too late—after DevPodProviderUse succeeds—so if DevPodProviderUse
fails the provider (e.g., providerName/test-docker) leaks; move the
ginkgo.DeferCleanup registration that calls f.DevPodProviderDelete(cleanupCtx,
providerName) to before the call to f.DevPodProviderUse so deletion is
guaranteed regardless of DevPodProviderUse outcome, keeping the same cleanup
function and providerName variables; apply the same change to the other
occurrences that register cleanup after DevPodProviderUse (the blocks around
lines referenced: 159-162, 206-209, 232-235, 271-274).

…anup

- SetupDockerProvider now returns the error from DevPodProviderAdd
  instead of silently discarding it.
- Add missing DeferCleanup for mock-provider in the second machine spec,
  matching the pattern used in the first spec.
@skevetter skevetter changed the title fix(e2e): harden test suite reliability and fix silent failures test(e2e): improve test reliability and fix silent failures Apr 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant