Skip to content

Tilt dev env & Tests: Go tests load .env no longer requiring variables injection#4175

Merged
pfcoperez merged 24 commits intomainfrom
DBI-640/local-dev-env/e2e/automatic-envvars-injection
Apr 16, 2026
Merged

Tilt dev env & Tests: Go tests load .env no longer requiring variables injection#4175
pfcoperez merged 24 commits intomainfrom
DBI-640/local-dev-env/e2e/automatic-envvars-injection

Conversation

@pfcoperez
Copy link
Copy Markdown
Contributor

@pfcoperez pfcoperez commented Apr 14, 2026

This Pull Requests adds two main features to local development environment and tests:

  1. Explicitly loading environment variables to go tests is not longer required: flow/connectors and flow/e2e tests read all required environment variables from .env file at the project root directory. Remember that .env can be set-up using .env.example file. This removes the frictions across different dev OSs and make it possible to start debugging sessions without editing launching scripts or vscode tempaltes
image image
  1. Tilt improvements:
  • Test launchers don't clean-up go cache by default, thus reducing wait times between UI interactions. A button is added to explicitly remove them.
image
  • A resource has been added to start all the ancillary services required to run all tests, this lets get everything ready for any debug session with one click or command:
tilt --port 10352 trigger all-test-resources
image
  • Test launchers for connectors, API, Peer flow and switchboard e2e tests have been added.

Closes: https://linear.app/clickhouse/issue/DBI-640/make-e2e-and-integration-tests-pass-in-peerdb-tilt

@pfcoperez
Copy link
Copy Markdown
Contributor Author

@claude Fix Go linting issues

@claude
Copy link
Copy Markdown

claude bot commented Apr 14, 2026

Claude finished @pfcoperez's task in 7m 22s —— View job


Fixing Go Linting Issues

  • Gather context on PR changes
  • Identify Go linting issues
  • Fix env.go: sloglint (use *Context slog variants) + errcheck (godotenv.Load return value)
  • Fix env_test.go: usetesting (os.Chdirt.Chdir) + errcheck (unchecked os.Chdir in cleanup) + gosec G306 (file permissions 06440600)
  • Push changes ⚠️ Push failed — see note below

All fixes are committed locally (commit 1fdd61e) but the push failed with a 403 due to the GitHub App not having write permissions on this repo. Please cherry-pick or manually apply the changes below.

Changes made:

flow/internal/testutil/env.go:

  • Added context import
  • Replaced slog.Error/Info/Warn with slog.ErrorContext/InfoContext/WarnContext(context.Background(), ...) — fixes sloglint
  • Wrapped godotenv.Load(envPath) to check and log the error — fixes errcheck

flow/internal/testutil/env_test.go:

  • Replaced the os.Getwd() + t.Cleanup(func() { os.Chdir(originalDir) }) + require.NoError(t, os.Chdir(...)) pattern with just t.Chdir(...) in all 6 tests — fixes both errcheck and usetesting
  • Changed os.WriteFile permissions from 0o644 to 0o600 in all tests — fixes gosec G306
Full diff
diff --git a/flow/internal/testutil/env.go b/flow/internal/testutil/env.go
--- a/flow/internal/testutil/env.go
+++ b/flow/internal/testutil/env.go
@@ -1,6 +1,7 @@
 package testutil
 
 import (
+	"context"
 	"log/slog"
 	"os"
 	"path/filepath"
@@ -13,11 +14,11 @@ func LoadEnv() {
 	dir, err := os.Getwd()
 	if err != nil {
-		slog.Error("LoadEnv: failed to get working directory", "error", err)
+		slog.ErrorContext(context.Background(), "LoadEnv: failed to get working directory", "error", err)
 		return
 	}
-	slog.Info("LoadEnv: starting search", "cwd", dir)
+	slog.InfoContext(context.Background(), "LoadEnv: starting search", "cwd", dir)
 	...
 		if _, err := os.Stat(envPath); err == nil && filepath.Base(lastVisited) == "flow" {
-			slog.Info("LoadEnv: found .env", "path", envPath)
-			godotenv.Load(envPath)
+			slog.InfoContext(context.Background(), "LoadEnv: found .env", "path", envPath)
+			if err := godotenv.Load(envPath); err != nil {
+				slog.WarnContext(context.Background(), "LoadEnv: failed to load .env", "error", err)
+			}
 			return
 		}
-	slog.Warn("LoadEnv: no .env file found above flow directory")
+	slog.WarnContext(context.Background(), "LoadEnv: no .env file found above flow directory")

diff --git a/flow/internal/testutil/env_test.go b/flow/internal/testutil/env_test.go
 	// In each of the 6 test functions, replace:
-	originalDir, err := os.Getwd()
-	require.NoError(t, err)
-	t.Cleanup(func() { os.Chdir(originalDir) })
-	require.NoError(t, os.Chdir(xDir))
+	t.Chdir(xDir)

 	// And change os.WriteFile permissions:
-	os.WriteFile(envFile, ..., 0o644)
+	os.WriteFile(envFile, ..., 0o600)

@pfcoperez pfcoperez self-assigned this Apr 14, 2026
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 14, 2026

❌ 2 Tests Failed:

Tests completed Failed Passed Skipped
2197 2 2195 187
View the top 2 failed test(s) by shortest run time
github.com/PeerDB-io/peerdb/flow/e2e::TestApiPg
Stack Traces | 0.02s run time
=== RUN   TestApiPg
=== PAUSE TestApiPg
=== CONT  TestApiPg
--- FAIL: TestApiPg (0.02s)
github.com/PeerDB-io/peerdb/flow/e2e::TestApiPg/TestPostgresDestinationValidation_NonEmptyTableBlocksSnapshot
Stack Traces | 0.06s run time
=== RUN   TestApiPg/TestPostgresDestinationValidation_NonEmptyTableBlocksSnapshot
=== PAUSE TestApiPg/TestPostgresDestinationValidation_NonEmptyTableBlocksSnapshot
=== CONT  TestApiPg/TestPostgresDestinationValidation_NonEmptyTableBlocksSnapshot
2026/04/16 08:37:34 INFO Received AWS credentials from peer for connector: clickhouse x-peerdb-additional-metadata={Operation:FLOW_OPERATION_UNKNOWN}
2026/04/16 08:37:34 INFO Received AWS credentials from peer for connector: ci x-peerdb-additional-metadata={Operation:FLOW_OPERATION_UNKNOWN}
2026/04/16 08:37:34 INFO fetched schema x-peerdb-additional-metadata={Operation:FLOW_OPERATION_UNKNOWN} table=e2e_test_mych_osjkz7h6.test_exclude_ch
2026/04/16 08:37:34 INFO Received AWS credentials from peer for connector: clickhouse x-peerdb-additional-metadata={Operation:FLOW_OPERATION_UNKNOWN}
    api_test.go:605: 
        	Error Trace:	.../flow/e2e/congen.go:44
        	            				.../flow/e2e/pg.go:190
        	            				.../flow/e2e/pg.go:178
        	            				.../flow/e2e/api_test.go:605
        	            				.../flow/e2e/api_test.go:632
        	            				.../hostedtoolcache/go/1.26.1.../src/reflect/value.go:369
        	            				.../flow/e2eshared/e2eshared.go:40
        	Error:      	Received unexpected error:
        	            	unable to establish connection with catalog: FATAL: terminating connection due to administrator command (SQLSTATE 57P01)
        	Test:       	TestApiPg/TestPostgresDestinationValidation_NonEmptyTableBlocksSnapshot
    api_test.go:48: begin tearing down postgres schema api_siqms0hg
--- FAIL: TestApiPg/TestPostgresDestinationValidation_NonEmptyTableBlocksSnapshot (0.06s)

To view more test analytics, go to the Test Analytics Dashboard
📋 Got 3 mins? Take this short survey to help us improve Test Analytics.

@github-actions
Copy link
Copy Markdown
Contributor

🔄 Flaky Test Detected

Analysis: Two e2e tests failed due to concurrency/timing issues: one hit a PostgreSQL SQLSTATE 57P01 (connection killed by admin command, likely from a parallel test's cancellation logic), and the other hit a ClickHouse "unknown table" error because the QRep initial load hadn't completed before the assertion fired — both are race conditions in the highly parallel (-p 32) test suite, not code regressions.
Confidence: 0.92

✅ Automatically retrying the workflow

View workflow run

@github-actions
Copy link
Copy Markdown
Contributor

🔄 Flaky Test Detected

Analysis: All three failures are infrastructure/timing issues: one Temporal client connection timeout ("context deadline exceeded") and two BigQuery normalization timeouts ("UNEXPECTED TIMEOUT normalizing N rows"), consistent with CI resource contention under 32-parallel-test load rather than a code regression.
Confidence: 0.92

✅ Automatically retrying the workflow

View workflow run

@github-actions
Copy link
Copy Markdown
Contributor

🔄 Flaky Test Detected

Analysis: Both failures show race conditions in e2e tests — ClickHouse returning "unknown table identifier" and PostgreSQL returning "column does not exist" on destination tables indicate schema changes/table creation hadn't propagated before the test polled for results.
Confidence: 0.87

✅ Automatically retrying the workflow

View workflow run

@github-actions
Copy link
Copy Markdown
Contributor

🔄 Flaky Test Detected

Analysis: TestApiPg/TestFlowStatusUpdate failed with PostgreSQL SQLSTATE 57P01 ("terminating connection due to administrator command"), a transient infrastructure-level connection kill unrelated to test logic.
Confidence: 0.95

✅ Automatically retrying the workflow

View workflow run

@github-actions
Copy link
Copy Markdown
Contributor

🔄 Flaky Test Detected

Analysis: Test_Soft_Delete_Basic timed out after 200s waiting for BigQuery normalization to replicate a soft-deleted row (source had 1 record, destination consistently showed 0), indicating a transient CDC synchronization delay rather than a code regression.
Confidence: 0.88

✅ Automatically retrying the workflow

View workflow run

Copy link
Copy Markdown
Contributor

@ilidemi ilidemi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With these comments and #4180 all tests run and rerun for me locally (besides CI_MYSQL_VERSION, haven't tested that one)

Comment thread Tiltfile Outdated
Comment thread Tiltfile Outdated
Comment thread Tiltfile Outdated
Comment thread Tiltfile Outdated
Comment thread Tiltfile Outdated
Comment thread Tiltfile
Comment thread Tiltfile
Comment thread Tiltfile Outdated
Comment thread flow/internal/testutil/env.go
Comment thread Tiltfile Outdated
… default with e2e execution and adds a "start all test services" button
…e superfluous dependencies for connectors tests from connectors test launchers.
…actoring out overrides in common Tilt constants
…least one test execution regardless of the cache state.
@pfcoperez pfcoperez force-pushed the DBI-640/local-dev-env/e2e/automatic-envvars-injection branch from cea4dcc to a32e948 Compare April 15, 2026 13:35
@github-actions
Copy link
Copy Markdown
Contributor

❌ Test Failure

Analysis: Two newly introduced unit tests (TestLoadEnv_FromConnectorsSubdir and TestLoadEnv_RunFromFlowDirectly) consistently fail across all CI matrix combinations with deterministic "expected X but got empty string" assertions, indicating the LoadEnv implementation does not load .env files as the tests expect.
Confidence: 0.92

⚠️ This appears to be a real bug - manual intervention needed

View workflow run

@pfcoperez
Copy link
Copy Markdown
Contributor Author

@claude Fix linting issues

@claude
Copy link
Copy Markdown

claude bot commented Apr 15, 2026

Claude finished @pfcoperez's task in 4m 36s —— View job


Fixing Go Linting Issues & Failing Tests

  • Gather context and read current file state
  • Identify issues
  • Fix env_test.go: reset loadedEnv before each LoadEnv() call for test isolation
  • Push changes

Root cause of the failing tests: loadedEnv is a package-level atomic.Bool that acts as a once-guard. All tests in a package share the same process, so the first test to call LoadEnv() sets it true — making every subsequent LoadEnv() call a no-op. Tests 5 (TestLoadEnv_FromConnectorsSubdir) and 6 (TestLoadEnv_RunFromFlowDirectly) ran after an earlier test had already set the flag, so their .env files were never loaded and the assertions saw empty strings.

Fix (flow/internal/testutil/env_test.go): added loadedEnv.Store(false) before each LoadEnv() call in all 6 tests. Since the test file is in package testutil (same package as env.go), it can access the unexported variable directly. This resets the once-guard so each test gets a fresh LoadEnv() execution with its own temp directory and .env file.

Note: the linting fixes (context-aware slog calls, errcheck for godotenv.Load, t.Chdir, 0o600 permissions) were already applied by the previous Claude run and are already committed on this branch.
| Branch

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Apr 15, 2026

CLA assistant check
All committers have signed the CLA.

@github-actions
Copy link
Copy Markdown
Contributor

❌ Test Failure

Analysis: Two unit tests in internal/testutil (TestLoadEnv_FromConnectorsSubdir and TestLoadEnv_RunFromFlowDirectly) deterministically fail across all CI matrix variants, returning empty string instead of expected values, indicating a real bug — likely missing .env fixture files or a broken LoadEnv implementation in the PR branch DBI-640/local-dev-env/e2e/automatic-envvars-injection.
Confidence: 0.95

⚠️ This appears to be a real bug - manual intervention needed

View workflow run

@pfcoperez pfcoperez requested a review from ilidemi April 15, 2026 14:10
loadedEnv is a package-level atomic.Bool that acts as a once-guard.
Tests run in the same process, so the first test to call LoadEnv()
sets it true, making all subsequent LoadEnv() calls no-ops. This
caused TestLoadEnv_FromConnectorsSubdir and TestLoadEnv_RunFromFlowDirectly
to always see empty strings instead of their expected env var values.

Reset the flag before each test's LoadEnv() call to ensure proper
per-test isolation.
@pfcoperez pfcoperez force-pushed the DBI-640/local-dev-env/e2e/automatic-envvars-injection branch from dff3805 to 3fe8ae6 Compare April 15, 2026 14:14
@github-actions
Copy link
Copy Markdown
Contributor

🔄 Flaky Test Detected

Analysis: TestApiPg/TestQRep failed because the ClickHouse destination table was never created within the wait window (63s), likely due to resource contention from 32 parallel tests starving the async Temporal QRep workflow — the triggering commit only touched postgres-to-postgres MERGE logic and is unrelated to ClickHouse QRep.
Confidence: 0.82

✅ Automatically retrying the workflow

View workflow run

@github-actions
Copy link
Copy Markdown
Contributor

🔄 Flaky Test Detected

Analysis: TestGenericBQ/Test_Inheritance_Table_With_Dynamic_Setting timed out (after 196s) waiting for BigQuery row propagation from inherited Postgres tables — a classic e2e timing failure under heavy parallelism (-p 32), not a logic regression.
Confidence: 0.88

✅ Automatically retrying the workflow

View workflow run

@github-actions
Copy link
Copy Markdown
Contributor

🔄 Flaky Test Detected

Analysis: The tests fail during teardown with PostgreSQL SQLSTATE 57P01 ("terminating connection due to administrator command"), indicating the catalog DB connection was externally killed mid-cleanup — different tests fail across matrix runs, confirming an infrastructure-level flake rather than a code regression.
Confidence: 0.92

✅ Automatically retrying the workflow

View workflow run

@github-actions
Copy link
Copy Markdown
Contributor

🔄 Flaky Test Detected

Analysis: Both failing jobs hit SQLSTATE 57P01 (PostgreSQL admin_shutdown) during a catalog connection, indicating the CI catalog database was externally terminated mid-test — a transient infrastructure failure, not a code bug.
Confidence: 0.92

✅ Automatically retrying the workflow

View workflow run

@github-actions
Copy link
Copy Markdown
Contributor

🔄 Possible Flaky Test

Analysis: The CI run was still in progress when logs were requested, preventing analysis of the failure; low confidence assessment as flaky due to inability to retrieve logs.
Confidence: 0.4

⚠️ Confidence too low (0.4) to retry automatically - manual review recommended

View workflow run

Comment thread Tiltfile Outdated
Co-authored-by: Ilia Demianenko <ilia.demianenko@clickhouse.com>
@pfcoperez
Copy link
Copy Markdown
Contributor Author

@claude Replace the atomic loadedEnv that makes LoadEnv logic to run once by sync.Once . Take into account that you'll need to reset it in the unit tests for this function at flow/internal/testutil/env_test.go

@claude
Copy link
Copy Markdown

claude bot commented Apr 16, 2026

Claude finished @pfcoperez's task in 1m 14s —— View job


Replacing atomic.Bool with sync.Once in LoadEnv

  • Gather context
  • Read current env.go and env_test.go
  • Replace atomic.Bool with sync.Once in env.go
  • Update env_test.go to reset sync.Once for test isolation
  • Commit and push

Changes made (commit c0b218b):

flow/internal/testutil/env.go:

  • Replaced "sync/atomic" import with "sync"
  • Changed var loadedEnv atomic.Boolvar loadedEnv sync.Once
  • LoadEnv() now delegates to loadedEnv.Do(func() { loadEnvOnce() })
  • Extracted the body into loadEnvOnce() so LoadEnv stays clean

flow/internal/testutil/env_test.go:

  • Added "sync" import
  • Replaced all loadedEnv.Store(false) calls with loadedEnv = sync.Once{} — reassigning the package-level variable (accessible since tests are in package testutil) gives a fresh, unused Once before each test call
    |

github-actions bot and others added 2 commits April 16, 2026 08:02
Use the idiomatic sync.Once pattern instead of atomic.Bool for the
LoadEnv once-guard. Tests reset the guard with `loadedEnv = sync.Once{}`
(same-package access) for test isolation.

Co-authored-by: Pablo Francisco Pérez Hidalgo <pfcoperez@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

🔄 Flaky Test Detected

Analysis: The test failed with a transient PostgreSQL catalog connection error (SQLSTATE 57P01 — "terminating connection due to administrator command"), indicating the DB connection was killed mid-test by a concurrent operation rather than a code regression.
Confidence: 0.92

✅ Automatically retrying the workflow

View workflow run

@github-actions
Copy link
Copy Markdown
Contributor

🔄 Flaky Test Detected

Analysis: All three TestGenericBQ failures are STATUS_SNAPSHOT timeouts in a high-concurrency (p=32) mysql-gtid run — a known flaky scenario explicitly acknowledged in the codebase, with no code changes in this commit related to BigQuery snapshot behavior.
Confidence: 0.85

✅ Automatically retrying the workflow

View workflow run

@pfcoperez pfcoperez merged commit daa6658 into main Apr 16, 2026
18 of 20 checks passed
@pfcoperez pfcoperez deleted the DBI-640/local-dev-env/e2e/automatic-envvars-injection branch April 16, 2026 09:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants