Skip to content

Tests: Reduce flakiness#1878

Open
jrajahalme wants to merge 4 commits intomainfrom
test-flakiness-fixes
Open

Tests: Reduce flakiness#1878
jrajahalme wants to merge 4 commits intomainfrom
test-flakiness-fixes

Conversation

@jrajahalme
Copy link
Copy Markdown
Member

@jrajahalme jrajahalme commented Apr 24, 2026

This PR addresses possible causes for test flakes in the CI and local test runs. These tend to be in the area of test fixture teardown causing issues or blending with next test setup in unintended ways. Please review by commit.

As some of the flakiness was due to production code deficiencies, it would be nice to get to a state where we have low enough flake probability to let individual flakes fail the test runs in CI.

@jrajahalme jrajahalme requested a review from a team as a code owner April 24, 2026 11:04
@jrajahalme jrajahalme added the clean-up Changes not affecting release behavior (style, removal of deprecated or dead code) label Apr 24, 2026
@jrajahalme jrajahalme force-pushed the test-flakiness-fixes branch 2 times, most recently from 4795d28 to e376432 Compare April 24, 2026 13:11
Be more careful of starting the accesslog server only after the derived
classes have fully initialized, and shut down before members are
destructed.

Remove unused access log server from TCP integration test. This helps cut
down on test flakes due to racy clean up.

Fix UDS server thread shutdown on destruction to avoid this race in the
remaining tests that actually use the access log server.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Wait for TLS handshake to finish instead of simply waiting for a fixed
time.

While not functionally necessary, also wait for the handshake to finish
before sending data so that we get a clear error signal if the handshake
does not finish in time.

Flip order of context/ssl_client in declaration, this ensures ssl_client
is deleted before callbacks.

Remove always-true upstream_tls_.

Replace inlined code with upstream test helper
createFakeUpstreamSslContext().

Be more careful in tearing down test connections to avoid test flakes.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Add a new patch for upstream Envoy’s fake-upstream HTTP test harness,
where a newly accepted upstream connection could have reads re-enabled
before FakeHttpConnection had finished installing its HTTP codec/read
filter. In TLS HTTP tests, that meant decrypted request bytes could
arrive and be consumed on the socket before the HTTP layer was ready to
turn them into a FakeStream, so the test would later time out waiting for
an upstream request that had effectively been dropped on the floor. The
patch fixes this by deferring read re-enable until
FakeHttpConnection::initialize() completes, ensuring the HTTP filter
stack is in place before any request bytes are allowed through.

Keep the change HTTP-only to avoid perturbing TCP/raw fake-upstream
behavior, and drop the test-local eager read-enable override from the
Cilium TLS HTTP integration fixture.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Remove unused accesslog config from websocket decap test.

Avoid closing and waiting on the fake upstream connection twice when
already closed.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
@jrajahalme jrajahalme force-pushed the test-flakiness-fixes branch from e376432 to 9891d84 Compare April 28, 2026 08:55
Comment thread tests/uds_server.cc

void UDSServer::startServerThread() {
if (fd_ < 0 || thread_ != nullptr) {
return;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could add a log warning here that thread was not actually started

Copy link
Copy Markdown
Contributor

@nezdolik nezdolik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm with 2 nits

.WillByDefault(Invoke(client_write_buffer_, &MockWatermarkBuffer::trackDrains));
return client_write_buffer_;
}))
.WillRepeatedly(Invoke([](std::function<void()> below_low, std::function<void()> above_high,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does the buffer need to be allocated twice?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clean-up Changes not affecting release behavior (style, removal of deprecated or dead code)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants