Tests: Reduce flakiness#1878
Open
jrajahalme wants to merge 4 commits intomainfrom
Open
Conversation
4795d28 to
e376432
Compare
Be more careful of starting the accesslog server only after the derived classes have fully initialized, and shut down before members are destructed. Remove unused access log server from TCP integration test. This helps cut down on test flakes due to racy clean up. Fix UDS server thread shutdown on destruction to avoid this race in the remaining tests that actually use the access log server. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Wait for TLS handshake to finish instead of simply waiting for a fixed time. While not functionally necessary, also wait for the handshake to finish before sending data so that we get a clear error signal if the handshake does not finish in time. Flip order of context/ssl_client in declaration, this ensures ssl_client is deleted before callbacks. Remove always-true upstream_tls_. Replace inlined code with upstream test helper createFakeUpstreamSslContext(). Be more careful in tearing down test connections to avoid test flakes. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Add a new patch for upstream Envoy’s fake-upstream HTTP test harness, where a newly accepted upstream connection could have reads re-enabled before FakeHttpConnection had finished installing its HTTP codec/read filter. In TLS HTTP tests, that meant decrypted request bytes could arrive and be consumed on the socket before the HTTP layer was ready to turn them into a FakeStream, so the test would later time out waiting for an upstream request that had effectively been dropped on the floor. The patch fixes this by deferring read re-enable until FakeHttpConnection::initialize() completes, ensuring the HTTP filter stack is in place before any request bytes are allowed through. Keep the change HTTP-only to avoid perturbing TCP/raw fake-upstream behavior, and drop the test-local eager read-enable override from the Cilium TLS HTTP integration fixture. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Remove unused accesslog config from websocket decap test. Avoid closing and waiting on the fake upstream connection twice when already closed. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
e376432 to
9891d84
Compare
nezdolik
reviewed
Apr 29, 2026
|
|
||
| void UDSServer::startServerThread() { | ||
| if (fd_ < 0 || thread_ != nullptr) { | ||
| return; |
Contributor
There was a problem hiding this comment.
nit: could add a log warning here that thread was not actually started
nezdolik
approved these changes
Apr 29, 2026
| .WillByDefault(Invoke(client_write_buffer_, &MockWatermarkBuffer::trackDrains)); | ||
| return client_write_buffer_; | ||
| })) | ||
| .WillRepeatedly(Invoke([](std::function<void()> below_low, std::function<void()> above_high, |
Contributor
There was a problem hiding this comment.
why does the buffer need to be allocated twice?
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR addresses possible causes for test flakes in the CI and local test runs. These tend to be in the area of test fixture teardown causing issues or blending with next test setup in unintended ways. Please review by commit.
As some of the flakiness was due to production code deficiencies, it would be nice to get to a state where we have low enough flake probability to let individual flakes fail the test runs in CI.