
unflake a System Test: 'test_remote_sampling_rules_retention' #6342

Merged
vlad-scherbich merged 5 commits into main from fix/flaky-test-remote-sampling-rules-retention
Feb 23, 2026

unflake a System Test: 'test_remote_sampling_rules_retention'#6342
vlad-scherbich merged 5 commits intomainfrom
fix/flaky-test-remote-sampling-rules-retention

Conversation

@vlad-scherbich (Contributor) commented Feb 18, 2026

https://datadoghq.atlassian.net/browse/PROF-13796

Motivation

test_remote_sampling_rules_retention fails intermittently in CI. Flaking was observed in a dd-trace-py PR that only changed CI workflow config, with no relation to the test logic. See the failed job.

The root cause is a race condition in set_and_wait_rc: its wait signals (telemetry event + RC ACKNOWLEDGED state) are not config-version-specific, so stale events from a prior RC config can satisfy them before the new sampling rules are actually active in the library.
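To make the race concrete, here is a minimal, self-contained sketch (the event shape, the `wait_for_ack` helper, and the `version` field are illustrative stand-ins, not the real system-tests API): a wait that accepts any ACKNOWLEDGED signal can be satisfied by a stale event from the prior config, while a wait keyed to the new config version cannot.

```python
# Hypothetical sketch of the race. A wait that matches any ACKNOWLEDGED
# event can be satisfied by a stale event left over from the previous RC
# config version; filtering on the target version closes the race.

def wait_for_ack(events, target_version=None):
    """Return the first ACKNOWLEDGED event, optionally filtered by version."""
    for event in events:
        if event["apply_state"] != "ACKNOWLEDGED":
            continue
        if target_version is not None and event["version"] != target_version:
            continue  # skip stale acks from a prior config
        return event
    return None

# A stale ack (version 1) arrives before the ack for the new config (version 2).
events = [
    {"apply_state": "ACKNOWLEDGED", "version": 1},
    {"apply_state": "ACKNOWLEDGED", "version": 2},
]

# Non-version-specific wait is satisfied by the stale event: the race.
assert wait_for_ack(events)["version"] == 1
# Version-specific wait only matches the ack for the new config.
assert wait_for_ack(events, target_version=2)["version"] == 2
```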

Changes

Adds a retry loop around the first trace assertion in test_remote_sampling_rules_retention to tolerate the brief propagation window between RC acknowledgment and actual rule application. This is consistent with how other tests in the same file handle similar timing sensitivity (e.g., get_sampled_trace).
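The retry pattern is roughly the following (a self-contained sketch; `poll_until` and `fetch_rule_psr` are hypothetical stand-ins for the real `send_and_wait_trace`/span-inspection helpers, not code from this PR):

```python
import time

def poll_until(check, attempts=30, delay=0.1):
    """Retry `check` until it returns a truthy value or attempts run out."""
    for _ in range(attempts):
        result = check()
        if result:
            return result
        time.sleep(delay)
    return None

# Example: a fake span source that only reflects the new sampling rule
# (rate 0.1) from the third attempt onward, mimicking the brief window
# between RC acknowledgment and actual rule application.
calls = {"n": 0}

def fetch_rule_psr():
    calls["n"] += 1
    return 0.1 if calls["n"] >= 3 else None

assert poll_until(fetch_rule_psr, delay=0.01) == 0.1
assert calls["n"] == 3
```

With 30 attempts at 0.1 s apart, the assertion tolerates up to ~3 s of propagation delay before failing, matching the bound used in the test.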

Testing

$ ./run.sh PARAMETRIC -L python -k "retention"
...
============================================================================= test context ==============================================================================
Scenario: PARAMETRIC
Logs folder: ./logs_parametric
Library: python@4.4.0
========================================================================== test session starts ==========================================================================
gw0 [1] / gw1 [1] / gw2 [1] / gw3 [1] / gw4 [1] / gw5 [1] / gw6 [1] / gw7 [1] / gw8 [1] / gw9 [1] / gw10 [1] / gw11 [1] / gw12 [1] / gw13 [1] / gw14 [1] / gw15 [1]
.                                                                                                                                                                 [100%]
--------------------------- generated xml file: /Users/vlad.scherbich/go/src/github.com/DataDog/system-tests/logs_parametric/reportJunit.xml ----------------------------
========================================================================== 1 passed in 32.80s ===========================================================================

Workflow

  1. ⚠️ Create your PR as draft ⚠️
  2. Work on your PR until the CI passes
  3. Mark it as ready for review
    • Test logic is modified? -> Get a review from the RFC owner.
    • Framework is modified, or its usage is non-obvious? -> Get a review from the R&P team.

🚀 Once your PR is reviewed and the CI is green, you can merge it!

🛟 #apm-shared-testing 🛟

Reviewer checklist

  • Anything but tests/ or manifests/ is modified? I have approval from the R&P team
  • A docker base image is modified?
    • the relevant build-XXX-image label is present
  • A scenario is added, removed or renamed?

@vlad-scherbich vlad-scherbich requested review from a team as code owners February 18, 2026 22:09
@vlad-scherbich vlad-scherbich requested review from zacharycmontoya and removed request for a team February 18, 2026 22:09
@github-actions bot commented Feb 18, 2026

CODEOWNERS have been resolved as:

tests/parametric/test_dynamic_configuration.py                          @DataDog/system-tests-core @DataDog/apm-sdk-capabilities

@vlad-scherbich vlad-scherbich requested review from a team and KowalskiThomas February 18, 2026 22:14
@vlad-scherbich vlad-scherbich marked this pull request as draft February 18, 2026 22:20
Comment on lines 1054 to 1059
for _ in range(30):
    trace = send_and_wait_trace(test_library, test_agent, name="test", service="foo")
    span = find_first_span_in_trace_payload(trace)
    if span["metrics"].get("_dd.rule_psr", 1.0) == pytest.approx(0.1):
        break
    time.sleep(0.1)
A contributor commented:
If I understand correctly, we try up to 30 times (waiting 0.1 seconds between each attempt) to see a span whose metadata shows that the new sampling rules were correctly received and applied?

vlad-scherbich (Author) replied:
Exactly!

@vlad-scherbich vlad-scherbich force-pushed the fix/flaky-test-remote-sampling-rules-retention branch from b65f71b to d22ddf4 on February 19, 2026 13:31
assert_sampling_rate(trace, 0.1)
# After updating the RC config, the library may briefly still be applying the
# previous sampling rules. set_and_wait_rc waits for telemetry and RC acknowledgment,
# but these signals can be satisfied by stale events from the prior config, causing a
A collaborator commented:
Would it be possible to improve the wait_rc condition to wait for the precise new config ?

vlad-scherbich (Author) replied Feb 19, 2026:

@cbeauchesne , I like this idea.
set_and_wait_rc is just a helper function used only in test_dynamic_configuration.py at 20 call sites.

I'll open a separate PR for the proposed fix, which would be to clear the session before setting the new RC config.
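The idea can be sketched as follows (all names here are hypothetical; the real `set_and_wait_rc` and agent fixtures in system-tests differ): drop any previously captured events before pushing the new config, so the subsequent wait can only be satisfied by fresh signals.

```python
# Hypothetical sketch of the proposed fix: clear previously captured
# telemetry/RC state before pushing a new config, so stale events cannot
# satisfy the wait. FakeAgent and set_and_wait_rc are illustrative only.

class FakeAgent:
    def __init__(self):
        self.events = []

    def clear(self):
        self.events = []

def set_and_wait_rc(agent, config):
    agent.clear()  # drop stale events from the prior config
    agent.events.append({"config": config, "state": "ACKNOWLEDGED"})
    # the wait now only observes events produced after the new config was set
    return agent.events[-1]

agent = FakeAgent()
agent.events.append({"config": "old", "state": "ACKNOWLEDGED"})  # stale ack
ack = set_and_wait_rc(agent, "new")
assert ack["config"] == "new"
assert len(agent.events) == 1  # the stale ack is gone
```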

vlad-scherbich (Author) commented:

@cbeauchesne, new PR implementing your suggestion: #6349

Lots of unrelated system tests fail there; do I just re-run them until they pass?

vlad-scherbich (Author) replied Feb 23, 2026:

@cbeauchesne I tried to make this work, but it seems #6349 has broken the .NET and potentially the PHP and Ruby parametric tests. The general fix will be more involved than I thought.

Should we just merge this to fix flaky Python tests for now, or do you have some pointers on how to make the generalized fix work for all?

@datadog-datadog-prod-us1 bot commented Feb 19, 2026

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: be1f7fd

@vlad-scherbich vlad-scherbich marked this pull request as ready for review February 23, 2026 18:00
@vlad-scherbich vlad-scherbich merged commit 5d9142e into main Feb 23, 2026
424 checks passed
@vlad-scherbich vlad-scherbich deleted the fix/flaky-test-remote-sampling-rules-retention branch February 23, 2026 18:21