[system_tests] Fix race conditions in 'set_and_wait_rc' by counting telemetry events by vlad-scherbich · Pull Request #6371 · DataDog/system-tests

vlad-scherbich · 2026-02-23T20:31:37Z

https://datadoghq.atlassian.net/browse/PROF-13796

Motivation

Out of the latest 100 flaked system-test CI runs on dd-trace-py main, 20 are dynamic configuration tests (15% of total). See JIRA for details.

Root cause was identified as a race condition in set_and_wait_rc. The function can match stale app-client-configuration-change telemetry events from a previous RC update and return before the new config is actually applied.

Tests fixed

Note: the list of 11 unique flaky tests from 2026 was obtained via this script.

| Rank | Hits | Test |
|------|------|------|
| 1 | 6 | `test_dynamic_configuration.py::TestDynamicConfigSamplingRules::test_remote_sampling_rules_retention` |
| 2 | 6 | `test_dynamic_configuration.py::TestDynamicConfigV1::test_trace_sampling_rate_override_default` |
| 3 | 5 | `test_dynamic_configuration.py::TestDynamicConfigSamplingRules::test_capability_tracing_sample_rules` |
| 4 | 3 | `test_dynamic_configuration.py::TestDynamicConfigSamplingRules::test_trace_sampling_rules_override_env` |

Other tests fixed (same root cause): test_apply_state, test_trace_sampling_rate_override_env, test_trace_sampling_rate_with_sampling_rules, test_log_injection_enabled, test_tracing_client_tracing_tags, test_trace_sampling_rules_override_rate, test_trace_sampling_rules_with_tags.

Changes

- set_and_wait_rc: Replace wait_for_telemetry_event(..., clear=True) with telemetry event counting. Snapshot the count of app-client-configuration-change events before setting RC, then poll until the count increases. This guarantees that we match a genuinely new event, not a stale one.
- set_and_wait_rc (slow tracers): For dotnet/php/ruby, which skip telemetry gating, add test_agent.clear() after setting RC to discard stale RC requests from prior configs. This prevents matching an old ACK before the new config is applied.
- test_capability_tracing_sample_rules: Increase wait_loops from 100 to 400 (~4s) so the library has enough time to send its first RC request.
- Reverted "unflake a System Test: 'test_remote_sampling_rules_retention'" (#6342), as this generalized fix applies there as well.
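
The snapshot-then-poll idea from the set_and_wait_rc change can be illustrated with a minimal, self-contained sketch. FakeTestAgent, set_rc_and_wait, and the loop limits are hypothetical stand-ins rather than the actual system-tests helpers; only the count_telemetry_events name mirrors this PR's diff.

```python
import time


class FakeTestAgent:
    """Hypothetical stand-in for the test agent; it only records event names."""

    def __init__(self):
        self.events = []

    def count_telemetry_events(self, event_name):
        return sum(1 for name in self.events if name == event_name)


def set_rc_and_wait(agent, apply_config, max_loops=400, delay=0.01):
    """Snapshot the event count, apply the config, then poll for a new event."""
    event = "app-client-configuration-change"
    # Stale events from earlier RC updates are included in this baseline.
    pre_count = agent.count_telemetry_events(event)
    apply_config()
    for _ in range(max_loops):
        if agent.count_telemetry_events(event) > pre_count:
            return True  # a new event arrived after the snapshot
        time.sleep(delay)
    return False  # timed out: no event beyond the baseline was seen
```

Because the comparison is against a baseline taken before the config is set, a leftover event from a previous update can never satisfy the wait on its own.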

Reviewer checklist

Anything but tests/ or manifests/ is modified? I have the approval from R&P team
A docker base image is modified?
- the relevant build-XXX-image label is present
A scenario is added, removed or renamed?
- Get a review from R&P team

github-actions · 2026-02-23T20:32:09Z

CODEOWNERS have been resolved as:

tests/appsec/test_asm_standalone.py                                     @DataDog/asm-libraries @DataDog/system-tests-core
tests/parametric/test_dynamic_configuration.py                          @DataDog/system-tests-core @DataDog/apm-sdk-capabilities
utils/docker_fixtures/_test_agent.py                                    @DataDog/system-tests-core

datadog-official · 2026-02-23T20:39:55Z

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

This comment will be updated automatically if new data arrives.

🔗 Commit SHA: b6929c3

vlad-scherbich · 2026-02-24T21:05:57Z

@cbeauchesne, second attempt to generalize the fix for #6342

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 057ab1dc44

chatgpt-codex-connector · 2026-02-24T21:08:13Z

tests/parametric/test_dynamic_configuration.py

+    if context.library.name in _SLOW_TRACERS:
+        # these tracers do not reliably emit app-client-configuration-change on RC update
+        _set_rc(test_agent, rc_config, config_id)


Clear stale RC requests in slow-tracer fallback path

In the context.library.name in _SLOW_TRACERS branch, set_and_wait_rc skips telemetry gating and immediately waits for ACKNOWLEDGED, but wait_for_rc_apply_state matches any previously recorded ACK in the session. Because RC polling is continuous, dotnet/php/ruby can have stale ACK requests from the prior config, so this call may return before the new config is applied; that recreates the same race this patch is trying to eliminate for those tracers.
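
One way to avoid matching a stale ACK without clearing the whole session is to filter recorded RC requests by config ID. The sketch below is only an illustration: it assumes a simplified request shape (dicts whose client.state.config_states entries carry id and apply_state), an assumed numeric value for ACKNOWLEDGED, and a hypothetical helper name. The real wait_for_rc_apply_state may look different.

```python
ACKNOWLEDGED = 2  # assumed numeric value of the ACKNOWLEDGED apply state


def has_ack_for_config(rc_requests, config_id):
    """Return True only if some recorded request acknowledges the given config ID."""
    for request in rc_requests:
        states = request.get("client", {}).get("state", {}).get("config_states", [])
        for state in states:
            if state.get("id") == config_id and state.get("apply_state") == ACKNOWLEDGED:
                return True
    return False
```

With this kind of filter, an ACK recorded for a prior config has a different ID and is ignored, so the wait cannot return early on it.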

KowalskiThomas · 2026-02-25T09:16:36Z

tests/parametric/test_dynamic_configuration.py

+    on RC update, so we skip the telemetry wait and use config_id filtering to avoid stale ACKs.
    """
    rc_config = _create_rc_config(config_overrides)
+    resolved_config_id = config_id or str(hash(json.dumps(rc_config)))


Does json.dumps provide deterministic ordering of fields if you don't pass sort_fields or whatever the flag is? I think it could mean the resolved_config_id is not deterministic either (and I'm not sure whether it matters in this context)
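
For context on this question: json.dumps writes dict keys in insertion order unless sort_keys=True is passed, and Python's built-in hash() for strings is salted per process unless PYTHONHASHSEED is fixed. If a reproducible ID were needed, something like the following sketch could be used. stable_config_id is a hypothetical helper and is not part of this PR.

```python
import hashlib
import json


def stable_config_id(rc_config):
    """Derive an ID that is independent of key order and of the process hash seed."""
    canonical = json.dumps(rc_config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Same content, different insertion order.
a = {"tracing_sampling_rate": 0.5, "log_injection_enabled": True}
b = {"log_injection_enabled": True, "tracing_sampling_rate": 0.5}
```

Here json.dumps(a) and json.dumps(b) differ as strings, while stable_config_id(a) and stable_config_id(b) are equal. Whether that stability matters depends on whether the ID is ever compared across processes or across differently ordered dicts.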

KowalskiThomas · 2026-02-25T09:17:41Z

tests/parametric/test_dynamic_configuration.py

+    for _ in range(_MAX_RC_EVENT_WAIT_LOOPS):
+        if test_agent.count_telemetry_events("app-client-configuration-change") > pre_count:
+            break
+        time.sleep(0.01)
+    else:


for / else really is something I'll never be able to wrap my head around, but good job using it here 😅
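
For readers less familiar with the construct: the else branch of a for loop runs only when the loop finishes without hitting break. A minimal illustration, using a hypothetical helper unrelated to the PR code:

```python
def first_match_or_none(items, predicate):
    """Return the first item satisfying predicate, or None if there is none."""
    for item in items:
        if predicate(item):
            result = item
            break  # leaving via break skips the else branch
    else:
        result = None  # reached only when the loop ended without break
    return result
```

In the polling loop above, this is what lets the timeout handling live in the else branch: it runs only if no new event was seen within the allowed number of iterations.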

KowalskiThomas · 2026-02-25T09:18:16Z

utils/docker_fixtures/_test_agent.py

+                    if message.get("request_type") == event_name:
+                        if message.get("application", {}).get("language_version") != "SIDECAR":
+                            count += 1


Could we do guard-style here? like if not: continue instead of if: if: if: do()?
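
A guard-style version of the counting logic might look like the sketch below. The function name and the flat list of message dicts are assumptions made for illustration; the real method in _test_agent.py iterates over recorded requests in its own way.

```python
def count_events(messages, event_name):
    """Count matching non-sidecar telemetry messages using early-continue guards."""
    count = 0
    for message in messages:
        if message.get("request_type") != event_name:
            continue  # not the event we are counting
        if message.get("application", {}).get("language_version") == "SIDECAR":
            continue  # ignore events emitted by the sidecar
        count += 1
    return count
```

The behavior matches the nested form; only the nesting depth changes.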

vlad-scherbich mentioned this pull request Feb 23, 2026

Fix set_and_wait_rc helper by clearing out stale events before new co… #6349

Draft

5 tasks

vlad-scherbich force-pushed the vlad/fix-dynamic-config-flakes branch 2 times, most recently from 1efc4da to a8cf21a Compare February 23, 2026 21:10

vlad-scherbich changed the title ~~Attempt to fix race conditions in 'set_and_wait_rc' by counting telem…~~ [system_tests] Fix race conditions in 'set_and_wait_rc' by counting telemetry events Feb 24, 2026

vlad-scherbich force-pushed the vlad/fix-dynamic-config-flakes branch 2 times, most recently from 43ab89f to 6f22dcf Compare February 24, 2026 16:55

vlad-scherbich added 2 commits February 24, 2026 12:14

Fix race conditions in 'set_and_wait_rc' by counting telemetry events

62142c4

Revert "unflake a System Test: 'test_remote_sampling_rules_retention' (…

7885079

…#6342)" This reverts commit 5d9142e.

vlad-scherbich force-pushed the vlad/fix-dynamic-config-flakes branch from 0fff71e to 7885079 Compare February 24, 2026 17:14

re-import time

057ab1d

vlad-scherbich mentioned this pull request Feb 24, 2026

[system_tests] Fix flaky test_asm_standalone.py tests #6377

Open

5 tasks

vlad-scherbich requested review from KowalskiThomas and taegyunkim February 24, 2026 19:45

vlad-scherbich marked this pull request as ready for review February 24, 2026 21:04

vlad-scherbich requested review from a team as code owners February 24, 2026 21:04

vlad-scherbich requested review from mtoffl01 and removed request for a team February 24, 2026 21:04

chatgpt-codex-connector bot reviewed Feb 24, 2026

vlad-scherbich marked this pull request as draft February 24, 2026 22:54

vlad-scherbich added 2 commits February 24, 2026 17:57

clear agent for slow runtimes

5f02902

filter ACKs by config ID when available, instead of clearing session

b6929c3

KowalskiThomas reviewed Feb 25, 2026

vlad-scherbich wants to merge 5 commits into main from vlad/fix-dynamic-config-flakes

2 participants
