Skip to content

[nodejs] fix remote config polling slowing down multiple tests#6808

Open
rochdev wants to merge 15 commits intomainfrom
rochdev/fix-slowest-nodejs-tests
Open

[nodejs] fix remote config polling slowing down multiple tests#6808
rochdev wants to merge 15 commits intomainfrom
rochdev/fix-slowest-nodejs-tests

Conversation

@rochdev
Copy link
Copy Markdown
Member

@rochdev rochdev commented Apr 24, 2026

Motivation

That was slowing everything down.

Changes

Fix remote config polling slowing down multiple tests.

Explanation from Claude when I originally asked it to fix APPSEC_RUNTIME_ACTIVATION and APPSEC_AUTO_EVENTS_RC being the slowest tests:


Two changes made:

utils/_context/_scenarios/init.py — Added DD_REMOTE_CONFIG_POLL_INTERVAL_SECONDS: "0.5" to APPSEC_RUNTIME_ACTIVATION. Without this, each apply() call waited for the default ~5s poll cycle before the tracer acknowledged the config. The scenario had 58 total apply() calls across its test files (34 in test_remote_config_rule_changes.py alone), making this the dominant bottleneck.

utils/_remote_config.py:144 — Reduced time.sleep(2) to time.sleep(0.5). This sleep runs after every send_state() call to give subprocesses time to catch up. With a 0.5s poll interval, 0.5s (one poll cycle) is sufficient — any worker process that hadn't applied yet will poll within that window. The old 2s value was actually already too short for scenarios using the default 5s interval, but 4× too long for scenarios using 0.5s.

Estimated speedup:

APPSEC_RUNTIME_ACTIVATION: each apply goes from ~5s + 2s = ~7s → ~0.5s + 0.5s = ~1s — roughly 7× faster
APPSEC_AUTO_EVENTS_RC: each apply goes from ~0.5s + 2s = ~2.5s → ~0.5s + 0.5s = ~1s — roughly 2.5× faster


I then asked it to apply the same fix to everything else and applied further suggestions. I isolated the changes to Node because system tests seem extremely flaky for other languages, so I don't know how to differentiate flakiness from actual failures caused by a change.

Workflow

  1. ⚠️ Create your PR as draft ⚠️
  2. Work on you PR until the CI passes
  3. Mark it as ready for review
    • Test logic is modified? -> Get a review from RFC owner.
    • Framework is modified, or non obvious usage of it -> get a review from R&P team

🚀 Once your PR is reviewed and the CI green, you can merge it!

🛟 #apm-shared-testing 🛟

Reviewer checklist

  • Anything but tests/ or manifests/ is modified ? I have the approval from R&P team
  • A docker base image is modified?
    • the relevant build-XXX-image label is present
  • A scenario is added, removed or renamed?

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 24, 2026

CODEOWNERS have been resolved as:

tests/debugger/test_debugger_inproduct_enablement.py                    @DataDog/debugger @DataDog/system-tests-core
utils/_context/_scenarios/debugger.py                                   @DataDog/system-tests-core
utils/_context/_scenarios/endtoend.py                                   @DataDog/system-tests-core
utils/_remote_config.py                                                 @DataDog/system-tests-core

@datadog-prod-us1-5
Copy link
Copy Markdown

datadog-prod-us1-5 Bot commented Apr 24, 2026

Tests

Fix all issues with BitsAI or with Cursor

⚠️ Warnings

🧪 1 Test failed

tests.remote_config.test_remote_configuration.Test_RemoteConfigurationUpdateSequenceLiveDebugging.test_tracer_update_sequence[apache-mod-7.2-zts] from system_tests_suite   View in Datadog   (Fix with Cursor)
utils.interfaces._core.ValidationError: ("{'path': 'datadog/2/LIVE_DEBUGGING/metricProbe_33a64d99-fbed-5eab-bb10-80735405c09b/config', 'length': 360, 'hashes': [{'algorithm': 'sha256', 'hash': '6daaa0eb13996d340d99983bb014ef17453bad39edf19041f24a87a159ff94fe'}]} should be in cached_target_files property: [{'path': 'datadog/2/LIVE_DEBUGGING/metricProbe_33a64d99-fbed-5eab-bb10-80735405c09b/config', 'length': 365, 'hashes': [{'algorithm': 'sha256', 'hash': '4f12b33894fd7178f2464d3fc2c63223c3ee2a29a5cf0936de60ceee88fd0656'}]}, {'path': 'datadog/2/LIVE_DEBUGGING/spanProbe_kepf0cf2-9top-45cf-9f39-59installed/config', 'length': 188, 'hashes': [{'algorithm': 'sha256', 'hash': 'd22df7cf36e9f2b0134c4f6535a7340b9a4435876b79280f91d80942c9562b5b'}]}, {'path': 'datadog/2/LIVE_DEBUGGING/logProbe_22953c88-eadc-4f9a-aa0f-7f6243f4bf8a/config', 'length': 239, 'hashes': [{'algorithm': 'sha256', 'hash': '8176095e451a5f4d49db40e5eadf7d79b0ca6956cf28c83f87d18f4d66ea2583'}]}]", 'SUCCESS - Add back the initial config along with the second (add multiple). RFC about integrating with remote-config: https://docs.google.com/document/d/1u_G7TOr8wJX0dOM_zUDKuRJgxoJU_hVTd5SeaMucQUs')

self = <tests.remote_config.test_remote_configuration.Test_RemoteConfigurationUpdateSequenceLiveDebugging object at 0x7f75142444a0>

    def test_tracer_update_sequence(self):
        """Test update sequence, based on a scenario mocked in the proxy"""
    
        # Index the request number by runtime ID so that we can support applications
        # that spawns multiple worker processes, each running its own RCM client.
        request_number: dict = defaultdict(int)
...

ℹ️ Info

No other issues found (see more)

❄️ No new flaky tests detected

Useful? React with 👍 / 👎

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: ee07ec9 | Docs | Datadog PR Page | Give us feedback!

@rochdev rochdev marked this pull request as ready for review April 25, 2026 01:00
@rochdev rochdev requested a review from a team as a code owner April 25, 2026 01:00
@rochdev rochdev requested a review from a team as a code owner April 25, 2026 21:01
@rochdev rochdev marked this pull request as draft April 25, 2026 21:07
@rochdev rochdev marked this pull request as ready for review April 26, 2026 00:03
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ee07ec97c2

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread utils/_remote_config.py
Comment on lines +144 to +145
if context.library.name != "nodejs":
time.sleep(2)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Keep a post-ACK delay for Node remote-config updates

send_state now skips the post-acknowledgement wait entirely for Node.js, even though this function explicitly needs a grace period for subprocess propagation. In setups that call send_state(...) and immediately issue a request (for example AppSec RC setup flows), this can race: the ACK can be observed while some worker processes are still on the previous config, producing intermittent false negatives only on Node.js. A short Node-specific delay (e.g., one poll interval) is still needed to preserve determinism.

Useful? React with 👍 / 👎.

self.di_initial_disabled = not self.wait_for_all_probes(statuses=["EMITTING"], timeout=TIMEOUT)

_send_config(enabled=True, reset=False)
_send_config(enabled=True, reset=False, wait_installed=context.library == "nodejs")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid runtime library branching in shared test setup

This introduces Node-specific behavior directly in shared test logic (context.library == "nodejs") instead of splitting behavior via manifests/separate test methods. Per .cursor/rules/pr-review.mdc (rule 2), runtime library branching in tests should be avoided because it hides per-library behavior differences inside one test path and makes scenario coverage and future maintenance error-prone.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant