@thomasbertet
Collaborator

@thomasbertet thomasbertet commented Feb 9, 2026

Motivation

The profiler can get stuck in "stopped" status after periods of inactivity, preventing it from collecting profiling data. Two race conditions were identified:

  1. Session expires while profiler is paused: When the tab is hidden, the profiler pauses. If the session expires during this time, stopProfilerInstance returned early without updating the instance state from 'paused' to 'stopped', causing the session renewal check to fail.

  2. Session renews while stop is in progress: When SESSION_RENEWED fires while the async stopProfiling was still executing, the check instance.state === 'stopped' failed because the state hadn't been updated yet.
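The second race can be reproduced with a small model. This is only a sketch: `createAsyncStopProfiler`, `collect`, and `onSessionRenewed` are hypothetical names standing in for the SDK's internals, and the real state machine has more cases.

```typescript
type State = 'running' | 'paused' | 'stopped'

// Hypothetical model of the pre-fix behavior: state only changes after the async work.
function createAsyncStopProfiler(collect: () => Promise<void>) {
  let state: State = 'running'
  return {
    getState: (): State => state,
    async stop(): Promise<void> {
      await collect() // state is still 'running' while data is collected and sent
      state = 'stopped'
    },
    onSessionRenewed(): void {
      // Restart only from 'stopped'; this check is missed while stop() is in flight.
      if (state === 'stopped') {
        state = 'running'
      }
    },
  }
}

async function reproduceRace(): Promise<State> {
  const profiler = createAsyncStopProfiler(() => Promise.resolve())
  const stopping = profiler.stop() // stands in for SESSION_EXPIRED
  profiler.onSessionRenewed() // stands in for SESSION_RENEWED arriving before stop settles
  await stopping
  return profiler.getState() // the renewal was ignored, so the profiler stays stopped
}
```

In this model the renewal handler runs while `stop()` is still awaiting, sees a state other than 'stopped', and does nothing, which matches the stuck profiler described above.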

Changes

Fix 1: Handle paused state in stopProfilerInstance

When stopProfilerInstance is called while the profiler is paused, properly transition the state to 'stopped' with the appropriate stateReason.
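A minimal sketch of that transition, assuming a simplified instance shape. The `ProfilerInstance` and `StateReason` types here are illustrative, not the SDK's actual definitions.

```typescript
// Hypothetical, simplified types; the real instance carries more fields.
type StateReason = 'session-expired' | 'stopped-by-user'

type ProfilerInstance =
  | { state: 'running' }
  | { state: 'paused' }
  | { state: 'stopped'; stateReason: StateReason }

function stopProfilerInstance(instance: ProfilerInstance, stateReason: StateReason): ProfilerInstance {
  if (instance.state === 'stopped') {
    // Already stopped: nothing to update.
    return instance
  }
  // Both 'running' and 'paused' now end up in 'stopped'.
  // Before the fix, the 'paused' case returned early and kept its state.
  return { state: 'stopped', stateReason }
}
```

With this shape, a session renewal check of the form `instance.state === 'stopped'` succeeds even when the session expired while the tab was hidden.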

Fix 2: Sync state changes with fire-and-forget data collection

Instead of awaiting the profiler stop (which includes sending the data), stopProfiling is now synchronous:

  • stopProfilerInstance and pauseProfilerInstance now update state synchronously
  • Data collection (collectProfilerInstance) continues in the background as fire-and-forget
  • This eliminates the race condition by design: when SESSION_RENEWED fires, the state is already 'stopped'
  • Changed stop() return type from Promise<void> to void

This approach is simpler and more robust than the flag-based solution (commit 2 in this PR) because:

  • No flags to track or reset
  • State transitions are immediate and predictable
  • No risk of forgetting to handle edge cases in flag logic
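The synchronous design can be sketched as follows. The names `createProfilerSketch`, `getRestartCount`, and `onSessionRenewed` are hypothetical, and collecting only when the profiler was running is an assumption made for this illustration.

```typescript
type ProfilerState = 'running' | 'paused' | 'stopped'

// Hypothetical model: collectProfilerInstance stands in for the async collect-and-send step.
function createProfilerSketch(collectProfilerInstance: () => Promise<void>) {
  let state: ProfilerState = 'running'
  let restartCount = 0

  return {
    getState: (): ProfilerState => state,
    getRestartCount: (): number => restartCount,

    // Returns void: callers can no longer await the data being sent.
    stop(): void {
      if (state === 'stopped') {
        return
      }
      const wasRunning = state === 'running'
      state = 'stopped' // state is updated synchronously, before any async work

      if (wasRunning) {
        // Fire-and-forget: collection continues in the background.
        // Assumption: a paused profiler has nothing left to collect.
        void collectProfilerInstance().catch(() => {
          // Errors are swallowed in this sketch.
        })
      }
    },

    onSessionRenewed(): void {
      if (state === 'stopped') {
        state = 'running'
        restartCount += 1
      }
    },
  }
}
```

Because the state changes before the collection promise is created, a renewal that fires right after `stop()` always sees 'stopped', however long the collection takes.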

Unit tests

Since stop and stopProfilerInstance are now synchronous instead of async, and we no longer wait for the profiles to be sent, the unit tests were updated to account for the following:

  • stop updates the status of the profiler synchronously.
  • Checking isStopped no longer guarantees that the latest profile was sent.

Overall, the tests now await the requests to make sure the profile was properly sent.
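The testing consequence can be illustrated with a fake. This is a sketch under stated assumptions: `createFakeProfiler`, `sentProfiles`, and `waitForSend` are invented for the example and are not the helpers used in the actual test suite.

```typescript
// Hypothetical fake: stop() flips the state at once, the "request" completes later.
function createFakeProfiler() {
  const sentProfiles: string[] = []
  let stopped = false
  let pendingSend: Promise<void> = Promise.resolve()

  return {
    sentProfiles,
    isStopped: (): boolean => stopped,
    stop(): void {
      stopped = true
      // The profile is only recorded once the microtask queue is flushed.
      pendingSend = Promise.resolve().then(() => {
        sentProfiles.push('profile-1')
      })
    },
    waitForSend: (): Promise<void> => pendingSend,
  }
}

async function exampleTest(): Promise<number> {
  const profiler = createFakeProfiler()
  profiler.stop()
  // At this point isStopped() is true, but nothing has been sent yet.
  await profiler.waitForSend()
  // Only after awaiting the request is it safe to assert on what was sent.
  return profiler.sentProfiles.length
}
```

A test that only waits for `isStopped()` would pass its wait immediately and then assert on an empty request list, which is why the assertions have to wait on the request itself.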

Test instructions

  1. Start the profiler
  2. Switch to another tab (profiler pauses)
  3. Wait for session to expire (15 min inactivity) or manually trigger SESSION_EXPIRED
  4. Trigger session renewal (or return to tab and interact)
  5. Verify profiler restarts

For the race condition test:

  1. Fire SESSION_EXPIRED immediately followed by SESSION_RENEWED
  2. Verify profiler restarts immediately (state is already 'stopped' when renewal fires)

Both of these scenarios are covered by unit tests.

Checklist

  • Tested locally
  • Tested on staging
  • Added unit tests for this change.
  • Added e2e/integration tests for this change.
  • Updated documentation and/or relevant AGENTS.md file

When the profiler was paused (tab hidden) and the session expired,
stopProfilerInstance would return early without updating instance.state
from 'paused' to 'stopped'. This caused the session renewal check to
fail, leaving the profiler stuck.
@cit-pr-commenter-54b7da

cit-pr-commenter-54b7da bot commented Feb 9, 2026

Bundles Sizes Evolution

| 📦 Bundle Name | Base Size | Local Size | 𝚫 | 𝚫% | Status |
| --- | --- | --- | --- | --- | --- |
| Rum | 168.31 KiB | 168.30 KiB | -12 B | -0.01% | |
| Rum Profiler | 4.31 KiB | 4.29 KiB | -21 B | -0.48% | |
| Rum Recorder | 24.54 KiB | 24.54 KiB | 0 B | 0.00% | |
| Logs | 56.25 KiB | 56.25 KiB | 0 B | 0.00% | |
| Flagging | 944 B | 944 B | 0 B | 0.00% | |
| Rum Slim | 125.19 KiB | 125.19 KiB | 0 B | 0.00% | |
| Worker | 23.63 KiB | 23.63 KiB | 0 B | 0.00% | |
🚀 CPU Performance
| Action Name | Base CPU Time (ms) | Local CPU Time (ms) | 𝚫% |
| --- | --- | --- | --- |
| RUM - add global context | 0.0039 | 0.0046 | +17.95% |
| RUM - add action | 0.0125 | 0.0161 | +28.80% |
| RUM - add error | 0.0122 | 0.0198 | +62.30% |
| RUM - add timing | 0.0025 | 0.0034 | +36.00% |
| RUM - start view | 0.0119 | 0.0146 | +22.69% |
| RUM - start/stop session replay recording | 0.0006 | 0.001 | +66.67% |
| Logs - log message | 0.0137 | 0.0175 | +27.74% |
🧠 Memory Performance
| Action Name | Base Memory Consumption | Local Memory Consumption | 𝚫 |
| --- | --- | --- | --- |
| RUM - add global context | 26.41 KiB | 27.75 KiB | +1.34 KiB |
| RUM - add action | 48.86 KiB | 49.78 KiB | +945 B |
| RUM - add timing | 28.07 KiB | 27.53 KiB | -549 B |
| RUM - add error | 54.07 KiB | 54.48 KiB | +423 B |
| RUM - start/stop session replay recording | 26.78 KiB | 26.34 KiB | -457 B |
| RUM - start view | 452.59 KiB | 455.20 KiB | +2.61 KiB |
| Logs - log message | 45.03 KiB | 45.56 KiB | +535 B |

🔗 RealWorld

When SESSION_RENEWED fires while the async stopProfiling is still in
progress, the check for instance.state === 'stopped' fails because
the state hasn't been updated yet. This causes the profiler to not
restart.

Use flags to track when SESSION_RENEWED fires during the stop process
and restart the profiler after stop completes if needed.
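
For comparison, the flag-based idea in this commit message could look roughly like the sketch below. All names (`createFlagBasedProfiler`, `isStopping`, `renewedWhileStopping`, `getStartCount`) are hypothetical and chosen for illustration; they are not taken from the commit.

```typescript
type FlagState = 'running' | 'paused' | 'stopped'

// Hypothetical model of the flag-based approach: stop() stays async,
// and two flags remember a renewal that arrives while it is in flight.
function createFlagBasedProfiler(collect: () => Promise<void>) {
  let state: FlagState = 'running'
  let isStopping = false
  let renewedWhileStopping = false
  let startCount = 1 // counts the initial start

  function start(): void {
    state = 'running'
    startCount += 1
  }

  return {
    getState: (): FlagState => state,
    getStartCount: (): number => startCount,

    async stop(): Promise<void> {
      isStopping = true
      await collect()
      state = 'stopped'
      isStopping = false
      if (renewedWhileStopping) {
        // Replay the renewal that was missed during the stop.
        renewedWhileStopping = false
        start()
      }
    },

    onSessionRenewed(): void {
      if (isStopping) {
        renewedWhileStopping = true // defer the restart until stop() completes
        return
      }
      if (state === 'stopped') {
        start()
      }
    },
  }
}
```

Every exit path of `stop()` has to reset both flags correctly, which is bookkeeping that a synchronous state update does not need.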
@thomasbertet thomasbertet force-pushed the thomas.bertet/PROF-13701-fix-profiler-stuck-on-session-expire-while-paused branch from 31d5c0f to e1f9df7 Compare February 9, 2026 09:23
… collection

Replace flag-based approach with simpler sync design:
- stopProfilerInstance/pauseProfilerInstance update state synchronously
- Data collection continues in background (fire-and-forget)
- Eliminates race condition by design: SESSION_RENEWED always sees correct state
- Change stop() return type from Promise<void> to void
@thomasbertet thomasbertet marked this pull request as ready for review February 9, 2026 12:48
@thomasbertet thomasbertet requested a review from a team as a code owner February 9, 2026 12:48
@thomasbertet
Collaborator Author

/to-staging

@gh-worker-devflow-routing-ef8351

gh-worker-devflow-routing-ef8351 bot commented Feb 9, 2026

View all feedbacks in Devflow UI.

2026-02-09 12:53:45 UTC ℹ️ Start processing command /to-staging


2026-02-09 12:53:51 UTC ℹ️ Branch Integration: starting soon, merge expected in approximately 0s (p90)

Commit 7e19e6260a will soon be integrated into staging-07.


2026-02-09 13:07:42 UTC ℹ️ Branch Integration: this commit was successfully integrated

Commit 7e19e6260a has been merged into staging-07 in merge commit b6bcfd462c.

Check out the triggered pipeline on Gitlab 🦊

If you need to revert this integration, you can use the following command: /code revert-integration -b staging-07

@thomasbertet thomasbertet marked this pull request as draft February 9, 2026 12:53
gh-worker-dd-mergequeue-cf854d bot added a commit that referenced this pull request Feb 9, 2026
…re-while-paused (#4152) into staging-07

Integrated commit sha: 7e19e62

Co-authored-by: thomasbertet <thomas.bertet@datadoghq.com>
… operations

- Use waitNextMicrotask() instead of Promise.resolve() for explicitness
- Remove unnecessary waitForBoolean() after profiler.stop() and SESSION_EXPIRED
  since state changes are now synchronous
- Add comments explaining microtask flushes needed for async data collection
@thomasbertet thomasbertet marked this pull request as ready for review February 9, 2026 15:01
@thomasbertet thomasbertet changed the title 🐛 [PROF-13701] Fix profiler stuck when session expires while paused 🐛 [PROF-13701] Fix profiler stuck when session expires Feb 9, 2026
@thomasbertet thomasbertet changed the title 🐛 [PROF-13701] Fix profiler stuck when session expires 🐛 [RUM Profiler] Fix profiler stuck when session expires Feb 9, 2026