Skip to content

Add elapsed-time telemetry to RunProcessAndGetOutput test helper#13829

Merged
jankratochvilcz merged 3 commits into
dotnet:mainfrom
jankratochvilcz:upstream-pr/nettaskhost-telemetry
May 21, 2026
Merged

Add elapsed-time telemetry to RunProcessAndGetOutput test helper#13829
jankratochvilcz merged 3 commits into
dotnet:mainfrom
jankratochvilcz:upstream-pr/nettaskhost-telemetry

Conversation

@jankratochvilcz
Copy link
Copy Markdown
Contributor

The bootstrap-MSBuild test helper (RunProcessAndGetOutput in src/UnitTests.Shared/RunnerUtilities.cs) currently surfaces wall-clock duration only when the process exceeds its timeout. On success the caller has no signal at all about how close to the budget it ran.

This makes timeout tuning blind — both raising a budget (no data to justify the new value) and tightening one later (no data showing actual elapsed never approached the previous value).

Change

Wrap the existing WaitForExit with a Stopwatch and log elapsedMs / budgetMs / headroomMs on success via the existing WriteOutput path. Pure additive — no behavior change.

Risk

Test-only helper, single WriteOutput line added. All ~50 callers benefit; none are affected.

Copilot AI review requested due to automatic review settings May 21, 2026 09:26
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds elapsed-time telemetry to the RunnerUtilities.RunProcessAndGetOutput unit-test helper so test runs emit timing info even when they succeed, helping callers tune timeout budgets with real data.

Changes:

  • Start a Stopwatch around process execution/waiting logic.
  • Emit a new WriteOutput line that reports elapsed time, configured budget, and computed headroom.
Comments suppressed due to low confidence (1)

src/UnitTests.Shared/RunnerUtilities.cs:180

  • In DebugUnitTests mode the timeout budget isn’t enforced (the code uses WaitForExit() with no timeout), but the telemetry line still reports budget/headroom based on timeoutMilliseconds. This can produce misleading negative headroom for successful runs; consider either skipping the budget/headroom fields when Traits.Instance.DebugUnitTests is true or logging a distinct message indicating the run was unbounded.
                Stopwatch sw = Stopwatch.StartNew();
                if (Traits.Instance.DebugUnitTests)
                {
                    p.WaitForExit();
                }
                else if (!p.WaitForExit(timeoutMilliseconds))

Comment thread src/UnitTests.Shared/RunnerUtilities.cs Outdated
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary — PR #13829

Clean, low-risk change adding timing telemetry to test infrastructure (RunnerUtilities.cs).

Status Dimensions
✅ LGTM (22) Backwards Compat, ChangeWave, Performance, Test Coverage, Error Messages, Logging, String Comparison, API Surface, Target Authoring, Design, Cross-Platform, Code Simplification, Concurrency, Naming, SDK Integration, Idiomatic C#, File I/O, Build Infrastructure, Scope, Evaluation Model, Dependency Mgmt, Security
📝 NIT (1) Documentation Accuracy — phantom workflow reference
⏭️ N/A (1) Correctness — no edge case concerns for local Stopwatch in test code

Assessment: The Stopwatch is a local variable with no concurrency/shared-state concerns. The System.Diagnostics using already exists. The arithmetic (timeoutMilliseconds - sw.ElapsedMilliseconds) correctly promotes int to long. Negative headroom in the DebugUnitTests path (no timeout enforcement) or when stream flushing exceeds budget is valid and useful information.

One minor nit filed inline about the comment referencing a non-existent workflow.

Note

🔒 Integrity filter blocked 1 item

The following item were blocked because they don't meet the GitHub integrity level.

  • #13829 pull_request_read: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".

To allow these resources, lower min-integrity in your GitHub frontmatter:

tools:
  github:
    min-integrity: approved  # merged | approved | unapproved | none

Generated by Expert Code Review (on open) for issue #13829 · ● 1.5M

Comment thread src/UnitTests.Shared/RunnerUtilities.cs Outdated
@jankratochvilcz jankratochvilcz force-pushed the upstream-pr/nettaskhost-telemetry branch from 823de5a to 1e89a6a Compare May 21, 2026 09:38
@jankratochvilcz
Copy link
Copy Markdown
Contributor Author

Thanks for the review. Addressed the legitimate concern; commented on the others.

Addressed (force-pushed):

  • DebugUnitTests mode no longer logs a misleading budget/headroom. When that flag is set the timeout is intentionally not enforced ( p.WaitForExit() without timeout), so the line now reads:
    Process {pid} exited in {ms}ms (DebugUnitTests: budget not enforced)
    Non-debug runs are unchanged.

Not addressed:

  • The reviewer's Integrity filter blocked / phantom workflow notes are auto-generated agent meta; nothing actionable for the diff itself.

This PR is test infra only (src/UnitTests.Shared/RunnerUtilities.cs), so production hot-path concerns don't apply.

@jankratochvilcz jankratochvilcz force-pushed the upstream-pr/nettaskhost-telemetry branch from 1e89a6a to e79c332 Compare May 21, 2026 09:43
Replaces the global default-timeout bump (reviewed against on PR #45) with
elapsed-time logging in the bootstrap-MSBuild test helper. The Stopwatch
around WaitForExit lets callers see actual ms taken vs the budget, so when
follow-up PRs add per-test timeout overrides (e.g. the unmerged
NetTaskHostTest_FallbackToDotnet tests tracked in #41/#42), the author can
pick a tight value based on data instead of guessing.

Rationale for not bumping the default: the helper is called from 53 test
sites across 11 files; a global bump would silently triple the worst-case
budget for tests that today don't even use 30s, masking perf regressions.
The two flaky tests cited in the original issues don't exist in main yet
(they're on a feature branch), so per-call scoping isn't possible today;
this PR makes the data available so it can be scoped correctly when they
land.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jankratochvilcz jankratochvilcz force-pushed the upstream-pr/nettaskhost-telemetry branch from e79c332 to 34add7c Compare May 21, 2026 09:43
Comment thread src/UnitTests.Shared/RunnerUtilities.cs Outdated
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jankratochvilcz jankratochvilcz enabled auto-merge (squash) May 21, 2026 13:35
@jankratochvilcz jankratochvilcz merged commit f095875 into dotnet:main May 21, 2026
11 checks passed
@jankratochvilcz
Copy link
Copy Markdown
Contributor Author

Closing in favor of #13836. The required license/cla check has been pending for hours on the head commit (c0801793) even though it reported success on every earlier commit in this PR. Re-pushed the exact same head SHA to a new branch and opened #13836 to mint a fresh CLA check — buggy CLA pipeline.

@jankratochvilcz
Copy link
Copy Markdown
Contributor Author

Heads-up: auto-merge fired ~6 s before I checked, so this actually landed. Disregard the prior comment about #13836 — that PR is being closed as a duplicate.

@jankratochvilcz
Copy link
Copy Markdown
Contributor Author

24h post-merge data from main (dnceng-public pipeline 75, builds 1430301, 1430551, 1430938, 1431045, 1431261).

Aggregate across 5 414 process exits captured by the new telemetry (no DebugUnitTests mode runs observed):

min ms p50 ms p95 ms max ms
elapsed 48 1 288 5 745 19 323
budget 30 000 30 000 900 000 900 000
remaining 13 641 28 934 897 420 899 680

Useful signal:

  • remaining < 0 count: 0. No process came close to its budget across 5k+ exits — the helper's budget enforcement is comfortably correct.
  • remaining > 10 × elapsed count: 5 059 (~93% of all exits). Most callers have very generous budgets.

Egregious cases — note that Microsoft.Build.BuildCheck.UnitTests.EndToEndTests callers run with a 900 000 ms (15 min) budget while typical actual elapsed is 700–12 000 ms. That's a ~75–900x oversize. Example line:

Process 15880 exited in 12047ms (budget 900000ms, 887953ms remaining)

Not opening a follow-up PR for that here (out of scope for this telemetry PR), but flagging it as a worthwhile target for a separate cleanup — the helper now makes that kind of audit possible across all ~50 callers without extra instrumentation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants