Add elapsed-time telemetry to RunProcessAndGetOutput test helper#13836
Add elapsed-time telemetry to RunProcessAndGetOutput test helper#13836jankratochvilcz wants to merge 3 commits into
Conversation
Replaces the global default-timeout bump (reviewed against on PR #45) with elapsed-time logging in the bootstrap-MSBuild test helper. The Stopwatch around WaitForExit lets callers see actual ms taken vs the budget, so when follow-up PRs add per-test timeout overrides (e.g. the unmerged NetTaskHostTest_FallbackToDotnet tests tracked in #41/#42), the author can pick a tight value based on data instead of guessing. Rationale for not bumping the default: the helper is called from 53 test sites across 11 files; a global bump would silently triple the worst-case budget for tests that today don't even use 30s, masking perf regressions. The two flaky tests cited in the original issues don't exist in main yet (they're on a feature branch), so per-call scoping isn't possible today; this PR makes the data available so it can be scoped correctly when they land. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
@AlesProkop sorry for the re-ping — could I get a re-approval here? This is #13829 reopened verbatim (same head SHA, just on a new branch) because the required |
|
Closing as duplicate — #13829 auto-merged 6 seconds before I opened this one (the auto-merge I'd set up earlier fired the moment |
There was a problem hiding this comment.
Review Summary
This is a straightforward, low-risk diagnostic improvement that adds timing output to RunProcessAndGetOutput. The change helps developers investigate timeout-related test failures by showing how much of the timeout budget was consumed. Good addition for debuggability.
No blocking issues found.
Minor observations (all non-blocking):
| # | Severity | Issue |
|---|---|---|
| 1 | Low | Stopwatch measures wait time, not total execution time from p.Start(). The log message "exited in Xms" could be slightly misleading. |
| 2 | Low | sw.Stop() is functionally dead — the stopwatch value was already captured and is never read again. |
| 3 | Low | Negative "remaining" is theoretically possible due to scheduling jitter after WaitForExit returns. |
| 4 | Nit | "Telemetry" label in comment is a misnomer for diagnostic log output. |
What's good:
- Correctly captures elapsed before the post-exit drain
WaitForExit()so the budget comparison is accurate. - Correctly handles the DebugUnitTests path (no misleading budget info when budget isn't enforced).
- No behavioral change to existing logic — purely additive diagnostic output.
- Scoped to test infrastructure only — zero risk to production code.
Note
🔒 Integrity filter blocked 2 items
The following items were blocked because they don't meet the GitHub integrity level.
- #13836
pull_request_read: has lower integrity than agent requires. The agent cannot read data with integrity below "approved". - #13836
pull_request_read: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
To allow these resources, lower min-integrity in your GitHub frontmatter:
tools:
github:
min-integrity: approved # merged | approved | unapproved | noneGenerated by Expert Code Review (on open) for issue #13836 · ● 1.2M
| p.StandardInput.Dispose(); | ||
|
|
||
| TimeSpan timeout = TimeSpan.FromMilliseconds(timeoutMilliseconds); | ||
| Stopwatch sw = Stopwatch.StartNew(); |
There was a problem hiding this comment.
Nit (documentation gap): The stopwatch starts after p.Start() and async output reading has begun (lines 168–172). This means exitElapsedMs measures wait time (time spent in WaitForExit), not total process execution time. The process may have done significant work between Start() and this line.
This is arguably the right thing to measure for timeout-budget purposes (since it's the same duration WaitForExit(timeout) is checking), but the log message says "exited in {exitElapsedMs}ms" which could be misread as total execution time. Consider rewording to something like "Process {pid} waited {exitElapsedMs}ms for exit" or adding a brief inline comment noting it's measuring wait duration, not wall-clock execution time from process start.
| pid = p.Id; | ||
| successfulExit = p.ExitCode == 0; | ||
|
|
||
| // Telemetry: log actual elapsed time so callers tuning their timeouts can see |
There was a problem hiding this comment.
Nit (naming): The comment says "Telemetry" but this is diagnostic logging via WriteOutput, not actual telemetry infrastructure (no counters, no structured events, no collection pipeline). Consider using "Diagnostics:" or just removing the label — the comment already explains the purpose clearly without it.
| } | ||
| else | ||
| { | ||
| WriteOutput($"Process {pid} exited in {exitElapsedMs}ms (budget {timeoutMilliseconds}ms, {timeoutMilliseconds - exitElapsedMs}ms remaining)"); |
There was a problem hiding this comment.
Minor robustness: timeoutMilliseconds - exitElapsedMs mixes int and long, producing a long result. In normal operation this should always be non-negative (since WaitForExit(timeout) would have thrown on timeout). However, there's a theoretical window where thread scheduling after WaitForExit returns could make exitElapsedMs slightly exceed timeoutMilliseconds. A negative "remaining" value would be confusing in the log.
Consider using Math.Max(0, timeoutMilliseconds - exitElapsedMs) or simply documenting that negative values indicate measurement jitter. This is very low severity — just a polish item.
| // See https://docs.microsoft.com/en-us/dotnet/api/system.diagnostics.process.waitforexit?view=net-6.0#system-diagnostics-process-waitforexit(system-int32). | ||
| // The overload WaitForExit() waits for the error and output to be handled. The WaitForExit(int timeout) overload does not, so we could lose the data. | ||
| p.WaitForExit(); | ||
| sw.Stop(); |
There was a problem hiding this comment.
Nit (dead code): sw.Stop() has no observable effect here — exitElapsedMs was already captured on line 190 and the stopwatch is never read again after this point. It could be removed to reduce noise, or if you prefer to keep it for clarity/hygiene that's fine too — just noting it's functionally inert.
Re-opening of #13829 to get a fresh
license/clacheck.#13829 has been blocked for hours with
mergeStateStatus=BLOCKEDeven though everymsbuild-prleg is green and the PR is approved (@AlesProkop). Root cause: the requiredlicense/clacheck is missing on the head commitc0801793— the Microsoft policy service reported on earlier commits in the PR but never on the merge commit. Same code, identical SHA pushed to a new branch to mint a fresh CLA check.Below is the original description from #13829:
RunProcessAndGetOutputis the helper that almost every cross-process MSBuild test uses (net472/netcoretask host launches, multi-proc node smoke tests, etc.). It already enforces a per-call timeout, but on failure we only see "process X is active for more than N sec" — no signal at all on the happy path about how close we ran to the budget.This is exactly the kind of data we need to retune the three flaky
ToolTasktests stabilized in the sibling PRs (#13828, #13830) — they pick their timeouts and slow-vs-fast gaps blind right now.Changes
sw.ElapsedMillisecondsimmediately afterWaitForExit()returns, before the post-exit drain loop, so the telemetry reports budget-relevant time only.Process {pid} exited in {N}ms (budget {T}ms, {T-N}ms remaining)DebugUnitTestsmode (no enforced timeout):Process {pid} exited in {N}ms (DebugUnitTests: budget not enforced)Risk
Test-helper-only changes. Single file:
src/UnitTests.Shared/RunnerUtilities.cs. No product code touched.Closes #13829.