refactor(test/e2e): consolidate shell-probe with scenario spawn infra by laitingsheng · Pull Request #5033 · NVIDIA/NemoClaw

laitingsheng · 2026-06-09T07:15:12Z

Summary

Consolidate the TypeScript spawn boundaries shared between the Vitest fixture layer and the scenario orchestrator into a single framework module. Every TS spawn site now reaches the same detached process-group cleanup, SIGTERM -> SIGKILL escalation, and NUL-byte argv guard, and the validation_suites/sandbox-exec.sh transport stays centralised for in-sandbox commands.

Related Issue

Resolves #4988.

Changes

New shared modules under test/e2e-scenario/framework/shell/: a lifecycle-only superviseChild (timeout, abort, SIGTERM -> SIGKILL on a detached process group, chunk callbacks) and trustedShellCommand / validateShellToken (NUL-byte argv guard, brand symbol).
framework/shell-probe.ts, scenarios/orchestrators/phase.ts (runAction + runShellStep), and the probe helpers (probes/util.ts, probes/docs-validation.ts, probes/diagnostics.ts) now spawn at their own site (preserving each site's argv contract and CodeQL marker) and hand the child to the shared supervisor.
Probe helpers (spawnBash, runHostCmd, docs / diagnostics probes) gain detached process-group cleanup and SIGKILL escalation; runHostCmd gains the NUL-byte argv guard.
Adds framework-tests/e2e-shell-supervisor.test.ts to cover the headline guarantees at the leaf level; existing e2e-fixture-context.test.ts and e2e-phase-orchestrators.test.ts continue to cover end-to-end behaviour.

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

`npx prek run --all-files` passes
`npm test` passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
`npm run docs` builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Tinson Lai tinsonl@nvidia.com

Summary by CodeRabbit

New Features
- Centralized shell supervision for e2e flows: robust timeouts, escalation, cancellation, and streamed stdout/stderr.
- Shared trusted-command validation to reject unsafe argv tokens (e.g., NUL bytes) and require a reason.
Tests
- Added and updated e2e tests covering supervision, timeout/escalation, spawn failures, stdin/stdout forwarding, and validation.
Bug Fixes
- More consistent timeout/spawn-error classification and improved redaction in failure reporting.

…wn infra Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

coderabbitai · 2026-06-09T07:15:26Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 281105e9-4518-4598-ab63-c6422d096515

📥 Commits

Reviewing files that changed from the base of the PR and between a30ddf3 and c213718.

📒 Files selected for processing (1)

test/e2e-scenario/framework-tests/e2e-phase-orchestrators.test.ts

📝 Walkthrough

Walkthrough

Extracts trusted-shell token validation and a superviseChild helper; migrates shell-probe, phase orchestrator, probe utilities, and probes to validate argv tokens and use the shared supervisor for detached process-group management, timeout escalation, abort handling, and stdout/stderr streaming.

Changes

Unified Shell Infrastructure Consolidation

Layer / File(s)	Summary
Trusted shell command brand and validation `test/e2e-scenario/framework/shell/trusted-command.ts`, `test/e2e-scenario/framework-tests/e2e-shell-supervisor.test.ts`, `test/e2e-scenario/framework-tests/e2e-phase-orchestrators.test.ts`	Branded `TrustedShellCommand` and `TrustedShellCommandInput`, `validateShellToken` rejects NUL bytes, and `trustedShellCommand` factory enforces trimmed/non-empty command and reason; tests cover NUL rejection, clean tokens, required reason, and validate-hook propagation.
Child process supervision lifecycle `test/e2e-scenario/framework/shell/supervisor.ts`, `test/e2e-scenario/framework-tests/e2e-shell-supervisor.test.ts`	`superviseChild` implements SuperviseOptions/SuperviseResult: wall-clock timeout, SIGTERM→SIGKILL escalation after grace, AbortSignal-aware cancellation (keeps timedOut false), stdin forwarding, stdout/stderr chunk callbacks, and spawnError reporting; tests validate exit codes, stream capture, timeout escalation timing, abort behavior, spawn errors, and stdin piping.
Shell probe migration to shared infrastructure `test/e2e-scenario/framework/shell-probe.ts`	Re-exports trusted-command types and factory; replaces manual process-group tracking, stdout/stderr listeners, and local timers with a single `superviseChild` call and uses supervised outcomes for artifact/error construction.
Phase orchestrator execution via supervisor `test/e2e-scenario/scenarios/orchestrators/phase.ts`	`runAction` and `runShellStep` validate all bash/script tokens via `validateShellToken`, spawn validated argv, delegate lifecycle and stdio to `superviseChild`, redact explicit secret values verbatim, await log flush, and map superviseChild outcomes to action/step results.
Probe utility helpers refactored `test/e2e-scenario/scenarios/probes/util.ts`	`spawnBash` and `runHostCmd` now validate argv with `validateShellToken`, spawn detached children, delegate timeout/lifecycle/stdio to `superviseChild`, add `timedOut` to `CmdResult`, and remove the local NUL helper.
Diagnostics probe integration `test/e2e-scenario/scenarios/probes/diagnostics.ts`	Validates `nemoclaw` argv, uses `superviseChild` for supervision and stderr capture, extends evidence with `timedOut` and `spawnError`, and returns early on spawn errors/timeouts.
Docs validation probe integration `test/e2e-scenario/scenarios/probes/docs-validation.ts`	`runCheck` now validates script/args with `validateShellToken`, uses `superviseChild`, extends results with `timedOut`/`spawnError`, and classifies timeouts/spawn errors using the new flags.
Probe timeout classifier updates `test/e2e-scenario/scenarios/probes/*`	Multiple probes updated to classify transient failures using `timedOut` instead of checking `signal === "SIGTERM"`.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

NVIDIA/NemoClaw#4965: Fixture tests validate the same trustedShellCommand NUL-byte/reason guardrails and superviseChild lifecycle behaviors that this PR extracts into shared framework modules.

Suggested labels

v0.0.62

Suggested reviewers

jyaunches
cv

Poem

🐇 I dug a tunnel, found a shell,

Trimmed its args and guarded well,
SIGTERM tolled and SIGKILL came after,
Streams fed back, and secrets went softer,
Hopping home with logs all swell.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main change: consolidating shell-probe infrastructure with scenario spawn infrastructure into shared framework modules.
Linked Issues check	✅ Passed	The PR fully implements all acceptance criteria from `#4988`: extracts shared shell/spawn infrastructure (superviseChild, trustedShellCommand, validateShellToken) into framework/shell modules [`#4988`], preserves sandbox-exec transport centralization [`#4988`], maintains detached process-group lifecycle handling with timeout/abort/SIGTERM→SIGKILL [`#4988`], keeps trusted command contract with NUL-byte rejection and branded descriptors [`#4988`], and preserves single artifact/evidence path with canonical redaction [`#4988`].
Out of Scope Changes check	✅ Passed	All changes are directly scoped to the objectives: new shell framework modules (supervisor.ts, trusted-command.ts), refactored spawn sites (shell-probe.ts, phase.ts, util.ts, probe helpers), and updated tests. No unrelated changes detected.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch refactor/e2e-shell-spawn-consolidation

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-06-09T07:19:21Z

E2E Advisor Recommendation

Required E2E: e2e-scenarios:ubuntu-repo-cloud-openclaw, e2e-vitest-scenarios:ubuntu-repo-cloud-openclaw, network-policy-e2e, shields-config-e2e, telegram-injection-e2e, diagnostics-e2e
Optional E2E: docs-validation-e2e, cloud-e2e

Dispatch hint: network-policy-e2e,shields-config-e2e,telegram-injection-e2e,diagnostics-e2e

Auto-dispatched E2E: network-policy-e2e, shields-config-e2e, telegram-injection-e2e, diagnostics-e2e via nightly-e2e.yaml at e14300899bd43ef45ed7601d437664a921f1d0bc — nightly run

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: medium

Required E2E

e2e-scenarios:ubuntu-repo-cloud-openclaw (high; live cloud onboarding plus runtime validation, up to 90 minutes): Runs the typed Scenario Runner path that consumes PhaseOrchestrator.runAction/runShellStep for install, onboarding, state-validation, and runtime assertions. Required because the PR changes shared spawn supervision, timeout handling, log flushing, and secret redaction at this boundary.
e2e-vitest-scenarios:ubuntu-repo-cloud-openclaw (high; live scenario fixture run, workflow timeout 45 minutes per scenario): Exercises the Vitest live fixture stack that uses ShellProbe and trustedShellCommand for host CLI, gateway, sandbox, and provider checks. Required because ShellProbe now delegates lifecycle cleanup and spawn result handling to the new supervisor.
network-policy-e2e (medium-high; live sandbox and gateway policy check): Validates the real network-policy deny boundary. Required because the PR changes the TypeScript network policy probe and shared sandbox command execution/timeout semantics used by security probe logic.
shields-config-e2e (medium-high; live sandbox security posture check, workflow timeout 30 minutes): Validates shields status and protected config permissions in a real sandbox. Required because the PR changes shields probe timeout classification and the shared host/sandbox command runner it depends on.
telegram-injection-e2e (high; live sandbox plus Telegram injection validation, workflow timeout 60 minutes): Validates command-injection resistance in a real assistant/messaging flow. Required because the PR changes injection-blocked probe timeout classification and shared sandbox command execution used for injection payload checks.
diagnostics-e2e (medium; live CLI diagnostics workflow): Validates the real diagnostics bundle flow. Required because the PR changes diagnostics probe spawning, timeout handling, spawn-error classification, and evidence shape for debug bundle production.

Optional E2E

docs-validation-e2e (low; no sandbox required, workflow timeout 15 minutes): Useful confidence for the docsValidationProbe changes to bash supervision, timeout classification, and spawn-error evidence. Not merge-blocking for this PR because the product docs/CLI surfaces themselves were not changed.
cloud-e2e (high; full cloud E2E with sandbox lifecycle): Broad legacy end-to-end smoke for install, onboarding, sandbox, and inference that can catch regressions not covered by the typed scenario runner recommendation. Optional because the direct code changes are in the E2E TypeScript framework/probes, not product runtime code.

New E2E recommendations

typed-security-probe-coverage (high): Existing nightly jobs cover the product security behavior, but the changed TypeScript built-in probes under test/e2e-scenario/scenarios/probes are not clearly exercised by any canonical live typed scenario because the current baseline scenarios do not include security-shields, security-policy, or security-injection suites.
- Suggested test: Add a dispatchable typed scenario, for example ubuntu-repo-cloud-openclaw-security-probes, that includes security-shields, security-policy, and security-injection suites and runs through E2E / Scenario Runner.
typed-diagnostics-docs-probe-coverage (medium): diagnosticsProbe and docsValidationProbe are registered in the typed assertion registry, but there is no obvious canonical live typed scenario dedicated to exercising those probe refs after changes to their spawn supervision and evidence shapes.
- Suggested test: Add a lightweight typed probe-validation scenario or workflow input that runs the diagnostics and docs-validation suites through test/e2e-scenario/scenarios/run.ts.
advisor-dispatch-coverage-for-scenario-workflows (medium): The dispatch hint can target nightly-e2e jobs, but this PR also needs scenario-runner and Vitest-scenario workflows to validate the changed TypeScript scenario framework directly.
- Suggested test: Extend E2E advisor dispatch support to trigger e2e-scenarios.yaml and e2e-vitest-scenarios.yaml with a scenario input such as ubuntu-repo-cloud-openclaw.

Dispatch hint

Workflow: nightly-e2e.yaml
jobs input: network-policy-e2e,shields-config-e2e,telegram-injection-e2e,diagnostics-e2e

github-actions · 2026-06-09T07:19:22Z

E2E Scenario Advisor Recommendation

Required scenario E2E: e2e-scenarios-all
Optional scenario E2E: None

Dispatch required scenario E2E:

gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

e2e-scenarios-all: Shared scenario execution infrastructure changed: the shell supervisor/trusted-command boundary, ShellProbe, PhaseOrchestrator, and multiple built-in probe helpers now share new spawn timeout, process-group cleanup, NUL-byte validation, and timeout classification behavior. These paths are exercised across scenario phases and suites, so the full scenario fan-out is required.
- Dispatch: gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

Optional scenario E2E

None.

Relevant changed files

test/e2e-scenario/framework-tests/e2e-phase-orchestrators.test.ts
test/e2e-scenario/framework-tests/e2e-shell-supervisor.test.ts
test/e2e-scenario/framework/shell-probe.ts
test/e2e-scenario/framework/shell/supervisor.ts
test/e2e-scenario/framework/shell/trusted-command.ts
test/e2e-scenario/scenarios/orchestrators/phase.ts
test/e2e-scenario/scenarios/probes/diagnostics.ts
test/e2e-scenario/scenarios/probes/docs-validation.ts
test/e2e-scenario/scenarios/probes/injection-blocked.ts
test/e2e-scenario/scenarios/probes/network-policy.ts
test/e2e-scenario/scenarios/probes/shields-config.ts
test/e2e-scenario/scenarios/probes/util.ts

github-actions · 2026-06-09T07:19:51Z

PR Review Advisor

Findings: 1 needs attention, 1 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 0 still apply, 1 new item found

Review findings

🛠️ Needs attention

Keep SIGKILL escalation armed after the process-group leader closes (test/e2e-scenario/framework/shell/supervisor.ts:139): The linked issue requires preserving detached process-group timeout/abort cleanup, but `settle()` clears the pending SIGKILL timer as soon as the supervised child emits `close`. That can regress the previous per-call-site cleanup behavior when the process-group leader exits on SIGTERM while a descendant in the same group ignores SIGTERM: the promise resolves, the SIGKILL escalation is canceled, and the descendant can keep running. The new test covers a leader that ignores SIGTERM, but not the descendant-only case.
- Recommendation: Do not cancel the escalation timer just because the leader closed after timeout/abort, or otherwise send a final process-group SIGKILL before resolving when termination was started. Add a regression test where the leader exits on SIGTERM while a child process in the detached group ignores SIGTERM, then assert the descendant is reaped after the grace period.
- Evidence: `terminate()` arms `killTimer` for SIGKILL, but `settle()` unconditionally runs `if (killTimer) clearTimeout(killTimer);` on `close`. Prior inline implementations either left the escalation timer armed after termination started or used an untracked nested timer, so this contradicts the acceptance clause: “Preserve detached process-group timeout/abort cleanup and the trusted/NUL-validated command contract.”

🔎 Worth checking

Source-of-truth review needed: test/e2e-scenario/framework/shell/supervisor.ts SIGKILL escalation lifecycle: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: `settle()` unconditionally clears `killTimer` on child `close`, canceling the scheduled SIGKILL escalation.

🌱 Nice ideas

None.

Consider writing more tests for

**Runtime validation** — superviseChild keeps SIGKILL escalation armed after the leader exits on SIGTERM while a descendant ignores SIGTERM. Leaf and fixture coverage is strong for the intended consolidation, but the central lifecycle helper needs one more runtime-style regression for descendant-only process-group cleanup because the current timeout test only covers a leader that ignores SIGTERM.
**Runtime validation** — runSandboxCmd preserves argv with shell metacharacters after the temp-script to bash -c conversion. Leaf and fixture coverage is strong for the intended consolidation, but the central lifecycle helper needs one more runtime-style regression for descendant-only process-group cleanup because the current timeout test only covers a leader that ignores SIGTERM.
**Acceptance clause:** `test/e2e-scenario/framework/shell-probe.ts` is a bridge-only host probe while the stack moves existing E2E scenarios onto Vitest fixtures. It now mirrors the hardened test(e2e): execute real shell assertions; delete dry-run, --validate-only, and the bash runner #4380 behavior for trusted command descriptors, NUL-byte rejection, explicit env by default, canonical redaction, and process-group cleanup, but the end state should not keep a separate spawn boundary. — add test evidence or identify existing coverage. `shell-probe.ts` now imports `superviseChild` and `TrustedShellCommand` from shared `framework/shell/*`, removing the local trusted-command and lifecycle implementations. However, the new shared supervisor can cancel SIGKILL escalation after leader close, so the process-group cleanup portion needs correction.
**Acceptance clause:** Preserve detached process-group timeout/abort cleanup and the trusted/NUL-validated command contract. — add test evidence or identify existing coverage. Call sites set `detached: true`, NUL validation is centralized in `validateShellToken`, and tests cover leader-level timeout/abort escalation. The cleanup guarantee is incomplete because `superviseChild` clears the SIGKILL escalation timer on leader `close`, which can leave descendant processes alive.
**Acceptance clause:** Related open PRs: — add test evidence or identify existing coverage. Comment context only; deterministic overlap data shows active related E2E PRs touching some of the same files, especially test(e2e): typed-shell-runner cutover (parity → retirement) #5106.
**Acceptance clause:** [test(e2e): execute real shell assertions; delete dry-run, --validate-only, and the bash runner #4380 test(e2e): execute real shell assertions; delete dry-run, --validate-only, and the bash runner](https://github.com/NVIDIA/NemoClaw/pull/4380\) — add test evidence or identify existing coverage. The patch preserves hardened shell assertion concepts such as real spawn, redaction, NUL rejection, and process groups, but the new supervisor’s early SIGKILL timer cancellation needs fixing to fully preserve process-group cleanup behavior.
**Acceptance clause:** Related open issues: — add test evidence or identify existing coverage. Comment context only; no additional acceptance clauses were provided for the related issue in the deterministic context.
**test/e2e-scenario/framework/shell/supervisor.ts SIGKILL escalation lifecycle** — Add `superviseChild keeps SIGKILL escalation armed after the leader exits on SIGTERM while a descendant ignores SIGTERM`.. `settle()` unconditionally clears `killTimer` on child `close`, canceling the scheduled SIGKILL escalation.

Since last review details

Current findings:

Source-of-truth review needed: test/e2e-scenario/framework/shell/supervisor.ts SIGKILL escalation lifecycle: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: `settle()` unconditionally clears `killTimer` on child `close`, canceling the scheduled SIGKILL escalation.
Keep SIGKILL escalation armed after the process-group leader closes (test/e2e-scenario/framework/shell/supervisor.ts:139): The linked issue requires preserving detached process-group timeout/abort cleanup, but `settle()` clears the pending SIGKILL timer as soon as the supervised child emits `close`. That can regress the previous per-call-site cleanup behavior when the process-group leader exits on SIGTERM while a descendant in the same group ignores SIGTERM: the promise resolves, the SIGKILL escalation is canceled, and the descendant can keep running. The new test covers a leader that ignores SIGTERM, but not the descendant-only case.
- Recommendation: Do not cancel the escalation timer just because the leader closed after timeout/abort, or otherwise send a final process-group SIGKILL before resolving when termination was started. Add a regression test where the leader exits on SIGTERM while a child process in the detached group ignores SIGTERM, then assert the descendant is reaped after the grace period.
- Evidence: `terminate()` arms `killTimer` for SIGKILL, but `settle()` unconditionally runs `if (killTimer) clearTimeout(killTimer);` on `close`. Prior inline implementations either left the escalation timer armed after termination started or used an untracked nested timer, so this contradicts the acceptance clause: “Preserve detached process-group timeout/abort cleanup and the trusted/NUL-validated command contract.”

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

coderabbitai

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

test/e2e-scenario/scenarios/orchestrators/phase.ts (1)

174-181: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pass declared secret values into redactString.

Line 203/206 and Line 453/456 only run the regex/context redactors. Since these children still receive action.secretEnv / step.secretEnv, a probe that echoes one of those values verbatim can now leak it into the evidence log and stderrTail unchanged.

🔧 Minimal fix

+    const explicitRedactions = (action.secretEnv ?? [])
+      .map((key) => env[key])
+      .filter((value): value is string => typeof value === "string" && value.length > 0);
+
     const supervised = await superviseChild(child, {
       timeoutMs: timeoutSeconds * 1_000,
       onStdout: (chunk) => {
-        logStream.write(redactString(chunk));
+        logStream.write(redactString(chunk, explicitRedactions));
       },
       onStderr: (chunk) => {
-        const redacted = redactString(chunk);
+        const redacted = redactString(chunk, explicitRedactions);
         logStream.write(redacted);
         stderrTail = (stderrTail + redacted).slice(-4096);
       },
     });

Mirror the same change in runShellStep using step.secretEnv.

Also applies to: 202-208, 390-397, 452-458

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-scenario/scenarios/orchestrators/phase.ts` around lines 174 - 181,
The regex/context-only redaction calls need to also consider declared secret
values so verbatim secrets from child processes aren't leaked; update
runShellStep to pass step.secretEnv into redactString (just like the change
already applied for action.secretEnv when building env) wherever you produce
evidence logs or stderrTail using redactString, ensuring both action.secretEnv
and step.secretEnv are forwarded to redactString in the code paths that create
evidence/log entries; search for runShellStep, redactString, step.secretEnv and
update the redaction calls to include the secret list.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e-scenario/framework/shell/supervisor.ts`:
- Around line 101-114: The abort path can be misreported as a timeout because
the wall-clock timeout remains active after opts.signal triggers; update the
abort handling so that onAbort clears the timeout (clearTimeout(timeout)) before
calling terminate and ensure that terminate (or the surrounding logic that sets
timedOut) is not setting timedOut when termination was caused by opts.signal;
specifically modify the onAbort handler and/or terminate to clear the timer and
propagate an explicit abort reason (or a boolean) so timedOut remains false for
external cancels (references: timeout, timedOut, terminate, onAbort,
opts.signal).

In `@test/e2e-scenario/scenarios/orchestrators/phase.ts`:
- Around line 183-184: validateShellToken can throw on NUL bytes and those
exceptions currently escape runAction / runShellStep (seen where bashArgs are
mapped and where runShellStep is called), aborting the entire phase; wrap the
validateShellToken call(s) inside runShellStep/runAction so any thrown error is
caught and converted into a structured failure object with type "runner-infra"
(include the original error message in the failure details) and return that
result instead of letting the exception bubble to run()/runStep(); update both
call sites (the bashArgs mapping and the other occurrence around runShellStep)
to follow this pattern and ensure callers handle the structured failure return.

In `@test/e2e-scenario/scenarios/probes/diagnostics.ts`:
- Around line 89-95: When supervised.spawnError is present, don't collapse it
into a normal child exit path; instead record the spawn failure separately (e.g.
set a boolean like infraFailure/runnerInfra or a distinct sentinel variable)
while still appending the message to stderrTail, and leave exitCode
unset/nullable (or set to a distinct sentinel) so later logic that inspects
exitCode !== 0 can still differentiate a real child exit from a runner-infra
spawn failure; update the code around supervised.spawnError, exitCode and any
later classification logic that checks exitCode to use the new
infraFailure/runnerInfra signal when classifying failures.

In `@test/e2e-scenario/scenarios/probes/docs-validation.ts`:
- Around line 78-88: The runCheck() path in docs-validation.ts currently masks
supervised.spawnError by converting it into exitCode: 127; update runCheck() and
the DocsCheckResult shape so it returns the original spawn error instead of an
artificial exit code: add/return a spawnError (or error) field on
DocsCheckResult when supervised.spawnError is present (preserve
stderrTail/stdoutTail and elapsedMs), and ensure any callers of
runCheck()/DocsCheckResult (the docs check runner/runner-infra classifier)
inspect this new spawnError field rather than treating exitCode 127 as a normal
script failure; update type/interface for DocsCheckResult and usages that
construct/consume it accordingly (look for runCheck and DocsCheckResult in
docs-validation.ts and consumer code).

In `@test/e2e-scenario/scenarios/probes/util.ts`:
- Around line 126-141: The returned command result currently omits the
supervised.timedOut flag, so update both return paths (the spawnError branch and
the normal branch around supervised.spawnError, supervised.exitCode,
supervised.signal) to include timedOut: supervised.timedOut (or false if absent)
so callers can detect timeouts reliably; search for superviseChild /
SuperviseResult and ensure the returned object mirrors timedOut from the
supervise result (and then update downstream callers to branch on
result.timedOut instead of inferring from signal).

---

Outside diff comments:
In `@test/e2e-scenario/scenarios/orchestrators/phase.ts`:
- Around line 174-181: The regex/context-only redaction calls need to also
consider declared secret values so verbatim secrets from child processes aren't
leaked; update runShellStep to pass step.secretEnv into redactString (just like
the change already applied for action.secretEnv when building env) wherever you
produce evidence logs or stderrTail using redactString, ensuring both
action.secretEnv and step.secretEnv are forwarded to redactString in the code
paths that create evidence/log entries; search for runShellStep, redactString,
step.secretEnv and update the redaction calls to include the secret list.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 73dd647b-2149-4482-834c-320b399ff340

📥 Commits

Reviewing files that changed from the base of the PR and between 38c03b1 and da50821.

📒 Files selected for processing (8)

test/e2e-scenario/framework-tests/e2e-shell-supervisor.test.ts
test/e2e-scenario/framework/shell-probe.ts
test/e2e-scenario/framework/shell/supervisor.ts
test/e2e-scenario/framework/shell/trusted-command.ts
test/e2e-scenario/scenarios/orchestrators/phase.ts
test/e2e-scenario/scenarios/probes/diagnostics.ts
test/e2e-scenario/scenarios/probes/docs-validation.ts
test/e2e-scenario/scenarios/probes/util.ts

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

Drop main-side reintroduction of inline child-process lifecycle in test/e2e-scenario/framework/shell-probe.ts and test/e2e-scenario/scenarios/probes/util.ts; the refactor routes both through framework/shell/supervisor.ts (process-group cleanup, SIGTERM -> SIGKILL escalation, timeout, AbortSignal) and NUL-byte validation through framework/shell/trusted-command.ts. Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

github-actions · 2026-06-10T06:49:40Z

Selective E2E Results — ✅ All requested jobs passed

Run: 27258067885
Target ref: e14300899bd43ef45ed7601d437664a921f1d0bc
Workflow ref: main
Requested jobs: network-policy-e2e,shields-config-e2e,telegram-injection-e2e,diagnostics-e2e
Summary: 4 passed, 0 failed, 0 skipped

Job	Result
diagnostics-e2e	✅ success
network-policy-e2e	✅ success
shields-config-e2e	✅ success
telegram-injection-e2e	✅ success

refactor(test/e2e): consolidate fixture shell probe with scenario spa…

da50821

…wn infra Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

laitingsheng added the area: e2e End-to-end tests, nightly failures, or validation infrastructure label Jun 9, 2026

github-advanced-security AI found potential problems Jun 9, 2026

View reviewed changes

Comment thread test/e2e-scenario/framework/shell-probe.ts Fixed

coderabbitai Bot reviewed Jun 9, 2026

View reviewed changes

fix(test/e2e): tighten review-flagged spawn-boundary contracts

a30ddf3

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

laitingsheng added the refactor PR restructures code without intended behavior change label Jun 9, 2026

laitingsheng added 3 commits June 9, 2026 16:10

Merge branch 'main' into refactor/e2e-shell-spawn-consolidation

c213718

style: apply biome format to supervisor module

efe3cd9

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

laitingsheng added the v0.0.63 Release target label Jun 10, 2026

Merge branch 'main' into refactor/e2e-shell-spawn-consolidation

e143008

cv approved these changes Jun 10, 2026

View reviewed changes

cv enabled auto-merge (squash) June 10, 2026 06:31

cv merged commit 602fde0 into main Jun 10, 2026
34 checks passed

cv deleted the refactor/e2e-shell-spawn-consolidation branch June 10, 2026 06:33

Conversation

laitingsheng commented Jun 9, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Related Issue

Changes

Type of Change

Verification

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

Uh oh!

github-actions Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Advisor Recommendation

E2E Recommendation Advisor

Required E2E

Optional E2E

New E2E recommendations

Dispatch hint

Uh oh!

github-actions Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Scenario Advisor Recommendation

E2E Scenario Advisor

Required scenario E2E

Optional scenario E2E

Relevant changed files

Uh oh!

github-actions Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Advisor

🛠️ Needs attention

🔎 Worth checking

🌱 Nice ideas

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented Jun 10, 2026

Selective E2E Results — ✅ All requested jobs passed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

laitingsheng commented Jun 9, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 9, 2026 •

edited

Loading

github-actions Bot commented Jun 9, 2026 •

edited

Loading

github-actions Bot commented Jun 9, 2026 •

edited

Loading

github-actions Bot commented Jun 9, 2026 •

edited

Loading