Skip to content

refactor(test/e2e): consolidate shell-probe with scenario spawn infra#5033

Merged
cv merged 6 commits into
mainfrom
refactor/e2e-shell-spawn-consolidation
Jun 10, 2026
Merged

refactor(test/e2e): consolidate shell-probe with scenario spawn infra#5033
cv merged 6 commits into
mainfrom
refactor/e2e-shell-spawn-consolidation

Conversation

@laitingsheng

@laitingsheng laitingsheng commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Summary

Consolidate the TypeScript spawn boundaries shared between the Vitest fixture layer and the scenario orchestrator into a single framework module. Every TS spawn site now reaches the same detached process-group cleanup, SIGTERM -> SIGKILL escalation, and NUL-byte argv guard, and the validation_suites/sandbox-exec.sh transport stays centralised for in-sandbox commands.

Related Issue

Resolves #4988.

Changes

  • New shared modules under test/e2e-scenario/framework/shell/: a lifecycle-only superviseChild (timeout, abort, SIGTERM -> SIGKILL on a detached process group, chunk callbacks) and trustedShellCommand / validateShellToken (NUL-byte argv guard, brand symbol).
  • framework/shell-probe.ts, scenarios/orchestrators/phase.ts (runAction + runShellStep), and the probe helpers (probes/util.ts, probes/docs-validation.ts, probes/diagnostics.ts) now spawn at their own site (preserving each site's argv contract and CodeQL marker) and hand the child to the shared supervisor.
  • Probe helpers (spawnBash, runHostCmd, docs / diagnostics probes) gain detached process-group cleanup and SIGKILL escalation; runHostCmd gains the NUL-byte argv guard.
  • Adds framework-tests/e2e-shell-supervisor.test.ts to cover the headline guarantees at the leaf level; existing e2e-fixture-context.test.ts and e2e-phase-orchestrators.test.ts continue to cover end-to-end behaviour.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • `npx prek run --all-files` passes
  • `npm test` passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • `npm run docs` builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Tinson Lai tinsonl@nvidia.com

Summary by CodeRabbit

  • New Features

    • Centralized shell supervision for e2e flows: robust timeouts, escalation, cancellation, and streamed stdout/stderr.
    • Shared trusted-command validation to reject unsafe argv tokens (e.g., NUL bytes) and require a reason.
  • Tests

    • Added and updated e2e tests covering supervision, timeout/escalation, spawn failures, stdin/stdout forwarding, and validation.
  • Bug Fixes

    • More consistent timeout/spawn-error classification and improved redaction in failure reporting.

…wn infra

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@laitingsheng laitingsheng added the area: e2e End-to-end tests, nightly failures, or validation infrastructure label Jun 9, 2026
@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 281105e9-4518-4598-ab63-c6422d096515

📥 Commits

Reviewing files that changed from the base of the PR and between a30ddf3 and c213718.

📒 Files selected for processing (1)
  • test/e2e-scenario/framework-tests/e2e-phase-orchestrators.test.ts

📝 Walkthrough

Walkthrough

Extracts trusted-shell token validation and a superviseChild helper; migrates shell-probe, phase orchestrator, probe utilities, and probes to validate argv tokens and use the shared supervisor for detached process-group management, timeout escalation, abort handling, and stdout/stderr streaming.

Changes

Unified Shell Infrastructure Consolidation

Layer / File(s) Summary
Trusted shell command brand and validation
test/e2e-scenario/framework/shell/trusted-command.ts, test/e2e-scenario/framework-tests/e2e-shell-supervisor.test.ts, test/e2e-scenario/framework-tests/e2e-phase-orchestrators.test.ts
Branded TrustedShellCommand and TrustedShellCommandInput, validateShellToken rejects NUL bytes, and trustedShellCommand factory enforces trimmed/non-empty command and reason; tests cover NUL rejection, clean tokens, required reason, and validate-hook propagation.
Child process supervision lifecycle
test/e2e-scenario/framework/shell/supervisor.ts, test/e2e-scenario/framework-tests/e2e-shell-supervisor.test.ts
superviseChild implements SuperviseOptions/SuperviseResult: wall-clock timeout, SIGTERM→SIGKILL escalation after grace, AbortSignal-aware cancellation (keeps timedOut false), stdin forwarding, stdout/stderr chunk callbacks, and spawnError reporting; tests validate exit codes, stream capture, timeout escalation timing, abort behavior, spawn errors, and stdin piping.
Shell probe migration to shared infrastructure
test/e2e-scenario/framework/shell-probe.ts
Re-exports trusted-command types and factory; replaces manual process-group tracking, stdout/stderr listeners, and local timers with a single superviseChild call and uses supervised outcomes for artifact/error construction.
Phase orchestrator execution via supervisor
test/e2e-scenario/scenarios/orchestrators/phase.ts
runAction and runShellStep validate all bash/script tokens via validateShellToken, spawn validated argv, delegate lifecycle and stdio to superviseChild, redact explicit secret values verbatim, await log flush, and map superviseChild outcomes to action/step results.
Probe utility helpers refactored
test/e2e-scenario/scenarios/probes/util.ts
spawnBash and runHostCmd now validate argv with validateShellToken, spawn detached children, delegate timeout/lifecycle/stdio to superviseChild, add timedOut to CmdResult, and remove the local NUL helper.
Diagnostics probe integration
test/e2e-scenario/scenarios/probes/diagnostics.ts
Validates nemoclaw argv, uses superviseChild for supervision and stderr capture, extends evidence with timedOut and spawnError, and returns early on spawn errors/timeouts.
Docs validation probe integration
test/e2e-scenario/scenarios/probes/docs-validation.ts
runCheck now validates script/args with validateShellToken, uses superviseChild, extends results with timedOut/spawnError, and classifies timeouts/spawn errors using the new flags.
Probe timeout classifier updates
test/e2e-scenario/scenarios/probes/*
Multiple probes updated to classify transient failures using timedOut instead of checking signal === "SIGTERM".

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#4965: Fixture tests validate the same trustedShellCommand NUL-byte/reason guardrails and superviseChild lifecycle behaviors that this PR extracts into shared framework modules.

Suggested labels

v0.0.62

Suggested reviewers

  • jyaunches
  • cv

Poem

🐇 I dug a tunnel, found a shell,

Trimmed its args and guarded well,
SIGTERM tolled and SIGKILL came after,
Streams fed back, and secrets went softer,
Hopping home with logs all swell.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: consolidating shell-probe infrastructure with scenario spawn infrastructure into shared framework modules.
Linked Issues check ✅ Passed The PR fully implements all acceptance criteria from #4988: extracts shared shell/spawn infrastructure (superviseChild, trustedShellCommand, validateShellToken) into framework/shell modules [#4988], preserves sandbox-exec transport centralization [#4988], maintains detached process-group lifecycle handling with timeout/abort/SIGTERM→SIGKILL [#4988], keeps trusted command contract with NUL-byte rejection and branded descriptors [#4988], and preserves single artifact/evidence path with canonical redaction [#4988].
Out of Scope Changes check ✅ Passed All changes are directly scoped to the objectives: new shell framework modules (supervisor.ts, trusted-command.ts), refactored spawn sites (shell-probe.ts, phase.ts, util.ts, probe helpers), and updated tests. No unrelated changes detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch refactor/e2e-shell-spawn-consolidation

Comment @coderabbitai help to get the list of available commands and usage tips.

Comment thread test/e2e-scenario/framework/shell-probe.ts Fixed
@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

E2E Advisor Recommendation

Required E2E: e2e-scenarios:ubuntu-repo-cloud-openclaw, e2e-vitest-scenarios:ubuntu-repo-cloud-openclaw, network-policy-e2e, shields-config-e2e, telegram-injection-e2e, diagnostics-e2e
Optional E2E: docs-validation-e2e, cloud-e2e

Dispatch hint: network-policy-e2e,shields-config-e2e,telegram-injection-e2e,diagnostics-e2e

Auto-dispatched E2E: network-policy-e2e, shields-config-e2e, telegram-injection-e2e, diagnostics-e2e via nightly-e2e.yaml at e14300899bd43ef45ed7601d437664a921f1d0bcnightly run

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: medium

Required E2E

  • e2e-scenarios:ubuntu-repo-cloud-openclaw (high; live cloud onboarding plus runtime validation, up to 90 minutes): Runs the typed Scenario Runner path that consumes PhaseOrchestrator.runAction/runShellStep for install, onboarding, state-validation, and runtime assertions. Required because the PR changes shared spawn supervision, timeout handling, log flushing, and secret redaction at this boundary.
  • e2e-vitest-scenarios:ubuntu-repo-cloud-openclaw (high; live scenario fixture run, workflow timeout 45 minutes per scenario): Exercises the Vitest live fixture stack that uses ShellProbe and trustedShellCommand for host CLI, gateway, sandbox, and provider checks. Required because ShellProbe now delegates lifecycle cleanup and spawn result handling to the new supervisor.
  • network-policy-e2e (medium-high; live sandbox and gateway policy check): Validates the real network-policy deny boundary. Required because the PR changes the TypeScript network policy probe and shared sandbox command execution/timeout semantics used by security probe logic.
  • shields-config-e2e (medium-high; live sandbox security posture check, workflow timeout 30 minutes): Validates shields status and protected config permissions in a real sandbox. Required because the PR changes shields probe timeout classification and the shared host/sandbox command runner it depends on.
  • telegram-injection-e2e (high; live sandbox plus Telegram injection validation, workflow timeout 60 minutes): Validates command-injection resistance in a real assistant/messaging flow. Required because the PR changes injection-blocked probe timeout classification and shared sandbox command execution used for injection payload checks.
  • diagnostics-e2e (medium; live CLI diagnostics workflow): Validates the real diagnostics bundle flow. Required because the PR changes diagnostics probe spawning, timeout handling, spawn-error classification, and evidence shape for debug bundle production.

Optional E2E

  • docs-validation-e2e (low; no sandbox required, workflow timeout 15 minutes): Useful confidence for the docsValidationProbe changes to bash supervision, timeout classification, and spawn-error evidence. Not merge-blocking for this PR because the product docs/CLI surfaces themselves were not changed.
  • cloud-e2e (high; full cloud E2E with sandbox lifecycle): Broad legacy end-to-end smoke for install, onboarding, sandbox, and inference that can catch regressions not covered by the typed scenario runner recommendation. Optional because the direct code changes are in the E2E TypeScript framework/probes, not product runtime code.

New E2E recommendations

  • typed-security-probe-coverage (high): Existing nightly jobs cover the product security behavior, but the changed TypeScript built-in probes under test/e2e-scenario/scenarios/probes are not clearly exercised by any canonical live typed scenario because the current baseline scenarios do not include security-shields, security-policy, or security-injection suites.
    • Suggested test: Add a dispatchable typed scenario, for example ubuntu-repo-cloud-openclaw-security-probes, that includes security-shields, security-policy, and security-injection suites and runs through E2E / Scenario Runner.
  • typed-diagnostics-docs-probe-coverage (medium): diagnosticsProbe and docsValidationProbe are registered in the typed assertion registry, but there is no obvious canonical live typed scenario dedicated to exercising those probe refs after changes to their spawn supervision and evidence shapes.
    • Suggested test: Add a lightweight typed probe-validation scenario or workflow input that runs the diagnostics and docs-validation suites through test/e2e-scenario/scenarios/run.ts.
  • advisor-dispatch-coverage-for-scenario-workflows (medium): The dispatch hint can target nightly-e2e jobs, but this PR also needs scenario-runner and Vitest-scenario workflows to validate the changed TypeScript scenario framework directly.
    • Suggested test: Extend E2E advisor dispatch support to trigger e2e-scenarios.yaml and e2e-vitest-scenarios.yaml with a scenario input such as ubuntu-repo-cloud-openclaw.

Dispatch hint

  • Workflow: nightly-e2e.yaml
  • jobs input: network-policy-e2e,shields-config-e2e,telegram-injection-e2e,diagnostics-e2e

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

E2E Scenario Advisor Recommendation

Required scenario E2E: e2e-scenarios-all
Optional scenario E2E: None

Dispatch required scenario E2E:

  • gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

  • e2e-scenarios-all: Shared scenario execution infrastructure changed: the shell supervisor/trusted-command boundary, ShellProbe, PhaseOrchestrator, and multiple built-in probe helpers now share new spawn timeout, process-group cleanup, NUL-byte validation, and timeout classification behavior. These paths are exercised across scenario phases and suites, so the full scenario fan-out is required.
    • Dispatch: gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

Optional scenario E2E

  • None.

Relevant changed files

  • test/e2e-scenario/framework-tests/e2e-phase-orchestrators.test.ts
  • test/e2e-scenario/framework-tests/e2e-shell-supervisor.test.ts
  • test/e2e-scenario/framework/shell-probe.ts
  • test/e2e-scenario/framework/shell/supervisor.ts
  • test/e2e-scenario/framework/shell/trusted-command.ts
  • test/e2e-scenario/scenarios/orchestrators/phase.ts
  • test/e2e-scenario/scenarios/probes/diagnostics.ts
  • test/e2e-scenario/scenarios/probes/docs-validation.ts
  • test/e2e-scenario/scenarios/probes/injection-blocked.ts
  • test/e2e-scenario/scenarios/probes/network-policy.ts
  • test/e2e-scenario/scenarios/probes/shields-config.ts
  • test/e2e-scenario/scenarios/probes/util.ts

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

PR Review Advisor

Findings: 1 needs attention, 1 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 0 still apply, 1 new item found

Review findings

🛠️ Needs attention

  • Keep SIGKILL escalation armed after the process-group leader closes (test/e2e-scenario/framework/shell/supervisor.ts:139): The linked issue requires preserving detached process-group timeout/abort cleanup, but `settle()` clears the pending SIGKILL timer as soon as the supervised child emits `close`. That can regress the previous per-call-site cleanup behavior when the process-group leader exits on SIGTERM while a descendant in the same group ignores SIGTERM: the promise resolves, the SIGKILL escalation is canceled, and the descendant can keep running. The new test covers a leader that ignores SIGTERM, but not the descendant-only case.
    • Recommendation: Do not cancel the escalation timer just because the leader closed after timeout/abort, or otherwise send a final process-group SIGKILL before resolving when termination was started. Add a regression test where the leader exits on SIGTERM while a child process in the detached group ignores SIGTERM, then assert the descendant is reaped after the grace period.
    • Evidence: `terminate()` arms `killTimer` for SIGKILL, but `settle()` unconditionally runs `if (killTimer) clearTimeout(killTimer);` on `close`. Prior inline implementations either left the escalation timer armed after termination started or used an untracked nested timer, so this contradicts the acceptance clause: “Preserve detached process-group timeout/abort cleanup and the trusted/NUL-validated command contract.”

🔎 Worth checking

  • Source-of-truth review needed: test/e2e-scenario/framework/shell/supervisor.ts SIGKILL escalation lifecycle: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: `settle()` unconditionally clears `killTimer` on child `close`, canceling the scheduled SIGKILL escalation.

🌱 Nice ideas

  • None.
Consider writing more tests for
  • **Runtime validation** — superviseChild keeps SIGKILL escalation armed after the leader exits on SIGTERM while a descendant ignores SIGTERM. Leaf and fixture coverage is strong for the intended consolidation, but the central lifecycle helper needs one more runtime-style regression for descendant-only process-group cleanup because the current timeout test only covers a leader that ignores SIGTERM.
  • **Runtime validation** — runSandboxCmd preserves argv with shell metacharacters after the temp-script to bash -c conversion. Leaf and fixture coverage is strong for the intended consolidation, but the central lifecycle helper needs one more runtime-style regression for descendant-only process-group cleanup because the current timeout test only covers a leader that ignores SIGTERM.
  • **Acceptance clause:** `test/e2e-scenario/framework/shell-probe.ts` is a bridge-only host probe while the stack moves existing E2E scenarios onto Vitest fixtures. It now mirrors the hardened test(e2e): execute real shell assertions; delete dry-run, --validate-only, and the bash runner #4380 behavior for trusted command descriptors, NUL-byte rejection, explicit env by default, canonical redaction, and process-group cleanup, but the end state should not keep a separate spawn boundary. — add test evidence or identify existing coverage. `shell-probe.ts` now imports `superviseChild` and `TrustedShellCommand` from shared `framework/shell/*`, removing the local trusted-command and lifecycle implementations. However, the new shared supervisor can cancel SIGKILL escalation after leader close, so the process-group cleanup portion needs correction.
  • **Acceptance clause:** Preserve detached process-group timeout/abort cleanup and the trusted/NUL-validated command contract. — add test evidence or identify existing coverage. Call sites set `detached: true`, NUL validation is centralized in `validateShellToken`, and tests cover leader-level timeout/abort escalation. The cleanup guarantee is incomplete because `superviseChild` clears the SIGKILL escalation timer on leader `close`, which can leave descendant processes alive.
  • **Acceptance clause:** Related open PRs: — add test evidence or identify existing coverage. Comment context only; deterministic overlap data shows active related E2E PRs touching some of the same files, especially test(e2e): typed-shell-runner cutover (parity → retirement) #5106.
  • **Acceptance clause:** [test(e2e): execute real shell assertions; delete dry-run, --validate-only, and the bash runner #4380 test(e2e): execute real shell assertions; delete dry-run, --validate-only, and the bash runner](https://github.com/NVIDIA/NemoClaw/pull/4380\) — add test evidence or identify existing coverage. The patch preserves hardened shell assertion concepts such as real spawn, redaction, NUL rejection, and process groups, but the new supervisor’s early SIGKILL timer cancellation needs fixing to fully preserve process-group cleanup behavior.
  • **Acceptance clause:** Related open issues: — add test evidence or identify existing coverage. Comment context only; no additional acceptance clauses were provided for the related issue in the deterministic context.
  • **test/e2e-scenario/framework/shell/supervisor.ts SIGKILL escalation lifecycle** — Add `superviseChild keeps SIGKILL escalation armed after the leader exits on SIGTERM while a descendant ignores SIGTERM`.. `settle()` unconditionally clears `killTimer` on child `close`, canceling the scheduled SIGKILL escalation.
Since last review details

Current findings:

  • Source-of-truth review needed: test/e2e-scenario/framework/shell/supervisor.ts SIGKILL escalation lifecycle: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: `settle()` unconditionally clears `killTimer` on child `close`, canceling the scheduled SIGKILL escalation.
  • Keep SIGKILL escalation armed after the process-group leader closes (test/e2e-scenario/framework/shell/supervisor.ts:139): The linked issue requires preserving detached process-group timeout/abort cleanup, but `settle()` clears the pending SIGKILL timer as soon as the supervised child emits `close`. That can regress the previous per-call-site cleanup behavior when the process-group leader exits on SIGTERM while a descendant in the same group ignores SIGTERM: the promise resolves, the SIGKILL escalation is canceled, and the descendant can keep running. The new test covers a leader that ignores SIGTERM, but not the descendant-only case.
    • Recommendation: Do not cancel the escalation timer just because the leader closed after timeout/abort, or otherwise send a final process-group SIGKILL before resolving when termination was started. Add a regression test where the leader exits on SIGTERM while a child process in the detached group ignores SIGTERM, then assert the descendant is reaped after the grace period.
    • Evidence: `terminate()` arms `killTimer` for SIGKILL, but `settle()` unconditionally runs `if (killTimer) clearTimeout(killTimer);` on `close`. Prior inline implementations either left the escalation timer armed after termination started or used an untracked nested timer, so this contradicts the acceptance clause: “Preserve detached process-group timeout/abort cleanup and the trusted/NUL-validated command contract.”

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test/e2e-scenario/scenarios/orchestrators/phase.ts (1)

174-181: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pass declared secret values into redactString.

Line 203/206 and Line 453/456 only run the regex/context redactors. Since these children still receive action.secretEnv / step.secretEnv, a probe that echoes one of those values verbatim can now leak it into the evidence log and stderrTail unchanged.

🔧 Minimal fix
+    const explicitRedactions = (action.secretEnv ?? [])
+      .map((key) => env[key])
+      .filter((value): value is string => typeof value === "string" && value.length > 0);
+
     const supervised = await superviseChild(child, {
       timeoutMs: timeoutSeconds * 1_000,
       onStdout: (chunk) => {
-        logStream.write(redactString(chunk));
+        logStream.write(redactString(chunk, explicitRedactions));
       },
       onStderr: (chunk) => {
-        const redacted = redactString(chunk);
+        const redacted = redactString(chunk, explicitRedactions);
         logStream.write(redacted);
         stderrTail = (stderrTail + redacted).slice(-4096);
       },
     });

Mirror the same change in runShellStep using step.secretEnv.

Also applies to: 202-208, 390-397, 452-458

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-scenario/scenarios/orchestrators/phase.ts` around lines 174 - 181,
The regex/context-only redaction calls need to also consider declared secret
values so verbatim secrets from child processes aren't leaked; update
runShellStep to pass step.secretEnv into redactString (just like the change
already applied for action.secretEnv when building env) wherever you produce
evidence logs or stderrTail using redactString, ensuring both action.secretEnv
and step.secretEnv are forwarded to redactString in the code paths that create
evidence/log entries; search for runShellStep, redactString, step.secretEnv and
update the redaction calls to include the secret list.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e-scenario/framework/shell/supervisor.ts`:
- Around line 101-114: The abort path can be misreported as a timeout because
the wall-clock timeout remains active after opts.signal triggers; update the
abort handling so that onAbort clears the timeout (clearTimeout(timeout)) before
calling terminate and ensure that terminate (or the surrounding logic that sets
timedOut) is not setting timedOut when termination was caused by opts.signal;
specifically modify the onAbort handler and/or terminate to clear the timer and
propagate an explicit abort reason (or a boolean) so timedOut remains false for
external cancels (references: timeout, timedOut, terminate, onAbort,
opts.signal).

In `@test/e2e-scenario/scenarios/orchestrators/phase.ts`:
- Around line 183-184: validateShellToken can throw on NUL bytes and those
exceptions currently escape runAction / runShellStep (seen where bashArgs are
mapped and where runShellStep is called), aborting the entire phase; wrap the
validateShellToken call(s) inside runShellStep/runAction so any thrown error is
caught and converted into a structured failure object with type "runner-infra"
(include the original error message in the failure details) and return that
result instead of letting the exception bubble to run()/runStep(); update both
call sites (the bashArgs mapping and the other occurrence around runShellStep)
to follow this pattern and ensure callers handle the structured failure return.

In `@test/e2e-scenario/scenarios/probes/diagnostics.ts`:
- Around line 89-95: When supervised.spawnError is present, don't collapse it
into a normal child exit path; instead record the spawn failure separately (e.g.
set a boolean like infraFailure/runnerInfra or a distinct sentinel variable)
while still appending the message to stderrTail, and leave exitCode
unset/nullable (or set to a distinct sentinel) so later logic that inspects
exitCode !== 0 can still differentiate a real child exit from a runner-infra
spawn failure; update the code around supervised.spawnError, exitCode and any
later classification logic that checks exitCode to use the new
infraFailure/runnerInfra signal when classifying failures.

In `@test/e2e-scenario/scenarios/probes/docs-validation.ts`:
- Around line 78-88: The runCheck() path in docs-validation.ts currently masks
supervised.spawnError by converting it into exitCode: 127; update runCheck() and
the DocsCheckResult shape so it returns the original spawn error instead of an
artificial exit code: add/return a spawnError (or error) field on
DocsCheckResult when supervised.spawnError is present (preserve
stderrTail/stdoutTail and elapsedMs), and ensure any callers of
runCheck()/DocsCheckResult (the docs check runner/runner-infra classifier)
inspect this new spawnError field rather than treating exitCode 127 as a normal
script failure; update type/interface for DocsCheckResult and usages that
construct/consume it accordingly (look for runCheck and DocsCheckResult in
docs-validation.ts and consumer code).

In `@test/e2e-scenario/scenarios/probes/util.ts`:
- Around line 126-141: The returned command result currently omits the
supervised.timedOut flag, so update both return paths (the spawnError branch and
the normal branch around supervised.spawnError, supervised.exitCode,
supervised.signal) to include timedOut: supervised.timedOut (or false if absent)
so callers can detect timeouts reliably; search for superviseChild /
SuperviseResult and ensure the returned object mirrors timedOut from the
supervise result (and then update downstream callers to branch on
result.timedOut instead of inferring from signal).

---

Outside diff comments:
In `@test/e2e-scenario/scenarios/orchestrators/phase.ts`:
- Around line 174-181: The regex/context-only redaction calls need to also
consider declared secret values so verbatim secrets from child processes aren't
leaked; update runShellStep to pass step.secretEnv into redactString (just like
the change already applied for action.secretEnv when building env) wherever you
produce evidence logs or stderrTail using redactString, ensuring both
action.secretEnv and step.secretEnv are forwarded to redactString in the code
paths that create evidence/log entries; search for runShellStep, redactString,
step.secretEnv and update the redaction calls to include the secret list.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 73dd647b-2149-4482-834c-320b399ff340

📥 Commits

Reviewing files that changed from the base of the PR and between 38c03b1 and da50821.

📒 Files selected for processing (8)
  • test/e2e-scenario/framework-tests/e2e-shell-supervisor.test.ts
  • test/e2e-scenario/framework/shell-probe.ts
  • test/e2e-scenario/framework/shell/supervisor.ts
  • test/e2e-scenario/framework/shell/trusted-command.ts
  • test/e2e-scenario/scenarios/orchestrators/phase.ts
  • test/e2e-scenario/scenarios/probes/diagnostics.ts
  • test/e2e-scenario/scenarios/probes/docs-validation.ts
  • test/e2e-scenario/scenarios/probes/util.ts

Comment thread test/e2e-scenario/framework/shell/supervisor.ts
Comment thread test/e2e-scenario/scenarios/orchestrators/phase.ts Outdated
Comment thread test/e2e-scenario/scenarios/probes/diagnostics.ts Outdated
Comment thread test/e2e-scenario/scenarios/probes/docs-validation.ts
Comment thread test/e2e-scenario/scenarios/probes/util.ts
Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@laitingsheng laitingsheng added the refactor PR restructures code without intended behavior change label Jun 9, 2026
Drop main-side reintroduction of inline child-process lifecycle in
test/e2e-scenario/framework/shell-probe.ts and
test/e2e-scenario/scenarios/probes/util.ts; the refactor routes both
through framework/shell/supervisor.ts (process-group cleanup, SIGTERM
-> SIGKILL escalation, timeout, AbortSignal) and NUL-byte validation
through framework/shell/trusted-command.ts.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@laitingsheng laitingsheng added the v0.0.63 Release target label Jun 10, 2026
@cv cv enabled auto-merge (squash) June 10, 2026 06:31
@cv cv merged commit 602fde0 into main Jun 10, 2026
34 checks passed
@cv cv deleted the refactor/e2e-shell-spawn-consolidation branch June 10, 2026 06:33
@github-actions

Copy link
Copy Markdown
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 27258067885
Target ref: e14300899bd43ef45ed7601d437664a921f1d0bc
Workflow ref: main
Requested jobs: network-policy-e2e,shields-config-e2e,telegram-injection-e2e,diagnostics-e2e
Summary: 4 passed, 0 failed, 0 skipped

Job Result
diagnostics-e2e ✅ success
network-policy-e2e ✅ success
shields-config-e2e ✅ success
telegram-injection-e2e ✅ success

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: e2e End-to-end tests, nightly failures, or validation infrastructure refactor PR restructures code without intended behavior change v0.0.63 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Consolidate E2E fixture shell probe with scenario spawn infra

4 participants