fix(sandbox): grant /dev/pts so OpenClaw tmux-session flow can allocate a PTY (#4513)#5019
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds ChangesPTY Allocation Permission Fix for Tmux Session Flow
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
…te a PTY (NVIDIA#4513) PR NVIDIA#4606 installed tmux in the sandbox image, but the bundled tmux-session flow still failed with `create window failed: fork failed: Permission denied` (QA reopened NVIDIA#4513; NVIDIA#4640 then degraded the e2e drive to a soft skip). Root cause is not fork/seccomp/nproc — plain fork() works and nproc is nearly empty. It is PTY allocation: the OpenShell sandbox landlock filesystem policy grants /dev/null and /dev/urandom but never /dev/pts, so forkpty() opening /dev/ptmx (-> /dev/pts/ptmx) and a /dev/pts/<n> slave is denied with EACCES. tmux surfaces that EACCES as "fork failed: Permission denied". Grant /dev/pts (read_write) in the OpenClaw base + permissive policies. Grant the directory, not /dev/ptmx — the supervisor refuses to chown the symlink, and the devpts directory grant covers ptmx + slaves via the landlock path hierarchy. This follows the same base+permissive pattern as the prior /home/linuxbrew (NVIDIA#3913) and GPU /proc additions, which permissive-runtime already unions on shields-down. Verified hermetically with the real openshell-sandbox supervisor + the actual edited policy: before, openpty()=EACCES and tmux new/list/kill fails; after, openpty() returns /dev/pts/0 and the full tmux lifecycle succeeds. Changes: - nemoclaw-blueprint/policies/openclaw-sandbox.yaml: add /dev/pts to filesystem_policy.read_write with the PTY rationale. - nemoclaw-blueprint/policies/openclaw-sandbox-permissive.yaml and agents/openclaw/policy-permissive.yaml: mirror /dev/pts so `shields down` never removes a live filesystem path (OpenShell rejects removals). - test/e2e/test-sandbox-operations.sh: restore TC-SBX-09 to a hard assertion (drop the NVIDIA#4640 fork-failure soft skip), add an explicit openpty() probe that pins the root cause, and fail loudly if devpts ever regresses. - test/runner.test.ts: assert both policies grant /dev/pts (and not /dev/ptmx) and that TC-SBX-09 no longer skips on fork failure. - docs/reference/network-policies.mdx, docs/security/best-practices.mdx: list /dev/pts as a default writable path with the tmux rationale. Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
9f01992 to
b75befd
Compare
|
@coderabbitai full review |
✅ Action performedFull review finished. |
There was a problem hiding this comment.
🧹 Nitpick comments (3)
nemoclaw-blueprint/policies/openclaw-sandbox.yaml (1)
34-44: Run the network-policy e2e job for this policy delta.Given this is a sandbox egress/SSRF-relevant policy change, please run the targeted
network-policy-e2eworkflow before merge to catch deny-by-default/hot-reload regressions early.As per coding guidelines, changes under
nemoclaw-blueprint/policies/**should runnetwork-policy-e2eselectively.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@nemoclaw-blueprint/policies/openclaw-sandbox.yaml` around lines 34 - 44, This change to openclaw-sandbox.yaml touches sandbox egress/SSRF policy and must be validated by running the targeted network-policy-e2e workflow before merging; please trigger the selective network-policy-e2e job (the targeted e2e for changes under the policies blueprint) against this PR to catch deny-by-default and hot-reload regressions for the openclaw-sandbox.yaml policy delta and attach the passing job results to the review.Source: Coding guidelines
test/runner.test.ts (1)
927-939: ⚡ Quick winExtend PTY policy regression coverage to the agent permissive policy.
The assertion loop only covers two blueprint policy files, but this PR also changes
agents/openclaw/policy-permissive.yamlfor the same/dev/ptscontract. Add it here to keep the regression guard complete.Suggested patch
- const policyDir = path.join(repoRoot, "nemoclaw-blueprint", "policies"); - for (const policyFile of ["openclaw-sandbox.yaml", "openclaw-sandbox-permissive.yaml"]) { - it(`${policyFile} grants /dev/pts so PTY allocation (tmux) works`, () => { - const doc = YAML.parse(fs.readFileSync(path.join(policyDir, policyFile), "utf-8")); + for (const policyFile of [ + path.join("nemoclaw-blueprint", "policies", "openclaw-sandbox.yaml"), + path.join("nemoclaw-blueprint", "policies", "openclaw-sandbox-permissive.yaml"), + path.join("agents", "openclaw", "policy-permissive.yaml"), + ]) { + it(`${policyFile} grants /dev/pts so PTY allocation (tmux) works`, () => { + const doc = YAML.parse(fs.readFileSync(path.join(repoRoot, policyFile), "utf-8")); const readWrite: string[] = doc.filesystem_policy?.read_write ?? [];🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/runner.test.ts` around lines 927 - 939, The test iterates over only two policy files but misses the updated agent permissive policy, so add "agents/openclaw/policy-permissive.yaml" to the list passed to the for-loop (the array iterating as for (const policyFile of [...]) ) so the assertions that YAML.parse(fs.readFileSync(path.join(policyDir, policyFile), "utf-8")) produces a filesystem_policy.read_write array containing "/dev/pts" and not "/dev/ptmx" also run against that permissive policy; update the array of filenames used by the test (referenced as policyFile and policyDir, and using YAML.parse and read_write) to include the agent permissive policy.docs/reference/network-policies.mdx (1)
31-33: ⚡ Quick winAdjust docs style for the PTY explanation block.
This paragraph uses multiple em dashes and splits one sentence across multiple source lines. LLM pattern detected.
As per coding guidelines, use one sentence per line and flag multiple em dashes per paragraph.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/reference/network-policies.mdx` around lines 31 - 33, The PTY explanation for `/dev/pts` should be reformatted to follow the one-sentence-per-line rule and avoid multiple em dashes: rewrite the paragraph so each sentence occupies its own source line, remove or replace the em dashes in the sentence mentioning `tmux`, `script`, and interactive shells (use commas or a parenthetical phrase instead), and ensure the explanation about failure mode (`fork failed: Permission denied`) remains a separate sentence for clarity; update the `/dev/pts` paragraph accordingly in the network-policies PTY explanation block.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@docs/reference/network-policies.mdx`:
- Around line 31-33: The PTY explanation for `/dev/pts` should be reformatted to
follow the one-sentence-per-line rule and avoid multiple em dashes: rewrite the
paragraph so each sentence occupies its own source line, remove or replace the
em dashes in the sentence mentioning `tmux`, `script`, and interactive shells
(use commas or a parenthetical phrase instead), and ensure the explanation about
failure mode (`fork failed: Permission denied`) remains a separate sentence for
clarity; update the `/dev/pts` paragraph accordingly in the network-policies PTY
explanation block.
In `@nemoclaw-blueprint/policies/openclaw-sandbox.yaml`:
- Around line 34-44: This change to openclaw-sandbox.yaml touches sandbox
egress/SSRF policy and must be validated by running the targeted
network-policy-e2e workflow before merging; please trigger the selective
network-policy-e2e job (the targeted e2e for changes under the policies
blueprint) against this PR to catch deny-by-default and hot-reload regressions
for the openclaw-sandbox.yaml policy delta and attach the passing job results to
the review.
In `@test/runner.test.ts`:
- Around line 927-939: The test iterates over only two policy files but misses
the updated agent permissive policy, so add
"agents/openclaw/policy-permissive.yaml" to the list passed to the for-loop (the
array iterating as for (const policyFile of [...]) ) so the assertions that
YAML.parse(fs.readFileSync(path.join(policyDir, policyFile), "utf-8")) produces
a filesystem_policy.read_write array containing "/dev/pts" and not "/dev/ptmx"
also run against that permissive policy; update the array of filenames used by
the test (referenced as policyFile and policyDir, and using YAML.parse and
read_write) to include the agent permissive policy.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 76839e37-5309-41b6-b896-9118c68d7ce4
📒 Files selected for processing (7)
agents/openclaw/policy-permissive.yamldocs/reference/network-policies.mdxdocs/security/best-practices.mdxnemoclaw-blueprint/policies/openclaw-sandbox-permissive.yamlnemoclaw-blueprint/policies/openclaw-sandbox.yamltest/e2e/test-sandbox-operations.shtest/runner.test.ts
…ne docs (NVIDIA#4513) Address CodeRabbit review on NVIDIA#5019: - test/runner.test.ts: the /dev/pts regression loop covered only the two nemoclaw-blueprint policies; add agents/openclaw/policy-permissive.yaml (also changed by this PR) so the guard is complete across all three OpenClaw policy files. - docs/reference/network-policies.mdx: reformat the /dev/pts explanation to one sentence per line and drop the em dashes, per the docs style guideline. Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
|
@coderabbitai Addressed the two actionable nitpicks in ab355ba:
On the |
|
Tip For best results, initiate chat on the files or code changes.
On the The PR looks good to merge from my side. |
|
@coderabbitai review |
✅ Action performedReview finished.
|
## Summary - Add v0.0.62 release notes from Discussion #5100 and link release highlights to the relevant docs pages. - Document the release's GPU sandbox recreation, sandbox-side local inference verification, and Hermes dashboard port guard in the command and inference references. - Refresh generated NemoClaw user skills for the release-prep docs set. ## Source Summary - #4956 -> `docs/reference/commands.mdx`: Document CDI-first Docker GPU recreation behavior for Linux Docker-driver sandboxes. - #5024 -> `docs/inference/use-local-inference.mdx`: Document sandbox-runtime verification of the `inference.local` local inference route. - #5018 -> `docs/reference/commands.mdx`: Document Jetson/Tegra device-node group propagation for sandbox CUDA initialization. - #5012, #4763, #4706, #5030, #5015 -> `docs/about/release-notes.mdx`: Summarize onboarding and recovery reliability fixes, including the reserved Hermes API port guard. - #5017 and #5043 -> `docs/about/release-notes.mdx`, `docs/reference/commands.mdx`: Summarize mutable OpenClaw config recovery and host-side `agents list` coverage. - #5010 and #5016 -> `docs/about/release-notes.mdx`: Summarize Hermes upstream metadata visibility and WhatsApp QR rendering reliability. - #5045 and prior source docs in the v0.0.62 range -> `.agents/skills/`: Refresh generated user-skill references from the current docs source. ## Skipped - #5019 -> skipped for new prose because it touched `openclaw-sandbox-permissive.yaml`, which matches `docs/.docs-skip`. Existing source docs remain the source for generated skill synchronization. ## Verification - `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx` - `npm run docs` (passes; Fern reports 0 errors and 1 hidden warning) - Pre-commit hooks passed during commit, including docs-to-skills verification, markdown lint, gitleaks, and skills YAML tests. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added `nemoclaw <name> agents list` command. * v0.0.62 release notes added summarizing onboarding and recovery improvements. * **Bug Fixes** * Improved GPU sandbox onboarding reliability (NVIDIA CDI path, Jetson/Tegra device handling). * Better local inference verification and recovery for Linux Docker-driver GPU sandboxes. * Quieter/earlier handling of onboarding drift and port collisions. * **Documentation** * Expanded GPU passthrough, inference verification, writable paths (`/dev/pts`), port 8642 restriction, and command examples. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Prekshi Vyas <34834085+prekshivyas@users.noreply.github.com>
Summary
The reopened #4513: PR #4606 installed
tmuxin the sandbox image, but OpenClaw's bundled tmux-session flow still failed withcreate window failed: fork failed: Permission denied, and #4640 then degraded the e2e drive to a soft skip. This grants/dev/ptsin the OpenClaw sandbox landlock filesystem policy so the flow can actually allocate a PTY, and restores the e2e to a hard assertion.Related Issue
Fixes #4513
Changes
Root cause (diagnosed, not assumed): it is not fork/seccomp/nproc — plain
fork()works andnproc(512) is nearly empty. It is PTY allocation. The OpenShell sandbox landlockfilesystem_policygrants/dev/nulland/dev/urandombut never/dev/pts, soforkpty()opening/dev/ptmx(→/dev/pts/ptmx) and a/dev/pts/<n>slave is denied withEACCES. tmux surfaces thatEACCESascreate window failed: fork failed: Permission denied.nemoclaw-blueprint/policies/openclaw-sandbox.yaml: add/dev/ptstofilesystem_policy.read_writewith the PTY rationale. Grant the directory, not/dev/ptmx— the supervisor refuses tochownthe/dev/ptmxsymlink, and the devpts directory grant coversptmx+ slaves via the landlock path hierarchy.nemoclaw-blueprint/policies/openclaw-sandbox-permissive.yamlandagents/openclaw/policy-permissive.yaml: mirror/dev/ptssoshields downnever removes a live filesystem path (OpenShell rejects live filesystem removals). Same base+permissive pattern as the prior/home/linuxbrew([Sandbox] brew policy preset cannot bootstrap Homebrew: sandbox lacks filesystem write to /home/linuxbrew #3913) and GPU/procadditions, whichpermissive-runtimealready unions on shields-down.test/e2e/test-sandbox-operations.sh(TC-SBX-09): restore the lifecycle drive to a hard assertion (drop the test(e2e): skip TC-SBX-09 lifecycle drive when sandbox blocks tmux fork #4640 fork-failure soft skip), add an explicitopenpty()probe that pins the root cause, and fail loudly if devpts ever regresses. ThePTY_OKsentinel is emitted by a shell&& echo(not inside the python source) so a Python traceback echoing the source line cannot make the probe falsely pass.test/runner.test.ts: assert both OpenClaw policies grant/dev/pts(and not/dev/ptmx), and thatTC-SBX-09no longer skips on fork failure.docs/reference/network-policies.mdx,docs/security/best-practices.mdx: list/dev/ptsas a default writable path with the tmux rationale.Verification
Reporter-workflow E2E — exact tmux-session/forkpty flow inside the hardened NemoClaw sandbox.
Reproduced and verified hermetically with the real
openshell-sandboxsupervisor (which applies the live seccomp + landlock +RLIMIT_NPROChardening; no gateway/GPU needed), running insideghcr.io/nvidia/nemoclaw/sandbox-base+tmux, driven by the actual editedopenclaw-sandbox.yamlas the policy data.Command (same shape as
TC-SBX-09):Logs:
This is the exact bundled tmux-session lifecycle (
new-session→list-sessions→kill-session) the reporter hit. Pipeline coverage: the restoredTC-SBX-09(test/e2e/test-sandbox-operations.sh) drives this same lifecycle plus theopenpty()root-cause probe against a live OpenShell sandbox on the scheduled E2E job, now as a hard assertion (afork failedis a failure, not a skip).PR CI on head
b75befddb: all required checks green —static-checks,build-typecheck,cli-test-shards (1–5),plugin-tests,unit-vitest-linux,ShellCheck,CodeQL,wsl-e2e,macos-e2e,dco-check,commit-lint,codebase-growth-guardrails(run27184680225).npx prek run --all-filespasses (all formatter/lint/security/docs hooks green; the heavyTest (CLI)hook shows only load-induced timeout flakes that pass in isolation — this change touches nosrc/code)npm test— targeted suites pass (runner,policies,initial-policy,permissive-runtime,policy-tiers,shields); CIcli-test-shards (1–5)+plugin-tests+unit-vitest-linuxall green on PR headType of Change
Notes — known, pre-existing limitation (out of scope)
On an OpenClaw sandbox created before this change and upgraded without recreate, a
shields downthenshields upcycle can leave the sandbox in shields-down state:shields downadds/dev/ptsvia the permissive policy, andshields uprestores the pre-upgrade live snapshot, which omits/dev/pts(OpenShell rejects the resulting live filesystem removal). This is identical to the existing/home/linuxbrew(#3913), GPU/proc, and Hermes/opt/hermesbase-policy additions — shields-up restore has never normalized the snapshot for any path. Such a sandbox has not received the creation-locked filesystem fix anyway; the remediation (recreate / re-onboard) resolves both the tmux fix and this edge case. A symmetric snapshot-normalization in shields-restore would address the whole class and belongs in its own change.Signed-off-by: Yimo Jiang yimoj@nvidia.com
Summary by CodeRabbit
Bug Fixes
Documentation
Tests