Skip to content

policy_fn: extend execve argv freeze to peer processes (#27) #33

Merged
congwang-mk merged 1 commit into main from issue-27-peer-freeze on May 1, 2026

@congwang-mk
Contributor

Summary

  • Closes the cross-process TOCTOU vector raised by @Changaco on issue #27 ("Sandbox escape by racing seccomp notifications?"): a peer process in the sandbox can alias argv pages via MAP_SHARED mappings (memfd, SysV shm, shared file mmap) or share the mm_struct via clone(CLONE_VM), and then mutate argv between the supervisor's read and the kernel's post-Continue re-read. The sibling-thread freeze from PR #29 ("policy_fn: drop path strings, keep argv via sibling-thread freeze (#27)") closed only the same-TGID case.
  • Replaces freeze_siblings_for_execve with freeze_sandbox_for_execve, which enumerates every TGID in ProcessIndex (the canonical sandbox-membership set, populated by register_child_if_new in resource.rs) and applies PTRACE_SEIZE followed by PTRACE_INTERRUPT to every TID listed under /proc/<tgid>/task. The supervisor's sequential notification dispatch (notif.rs:999-1001) prevents new clone/fork notifications from completing during the freeze, so the snapshot is stable without any new locking.
  • Sibling threads die in de_thread and the kernel reaps their ptrace state automatically (unchanged). Peer threads survive execve and are PTRACE_DETACHed after NOTIF_SEND so they resume normally.
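
The enumeration step described above can be sketched with the standard library alone. This is an illustrative sketch, not the code in this PR: `list_tids` and `snapshot_tasks` are hypothetical names, the slice of TGIDs stands in for whatever ProcessIndex exposes, and the ptrace calls themselves are left out.

```rust
use std::fs;
use std::io;

/// List every thread ID (TID) in thread group `tgid` by reading the
/// directory entries of /proc/<tgid>/task. Linux-only.
fn list_tids(tgid: u32) -> io::Result<Vec<u32>> {
    let mut tids = Vec::new();
    for entry in fs::read_dir(format!("/proc/{}/task", tgid))? {
        let entry = entry?;
        // Each entry name is a decimal TID; ignore anything that is not.
        if let Some(tid) = entry
            .file_name()
            .to_str()
            .and_then(|s| s.parse::<u32>().ok())
        {
            tids.push(tid);
        }
    }
    tids.sort_unstable();
    Ok(tids)
}

/// Build the freeze snapshot as (tgid, tid) pairs for every process in the
/// sandbox. `sandbox_tgids` is an assumed input shape for this sketch; any
/// I/O error is propagated to the caller unchanged.
fn snapshot_tasks(sandbox_tgids: &[u32]) -> io::Result<Vec<(u32, u32)>> {
    let mut tasks = Vec::new();
    for &tgid in sandbox_tgids {
        for tid in list_tids(tgid)? {
            tasks.push((tgid, tid));
        }
    }
    Ok(tasks)
}
```

Calling `snapshot_tasks(&[std::process::id()])` returns one pair per thread of the calling process, including the main thread, whose TID equals the TGID.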

Why enumerate-and-freeze rather than a static CLONE_VM block

Considered three approaches:

  1. Invariants: BPF deny CLONE_VM & ~CLONE_THREAD + per-execve /proc/<pid>/maps privacy check + sibling freeze. Provable but bans a legal Linux clone variant globally for one syscall's TOCTOU; the BPF mask is a magic bit-pattern that obscures the filter.
  2. Enumerate-and-freeze (this PR): wider freeze, no syscall restrictions. Primitives (ProcessIndex, sibling-freeze pattern) already exist; ~80 lines on top.
  3. execve-on-behalf: ptrace-inject a private anon page for argv. Heavyweight for a rare syscall.

(2) was the right tradeoff: existing primitives compose, no per-execve /proc/<pid>/maps walk, no static restriction on a legal clone variant. The same freeze_sandbox_for_<syscall> shape generalizes if future syscalls need TOCTOU protection on re-read user memory.
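
For reference, the clone variant that approach (1) would have denied can be written as a plain predicate. This is a sketch of the condition only, not a BPF filter and not code from this PR; the function name is hypothetical, and the constants are the values from the Linux UAPI header `<linux/sched.h>`.

```rust
/// Flag values from the Linux UAPI header <linux/sched.h>.
const CLONE_VM: u64 = 0x0000_0100;
const CLONE_VFORK: u64 = 0x0000_4000;
const CLONE_THREAD: u64 = 0x0001_0000;

/// True when a clone() flags word shares the caller's address space without
/// joining its thread group, i.e. the "CLONE_VM & ~CLONE_THREAD" case: a
/// separate TGID that can still write to the caller's memory.
fn shares_mm_across_tgids(flags: u64) -> bool {
    (flags & CLONE_VM) != 0 && (flags & CLONE_THREAD) == 0
}
```

Under this predicate an ordinary thread (`CLONE_VM | CLONE_THREAD`) is not matched, while a vfork-style clone (`CLONE_VM | CLONE_VFORK`) is.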

Test plan

  • cargo test -p sandlock-core --lib — 223 passed (includes new freeze_sandbox_includes_peer_process regression test)
  • cargo test -p sandlock-core --test integration test_policy_fn — 13 passed (covers deny_by_argv which exercises the live freeze path through the supervisor)
  • Full integration suite to be run in CI

Notes for reviewers

  • notif.rs:932-967 is the new dispatch site. Note that detach happens after send_response, not before — peer threads must remain frozen until the kernel completes its argv re-read.
  • The freeze_sandbox_for_execve failure mode is unchanged from the old function: any partial-freeze error rolls back all already-frozen tasks and propagates the error, which the dispatcher converts to EPERM to keep the argv-safety invariant fail-closed.
  • policy_fn.rs:65-71 and the SyscallEvent.argv doc comment are updated to reflect the actual guarantee (sandbox-wide pause, not just sibling threads).
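
The two invariants in the notes above (all-or-nothing freeze with rollback, and detach only after the response is sent) can be modeled without ptrace. The following is a minimal sketch under stated assumptions: `TaskFreezer`, `freeze_all`, `dispatch`, `Verdict`, and `MockFreezer` are hypothetical names that do not appear in the PR, the policy decision itself is omitted, and every frozen task is thawed even though in the real flow sibling threads are reaped by the kernel.

```rust
use std::collections::BTreeSet;

/// Hypothetical abstraction over the per-task primitive. In the PR the real
/// operations are PTRACE_SEIZE + PTRACE_INTERRUPT and PTRACE_DETACH.
trait TaskFreezer {
    fn freeze(&mut self, tid: u32) -> Result<(), String>;
    fn thaw(&mut self, tid: u32);
}

/// Freeze every task or none: on the first failure, thaw whatever was
/// already frozen and return the error.
fn freeze_all<F: TaskFreezer>(f: &mut F, tids: &[u32]) -> Result<Vec<u32>, String> {
    let mut frozen = Vec::with_capacity(tids.len());
    for &tid in tids {
        if let Err(e) = f.freeze(tid) {
            for &done in frozen.iter().rev() {
                f.thaw(done);
            }
            return Err(e);
        }
        frozen.push(tid);
    }
    Ok(frozen)
}

/// Response the dispatcher sends back for the intercepted execve.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Verdict {
    Continue,
    Eperm,
}

/// Fail closed on a freeze error; otherwise respond first and thaw second,
/// so tasks stay frozen while the response is being delivered.
fn dispatch<F, S>(f: &mut F, tids: &[u32], send_response: S) -> Verdict
where
    F: TaskFreezer,
    S: FnOnce(&F, Verdict),
{
    match freeze_all(f, tids) {
        Err(_) => {
            send_response(&*f, Verdict::Eperm);
            Verdict::Eperm
        }
        Ok(frozen) => {
            send_response(&*f, Verdict::Continue);
            for tid in frozen {
                f.thaw(tid);
            }
            Verdict::Continue
        }
    }
}

/// Test double: tracks frozen TIDs and fails on one configured TID.
struct MockFreezer {
    frozen: BTreeSet<u32>,
    fail_on: Option<u32>,
}

impl TaskFreezer for MockFreezer {
    fn freeze(&mut self, tid: u32) -> Result<(), String> {
        if self.fail_on == Some(tid) {
            return Err(format!("cannot freeze tid {}", tid));
        }
        self.frozen.insert(tid);
        Ok(())
    }

    fn thaw(&mut self, tid: u32) {
        self.frozen.remove(&tid);
    }
}
```

With a `MockFreezer` that fails on the last TID, `dispatch` yields `Verdict::Eperm` and leaves no task frozen; with one that never fails, the `send_response` callback observes every task still frozen, and the set is empty once `dispatch` returns.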

🤖 Generated with Claude Code

@congwang-mk force-pushed the issue-27-peer-freeze branch from 46bd3bf to 1c45cda on May 1, 2026 21:14
Signed-off-by: Cong Wang <cwang@multikernel.io>
@congwang-mk force-pushed the issue-27-peer-freeze branch from 1c45cda to 32c9a76 on May 1, 2026 21:16
@congwang-mk merged commit 84971c6 into main on May 1, 2026
8 checks passed
@congwang-mk deleted the issue-27-peer-freeze branch on May 1, 2026 21:28