Review fixes: close leaks, UAF race, framing, sanitizer CI#4
Merged
Conversation
If enif_mutex_create returned NULL the resource was returned to Elixir
with res->lock == NULL and res->fd set. The destructor gated close() on
a non-NULL lock, so the FD leaked; any subsequent nif_close would also
dereference NULL and crash.
Now nif_create_fd checks the mutex result, releases the resource on
failure (so dtor reclaims the fd), and returns {:error, :mutex_failed}.
io_resource_dtor no longer gates close() on lock presence.
nif_read / nif_write copied res->fd under lock and released the lock before the read()/write() syscall. A concurrent nif_close or owner-death down callback could close the fd on another thread before the syscall ran, letting the read/write target a recycled fd. Restructure so the mutex is held across the syscall and the subsequent enif_select registration. Since fds are O_NONBLOCK, the lock is held only briefly. Move the actual close() into the enif_select stop callback. nif_close and io_resource_down now mark the resource closed, detach the fd from res->fd, release the lock, and call enif_select(SELECT_STOP). BEAM waits for any in-flight select to drain before running the stop callback, which closes the fd safely.
In :consume mode, kick_stderr_read is called from init/1 to arm
enif_select. If the initial Pipe.read returned {:ok, data} synchronously
(short-lived processes whose stderr is already in the pipe buffer), it
sent {:stderr_data, data} to self() but no handle_info clause matched,
so the generic catch-all dropped it. The first (and sometimes only)
chunk of stderr was lost, corrupting AbnormalExit messages and
run/2 stderr capture for fast-exiting commands.
Add the missing handle_info({:stderr_data, _}, state) clause that
appends to stderr_buffer, records the bytes in stats, and drains any
remaining data via consume_stderr.
Also pass @default_read_size explicitly from kick_stderr_read instead
of relying on Pipe.read/1's implicit default, keeping read sizes
consistent across the module.
Two issues in the stdin write path:
1. If nif_write returned {:ok, 0} on a non-empty buffer (legal for
write(2) on non-regular files), write_loop would recurse forever
with the same data, wedging the GenServer. Park the caller and
wait for :ready_output instead of spinning.
2. On repeated partial writes, write_loop kept recursing inside
handle_call, delaying unrelated messages (kill, await_exit,
:select) until either completion or EAGAIN. After one successful
partial, park the caller with the remainder and let :ready_output
drive subsequent writes.
Apply the same shape to retry_write_loop on the :ready_output path.
The event loop read up to 16 bytes from the UDS socket and passed them straight to handle_command, which only parsed buf[0] and ignored the remainder. Coalesced commands (e.g. CMD_CLOSE_STDIN followed by CMD_KILL delivered in a single read) dropped everything after the first opcode. A partial tail crossing a read boundary was also lost. Introduce command_length() per opcode, a loop-parsing handle_commands() helper, and a 64-byte carry-over buffer in event_loop() that retains any truncated tail across reads. Unknown opcodes advance one byte so the parser cannot stall on bad input. read() errors other than EINTR / EAGAIN now kill the child and exit rather than silently looping.
send_fds:
* Loop on EINTR and require ret == 1 (the full 1-byte send) instead
of accepting any positive return.
* Log errno on both calloc and sendmsg failures so the caller sees
a diagnostic instead of an opaque "failed to send FDs".
Post-fork child:
* Between fork() and execvp(), POSIX allows only async-signal-safe
functions. Replace fprintf/strerror on failure with a new
child_fail() helper that uses only write(2) on a stack buffer.
* Check setsid, TIOCSCTTY, and every dup2 return value. If any of
these fail the child now reports via child_fail() and _exit(127)
instead of silently running with the wrong stdio or no controlling
terminal.
kill_child() previously called waitpid(child_pid, NULL, 0) after SIGKILL. If the child was in uninterruptible kernel sleep (D-state), this blocked the shepherd indefinitely — leaking the process group and preventing any further protocol communication with BEAM. Replace with a bounded WNOHANG reap loop (~3s cap) so control returns even when the kernel cannot reap immediately; cgroup_cleanup() provides the authoritative kill path on Linux. The event-loop SIGCHLD handler reaped at most one child per drain. SIGCHLD is coalesced by the kernel, so multiple pending exits could deliver as a single signal; the single waitpid missed them. Loop with waitpid(-1, ..., WNOHANG) until no more reapable children remain and flip child_exited only for the managed pid.
* Validate every snprintf return value for cgroup full_path / procs_path / kill_path; treat truncation as error instead of silently writing to the wrong file. * Validate uds_path length before strncpy into sockaddr_un.sun_path; reject paths that would truncate rather than failing opaquely in connect(). * Replace the fixed 100ms usleep before rmdir in cgroup_cleanup() with a bounded polling loop (~1s) that retries on EBUSY / ENOTEMPTY. Stops cgroup directories leaking when the kernel reaper is slow. * Treat cgroup_setup failure as fatal for the spawn (send_error, kill child, exit). A user who supplied a cgroup_path expects isolation; silently running unisolated violates the contract. * Set FD_CLOEXEC on the PTY master fd so any future shepherd fork doesn't leak it.
Stream.resource's after callback is only run on normal termination or
{:halt, _}. If the consumer crashed mid-iteration (e.g. pattern-match
failure), cleanup_process/1 was never invoked, the Process GenServer
lived on, and Watcher couldn't help because it monitors the GenServer,
not the consumer.
Add an optional :owner option to NetRunner.Process that takes a pid
to monitor. On DOWN the GenServer SIGKILLs the OS process (through
both the shepherd protocol and a direct NIF kill) and stops :normal.
NetRunner.Stream.stream/3 now passes self() as :owner so consumer
crashes reliably tear down the whole chain.
process.ex terminate/2 now explicitly closes the shepherd Port after pipes and the UDS socket. The Port is linked so it would die with the GenServer anyway, but closing it explicitly makes the cleanup order deterministic. watcher.ex replaced Process.sleep(5000) inside handle_info/2 with a Process.send_after(:escalate_to_sigkill, 5000) message and a new handle_info/2 clause. A Watcher can now still receive messages (e.g. supervisor shutdown) during the SIGTERM grace window instead of being wedged unresponsive.
* Exec.read_uds_message: replace peek-then-recv (which can desync if the first byte arrives ahead of its payload) with opcode-first reads. Read 1 byte for the opcode, then the opcode-specific tail. Timeouts bumped to 500ms. * Exec.spawn_process: validate cmd and args before Port.open. Reject non-binary, empty, or NUL-byte-containing values. Passing these through Port.open's args: is undefined behaviour on the C side. * Exec.cleanup_listener: propagate File.rm errors except :enoent (which is expected when the shepherd already unlinked the socket). * Signal.resolve: reject integer signals outside the POSIX 1..31 range instead of forwarding arbitrary ints to kill(2).
Operations.park previously stored the GenServer.from tuple but did not monitor the caller pid. If a caller timed out its GenServer.call or crashed while parked on EAGAIN, its entry stayed in pending until the Process exited; finish_exit then replied to a dead pid. For a long- running process with many client timeouts this leaked memory. Operations now takes Process.monitor(caller_pid) in park/4 and tracks a second map (monitors: caller_monitor_ref → op_ref). pop/2 demonitors the caller. New pop_by_monitor/2 lets the GenServer remove an entry when it observes the caller's :DOWN. reply_all/2 flushes all monitors. The Process GenServer's :DOWN handler now dispatches: if the monitor ref matches owner_ref (the stream consumer) it SIGKILLs and stops, else it tries Operations.pop_by_monitor to prune a dead parked caller.
Two defects in NetRunner.Daemon:
* The generic handle_info({:DOWN, ...}) clause matched any :DOWN and
dropped it. A drain Task crashing (abnormal reason) looked identical
to normal EOF, so a failing drain stopped invisibly and output
thereafter piled up in the pipe until blocking the child process.
Narrow the handler to refs we recognise (drain_ref / stderr_drain_ref)
and log a Logger.warning when reason != :normal.
* drain_loop relied on reader.() never raising. If the Proc GenServer
terminated mid-call or on_output blew up on a non-UTF-8 chunk, the
linked drain Task crashed, which propagated to the Daemon through
Task.async's link and crashed the whole daemon. Wrap drain_loop in
try/rescue/catch and isolate on_output in its own rescue so neither
path can take down the Daemon.
Signal.resolve previously maintained an Elixir @signals allow-list that duplicated the strcmp list inside nif_signal_number in C. Adding or renaming a signal meant editing two places and any drift produced a silent "unknown_signal" even when the other side supported it. Delegate to the NIF for any atom input; integer-range validation stays in Elixir. The NIF's existing {:error, :unknown_signal} return surfaces unsupported atoms unchanged.
Enable \`SANITIZE=1 make all\` (or the \`make asan\` convenience target) to rebuild the NIF and shepherd with AddressSanitizer + UBSan. When SANITIZE=1 the build switches to -O1 and drops _FORTIFY_SOURCE (they conflict with ASan's memcpy interceptors). Add a new CI job \`sanitizers\` that runs the full Elixir test suite with LD_PRELOAD=libasan so the NIF and shepherd catch memory-safety and undefined-behaviour bugs in the C layer on every push. Gate the publish job on the sanitizers job passing.
The earlier change that parked the caller after a single partial write
broke large-buffer streaming. Parking skipped the NIF's EAGAIN path,
which is the only place enif_select is armed for :ready_output; the
parked caller then hung forever because no readiness signal would
arrive. A 100 KB cat stream test reliably timed out after 60s.
Restore the original write_loop shape (recurse until the NIF returns
either :eagain or completion) so enif_select stays in charge of
readiness. Keep the {:ok, 0} guard from the original review fix — if
the kernel ever returns 0 on a non-empty write we sleep(1) and retry
instead of spinning the dirty scheduler. retry_write_loop mirrors the
same shape.
Also add a test_helper.exs hook that purges stale /tmp/net_runner_*.sock
files before and after every test run; stops accumulation from tests
that crash before cleanup_listener runs.
Runs \`printf 'a\0b\0c'\` through the shepherd and asserts the full 5 bytes survive the read path. Guards against any accidental use of String functions on the output binary that would truncate at the first NUL.
Credo strict flagged the cond/true as a refactoring opportunity; with only two branches an if/else is clearer and identical in behaviour.
Public-API fix: NetRunner.run pattern-matched {:ok, pid} on Proc.start,
which raised MatchError for any {:error, reason} — including the new
validation errors from E12. Return the error tuple directly.
New regression tests:
* test/signal_test.exs — Signal.resolve range, unknown atoms, bad
types, resolve! raising behaviour (E15).
* test/process_test.exs —
* NUL-byte rejection in cmd/args (E12)
* Proc.start with empty cmd rejected (E12)
* stderr-only fast-exit command: clean exit + bytes_err stats
nonzero (E1)
* :owner monitor SIGKILLs OS process when owner crashes (E3)
* test/net_runner_test.exs —
* run/2 and stream/2 surface validation errors instead of crashing
* Timeout path reaps OS processes (no sleep pid leaks after five
100ms timeouts against sleep 30)
Fix CI regression from C9 (treat cgroup setup failure as fatal):
The existing Linux-only test assumed CI had write access to
/sys/fs/cgroup/; it does not. Before C9 this silently succeeded
(child ran outside the cgroup). After C9 the shepherd correctly
reports {:error, _}. Rewrite the test to accept either outcome
— success on a privileged run, a clean error on an unprivileged
one. Either confirms the option is plumbed through.
Add a negative test for path validation (absolute paths and
traversal are rejected).
CHANGELOG: document every fix from the review pass under
[Unreleased] — FD leak on mutex failure, use-after-close race,
lost stderr, write-loop spin, shepherd framing, sendmsg
hardening, post-fork signal safety, waitpid bounds, SIGCHLD
reap loop, cgroup/path validation, stream consumer-crash
cleanup, Watcher send_after, parked-caller monitors, UDS read
redesign, NUL-byte validation, run/2 error surface, File.rm
errors, Signal range/source-of-truth, Daemon drain resilience,
explicit Port.close, ASan/UBSan build + CI job, stale socket
sweep, regression tests.
README: new sections on input validation / error returns,
binary output with NUL bytes, the Command DSL, an error-handling
cheatsheet, and examples for :owner monitoring and per-call
kill_timeout.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Implements the findings from the professional code review in
/Users/nmaroulis/.claude/plans/do-a-thoroough-code-wild-bengio.md,addressing zero-leak / zero-zombie guarantees across the NIF, shepherd,
and Elixir layers. 18 commits, each a focused fix.
Critical fixes (correctness)
nif_create_fd) — previously thedtor gated
close()onres->lockbeing non-NULL, so an ENOMEMduring mutex create leaked the fd and armed a NULL-deref in any
later
nif_close.nif_read/nif_writecopiedfdout ofthe lock before the syscall; a concurrent
nif_close/ owner-deathcould close the fd before the syscall ran, hitting a recycled fd.
Lock is now held across the syscall and
enif_select; the actualclose()runs in theio_resource_stopcallback after BEAM drainsany pending select.
:consumemode —kick_stderr_readsent{:stderr_data, _}toself()but nohandle_infomatched, sofirst-chunk stderr for short-lived processes was dropped.
write_loop{:ok, 0}spin — if the kernel ever returned 0 ona non-empty write, the GenServer spun forever. Bounded with a 1 ms
sleep-retry.
High-severity fixes (shepherd C layer)
buf[0], discarding anycoalesced/tail commands. Now frames are length-dispatched per
opcode with a 64-byte carry-over buffer across
poll()iterations.sendmsginsend_fds— EINTR loop, verify full 1-byte send,log errno on failure instead of returning an opaque
-1.fprintf/strerrorin thepost-fork child window with a
write(2)-basedchild_fail()helper (async-signal-safe). Checked every
dup2/setsid/TIOCSCTTYreturn; on failure the child exits 127 with adiagnostic instead of running with broken stdio.
waitpidafter SIGKILL — bounded WNOHANG loop (~3 s cap) soshepherd can't hang on a D-state child.
SIGCHLDreaping — loopwaitpid(-1, ..., WNOHANG)per drain;previously one signal could leave zombies if ever coalesced.
snprintfreturn,reject too-long UDS paths, set
FD_CLOEXECon PTY master, treatuser-requested cgroup setup failure as fatal, and replace the fixed
100 ms usleep in
cgroup_cleanupwith a bounded polling rmdir.Elixir layer hardening
optional
:ownerpid, monitors it, and SIGKILL-stops the OS processon DOWN.
NetRunner.Stream.stream/3passesself()so a consumercrash tears down the chain even though
Stream.resource'safterdidn't fire.
Process.sleep(5_000)inhandle_inforeplaced withProcess.send_after/3+ a new clause.Operations.park/4nowProcess.monitors the caller. Dead callers are pruned on DOWNinstead of accumulating in the pending map.
read_uds_message— replaced peek-then-recv (race-prone whenthe payload arrives a moment after the opcode) with opcode-first
reads and proper timeouts.
cmd and args at the spawn boundary.
:enoenton the UDS socketpath, propagate real errors.
Signal.resolvedelegates to the NIFinstead of maintaining a duplicate allow-list; integer range
clamped to
1..31.task dies abnormally; wrap
drain_loopso reader/logger exceptionscan't take the daemon down via Task link.
terminate/2explicitly closes the shepherdPortafter theUDS socket so the teardown order is deterministic.
Tooling
SANITIZE=1 make all(ormake asan). CIgains a
sanitizersjob that builds the NIF and shepherd withAddressSanitizer + UBSan, preloads
libasan, and runs the fullmix test. The publish job depends on it.test_helper.exs(before + after suite).Test plan
mix compile --warnings-as-errorsmix format --check-formattedmix credo --strict(clean)mix dialyzer(0 errors)mix test— 120 passed, 10× consecutive runs all greenSANITIZE=1 make clean allbuilds locally on macOSsanitizersjob runs clean on Linux (verified on push)