I am testing Linux aarch64 userspace under elfuse on macOS with an Ubuntu arm64 rootfs. After the latest update, the previous Current failure: Full relevant log: The important change is that this error no longer appears: Expected behavior: Actual behavior: Suspected area: It looks like /dev/ptmx and TIOCSWINSZ are now improved, but /dev/pts/N slave path handling may still be incomplete. The runtime likely needs to map the Linux /dev/pts/N path back to the host PTY slave path created for the corresponding PTY master. |
|
I got this crash log when running the app |
Determine if commit b2fa153 resolves the issue. |
|
Hi @jserv, this is the crash log I get after applying your commit.
Bundles the elfuse-side fixes the foot + Wayland compositor path needed to stop crashing after the PTY emulation work landed. The common thread is fault paths the PR #89 reproduction exercises and the prior runtime quietly mis-emulated.
guest.c: drain worker vCPUs in guest_destroy before any hv_vm_unmap. thread_destroy_all_vcpus releases handles but does not block on the owning pthread leaving hv_vcpu_run, so a worker still inside the guest at unmap time took a stage-2 translation fault on its next instruction fetch -- the EC=0x20 syndrome=0x82000007 the PR #89 reporter hit. exit_group already runs request + interrupt + join; the destroy path needs the same prefix because forkipc.c's vcpu_run_loop returns straight into guest_destroy without going through the guest exit_group handler. Wake signals (futex, wakeup-pipe, hv_vcpus_exit) cover workers blocked outside the vCPU loop so the 100ms join cap does not detach live pthreads onto the imminent munmap.
futex.[ch]: futex_interrupt_consume is now a test-and-clear edge trigger; previously the sticky flag forkipc.c set on the last clone-thread exit kept every later epoll_pwait/ppoll/futex_wait returning -EINTR until execve cleared it, and in foot's case execve never came. poll.c, signal.c, and the futex wait paths switch to the consume variant.
mem.c: hvf_apply_file_overlay refuses MAP_SHARED of a read-only backing fd up front. Apple HVF rejects post-overlay hv_vm_map with HV_DENIED when the underlying host VA loses write capability, so the overlay path silently swapped MAP_SHARED ranges to MAP_PRIVATE snapshots; routing read-only fds straight to the pread fallback returns -EACCES at the right layer and keeps the fork-child overlay re-install path quiet.
io.c + abi.h: fallocate now handles FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE. The Linux semantic (reads in the punched range return zero, file size unchanged) maps to macOS F_PUNCHHOLE when the offset and length are block-aligned, and falls back to pwrite of zero pages for the one-byte probe foot's wl_shm pool issues at startup. Without the fallback, the probe returned EINVAL and foot disabled punch-hole for the whole session.
crashreport.c: when an EL0 fault crash report fires, dump the guest page-table walk for the faulting VA plus the segment and region that should have backed it. The PT walk surfaced the hvf_segments=0 smoking gun behind the worker-drain race.
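The io.c punch-hole behavior described above (host F_PUNCHHOLE when aligned, zero-fill otherwise) can be sketched roughly as follows. This is only an illustration, not the code in io.c: sketch_punch_hole is a hypothetical name, using st_blksize as the alignment unit is an assumption, and the F_PUNCHHOLE branch only compiles on hosts that define it.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: emulate Linux
 * fallocate(FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE) on a host fd.
 * Returns 0 on success or -1 with errno set. */
static int sketch_punch_hole(int fd, off_t offset, off_t len)
{
    struct stat st;

    if (offset < 0 || len <= 0) {
        errno = EINVAL;
        return -1;
    }
    if (fstat(fd, &st) != 0)
        return -1;

    /* KEEP_SIZE: the file must not grow, so clamp the range to EOF. */
    if (offset >= st.st_size)
        return 0;
    if (len > st.st_size - offset)
        len = st.st_size - offset;

#ifdef F_PUNCHHOLE
    /* Assumption: st_blksize approximates the alignment the host
     * file system requires for F_PUNCHHOLE. */
    off_t blk = st.st_blksize > 0 ? (off_t) st.st_blksize : 4096;
    if (offset % blk == 0 && len % blk == 0) {
        fpunchhole_t args;
        memset(&args, 0, sizeof(args));
        args.fp_offset = offset;
        args.fp_length = len;
        if (fcntl(fd, F_PUNCHHOLE, &args) == 0)
            return 0;
        /* If the host refuses, fall through to the zero-fill path. */
    }
#endif

    /* Fallback: overwrite the range with zeros, one page at a time. */
    static const char zeros[4096];
    while (len > 0) {
        size_t chunk = len < (off_t) sizeof(zeros) ? (size_t) len
                                                   : sizeof(zeros);
        ssize_t n = pwrite(fd, zeros, chunk, offset);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0) {
            errno = EIO;
            return -1;
        }
        offset += n;
        len -= n;
    }
    return 0;
}
```

A one-byte probe such as sketch_punch_hole(fd, 1, 1) is never block-aligned, so it takes the zero-fill path and still succeeds, which is the case the commit message says foot depends on.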
|
Hi @jserv, the code is working and the issue is gone. I ran into another issue, but it is a Wayland compatibility problem rather than an elfuse issue. Thanks, |
sys_ioctl previously had no case for TIOCSWINSZ, so foot's initial
master-side resize hit the default -ENOTTY arm and aborted terminal
startup. The minimal one-line fix is insufficient on macOS: the host
/dev/ptmx master is not itself a tty. TIOCSWINSZ / TIOCGWINSZ on the
bare master return ENOTTY until something has opened the corresponding
slave at least once, and the stored winsize gets cleared whenever the
slave refcount drops to zero (verified empirically on macOS 15). Linux
ptmx masters are tty fds in their own right, so guests assume those
ioctls work without an open slave.
pty_open_master bridges the gap by eagerly opening one slave host fd
that elfuse holds for the lifetime of the master and never exposes to
the guest. A per-master side table records ({master,slave}_host_fd,
linux_pts_num, slave_path). fd_cleanup_entry drops the keepalive when
the master closes; sys_close's fast-close branch also calls
proc_pty_close_keepalive so single-thread closes do not bypass the slow
path. duplicate_guest_fd mirrors the keepalive via a dup-under-lock so
dup/dup2/fcntl(F_DUPFD) aliases each keep a slave reference, then
registers BEFORE fd_alloc publishes the new guest fd so a sibling close
racing the install drops the duped keepalive too.
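To make the side-table idea concrete, here is a minimal sketch of a per-master keepalive table. All names, the slot count, and the field sizes are hypothetical; it models only registration, lookup, and drop-on-close, with no locking, dup mirroring, or real file descriptors.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_PTY_SLOTS 4 /* arbitrary capacity for the sketch */

struct sketch_pty_entry {
    int  in_use;
    int  master_host_fd;
    int  slave_host_fd;  /* keepalive; never handed to the guest */
    int  linux_pts_num;
    char slave_path[64]; /* captured ptsname(3) string */
};

static struct sketch_pty_entry sketch_table[SKETCH_PTY_SLOTS];

/* Register a master/slave pair. Returns the pts number (the existing
 * one if the master is already registered) or -EMFILE when full. */
static int sketch_register(int master_fd, int slave_fd, int pts_num,
                           const char *path)
{
    for (int i = 0; i < SKETCH_PTY_SLOTS; i++)
        if (sketch_table[i].in_use &&
            sketch_table[i].master_host_fd == master_fd)
            return sketch_table[i].linux_pts_num;

    for (int i = 0; i < SKETCH_PTY_SLOTS; i++) {
        struct sketch_pty_entry *e = &sketch_table[i];
        if (e->in_use)
            continue;
        e->in_use = 1;
        e->master_host_fd = master_fd;
        e->slave_host_fd = slave_fd;
        e->linux_pts_num = pts_num;
        snprintf(e->slave_path, sizeof(e->slave_path), "%s", path);
        return pts_num;
    }
    return -EMFILE;
}

/* Resolve guest /dev/pts/N to the captured host path, or NULL. */
static const char *sketch_lookup_path(int pts_num)
{
    for (int i = 0; i < SKETCH_PTY_SLOTS; i++)
        if (sketch_table[i].in_use &&
            sketch_table[i].linux_pts_num == pts_num)
            return sketch_table[i].slave_path;
    return NULL;
}

/* Drop the entry when the master closes. Returns the keepalive slave
 * fd the caller should close, or -1 if the master was not registered. */
static int sketch_close_master(int master_fd)
{
    for (int i = 0; i < SKETCH_PTY_SLOTS; i++) {
        struct sketch_pty_entry *e = &sketch_table[i];
        if (e->in_use && e->master_host_fd == master_fd) {
            int slave = e->slave_host_fd;
            memset(e, 0, sizeof(*e));
            return slave;
        }
    }
    return -1;
}
```

Returning -EMFILE from registration instead of silently skipping the entry mirrors the stated goal: a master whose pts number cannot be resolved later should never reach the guest.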
Guest /dev/pts/N opens and stats resolve through the captured ptsname(3)
string rather than a /dev/ttys%03lu reformat that breaks on any future
macOS naming change or unusual minor encoding. The synthesized stat
publishes st_rdev = (136 << 24) | minor in the macOS encoding so the
fs-stat translation layer (mac_to_linux_dev) yields a Linux dev_t whose
major(rdev) equals UNIX98_PTY_SLAVE_MAJOR, satisfying glibc ptsname's
device-type check. pty_open_master fails the ptmx open with EMFILE when
the table is full instead of returning a master fd whose pts number
cannot round-trip through /dev/pts/N. /dev/pts is added to
path_might_use_stat_intercept.
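The device-number round trip can be illustrated with the two encodings involved. The sketch below assumes the usual macOS layout (8-bit major in the top byte, 24-bit minor) and the glibc dev_t layout on the Linux side; the function names are hypothetical stand-ins, not elfuse's mac_to_linux_dev.

```c
#include <stdint.h>

#define SKETCH_UNIX98_PTY_SLAVE_MAJOR 136u

/* macOS: major in bits 31..24, minor in bits 23..0. */
static uint32_t sketch_mac_makedev(uint32_t major, uint32_t minor)
{
    return (major << 24) | (minor & 0xffffffu);
}

static uint32_t sketch_mac_major(uint32_t dev) { return (dev >> 24) & 0xffu; }
static uint32_t sketch_mac_minor(uint32_t dev) { return dev & 0xffffffu; }

/* Linux (glibc layout): low 12 major bits at 8..19, low 8 minor bits
 * at 0..7, remaining minor bits at 20..43, remaining major bits at 44+. */
static uint64_t sketch_linux_makedev(uint32_t major, uint32_t minor)
{
    return ((uint64_t) (major & 0x00000fffu) << 8) |
           ((uint64_t) (major & 0xfffff000u) << 32) |
           ((uint64_t) (minor & 0x000000ffu)) |
           ((uint64_t) (minor & 0xffffff00u) << 12);
}

static uint32_t sketch_linux_major(uint64_t dev)
{
    return (uint32_t) (((dev >> 8) & 0xfffu) | ((dev >> 32) & 0xfffff000u));
}

static uint32_t sketch_linux_minor(uint64_t dev)
{
    return (uint32_t) ((dev & 0xffu) | ((dev >> 12) & 0xffffff00u));
}

/* Re-encode a macOS-style rdev as a Linux dev_t. */
static uint64_t sketch_mac_to_linux_dev(uint32_t mac_dev)
{
    return sketch_linux_makedev(sketch_mac_major(mac_dev),
                                sketch_mac_minor(mac_dev));
}
```

For pts number 3, the synthesized macOS-side value is (136 << 24) | 3, and the translated Linux value decodes back to major 136, minor 3, which is the property the device-type check relies on.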
Four new ioctls: TIOCSWINSZ passes through to the host (now valid
because the keepalive opened the slave); TIOCGPTN reports the captured
pts number; TIOCSPTLCK(0) maps to host unlockpt(3) (lock-after-unlock
is rejected because macOS exposes no re-lock primitive); TIOCGPTPEER
opens the captured slave path with translate_open_flags and rejects
unsupported bits with EINVAL. TIOCSWINSZ / TIOCGWINSZ / TIOCGPTN
defensively call proc_pty_master_adopt(guest_fd) first so a master
received via SCM_RIGHTS lazily registers its own keepalive before the
host ioctl. Adopt uses fd_snapshot_and_dup to atomically pin the
canonical (host_fd, generation) with a probe dup, performs the slave
open against the probe, then re-validates and inserts under joint
fd_lock + pty_keepalive_lock so a sibling close+recycle between
validation and insert cannot attach the keepalive to the wrong file.
pty_keepalive_register_locked returns the existing pts_num atomically
on duplicate so the idempotent path never re-reads under the lock-free
race window.
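The snapshot, slow work, then re-validate pattern in the adopt path can be sketched with a per-slot generation counter. This is a simplified model under stated assumptions: the names are hypothetical, the locks are only indicated in comments, and -EBADF is an arbitrary choice of error for a failed re-validation.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_MAX_FDS 8

struct sketch_fd_slot {
    int      host_fd;
    uint64_t generation;   /* bumped every time the slot is (re)used */
    int      has_keepalive;
};

static struct sketch_fd_slot sketch_fds[SKETCH_MAX_FDS];

struct sketch_snapshot {
    int      guest_fd;
    int      host_fd;
    uint64_t generation;
};

/* Install a file into a guest fd slot (also models close + reopen). */
static void sketch_install(int guest_fd, int host_fd)
{
    sketch_fds[guest_fd].host_fd = host_fd;
    sketch_fds[guest_fd].generation++;
    sketch_fds[guest_fd].has_keepalive = 0;
}

/* Step 1 (would run under the fd lock): record which file the slot
 * names right now. */
static struct sketch_snapshot sketch_snapshot_fd(int guest_fd)
{
    struct sketch_snapshot s = { guest_fd, sketch_fds[guest_fd].host_fd,
                                 sketch_fds[guest_fd].generation };
    return s;
}

/* Step 2 happens outside any lock: open the slave against a private
 * dup of the snapshotted host fd (omitted here). */

/* Step 3 (would run under both locks): attach the keepalive only if
 * the slot still names the same file as the snapshot. */
static int sketch_commit_keepalive(const struct sketch_snapshot *snap)
{
    struct sketch_fd_slot *slot = &sketch_fds[snap->guest_fd];

    if (slot->host_fd != snap->host_fd ||
        slot->generation != snap->generation)
        return -EBADF; /* slot was closed and recycled in between */
    slot->has_keepalive = 1;
    return 0;
}
```

Comparing the generation as well as the host fd matters because a recycled slot can easily receive the same host fd number again; the fd number alone cannot tell the two files apart.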
Fork IPC propagates the keepalive table. fork_ipc_send_pty_keepalives
walks proc_pty_snapshot_keepalive output, matches each entry's
master_host_fd against the parent's fd_table to recover the guest fd
that is stable across the IPC, and ships (guest_fd, linux_pts_num,
slave_path) records plus an SCM_RIGHTS batch of dup'd slave fds.
fork_ipc_recv_pty_keepalives resolves each guest_fd through the child's
just-installed fd_table to recover the child-side master host fd, then
calls proc_pty_restore_keepalive which sets FD_CLOEXEC on the inherited
slave and registers the pair under the wire-transmitted linux_pts_num
(no reparse of the path string). Both phases run after fork_ipc_{send,
recv}_fd_table so the guest-fd-to-host-fd lookup exists in both
directions.
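As a rough illustration of shipping one record together with its slave fd, the sketch below sends a (guest_fd, linux_pts_num, slave_path) record over a Unix socket with a single SCM_RIGHTS descriptor attached. The struct layout and function names are hypothetical, and it sends one record per message rather than the batch the commit describes.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical wire record; not elfuse's actual layout. */
struct sketch_pty_record {
    int  guest_fd;
    int  linux_pts_num;
    char slave_path[64];
};

/* Send one record plus one fd. Returns 0 on success, -1 on failure. */
static int sketch_send_record(int sock, const struct sketch_pty_record *rec,
                              int slave_fd)
{
    struct iovec iov = { .iov_base = (void *) rec, .iov_len = sizeof(*rec) };
    union {
        struct cmsghdr hdr; /* forces correct alignment */
        char buf[CMSG_SPACE(sizeof(int))];
    } ctl;
    struct msghdr msg;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &slave_fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == (ssize_t) sizeof(*rec) ? 0 : -1;
}

/* Receive one record. Returns the inherited fd (marked close-on-exec)
 * or -1 on failure. */
static int sketch_recv_record(int sock, struct sketch_pty_record *rec)
{
    struct iovec iov = { .iov_base = rec, .iov_len = sizeof(*rec) };
    union {
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctl;
    struct msghdr msg;
    int fd = -1;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) != (ssize_t) sizeof(*rec))
        return -1;

    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (!c || c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS ||
        c->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(c), sizeof(int));

    rec->slave_path[sizeof(rec->slave_path) - 1] = '\0';
    if (fcntl(fd, F_SETFD, FD_CLOEXEC) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The record carries the guest fd rather than a host fd because host fd numbers differ between parent and child, while the guest fd is the key both sides can resolve through their own fd table.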
Close #88
The keepalive table previously cleared the entire entry on master close,
which dropped the (linux_pts_num, slave_path) mapping a forked child's
subsequent open("/dev/pts/N") relied on. foot/sshd/openssh sftp-server
all close(master) in the child after fork and BEFORE opening the slave,
so the slave open landed in the cleared lookup and returned ENOENT even
though the parent still held the master and the macOS slave node was
openable.
proc_pty_close_keepalive now retains linux_pts_num + slave_path for
fork-restored entries flagged stale_open_once, keeping the inherited
slave host fd to pin the macOS tty across the close-before-open window.
The next translated /dev/pts/N open consumes the stale entry, closes
the retained slave, and clears the slot. Ordinary local master closes
still drop the mapping immediately, so the path cache cannot persist
beyond one consumer.
pty_keepalive_register_locked prefers a stale-path slot with the same
pts number on insert (deterministic macOS minor mapping makes reuse
path-correct), then an empty slot, then evicts the lowest-index stale
slot. Live entries are never evicted. pty_keepalive_register_recycled
expires any stale-path entries holding the same slave_path before the
new master takes the slot, so a recycled minor cannot inherit the
prior tenant's cached translation.
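The slot preference order on insert reads naturally as a single scan. The sketch below is an assumption-laden model (hypothetical names, a three-state slot, no slave_path handling) that shows only the ordering: matching stale slot, then empty slot, then lowest-index stale slot, and never a live one.

```c
enum sketch_slot_state { SKETCH_SLOT_EMPTY = 0, SKETCH_SLOT_LIVE, SKETCH_SLOT_STALE };

struct sketch_slot {
    enum sketch_slot_state state;
    int linux_pts_num;
};

/* Pick the slot a new registration for pts_num should use.
 * Returns the slot index, or -1 when every slot is live. */
static int sketch_pick_slot(const struct sketch_slot *slots, int n, int pts_num)
{
    int first_empty = -1;
    int first_stale = -1;

    for (int i = 0; i < n; i++) {
        if (slots[i].state == SKETCH_SLOT_STALE) {
            /* 1. A stale slot for the same pts number wins outright. */
            if (slots[i].linux_pts_num == pts_num)
                return i;
            if (first_stale < 0)
                first_stale = i;
        } else if (slots[i].state == SKETCH_SLOT_EMPTY && first_empty < 0) {
            first_empty = i;
        }
    }
    /* 2. Otherwise take an empty slot if there is one. */
    if (first_empty >= 0)
        return first_empty;
    /* 3. Otherwise evict the lowest-index stale slot; live slots are
     * never candidates, so this is -1 when the table is all live. */
    return first_stale;
}
```

A caller that gets -1 back has a table full of live masters, which corresponds to the case where the ptmx open fails rather than evicting an in-use entry.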
pty_lookup_slave_path and pty_open_slave walk all entries with a
non-empty slave_path, preferring live over stale. pty_open_pts_dir
enumerates the same set so open and readdir stay consistent: a child
that just opened its slave via the stale-path mapping no longer sees
an entry that is open(2)-reachable but readdir-invisible.
Extract pty_keepalive_lock_acquire and pty_keepalive_find_master_locked so the ten pthread_once + lock acquire pairs and four "scan by master_host_fd" loops collapse to one call site each. Drop the redundant BSS-zero field assignments from pty_keepalive_init now that pty_keepalive_clear_slot_locked resets every field on slot reclaim, so the BSS dependency only matters for the first-touch sentinel. Switch pty_keepalive_register_locked to str_copy_trunc, collapse three close + saved-errno blocks in pty_open_master into close_keep_errno, and fold the success and failure cleanup loops of fork_ipc_send_pty_keepalives so payload_slave_fds is closed once at the tail. Replace proc_pty_restore_keepalive's four scattered close paths with a single goto drop trailer and use ARRAY_SIZE in fork_ipc_send_pty_keepalives instead of an open-coded divisor.
Linux O_PATH means "path-only": the device hook must not run, no pty pair gets allocated, and the resulting fd only supports fstat plus *at-style operations. Forwarding the open to pty_open_master broke this because every probe of /dev/ptmx allocated a new pty and grew /dev/pts indefinitely. proc_intercept_open now short-circuits /dev/ptmx + O_PATH to a /dev/null backing fd. /dev/null is harmless, never a directory, and the guest's I/O and ioctl paths are already gated by FD_PATH so the backing fd is never visible. proc_intercept_stat synthesizes the matching character-device stat with rdev = (5, 2) so fstat through the FD_PATH gate reports the standard Linux ptmx device numbers, and the stat-translation layer's mac_to_linux_dev produces the right values in the guest's struct stat. sys_fstat picks up the synthetic stat when an FD_PATH fd carries a non-empty proc_path -- the route any future virtual-path-backed fd (/proc, /dev/ptmx, etc.) needs. To keep that proc_path safe under sibling close + reopen races, fold the proc_path install into fd_alloc_opened_host's existing (type, host_fd) tuple-revalidation window: the resolver runs before fd_lock and the install happens inside it, alongside linux_flags and the urandom bitmap. The previous post-publish unlocked write let a recycled slot inherit the stale proc_path string, which on the FD_PATH path could surface another file's fstat as /dev/ptmx.
A targeted lgetxattr probe surfaced that elfuse's errno mapping table covered every other divergent macOS errno in the 35..102 range but lost the xattr-specific pair: macOS ENOATTR(93) and ENODATA(96) both fell into the linux_errno() default and surfaced as Linux EINVAL(22). fontconfig and glibc treat "attribute not found" as ENODATA(61), so the wrong errno made lgetxattr on a missing attr look like a malformed call and short-circuited their probes. Add a LINUX_ENODATA constant, route both macOS values through it in translate.c (guarded on the alias case so duplicate switch labels do not break the compile when ENOATTR == ENODATA on a given SDK), and add a tests/test-xattr.c that pins five round-trips: lgetxattr on a regular file returns the stored value, lgetxattr on a symlink without its own attr reports ENODATA (not EINVAL), getxattr on the same symlink follows to the target, lgetxattr after lsetxattr installs a symlink-owned attr returns the link value, and lgetxattr on a missing attr reports ENODATA. The dispatch wiring at syscall.c:285 was already in place; only the translation cap was missing.
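The alias guard mentioned above can be sketched like this. The numeric values are the ones quoted in the commit message; the macro and function names are hypothetical, and the default arm is a simplification of the real table, which maps many other errnos.

```c
/* macOS-side values as quoted in the commit message. */
#define SKETCH_MAC_ENOATTR 93
#define SKETCH_MAC_ENODATA 96

/* Linux-side values. */
#define SKETCH_LINUX_EINVAL  22
#define SKETCH_LINUX_ENODATA 61

/* Translate a macOS errno to its Linux counterpart (xattr pair only). */
static int sketch_linux_errno(int mac_errno)
{
    switch (mac_errno) {
    case SKETCH_MAC_ENODATA:
#if SKETCH_MAC_ENOATTR != SKETCH_MAC_ENODATA
    /* Only emit the second label when the two constants differ;
     * duplicate case labels are a compile error. */
    case SKETCH_MAC_ENOATTR:
#endif
        return SKETCH_LINUX_ENODATA;
    default:
        /* Simplification: everything else falls to EINVAL here. */
        return SKETCH_LINUX_EINVAL;
    }
}
```

If an SDK defined the two constants as the same number, the preprocessor check would drop the second label and the switch would still compile.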
futex.c: signal_pending_lockfree consults the atomic sig_pending_hint without confirming under sig_lock, which the helper itself documented as a stale-true source -- rt_sigprocmask masking a queued signal does not update the hint, so the next reader sees pending && unblocked and synthesizes -EINTR even though signal_pending() (the slow-path confirm) would have reported no deliverable signal. Both futex_wait call sites already release the bucket lock before consulting signal state (the OS-sync path holds no bucket lock at all; the bucket path explicitly drops b->lock around the signal check and re-checks waiter.woken after re-acquiring), so the slow-path call is safe at both sites without lock-order risk. Route both through signal_pending() and delete the lockfree variant.
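The stale-true hint can be reproduced in a few lines. The sketch below is a simplified model with hypothetical names: one mutex guards the pending and blocked masks, and an atomic flag plays the role of the hint that masking does not update.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct sketch_sigstate {
    pthread_mutex_t lock;
    uint64_t        pending;      /* queued signals, guarded by lock */
    uint64_t        blocked;      /* signal mask, guarded by lock */
    atomic_bool     pending_hint; /* lock-free "something may be pending" */
};

#define SKETCH_SIGSTATE_INIT { PTHREAD_MUTEX_INITIALIZER, 0, 0, false }

static uint64_t sketch_sigbit(int signo) { return 1ull << (signo - 1); }

/* Queue a signal and raise the hint. */
static void sketch_queue_signal(struct sketch_sigstate *s, int signo)
{
    pthread_mutex_lock(&s->lock);
    s->pending |= sketch_sigbit(signo);
    atomic_store(&s->pending_hint, true);
    pthread_mutex_unlock(&s->lock);
}

/* Block a signal. The hint is deliberately left alone, which is what
 * makes it stale-true. */
static void sketch_block_signal(struct sketch_sigstate *s, int signo)
{
    pthread_mutex_lock(&s->lock);
    s->blocked |= sketch_sigbit(signo);
    pthread_mutex_unlock(&s->lock);
}

/* Lock-free check: cheap, but may report a signal that is masked. */
static bool sketch_pending_hint_only(struct sketch_sigstate *s)
{
    return atomic_load(&s->pending_hint);
}

/* Slow-path confirm: is any pending signal actually deliverable? */
static bool sketch_pending_confirmed(struct sketch_sigstate *s)
{
    pthread_mutex_lock(&s->lock);
    bool deliverable = (s->pending & ~s->blocked) != 0;
    pthread_mutex_unlock(&s->lock);
    return deliverable;
}
```

After a signal is queued and then masked, the hint-only check still says true while the confirmed check says false; a wait path that trusts the hint would return a spurious -EINTR.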
Good to hear that. When |
|
Nice. Look forward to use |

Summary by cubic
Adds Linux PTY ioctls and devpts emulation on macOS so terminal apps start cleanly and /dev/pts/N behaves like Linux. Also hardens teardown and signal/wait paths, adds punch-hole fallocate support, and fixes xattr ENODATA translation.
New Features
/dev/ptmx master; adopt via SCM_RIGHTS; mirrors across dup/fork; FD_CLOEXEC; returns EMFILE when full; drops on close; survives child close-before-open with fork IPC restore.
TIOCSWINSZ/TIOCGWINSZ, TIOCGPTN, TIOCSPTLCK(0) -> unlockpt(3), TIOCGPTPEER; intercept /dev/pts dir and /dev/pts/N open/stat; consistent with readdir; add tests/test-pty.c.
O_PATH on /dev/ptmx without allocating a PTY; back with /dev/null; synthesize fstat rdev (5,2) via proc_path.
fallocate(FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE): kernel punch-hole when aligned; zero-pwrite fallback otherwise.
Bug Fixes
ppoll/pselect/epoll/futex_wait now only return EINTR on real signals.
MAP_SHARED overlays for read-only fds and route to snapshot pread to avoid HV_DENIED.
ENOATTR/ENODATA to Linux ENODATA for xattr syscalls; add tests/test-xattr.c.
Written for commit f5c640f. Summary will update on new commits.