Emulate Linux PTY ioctls and /dev/pts/N paths#89

Merged: jserv merged 7 commits into main from pty-ioctl on Jun 9, 2026

@jserv jserv commented Jun 7, 2026

Contributor

sys_ioctl previously had no case for TIOCSWINSZ, so foot's initial master-side resize hit the default -ENOTTY arm and aborted terminal startup. The minimal one-line fix is insufficient on macOS: the host /dev/ptmx master is not itself a tty. TIOCSWINSZ / TIOCGWINSZ on the bare master return ENOTTY until something has opened the corresponding slave at least once, and the stored winsize gets cleared whenever the slave refcount drops to zero (verified empirically on macOS 15). Linux ptmx masters are tty fds in their own right, so guests assume those ioctls work without an open slave.
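A minimal host-side sketch of the workaround shape described above, assuming only the standard posix_openpt(3) family: open the master, open one slave next to it, and only then issue the window-size ioctls on the master. The helper names `open_master_with_keepalive` and `winsize_round_trip` are hypothetical and are not elfuse functions. On Linux the round trip also succeeds without the extra slave; the macOS behavior is as reported in this description.

```c
#define _GNU_SOURCE /* glibc: exposes posix_openpt(3), grantpt(3), ptsname(3) */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper, not elfuse code: open a PTY master and keep one
 * slave fd open next to it so window-size ioctls on the master work. */
int open_master_with_keepalive(int *keepalive_fd)
{
    assert(keepalive_fd != NULL);
    int master = posix_openpt(O_RDWR | O_NOCTTY);
    if (master < 0)
        return -1;
    if (grantpt(master) == 0 && unlockpt(master) == 0) {
        const char *slave_path = ptsname(master);
        int slave = slave_path ? open(slave_path, O_RDWR | O_NOCTTY) : -1;
        if (slave >= 0) {
            *keepalive_fd = slave; /* caller holds it until the master closes */
            return master;
        }
    }
    close(master);
    return -1;
}

/* Set the window size on the master, then read it back from the master. */
int winsize_round_trip(int master, unsigned short rows, unsigned short cols,
                       struct winsize *out)
{
    struct winsize ws = {.ws_row = rows, .ws_col = cols};
    if (ioctl(master, TIOCSWINSZ, &ws) < 0)
        return -1;
    return ioctl(master, TIOCGWINSZ, out);
}
```

A caller would keep the returned keepalive fd private and close it only when the master itself is closed.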

pty_open_master bridges the gap by eagerly opening one slave host fd that elfuse holds for the lifetime of the master and never exposes to the guest. A per-master side table records (master_host_fd, slave_host_fd, linux_pts_num, slave_path). fd_cleanup_entry drops the keepalive when the master closes; sys_close's fast-close branch also calls proc_pty_close_keepalive so single-thread closes do not bypass the slow path. duplicate_guest_fd mirrors the keepalive via a dup-under-lock so dup/dup2/fcntl(F_DUPFD) aliases each keep a slave reference, then registers BEFORE fd_alloc publishes the new guest fd so a sibling close racing the install drops the duped keepalive too.
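The side table can be pictured roughly as follows. This is a sketch under stated assumptions: the field names follow the tuple above, but the slot count, the function names `keepalive_register` and `keepalive_release`, and the path buffer size are made up for illustration, and locking is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define KEEPALIVE_SLOTS 4 /* assumed capacity; the real size is not stated */

struct keepalive_entry {
    bool in_use;
    int master_host_fd;
    int slave_host_fd;
    unsigned linux_pts_num;
    char slave_path[64]; /* host path captured from ptsname(3) */
};

static struct keepalive_entry keepalive_table[KEEPALIVE_SLOTS];

/* Record a (master, slave) pair. Returns 0, or -EMFILE when every slot is
 * taken, so the ptmx open can fail instead of handing out an untracked master. */
int keepalive_register(int master_fd, int slave_fd, unsigned pts_num,
                       const char *slave_path)
{
    assert(slave_path != NULL);
    for (int i = 0; i < KEEPALIVE_SLOTS; i++) {
        struct keepalive_entry *e = &keepalive_table[i];
        if (e->in_use)
            continue;
        e->in_use = true;
        e->master_host_fd = master_fd;
        e->slave_host_fd = slave_fd;
        e->linux_pts_num = pts_num;
        snprintf(e->slave_path, sizeof(e->slave_path), "%s", slave_path);
        return 0;
    }
    return -EMFILE;
}

/* Forget the entry for a closing master. Returns the slave fd the caller
 * should now close, or -1 if the master was never registered. */
int keepalive_release(int master_fd)
{
    for (int i = 0; i < KEEPALIVE_SLOTS; i++) {
        struct keepalive_entry *e = &keepalive_table[i];
        if (e->in_use && e->master_host_fd == master_fd) {
            int slave = e->slave_host_fd;
            memset(e, 0, sizeof(*e));
            return slave;
        }
    }
    return -1;
}
```

Returning the slave fd from the release path, rather than closing it inside the table code, keeps the sketch free of real descriptors; the actual close ordering in elfuse may differ.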

Guest /dev/pts/N opens and stats resolve through the captured ptsname(3) string rather than a /dev/ttys%03lu reformat that breaks on any future macOS naming change or unusual minor encoding. The synthesized stat publishes st_rdev = (136 << 24) | minor in the macOS encoding so the fs-stat translation layer (mac_to_linux_dev) yields a Linux dev_t whose major(rdev) equals UNIX98_PTY_SLAVE_MAJOR, satisfying glibc ptsname's device-type check. pty_open_master fails the ptmx open with EMFILE when the table is full instead of returning a master fd whose pts number cannot round-trip through /dev/pts/N. /dev/pts is added to path_might_use_stat_intercept.
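The device-number arithmetic can be checked with a small sketch. It assumes the macOS dev_t layout (8-bit major in the top byte, 24-bit minor below) and the glibc 64-bit dev_t layout on the Linux side. `sketch_mac_to_linux_dev` is a stand-in written for this example; the body of the real mac_to_linux_dev is not shown in this PR text.

```c
#include <assert.h>
#include <stdint.h>

#define UNIX98_PTY_SLAVE_MAJOR 136u /* Linux major number for /dev/pts/N */

/* macOS encoding: major in bits 31..24, minor in bits 23..0. */
uint32_t mac_makedev(uint32_t major, uint32_t minor)
{
    assert(major <= 0xffu);
    return (major << 24) | (minor & 0xffffffu);
}

/* Stand-in translation to the glibc dev_t layout (assumption, see lead-in). */
uint64_t sketch_mac_to_linux_dev(uint32_t mac_dev)
{
    uint64_t major = (mac_dev >> 24) & 0xffu;
    uint64_t minor = mac_dev & 0xffffffu;
    return ((major & 0x00000fffu) << 8) | ((major & 0xfffff000u) << 32) |
           (minor & 0x000000ffu) | ((minor & 0xffffff00u) << 12);
}

/* Same bit selection as glibc's major()/minor() macros. */
unsigned linux_major(uint64_t dev)
{
    return (unsigned)(((dev & 0x00000000000fff00ull) >> 8) |
                      ((dev & 0xfffff00000000000ull) >> 32));
}

unsigned linux_minor(uint64_t dev)
{
    return (unsigned)((dev & 0x00000000000000ffull) |
                      ((dev & 0x00000ffffff00000ull) >> 12));
}
```

With this layout, a synthesized `(136 << 24) | N` decodes on the guest side to major 136 and minor N, which is what a device-type check against UNIX98_PTY_SLAVE_MAJOR needs.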

Four new ioctls: TIOCSWINSZ passes through to the host (now valid because the keepalive opened the slave); TIOCGPTN reports the captured pts number; TIOCSPTLCK(0) maps to host unlockpt(3) (lock-after-unlock is rejected because macOS exposes no re-lock primitive); TIOCGPTPEER opens the captured slave path with translate_open_flags and rejects unsupported bits with EINVAL. TIOCSWINSZ / TIOCGWINSZ / TIOCGPTN defensively call proc_pty_master_adopt(guest_fd) first so a master received via SCM_RIGHTS lazily registers its own keepalive before the host ioctl. Adopt uses fd_snapshot_and_dup to atomically pin the canonical (host_fd, generation) with a probe dup, performs the slave open against the probe, then re-validates and inserts under joint fd_lock + pty_keepalive_lock so a sibling close+recycle between validation and insert cannot attach the keepalive to the wrong file. pty_keepalive_register_locked returns the existing pts_num atomically on duplicate so the idempotent path never re-reads under the lock-free race window.
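The adopt path's snapshot-then-revalidate step can be sketched in isolation. This is a simplified illustration, not the elfuse implementation: the slot layout and the names `slot_install`, `slot_snapshot`, and `slot_attach_keepalive` are hypothetical, a single mutex stands in for the joint fd_lock + pty_keepalive_lock, and the probe dup is left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_GUEST_FDS 8 /* assumed table size for the sketch */

struct fd_slot {
    int host_fd;
    unsigned generation; /* bumped every time the slot is (re)installed */
    int keepalive_fd;
};

struct fd_snapshot {
    int host_fd;
    unsigned generation;
};

static struct fd_slot fd_slots[MAX_GUEST_FDS];
static pthread_mutex_t fd_lock = PTHREAD_MUTEX_INITIALIZER;

/* Install or recycle a guest fd slot. */
void slot_install(int guest_fd, int host_fd)
{
    assert(guest_fd >= 0 && guest_fd < MAX_GUEST_FDS);
    pthread_mutex_lock(&fd_lock);
    fd_slots[guest_fd].host_fd = host_fd;
    fd_slots[guest_fd].generation++;
    fd_slots[guest_fd].keepalive_fd = -1;
    pthread_mutex_unlock(&fd_lock);
}

/* Capture the identity of the file currently behind guest_fd. */
struct fd_snapshot slot_snapshot(int guest_fd)
{
    assert(guest_fd >= 0 && guest_fd < MAX_GUEST_FDS);
    pthread_mutex_lock(&fd_lock);
    struct fd_snapshot snap = {fd_slots[guest_fd].host_fd,
                               fd_slots[guest_fd].generation};
    pthread_mutex_unlock(&fd_lock);
    return snap;
}

/* Attach the keepalive only if the slot still refers to the snapshotted file.
 * Returns false when a sibling closed and recycled the slot in between. */
bool slot_attach_keepalive(int guest_fd, struct fd_snapshot snap, int keepalive_fd)
{
    assert(guest_fd >= 0 && guest_fd < MAX_GUEST_FDS);
    bool attached = false;
    pthread_mutex_lock(&fd_lock);
    struct fd_slot *s = &fd_slots[guest_fd];
    if (s->host_fd == snap.host_fd && s->generation == snap.generation) {
        s->keepalive_fd = keepalive_fd;
        attached = true;
    }
    pthread_mutex_unlock(&fd_lock);
    return attached;
}
```

Comparing the generation as well as the host fd matters because a recycled slot can reuse the same host fd number for a different file.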

Fork IPC propagates the keepalive table. fork_ipc_send_pty_keepalives walks proc_pty_snapshot_keepalive output, matches each entry's master_host_fd against the parent's fd_table to recover the guest fd that is stable across the IPC, and ships (guest_fd, linux_pts_num, slave_path) records plus an SCM_RIGHTS batch of dup'd slave fds. fork_ipc_recv_pty_keepalives resolves each guest_fd through the child's just-installed fd_table to recover the child-side master host fd, then calls proc_pty_restore_keepalive which sets FD_CLOEXEC on the inherited slave and registers the pair under the wire-transmitted linux_pts_num (no reparse of the path string). Both phases run after fork_ipc_{send,recv}_fd_table so the guest-fd-to-host-fd lookup exists in both directions.
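For readers unfamiliar with the transport, here is a sketch of shipping one record together with one slave fd over a Unix socket with SCM_RIGHTS, then setting FD_CLOEXEC on the receiving side. The record layout and the function names are assumptions for illustration; the real wire format and batching in fork_ipc_send_pty_keepalives are not shown in this PR text. The sketch also assumes the record arrives in a single recvmsg call.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical wire record mirroring (guest_fd, linux_pts_num, slave_path). */
struct keepalive_record {
    int guest_fd;
    unsigned linux_pts_num;
    char slave_path[64];
};

/* Control buffer sized and aligned for exactly one fd. */
union fd_ctrl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

int send_record_with_fd(int sock, const struct keepalive_record *rec, int slave_fd)
{
    assert(rec != NULL);
    struct iovec iov = {.iov_base = (void *)rec, .iov_len = sizeof(*rec)};
    union fd_ctrl ctrl;
    memset(&ctrl, 0, sizeof(ctrl));
    struct msghdr msg = {0};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);
    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &slave_fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == (ssize_t)sizeof(*rec) ? 0 : -1;
}

/* Returns the received fd (with FD_CLOEXEC set) or -1 on any failure. */
int recv_record_with_fd(int sock, struct keepalive_record *rec)
{
    assert(rec != NULL);
    struct iovec iov = {.iov_base = rec, .iov_len = sizeof(*rec)};
    union fd_ctrl ctrl;
    memset(&ctrl, 0, sizeof(ctrl));
    struct msghdr msg = {0};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);
    if (recvmsg(sock, &msg, 0) != (ssize_t)sizeof(*rec))
        return -1;
    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    if (!cm || cm->cmsg_level != SOL_SOCKET || cm->cmsg_type != SCM_RIGHTS ||
        cm->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cm), sizeof(fd));
    if (fcntl(fd, F_SETFD, FD_CLOEXEC) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Sending the pts number in the record, instead of re-deriving it from the path on the receiving side, matches the "no reparse of the path string" point above.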

Close #88


Summary by cubic

Adds Linux PTY ioctls and devpts emulation on macOS so terminal apps start cleanly and /dev/pts/N behaves like Linux. Also hardens teardown and signal/wait paths, adds punch‑hole fallocate support, and fixes xattr ENODATA translation.

  • New Features

    • PTY keepalive: hidden slave per /dev/ptmx master; adopt via SCM_RIGHTS; mirrors across dup/fork; FD_CLOEXEC; returns EMFILE when full; drops on close; survives child close‑before‑open with fork IPC restore.
    • PTY ioctls and paths: TIOCSWINSZ/TIOCGWINSZ, TIOCGPTN, TIOCSPTLCK(0) -> unlockpt(3), TIOCGPTPEER; intercept /dev/pts dir and /dev/pts/N open/stat; consistent with readdir; add tests/test-pty.c.
    • Honor O_PATH on /dev/ptmx without allocating a PTY; back with /dev/null; synthesize fstat rdev (5,2) via proc_path.
    • fallocate(FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE): kernel punch‑hole when aligned; zero‑pwrite fallback otherwise.
  • Bug Fixes

    • Quiesce worker vCPUs and wake sleepers before guest destroy; crash reports now include a guest page‑table walk on EL0 faults.
    • Futex EINTR: switch to test‑and‑clear interrupt; confirm deliverable signals under lock; ppoll/pselect/epoll/futex_wait now only return EINTR on real signals.
    • Refuse MAP_SHARED overlays for read‑only fds and route to snapshot pread to avoid HV_DENIED.
    • Translate macOS ENOATTR/ENODATA to Linux ENODATA for xattr syscalls; add tests/test-xattr.c.

Written for commit f5c640f. Summary will update on new commits.

cubic-dev-ai[bot]

This comment was marked as resolved.

@doanbaotrung

I am testing Linux aarch64 userspace under elfuse on macOS with an Ubuntu arm64 rootfs. After the latest update, the previous TIOCSWINSZ: Inappropriate ioctl for device failure appears to be fixed, but foot now fails later while opening the PTY slave device.

Current failure:

err: slave.c:239: failed to open pseudo terminal slave device: No such file or directory
err: slave.c:554: /bin/sh: failed to execute: No such file or directory
err: fdm.c:228: no such FD: 6

Full relevant log:

=== [2026-06-08 10:12:48] Launching app: foot ===
WARNING: __aarch64_have_lse_atomics not found
warn: main.c:464: 'C' is not a UTF-8 locale, falling back to 'C.UTF-8'
warn: config.c:4069: DejaVu Sans: font does not appear to be monospace; check your config, or disable this warning by setting [tweak].font-monospace-warn=no
xkbcommon: ERROR: [XKB-679] No Compose file for locale "C.UTF-8": failed to use fallback "en_US.UTF-8"
xkbcommon: ERROR: [XKB-679] couldn't find a Compose file for locale "C.UTF-8" (mapped to "C.UTF-8")
warn: wayland.c:1368: failed to instantiate compose table; dead keys (compose) will not work
err: slave.c:239: failed to open pseudo terminal slave device: No such file or directory
err: slave.c:554: /bin/sh: failed to execute: No such file or directory
err: fdm.c:228: no such FD: 6
warn: terminal.c:2040: failed to trim memory
ELF entry   : 0x561c0
Load range  : 0x0 – 0x90ca0
Segments    : 2
guest_bootstrap_prepare...
guest_bootstrap_create_vcpu...
entering vcpu_run_loop...
exit code: 230

The important change is that this error no longer appears:
failed to set initial TIOCSWINSZ: Inappropriate ioctl for device
So the PTY setup now gets past window-size initialization, but fails when resolving/opening the pseudo terminal slave.

Expected behavior:
foot should be able to create a PTY pair, open the slave side, and spawn /bin/sh.

Actual behavior:
PTY slave opening fails with ENOENT, then shell startup fails.

Suspected area:
Linux PTY/devpts emulation, especially the relationship between:

/dev/ptmx
/dev/pts/N
ptsname()
grantpt()
unlockpt()
TIOCGPTN
TIOCSPTLCK
TIOCGPTLCK
TIOCGPTPEER

It looks like /dev/ptmx and TIOCSWINSZ are now improved, but /dev/pts/N slave path handling may still be incomplete. The runtime likely needs to map the Linux /dev/pts/N path back to the host PTY slave path created for the corresponding PTY master.

@doanbaotrung

@jserv

I got this crash log when running the app

=== [2026-06-08 13:40:14] Launching app: foot ===
[Prefix] ubuntu-arm64 kind=linux arch=aarch64 runner=elfuse root=/Users/dbaotrung/.muplar/prefixes/ubuntu-arm64 rootfs=/Users/dbaotrung/.muplar/prefixes/ubuntu-arm64/rootfs
warn: main.c:464: 'C' is not a UTF-8 locale, falling back to 'C.UTF-8'
warn: config.c:4069: DejaVu Sans: font does not appear to be monospace; check your config, or disable this warning by setting [tweak].font-monospace-warn=no
xkbcommon: ERROR: [XKB-679] No Compose file for locale "C.UTF-8": failed to use fallback "en_US.UTF-8"
xkbcommon: ERROR: [XKB-679] couldn't find a Compose file for locale "C.UTF-8" (mapped to "C.UTF-8")
warn: wayland.c:1368: failed to instantiate compose table; dead keys (compose) will not work
13:40:15 ERROR src/syscall/proc.c:1931: elfuse: worker: unexpected exception EC=0x20 syndrome=0x82000007 VA=0x42c688 PA=0x42c688

╔══════════════════════════════════════════════════════════╗
║                   elfuse crash report                    ║
╚══════════════════════════════════════════════════════════╝

## Environment
- elfuse version: 54aa908
- macOS: 26.5 (Darwin 25.5.0)
- hardware: Apple M2 Pro

## Crash
- type: UNEXPECTED_EC
- detail: EC=0x20 syndrome=0x82000007 VA=0x42c688

## Binary
- path: /usr/bin/foot
- sysroot: /Users/dbaotrung/.muplar/prefixes/ubuntu-arm64/rootfs
- cmdline: /usr/bin/foot

## Registers
PC   = 0x000000000042c688  CPSR = 0x0000000080000000

@jserv

jserv commented Jun 8, 2026

Contributor Author

I got this crash log when running the app

Please check whether commit b2fa153 resolves the issue.

@doanbaotrung

doanbaotrung commented Jun 8, 2026

Hi @jserv,

This is the crash log after I used your commit.


=== [2026-06-08 21:33:09] Launching app: foot ===
[Prefix] ubuntu-arm64 kind=linux arch=aarch64 runner=elfuse root=/Users/dbaotrung/.muplar/prefixes/ubuntu-arm64 rootfs=/Users/dbaotrung/.muplar/prefixes/ubuntu-arm64/rootfs
WARNING: __aarch64_have_lse_atomics not found
warn: main.c:464: 'C' is not a UTF-8 locale, falling back to 'C.UTF-8'
warn: config.c:4069: DejaVu Sans: font does not appear to be monospace; check your config, or disable this warning by setting [tweak].font-monospace-warn=no
xkbcommon: ERROR: [XKB-679] No Compose file for locale "C.UTF-8": failed to use fallback "en_US.UTF-8"
xkbcommon: ERROR: [XKB-679] couldn't find a Compose file for locale "C.UTF-8" (mapped to "C.UTF-8")
warn: wayland.c:1368: failed to instantiate compose table; dead keys (compose) will not work
21:33:10 ERROR src/syscall/proc.c:1931: elfuse: worker: unexpected exception EC=0x20 syndrome=0x82000007 VA=0x42c688 PA=0x42c688

╔══════════════════════════════════════════════════════════╗
║                   elfuse crash report                    ║
╚══════════════════════════════════════════════════════════╝

## Environment
- elfuse version: b2fa153
- macOS: 26.5 (Darwin 25.5.0)
- hardware: Apple M2 Pro

## Crash
- type: UNEXPECTED_EC
- detail: EC=0x20 syndrome=0x82000007 VA=0x42c688

## Binary
- path: /usr/bin/foot
- sysroot: /Users/dbaotrung/.muplar/prefixes/ubuntu-arm64/rootfs
- cmdline: /usr/bin/foot

## Registers
PC   = 0x000000000042c688  CPSR = 0x0000000080000000
ESR  = 0x0000000092000007  EC=0x24 (Data abort (lower EL))
FAR  = 0x0000000000000008  ELR  = 0x000000000042c688
SPSR = 0x0000000080000000  SCTLR= 0x0000000034d0d985
SP0  = 0x000000021920a610  TPIDR= 0x000000021920b740

X0   = 0x0000000000000000  X1   = 0x0000000000000001  X2   = 0x000000000107f5b8  X3   = 0x000000000107f5c8
X4   = 0x000000000000000d  X5   = 0x00000000ffffffff  X6   = 0x0000000000000000  X7   = 0x00000000ffffffff
X8   = 0x0000000000000062  X9   = 0x000000021920b128  X10  = 0x0000000000000036  X11  = 0x000000000000000a
X12  = 0x0000000000000000  X13  = 0x000000020043bffc  X14  = 0x0000000000000000  X15  = 0x0000000000000000
X16  = 0x0000000000000001  X17  = 0x00000002004465a0  X18  = 0xfffffffffffc1000  X19  = 0x000000000107e5c0
X20  = 0x0000000000000000  X21  = 0x0000000000000006  X22  = 0x000000000107f610  X23  = 0x000000020043bea0
X24  = 0x0000000000000000  X25  = 0x000000021920b020  X26  = 0x0000000000000000  X27  = 0x0000000200b74740
X28  = 0x00000002189fc000  X29  = 0x000000021920a710  X30  = 0x000000000042c684

## Memory layout
guest_size  = 0x10000000000 (1048576 MB, 40-bit IPA)
brk         = 0x1000000 .. 0x1200000
mmap RW     = 0x200000000 .. 0x218200000 (next 0x240000000)
mmap RX     = 0x10000000 .. 0x10200000 (next 0x10000000)
interp_base = 0xff00000000  mmap_limit = 0xfe00000000
nregions    = 173

jserv added a commit that referenced this pull request Jun 9, 2026
Bundles the elfuse-side fixes the foot + Wayland compositor path needed
to stop crashing after the PTY emulation work landed. The common thread
is fault paths the PR #89 reproduction exercises and the prior runtime
quietly mis-emulated.

guest.c: drain worker vCPUs in guest_destroy before any hv_vm_unmap.
thread_destroy_all_vcpus releases handles but does not block on the
owning pthread leaving hv_vcpu_run, so a worker still inside the
guest at unmap time took a stage-2 translation fault on its next
instruction fetch -- the EC=0x20 syndrome=0x82000007 the PR #89
reporter hit. exit_group already runs request + interrupt + join;
the destroy path needs the same prefix because forkipc.c's
vcpu_run_loop returns straight into guest_destroy without going
through the guest exit_group handler. Wake signals (futex,
wakeup-pipe, hv_vcpus_exit) cover workers blocked outside the vCPU
loop so the 100ms join cap does not detach live pthreads onto the
imminent munmap.

futex.[ch]: futex_interrupt_consume is now a test-and-clear edge
trigger; previously the sticky flag forkipc.c set on the last
clone-thread exit kept every later epoll_pwait/ppoll/futex_wait
returning -EINTR until execve cleared it, and in foot's case execve
never came. poll.c, signal.c, and the futex wait paths switch to
the consume variant.

mem.c: hvf_apply_file_overlay refuses MAP_SHARED of a read-only
backing fd up front. Apple HVF rejects post-overlay hv_vm_map with
HV_DENIED when the underlying host VA loses write capability, so
the overlay path silently swapped MAP_SHARED ranges to MAP_PRIVATE
snapshots; routing read-only fds straight to the pread fallback
returns -EACCES at the right layer and keeps the fork-child overlay
re-install path quiet.

io.c + abi.h: fallocate now handles FALLOC_FL_PUNCH_HOLE |
FALLOC_FL_KEEP_SIZE. The Linux semantic (reads in the punched
range return zero, file size unchanged) maps to macOS F_PUNCHHOLE
when the offset and length are block-aligned, and falls back to
pwrite of zero pages for the one-byte probe foot's wl_shm pool
issues at startup. Without the fallback, the probe returned EINVAL
and foot disabled punch-hole for the whole session.

crashreport.c: when an EL0 fault crash report fires, dump the
guest page-table walk for the faulting VA plus the segment and
region that should have backed it. The PT walk surfaced the
hvf_segments=0 smoking gun behind the worker-drain race.
@jserv

jserv commented Jun 9, 2026

Contributor Author

This is the crash log after I used your commit.

Please check whether commit ca81461 resolves the issue.

On my side, foot now works correctly with both elfuse and wilco (my in-house Wayland compositor, which I plan to open-source), as shown below.
[Screenshot: foot with elfuse + wilco]

cubic-dev-ai[bot]

This comment was marked as resolved.

@doanbaotrung

Hi @jserv,

The code is working and the issue is gone. I ran into another issue, but it is about Wayland compatibility and is no longer an elfuse issue.

Thanks,
Trung

jserv added 7 commits June 9, 2026 11:14
sys_ioctl previously had no case for TIOCSWINSZ, so foot's initial
master-side resize hit the default -ENOTTY arm and aborted terminal
startup. The minimal one-line fix is insufficient on macOS: the host
/dev/ptmx master is not itself a tty. TIOCSWINSZ / TIOCGWINSZ on the
bare master return ENOTTY until something has opened the corresponding
slave at least once, and the stored winsize gets cleared whenever the
slave refcount drops to zero (verified empirically on macOS 15). Linux
ptmx masters are tty fds in their own right, so guests assume those
ioctls work without an open slave.

pty_open_master bridges the gap by eagerly opening one slave host fd
that elfuse holds for the lifetime of the master and never exposes to
the guest. A per-master side table records ({master,slave}_host_fd,
linux_pts_num, slave_path). fd_cleanup_entry drops the keepalive when
the master closes; sys_close's fast-close branch also calls
proc_pty_close_keepalive so single-thread closes do not bypass the slow
path. duplicate_guest_fd mirrors the keepalive via a dup-under-lock so
dup/dup2/fcntl(F_DUPFD) aliases each keep a slave reference, then
registers BEFORE fd_alloc publishes the new guest fd so a sibling close
racing the install drops the duped keepalive too.

Guest /dev/pts/N opens and stats resolve through the captured ptsname(3)
string rather than a /dev/ttys%03lu reformat that breaks on any future
macOS naming change or unusual minor encoding. The synthesized stat
publishes st_rdev = (136 << 24) | minor in the macOS encoding so the
fs-stat translation layer (mac_to_linux_dev) yields a Linux dev_t whose
major(rdev) equals UNIX98_PTY_SLAVE_MAJOR, satisfying glibc ptsname's
device-type check. pty_open_master fails the ptmx open with EMFILE when
the table is full instead of returning a master fd whose pts number
cannot round-trip through /dev/pts/N. /dev/pts is added to
path_might_use_stat_intercept.

Four new ioctls: TIOCSWINSZ passes through to the host (now valid
because the keepalive opened the slave); TIOCGPTN reports the captured
pts number; TIOCSPTLCK(0) maps to host unlockpt(3) (lock-after-unlock
is rejected because macOS exposes no re-lock primitive); TIOCGPTPEER
opens the captured slave path with translate_open_flags and rejects
unsupported bits with EINVAL. TIOCSWINSZ / TIOCGWINSZ / TIOCGPTN
defensively call proc_pty_master_adopt(guest_fd) first so a master
received via SCM_RIGHTS lazily registers its own keepalive before the
host ioctl. Adopt uses fd_snapshot_and_dup to atomically pin the
canonical (host_fd, generation) with a probe dup, performs the slave
open against the probe, then re-validates and inserts under joint
fd_lock + pty_keepalive_lock so a sibling close+recycle between
validation and insert cannot attach the keepalive to the wrong file.
pty_keepalive_register_locked returns the existing pts_num atomically
on duplicate so the idempotent path never re-reads under the lock-free
race window.

Fork IPC propagates the keepalive table. fork_ipc_send_pty_keepalives
walks proc_pty_snapshot_keepalive output, matches each entry's
master_host_fd against the parent's fd_table to recover the guest fd
that is stable across the IPC, and ships (guest_fd, linux_pts_num,
slave_path) records plus an SCM_RIGHTS batch of dup'd slave fds.
fork_ipc_recv_pty_keepalives resolves each guest_fd through the child's
just-installed fd_table to recover the child-side master host fd, then
calls proc_pty_restore_keepalive which sets FD_CLOEXEC on the inherited
slave and registers the pair under the wire-transmitted linux_pts_num
(no reparse of the path string). Both phases run after fork_ipc_{send,
recv}_fd_table so the guest-fd-to-host-fd lookup exists in both
directions.

Close #88
The keepalive table previously cleared the entire entry on master close,
which dropped the (linux_pts_num, slave_path) mapping a forked child's
subsequent open("/dev/pts/N") relied on. foot/sshd/openssh sftp-server
all close(master) in the child after fork and BEFORE opening the slave,
so the slave open landed in the cleared lookup and returned ENOENT even
though the parent still held the master and the macOS slave node was
openable.

proc_pty_close_keepalive now retains linux_pts_num + slave_path for
fork-restored entries flagged stale_open_once, keeping the inherited
slave host fd to pin the macOS tty across the close-before-open window.
The next translated /dev/pts/N open consumes the stale entry, closes
the retained slave, and clears the slot. Ordinary local master closes
still drop the mapping immediately, so the path cache cannot persist
beyond one consumer.

pty_keepalive_register_locked prefers a stale-path slot with the same
pts number on insert (deterministic macOS minor mapping makes reuse
path-correct), then an empty slot, then evicts the lowest-index stale
slot. Live entries are never evicted. pty_keepalive_register_recycled
expires any stale-path entries holding the same slave_path before the
new master takes the slot, so a recycled minor cannot inherit the
prior tenant's cached translation.

pty_lookup_slave_path and pty_open_slave walk all entries with a
non-empty slave_path, preferring live over stale. pty_open_pts_dir
enumerates the same set so open and readdir stay consistent: a child
that just opened its slave via the stale-path mapping no longer sees
an entry that is open(2)-reachable but readdir-invisible.
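The slot preference order on insert can be sketched as a pure function. This is an illustration of the stated priority (same-pts stale slot, then empty slot, then lowest-index stale slot, never a live one); the `slot_state` enum and `pick_slot` are hypothetical names, not the layout used by pty_keepalive_register_locked.

```c
#include <assert.h>
#include <stddef.h>

enum slot_state { SLOT_EMPTY, SLOT_LIVE, SLOT_STALE };

struct slot {
    enum slot_state state;
    unsigned pts_num;
};

/* Returns the index to (re)use for pts_num, or -1 when every slot is live. */
int pick_slot(const struct slot *slots, size_t n, unsigned pts_num)
{
    assert(slots != NULL);
    int first_empty = -1;
    int first_stale = -1;
    for (size_t i = 0; i < n; i++) {
        /* Best case: a stale slot that already maps the same pts number. */
        if (slots[i].state == SLOT_STALE && slots[i].pts_num == pts_num)
            return (int)i;
        if (slots[i].state == SLOT_EMPTY && first_empty < 0)
            first_empty = (int)i;
        if (slots[i].state == SLOT_STALE && first_stale < 0)
            first_stale = (int)i;
    }
    if (first_empty >= 0)
        return first_empty;
    return first_stale; /* lowest-index stale slot, or -1: live is never evicted */
}
```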
Extract pty_keepalive_lock_acquire and pty_keepalive_find_master_locked
so the ten pthread_once + lock acquire pairs and four "scan by
master_host_fd" loops collapse to one call site each. Drop the redundant
BSS-zero field assignments from pty_keepalive_init now that
pty_keepalive_clear_slot_locked resets every field on slot reclaim, so
the BSS dependency only matters for the first-touch sentinel.

Switch pty_keepalive_register_locked to str_copy_trunc, collapse three
close + saved-errno blocks in pty_open_master into close_keep_errno, and
fold the success and failure cleanup loops of
fork_ipc_send_pty_keepalives so payload_slave_fds is closed once at the
tail. Replace proc_pty_restore_keepalive's four scattered close paths
with a single goto drop trailer and use ARRAY_SIZE in
fork_ipc_send_pty_keepalives instead of an open-coded divisor.
Linux O_PATH means "path-only": the device hook must not run, no pty
pair gets allocated, and the resulting fd only supports fstat plus
*at-style operations. Forwarding the open to pty_open_master broke this
because every probe of /dev/ptmx allocated a new pty and grew /dev/pts
indefinitely.

proc_intercept_open now short-circuits /dev/ptmx + O_PATH to a
/dev/null backing fd. /dev/null is harmless, never a directory, and the
guest's I/O and ioctl paths are already gated by FD_PATH so the backing
fd is never visible. proc_intercept_stat synthesizes the matching
character-device stat with rdev = (5, 2) so fstat through the FD_PATH
gate reports the standard Linux ptmx device numbers, and the
stat-translation layer's mac_to_linux_dev produces the right values in
the guest's struct stat.

sys_fstat picks up the synthetic stat when an FD_PATH fd carries a
non-empty proc_path -- the route any future virtual-path-backed fd
(/proc, /dev/ptmx, etc.) needs. To keep that proc_path safe under
sibling close + reopen races, fold the proc_path install into
fd_alloc_opened_host's existing (type, host_fd) tuple-revalidation
window: the resolver runs before fd_lock and the install happens inside
it, alongside linux_flags and the urandom bitmap. The previous
post-publish unlocked write let a recycled slot inherit the stale
proc_path string, which on the FD_PATH path could surface another file's
fstat as /dev/ptmx.
Bundles the elfuse-side fixes the foot + Wayland compositor path needed
to stop crashing after the PTY emulation work landed.

guest.c: drain worker vCPUs in guest_destroy before any hv_vm_unmap.
thread_destroy_all_vcpus releases handles but does not block on the
owning pthread leaving hv_vcpu_run, so a worker still inside the
guest at unmap time took a stage-2 translation fault on its next
instruction fetch. exit_group already runs request + interrupt + join;
the destroy path needs the same prefix because forkipc.c's
vcpu_run_loop returns straight into guest_destroy without going
through the guest exit_group handler. Wake signals (futex, wakeup-pipe,
hv_vcpus_exit) cover workers blocked outside the vCPU loop so the 100ms
join cap does not detach live pthreads onto the imminent munmap.

futex.[ch]: futex_interrupt_consume is now a test-and-clear edge
trigger; previously the sticky flag forkipc.c set on the last
clone-thread exit kept every later epoll_pwait/ppoll/futex_wait
returning -EINTR until execve cleared it, and in foot's case execve
never came. poll.c, signal.c, and the futex wait paths switch to the
consume variant.
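The difference between a sticky flag and a test-and-clear edge trigger can be shown with C11 atomics. This is a generic sketch with hypothetical names (`interrupt_request`, `interrupt_peek_sticky`, `interrupt_consume`), not the futex.c code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool interrupt_flag;

/* Producer side: request that one blocked wait be interrupted. */
void interrupt_request(void)
{
    atomic_store(&interrupt_flag, true);
}

/* Sticky read: every caller keeps seeing true until someone clears it. */
bool interrupt_peek_sticky(void)
{
    return atomic_load(&interrupt_flag);
}

/* Test-and-clear: exactly one caller observes true per request. */
bool interrupt_consume(void)
{
    return atomic_exchange(&interrupt_flag, false);
}

/* One request is reported exactly once by the consume variant. */
void interrupt_self_check(void)
{
    interrupt_request();
    assert(interrupt_consume());
    assert(!interrupt_consume());
}
```

With the sticky read, a wait loop that returns -EINTR whenever the flag is set would keep returning it on every later call; the exchange makes the second and later waits block normally.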

mem.c: hvf_apply_file_overlay refuses MAP_SHARED of a read-only backing
fd up front. Apple HVF rejects post-overlay hv_vm_map with HV_DENIED
when the underlying host VA loses write capability, so the overlay path
silently swapped MAP_SHARED ranges to MAP_PRIVATE snapshots; routing
read-only fds straight to the pread fallback returns -EACCES at the
right layer and keeps the fork-child overlay re-install path quiet.

io.c + abi.h: fallocate now handles FALLOC_FL_PUNCH_HOLE |
FALLOC_FL_KEEP_SIZE. The Linux semantic (reads in the punched range
return zero, file size unchanged) maps to macOS F_PUNCHHOLE when the
offset and length are block-aligned, and falls back to pwrite of zero
pages for the one-byte probe foot's wl_shm pool issues at startup.
Without the fallback, the probe returned EINVAL and foot disabled
punch-hole for the whole session.
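A sketch of the zero-write fallback for unaligned ranges, written with portable POSIX calls only. The function names are hypothetical and the macOS F_PUNCHHOLE branch is deliberately left out; this only shows how reads in the range become zero while the file size stays unchanged.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* True when the range could go to a block-granular punch-hole primitive. */
int range_is_block_aligned(off_t offset, off_t len, off_t block)
{
    assert(block > 0);
    return offset % block == 0 && len % block == 0;
}

/* Hypothetical fallback: overwrite [offset, offset + len) with zeros,
 * clamped to the current file size so the file never grows (KEEP_SIZE).
 * Returns 0 or a negative errno value. */
int punch_hole_by_zeroing(int fd, off_t offset, off_t len)
{
    if (offset < 0 || len <= 0)
        return -EINVAL;
    int fl = fcntl(fd, F_GETFL);
    if (fl < 0 || (fl & O_ACCMODE) == O_RDONLY)
        return -EBADF; /* Linux fallocate requires a writable fd */
    struct stat st;
    if (fstat(fd, &st) < 0)
        return -errno;
    if (offset >= st.st_size)
        return 0; /* nothing inside the file to zero */
    off_t end = offset + len;
    if (end > st.st_size)
        end = st.st_size;
    char zeros[4096];
    memset(zeros, 0, sizeof(zeros));
    while (offset < end) {
        size_t chunk = (size_t)(end - offset);
        if (chunk > sizeof(zeros))
            chunk = sizeof(zeros);
        ssize_t n = pwrite(fd, zeros, chunk, offset);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -errno;
        }
        offset += n;
    }
    return 0;
}
```

Unlike a real hole punch, this fallback does not release disk blocks; it only reproduces the read-back semantics.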

crashreport.c: when an EL0 fault crash report fires, dump the guest
page-table walk for the faulting VA plus the segment and region that
should have backed it. The PT walk surfaced the hvf_segments=0 smoking
gun behind the worker-drain race.
A targeted lgetxattr probe surfaced that elfuse's errno mapping table
covered every other divergent macOS errno in the 35..102 range but lost
the xattr-specific pair: macOS ENOATTR(93) and ENODATA(96) both fell
into the linux_errno() default and surfaced as Linux EINVAL(22).
fontconfig and glibc treat "attribute not found" as ENODATA(61), so the
wrong errno made lgetxattr on a missing attr look like a malformed call
and short-circuited their probes.

Add a LINUX_ENODATA constant, route both macOS values through it in
translate.c (guarded on the alias case so duplicate switch labels do not
break the compile when ENOATTR == ENODATA on a given SDK), and add a
tests/test-xattr.c that pins five round-trips: lgetxattr on a regular
file returns the stored value, lgetxattr on a symlink without its own
attr reports ENODATA (not EINVAL), getxattr on the same symlink follows
to the target, lgetxattr after lsetxattr installs a symlink-owned attr
returns the link value, and lgetxattr on a missing attr reports ENODATA.
The dispatch wiring at syscall.c:285 was already in place; only the
translation cap was missing.
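The guarded switch can be sketched as follows. To stay compilable on any host, the sketch uses its own `MAC_*` constants with the numeric values quoted above instead of the SDK's errno macros, and the default arm stands in for the rest of the table; `sketch_linux_errno` is not the real linux_errno().

```c
#include <assert.h>

#define MAC_ENOATTR 93   /* macOS "attribute not found" */
#define MAC_ENODATA 96   /* macOS "no message available" */
#define LINUX_EINVAL 22
#define LINUX_ENODATA 61

/* Stand-in translation covering only the xattr pair. */
int sketch_linux_errno(int mac_errno)
{
    assert(mac_errno >= 0);
    switch (mac_errno) {
    case MAC_ENODATA:
#if MAC_ENOATTR != MAC_ENODATA
    /* Guard: if an SDK aliases the two values, a second label with the
     * same constant would be a duplicate-case compile error. */
    case MAC_ENOATTR:
#endif
        return LINUX_ENODATA;
    default:
        return LINUX_EINVAL; /* previous behavior for unmapped values */
    }
}
```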
futex.c: signal_pending_lockfree consults the atomic sig_pending_hint
without confirming under sig_lock, which the helper itself documented
as a stale-true source -- rt_sigprocmask masking a queued signal does
not update the hint, so the next reader sees pending && unblocked and
synthesizes -EINTR even though signal_pending() (the slow-path confirm)
would have reported no deliverable signal. Both futex_wait call sites
already release the bucket lock before consulting signal state (the
OS-sync path holds no bucket lock at all; the bucket path explicitly
drops b->lock around the signal check and re-checks waiter.woken after
re-acquiring), so the slow-path call is safe at both sites without
lock-order risk. Route both through signal_pending() and delete the
lockfree variant.
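The stale-true hint can be reproduced in a few lines. The names sig_lock and sig_pending_hint come from the commit message; everything else (the mask layout and the helper functions) is a hypothetical reduction for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static pthread_mutex_t sig_lock = PTHREAD_MUTEX_INITIALIZER;
static uint64_t pending_mask; /* guarded by sig_lock */
static uint64_t blocked_mask; /* guarded by sig_lock */
static atomic_bool sig_pending_hint;

static uint64_t sig_bit(int signo)
{
    assert(signo >= 1 && signo <= 64);
    return 1ull << (signo - 1);
}

/* Queue a signal and raise the lock-free hint. */
void queue_signal(int signo)
{
    pthread_mutex_lock(&sig_lock);
    pending_mask |= sig_bit(signo);
    pthread_mutex_unlock(&sig_lock);
    atomic_store(&sig_pending_hint, true);
}

/* Mask a signal. The hint is intentionally left alone, as in the bug. */
void block_signal(int signo)
{
    pthread_mutex_lock(&sig_lock);
    blocked_mask |= sig_bit(signo);
    pthread_mutex_unlock(&sig_lock);
}

/* Fast path: may report true for a signal that is no longer deliverable. */
bool pending_hint_only(void)
{
    return atomic_load(&sig_pending_hint);
}

/* Slow path: confirm under the lock that something is pending and unblocked. */
bool pending_confirmed(void)
{
    pthread_mutex_lock(&sig_lock);
    bool deliverable = (pending_mask & ~blocked_mask) != 0;
    pthread_mutex_unlock(&sig_lock);
    return deliverable;
}
```

A wait path that trusts only the hint would synthesize -EINTR after the mask change, while the confirmed check reports that nothing is deliverable.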
@jserv

jserv commented Jun 9, 2026

Contributor Author

The code is working and the issue is gone. I ran into another issue, but it is about Wayland compatibility and is no longer an elfuse issue.

Good to hear that. When wilco becomes available, you can give it a try. The reason I decided to develop my own Wayland compositor for macOS is that I occasionally run into compatibility and performance issues.

@jserv jserv merged commit ec76e49 into main Jun 9, 2026
4 checks passed
@jserv jserv deleted the pty-ioctl branch June 9, 2026 03:24
@doanbaotrung

Nice. Looking forward to using wilco.

Development

Successfully merging this pull request may close these issues.

Linux PTY ioctl issue: foot fails with TIOCSWINSZ: Inappropriate ioctl for device

2 participants