The message says:
But in the parent (
memmove(&g->regions[i + 2], &g->regions[i + 1], ...); /* shift suffix */
guest_region_t *right = &g->regions[i + 1];
*right = *r; /* copied from the still-intact slot i */
right->start = end; ...
r->end = start; /* left half shortened AFTER right is read */
So
Repeat mprotect with the same prot on RELRO, JIT, and GC-style ranges no
longer walks page tables: a pre-check confirms every overlapping region
already records the requested prot and is not MAP_NORESERVE, in which
case the call returns 0 without touching the tracker or PTEs. The skip
is gated on a per-request safety helper that forces the slow path for
pure PROT_READ, because sys_mmap installs MEM_PERM_RW PTEs for non-exec
mappings (including PROT_READ requests) and only the explicit
guest_update_perms call inside mprotect tightens them to MEM_PERM_R.
tests/test-negative pins this behavior with a regression test that
mprotects PROT_READ onto a PROT_READ mmap and confirms a subsequent
write traps SIGSEGV.
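
The skip decision described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: every sk_-prefixed name, the struct layouts, and the linear scan are assumptions made for the example, and the real helpers may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical prot bits for this sketch (not the real headers). */
#define SK_PROT_READ  1
#define SK_PROT_WRITE 2
#define SK_PROT_EXEC  4

typedef struct {
    uint64_t start, end;   /* half-open [start, end) */
    int prot;              /* prot recorded by the tracker */
    bool noreserve;        /* region was mapped MAP_NORESERVE */
} sk_region_t;

typedef struct {
    sk_region_t regions[16]; /* sorted by start, non-overlapping */
    size_t n;
    bool tracker_stale;      /* sticky: set once the tracker may be wrong */
} sk_guest_t;

/* Pure PROT_READ must always take the slow path: the PTEs may still be
 * RW even though the tracker already records PROT_READ. */
static bool sk_prot_skip_is_safe(int prot)
{
    return prot != SK_PROT_READ;
}

/* True when no overlapping region disagrees with prot or is NORESERVE.
 * Vacuously true when nothing overlaps, which is why the stale flag
 * has to guard the caller. */
static bool sk_range_matches(const sk_guest_t *g, uint64_t start,
                             uint64_t end, int prot)
{
    assert(start < end);
    for (size_t i = 0; i < g->n; i++) {
        const sk_region_t *r = &g->regions[i];
        if (r->end <= start)
            continue;            /* entirely before the range */
        if (r->start >= end)
            break;               /* sorted: nothing later overlaps */
        if (r->prot != prot || r->noreserve)
            return false;
    }
    return true;
}

/* The fast-path gate: all three conditions must hold to return early. */
static bool sk_mprotect_can_skip(const sk_guest_t *g, uint64_t start,
                                 uint64_t end, int prot)
{
    return !g->tracker_stale &&
           sk_prot_skip_is_safe(prot) &&
           sk_range_matches(g, start, end, prot);
}
```

In this sketch a repeat PROT_READ|PROT_WRITE request over a matching region is skippable, while a repeat PROT_READ request never is.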
The fast path runs before any mutation. When PTE work is required, it
runs BEFORE guest_region_set_prot so a -ENOMEM from guest_update_perms,
guest_extend_page_tables, or guest_invalidate_ptes leaves the tracker
at the OLD prot; the next retry sees the mismatch and re-attempts
instead of silently no-op'ing on stale tracker state. The low-IPA
branch was also missing return-value checks on guest_update_perms and
guest_invalidate_ptes; both now propagate -LINUX_ENOMEM.
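
The ordering argument can be illustrated with a toy model. The sketch below is hypothetical throughout (sk_state_t, sk_update_ptes, and the one-shot failure injection are invented for the example), and it omits the PROT_READ gating. It only shows why doing the fallible PTE step first keeps a retry honest.

```c
#include <assert.h>
#include <stddef.h>

#define SK_ENOMEM 12  /* Linux ENOMEM value */

/* Toy state: one tracked prot and one "hardware" prot. */
typedef struct {
    int tracker_prot;  /* what the region tracker records */
    int pte_prot;      /* what the page tables actually enforce */
    int fail_next;     /* inject one allocation failure when nonzero */
} sk_state_t;

/* Stand-in for the PTE work; may fail with -SK_ENOMEM. */
static int sk_update_ptes(sk_state_t *s, int prot)
{
    assert(s != NULL);
    if (s->fail_next) {
        s->fail_next = 0;
        return -SK_ENOMEM;
    }
    s->pte_prot = prot;
    return 0;
}

static int sk_mprotect(sk_state_t *s, int prot)
{
    if (s->tracker_prot == prot)
        return 0;                 /* same-prot fast path, no mutation */

    int rc = sk_update_ptes(s, prot);
    if (rc < 0)
        return rc;                /* tracker still holds the OLD prot */

    s->tracker_prot = prot;       /* commit only after PTE work succeeded */
    return 0;
}
```

Had the tracker been updated first, the retry after a failure would hit the fast path and return 0 while the page tables still enforced the old permissions.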
Three table-full split paths now arm a sticky
guest_t.regions_tracker_stale flag: guest_region_set_prot's two split
sites and guest_region_remove's interior-split fallback. The fast path
checks this flag and
falls back to unconditional PTE work for the lifetime of the process
once the tracker has lied even once. Without arming the flag at the
remove site, an orphaned-but-still-mapped tail would present vacuously
uniform prot to the fast path and skip required PTE updates. The flag
is propagated across fork via the process-state IPC, which is bumped to
v11 to carry a uint8 stale snapshot after num_guest_regions. The child
also arms the flag locally if its receiving region table is smaller than
the parent's GUEST_MAX_REGIONS would have allowed, so a child built with
a tighter cap inherits the same conservative fast-path behavior.
guest_region_remove is restructured as a single in/out compaction pass.
The previous code walked the region array with one cursor and handled
each overlap kind in its own branch, mutating g->regions in place and
rescanning after a full-containment removal. The new pass keeps a write
cursor 'out' alongside the input cursor 'in' (out <= in by the
non-overlap invariant), so trim survivors are emitted into the
compaction front and the untouched suffix is shifted once at the end.
The two trim-only branches collapse into a single survivor-build block.
The interior split remains the only growth path: it snapshots the
source region, shifts the suffix to make room, writes both halves, and
returns.
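
A rough sketch of such an in/out compaction pass is shown below. It is an illustration under assumptions, not the actual guest_region_remove: the sk_ names, the tiny table size, and the table-full behavior (keep the left half, mark the tracker stale) are one reading of the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_MAX_REGIONS 4  /* tiny cap so the table-full path is easy to hit */

typedef struct { uint64_t start, end; } sk_region_t;  /* [start, end) */

typedef struct {
    sk_region_t regions[SK_MAX_REGIONS]; /* sorted, non-overlapping */
    size_t n;
    bool stale;                          /* sticky "tracker lied" flag */
} sk_table_t;

static void sk_region_remove(sk_table_t *t, uint64_t start, uint64_t end)
{
    size_t out = 0, in;
    for (in = 0; in < t->n; in++) {
        sk_region_t r = t->regions[in];  /* snapshot before any write */
        if (r.end <= start) {            /* entirely before the range */
            t->regions[out++] = r;
            continue;
        }
        if (r.start >= end)
            break;                       /* untouched suffix starts here */
        if (r.start < start && r.end > end) {
            /* Interior split: the only growth path. No earlier region
             * can have been dropped, so the cursors still coincide. */
            assert(out == in);
            if (t->n == SK_MAX_REGIONS) {
                t->regions[in].end = start; /* keep left, orphan right */
                t->stale = true;
                return;
            }
            memmove(&t->regions[in + 2], &t->regions[in + 1],
                    (t->n - in - 1) * sizeof(sk_region_t));
            t->regions[in] = (sk_region_t){ r.start, start };
            t->regions[in + 1] = (sk_region_t){ end, r.end };
            t->n++;
            return;
        }
        if (r.start < start)
            r.end = start;               /* tail trimmed */
        else if (r.end > end)
            r.start = end;               /* head trimmed */
        else
            continue;                    /* fully contained: drop */
        t->regions[out++] = r;           /* emit the trim survivor */
    }
    /* Shift the untouched suffix once, then fix up the count. */
    if (out != in)
        memmove(&t->regions[out], &t->regions[in],
                (t->n - in) * sizeof(sk_region_t));
    t->n = out + (t->n - in);
}
```

Because each region is copied into a local before anything is written, the split never reads a half from a slot that was already overwritten.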
guest_region_add_ex_owned_gpa replaces the O(n) bubble-insert with a
binary search plus single memmove. guest_region_range_prot_uniform and
guest_region_range_has_noreserve use the same binary-search prefix-skip.
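
As a sketch of the technique, assuming a sorted array of non-overlapping half-open regions (all sk_ names are hypothetical): the insert position comes from a lower-bound search on start, and the prefix-skip is a search for the first region whose end lies past the query start. The shift is still linear in the number of moved entries, but it becomes one memmove instead of a chain of swaps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint64_t start, end; } sk_region_t;  /* [start, end) */

/* Index of the first region with region.start >= start. */
static size_t sk_lower_bound(const sk_region_t *v, size_t n, uint64_t start)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (v[mid].start < start)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Prefix-skip: index of the first region with region.end > start.
 * Valid because non-overlapping sorted regions also have sorted ends. */
static size_t sk_first_overlap(const sk_region_t *v, size_t n, uint64_t start)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (v[mid].end <= start)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Sorted insert: one search, one memmove. Returns -1 when full. */
static int sk_region_insert(sk_region_t *v, size_t *n, size_t cap,
                            sk_region_t r)
{
    assert(r.start < r.end);
    if (*n == cap)
        return -1;
    size_t pos = sk_lower_bound(v, *n, r.start);
    memmove(&v[pos + 1], &v[pos], (*n - pos) * sizeof(sk_region_t));
    v[pos] = r;
    (*n)++;
    return 0;
}
```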
find_free_gap_inner stops iterating once a region's start crosses max_addr;
later regions cannot affect a candidate gap inside the search window.
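
The early exit can be sketched with a simple lowest-address first-fit search. This is an assumption-laden illustration: the real find_free_gap_inner may search in a different direction or handle alignment, and sk_find_free_gap and SK_NO_GAP are names invented here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_NO_GAP UINT64_MAX  /* hypothetical "not found" sentinel */

typedef struct { uint64_t start, end; } sk_region_t;  /* [start, end) */

/* Lowest address in [min_addr, max_addr) where len bytes fit between
 * the sorted, non-overlapping regions, or SK_NO_GAP. */
static uint64_t sk_find_free_gap(const sk_region_t *v, size_t n,
                                 uint64_t min_addr, uint64_t max_addr,
                                 uint64_t len)
{
    assert(min_addr <= max_addr);
    uint64_t cand = min_addr;
    for (size_t i = 0; i < n; i++) {
        if (v[i].start >= max_addr)
            break;                /* early exit: past the search window */
        if (v[i].end <= cand)
            continue;             /* region lies below the candidate */
        if (v[i].start >= cand && v[i].start - cand >= len)
            break;                /* gap before this region is enough */
        cand = v[i].end;          /* skip past the blocking region */
    }
    if (cand > max_addr || max_addr - cand < len)
        return SK_NO_GAP;
    return cand;
}
```

Since the array is sorted by start, once one region begins at or beyond max_addr every later one does too, so stopping there cannot change the result.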
Repeat mprotect with the same prot on RELRO, JIT, and GC-style ranges no longer walks page tables: a pre-check confirms every overlapping region already records the requested prot and is not MAP_NORESERVE, in which case the call returns 0 without touching the tracker or PTEs. The skip is gated on a per-request safety helper that forces the slow path for pure PROT_READ, because sys_mmap installs MEM_PERM_RW PTEs for non-exec mappings (including PROT_READ requests) and only the explicit guest_update_perms call inside mprotect tightens them to MEM_PERM_R. tests/test-negative pins this behavior with a regression test that mprotects PROT_READ onto a PROT_READ mmap and confirms a subsequent write traps SIGSEGV.
The fast path runs before any mutation. When PTE work is required, it runs BEFORE guest_region_set_prot so a -ENOMEM from guest_{update_perms, extend_page_tables,invalidate_ptes} leaves the tracker at the OLD prot; the next retry sees the mismatch and re-attempts instead of silently no-op'ing on stale tracker state. The low-IPA branch was also missing return-value checks on guest_update_perms and guest_invalidate_ptes; both now propagate -LINUX_ENOMEM.
guest_region_set_prot's two table-full split paths now set a sticky guest_t.regions_tracker_stale flag. The fast path checks this flag and falls back to unconditional PTE work for the lifetime of the process once the tracker has lied even once. The flag is not propagated through fork IPC v10, so a child that inherits a stale tracker re-arms the flag the next time it hits the same condition; the only window of incorrect skipping is the very first matching mprotect after such a fork.
guest_region_remove is restructured as a single in/out compaction pass. The interior-split branch snapshots the source region, shifts the suffix to make room, writes left and right halves, and returns. The previous in-place layout aliased the source slot when out == in: it overwrote *r with the left half and then read the right half from that clobbered slot, corrupting both halves on the only growth path. The trim-only paths consolidate to a single survivor block.
guest_region_add_ex_owned_gpa replaces the O(n) bubble-insert with a binary search plus single memmove. The two query helpers guest_region_range_prot_uniform and guest_region_range_has_noreserve use the same binary-search prefix-skip.
find_free_gap_inner stops iterating once a region's start crosses max_addr; later regions cannot affect a candidate gap inside the search window.
Summary by cubic
Short-circuits same-prot mprotect to skip page-table work when safe, and hardens region tracking to avoid stale or incorrect skips. This speeds up RELRO/JIT/GC workloads and fixes a region-split corruption bug.
New Features
Refactors
Written for commit 41d43fd. Summary will update on new commits.