feat: capture XMM and MXCSR state in MidHook context #13

Open
AlienDwarf wants to merge 5 commits into dev from feature/midhook-xmm-fpu-context

@AlienDwarf

Owner

Extend the mid-function context bridge to snapshot the full SSE / floating-point state, not just the GPRs and flags, so handlers can observe and rewrite floating-point and SIMD register arguments in flight.

The stub now spills XMM0..XMM15 (XMM0..XMM7 on x86) with movups and the MXCSR control/status register with stmxcsr before calling the handler, and reloads them (applying any edits) with movups / ldmxcsr afterwards. movups is used instead of movaps so the spill area needs no 16-byte stack alignment. HookContext gains an xmm array, an mxcsr field and a new Xmm type; the save order keeps the block contiguous and makes the pointer passed to the handler alias the struct exactly, which offset_of! layout tests lock down.
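For reviewers who want to picture the layout tests, here is a minimal sketch of the idea. `SketchContext`, its field names, and its field order are assumptions made for illustration and are not the crate's actual `HookContext`; only the technique (a `#[repr(C)]` struct pinned with `offset_of!`) comes from the description above.

```rust
use std::mem::{offset_of, size_of};

/// One 128-bit XMM register stored as raw bytes (hypothetical shape).
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Xmm(pub [u8; 16]);

/// Illustrative x86_64 context. The field order here is an assumption,
/// chosen only to show how a contiguous save block can be pinned down.
#[repr(C)]
pub struct SketchContext {
    pub xmm: [Xmm; 16],  // XMM0..XMM15, spilled with movups
    pub mxcsr: u32,      // saved with stmxcsr
    pub _pad: u32,       // explicit padding so the next field is 8-byte aligned
    pub rflags: u64,
    pub gprs: [u64; 16],
}

/// The stub writes at fixed offsets, so any accidental reordering of the
/// struct should fail loudly instead of corrupting register state.
pub fn check_layout() {
    assert_eq!(size_of::<Xmm>(), 16);
    assert_eq!(offset_of!(SketchContext, xmm), 0);
    assert_eq!(offset_of!(SketchContext, mxcsr), 256);
    assert_eq!(offset_of!(SketchContext, rflags), 264);
    assert_eq!(offset_of!(SketchContext, gprs), 272);
    assert_eq!(size_of::<SketchContext>(), 400);
}
```

Because `Xmm` wraps a byte array it has alignment 1, which matches the unaligned movups spill: nothing in the struct demands a 16-byte aligned frame.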

The legacy x87 stack registers remain out of scope (documented). Bumps the reserved stub capacity to 512 bytes to fit the larger save/restore block.

Adds runtime tests that observe and rewrite a double argument through XMM0 on x86_64, and serializes the mid-hook integration tests (each install suspends all other threads, so parallel installs on shared targets could collide).
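As a rough illustration of what "observe and rewrite a double through XMM0" involves, the sketch below treats an XMM register as 16 raw bytes and reads or writes the low 64-bit lane, which is where a scalar double sits. The `Xmm` shape and the helper names (`low_f64`, `set_low_f64`, `observe_and_triple`) are hypothetical and may differ from the accessors this PR actually adds.

```rust
/// One 128-bit XMM register stored as raw bytes (hypothetical shape).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Xmm(pub [u8; 16]);

impl Xmm {
    /// Read the low 64 bits as an f64 (x86 is little-endian).
    pub fn low_f64(&self) -> f64 {
        let mut lo = [0u8; 8];
        lo.copy_from_slice(&self.0[..8]);
        f64::from_le_bytes(lo)
    }

    /// Overwrite the low 64 bits, leaving the upper lane untouched.
    pub fn set_low_f64(&mut self, v: f64) {
        self.0[..8].copy_from_slice(&v.to_le_bytes());
    }
}

/// Hypothetical handler body: record the incoming argument, then triple it
/// so the hooked function sees the rewritten value after the reload.
pub fn observe_and_triple(xmm0: &mut Xmm) -> f64 {
    let seen = xmm0.low_f64();
    xmm0.set_low_f64(seen * 3.0);
    seen
}
```

Only the low lane is modified, so any data a caller left in the upper 64 bits survives the round trip through the handler.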


The repository had accumulated rustfmt drift across many modules, examples and
tests, which fails the `cargo fmt --all --check` step in CI. Run rustfmt over
the whole tree so the formatting check passes. No functional changes.

CI runs `cargo clippy --all-targets --all-features -- -D warnings` for
x86_64 and i686, which turns lints into hard errors:

- delay.rs: collapse a nested `if` into a let-chain.
- registry.rs: replace `assert_eq!(.., true/false)` with `assert!`.
- tests/midhook.rs: the `triple` helper and `OBSERVED_ARG` static are only
  used by the x86_64-only register tests, so gate them (and the AtomicU64
  import) behind `target_arch = "x86_64"` to avoid dead-code errors on i686.
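The last two fixes can be sketched as follows. The bodies of `triple` and `is_registered` are invented for illustration (only the names `triple`, `OBSERVED_ARG` and the gating condition come from the commit message), and the let-chain change is left out because it depends on a recent toolchain and edition.

```rust
// Only the x86_64 register tests use these items, so the import, the static
// and the helper are gated together; on i686 none of them are compiled, so
// there is nothing for the dead-code lint to flag.
#[cfg(target_arch = "x86_64")]
use std::sync::atomic::{AtomicU64, Ordering};

#[cfg(target_arch = "x86_64")]
static OBSERVED_ARG: AtomicU64 = AtomicU64::new(0);

#[cfg(target_arch = "x86_64")]
pub fn triple(x: f64) -> f64 {
    // Hypothetical body: remember the argument bits, then scale.
    OBSERVED_ARG.store(x.to_bits(), Ordering::SeqCst);
    x * 3.0
}

/// Hypothetical predicate standing in for a registry lookup.
pub fn is_registered(id: u32) -> bool {
    id == 7
}

pub fn lint_clean_checks() {
    // clippy::bool_assert_comparison: write assert!(..) and assert!(!..)
    // instead of assert_eq!(.., true) and assert_eq!(.., false).
    assert!(is_registered(7));
    assert!(!is_registered(8));

    #[cfg(target_arch = "x86_64")]
    {
        assert_eq!(triple(2.0), 6.0);
        assert_eq!(f64::from_bits(OBSERVED_ARG.load(Ordering::SeqCst)), 2.0);
    }
}
```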

Add HookContext::redirect_rip (redirect_eip on x86): a field that is zero on
entry and, when a handler sets it to a code address, makes the context bridge
jump there after restoring the (possibly modified) register/XMM state instead
of continuing the original function, skipping the stolen instructions.

The stub now reserves a zeroed redirect slot at the top of the frame, and after
the handler returns selects redirect_rip (if non-zero) or the gateway via cmov
and writes it into that slot. Because the slot is the topmost field, the
register restore leaves rsp on it and a single indirect jmp transfers control.
The jump is a jmp, not a ret, so it leaves the CET shadow stack intact; on x86
the slot is padded to 8 bytes so the frame matches size_of::<HookContext>().
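The selection step is easier to review with a plain-Rust model next to it. This sketch mirrors only the decision the stub makes with cmov; `SketchContext` and `select_target` are hypothetical names, and the real logic lives in generated machine code rather than in a Rust function.

```rust
/// Simplified stand-in for the context a handler receives (x86_64 naming).
#[derive(Default)]
pub struct SketchContext {
    /// Zero on entry; a non-zero value asks the bridge to jump there.
    pub redirect_rip: u64,
}

/// Models the stub's choice after the handler returns: use redirect_rip when
/// it is non-zero, otherwise fall back to the gateway that runs the stolen
/// instructions and continues the original function.
pub fn select_target(ctx: &SketchContext, gateway: u64) -> u64 {
    if ctx.redirect_rip != 0 {
        ctx.redirect_rip
    } else {
        gateway
    }
}
```

A handler that never touches `redirect_rip` therefore keeps the previous behaviour, since the zeroed slot always resolves to the gateway.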

Add MidHook::resume_address() (= target + stolen_len) so a handler can redirect
past the patched region and otherwise continue normally.
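The arithmetic behind resume_address() is small enough to spell out. The struct below is a hypothetical stand-in holding just the two values named above; the real MidHook carries more state and may store them differently.

```rust
/// Hypothetical stand-in for the two values resume_address() depends on.
pub struct SketchMidHook {
    /// Address where the hook was installed.
    pub target: usize,
    /// Number of original instruction bytes overwritten by the patch.
    pub stolen_len: usize,
}

impl SketchMidHook {
    /// First byte after the patched region: target + stolen_len.
    pub fn resume_address(&self) -> usize {
        self.target + self.stolen_len
    }
}
```

A handler that writes this address into the redirect field resumes at the first unpatched instruction, so the stolen instructions are never executed.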

Runtime tests cover redirecting a hooked entry to a same-ABI replacement on both
x86_64 and x86; layout/encoding tests pin the new slot offset and assert the
stub stays ret-free.