Allow releasing GVL when calling exported Wasm functions#603
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds an opt-in path to release Ruby’s GVL during Wasm function calls (via to_func(gvl: false)), while ensuring Ruby host callbacks still execute with the GVL held.
Changes:
- Extend
Extern#to_functo acceptgvl:and return aFuncconfigured to release the GVL duringcall. - Add
nogvl/with_gvlhelpers and wire them into Wasm calls and host-callback closures. - Add unit specs covering correctness, threading behavior, host callbacks, and exception propagation under GC stress.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| spec/unit/func_spec.rb | Adds specs validating GVL release behavior and callback/exception correctness under concurrency/GC stress. |
| ext/src/ruby_api/instance.rs | Updates Func::invoke callsite to pass the new GVL flag parameter. |
| ext/src/ruby_api/func.rs | Adds per-Func GVL configuration, releases GVL around Wasm calls, and re-acquires GVL for Ruby host callbacks. |
| ext/src/ruby_api/externals.rs | Adds gvl: kwarg parsing to Extern#to_func and updates YARD docs accordingly. |
| ext/src/helpers/nogvl.rs | Implements panic-safe trampolines plus with_gvl to re-enter the GVL from nogvl contexts. |
| ext/src/helpers/mod.rs | Re-exports with_gvl alongside nogvl. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| let data = unsafe { &mut *(arg as *mut (Option<F>, MaybeUninit<thread::Result<R>>)) }; | ||
| let func = data.0.take().expect("closure called more than once"); | ||
| data.1.write(panic::catch_unwind(AssertUnwindSafe(func))); |
| counter = 0 | ||
| running = true | ||
| sibling = Thread.new { counter += 1 while running } | ||
|
|
||
| marks = [] | ||
| store = Store.new(engine) | ||
| mark = Func.new(store, [], []) { marks << counter } | ||
| linker = Linker.new(engine) | ||
| linker.define(store, "env", "mark", mark) | ||
| instance = linker.instantiate(store, mod) | ||
|
|
||
| instance.export("spin").to_func(gvl: false).call(500_000_000) | ||
|
|
||
| running = false | ||
| sibling.join |
| linker.define(store, "env", "mark", mark) | ||
| instance = linker.instantiate(store, mod) | ||
|
|
||
| instance.export("spin").to_func(gvl: false).call(500_000_000) |
| // Borrow `ty` so the `move` closure captures a reference, not the owned value. | ||
| let ty = &ty; | ||
| with_gvl(move || { |
saulecabrera
left a comment
There was a problem hiding this comment.
Currently we have ~4 nogvl uses, all them involve CPU bound tasks, in which there are no callbacks to Ruby. I understand the motivation behind this change, but I think this is inherently unsafe and as such, I expect the documentation around it to reflect that clearly e.g., one case that I think that is not covered in the current state is the interaction between Wasmtime and Ruby through Wasmtime's API:
- A store declared with
store = Store.new(engine, limits: {memory_size: 1_000_000}) - Will end-up calling Wasmtime's resource limiter
- Which in turn will call
rb_gc_adjust_memory_usage
According to the docs for rb_gc_adjust_memory_usage:
/**
* Informs that there are external memory usages. Our GC runs when we are
* running out of memory. The amount of memory, however, can increase/decrease
* behind-the-scene. For instance DLLs can allocate memories using `mmap(2)`
* etc, which are opaque to us. Registering such external allocations using
* this function enables proper detection of how much memories an object used
* as a whole. That will trigger GCs more often than it would otherwise. You
* can also pass negative numbers here, to indicate that such external
* allocations are gone.
It seems to me that the GVL is needed in that case since GC could mutate the VM state?
|
@saulecabrera |
saulecabrera
left a comment
There was a problem hiding this comment.
@saulecabrera rb_gc_adjust_memory_usage calls objspace_malloc_increase_body which only bumps a thread-local int malloc_increase_local counter that is flushed to a global/shared counter with an atomic add - it never triggers gc work:
https://github.com/ruby/ruby/blob/2d9827db8b9fffe08a2f7dfb64ce5629a186bc93/gc/default/default.c#L8336
I dug deeper and you indeed correct, I misinterpreted the documentation.
In general the PR looks good to me, thank you.
One thing that I think we should address before landing it, is:
/// @param gvl [Boolean] When +false+, releases the GVL during the call so other Ruby threads run in parallel (each thread must use its own {Store})
Currently we are not enforcing the Store-per-thread requirement in any way. Wasmtime requires exclusive access to the Store when executing Wasm, and the GVL acquisition ensured that. With the change in this PR, exclusive access is no longer enforced. Currently a programmer error in which the same store is shared across multiple threads, will result in UB, e.g.,
require "wasmtime"
engine = Wasmtime::Engine.new
mod = Wasmtime::Module.new(engine, <<~WAT)
(module
(memory 1)
(func (export "grow") (param $n i32)
(local $i i32)
(loop $loop
(drop (memory.grow (i32.const 1)))
(local.set $i (i32.add (local.get $i) (i32.const 1)))
(br_if $loop (i32.lt_u (local.get $i) (local.get $n))))))
WAT
store = Wasmtime::Store.new(engine)
func = Wasmtime::Instance.new(store, mod).export("grow").to_func(gvl: false)
8.times.map { Thread.new { 2000.times { func.call(64) } } }.each(&:join)Results in the following:
Details
thread '<unnamed>' (1183065) panicked at /home/saul/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/wasmtime-internal-unwinder-45.0.0/src/stackwalk.rs:265:13:
0x0 > 0x7f411f2f5fd0
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread '<unnamed>' (1183065) panicked at /rustc/59807616e1fa2540724bfbac14d7976d7e4a3860/library/core/src/panicking.rs:225:5:
panic in a function that cannot unwind
stack backtrace:
#<Thread:0x00007f4127361338 race.rb:28 sleep> terminated with exception (report_on_exception is true):
race.rb:28:in 'Wasmtime::Func#call': stack level too deep (SystemStackError)
from race.rb:28:in 'block (3 levels) in <main>'
from <internal:numeric>:257:in 'Integer#times'
from race.rb:28:in 'block (2 levels) in <main>'
#<Thread:0x00007f41273615b8 race.rb:28 run> terminated with exception (report_on_exception is true):
race.rb:28:in 'Wasmtime::Func#call': wasm trap: call stack exhausted (Wasmtime::Trap)
from race.rb:28:in 'block (3 levels) in <main>'
from <internal:numeric>:257:in 'Integer#times'
from race.rb:28:in 'block (2 levels) in <main>'
#<Thread:0x00007f4127360c30 race.rb:28 run> terminated with exception (report_on_exception is true):
race.rb:28:in 'Wasmtime::Func#call': wasm trap: call stack exhausted (Wasmtime::Trap)
from race.rb:28:in 'block (3 levels) in <main>'
from <internal:numeric>:257:in 'Integer#times'
from race.rb:28:in 'block (2 levels) in <main>'
0: 0x7f41263fc0ca - std[e28293b1aa0f68bd]::backtrace_rs::backtrace::libunwind::trace
at /rustc/59807616e1fa2540724bfbac14d7976d7e4a3860/library/std/src/../../backtrace/src/backtrace/libunwind.rs:117:9
1: 0x7f41263fc0ca - std[e28293b1aa0f68bd]::backtrace_rs::backtrace::trace_unsynchronized::<std[e28293b1aa0f68bd]::sys::backtrace::_print_fmt::{closure#1}>
at /rustc/59807616e1fa2540724bfbac14d7976d7e4a3860/library/std/src/../../backtrace/src/backtrace/mod.rs:66:14
2: 0x7f41263fc0ca - std[e28293b1aa0f68bd]::sys::backtrace::_print_fmt
at /rustc/59807616e1fa2540724bfbac14d7976d7e4a3860/library/std/src/sys/backtrace.rs:74:9
3: 0x7f41263fc0ca - <<std[e28293b1aa0f68bd]::sys::backtrace::BacktraceLock>::print::DisplayBacktrace as core[c1f1a4ba060b9bfa]::fmt::Display>::fmt
at /rustc/59807616e1fa2540724bfbac14d7976d7e4a3860/library/std/src/sys/backtrace.rs:44:26
4: 0x7f4125cc00ca - <core[c1f1a4ba060b9bfa]::fmt::rt::Argument>::fmt
at /rustc/59807616e1fa2540724bfbac14d7976d7e4a3860/library/core/src/fmt/rt.rs:152:76
5: 0x7f4125cc00ca - core[c1f1a4ba060b9bfa]::fmt::write
6: 0x7f4126402be2 - std[e28293b1aa0f68bd]::io::default_write_fmt::<std[e28293b1aa0f68bd]::sys::stdio::unix::Stderr>
at /rustc/59807616e1fa2540724bfbac14d7976d7e4a3860/library/std/src/io/mod.rs:639:11
7: 0x7f4126402be2 - <std[e28293b1aa0f68bd]::sys::stdio::unix::Stderr as std[e28293b1aa0f68bd]::io::Write>::write_fmt
at /rustc/59807616e1fa2540724bfbac14d7976d7e4a3860/library/std/src/io/mod.rs:1994:13
8: 0x7f41263e0268 - <std[e28293b1aa0f68bd]::sys::backtrace::BacktraceLock>::print
at /rustc/59807616e1fa2540724bfbac14d7976d7e4a3860/library/std/src/sys/backtrace.rs:47:9
9: 0x7f41263e0268 - std[e28293b1aa0f68bd]::panicking::default_hook::{closure#0}
at /rustc/59807616e1fa2540724bfbac14d7976d7e4a3860/library/std/src/panicking.rs:292:27
10: 0x7f41263f3f23 - std[e28293b1aa0f68bd]::panicking::default_hook
at /rustc/59807616e1fa2540724bfbac14d7976d7e4a3860/library/std/src/panicking.rs:319:9
11: 0x7f41263f41eb - std[e28293b1aa0f68bd]::panicking::panic_with_hook
at /rustc/59807616e1fa2540724bfbac14d7976d7e4a3860/library/std/src/panicking.rs:825:13
12: 0x7f41263e035c - std[e28293b1aa0f68bd]::panicking::panic_handler::{closure#0}
at /rustc/59807616e1fa2540724bfbac14d7976d7e4a3860/library/std/src/panicking.rs:691:13
13: 0x7f41263d7609 - std[e28293b1aa0f68bd]::sys::backtrace::__rust_end_short_backtrace::<std[e28293b1aa0f68bd]::panicking::panic_handler::{closure#0}, !>
at /rustc/59807616e1fa2540724bfbac14d7976d7e4a3860/library/std/src/sys/backtrace.rs:182:18
14: 0x7f41263e17ed - __rustc[b7974e8690430dd9]::rust_begin_unwind
at /rustc/59807616e1fa2540724bfbac14d7976d7e4a3860/library/std/src/panicking.rs:689:5
15: 0x7f4125cc078d - core[c1f1a4ba060b9bfa]::panicking::panic_nounwind_fmt::runtime
at /rustc/59807616e1fa2540724bfbac14d7976d7e4a3860/library/core/src/panicking.rs:122:22
16: 0x7f4125cc078d - core[c1f1a4ba060b9bfa]::panicking::panic_nounwind_fmt
at /rustc/59807616e1fa2540724bfbac14d7976d7e4a3860/library/core/src/intrinsics/mod.rs:2447:9
17: 0x7f4125cc070b - core[c1f1a4ba060b9bfa]::panicking::panic_nounwind
at /rustc/59807616e1fa2540724bfbac14d7976d7e4a3860/library/core/src/panicking.rs:225:5
18: 0x7f4125cc0897 - core[c1f1a4ba060b9bfa]::panicking::panic_cannot_unwind
at /rustc/59807616e1fa2540724bfbac14d7976d7e4a3860/library/core/src/panicking.rs:337:5
19: 0x7f41266ea33a - wasmtime::runtime::vm::sys::unix::signals::trap_handler::h1bf4963deb2108bb
at /home/saul/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/wasmtime-45.0.0/src/runtime/vm/sys/unix/signals.rs:131:1
20: 0x7f4143042790 - <unknown>
21: 0x7f4143b02072 - <unknown>
22: 0x7f4143b020b8 - <unknown>
23: 0x7f4126b298c0 - wasmtime::runtime::vm::vmcontext::VMFuncRef::array_call_native::h1728168b3caf9d67
at /home/saul/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/wasmtime-45.0.0/src/runtime/vm/vmcontext.rs:1013:13
24: 0x7f4126b298c0 - wasmtime::runtime::vm::vmcontext::VMFuncRef::array_call::hac19d9adf30e5c8e
at /home/saul/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/wasmtime-45.0.0/src/runtime/vm/vmcontext.rs:970:30
25: 0x7f4126b298c0 - wasmtime::runtime::func::Func::call_unchecked_raw::{{closure}}::h2c37f901691d6116
at /home/saul/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/wasmtime-45.0.0/src/runtime/func.rs:1032:13
26: 0x7f4126b298c0 - wasmtime::runtime::vm::traphandlers::catch_traps::{{closure}}::h2f910d63d194e1a1
at /home/saul/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/wasmtime-45.0.0/src/runtime/vm/traphandlers.rs:433:32
27: 0x7f4126b298c0 - wasmtime::runtime::vm::traphandlers::<impl wasmtime::runtime::vm::traphandlers::call_thread_state::CallThreadState>::with::{{closure}}::h5b5ff149099e55b8
at /home/saul/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/wasmtime-45.0.0/src/runtime/vm/traphandlers.rs:722:50
28: 0x7f4126b298c0 - wasmtime::runtime::vm::traphandlers::tls::set::hd5748944ac135bcf
at /home/saul/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/wasmtime-45.0.0/src/runtime/vm/traphandlers.rs:1448:13
29: 0x7f4126b298c0 - wasmtime::runtime::vm::traphandlers::<impl wasmtime::runtime::vm::traphandlers::call_thread_state::CallThreadState>::with::hbdd3c78dae7e90de
at /home/saul/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/wasmtime-45.0.0/src/runtime/vm/traphandlers.rs:722:25
30: 0x7f4126b298c0 - wasmtime::runtime::vm::traphandlers::catch_traps::h55a16d44ad0ed0b7
at /home/saul/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/wasmtime-45.0.0/src/runtime/vm/traphandlers.rs:430:59
31: 0x7f4126b298c0 - wasmtime::runtime::func::invoke_wasm_and_catch_traps::hbfd968b4c0aafa4d
at /home/saul/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/wasmtime-45.0.0/src/runtime/func.rs:1475:18
32: 0x7f4126b298c0 - wasmtime::runtime::func::Func::call_unchecked_raw::h45ce1b487b244a4d
at /home/saul/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/wasmtime-45.0.0/src/runtime/func.rs:1031:9
33: 0x7f4126b28f4c - wasmtime::runtime::func::Func::call_unchecked::h7b7e3b95310918f7
at /home/saul/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/wasmtime-45.0.0/src/runtime/func.rs:1021:18
34: 0x7f4126b28f4c - wasmtime::runtime::func::Func::call_impl_do_call::hd2afc53b4e77b672
at /home/saul/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/wasmtime-45.0.0/src/runtime/func.rs:1194:18
35: 0x7f41269b5913 - wasmtime::runtime::func::Func::call::h5d1d0bd3c6fe960f
at /home/saul/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/wasmtime-45.0.0/src/runtime/func.rs:972:23
36: 0x7f41269b5913 - wasmtime_rb::ruby_api::func::Func::invoke::{{closure}}::h45c97147af0fc4ea
at /home/saul/Developer/wasmtime-rb-pr603/ext/src/ruby_api/func.rs:239:27
37: 0x7f41269b5913 - <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::hc0f429c7cc95f1eb
at /rustc/59807616e1fa2540724bfbac14d7976d7e4a3860/library/core/src/panic/unwind_safe.rs:274:9
38: 0x7f41269b5913 - std::panicking::catch_unwind::do_call::h87c26e4aeb7d2572
at /rustc/59807616e1fa2540724bfbac14d7976d7e4a3860/library/std/src/panicking.rs:581:40
39: 0x7f41269b5913 - std::panicking::catch_unwind::hee2154c175e4e7eb
at /rustc/59807616e1fa2540724bfbac14d7976d7e4a3860/library/std/src/panicking.rs:544:19
40: 0x7f41269b5913 - std::panic::catch_unwind::h8d36c8e6b4a2d0db
at /rustc/59807616e1fa2540724bfbac14d7976d7e4a3860/library/std/src/panic.rs:359:14
41: 0x7f41269b5913 - wasmtime_rb::helpers::nogvl::call_trampoline::h40d74d84b948c278
at /home/saul/Developer/wasmtime-rb-pr603/ext/src/helpers/nogvl.rs:26:18
42: 0x7f4143714e5f - rb_nogvl
43: 0x7f41269bfe74 - wasmtime_rb::helpers::nogvl::nogvl::hfeb6885bcae9ac5f
at /home/saul/Developer/wasmtime-rb-pr603/ext/src/helpers/nogvl.rs:38:9
44: 0x7f41269bfe74 - wasmtime_rb::ruby_api::func::Func::invoke::h12aa475e4e9cdd19
at /home/saul/Developer/wasmtime-rb-pr603/ext/src/ruby_api/func.rs:239:13
45: 0x7f41269c2784 - wasmtime_rb::ruby_api::func::Func::call::ha079600b32568203
at /home/saul/Developer/wasmtime-rb-pr603/ext/src/ruby_api/func.rs:193:9
46: 0x7f41269c2784 - core::ops::function::Fn::call::hae67ab55cc9199db
at /rustc/59807616e1fa2540724bfbac14d7976d7e4a3860/library/core/src/ops/function.rs:79:5
47: 0x7f41269c2784 - magnus::method::MethodCAry::call_convert_value::h6b84f277aa5139cf
at /home/saul/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/magnus-0.8.2/src/method.rs:601:9
48: 0x7f41269c2784 - magnus::method::MethodCAry::call_handle_error::{{closure}}::h057d3392f9b3fa23
at /home/saul/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/magnus-0.8.2/src/method.rs:607:18
49: 0x7f41269c2784 - <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h796ba3a1edd7b139
at /rustc/59807616e1fa2540724bfbac14d7976d7e4a3860/library/core/src/panic/unwind_safe.rs:274:9
50: 0x7f41269c2784 - std::panicking::catch_unwind::do_call::ha3241ac1b6e8aead
at /rustc/59807616e1fa2540724bfbac14d7976d7e4a3860/library/std/src/panicking.rs:581:40
51: 0x7f41269c2784 - std::panicking::catch_unwind::h91e08d6b7a923e01
at /rustc/59807616e1fa2540724bfbac14d7976d7e4a3860/library/std/src/panicking.rs:544:19
52: 0x7f41269c2784 - std::panic::catch_unwind::h2ff73df1a66deb50
at /rustc/59807616e1fa2540724bfbac14d7976d7e4a3860/library/std/src/panic.rs:359:14
53: 0x7f41269c2784 - magnus::method::MethodCAry::call_handle_error::hdb3fc8003d083d67
at /home/saul/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/magnus-0.8.2/src/method.rs:606:25
54: 0x7f41269c2784 - wasmtime_rb::ruby_api::func::init::anon::hbf63a54941d3c2fc
at /home/saul/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/magnus-0.8.2/src/method.rs:850:19
55: 0x7f4143753de1 - vm_call_cfunc_with_frame
56: 0x7f414376a7ff - vm_exec_core
57: 0x7f4143771f07 - rb_vm_exec
58: 0x7f4143775c1c - rb_vm_invoke_proc
59: 0x7f4143712f35 - thread_do_start_proc
60: 0x7f4143713576 - thread_start_func_2
61: 0x7f41437145dc - nt_start
62: 0x7f414309dd53 - start_thread
63: 0x7f414312563c - __clone3
64: 0x0 - <unknown>
thread caused non-unwinding panic. aborting.
thread '<unnamed>' (1183069) panicked at /home/saul/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/wasmtime-internal-unwinder-45.0.0/src/stackwalk.rs:265:13:
0x0 > 0x7f411eaedfd0
thread '<unnamed>' (1183069) panicked at /rustc/59807616e1fa2540724bfbac14d7976d7e4a3860/library/core/src/panicking.rs:225:5:
panic in a function that cannot unwind
In this case, what I expect is a recoverable Wasmtime::Error rather than a crash.
|
@saulecabrera unfortunately fully preventing the crash when misused isn't easy/cheap. The issue is that it is not only Func#call that needs guarding - once the GVL is released, any concurrent Store access from another thread can cause the crash as well. Store accessors (Memory#read/#write, Global, Table, Store#data, fuel) would need the same guard, and those checks would run on ordinary Store/Memory operations that every user relies on, not just gvl: false ones. So every existing user would pay for a guard that only matters when a Store is misused (shared across threads with gvl: false). |
saulecabrera
left a comment
There was a problem hiding this comment.
At the very least I think we should update the documentation to reflect the safety guarantees. See my suggestion below.
Co-authored-by: Saúl Cabrera <saulecabrera@gmail.com>
Head branch was pushed to by a user without write access
This PR adds a
gvl:boolean kwarg toExtern#to_func. With.to_func(gvl: false)the GVL is released during the Wasm call, enabling true parallelism in Ruby.