fix(wasm): serialize Module calls and guard against use-after-close#181
Conversation
Wazero's api.Function.Call is not goroutine-safe and the linear memory backing a single Module instance is shared across all calls. Concurrent api.call() invocations from Goja scripts (or any other parallel callers) interleaved malloc/free with Memory().Read, producing nil pointer dereferences inside readFromGuest and, in some unwinds, the fatal "unexpected SPWRITE function runtime.morestack" abort. Add a per-Module call mutex around every entry point that touches m.mod or m.fn* (Execute, Configure, Tools, Name, Healthy, Metadata) so each malloc -> write -> Call -> read -> free sequence runs as one critical section, matching wazero's contract. Add a closed atomic flag set by Close so callers racing with Loader.LoadPlugin's reload get a clean ErrModuleClosed instead of dereferencing a freed memory instance. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
acmacalister
left a comment
There was a problem hiding this comment.
Solid fix — the double-checked lock + CAS pattern is correct, the concurrency comment in the struct doc is exactly what future readers need, and the test suite is comprehensive. Two minor suggestions inline.
|
|
||
| // Subsequent calls must not panic. | ||
| result, err := mod.Execute(ctx, "example_echo", map[string]any{"message": "x"}) | ||
| if err == nil && (result == nil || !result.IsError) { |
There was a problem hiding this comment.
The condition err == nil && (result == nil || !result.IsError) would silently pass if Execute ever returned (nil, someErr) instead of the expected {IsError: true} result — the test would see err != nil and short-circuit without checking the result. Since the behaviour here is deterministic (closed module always returns {IsError: true}, nil), it doesn't matter today, but it's a subtle trap for anyone who later changes the error path.
Splitting the assertion makes the intent unambiguous:
if err != nil {
t.Fatalf("Execute after Close returned unexpected Go error: %v", err)
}
if result == nil || !result.IsError {
t.Fatal("expected Execute after Close to return an IsError result")
}There was a problem hiding this comment.
Done in c21b6c8 — split into separate Go-error and IsError-result assertions.
| } | ||
| if tools := mod.Tools(); tools != nil { | ||
| t.Errorf("Tools() should return nil after Close, got %d tools", len(tools)) | ||
| } |
There was a problem hiding this comment.
TestModule_ExecuteAfterClose checks Execute, Healthy, and Tools, but Name() and Configure() go through the same double-check pattern and aren't covered here. Two more assertions would complete the after-close coverage:
if got := mod.Name(); got != "unknown" {
t.Errorf("Name() after Close = %q, want "unknown"", got)
}
if err := mod.Configure(ctx, mcp.Credentials{}); !errors.Is(err, ErrModuleClosed) {
t.Errorf("Configure after Close: got %v, want ErrModuleClosed", err)
}There was a problem hiding this comment.
Done in c21b6c8 — added Name() and Configure() after-close assertions.
Address PR feedback: split the Execute-after-Close assertion so a Go error and an IsError result are checked separately (the prior combined condition could silently pass if Execute ever returned (nil, err)), and add explicit after-close checks for Name() and Configure() so every double-checked entry point is covered. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
go1.26.4 includes fixes for GO-2026-5039 (net/textproto) and GO-2026-5037 (crypto/x509), both of which govulncheck flags as reachable from this codebase. CI installs via go-version-file: go.mod, so bumping the directive is sufficient. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
acmacalister
left a comment
There was a problem hiding this comment.
Solid fix. The double-checked locking pattern is correct — setting closed before acquiring callMu ensures in-flight calls drain before wazero teardown, and the re-check under the lock closes the TOCTOU window. Just two small gaps in Metadata() coverage inline.
| case 2: | ||
| _ = mod.Name() | ||
| case 3: | ||
| _ = mod.Healthy(context.Background()) |
There was a problem hiding this comment.
Metadata() acquires callMu through the same double-check pattern added here, but it's not in the switch. Since we're already spinning up 16 goroutines, adding a fifth case costs nothing and makes sure the race detector actually sees it:
const goroutines = 20
// ...
case 4:
_ = mod.Metadata(context.Background())(Also update switch id % 4 → id % 5.)
There was a problem hiding this comment.
Done in 1e22593 — added Metadata() as case 4 (goroutines bumped to 20, % 5).
| } | ||
| if tools := mod.Tools(); tools != nil { | ||
| t.Errorf("Tools() should return nil after Close, got %d tools", len(tools)) | ||
| } |
There was a problem hiding this comment.
The PR summary lists Metadata alongside Execute, Configure, Tools, Name, and Healthy as methods protected by the double-check pattern, but it's the only one without an after-close assertion here. Worth rounding it out:
if got := mod.Metadata(); got != nil {
t.Errorf("Metadata() should return nil after Close, got %v", got)
}There was a problem hiding this comment.
Done in 1e22593 — added Metadata() nil assertion after Close.
Address PR feedback: add Metadata() to the mixed concurrent goroutine switch (now 20 goroutines % 5) and assert Metadata() returns nil after Close — both were missing from the after-close / concurrency matrix even though Metadata() goes through the same double-checked lock. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
acmacalister
left a comment
There was a problem hiding this comment.
Solid fix. The double-check locking pattern is the right approach here, the concurrency model is clearly documented on the struct, and the Close() CAS-before-lock ordering correctly handles the race between in-flight callers and a concurrent teardown. Four tests covering concurrent execution, mixed method calls, after-close semantics across every entry point, and the Close-vs-Execute race — that is thorough coverage. Build, race-detected tests, and lint all clean.
Summary
nil pointer dereferenceatwasm/memory.go:51and afatal error: traceback / unexpected SPWRITE function runtime.morestack— both rooted in concurrent invocation of a single wazeroapi.Module.Modulecall mutex around every entry point that touchesm.modorm.fn*(Execute,Configure,Tools,Name,Healthy,Metadata) and anatomic.Boolclosedflag so racing callers get a cleanErrModuleClosedinstead of dereferencing a freed wazero memory instance.Why
Wazero documents (
api/wasm.go:378-381) thatFunction.Callis not goroutine-safe and must not be reentered until the previous call returns. A singleModuleshares one linear memory and one malloc/free heap, so eachmalloc -> Memory().Write -> Call -> Memory().Read -> freesequence must run as one atomic critical section.Goja scripts fire many
api.call()invocations from a single tool call, and the MCP server handles concurrent JSON-RPC requests against shared wasm-backed integrations. Without serialization, malloc/free interleaved withMemory().Readand produced the nil-pointer panic inreadFromGuest. When that panic crossed Goja'srunTryInnerprotected boundary while another panic was active, the runtime re-panicked and aborted with themorestacktraceback.Loader.LoadPluginalso closes the old module while another goroutine may be mid-call. The newclosedflag makes those callers fail withErrModuleClosedrather than touching a freed memory instance.Test plan
TestModule_ConcurrentExecute(32 goroutines × 8 iterations) — reproduces the panic without the fix; passes with-raceafter the fix.TestModule_ConcurrentMixedexercisesExecute/Tools/Name/Healthyin parallel.TestModule_ExecuteAfterClose— calls afterClosereturn error, never panic.TestModule_ConcurrentExecuteAndCloseracesCloseagainst in-flightExecutecalls; asserts no goroutine panics.make ci(build + vet + test-race + lint + gosec + govulncheck) clean.Follow-ups (out of scope)