Skip to content

fix(wasm): serialize Module calls and guard against use-after-close#181

Merged
daltoniam merged 4 commits into
mainfrom
fix/wasm-concurrent-execute
Jun 3, 2026
Merged

fix(wasm): serialize Module calls and guard against use-after-close#181
daltoniam merged 4 commits into
mainfrom
fix/wasm-concurrent-execute

Conversation

@daltoniam
Copy link
Copy Markdown
Owner

Summary

  • Fixes the two recent production crashes — a nil pointer dereference at wasm/memory.go:51 and a fatal error: traceback / unexpected SPWRITE function runtime.morestack — both rooted in concurrent invocation of a single wazero api.Module.
  • Adds a per-Module call mutex around every entry point that touches m.mod or m.fn* (Execute, Configure, Tools, Name, Healthy, Metadata) and an atomic.Bool closed flag so racing callers get a clean ErrModuleClosed instead of dereferencing a freed wazero memory instance.

Why

Wazero documents (api/wasm.go:378-381) that Function.Call is not goroutine-safe and must not be reentered until the previous call returns. A single Module shares one linear memory and one malloc/free heap, so each malloc -> Memory().Write -> Call -> Memory().Read -> free sequence must run as one atomic critical section.

Goja scripts fire many api.call() invocations from a single tool call, and the MCP server handles concurrent JSON-RPC requests against shared wasm-backed integrations. Without serialization, malloc/free interleaved with Memory().Read and produced the nil-pointer panic in readFromGuest. When that panic crossed Goja's runTryInner protected boundary while another panic was active, the runtime re-panicked and aborted with the morestack traceback.

Loader.LoadPlugin also closes the old module while another goroutine may be mid-call. The new closed flag makes those callers fail with ErrModuleClosed rather than touching a freed memory instance.

Test plan

  • New TestModule_ConcurrentExecute (32 goroutines × 8 iterations) — reproduces the panic without the fix; passes with -race after the fix.
  • New TestModule_ConcurrentMixed exercises Execute/Tools/Name/Healthy in parallel.
  • New TestModule_ExecuteAfterClose — calls after Close return error, never panic.
  • New TestModule_ConcurrentExecuteAndClose races Close against in-flight Execute calls; asserts no goroutine panics.
  • make ci (build + vet + test-race + lint + gosec + govulncheck) clean.

Follow-ups (out of scope)

  • For higher parallelism we could instantiate one wazero module per concurrent caller (pool). Correctness is now sound with serialization; per-instance pooling is a perf optimization for a future PR.

Wazero's api.Function.Call is not goroutine-safe and the linear memory
backing a single Module instance is shared across all calls. Concurrent
api.call() invocations from Goja scripts (or any other parallel callers)
interleaved malloc/free with Memory().Read, producing nil pointer
dereferences inside readFromGuest and, in some unwinds, the fatal
"unexpected SPWRITE function runtime.morestack" abort.

Add a per-Module call mutex around every entry point that touches m.mod
or m.fn* (Execute, Configure, Tools, Name, Healthy, Metadata) so each
malloc -> write -> Call -> read -> free sequence runs as one critical
section, matching wazero's contract. Add a closed atomic flag set by
Close so callers racing with Loader.LoadPlugin's reload get a clean
ErrModuleClosed instead of dereferencing a freed memory instance.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@acmacalister acmacalister left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid fix — the double-checked lock + CAS pattern is correct, the concurrency comment in the struct doc is exactly what future readers need, and the test suite is comprehensive. Two minor suggestions inline.

Comment thread wasm/concurrent_test.go Outdated

// Subsequent calls must not panic.
result, err := mod.Execute(ctx, "example_echo", map[string]any{"message": "x"})
if err == nil && (result == nil || !result.IsError) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition err == nil && (result == nil || !result.IsError) would silently pass if Execute ever returned (nil, someErr) instead of the expected {IsError: true} result — the test would see err != nil and short-circuit without checking the result. Since the behaviour here is deterministic (closed module always returns {IsError: true}, nil), it doesn't matter today, but it's a subtle trap for anyone who later changes the error path.

Splitting the assertion makes the intent unambiguous:

if err != nil {
	t.Fatalf("Execute after Close returned unexpected Go error: %v", err)
}
if result == nil || !result.IsError {
	t.Fatal("expected Execute after Close to return an IsError result")
}

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in c21b6c8 — split into separate Go-error and IsError-result assertions.

Comment thread wasm/concurrent_test.go
}
if tools := mod.Tools(); tools != nil {
t.Errorf("Tools() should return nil after Close, got %d tools", len(tools))
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestModule_ExecuteAfterClose checks Execute, Healthy, and Tools, but Name() and Configure() go through the same double-check pattern and aren't covered here. Two more assertions would complete the after-close coverage:

if got := mod.Name(); got != "unknown" {
	t.Errorf("Name() after Close = %q, want "unknown"", got)
}
if err := mod.Configure(ctx, mcp.Credentials{}); !errors.Is(err, ErrModuleClosed) {
	t.Errorf("Configure after Close: got %v, want ErrModuleClosed", err)
}

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in c21b6c8 — added Name() and Configure() after-close assertions.

daltoniam and others added 2 commits June 3, 2026 09:48
Address PR feedback: split the Execute-after-Close assertion so a Go
error and an IsError result are checked separately (the prior combined
condition could silently pass if Execute ever returned (nil, err)), and
add explicit after-close checks for Name() and Configure() so every
double-checked entry point is covered.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
go1.26.4 includes fixes for GO-2026-5039 (net/textproto) and GO-2026-5037
(crypto/x509), both of which govulncheck flags as reachable from this
codebase. CI installs via go-version-file: go.mod, so bumping the
directive is sufficient.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@acmacalister acmacalister left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid fix. The double-checked locking pattern is correct — setting closed before acquiring callMu ensures in-flight calls drain before wazero teardown, and the re-check under the lock closes the TOCTOU window. Just two small gaps in Metadata() coverage inline.

Comment thread wasm/concurrent_test.go
case 2:
_ = mod.Name()
case 3:
_ = mod.Healthy(context.Background())
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Metadata() acquires callMu through the same double-check pattern added here, but it's not in the switch. Since we're already spinning up 16 goroutines, adding a fifth case costs nothing and makes sure the race detector actually sees it:

const goroutines = 20
// ...
case 4:
	_ = mod.Metadata(context.Background())

(Also update switch id % 4id % 5.)

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 1e22593 — added Metadata() as case 4 (goroutines bumped to 20, % 5).

Comment thread wasm/concurrent_test.go
}
if tools := mod.Tools(); tools != nil {
t.Errorf("Tools() should return nil after Close, got %d tools", len(tools))
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR summary lists Metadata alongside Execute, Configure, Tools, Name, and Healthy as methods protected by the double-check pattern, but it's the only one without an after-close assertion here. Worth rounding it out:

if got := mod.Metadata(); got != nil {
	t.Errorf("Metadata() should return nil after Close, got %v", got)
}

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 1e22593 — added Metadata() nil assertion after Close.

Address PR feedback: add Metadata() to the mixed concurrent goroutine
switch (now 20 goroutines % 5) and assert Metadata() returns nil after
Close — both were missing from the after-close / concurrency matrix
even though Metadata() goes through the same double-checked lock.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@acmacalister acmacalister left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid fix. The double-check locking pattern is the right approach here, the concurrency model is clearly documented on the struct, and the Close() CAS-before-lock ordering correctly handles the race between in-flight callers and a concurrent teardown. Four tests covering concurrent execution, mixed method calls, after-close semantics across every entry point, and the Close-vs-Execute race — that is thorough coverage. Build, race-detected tests, and lint all clean.

@daltoniam daltoniam merged commit 6032834 into main Jun 3, 2026
5 checks passed
@daltoniam daltoniam deleted the fix/wasm-concurrent-execute branch June 3, 2026 20:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants