fix(wasm): serialize Module calls and guard against use-after-close by daltoniam · Pull Request #181 · daltoniam/switchboard

daltoniam · 2026-06-02T19:10:21Z

Summary

Fixes the two recent production crashes — a nil pointer dereference at wasm/memory.go:51 and a fatal error: traceback / unexpected SPWRITE function runtime.morestack — both rooted in concurrent invocation of a single wazero api.Module.
Adds a per-Module call mutex around every entry point that touches m.mod or m.fn* (Execute, Configure, Tools, Name, Healthy, Metadata) and an atomic.Bool closed flag so racing callers get a clean ErrModuleClosed instead of dereferencing a freed wazero memory instance.

Why

Wazero documents (api/wasm.go:378-381) that Function.Call is not goroutine-safe and must not be reentered until the previous call returns. A single Module shares one linear memory and one malloc/free heap, so each malloc -> Memory().Write -> Call -> Memory().Read -> free sequence must run as one atomic critical section.

Goja scripts fire many api.call() invocations from a single tool call, and the MCP server handles concurrent JSON-RPC requests against shared wasm-backed integrations. Without serialization, malloc/free interleaved with Memory().Read and produced the nil-pointer panic in readFromGuest. When that panic crossed Goja's runTryInner protected boundary while another panic was active, the runtime re-panicked and aborted with the morestack traceback.

Loader.LoadPlugin also closes the old module while another goroutine may be mid-call. The new closed flag makes those callers fail with ErrModuleClosed rather than touching a freed memory instance.

Test plan

New TestModule_ConcurrentExecute (32 goroutines × 8 iterations) — reproduces the panic without the fix; passes with -race after the fix.
New TestModule_ConcurrentMixed exercises Execute/Tools/Name/Healthy in parallel.
New TestModule_ExecuteAfterClose — calls after Close return error, never panic.
New TestModule_ConcurrentExecuteAndClose races Close against in-flight Execute calls; asserts no goroutine panics.
make ci (build + vet + test-race + lint + gosec + govulncheck) clean.

Follow-ups (out of scope)

For higher parallelism we could instantiate one wazero module per concurrent caller (pool). Correctness is now sound with serialization; per-instance pooling is a perf optimization for a future PR.

Wazero's api.Function.Call is not goroutine-safe and the linear memory backing a single Module instance is shared across all calls. Concurrent api.call() invocations from Goja scripts (or any other parallel callers) interleaved malloc/free with Memory().Read, producing nil pointer dereferences inside readFromGuest and, in some unwinds, the fatal "unexpected SPWRITE function runtime.morestack" abort. Add a per-Module call mutex around every entry point that touches m.mod or m.fn* (Execute, Configure, Tools, Name, Healthy, Metadata) so each malloc -> write -> Call -> read -> free sequence runs as one critical section, matching wazero's contract. Add a closed atomic flag set by Close so callers racing with Loader.LoadPlugin's reload get a clean ErrModuleClosed instead of dereferencing a freed memory instance. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

acmacalister

Solid fix — the double-checked lock + CAS pattern is correct, the concurrency comment in the struct doc is exactly what future readers need, and the test suite is comprehensive. Two minor suggestions inline.

acmacalister · 2026-06-02T19:27:47Z

+
+	// Subsequent calls must not panic.
+	result, err := mod.Execute(ctx, "example_echo", map[string]any{"message": "x"})
+	if err == nil && (result == nil || !result.IsError) {


The condition err == nil && (result == nil || !result.IsError) would silently pass if Execute ever returned (nil, someErr) instead of the expected {IsError: true} result — the test would see err != nil and short-circuit without checking the result. Since the behaviour here is deterministic (closed module always returns {IsError: true}, nil), it doesn't matter today, but it's a subtle trap for anyone who later changes the error path.

Splitting the assertion makes the intent unambiguous:

if err != nil { t.Fatalf("Execute after Close returned unexpected Go error: %v", err) } if result == nil || !result.IsError { t.Fatal("expected Execute after Close to return an IsError result") }

Done in c21b6c8 — split into separate Go-error and IsError-result assertions.

acmacalister · 2026-06-02T19:27:47Z

+	}
+	if tools := mod.Tools(); tools != nil {
+		t.Errorf("Tools() should return nil after Close, got %d tools", len(tools))
+	}


TestModule_ExecuteAfterClose checks Execute, Healthy, and Tools, but Name() and Configure() go through the same double-check pattern and aren't covered here. Two more assertions would complete the after-close coverage:

if got := mod.Name(); got != "unknown" { t.Errorf("Name() after Close = %q, want "unknown"", got) } if err := mod.Configure(ctx, mcp.Credentials{}); !errors.Is(err, ErrModuleClosed) { t.Errorf("Configure after Close: got %v, want ErrModuleClosed", err) }

Done in c21b6c8 — added Name() and Configure() after-close assertions.

Address PR feedback: split the Execute-after-Close assertion so a Go error and an IsError result are checked separately (the prior combined condition could silently pass if Execute ever returned (nil, err)), and add explicit after-close checks for Name() and Configure() so every double-checked entry point is covered. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

go1.26.4 includes fixes for GO-2026-5039 (net/textproto) and GO-2026-5037 (crypto/x509), both of which govulncheck flags as reachable from this codebase. CI installs via go-version-file: go.mod, so bumping the directive is sufficient. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

acmacalister

Solid fix. The double-checked locking pattern is correct — setting closed before acquiring callMu ensures in-flight calls drain before wazero teardown, and the re-check under the lock closes the TOCTOU window. Just two small gaps in Metadata() coverage inline.

acmacalister · 2026-06-03T15:36:12Z

+			case 2:
+				_ = mod.Name()
+			case 3:
+				_ = mod.Healthy(context.Background())


Metadata() acquires callMu through the same double-check pattern added here, but it's not in the switch. Since we're already spinning up 16 goroutines, adding a fifth case costs nothing and makes sure the race detector actually sees it:

const goroutines = 20 // ... case 4: _ = mod.Metadata(context.Background())

(Also update switch id % 4 → id % 5.)

Done in 1e22593 — added Metadata() as case 4 (goroutines bumped to 20, % 5).

acmacalister · 2026-06-03T15:36:12Z

+	}
+	if tools := mod.Tools(); tools != nil {
+		t.Errorf("Tools() should return nil after Close, got %d tools", len(tools))
+	}


The PR summary lists Metadata alongside Execute, Configure, Tools, Name, and Healthy as methods protected by the double-check pattern, but it's the only one without an after-close assertion here. Worth rounding it out:

if got := mod.Metadata(); got != nil { t.Errorf("Metadata() should return nil after Close, got %v", got) }

Done in 1e22593 — added Metadata() nil assertion after Close.

Address PR feedback: add Metadata() to the mixed concurrent goroutine switch (now 20 goroutines % 5) and assert Metadata() returns nil after Close — both were missing from the after-close / concurrency matrix even though Metadata() goes through the same double-checked lock. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

acmacalister

Solid fix. The double-check locking pattern is the right approach here, the concurrency model is clearly documented on the struct, and the Close() CAS-before-lock ordering correctly handles the race between in-flight callers and a concurrent teardown. Four tests covering concurrent execution, mixed method calls, after-close semantics across every entry point, and the Close-vs-Execute race — that is thorough coverage. Build, race-detected tests, and lint all clean.

acmacalister approved these changes Jun 2, 2026

View reviewed changes

daltoniam and others added 2 commits June 3, 2026 09:48

acmacalister approved these changes Jun 3, 2026

View reviewed changes

daltoniam merged commit 6032834 into main Jun 3, 2026
5 checks passed

daltoniam deleted the fix/wasm-concurrent-execute branch June 3, 2026 20:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(wasm): serialize Module calls and guard against use-after-close#181

fix(wasm): serialize Module calls and guard against use-after-close#181
daltoniam merged 4 commits into
mainfrom
fix/wasm-concurrent-execute

daltoniam commented Jun 2, 2026

Uh oh!

acmacalister left a comment

Uh oh!

acmacalister Jun 2, 2026

Uh oh!

daltoniam Jun 3, 2026

Uh oh!

acmacalister Jun 2, 2026

Uh oh!

daltoniam Jun 3, 2026

Uh oh!

acmacalister left a comment

Uh oh!

acmacalister Jun 3, 2026

Uh oh!

daltoniam Jun 3, 2026

Uh oh!

acmacalister Jun 3, 2026

Uh oh!

daltoniam Jun 3, 2026

Uh oh!

acmacalister left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

daltoniam commented Jun 2, 2026

Summary

Why

Test plan

Follow-ups (out of scope)

Uh oh!

acmacalister left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

acmacalister left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

acmacalister left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants